Data Strategy for Mid-Market Companies: From Raw Data to Competitive Intelligence in 90 Days
Most mid-market companies are not short on data. They are short on connection. The CRM holds customer records that the email platform never sees. GA4 tracks web behavior that never reaches the sales team. Finance runs on numbers the marketing team cannot access. The result: decisions made on partial information, revenue leaks that go undetected for months, and a growing gap between companies that have built data infrastructure and those still exporting CSVs by hand. A deliberate data strategy for mid-market companies closes that gap – and 90 days is enough time to get the first real intelligence output running.
Tools and Prerequisites
Before starting, you need three things in place: ownership (one person accountable for the data audit), access credentials to every system that holds customer or transaction data, and a cloud environment. The stack referenced here uses Google BigQuery as the primary warehouse, Google Analytics 4 for behavioral data, Mautic as the open-source CRM and email automation layer, and Tableau for visualization. AWS users can substitute Amazon Athena for BigQuery with minimal architectural changes. Budget roughly 15-20 hours of internal time across the first four weeks, plus whatever infrastructure costs apply to your data volume.
The 4-Layer Data Architecture
Every functional data system operates on four layers: collection, storage, transformation, and intelligence. Collection means instrumented data sources – GA4 events, Mautic contact activity, CRM records, transactional databases. Storage means a central warehouse where all sources land in raw form; BigQuery handles this efficiently at mid-market scale with costs that stay under $50/month for most organizations under 100GB. Transformation is where raw data becomes usable: dbt models, SQL views, or BigQuery scheduled queries that join customer IDs across sources and standardize field names. Intelligence is the output layer – Tableau dashboards, automated alerts, or AI-driven scoring models that sit on top of the clean data.
Most mid-market companies have layer one partially working and layers two through four completely missing. That is the actual problem.
Step 1: The Audit (Weeks 1-4)
Map every data source in the business. For each system, document: what data it holds, how often it updates, what the unique customer identifier is, and whether an API or export exists. Common sources include the CRM (HubSpot, Salesforce, or Mautic), the email service provider (ESP), GA4, the e-commerce or ERP platform, and any paid media accounts. The output of this phase is a single spreadsheet with columns for source name, data type, identifier field, update frequency, and export method. No engineering required yet – this is reconnaissance. One common finding at this stage: the same customer exists under three different email addresses across four systems, and no one has ever reconciled them.
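The inventory does not need a spreadsheet app; a plain CSV that pipelines can later read works just as well. A minimal sketch – the source names and field values below are illustrative, not a prescribed list:

```python
import csv
import io

# Columns of the audit spreadsheet described above.
COLUMNS = ["source_name", "data_type", "identifier_field",
           "update_frequency", "export_method"]

# Illustrative entries -- replace with the systems found in your own audit.
sources = [
    ("Mautic", "contacts + email activity", "email", "real-time", "MySQL / REST API"),
    ("GA4", "web behavior events", "client_id / user_id", "daily", "native BigQuery export"),
    ("Shopify", "orders + customers", "email", "real-time", "REST API"),
]

def build_inventory(rows):
    """Render the data source inventory as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()

print(build_inventory(sources))
```

Keeping the audit in a machine-readable format pays off in week five, when the same file can drive which connectors get configured.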
Step 2: Unify (Weeks 5-8)
This is the engineering phase. Connect each source to BigQuery using native connectors (GA4 to BigQuery is free and built-in), Fivetran or Airbyte for CRM and ESP data, and custom scripts for anything else. In BigQuery, create a customers_unified table that resolves identity across sources using email as the primary key, with a fallback to phone number or user ID. A basic dbt model handles this in under 100 lines of SQL. Set BigQuery scheduled queries to refresh overnight. At the end of week eight, you should be able to run a single query that returns a customer record with CRM stage, last email open, last web session, and last purchase date in one row.
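The identity-resolution rule inside the customers_unified model – email as the primary key, phone as the fallback – can be sketched outside SQL to make the logic explicit. A minimal Python version, with illustrative field names and records:

```python
from collections import defaultdict

# Records as they might arrive from each source; field names are illustrative.
records = [
    {"source": "crm",  "email": "Ana@Example.com", "phone": None, "crm_stage": "customer"},
    {"source": "esp",  "email": "ana@example.com", "phone": None, "last_email_open": "2024-05-01"},
    {"source": "shop", "email": None, "phone": "+34600000001", "last_purchase": "2024-04-12"},
    {"source": "crm",  "email": None, "phone": "+34600000001", "crm_stage": "lead"},
]

def resolution_key(record):
    """Email is the primary identity key; fall back to phone when email is missing."""
    if record.get("email"):
        return ("email", record["email"].strip().lower())  # normalize before matching
    if record.get("phone"):
        return ("phone", record["phone"])
    return None

def unify(records):
    """Merge per-source records into one row per resolved customer."""
    customers = defaultdict(dict)
    for rec in records:
        key = resolution_key(rec)
        if key is None:
            continue  # unmatchable record -- in practice, park it for manual review
        for field, value in rec.items():
            if field != "source" and value is not None:
                customers[key][field] = value
    return dict(customers)

unified = unify(records)
print(len(unified))  # 2 resolved customers from 4 source records
```

In production this logic lives in a dbt model rather than Python, but the rule is the same: normalize the key, merge non-null fields, and surface unmatchable records instead of silently dropping them.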
For teams already using Mautic, the operational differences between Mautic and hosted ESPs affect how contact data is exported – Mautic’s database is directly accessible via MySQL, which simplifies the pipeline considerably.
Step 3: First Intelligence Output (Weeks 9-12)
Build one dashboard in Tableau that answers the question your leadership team asks most often. Common first outputs: customer lifetime value by acquisition channel, revenue cohort analysis, or email engagement correlated with purchase probability. Tableau connects directly to BigQuery via its native connector, authenticating with a Google service account or OAuth sign-in. The dashboard should auto-refresh daily. This is also where Tableau as-a-Service (AaaS) becomes practical: the infrastructure – BigQuery setup, data pipelines, dbt models, Tableau server – is managed externally, and the client receives the dashboards and the underlying access without maintaining the stack themselves.
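The aggregation behind the first common dashboard – lifetime value by acquisition channel – is a group-by over the unified table. A minimal sketch with made-up numbers, to show the shape of the calculation rather than real figures:

```python
from collections import defaultdict

# Rows as they might come out of customers_unified; values are invented.
customers = [
    {"email": "a@example.com", "channel": "paid_search", "lifetime_revenue": 310.0},
    {"email": "b@example.com", "channel": "paid_search", "lifetime_revenue": 95.0},
    {"email": "c@example.com", "channel": "email", "lifetime_revenue": 540.0},
    {"email": "d@example.com", "channel": "email", "lifetime_revenue": 120.0},
]

def ltv_by_channel(rows):
    """Average lifetime revenue per customer, grouped by acquisition channel."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["channel"]] += row["lifetime_revenue"]
        counts[row["channel"]] += 1
    return {ch: round(totals[ch] / counts[ch], 2) for ch in totals}

print(ltv_by_channel(customers))  # {'paid_search': 202.5, 'email': 330.0}
```

In the actual stack this is one SQL view in BigQuery that Tableau reads directly; the point is that the dashboard is only as good as this grouping key, which is why identity resolution comes first.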
Understanding CRM revenue benchmarks by channel helps you know whether the numbers surfacing in your first dashboards are normal or worth investigating further.
Case Study: The Revenue Leak Hidden by ESP Reporting
One e-commerce client had been sending promotional email campaigns for 18 months. ESP reports showed a consistent 22% open rate and 3.1% click rate – results that looked fine against industry averages. When the unified BigQuery layer was built and email engagement was joined to actual transaction data, a different picture appeared. The segment generating the highest open rates was almost entirely composed of contacts who had not purchased in over 14 months. The ESP had been optimizing sends toward re-engagement with lapsed customers while suppressing sends to active buyers who preferred a different send frequency. The revenue-per-email for the “high engagement” segment was $0.09. For the suppressed active segment, it was $1.34. The fix was a Mautic segmentation rebuild and a frequency cap adjustment – changes that took four days to implement once the data made the problem visible.
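The metric that exposed the leak is simple once email sends and transactions sit in the same warehouse: attributed revenue divided by emails sent, per segment. A minimal sketch with invented figures that mirror the pattern above (not the client's actual data):

```python
def revenue_per_email(sends, attributed_revenue):
    """Revenue attributed to a segment's emails, divided by emails sent."""
    if sends == 0:
        return 0.0  # avoid division by zero for unsent segments
    return round(attributed_revenue / sends, 2)

# Invented figures for illustration -- the pattern, not the case-study numbers.
segments = {
    "high_engagement":   {"sends": 50_000, "revenue": 4_500.0},
    "suppressed_active": {"sends": 12_000, "revenue": 16_000.0},
}

for name, seg in segments.items():
    print(name, revenue_per_email(seg["sends"], seg["revenue"]))
```

ESP dashboards show opens and clicks per segment; they rarely show this ratio, because the revenue side of the division lives in a system the ESP never sees.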
Data Innovation, a Barcelona-based AI and data company that builds and operates intelligent systems where humans and AI agents work together, has documented that ESP-native reporting conceals revenue leaks of this type in a significant share of mid-market email programs, because ESPs optimize for engagement metrics rather than downstream revenue attribution.
This is consistent with broader findings: McKinsey research on personalization shows that companies using unified customer data generate 40% more revenue from those activities than those relying on siloed channel metrics.
Common Mistakes
- Building the dashboard before the pipeline. Tableau connected to a broken or partial data source produces confident-looking wrong numbers. Fix the data first.
- Using email as the only identity key without deduplication. One customer with three email addresses becomes three customers. Revenue attribution breaks completely.
- Skipping the audit phase. Teams that jump straight to BigQuery setup routinely miss two or three data sources that later invalidate the unified model.
- Confusing data volume with data quality. A warehouse with 50 million rows of poorly structured data is harder to work with than 500,000 clean, well-labeled records.
One honest limitation worth naming: the 90-day timeline assumes a relatively contained data environment. Companies with legacy ERP systems, multiple regional databases, or complex data governance requirements will need to extend the unification phase. Trying to compress it creates technical debt that surfaces later as dashboard inconsistencies.
The 90-Day Data Strategy Checklist
| Phase | Weeks | Key Output |
|---|---|---|
| Audit | 1-4 | Data source inventory spreadsheet with identifiers and export methods |
| Unify | 5-8 | customers_unified table in BigQuery, pipelines running on schedule |
| Intelligence | 9-12 | One live Tableau dashboard refreshing daily from BigQuery |
There is also a measurement dimension that goes beyond dashboards. Understanding the difference between delivery rate and inbox placement rate is one example of how channel-level data needs to be interpreted correctly before it enters the unified model – otherwise the email engagement numbers feeding your Tableau dashboard are already misleading. According to Gartner, through 2025, organizations that invest in data and analytics governance will outperform their peers on most business metrics – but only if the underlying data infrastructure is sound.
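The delivery-rate versus inbox-placement distinction comes down to two different denominators, which is easy to see in code. A minimal sketch with invented numbers:

```python
def delivery_rate(sent, bounced):
    """Share of sent emails accepted by the receiving server (not bounced)."""
    return round((sent - bounced) / sent, 3)

def inbox_placement_rate(delivered, inboxed):
    """Share of accepted emails that actually land in the inbox rather than spam."""
    return round(inboxed / delivered, 3)

# Invented figures: 100k sent, 2k bounced, 83k of the delivered reach the inbox.
sent, bounced, inboxed = 100_000, 2_000, 83_000
delivered = sent - bounced
print(delivery_rate(sent, bounced))              # 0.98
print(inbox_placement_rate(delivered, inboxed))  # 0.847
```

A 98% delivery rate can coexist with a mediocre inbox placement rate; if only the first number enters the unified model, engagement metrics downstream will look better than they are.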
Expected Outcomes and Next Steps
By the end of 90 days, a mid-market company running this sequence has a working data warehouse, a single customer view that spans at least three previously siloed systems, and one actionable intelligence output that did not exist before. Revenue leaks become visible. Channel attribution becomes traceable. The commercial case for the next phase – predictive scoring, AI-assisted segmentation, or automated reporting – becomes concrete rather than theoretical.
The deeper value of a data strategy for mid-market companies is not the dashboards themselves. It is the organizational shift from guessing to knowing – and the compounding advantage that comes from every decision made on complete information rather than partial signals.
If your ESP reporting looks healthy but revenue-per-email feels flat, or if your CRM data and your web analytics have never been in the same room, the process above is where to start. We have documented the full pipeline configuration at datainnovation.io – including the dbt models and BigQuery schema that the case study above was built on. If your numbers look like the ones in that email story, the revenue leak identification process is replicable and the first audit takes less time than most teams expect.
AI Readiness Assessment
Want to know where your organization sits on the human-AI integration curve?
Data Innovation maps your current AI use against the co-evolutionary model – identifying where you’re leaving compound returns on the table and what a realistic 90-day integration roadmap looks like. Trusted by Nestle, Reworld Media, and Feebbo Digital.