Collect and organize your data

Gather historical data on various marketing and non-marketing variables, such as advertising spend, pricing, and revenue or performance metrics.

The required data includes:

Data type | Description
Media data | Contains the exposure metric for each channel, geo, and time period. Possible metrics include, but are not limited to, spend, impressions, and clicks, and the metric can differ by channel. The key requirement is that these are intervenable units, meaning they represent media efforts that you can reasonably control. All media values must be non-negative.
Media spend | Contains the media spend for each channel, geo, and time period. The media data and media spend must have the same dimensions.
Control variables | Contains confounders that have a causal effect on both the target KPI and the media metric, such as Google query volume (GQV). The selection of control variables is important for estimating causal effects from an MMM. For more information, see Causal graph.
KPI | The target KPI for the model to predict, such as revenue or the number of app installations. The KPI is also the response variable of the MMM.
Revenue per KPI | Contains the average revenue per unit of KPI. If accurate revenue per KPI is not available, we strongly recommend that you approximate a reasonable value. If no such information is available, see When the KPI is not revenue. Revenue per KPI is not required if revenue is your KPI.
Geo population | Contains the population of each geo. Geo population (such as Nielsen DMA TV household population) is used to scale the media metric so that all geos are on a comparable scale. For more information about media scaling, see Input data.
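
To make the role of geo population concrete, the following Python sketch divides each geo's media values by that geo's population, which puts a large geo and a small geo on the same per-capita scale. It is a simplified illustration under stated assumptions: the geo names, impression counts, and populations are hypothetical, and Meridian's own media scaling may involve additional steps (see Input data), so this is not its exact transformation.

```python
# Hypothetical inputs: weekly impressions for two geos and their populations.
media = {
    "geo_a": [1_000_000, 1_200_000, 900_000],
    "geo_b": [50_000, 60_000, 45_000],
}
population = {"geo_a": 2_000_000, "geo_b": 100_000}


def scale_by_population(media, population):
    """Divide each geo's media series by that geo's population (per-capita view)."""
    return {
        geo: [value / population[geo] for value in series]
        for geo, series in media.items()
    }


per_capita = scale_by_population(media, population)
print(per_capita["geo_a"])  # [0.5, 0.6, 0.45]
print(per_capita["geo_b"])  # [0.5, 0.6, 0.45]
```

Although geo_a receives 20 times more impressions, both geos have the same exposure per person, which is the comparison the scaling is meant to enable.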

Meridian offers the option to model any media channel's effect based on reach and frequency data, see Reach and frequency.

Data type | Description
Reach | The number of unique individuals exposed to the channel's ad within each time period.
Frequency | The average number of times a person is exposed to the ad. It equals the total number of impressions divided by the reach for each time period.
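
Frequency as defined above is impressions divided by reach, computed separately for each time period. A minimal Python sketch with hypothetical weekly numbers for one channel in one geo follows. The function name is illustrative, and returning 0.0 for periods with zero reach is an assumption made here to avoid division by zero, not a documented Meridian convention.

```python
def average_frequency(impressions, reach):
    """Average frequency per period: total impressions divided by reach."""
    if len(impressions) != len(reach):
        raise ValueError("impressions and reach must cover the same periods")
    # Assumption: a period with zero reach is given frequency 0.0.
    return [imp / r if r > 0 else 0.0 for imp, r in zip(impressions, reach)]


# Hypothetical weekly values for one channel in one geo.
impressions = [30_000, 45_000, 0]
reach = [10_000, 15_000, 0]
print(average_frequency(impressions, reach))  # [3.0, 3.0, 0.0]
```

Equivalently, impressions equal reach multiplied by frequency, so any two of the three quantities determine the third.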

Meridian also offers the option to include organic media and non-media treatments. For more information, see Organic media and non-media variables.

Data type | Description
Organic media | Organic media variables are media activities that have no direct cost. These can include, but are not limited to, impressions from newsletters, blog posts, social media activity, or email campaigns.
Non-media treatments | Non-media variables are marketing activities that are not directly related to media, such as running a promotion, changing a product's price, or changing a product's packaging or design.

KPI

The KPI is the \(y\) variable on the left-hand side of the equation in Model Spec. The KPI can be either revenue or a non-revenue KPI, such as conversions.

Some modelers prefer to use a non-revenue KPI as the response variable even when revenue is the ultimate outcome of interest. Meridian lets you translate KPI units to revenue by providing revenue per KPI data for each geo and time period. For more information, see Value of the KPI is unknown.
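
The translation from KPI units to revenue amounts to multiplying the KPI by revenue per KPI in each geo and time period. The sketch below uses hypothetical conversion counts and revenue-per-conversion values keyed by (geo, week). It illustrates the arithmetic only and is not Meridian's API.

```python
# Hypothetical data: conversions (the KPI) and revenue per conversion,
# keyed by (geo, week).
kpi = {("geo_a", 1): 120, ("geo_a", 2): 150, ("geo_b", 1): 40}
revenue_per_kpi = {("geo_a", 1): 25.0, ("geo_a", 2): 25.0, ("geo_b", 1): 30.0}


def kpi_to_revenue(kpi, revenue_per_kpi):
    """Multiply the KPI by revenue per KPI for each (geo, time) cell."""
    return {key: kpi[key] * revenue_per_kpi[key] for key in kpi}


revenue = kpi_to_revenue(kpi, revenue_per_kpi)
print(revenue[("geo_a", 2)])  # 3750.0
print(sum(revenue.values()))  # 7950.0 (3000.0 + 3750.0 + 1200.0)
```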

Media, organic media, non-media treatment and control variables

Media, organic media, non-media treatment, and control variables should all be available as time series data.

  • Media variables: For each paid media channel, the dataset must include the channel's spend, which is used as the denominator in ROI calculations. Additionally, each paid media channel must include one of the following for modeling purposes:

    • A single media exposure metric, such as impressions, clicks, or spend
    • Reach and frequency
  • Organic media variables: Organic media has no associated spend and can be excluded from the media spend input. Additionally, each organic media channel must include one of the following for modeling purposes:

    • A single media exposure metric, such as impressions or clicks.
    • Reach and frequency

    For more information about modeling with organic media, see Organic media.
  • Non-media treatments: Non-media treatments are marketing activities that are not directly related to media and have no direct marketing cost associated with them. They differ from control variables because they are considered intervenable, and are therefore treatment variables under the causal model. For more information about modeling with non-media treatments, see Non-media treatments.

  • Control variables: The purpose of control variables is to control for confounding. Focus on collecting variables that have a causal effect on both the target KPI and the media metric or media execution. Because it is difficult to come up with a comprehensive list of variables affecting KPI, it can be more practical to focus on variables that affect media budget and planning decisions. You can start by asking your marketing planner what information might have played a role, either consciously or subconsciously, in their decision making. For more information about modeling with control variables, see Control variables.

    Examples of control variables include market competition and Google query volume (GQV). For more information about GQV, see Understanding query volume as a confounder for search ads.

  • Seasonality-related variables: Seasonality-related variables, such as holiday dummies, are typically included as control variables in the model specification. However, Meridian provides automatic seasonality and trend adjustment through its time-varying intercept, so separate seasonality variables are not required.

    Alternatively, you can disable the automatic seasonality adjustment and include your own seasonality variables.
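
If you disable the automatic seasonality adjustment and supply your own seasonality variables, a holiday dummy is one common choice. The following sketch assumes weekly periods identified by their start dates and uses a hypothetical list of holiday dates; it flags each week that contains a holiday. How you disable the automatic adjustment in Meridian is not shown here.

```python
import datetime as dt


def holiday_dummy(week_starts, holidays):
    """Return 1 for each week (start date plus 6 days) that contains a holiday, else 0."""
    flags = []
    for start in week_starts:
        end = start + dt.timedelta(days=6)
        flags.append(int(any(start <= h <= end for h in holidays)))
    return flags


# Hypothetical weeks starting 2024-12-16, 2024-12-23, and 2024-12-30.
weeks = [dt.date(2024, 12, 16) + dt.timedelta(weeks=i) for i in range(3)]
holidays = [dt.date(2024, 12, 25), dt.date(2025, 1, 1)]
print(holiday_dummy(weeks, holidays))  # [0, 1, 1]
```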

Data collection

For each variable, determine the type of data to collect. Media or marketing plans can help you determine which variables to collect. You can then collect media exposure data for Google channels, including metrics such as clicks and impressions, by using MMM Data Platform. MMM Data Platform also offers reach and frequency data for YouTube. For more information, see Use MMM Data Platform.

Collecting Google query volume (GQV) data is optional: you can run Meridian without GQV data, although omitting it might bias your model estimates.

Make sure that your data is in the proper format to run the model. For more information about the format, see the data examples in Supported data types and formats.

Granularity

Generally, finer data granularity provides more accurate insights and can help identify actionable results. Consider data granularity along the following dimensions.

Geographic granularity

Best Practice: Collect data at the geo level. This level of granularity lets you account for geo-level nuances and use Meridian's hierarchical Bayesian framework to obtain tighter credible intervals on estimates such as ROI. Note that some geos can have a very low volume of observations. Consider excluding those geos from the dataset before model fitting to help keep model estimation robust. For more information, see Geo-selection and national-level data.
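
One way to find low-volume geos is to compare each geo's share of the total KPI against a cutoff. In the sketch below, measuring volume as KPI share and the 1% cutoff (`min_share`) are both assumptions for illustration; the geo names and values are hypothetical. See Geo-selection and national-level data for the actual guidance.

```python
def select_geos(kpi_by_geo, min_share=0.01):
    """Keep geos whose share of total KPI is at least min_share.

    Assumption: "volume" is measured as a geo's share of total KPI, and
    min_share is a hypothetical cutoff to tune for your own data.
    """
    totals = {geo: sum(values) for geo, values in kpi_by_geo.items()}
    grand_total = sum(totals.values())
    return sorted(geo for geo, t in totals.items() if t / grand_total >= min_share)


kpi_by_geo = {
    "geo_a": [500, 520, 480],
    "geo_b": [300, 310, 290],
    "geo_c": [2, 0, 1],
}
print(select_geos(kpi_by_geo))  # ['geo_a', 'geo_b']
```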

Acceptable Alternative: If geo-level data is not available, you can use national-level data. However, check that your national data has a sufficient number of data points for each effect that you are trying to measure. For more information, see Amount of data needed.

Time granularity

Best Practice: Collect data at the weekly level. Compared with daily or monthly data, weekly data offers a good balance between the amount of variation and the amount of noise.

Acceptable Alternative: If weekly data is not available, you can test daily or monthly data instead. However, daily data can significantly lengthen model runtime, and monthly data can lead to non-convergence or wide credible intervals on model estimates.
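
If your raw data is daily, you can usually derive weekly data by summing each metric within a week. This sketch assumes weeks start on Monday and uses hypothetical daily values. Summing is only valid for additive metrics such as spend, impressions, or KPI counts: reach is not additive across days (the same person can be reached on several days), and ratios such as revenue per KPI should be recomputed rather than summed. Partial weeks at the start or end of the range, like the second week in the example, may need to be dropped.

```python
import datetime as dt
from collections import defaultdict


def daily_to_weekly(daily):
    """Sum daily values into weeks that start on Monday.

    daily: mapping of datetime.date -> additive value (e.g. spend or impressions).
    Returns a dict mapping each week's Monday to that week's total, in date order.
    """
    weekly = defaultdict(float)
    for day, value in daily.items():
        week_start = day - dt.timedelta(days=day.weekday())  # Monday of that week
        weekly[week_start] += value
    return dict(sorted(weekly.items()))


# Hypothetical data: 10 consecutive days starting Monday, 2024-01-01.
daily = {dt.date(2024, 1, 1) + dt.timedelta(days=i): 10.0 for i in range(10)}
print(daily_to_weekly(daily))
# {datetime.date(2024, 1, 1): 70.0, datetime.date(2024, 1, 8): 30.0}
```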

Media granularity

We recommend keeping the number of media channels below 20 so that each channel has sufficient variation and volume for robust estimation. Consider combining media channels that have low spend with other channels to avoid unreliable ROI estimates. For more information, see Channels with low spend.
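
The sketch below merges every channel whose share of total spend falls below a cutoff into a single combined channel. The 2% cutoff (`min_share`), the name `"other"`, and the channel data are hypothetical. In practice you might instead merge a low-spend channel with a closely related channel, and the corresponding media exposure metric would need to be combined the same way, which only makes sense when the units are comparable.

```python
def combine_low_spend_channels(spend, min_share=0.02, combined_name="other"):
    """Merge channels whose share of total spend is below min_share.

    spend: mapping of channel name -> list of per-period spend values.
    min_share and combined_name are hypothetical choices.
    """
    totals = {ch: sum(values) for ch, values in spend.items()}
    grand_total = sum(totals.values())
    result, merged = {}, None
    for ch, values in spend.items():
        if totals[ch] / grand_total >= min_share:
            result[ch] = list(values)  # keep as its own channel
        elif merged is None:
            merged = list(values)  # first low-spend channel starts the merge
        else:
            merged = [a + b for a, b in zip(merged, values)]  # add period by period
    if merged is not None:
        result[combined_name] = merged
    return result


spend = {"tv": [1000, 1200], "search": [800, 900], "podcast": [10, 5], "radio": [20, 15]}
print(combine_low_spend_channels(spend))
# {'tv': [1000, 1200], 'search': [800, 900], 'other': [30, 20]}
```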

Timeframe

As a general rule of thumb, historical data should cover at least two years of weekly data for geo-level models and three years of data for national-level models. If only monthly data is available, we recommend using at least three years of data. The model needs enough data points to produce reliable estimates. However, determining the amount of data is more complex than these minimums suggest and ultimately depends on the characteristics of your data. For more specific guidance, see Amount of data needed.
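
The rule-of-thumb minimums above can be encoded as a quick check. This sketch approximates a year as 52 weeks and assumes that the three-year minimum for national-level models refers to weekly data; the function name is hypothetical. It is a sanity check only, since the amount of data you actually need depends on your data (see Amount of data needed).

```python
# Rule-of-thumb minimums, in number of time periods.
# Assumptions: 52 weeks per year; national-level minimum refers to weekly data.
MIN_PERIODS = {
    ("weekly", "geo"): 2 * 52,       # two years of weekly data
    ("weekly", "national"): 3 * 52,  # three years of weekly data
    ("monthly", "geo"): 3 * 12,      # three years of monthly data
    ("monthly", "national"): 3 * 12,
}


def meets_minimum_history(n_periods, frequency, level):
    """Return True if n_periods meets the rule-of-thumb minimum."""
    return n_periods >= MIN_PERIODS[(frequency, level)]


print(meets_minimum_history(104, "weekly", "geo"))       # True
print(meets_minimum_history(130, "weekly", "national"))  # False
print(meets_minimum_history(30, "monthly", "geo"))       # False
```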

After you have collected your data, perform an exploratory data analysis to make sure that your data is accurate and complete.
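
As part of exploratory data analysis, you can automatically check a few of the requirements listed earlier: media data and media spend must have the same dimensions, and media values must be non-negative. The sketch below also flags missing values and, as an additional assumption, negative spend. It represents the arrays as hypothetical nested Python lists indexed as [geo][time][channel] rather than any Meridian input type.

```python
import math


def shape(array):
    """Shape of a rectangular nested list indexed as [geo][time][channel]."""
    return (len(array), len(array[0]), len(array[0][0]))


def check_inputs(media, spend):
    """Return a list of problems found in the media and spend arrays."""
    problems = []
    # Media data and media spend must have the same dimensions.
    if shape(media) != shape(spend):
        problems.append(f"shape mismatch: media {shape(media)} vs spend {shape(spend)}")
    for name, array in (("media", media), ("spend", spend)):
        for g, geo in enumerate(array):
            for t, period in enumerate(geo):
                for c, value in enumerate(period):
                    where = f"geo={g}, time={t}, channel={c}"
                    if value is None or math.isnan(value):
                        problems.append(f"{name}: missing value at {where}")
                    elif value < 0:  # media (and, by assumption, spend) must be non-negative
                        problems.append(f"{name}: negative value at {where}")
    return problems


# Hypothetical data: 1 geo, 2 weeks, 2 channels.
media = [[[100.0, 5.0], [120.0, float("nan")]]]
spend = [[[10.0, 1.0], [12.0, -2.0]]]
for problem in check_inputs(media, spend):
    print(problem)
# media: missing value at geo=0, time=1, channel=1
# spend: negative value at geo=0, time=1, channel=1
```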

Lead-generating businesses with long sales cycles

For lead-generating businesses with long sales cycles, best practices depend on your target variable, that is, the outcome you want to measure. If generating a lead takes multiple months, consider KPIs that reflect more immediate actions, such as the number of conversions, site visits, or form submissions.