After you collect your data, perform an exploratory data analysis (EDA) to find and address any data quality issues. This is a critical step in the marketing mix modeling (MMM) process because it lets you assess the data to confirm that it accurately represents the marketing efforts, customer responses, and other relevant metrics. By correcting issues discovered through the EDA process, you can improve the reliability of the model output.
The basic process for performing an EDA is:
- Run a data review to identify any missing or incomplete data.
- Fix missing values in your raw input files.
- Evaluate the accuracy of the data.
- Correct any anomalies, outliers, or inaccuracies in the data.
- Check the correlation between your KPI, media, and control variables.
There are many ways to approach EDA, so Meridian doesn't provide visualizations for this process. We recommend that you find the balance that suits your needs between a thorough, granular analysis, which gives greater confidence, and a quick check of high-level data, which gives less detailed insight.
Consider these guidelines as you produce your own visualizations to assist with your EDA:
Checking data completeness: Check for missing values in the data. You can create charts that show the percentage of data completeness for each variable (channel), then investigate the variables that show as incomplete.
To further refine your EDA, you can create visualizations that show the number of observations by year, month, week, and weekday. Look for time periods with unexpectedly few observations.
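The completeness checks described above can be sketched with pandas. This is a minimal illustration, not part of Meridian: the helper names (`completeness_pct`, `observation_counts`), the column names (`time`, `tv_spend`, `search_spend`), and the sample values are all hypothetical.

```python
import numpy as np
import pandas as pd


def completeness_pct(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of non-missing values per column, lowest first."""
    return (df.notna().mean() * 100).sort_values()


def observation_counts(df: pd.DataFrame, time_col: str = "time") -> dict:
    """Return row counts grouped by year, month, ISO week, and weekday."""
    t = pd.to_datetime(df[time_col])
    return {
        "year": t.dt.year.value_counts().sort_index(),
        "month": t.dt.month.value_counts().sort_index(),
        "week": t.dt.isocalendar().week.value_counts().sort_index(),
        "weekday": t.dt.day_name().value_counts(),
    }


# Hypothetical weekly data with two missing TV spend values.
df = pd.DataFrame({
    "time": pd.date_range("2023-01-02", periods=8, freq="W-MON"),
    "tv_spend": [100.0, 120.0, np.nan, 90.0, 110.0, np.nan, 95.0, 105.0],
    "search_spend": [50.0, 55.0, 60.0, 58.0, 52.0, 61.0, 57.0, 59.0],
})

pct = completeness_pct(df)
counts = observation_counts(df)
print(pct)              # variables with the lowest completeness appear first
print(counts["month"])  # look for months with unexpectedly few rows
```

Either result can be passed to a plotting library of your choice, for example as a bar chart per variable or per time period.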
Checking data accuracy: Ensure that the data is accurate and free from anomalies or outliers that could skew results. For example, you can visualize the share of media spend for each channel and the trend of each channel over time to identify anything unusual. You can compare these visualizations against the media plan or work with the marketing team to help identify whether the data is accurate and granular enough.
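As a rough sketch of these accuracy checks, the following computes each channel's share of total spend and flags unusual values in a channel's trend. The interquartile range (IQR) rule with a multiplier of 1.5 is a common heuristic chosen here for illustration; it is an assumption, not a Meridian recommendation. The helper names, column names, and data are hypothetical.

```python
import pandas as pd


def spend_share(df: pd.DataFrame, spend_cols: list) -> pd.Series:
    """Return each channel's share of total spend across all channels."""
    totals = df[spend_cols].sum()
    return totals / totals.sum()


def flag_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed heuristic)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical weekly spend with one suspicious spike in TV.
spend = pd.DataFrame({
    "tv_spend": [100.0, 110.0, 105.0, 95.0, 1000.0, 102.0, 98.0, 104.0],
    "search_spend": [50.0, 55.0, 60.0, 58.0, 52.0, 61.0, 57.0, 59.0],
})

shares = spend_share(spend, ["tv_spend", "search_spend"])
flags = flag_outliers_iqr(spend["tv_spend"])
print(shares)        # compare against the media plan
print(spend[flags])  # rows to review with the marketing team
```

A flagged value is not necessarily an error. A genuine campaign burst can look like an outlier, which is why the comparison against the media plan matters.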
Checking correlation between variables: Although checking the correlation between KPI, media, and control variables is not required, creating visualizations of it can be helpful in the following use cases:
- Measuring the correlation between media and control variables to see if there is any unexpected relationship. This can help you decide whether to keep or remove any media or control variable.
- Identifying multicollinearity. When two or more variables in the media and control variables are highly correlated with each other, they create multicollinearity, which can cause regression models to have difficulty calculating the impact of the collinear variables. By identifying multicollinearity in your data review, you can decide which variables to include or exclude from your model.
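One way to sketch both use cases is to list highly correlated variable pairs and to compute a variance inflation factor (VIF) for each variable, where VIF_j = 1 / (1 - R_j^2) and R_j^2 comes from regressing variable j on the others. The helper names, the synthetic data, and the 0.8 correlation threshold are assumptions for illustration only. Common rules of thumb treat a VIF above roughly 5 to 10 as a sign of multicollinearity, but the right cutoff depends on your data.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (col_a, col_b, r) for pairs whose |Pearson r| >= threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


def variance_inflation_factors(df: pd.DataFrame) -> pd.Series:
    """Compute VIF per column via least-squares regression on the others."""
    X = df.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(df.columns):
        y = X[:, j]
        # Design matrix: intercept plus every other column.
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[name] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs)


# Synthetic example: social spend closely tracks TV spend; price is unrelated.
rng = np.random.default_rng(0)
tv = rng.normal(100, 10, 200)
data = pd.DataFrame({
    "tv_spend": tv,
    "social_spend": 0.5 * tv + rng.normal(0, 1, 200),
    "price": rng.normal(10, 1, 200),
})

pairs = high_correlation_pairs(data)
vif = variance_inflation_factors(data)
print(pairs)
print(vif)
```

The correlation matrix returned by `DataFrame.corr()` can also be drawn as a heatmap if you prefer a visual check over a list of pairs.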
After you have confidence that your data is accurate and complete, you can load the data using a supported format, and then create your model.