Set knots

How the `knots` argument works

Meridian uses a time-varying intercept approach for modeling time effects (see Spline (mathematics), Wikipedia; Ng, Wang, & Dai, 2021). This approach models the time effects \(\mu = [\mu_1, \dots, \mu_T]\) for each of the \(T\) time periods (for example, a weekly-level, three-year MMM has \(52 \times 3 = 156\) time periods). The \(T\) time effects are modeled with possibly fewer than \(T\) parameters using the relationship:

\[\mu = W b\]

Where:

\(\mu\) is \(T \times 1\) and holds the effect of each time period \(t = 1, \dots, T\).
\(W\) is a \(T \times K\) deterministic weight matrix.
\(b\) (called knot_values in Meridian) is \(K \times 1\), where \(K \leq T\).

Bayesian posterior inference is performed on \(b\), and the result is mapped to \(\mu\) through the weight matrix \(W\). The number of knots \(K\) is set by the user. The weight matrix \(W\) is determined by the L1 distance from each time period to its two neighboring knots.
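As a minimal sketch of the relationship between \(\mu\), \(W\), and \(b\), the following Python builds a weight matrix from the L1-distance rule and maps a vector of knot values to one time effect per period. The helper `knot_weight_matrix`, the 0-based knot locations, and the handling of periods before the first or after the last knot are assumptions for illustration, not Meridian's internal implementation.

```python
import numpy as np


def knot_weight_matrix(n_times: int, knot_locations: list[int]) -> np.ndarray:
    """Build a T x K weight matrix from the L1 distance to the two neighboring knots.

    Hypothetical helper, not Meridian's internal code. Knot locations are
    0-based time indices. Periods before the first knot or after the last knot
    put all their weight on that end knot (an assumption of this sketch).
    """
    knots = np.sort(np.asarray(knot_locations, dtype=float))
    n_knots = len(knots)
    weights = np.zeros((n_times, n_knots))
    for t in range(n_times):
        if t <= knots[0]:
            weights[t, 0] = 1.0
        elif t >= knots[-1]:
            weights[t, -1] = 1.0
        else:
            # Neighboring knots satisfy knots[left] <= t < knots[right].
            right = int(np.searchsorted(knots, t, side="right"))
            left = right - 1
            d_left, d_right = t - knots[left], knots[right] - t
            total = d_left + d_right
            # The closer knot gets the larger weight; the two weights sum to 1.
            weights[t, left] = 1 - d_left / total
            weights[t, right] = 1 - d_right / total
    return weights


n_times = 156  # weekly data over three years
knot_locations = [0, 26, 52, 78, 104, 130, 155]  # hypothetical knot placement
W = knot_weight_matrix(n_times, knot_locations)
b = np.random.default_rng(seed=0).normal(size=len(knot_locations))  # stand-in for knot_values
mu = W @ b  # one time effect per period, built from only 7 parameters
print(W.shape, b.shape, mu.shape)  # (156, 7) (7,) (156,)
```

Each row of `W` sums to 1, so every time effect is a weighted average of at most two knot values.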

To see how L1 distance determines the weight matrix, consider time period \(9\), whose two neighboring knots are at \(6\) and \(11\). The L1 distance between time period \(9\) and knot \(11\) is \(2\), and the L1 distance between time period \(9\) and knot \(6\) is \(3\). So the knot at \(6\) gets weight \(0.4 = 1 - \frac{3}{2+3}\) and the closer knot at \(11\) gets weight \(0.6 = 1 - \frac{2}{2+3}\). The weighted average of these two neighboring knots determines the value of \(\mu_9\).
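The arithmetic in this example can be checked with a few lines of Python. The function name `neighbor_weights` and the knot values `2.0` and `7.0` are hypothetical; only the distances come from the example.

```python
def neighbor_weights(t: float, left_knot: float, right_knot: float) -> tuple[float, float]:
    """Weights of the two knots that bracket period t, from their L1 distances."""
    d_left = abs(t - left_knot)    # distance to the earlier knot
    d_right = abs(right_knot - t)  # distance to the later knot
    total = d_left + d_right
    return 1 - d_left / total, 1 - d_right / total


w6, w11 = neighbor_weights(9, 6, 11)
print(round(w6, 2), round(w11, 2))  # 0.4 0.6

# With hypothetical knot values 2.0 (knot at 6) and 7.0 (knot at 11):
mu_9 = w6 * 2.0 + w11 * 7.0
print(round(mu_9, 6))  # 5.0
```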

Notice that when knots < n_times, there is some dimensionality reduction: the n_times periods are modeled with fewer than n_times parameters, and the weight function determines how the time periods are combined.

Choosing the number of knots for time effects in the model

When you think about how to set the knots argument of ModelSpec, it helps to consider the two extremes: knots can be anywhere from 1 to the number of time periods (n_times). When knots = n_times, there is no dimensionality reduction and each time period gets its own parameter. In a geo-level model, having as many knots as time periods is identifiable because you have multiple geos, and therefore multiple observations, per time period. When knots = 1, all time periods are modeled with a single parameter, which is equivalent to saying that time has no effect. This single parameter becomes a common intercept for all time periods.
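To make the two extremes concrete, this sketch builds \(W\) for a small, hypothetical series of 8 periods. It uses `numpy.interp` to interpolate unit vectors between knots, which reproduces the L1-distance weighting between neighboring knots. The helper name and the clamping of periods outside the knot range are assumptions of this sketch.

```python
import numpy as np


def knot_weight_matrix(n_times: int, knot_locations: list[int]) -> np.ndarray:
    """T x K weight matrix; column k linearly interpolates the k-th unit vector.

    Hypothetical helper. Linear interpolation between the two neighboring knots
    gives the same weights as the L1-distance rule; np.interp clamps periods
    outside the knot range to the nearest end knot (an assumption of this sketch).
    """
    times = np.arange(n_times)
    knots = np.sort(np.asarray(knot_locations, dtype=float))
    unit_vectors = np.eye(len(knots))
    return np.column_stack([np.interp(times, knots, unit_vectors[k]) for k in range(len(knots))])


n_times = 8
# Extreme 1: knots = n_times. W is the identity, so every period has its own parameter.
W_full = knot_weight_matrix(n_times, list(range(n_times)))
# Extreme 2: knots = 1. W is a single column of ones, that is, one common intercept.
W_single = knot_weight_matrix(n_times, [0])
print(np.allclose(W_full, np.eye(n_times)))  # True
print(W_single.shape, W_single.ravel())  # (8, 1) [1. 1. 1. 1. 1. 1. 1. 1.]
```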

When 1 < knots < n_times, you are between these two extremes. You can try a range of values that spans the space of eligible values. For more about choosing a value between the extremes, see Bias-variance trade-off.

We recommend that you try the following:

Geo-level models should start at the default (knots = n_times). If you notice extreme overfitting or unrealistic media effect estimates, consider reducing the number of knots. Reducing the number of knots is more likely to be necessary as the number of geos per time period decreases.

Note: n_times is the number of time periods with the number of max_lag weeks subtracted.
National-level models should start at the default of 1 knot and increase the number of knots from there. Continue increasing until overfitting becomes extreme or media effect estimates become unrealistic.
Similar numbers of knots can return similar results (for example, knots = 10 and knots = 11), so it helps to spread out the values that you try.
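One way to spread out candidate values is to space them on a log scale between 1 and n_times, so that both ends of the range are covered without testing near-duplicate settings. The helper `spread_out_knot_counts` and the log spacing are hypothetical choices for illustration, not a Meridian recommendation.

```python
import numpy as np


def spread_out_knot_counts(n_times: int, n_candidates: int = 6) -> list[int]:
    """Hypothetical helper: candidate knot counts from 1 to n_times on a log scale."""
    counts = np.geomspace(1, n_times, num=n_candidates)
    # Round to whole knot counts and drop any duplicates created by rounding.
    return sorted({int(round(c)) for c in counts})


print(spread_out_knot_counts(156))  # [1, 3, 8, 21, 57, 156]
```

For a geo-level model you would work down this list from the largest value; for a national-level model, up from 1.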

For information that might help you develop algorithms for knot selection, see Knot selection in sparse Gaussian processes with a variational objective function in the Wiley online library.

Bias-variance trade-off

It can be helpful to think of setting the number of knots as a bias-variance trade-off. When knots = n_times, each time period gets its own parameter, so the effect of a given time period is estimated using only data from that time period. This setting is low-bias but high-variance, because few data points are available for any single time period.

When knots < n_times, each knot is estimated using data from nearby time periods, with closer time periods getting more weight. Because the two closest knots determine the inference for a particular time period, the effect of that time period is estimated from its own data and from the data of nearby time periods. As the number of knots decreases, nearby time periods become more and more influential on the inference for a particular time period. This decreases variance, because more time periods contribute to estimating the effect of a given time period. However, much of that data comes from other time periods, which increases bias.
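The pooling described above can be illustrated with an ordinary least-squares analogy: fit a single series of time effects through \(W\) and inspect the hat matrix, whose diagonal shows how much each period's fitted effect depends on that period's own data. This is a hedged sketch only. Meridian performs Bayesian posterior inference with priors rather than least squares, and the helper and evenly spaced knot locations are assumptions.

```python
import numpy as np


def knot_weight_matrix(n_times, knot_locations):
    """T x K weight matrix via linear interpolation between neighboring knots (hypothetical helper)."""
    times = np.arange(n_times)
    knots = np.sort(np.asarray(knot_locations, dtype=float))
    unit_vectors = np.eye(len(knots))
    return np.column_stack([np.interp(times, knots, unit_vectors[k]) for k in range(len(knots))])


n_times = 52
own_share_by_knots = {}
for n_knots in (52, 13, 4):
    # Evenly spaced integer knot locations covering the whole time range.
    knots = np.linspace(0, n_times - 1, n_knots).round()
    W = knot_weight_matrix(n_times, knots)
    # Least-squares hat matrix: hat[t, s] is how much the observation at period s
    # contributes to the fitted time effect at period t.
    hat = W @ np.linalg.pinv(W)
    # Average weight each period puts on its own data (equals n_knots / n_times).
    own_share_by_knots[n_knots] = float(np.diag(hat).mean())
    print(n_knots, round(own_share_by_knots[n_knots], 3))
```

With a knot at every period the hat matrix is the identity (own-period share 1.0). With 13 or 4 knots, the average own-period share drops to 13/52 and 4/52, because each fitted effect draws on data from neighboring periods.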

In summary, more knots reduce bias in time effect estimates, while fewer knots reduce variance in time effect estimates. As an analyst, you can tune where on the bias-variance trade-off you want to be. If time is an important confounder between media and the KPI, then the bias-variance trade-off in estimating time effects translates to a bias-variance trade-off in estimating the causal effects of media.

Additionally, you can choose different bias-variance trade-offs for different time regions by setting knots to a list, which specifies knot locations. Knot locations can be dense where you prefer low bias in the estimates (such as a holiday season) and sparse where you prefer low variance (such as an off-holiday season).
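As a sketch of dense-where-it-matters placement, the following builds a list of knot locations for a hypothetical three-year weekly series: a knot every 8 weeks, plus a knot every week inside hypothetical holiday windows (weeks 44 to 51 of each year, 0-based). The helper name, spacing, and windows are illustrative assumptions.

```python
def custom_knot_locations(n_times: int, sparse_every: int, dense_windows: list[tuple[int, int]]) -> list[int]:
    """Hypothetical helper: sparse knots everywhere, one knot per period inside each window."""
    locations = set(range(0, n_times, sparse_every))
    locations.add(n_times - 1)  # make sure the last period is anchored by a knot
    for start, stop in dense_windows:  # windows are half-open: [start, stop)
        locations.update(range(start, min(stop, n_times)))
    return sorted(locations)


# Hypothetical holiday windows: weeks 44 to 51 of each of three years (0-based indices).
holiday_windows = [(year * 52 + 44, year * 52 + 52) for year in range(3)]
knots = custom_knot_locations(156, sparse_every=8, dense_windows=holiday_windows)
print(len(knots), knots[:8])
```

The resulting list could then be passed as the knots argument of ModelSpec; check the ModelSpec reference for how knot locations are indexed before relying on this layout.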

When to consider using fewer knots

When you set the number of knots, it can also help to think about how time affects media execution. Control variables should be confounding variables that impact both media execution and the KPI. For more information about control variables, see Selecting control variables.

Similar logic applies to time. If time isn't a factor in media execution, then time isn't a true confounding variable, and you can avoid spending too many degrees of freedom on modeling time with low bias (that is, with many knots). Advertisers need to consider whether time plays a role in planning media execution. For example, a travel brand likely uses time for media planning, whereas a snack brand might not. Also, consider whether time itself is the important confounding variable, or whether it is a proxy for some other variable that can be modeled directly, likely with fewer degrees of freedom. For example, was time really the confounding variable that drove media execution, or was it the number of COVID cases nationwide? Advertisers know their own media planning strategy and have insight into these topics.

When you must use `knots < n_times`

There are situations where you must set knots < n_times. For example, in a national-level model you don't have multiple observations per time period, so there are not enough degrees of freedom for each time period to get its own parameter. In this case, some level of dimensionality reduction is necessary.

Another example is when you must include a national-level media or national-level control variable. By definition, a national-level variable changes over time but not across geos. Such a variable is perfectly collinear with time and is therefore redundant in a model that has a parameter for each time period. If you set knots close to n_times, the model might technically be identifiable, but it can still be weakly identifiable and lead to problems. Given these concerns around estimating time effects in a national model, it is even more important to have high-quality controls in a national model than in a geo model. For more information about high-quality controls, see Selecting control variables.
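The redundancy of a national-level variable can be checked with a rank computation on a hypothetical least-squares design matrix: one row per geo and time period, time columns built from \(W\), and one national-level column. With a knot at every period the national column adds no rank; with 3 knots it does. This is an analogy under stated assumptions (random data, an OLS-style design, a hypothetical helper), not how Meridian parameterizes its model.

```python
import numpy as np


def knot_weight_matrix(n_times, knot_locations):
    """T x K weight matrix via linear interpolation between neighboring knots (hypothetical helper)."""
    times = np.arange(n_times)
    knots = np.sort(np.asarray(knot_locations, dtype=float))
    unit_vectors = np.eye(len(knots))
    return np.column_stack([np.interp(times, knots, unit_vectors[k]) for k in range(len(knots))])


n_geos, n_times = 5, 10
rng = np.random.default_rng(seed=0)
# One row per (geo, time) observation; every geo shares the same time index.
time_index = np.tile(np.arange(n_times), n_geos)
# A national-level variable changes over time but is identical across geos.
national_column = rng.normal(size=n_times)[time_index]

ranks = {}
for n_knots in (n_times, 3):
    knots = np.linspace(0, n_times - 1, n_knots).round()
    time_columns = knot_weight_matrix(n_times, knots)[time_index]  # (n_geos * n_times, n_knots)
    design = np.column_stack([time_columns, national_column])
    ranks[n_knots] = int(np.linalg.matrix_rank(design))
    print(f"knots={n_knots}: {design.shape[1]} columns, rank {ranks[n_knots]}")
```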

Choosing knots or binary dummies

This section provides guidance on using knots for seasonal effects versus binary dummies for certain events, such as holidays and sporting events. Users often wonder whether knots and binary dummies can or should be used together for these circumstances.

Consider the following when deciding what to use:

The knots argument of ModelSpec can also be used to set knot locations, which is often preferable to adding binary dummies to the controls.
Binary dummies passed to the controls argument of InputData get geo hierarchy, whereas knots don't. This can be desirable for holidays or events whose effects might differ strongly across geos. For example, specific types of sporting events can be more impactful in certain geos.
