How the knots argument works
Meridian uses a time-varying intercept approach for modeling time effects (see Spline (mathematics), Wikipedia; Ng, Wang, & Dai, 2021). This approach models time effects \(\mu = [\mu_1, \dots, \mu_T]^\top\) for each of the \(T\) time periods (a weekly-level, three-year MMM has \(52 \times 3 = 156\) time periods). The \(T\) time effects are modeled with possibly fewer than \(T\) parameters using the relationship:
\[\mu = W b\]
Where:
- \(\mu\) is \(T \times 1\), representing the effect of each time period \(t = 1, \dots, T\).
- \(W\) is a \(T \times K\) deterministic weight matrix.
- \(b\) (called knot_values in Meridian) is \(K \times 1\), where \(K \leq T\).
Bayesian posterior inference is done on \(b\), which is translated into \(\mu\) through the weight matrix \(W\). The number of knots \(K\) is determined by user input. The weight matrix \(W\) is determined by the L1 distance from each time period to its two neighboring knots.
To clarify how L1 distance determines the weight matrix, consider time period \(9\), where the two neighboring knots are at \(6\) and \(11\). The L1 distance between time period \(9\) and knot \(11\) is \(2\), and the L1 distance between time period \(9\) and knot \(6\) is \(3\). So, the knot at \(6\) gets weight \(0.4 = 1 - \frac{3}{2+3}\) and the knot at \(11\) gets weight \(0.6 = 1 - \frac{2}{2+3}\): the closer knot gets the larger weight, and the two weights sum to 1. The weighted average of these two neighboring knots determines the value of \(\mu_9\). Writing \(b_{(6)}\) and \(b_{(11)}\) for the values of the knots located at \(6\) and \(11\), \(\mu_9 = 0.4\, b_{(6)} + 0.6\, b_{(11)}\).
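The arithmetic for time period 9 can be checked with a few lines of Python. This is a minimal sketch, not Meridian's implementation: the function two_knot_weights and the knot values used for \(\mu_9\) are hypothetical.

```python
def two_knot_weights(t, left_knot, right_knot):
    """Weights for time period t from its two neighboring knots (hypothetical helper).

    Each knot's weight is 1 minus its share of the total L1 distance, so the
    closer knot gets the larger weight and the two weights sum to 1.
    """
    d_left = abs(t - left_knot)
    d_right = abs(right_knot - t)
    total = d_left + d_right
    if total == 0:
        # t sits exactly on a single knot location: all weight goes to it.
        return 1.0, 0.0
    return 1 - d_left / total, 1 - d_right / total


# Time period 9 with neighboring knots at 6 and 11, as in the text.
w_6, w_11 = two_knot_weights(9, left_knot=6, right_knot=11)
print(round(w_6, 3), round(w_11, 3))  # 0.4 0.6

# Hypothetical knot values at locations 6 and 11.
b_6, b_11 = 1.0, 2.0
mu_9 = w_6 * b_6 + w_11 * b_11
print(round(mu_9, 3))  # 1.6
```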
Notice that when knots < n_times, some dimensionality reduction takes place: the n_times periods are modeled with fewer than n_times parameters, and the weight function determines how the time periods are combined.
Choosing the number of knots for time effects in the model
When you think about how to set the knots argument of ModelSpec, it is helpful to think of the two extremes: knots can be anywhere from one to the number of time periods (n_times). When knots = n_times, there is no dimensionality reduction and each time period gets its own parameter. In a geo-level model, having as many knots as time periods is identifiable because you have multiple geos, and therefore multiple observations, per time period. When knots = 1, all time periods are modeled with a single parameter, which is equivalent to saying that time has no effect. This single parameter becomes a common intercept for all time periods.
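The two extremes can be checked numerically with a minimal sketch that builds the full weight matrix \(W\). It assumes knots are placed at equally spaced (rounded) time periods, that each period takes the two-neighbor L1 weights described earlier, and that periods outside the knot range put all their weight on the boundary knot. The helpers equally_spaced_knots and l1_weight_matrix are hypothetical and not part of Meridian's API. With knots = n_times, \(W\) is the identity; with knots = 1, \(W\) is a single column of ones, so every \(\mu_t\) equals the same intercept.

```python
import numpy as np


def equally_spaced_knots(n_times, n_knots):
    """Hypothetical helper: n_knots integer locations spread evenly over [0, n_times - 1]."""
    return np.linspace(0, n_times - 1, n_knots).round().astype(int)


def l1_weight_matrix(n_times, knot_locations):
    """Build a (n_times x K) matrix W from L1 distances to the two neighboring knots.

    Assumes sorted, distinct knot locations. Periods before the first knot or
    after the last knot put all their weight on that boundary knot.
    """
    knots = np.asarray(knot_locations)
    w = np.zeros((n_times, len(knots)))
    for t in range(n_times):
        if t <= knots[0]:
            w[t, 0] = 1.0
        elif t >= knots[-1]:
            w[t, -1] = 1.0
        else:
            right = int(np.searchsorted(knots, t))  # first knot at or after t
            left = right - 1
            d_left, d_right = t - knots[left], knots[right] - t
            total = d_left + d_right
            w[t, left] = 1 - d_left / total
            w[t, right] = 1 - d_right / total
    return w


n_times = 10
w_full = l1_weight_matrix(n_times, equally_spaced_knots(n_times, n_times))
w_one = l1_weight_matrix(n_times, equally_spaced_knots(n_times, 1))
print(np.allclose(w_full, np.eye(n_times)))  # True: one parameter per period
print(np.allclose(w_one, np.ones((n_times, 1))))  # True: a single common intercept

# Time effects are mu = W @ b for any knot values b (hypothetical value here).
b = np.array([3.0])
print(w_one @ b)  # every period gets the same value, 3.0
```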
When 1 < knots < n_times, you are between these two extremes. You can try a range of values that spans the space of eligible values. For information about how to think about this middle ground, see Bias-variance trade-off.
We recommend that you try the following:
- Geo-level models should start at the default (knots = n_times). If you notice that overfitting is extreme or media effect estimates are unrealistic, consider reducing the number of knots. The need to reduce the number of knots is more likely as the number of geos per time point decreases.
- National-level models should start at the default of 1 knot and increase the number of knots from there. Continue to increase until overfitting becomes extreme or media effect estimates become unrealistic.
- Similar numbers of knots, such as knots = 10 and knots = 11, can return similar results, so it can be helpful to spread out the values that you want to try.
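One way to spread out candidate values is to space them roughly geometrically between 1 and n_times. The helper candidate_knot_counts below is a hypothetical sketch, not a Meridian function, and geometric spacing is only one reasonable choice. Each returned value would be used in its own model fit, for example as ModelSpec(knots=k).

```python
import numpy as np


def candidate_knot_counts(n_times, n_candidates=6):
    """Hypothetical helper: roughly geometric, de-duplicated knot counts in [1, n_times]."""
    raw = np.geomspace(1, n_times, num=n_candidates)
    return sorted({int(round(x)) for x in raw})


candidates = candidate_knot_counts(156)  # e.g., three years of weekly data
print(candidates)  # [1, 3, 8, 21, 57, 156]

# National-level model: start at 1 knot and move up the list.
# Geo-level model: start at n_times and move down the list.
national_order = candidates
geo_order = candidates[::-1]
```

Geometric spacing tries many small values, where adding a few knots changes the model most, and fewer large values, where neighboring counts such as 10 and 11 tend to give similar results.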
For information that might help you develop algorithms for knot selection, see Knot selection in sparse Gaussian processes with a variational objective function in the Wiley online library.
Bias-variance trade-off
It can be helpful to think of setting the number of knots as a bias-variance trade-off. When knots = n_times, each time period gets its own parameter, so the effect of a given time period is estimated using only data from that time period, which keeps bias low. However, knots = n_times is high-variance because fewer data points are available at a given time period.
When knots < n_times, each knot is estimated using the data of nearby time periods, with closer time periods getting more weight. Because the two closest knots determine the inference for a particular time period, the effect of a given time period is estimated from that time period's data and from nearby time periods' data. As the number of knots decreases, nearby time points become more and more influential on the inference for a particular time point, with closer time points getting more weight. This decreases variance because more time points are used to estimate the effect of a given time period. However, much of that data isn't from the given time period, which increases bias.
In summary, more knots reduce bias in time effect estimates, while fewer knots reduce variance in time effect estimates. As an analyst, you can tune where on the bias-variance trade-off you want to be. If time is an important confounder between media and the KPI, then the bias-variance trade-off in estimating time effects translates to a bias-variance trade-off in estimating the causal effects of media.
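To make the variance side of the trade-off concrete, the sketch below counts how many time periods put nonzero weight on each knot, on average, as the number of knots shrinks. It assumes equally spaced knots. It relies on the fact that the two-neighbor L1 weights, \(1 - \frac{d_l}{d_l + d_r}\) and \(1 - \frac{d_r}{d_l + d_r}\), are exactly linear-interpolation weights, so each column of \(W\) can be built with np.interp. weight_matrix is a hypothetical helper, not Meridian code.

```python
import numpy as np


def weight_matrix(n_times, knot_locations):
    """Hypothetical helper: W built from two-neighbor L1 weights.

    Those weights equal linear-interpolation weights, so column j of W is the
    interpolation of the j-th unit vector over the knot locations.
    """
    knots = np.asarray(knot_locations, dtype=float)
    t = np.arange(n_times)
    unit = np.eye(len(knots))
    return np.stack([np.interp(t, knots, unit[j]) for j in range(len(knots))], axis=1)


n_times = 156  # e.g., three years of weekly data
periods_per_knot = {}
for n_knots in (156, 52, 12, 4):
    knots = np.linspace(0, n_times - 1, n_knots).round()
    w = weight_matrix(n_times, knots)
    # Average number of time periods that put nonzero weight on a knot,
    # i.e., how many periods' data inform each knot parameter.
    periods_per_knot[n_knots] = (w > 0).sum(axis=0).mean()
    print(f"knots={n_knots:3d}: ~{periods_per_knot[n_knots]:.1f} periods per knot")
```

With knots = n_times each knot is informed by a single period; with 4 knots each knot pools dozens of periods, which is where the lower variance (and higher bias) comes from.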
Additionally, you can choose different bias-variance trade-offs for different time regions. You do this by setting knots to a list, which specifies knot locations. Knot locations can be dense in regions where you prefer low bias in the estimates (such as a holiday season), and sparse in regions where you prefer low variance in the estimates (such as an off-holiday season).
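A minimal sketch of building such a list for weekly data. The helper holiday_dense_knots, its defaults (a knot every 8 weeks off-season and a knot every week in the last 8 weeks of each year), and the assumption that period 0 is the first week of a 52-week year are all hypothetical choices for illustration. The resulting list would be passed as ModelSpec(knots=knots).

```python
def holiday_dense_knots(n_times, weeks_per_year=52, sparse_every=8,
                        holiday_weeks=range(44, 52)):
    """Hypothetical helper: knot locations sparse off-season and dense in holidays.

    Assumes weekly data where period 0 is the first week of a year and every
    year has `weeks_per_year` weeks. `holiday_weeks` are zero-based week-of-year
    indices; range(44, 52) is the last 8 weeks of each year.
    """
    locations = set(range(0, n_times, sparse_every))  # one knot every few weeks
    for t in range(n_times):
        if t % weeks_per_year in holiday_weeks:
            locations.add(t)  # one knot per holiday week
    locations.add(n_times - 1)  # cover the final period
    return sorted(locations)


knots = holiday_dense_knots(156)
print(len(knots), knots[:8])
```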
When to consider using fewer knots
When you set the number of knots, it can also be helpful to think about how time affects media execution. Control variables should be confounding variables that impact both media execution and the KPI. For more information about control variables, see Selecting control variables.
Similar logic applies to time. If time isn't a factor for media execution, then time isn't a true confounding variable, and you can avoid spending too many degrees of freedom on modeling time with low bias (for example, with a high number of knots). Advertisers need to consider whether time plays a role in planning around media execution. For example, a travel brand likely uses time for media planning, whereas a snack brand might not. Also, consider whether time is really the important confounding variable, or whether time is a proxy for some other variable that can be modeled directly, likely with fewer degrees of freedom. For example, was time really the confounding variable that drove media execution? Or was it how many COVID cases there were nationwide? Advertisers know their own media planning strategy and have insight into these topics.
When you must use knots < n_times
There are situations where you must set knots < n_times. For example, in a national-level model, you don't have multiple observations per time period, so there are not enough degrees of freedom for each time period to get its own parameter. In this case, some level of dimensionality reduction is necessary.
Another example is when you must include a national-level media variable or a national-level control variable. By definition, national-level variables change over time but not across geos. Such a variable is perfectly collinear with time and is thus redundant in a model that has a parameter for each time period. If you set knots close to n_times, you might technically have an identifiable model. However, the model might still be weakly identifiable, which can lead to problems. Given the concerns around estimating time effects in a national model, it is even more important to have high-quality controls in a national model than in a geo model. For more information about high-quality controls, see Selecting control variables.
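The collinearity argument can be checked on a toy design matrix. This sketch treats knots = n_times as one dummy column per time period, which is a simplification of how Meridian parameterizes time effects, and uses made-up values. Adding a national-level variable does not increase the matrix rank, so it carries no information beyond the time parameters, while adding a variable that varies across geos does.

```python
import numpy as np

n_geos, n_times = 3, 5
# Rows are (geo, time) observations ordered geo by geo.
# knots = n_times behaves like one dummy column per time period.
time_dummies = np.kron(np.ones((n_geos, 1)), np.eye(n_times))

# A national-level variable: same value in every geo for a given period (made-up values).
national_var = np.tile([2.0, 5.0, 1.0, 4.0, 3.0], n_geos)
# A geo-level variable: differs across geos within a period.
geo_var = np.arange(n_geos * n_times, dtype=float)

rank_time = np.linalg.matrix_rank(time_dummies)
rank_national = np.linalg.matrix_rank(np.column_stack([time_dummies, national_var]))
rank_geo = np.linalg.matrix_rank(np.column_stack([time_dummies, geo_var]))
print(rank_time, rank_national, rank_geo)  # 5 5 6
```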
Choosing knots or binary dummies
This section provides guidance on using knots for seasonal effects versus binary dummies for certain events, such as holidays and sporting events. Users often wonder whether knots and binary dummies can or should be used together for these circumstances.
Consider the following when deciding what to use:
- The knots argument of ModelSpec can also be used to set knot locations. Often this is preferable to setting binary dummies on controls.
- Binary dummies put into the controls argument of InputData get a geo hierarchy, so their effects can vary by geo, whereas knots don't get a geo hierarchy. This can be desirable for holidays or events whose effects might differ strongly across geos. For example, specific types of sporting events can be more impactful in certain geos.
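For comparison, a binary event dummy is just a geo by time array of zeros and ones. This sketch assumes a (geo, time, control) layout for the controls array and uses a hypothetical helper, event_dummy, along with made-up event weeks and geo coefficients. It shows that the dummy itself is identical in every geo, while a geo-level coefficient lets its effect differ by geo; a knot-based time effect, by contrast, is shared across all geos.

```python
import numpy as np


def event_dummy(n_geos, n_times, event_periods):
    """Hypothetical helper: binary control, 1 in event periods and 0 elsewhere.

    Returns an array shaped (n_geos, n_times, 1), i.e. geo x time x control.
    The dummy itself is identical in every geo.
    """
    dummy = np.zeros((n_geos, n_times, 1))
    dummy[:, list(event_periods), 0] = 1.0
    return dummy


# Hypothetical example: 4 geos, 156 weekly periods, an event in the last week
# of each year (zero-based periods 51, 103, 155).
controls = event_dummy(n_geos=4, n_times=156, event_periods=[51, 103, 155])
print(controls.shape, int(controls.sum()))  # (4, 156, 1) 12

# With a geo hierarchy, each geo gets its own coefficient (made-up values here),
# so the same dummy can have a different effect in each geo.
gamma_geo = np.array([0.5, 1.0, 2.0, 0.0])
dummy_effect = controls[..., 0] * gamma_geo[:, None]  # shape (n_geos, n_times)
print(dummy_effect[:, 51])  # effect in period 51 differs by geo
```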