Methodology comparison

This document outlines key differences between Sparse Synthetic Control (SparseSC), as used by GeoLift, and regression‑based approaches such as Geo‑Based Regression (GBR) and Time‑Based Regression (TBR) from Google’s GeoexperimentsResearch (“GeoX”). It is intended to be descriptive rather than promotional. The best choice depends on data, design, and operational constraints.

Understanding Google’s GeoX (GBR/TBR) Approach

According to the README.md of the “GeoexperimentsResearch” package, its core methodologies are:

Geo-Based Regression (GBR): Detailed in Vaver and Koehler (2011).
Time-Based Regression (TBR): An evolution described in Kerman, Wang, and Vaver (2017).

Both are regression‑based techniques for analysing geo‑experiments. GBR fits a cross‑sectional regression across geos, relating each geo's test‑period response to its pre‑period response and its change in ad spend. TBR aggregates the treatment and control geos into two time series, regresses the treatment series on the control series over the pre‑intervention period, and uses the fitted model to predict the treatment group's counterfactual. In both cases, the treatment effect is inferred by comparing the observed post‑intervention outcomes with what the regression model predicts would have happened in the absence of the treatment.
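As a rough illustration of the regression‑then‑predict logic, the sketch below fits a single‑regressor model on pre‑period data and sums the gap between observed and predicted post‑period outcomes. It is a simplified, TBR‑style toy written in plain Python, not the GeoexperimentsResearch implementation: the function names `fit_ols` and `tbr_style_effect` and all data values are hypothetical, and no uncertainty intervals are computed.

```python
from statistics import mean


def fit_ols(x, y):
    """Least-squares fit of y = intercept + slope * x with one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def tbr_style_effect(control_pre, treat_pre, control_post, treat_post):
    """Fit on the pre-period, predict the post-period counterfactual,
    and return (counterfactual, cumulative effect)."""
    intercept, slope = fit_ols(control_pre, treat_pre)
    counterfactual = [intercept + slope * c for c in control_post]
    effect = sum(obs - pred for obs, pred in zip(treat_post, counterfactual))
    return counterfactual, effect


# Hypothetical aggregated series: before the intervention the treatment
# group follows 2 * control + 10 exactly; afterwards it gains 5 per period.
control_pre = [100.0, 110.0, 105.0, 120.0, 115.0]
treat_pre = [2 * c + 10 for c in control_pre]
control_post = [118.0, 122.0, 125.0]
treat_post = [2 * c + 10 + 5 for c in control_post]

counterfactual, cumulative_effect = tbr_style_effect(
    control_pre, treat_pre, control_post, treat_post
)
print(round(cumulative_effect, 6))  # three post periods with a lift of 5 each
```

Because the toy pre‑period relationship is exactly linear, the fitted model recovers it and the estimated cumulative effect equals the injected lift. With real data the estimate depends on how well the linear form holds.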

When SparseSC may offer advantages over GBR/TBR

GBR and TBR are valuable, established methods. SparseSC (the engine of GeoLift) may offer advantages in the following areas, each of which corresponds to a common challenge in marketing analytics:

Reliance on Parametric Functional Form Assumptions:
- GBR/TBR: As regression models, they assume a specific functional form (e.g. linear relationships, pre‑specified interactions). If this form is incorrect (misspecified), counterfactual predictions can be biased.
- SparseSC: While regression appears internally for tuning, the construction of the synthetic control is non‑parametric. It forms a weighted average of control units without assuming a single global functional form, offering flexibility when pre‑treatment dynamics are complex.
Handling of Unobserved Time-Varying Confounders:
- GBR/TBR: Time fixed effects and trend terms help, but unobserved time‑varying confounders can remain problematic.
- SparseSC: Can mitigate some unobserved confounding when pre‑treatment features proxy latent factors and are appropriately weighted (via the feature‑importance matrix V). This is not guaranteed and depends on data quality and coverage.
Predictor Selection and Model Specification:
- GBR/TBR: Selecting covariates and interactions can be subjective and prone to specification search.
- SparseSC: Uses regularisation (e.g. a LASSO penalty on the feature weights V and a Ridge penalty on the donor weights W) and time‑blocked cross‑validation on the pre‑treatment period to automate predictor weighting, reducing manual specification risk. All predictors are standardised to zero mean and unit variance prior to V‑matrix optimisation, so that feature importance weights are scale‑invariant.
Transparency and Interpretability of the Counterfactual:
- GBR/TBR: The counterfactual is a model prediction; interpretability varies with specification.
- SparseSC: The synthetic control is an explicit weighted average of observed controls; donor units and weights are transparent.
Robustness to Extrapolation:
- GBR/TBR: Model extrapolation can occur when post‑period features fall outside the fit range.
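- SparseSC: Non‑negative weights that sum to one make the synthetic control a convex combination of donor units, which encourages interpolation within the donor convex hull rather than extrapolation beyond it.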
Handling High-Dimensional Data:
- GBR/TBR: High‑dimensional covariates require careful management to avoid multicollinearity and overfitting.
- SparseSC: Regularisation helps sift many potential predictors to identify relevant ones for matching.
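The weighted‑average construction described above can be sketched as a small constrained least‑squares problem: find non‑negative donor weights that sum to one and minimise the pre‑period gap between the treated unit and the weighted donors. The code below is a minimal illustration that uses plain projected gradient descent. It is not SparseSC's optimiser, and it omits the V matrix, regularisation and cross‑validation. The function names and the toy data are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        shift = (cumulative - 1.0) / k
        if u_k - shift > 0:
            theta = shift  # keep the shift from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_donor_weights(donors_pre, treated_pre, steps=2000):
    """Minimise the mean squared pre-period gap over the weight simplex."""
    n_donors, n_periods = len(donors_pre), len(treated_pre)
    # Step size from a conservative bound on the gradient's Lipschitz constant.
    frob_sq = sum(x * x for series in donors_pre for x in series)
    step = n_periods / (2.0 * frob_sq)
    weights = [1.0 / n_donors] * n_donors
    for _ in range(steps):
        synthetic = [
            sum(w * series[t] for w, series in zip(weights, donors_pre))
            for t in range(n_periods)
        ]
        residual = [syn - obs for syn, obs in zip(synthetic, treated_pre)]
        gradient = [
            2.0 / n_periods * sum(r * x for r, x in zip(residual, series))
            for series in donors_pre
        ]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, gradient)]
        )
    return weights


# Hypothetical pre-period series for three donor geos.
donors_pre = [
    [1.0, 2.0, 3.0, 2.0, 1.0, 2.0],
    [2.0, 1.0, 2.0, 3.0, 2.0, 1.0],
    [3.0, 3.0, 1.0, 1.0, 3.0, 3.0],
]
# The treated geo is built as 0.6 * donor 0 + 0.4 * donor 1 for the demo.
treated_pre = [0.6 * a + 0.4 * b for a, b in zip(donors_pre[0], donors_pre[1])]

weights = fit_donor_weights(donors_pre, treated_pre)
print([round(w, 4) for w in weights])
```

On this toy data the fitted weights come out close to 0.6, 0.4 and 0.0, so the donors and their contributions can be read off directly. The simplex constraint is also what keeps the synthetic series between the donor series rather than outside them.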

Conclusion

SparseSC can be advantageous when pre‑treatment dynamics are complex and suitable donor pools exist. GBR/TBR may be preferable when simpler models suffice, covariates are well understood, or operational simplicity is a priority. In practice, it is sensible to assess both approaches given the data and decision context.