R: Applied Markov attribution for Google Analytics Free using Scitylana and BigQuery
This blog post is all about showing how powerful click-stream data is, especially when you put the data in BigQuery or any other column-store (C-Store) based OLAP warehouse with separate compute and storage. At the time of writing, BigQuery offers 10 GB of free storage and 1 TB/month of free queries, for life (yes, that is what they promise). There is absolutely no excuse not to get started with BigQuery.
I will use the following tools to show the power of Scitylana, BigQuery and R:
- Scitylana, which extracts click-stream data from Google Analytics (get a free account with a 30-day full trial)
- BigQuery (Go get an account if you don’t have one)
- RStudio (Go get it, if you don’t already have it)
Multi-channel funnel attribution
Multi-channel funnel attribution is the art of valuing each touchpoint in a multi-touchpoint sales journey. Several models can be applied, e.g. Markov chains, Shapley values, funnel-based models or CNNs.
I’ll use Markov chains (a very popular choice) to calculate exactly that. There are many articles on this subject, but this one has a secret sauce: the Scitylana click-stream data set. Well… that roughly translates to a very affordable Google Analytics click-stream data set, without the need for GA360.
With Scitylana, anyone can dance the Markov chains. (A big thanks to Davide Altomare and David Loris for their effort building the ChannelAttribution R package.)
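To make the idea concrete, here is a minimal Python sketch of first-order Markov attribution using the removal-effect approach commonly used for Markov attribution. It is not the ChannelAttribution code, and the toy paths and the `(start)`, `(conversion)` and `(null)` state names are made up for illustration. Each journey becomes a chain from start through its channels to conversion or null, transition probabilities are estimated from counts, and a channel’s removal effect is the relative drop in conversion probability when every visit to that channel leads nowhere. Conversions are then shared out in proportion to the removal effects.

```python
from collections import defaultdict

# Hypothetical state names for the artificial start and the two absorbing states.
START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(paths):
    """Estimate first-order transition probabilities from (channels, converted) paths."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, converted in paths:
        states = [START, *channels, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(probs, removed=None, tol=1e-12, max_iter=10_000):
    """Probability of reaching CONV from START; a removed channel acts like NULL."""
    value = defaultdict(float, {CONV: 1.0})  # absorbing states: CONV = 1, NULL = 0
    for _ in range(max_iter):
        delta = 0.0
        for state, nxt in probs.items():
            if state == removed:
                continue  # never updated, so its value stays 0 like NULL
            new = sum(p * value[b] for b, p in nxt.items())
            delta = max(delta, abs(new - value[state]))
            value[state] = new
        if delta < tol:
            break
    return value[START]


def markov_attribution(paths):
    """Split total conversions across channels in proportion to removal effects."""
    probs = transition_probs(paths)
    total = sum(1 for _, converted in paths if converted)
    base = conversion_prob(probs)
    channels = sorted({c for chans, _ in paths for c in chans})
    if base == 0:
        return {c: 0.0 for c in channels}
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    norm = sum(effects.values())
    return {c: (total * e / norm if norm else 0.0) for c, e in effects.items()}


# Toy journeys as (ordered channels, converted?) pairs; illustration only.
paths = [
    (["seo", "cpc"], True),
    (["seo"], False),
    (["cpc"], True),
    (["email", "cpc"], False),
]
print(markov_attribution(paths))
```

On these four toy paths, cpc gets about 1.2 of the 2 conversions and seo and email about 0.4 each, so channels that only open journeys still receive credit.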
Let’s dig in…
Step 1: Connect Scitylana to BigQuery
- Edit your data extraction in Scitylana.
- Scroll to Upload Data to Google BigQuery
- Click Authorize BigQuery Access
- Click Proceed to Google
- Log in (if you get the “This app isn’t verified” message, click Advanced and then Go to scitylana.com)
- Click Allow
- When you are returned to scitylana.com, scroll to Upload Data to Google BigQuery
- Enter your Google Project ID (find it in your BigQuery account)
- Select the data location (US, EU or asia-northeast1)
- Click Save
Your data will now be uploaded to BigQuery, and you can proceed to the next step.
Note: If you’re new to Scitylana, you need to wait a couple of days before running the rest of the steps, since there is no data to run the script on yet. Don’t cry, it’s definitely worth the wait.
Step 2: Spin up RStudio and run the script
- Download the script from https://github.com/mbilling/markov-attribution-r-scitylana (or clone it with git)
- Open the Markov Attribution.R file in RStudio
- In the first lines of the script, enter your Google Project ID
- Enter your Google Analytics View ID (it is the same as the name of your table in the Scitylana data set in BigQuery)
- Select the whole script in RStudio (e.g. Ctrl+A) and click the Run button.
- The first time you run the script, the bigrquery package will ask for access to your BigQuery data. Just follow the on-screen instructions and paste the Google access token back into the console when prompted.
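Before a Markov model can be fitted, click-stream sessions have to be collapsed into ordered channel paths, one per customer journey. The Python sketch below shows that preparation under stated assumptions: the `Session` fields are hypothetical (the real Scitylana table has its own column names), and it assumes a conversion closes a journey so later sessions start a new path. It is not the code in the linked R script.

```python
from collections import defaultdict
from typing import NamedTuple


class Session(NamedTuple):
    # Hypothetical fields; map them to the actual Scitylana columns.
    user_id: str
    visit_start: int  # e.g. a Unix timestamp
    channel: str
    converted: bool


def build_paths(sessions):
    """Group sessions per user in time order; each conversion closes a path."""
    by_user = defaultdict(list)
    for s in sessions:
        by_user[s.user_id].append(s)
    paths = []
    for user_sessions in by_user.values():
        current = []
        for s in sorted(user_sessions, key=lambda s: s.visit_start):
            current.append(s.channel)
            if s.converted:
                paths.append((current, True))
                current = []
        if current:  # trailing sessions without a conversion
            paths.append((current, False))
    return paths


sessions = [
    Session("u1", 2, "cpc", True),
    Session("u1", 1, "seo", False),
    Session("u1", 3, "email", False),
    Session("u2", 5, "direct", False),
]
print(build_paths(sessions))
```

Joining each channel list with a separator such as " > " turns it into a single path string, the kind of path column ChannelAttribution works with (check the `sep` argument of its `markov_model` function).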
When the script is done, it writes a file to your working directory (Documents on Windows) named in the format markov_attribution_<viewid>_<date>_<time>.csv.
My test data looks like the following when loaded into Excel.
Plotting the Markov and last-touch attributions gives the following chart.
It’s clear that last-touch attribution alone might not be enough for evaluating ads.
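For comparison, last-touch attribution fits in a few lines of Python. The toy journeys below are invented for illustration: seo opens one of the two converting journeys but gets no credit at all, which is exactly the blind spot a Markov model tries to address.

```python
from collections import Counter


def last_touch(paths):
    """Give each conversion entirely to the final channel of its path."""
    credit = Counter()
    for channels, converted in paths:
        if converted and channels:
            credit[channels[-1]] += 1
    return dict(credit)


# Toy journeys as (ordered channels, converted?) pairs; illustration only.
paths = [
    (["seo", "cpc"], True),
    (["seo"], False),
    (["cpc"], True),
    (["email", "cpc"], False),
]
print(last_touch(paths))  # {'cpc': 2}
```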