
Marketing Attribution
Previously I wrote about applying Markov Model Attribution using R on a Google Analytics click-stream data-set in BigQuery to find which campaign initiatives comes out with best performance.
Shapley Value is another similar Machine Learning algorithm that is very popular for calculating the worth of a campaign.
Attribution becomes important if you spend a lot of resources on any type of marketing activity.
Shapley Value is used by Google Analytics (data-driven in GA 360), Google Adwords and Google Attribution 360.
I will address using Shapley Value on your Scitylana data in another post.
A good kickstart
My goal with this blog post is to show you another attribution model called Funnel Based Attribution (FBA). But also to kick start you very concrete.
When you have read this post you will be able to
Currently this is my favourite algorithm. I’ll get back to that in a bit.
The goal of this blog post is to use FBA – not describe it in detail. It has already been beautifully described by the inventor Vladyslav Flaks. CEO at OWOX.
The prerequisites
What I like about the algorithm is that is very easy to understand.
It seems reasonable to me that steps in on a customers journey is as much about the campaign as about is the site communication itself.
This is what makes the FBA way more actionable than other more popular attribution models. The model entangles campaigns and funnel steps into one system. This is also the drawback – since you need to defined your funnel.
To calculate Funnel Based Attribution we must have access to at least the individual pages/events etc. that makes up a session. These informations will be used for defining the micro funnel steps on the site – from front-page to e.g. and order-confirmation page or a transactionId etc.
I recommend you try the Scitylana click-stream data that we build from your Google Analytics Free data and our Google Analytics tracking script plugin.
It takes 5 mins. to get your first day of data in BigQuery. The trial includes pulling 30 days back in time and continues to update the data for 2 weeks before you need to pay anything.
Start your free trial now - it takes approximately 2 minutes
No strings attached – it’s just a trial
The Setup
In this article I will present a solution for calculating FBA with the Scitylana data-set directly in BigQuery with a single SQL statement.
Your data-set needs the following columns populated to work.
- sl_userId (aka. clientId)
- sl_sessionId
- sl_timestamp
- M_transactionRevenue
- channelGrouping
- pagePath
You can change M_transactionRevenue to goals or any other metric you have that represents value. You can also, and should, experiment with changing channelGrouping.
E.g. if you have different kinds of email categories then drill down into and build join them with the channelGrouping building a custom channel grouping dimension.
The SQL statement consists of 3 main steps
- For each funnel step, calculate the “attribution” value on historic data.
- Filter sessions so we have the sessions that contributes with a unique funnel progression – from one funnel step/state to another funnel step/state.
- Group by channelGrouping and sum the total attribution value multiplied with the revenue/value of a Purchase (or our most wanted response).
Funnel
I chose to model the funnel in this article to fit the AIDA model. I define the funnel steps in the following way.
- Attention All sessions
- Interest Non-bounced sessions (> 1 page view)
- Desire Sessions with Add-to-basket event
- Action Sessions with a transaction

This chart shows an example of how the funnel stats on a website could look like.
The green attribution values can be calculated on the historic data.
You can calculate these numbers on full history or just a certain time range using the formular in Flaks’ article.
All visitors start out in the Attention state (they have entered our site with or without a campaign as source).
We are interested in the sessions where a visitors state changes. We only look at visitors that have completed the full funnel.
The SQL statement provided below has state definitions for the 3 steps above, Interest, Desire and Action.
The Value calculation for each step is done with this SQL statement
SELECT scoreInterest/total AS valueInterest, scoreDesire/total AS valueDesire, scoreAction/total AS valueAction FROM ( SELECT *, scoreInterest+scoreDesire+scoreAction AS total FROM ( SELECT Attention, 1 - Interest/Attention AS scoreInterest, 1 - Desire/Attention AS scoreDesire, 1 - Action/Attention AS scoreAction FROM ( SELECT ( SELECT COUNT(DISTINCT sl_userId) AS users FROM `scitylana.XXXXXXXX`) AS Attention, ( SELECT COUNT(*) FROM ( SELECT sl_userId, SUM(sl_pageviewOrder) FROM `scitylana.XXXXXXXX` GROUP BY sl_userId HAVING SUM(sl_pageviewOrder) > 1)) AS Interest, ( SELECT COUNT(DISTINCT sl_userId) FROM `scitylana.XXXXXXXX` WHERE pagePath LIKE '%DESIRE%') AS Desire, ( SELECT COUNT(DISTINCT sl_userId) FROM `scitylana.XXXXXXXX` WHERE pagePath LIKE '%ACTION%') AS Action ) ) )
To adopt the query to your own data-set you just need to search and replace XXXXXXXX with the your Google Analytics View Id and you need to define each of the Desire and Action steps with you own SQL filter.
Change the SQL part
pagePath LIKE ‘%DESIRE%’
with your own filter and the also the SQL part
pagePath LIKE ‘%ACTION%’
This is the complete statement where you can change the funnel constraints.
SELECT channelGrouping AS Channel, CAST( SUM(order_revenue_weighted) AS Int64) AS AttributedValue FROM ( SELECT sl_userId, timestamp, channelGrouping, valueInterest * ifnull(interest, 0) + valueDesire * ifnull(desire, 0) + valueAction * ifnull(action, 0) AS value_weight, order_revenue, (valueInterest * ifnull(interest, 0) + valueDesire * ifnull(desire, 0) + valueAction * ifnull(action, 0)) * order_revenue AS order_revenue_weighted FROM ( SELECT sl_userId, timestamp, channelGrouping, CASE WHEN __interest=1 AND interest IS NULL THEN 1 ELSE NULL END AS interest, CASE WHEN __desire=1 AND desire IS NULL THEN 1 ELSE NULL END AS desire, CASE WHEN __action=1 AND action IS NULL THEN 1 ELSE NULL END AS action, order_revenue FROM ( SELECT sl_userId, timestamp, channelGrouping, interest AS _interest, desire AS _desire, action AS _action, interest AS __interest, desire AS __desire, action AS __action, LAG(interest) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS interest, LAG(desire) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS desire, LAG(action) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS action, order_revenue FROM ( SELECT sl_userId, timestamp, channelGrouping, interest AS _interest, desire AS _desire, action AS _action, LAST_VALUE(interest IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS interest, LAST_VALUE(desire IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS desire, LAST_VALUE(action IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS action, order_revenue, order_time FROM ( SELECT sl_userId, timestamp, channelGrouping, interest, desire, action, FIRST_VALUE(CASE WHEN revenue IS NOT NULL THEN timestamp ELSE NULL END IGNORE NULLS) OVER (PARTITION BY sl_userId ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS order_time, FIRST_VALUE(revenue IGNORE NULLS) OVER (PARTITION BY sl_userId ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS order_revenue FROM ( SELECT sl_userId, timestamp, channelGrouping, SUM(interest) interest, SUM(desire) desire, SUM(action) action, SUM(revenue) revenue FROM ( -- Select Interest sessions SELECT sl_userId, MIN(sl_timeStamp) timestamp, channelGrouping, 1 interest, NULL desire, NULL action, NULL revenue FROM `scitylana.XXXXXXXX` WHERE sl_userId IN ( SELECT sl_userId FROM `scitylana.XXXXXXXX` GROUP BY 1 HAVING SUM(M_transactionRevenue) > 0) AND sl_sessionId != '(not set)' GROUP BY sl_userId, channelGrouping HAVING SUM(sl_pageviewOrder) > 1 UNION DISTINCT -- Select Desire sessions SELECT sl_userId, MIN(sl_timeStamp) timestamp, channelGrouping, NULL interest, 1 desire, NULL action, NULL revenue FROM `scitylana.XXXXXXXX` WHERE sl_userId IN ( SELECT sl_userId FROM `scitylana.XXXXXXXX` GROUP BY 1 HAVING SUM(M_transactionRevenue) > 0) AND sl_sessionId != '(not set)' AND (pagePath LIKE '%DESIRE%' OR pagePath LIKE '%ACTION%') GROUP BY sl_userId, channelGrouping UNION DISTINCT -- Select Action sessions SELECT sl_userId, MIN(sl_timeStamp) timestamp, channelGrouping, NULL interest, NULL desire, 1 action, SUM(M_transactionRevenue) revenue FROM `scitylana.XXXXXXXX` WHERE sl_userId IN ( SELECT sl_userId FROM `scitylana.XXXXXXXX` GROUP BY 1 HAVING SUM(M_transactionRevenue) > 0) AND sl_sessionId != '(not set)' AND pagePath LIKE '%ACTION%' GROUP BY sl_userId, channelGrouping ) GROUP BY 1, 2, 3 ) ) WHERE order_revenue IS NOT NULL) ) ), -- Calculate friction/value score for each step in the funnel ( SELECT scoreInterest/total AS valueInterest, scoreDesire/total AS valueDesire, scoreAction/total AS valueAction FROM ( SELECT *, scoreInterest+scoreDesire+scoreAction AS total FROM ( SELECT Attention, 1 - Interest/Attention AS scoreInterest, 1 - Desire/Attention AS scoreDesire, 1 - Action/Attention AS scoreAction FROM ( SELECT ( SELECT COUNT(DISTINCT sl_userId) AS users FROM `scitylana.XXXXXXXX`) AS Attention, ( SELECT COUNT(*) FROM ( SELECT sl_userId, SUM(sl_pageviewOrder) FROM `scitylana.XXXXXXXX` GROUP BY sl_userId HAVING SUM(sl_pageviewOrder) > 1)) AS Interest, ( SELECT COUNT(DISTINCT sl_userId) FROM `scitylana.XXXXXXXX` WHERE pagePath LIKE '%DESIRE%') AS Desire, ( SELECT COUNT(DISTINCT sl_userId) FROM `scitylana.XXXXXXXX` WHERE pagePath LIKE '%ACTION%') AS Action ) ) ) ) VALUES WHERE interest IS NOT NULL OR desire IS NOT NULL OR action IS NOT NULL ) GROUP BY 1 ORDER BY 2 DESC
When running the query against your Google Analytics data in BigQuery you get a result-set containing the total attributed value for each channel group.

If you don’t track revenue with Google Analytics on your site, maybe you don’t sell your product online, then you can change the metric from transaction revenue (M_transactionRevenue) to a Goal Completion metric or your own metric. To use your GA goals you need to upgrade to our Startup Plan. If you are from a Startup, NGO or NON profit, we offer a full licence at a very low price. Apply to see if you are eligible here.
We are very helpful – connect with us on Scitylana support or connect with me on LinkedIn and I’ll help you get started. If you are tech savvy you can absolutely get started now and start using the model tomorrow when your first batch of data arrives in BigQuery.
Future work
If we focus too much on the campaign value itself we might overlook the effects of our website. On our web site we could interpret a product page as a campaign that is leading the visitor to the checkout flow.
The checkout can be evaluated as a micro funnel on the site.
We can calculate a kind of double funnel analysis. (1) The full customer journey – evaluating the campaign – and (2) the micro funnel of the site – evaluating product related pages.
I have had succes creating a modified version of the query that isn’t specific to channelGrouping. Then you dynamically can break down the attribution value on any session scoped dimension you like.
I will address this approach in future article.
If you're still reading - put your own Google Analytics data into BigQuery
No strings attached – it’s just a trial