Marketing Attribution

Previously I wrote about applying Markov Model Attribution using R on a Google Analytics click-stream data-set in BigQuery to find which campaign initiatives comes out with best performance.
Shapley Value is another similar Machine Learning algorithm  that is very popular for calculating the worth of a campaign.
Attribution becomes important if you spend a lot of resources on any type of marketing activity.

Shapley Value is used by Google Analytics (data-driven in GA 360), Google Adwords and Google Attribution 360.

I will address using Shapley Value on your Scitylana data in another post.

A good kickstart

My goal with this blog post is to show you another attribution model called Funnel Based Attribution (FBA). But also to kick start you very concrete.
When you have read this post you will be able to
Currently this is my favourite algorithm. I’ll get back to that in a bit.

The goal of this blog post is to use FBA – not describe it in detail. It has already been beautifully described by the inventor Vladyslav Flaks. CEO at OWOX.

The prerequisites

What I like about the algorithm is that is very easy to understand.
It seems reasonable to me that steps in on a customers journey is as much about the campaign as about is the site communication itself.
This is what makes the FBA way more actionable than other more popular attribution models. The model entangles campaigns and funnel steps into one system. This is also the drawback – since you need to defined your funnel.

To calculate Funnel Based Attribution we must have access to at least the individual pages/events etc. that makes up a session. These informations will be used for defining the micro funnel steps on the site – from front-page to e.g. and order-confirmation page or a transactionId etc.
I recommend you try the Scitylana click-stream data that we build from your Google Analytics Free data and our Google Analytics tracking script plugin.
It takes 5 mins. to get your first day of data in BigQuery. The trial includes pulling 30 days back in time and continues to update the data for 2 weeks before you need to pay anything.

Start your free trial now - it takes approximately 2 minutes

No strings attached – it’s just a trial

The Setup

In this article I will present a solution for calculating FBA with the Scitylana data-set directly in BigQuery with a single SQL statement.
Your data-set needs the following columns populated to work.

  • sl_userId (aka. clientId)
  • sl_sessionId
  • sl_timestamp
  • M_transactionRevenue
  • channelGrouping
  • pagePath

You can change M_transactionRevenue to goals or any other metric you have that represents value. You can also, and should, experiment with changing channelGrouping.
E.g. if you have different kinds of email categories then drill down into and build join them with the channelGrouping building a custom channel grouping dimension.

The SQL statement consists of 3 main steps

  1. For each funnel step, calculate the “attribution” value on historic data.
  2. Filter sessions so we have the sessions that contributes with a unique funnel progression – from one funnel step/state to another funnel step/state.
  3. Group by channelGrouping and sum the total attribution value multiplied with the revenue/value of a Purchase (or our most wanted response).

Funnel

I chose to model the funnel in this article to fit the AIDA model. I define the funnel steps in the following way.

  • Attention All sessions
  • Interest Non-bounced sessions (> 1 page view)
  • Desire Sessions with Add-to-basket event
  • Action Sessions with a transaction

This chart shows an example of how the funnel stats on a website could look like.
The green attribution values can be calculated on the historic data.
You can calculate these numbers on full history or just a certain time range using the formular in Flaks’ article.
All visitors start out in the Attention state (they have entered our site with or without a campaign as source).

We are interested in the sessions where a visitors state changes. We only look at visitors that have completed the full funnel.

The SQL statement provided below has state definitions for the 3 steps above, Interest, Desire and Action.

The Value calculation for each step is done with this SQL statement

   SELECT
      scoreInterest/total AS valueInterest,
      scoreDesire/total AS valueDesire,
      scoreAction/total AS valueAction
    FROM (
      SELECT
        *,
        scoreInterest+scoreDesire+scoreAction AS total
      FROM (
        SELECT
          Attention,
          1 - Interest/Attention AS scoreInterest,
          1 - Desire/Attention AS scoreDesire,
          1 - Action/Attention AS scoreAction
        FROM (
          SELECT
            (
            SELECT
              COUNT(DISTINCT sl_userId) AS users
            FROM
              `scitylana.XXXXXXXX`) AS Attention,
            (
            SELECT
              COUNT(*)
            FROM (
              SELECT
                sl_userId,
                SUM(sl_pageviewOrder)
              FROM
                `scitylana.XXXXXXXX`
              GROUP BY
                sl_userId
              HAVING
                SUM(sl_pageviewOrder) > 1)) AS Interest,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%DESIRE%') AS Desire,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%ACTION%') AS Action ) ) )

To adopt the query to your own data-set you just need to search and replace XXXXXXXX with the your Google Analytics View Id and you need to define each of the Desire and Action steps with you own SQL filter.

Change the SQL part

pagePath LIKE ‘%DESIRE%’ 

with your own filter and the also the SQL part

pagePath LIKE ‘%ACTION%’

This is the complete statement where you can change the funnel constraints.

SELECT
  channelGrouping AS Channel,
  CAST( SUM(order_revenue_weighted) AS Int64) AS AttributedValue
FROM (
  SELECT
    sl_userId,
    timestamp,
    channelGrouping,
    valueInterest * ifnull(interest,
      0) + valueDesire * ifnull(desire,
      0) + valueAction * ifnull(action,
      0) AS value_weight,
    order_revenue,
    (valueInterest * ifnull(interest,
        0) + valueDesire * ifnull(desire,
        0) + valueAction * ifnull(action,
        0)) * order_revenue AS order_revenue_weighted
  FROM (
    SELECT
      sl_userId,
      timestamp,
      channelGrouping,
      CASE
        WHEN __interest=1 AND interest IS NULL THEN 1
        ELSE NULL
      END AS interest,
      CASE
        WHEN __desire=1 AND desire IS NULL THEN 1
        ELSE NULL
      END AS desire,
      CASE
        WHEN __action=1 AND action IS NULL THEN 1
        ELSE NULL
      END AS action,
      order_revenue
    FROM (
      SELECT
        sl_userId,
        timestamp,
        channelGrouping,
        interest AS _interest,
        desire AS _desire,
        action AS _action,
        interest AS __interest,
        desire AS __desire,
        action AS __action,
        LAG(interest) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS interest,
        LAG(desire) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS desire,
        LAG(action) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS action,
        order_revenue
      FROM (
        SELECT
          sl_userId,
          timestamp,
          channelGrouping,
          interest AS _interest,
          desire AS _desire,
          action AS _action,
          LAST_VALUE(interest IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS interest,
          LAST_VALUE(desire IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS desire,
          LAST_VALUE(action IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS action,
          order_revenue,
          order_time
        FROM (
          SELECT
            sl_userId,
            timestamp,
            channelGrouping,
            interest,
            desire,
            action,
            FIRST_VALUE(CASE
                WHEN revenue IS NOT NULL THEN timestamp
                ELSE NULL
              END IGNORE NULLS) OVER (PARTITION BY sl_userId ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS order_time,
            FIRST_VALUE(revenue IGNORE NULLS) OVER (PARTITION BY sl_userId ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS order_revenue
          FROM (
            SELECT
              sl_userId,
              timestamp,
              channelGrouping,
              SUM(interest) interest,
              SUM(desire) desire,
              SUM(action) action,
              SUM(revenue) revenue
            FROM (
                -- Select Interest sessions
              SELECT
                sl_userId,
                MIN(sl_timeStamp) timestamp,
                channelGrouping,
                1 interest,
                NULL desire,
                NULL action,
                NULL revenue
              FROM
                `scitylana.XXXXXXXX`
              WHERE
                sl_userId IN (
                SELECT
                  sl_userId
                FROM
                  `scitylana.XXXXXXXX`
                GROUP BY
                  1
                HAVING
                  SUM(M_transactionRevenue) > 0)
                AND sl_sessionId != '(not set)'
              GROUP BY
                sl_userId,
                channelGrouping
              HAVING
                SUM(sl_pageviewOrder) > 1
              UNION DISTINCT
                -- Select Desire sessions
              SELECT
                sl_userId,
                MIN(sl_timeStamp) timestamp,
                channelGrouping,
                NULL interest,
                1 desire,
                NULL action,
                NULL revenue
              FROM
                `scitylana.XXXXXXXX`
              WHERE
                sl_userId IN (
                SELECT
                  sl_userId
                FROM
                  `scitylana.XXXXXXXX`
                GROUP BY
                  1
                HAVING
                  SUM(M_transactionRevenue) > 0)
                AND sl_sessionId != '(not set)'
                AND (pagePath LIKE '%DESIRE%'
                  OR pagePath LIKE '%ACTION%')
              GROUP BY
                sl_userId,
                channelGrouping
              UNION DISTINCT
                -- Select Action sessions
              SELECT
                sl_userId,
                MIN(sl_timeStamp) timestamp,
                channelGrouping,
                NULL interest,
                NULL desire,
                1 action,
                SUM(M_transactionRevenue) revenue
              FROM
                `scitylana.XXXXXXXX`
              WHERE
                sl_userId IN (
                SELECT
                  sl_userId
                FROM
                  `scitylana.XXXXXXXX`
                GROUP BY
                  1
                HAVING
                  SUM(M_transactionRevenue) > 0)
                AND sl_sessionId != '(not set)'
                AND pagePath LIKE '%ACTION%'
              GROUP BY
                sl_userId,
                channelGrouping )
            GROUP BY
              1,
              2,
              3 ) )
        WHERE
          order_revenue IS NOT NULL) ) ),
    -- Calculate friction/value score for each step in the funnel
    (
    SELECT
      scoreInterest/total AS valueInterest,
      scoreDesire/total AS valueDesire,
      scoreAction/total AS valueAction
    FROM (
      SELECT
        *,
        scoreInterest+scoreDesire+scoreAction AS total
      FROM (
        SELECT
          Attention,
          1 - Interest/Attention AS scoreInterest,
          1 - Desire/Attention AS scoreDesire,
          1 - Action/Attention AS scoreAction
        FROM (
          SELECT
            (
            SELECT
              COUNT(DISTINCT sl_userId) AS users
            FROM
              `scitylana.XXXXXXXX`) AS Attention,
            (
            SELECT
              COUNT(*)
            FROM (
              SELECT
                sl_userId,
                SUM(sl_pageviewOrder)
              FROM
                `scitylana.XXXXXXXX`
              GROUP BY
                sl_userId
              HAVING
                SUM(sl_pageviewOrder) > 1)) AS Interest,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%DESIRE%') AS Desire,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%ACTION%') AS Action ) ) ) )
  VALUES
  WHERE
    interest IS NOT NULL
    OR desire IS NOT NULL
    OR action IS NOT NULL )
GROUP BY
  1
ORDER BY
  2 DESC

When running the query against your Google Analytics data in BigQuery you get a result-set containing the total attributed value for each channel group.

If you don’t track revenue with Google Analytics on your site, maybe you don’t sell your product online, then you can change the metric from transaction revenue (M_transactionRevenue) to a Goal Completion metric or your own metric. To use your GA goals you need to upgrade to our Startup Plan. If you are from a Startup, NGO or NON profit, we offer a full licence at a very low price. Apply to see if you are eligible here.

We are very helpful – connect with us on Scitylana support or connect with me on LinkedIn and I’ll help you get started. If you are tech savvy you can absolutely get started now and start using the model tomorrow when your first batch of data arrives in BigQuery.

Future work

If we focus too much on the campaign value itself we might overlook the effects of our website. On our web site we could interpret a product page as a campaign that is leading the visitor to the checkout flow.
The checkout can be evaluated as a micro funnel on the site.
We can calculate a kind of double funnel analysis. (1) The full customer journey – evaluating the campaign – and (2) the micro funnel of the site – evaluating product related pages.

I have had succes creating a modified version of the query that isn’t specific to channelGrouping. Then you dynamically can break down the attribution value on any session scoped dimension you like.

I will address this approach in future article.

If you're still reading - put your own Google Analytics data into BigQuery

No strings attached – it’s just a trial

// Type writer effect