What is path analysis?
Path analysis is a technique web and product analysts use to understand how users move through a website or app. It helps uncover friction points and unexpected detours in the user journey.
For example, if you notice many users returning to a previous page, that’s a red flag: something on the current page isn’t working, such as a broken UI element, confusing copy, or a missing step.
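To make the "returning to a previous page" signal concrete, here is a minimal base R sketch. The function name flag_returns and the sample pages are hypothetical, and it assumes the pages of a single session are already in chronological order. It flags a step when the user lands on the page they were on two steps earlier, with a different page in between (A, then B, then A again).

```r
# Hypothetical helper: flag steps where a user goes back to the page
# they visited two steps earlier (A -> B -> A).
# Assumes `pages` holds one session's pages in chronological order.
flag_returns <- function(pages) {
  n <- length(pages)
  # A return needs at least three steps
  if (n < 3) return(rep(FALSE, n))
  current  <- pages[3:n]
  previous <- pages[2:(n - 1)]
  before   <- pages[1:(n - 2)]
  # Same page as two steps ago, and not just a reload of the same page
  c(FALSE, FALSE, current == before & current != previous)
}

# Made-up session for illustration
session_pages <- c("/home", "/product", "/home", "/basket")
returns <- flag_returns(session_pages)
print(data.frame(page = session_pages, is_return = returns))
```

Summing the flags per page would then show which pages users most often back out of. This only catches immediate back-and-forth moves; longer loops would need a wider comparison.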
Why run path analysis in R and not simply through the GA4 interface?
You can run path analysis directly in GA4’s Explore section. It’s visually appealing, but the view quickly becomes crowded, and it only shows the most frequent paths.
With R, you’re not limited to the top results. You can count every possible path, no matter how rare. You can also mix dimensions, such as combining page location with events, to build a richer picture of user behavior.
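As a rough illustration of both ideas, the base R sketch below builds one path per session and counts every distinct path. The toy data frame and the step column are assumptions made up for this example, not the real export. It also assumes the rows are already sorted chronologically within each session.

```r
# Toy event log (made-up data), sorted chronologically within each session
events <- data.frame(
  session_id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  event_name    = c("page_view", "page_view", "add_to_cart",
                    "page_view", "page_view",
                    "page_view", "page_view", "add_to_cart"),
  page_location = c("/home", "/product", "/product",
                    "/home", "/basket",
                    "/home", "/product", "/product"),
  stringsAsFactors = FALSE
)

# Mix dimensions: use the page for page views, the event name otherwise
events$step <- ifelse(events$event_name == "page_view",
                      events$page_location,
                      events$event_name)

# Collapse each session's steps into a single path string
paths <- tapply(events$step, events$session_id, paste, collapse = " > ")

# Count every distinct path, rare ones included
path_counts <- sort(table(paths), decreasing = TRUE)
print(path_counts)
```

Because the counts come from a plain table, nothing is truncated to the top results: a path seen once appears alongside the most common ones.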
For this analysis, we’re going to use the public GA4 BigQuery export of the Google Merchandise Store.
We want to understand how users move through the website and what they actually do once they are there. For this, we are interested in page views and ecommerce events.
Events like “session_start” or “first_visit” don’t tell us much about navigation, so we will exclude them. We also want the results in chronological order, so we can later reconstruct user paths.
Here is the query with comments:
select
-- Unique user identifier (anonymous in GA4)
user_pseudo_id,
-- Extract the session_id from event_params
(select value.int_value from unnest(event_params) where key = "ga_session_id") as session_id,
-- The event name (e.g. page_view, purchase)
event_name,
-- Extract the page URL from event_params
(select value.string_value from unnest(event_params) where key = "page_location") as page_location
from
-- Public GA4 sample ecommerce dataset
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
where
-- Exclude events that don't describe navigation or user actions
event_name not in ("session_start", "first_visit", "scroll", "user_engagement")
order by
-- Order events by user, session, and timestamp to reconstruct paths
user_pseudo_id, session_id, event_timestamp