The unbearable lightness of… naming things…
…or why Button Clicked won’t get you anywhere (or not too far anyways)
I am generally a pretty indecisive and non-opinionated person. In most situations I can easily see and empathize with both sides and have a hard time taking one (analysis-paralysis, anyone?). But there is one thing that I am now pretty certain about: if you are working in product analytics, please don’t name your user interaction events ‘Page Viewed’ and ‘Button Clicked’.
The Proof is in the “Poking”
In one of my previous roles, as I joined the company and opened their product analytics tool, I saw that the vast majority of events were ‘Button Clicked’, ‘Page Viewed’, and ‘Form Submitted’, and thought: ‘This is going to be a journey!’. And I was not wrong.
Within the first few days I heard from a product manager: ‘This product analytics tool sucks!’. Another person said: ‘I’m pretty sure the numbers in this tool are wrong’.
Of course, there was nothing wrong with the tool itself, and while numbers were indeed wrong in one specific part of the funnel, the rest of it was fine. The product managers just couldn’t use it because they couldn’t reliably and confidently select the right events to trend or to create funnels with.
I even ran some live “user research” sessions in which I asked product managers to open our existing tool and its slightly shinier competitor and plot one of the product’s key actions while narrating their thought process. Now I could see for myself that the tool was indeed not the culprit. They were stumped choosing between Button Clicked and Form Submitted and, whichever event they picked, by how to apply the additional filters needed to isolate the required user action. Faced with this conundrum every time they tried to investigate a trend, they just gave up and chalked it up to the ‘tool’ not helping them enough, while really it was in their power all along to suggest different naming.
Incentives, incentives
Why was it implemented that way? Because an engineer (and a brilliant one at that - one of my favorite engineers ever to work with, and one of the most knowledgeable about event tracking SDKs) built it in a way that optimized for easy implementation on the dev side, as they didn’t have a data or PM thought partner to develop a full-fledged taxonomy with.
I recently asked my Twitter to chime in on their event naming preference, and ‘descriptive naming’ of the action won, but not unanimously.
I expected it to be even closer, given that I follow some of the data engineering and ML crowd there (I guess they don’t follow me back! 😅), as another argument I’ve heard “pro” generic naming was the ease of wrangling all the event data into a big ol’ ML model. Although if you think about it, that argument also falls apart fairly quickly. Once you get beyond some basic event properties like ‘type’, ‘page’, ‘source’, and start collecting richer event metadata, you end up with a colander of a segment.button_clicked table in the warehouse: lots of columns, and many of them (all the properties specific to only a handful of events) mostly filled with NULLs.
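To make the colander concrete, here is a minimal sketch in plain Python. The three events and their property names (cart_value, invitee_count and so on) are invented for illustration; they are not from any real tracking plan. It flattens a few generic ‘Button Clicked’ events into one wide table and measures how many cells end up empty.

```python
# Hypothetical 'Button Clicked' events; all property names are made up.
events = [
    {"event": "Button Clicked", "type": "share", "page": "profile",
     "shared_asset": "badge", "destination": "Twitter"},
    {"event": "Button Clicked", "type": "checkout", "page": "cart",
     "cart_value": 42.0, "coupon_code": "SPRING"},
    {"event": "Button Clicked", "type": "invite", "page": "home",
     "invitee_count": 3},
]


def flatten(rows):
    """Build one wide table: every property seen on any row becomes a column."""
    columns = sorted({key for row in rows for key in row})
    table = [{col: row.get(col) for col in columns} for row in rows]
    return columns, table


def null_share(columns, table):
    """Fraction of cells in the wide table that are NULL (None)."""
    cells = len(columns) * len(table)
    nulls = sum(1 for row in table for col in columns if row[col] is None)
    return nulls / cells


columns, table = flatten(events)
sparsity = null_share(columns, table)
print(f"{len(columns)} columns, {sparsity:.0%} of cells are NULL")
```

Every new kind of button adds columns that only its own rows fill in, so the share of empty cells tends to grow as tracking gets richer.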
Where I was going with it anyways…
So, why descriptive naming? If you’ve worked with a product team, you know that the Voldemort-like concept of ‘self-serve analytics’ is not a privilege but a must. You, as a data person, really cannot be a bottleneck for the need to look up the adoption of a feature, pull a cohort of users who did X in the last Y days, or segment a funnel.
Product managers and UX designers are curious people, and their curiosity and exploration while they are in the flow of product development is something to be encouraged and enabled. They have to have the right systems in place for that. Having to parse through a bunch of event tracking schema documentation (if it even exists, because who are we kidding), or being constantly confused about how many filters to apply to a generic event, is the opposite of the right system.
User-centric taxonomy design
But, of course, there is no need to take the ‘descriptive naming’ to an extreme. My former colleague Gaurav, I think, got it right.
For marketing websites and landing pages that primarily enable users to navigate through content, the Page Viewed, Link Clicked nomenclature is sufficient (for page views, in tools like Segment you can even leverage ‘page’ calls that come with a set of standard properties; just don’t forget to add a custom ‘name’ property, or your tracking plan will be blown up).
But for the actual product surface with more involved functionality, when you think about naming an event, think: ‘If John from the product marketing team needs to see how many people shared their book reading badge to social, how would he start looking for it in Amplitude… he would probably start typing ‘shared…’... so we should name it Shared to Social - that way he won’t miss it’.
And we can add properties for what they shared (shared asset: ‘badge’ / ‘new year’s resolution’), where they shared (destination: Facebook / Twitter), and which screen the share happened from (source: user profile, challenge completion, home screen).
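As a rough sketch of what that event could look like in code (plain Python, no tracking SDK involved): the snake_case property keys, the ALLOWED table, and the shared_to_social helper are my own hypothetical names, and the allowed values are just the examples listed above.

```python
# Allowed values per property, taken from the examples in the text.
# The snake_case keys and this helper are hypothetical, not a real SDK API.
ALLOWED = {
    "shared_asset": {"badge", "new year's resolution"},
    "destination": {"Facebook", "Twitter"},
    "source": {"user profile", "challenge completion", "home screen"},
}


def shared_to_social(shared_asset, destination, source):
    """Build a 'Shared to Social' event payload, rejecting unknown values."""
    properties = {
        "shared_asset": shared_asset,
        "destination": destination,
        "source": source,
    }
    for name, value in properties.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"unexpected {name}: {value!r}")
    return {"event": "Shared to Social", "properties": properties}


event = shared_to_social("badge", "Twitter", "challenge completion")
print(event)
```

Checking values at the point where the event is built is one cheap way to keep ‘Twitter’, ‘twitter’, and ‘tw’ from all ending up in the same property.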
There is a different risk here - of getting too descriptive. Should this be ‘Shared Challenge Badge to Social’? Or even ‘Shared Challenge Badge to Facebook – Challenge Completion Screen’? But it is also pretty easy to think through this consideration: what is the most common way the data will be aggregated? Is the most common metric ‘% users sharing (anything) to social’? Or is it more common to look at challenge badges separately? It is generally much easier to ‘group by’ in product analytics tools than to ‘union’, so you should always go by the most common level of granularity.
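Here is a small sketch of the ‘group by’ side of that trade-off, in plain Python with made-up users and a made-up total user count. With one ‘Shared to Social’ event, the headline metric is a single filter, and the per-asset breakdown is a group by on a property.

```python
from collections import defaultdict

# Hypothetical event log; users and counts are invented for illustration.
events = [
    {"user": "u1", "event": "Shared to Social", "shared_asset": "badge"},
    {"user": "u2", "event": "Shared to Social", "shared_asset": "new year's resolution"},
    {"user": "u1", "event": "Shared to Social", "shared_asset": "badge"},
    {"user": "u3", "event": "Page Viewed"},
]
total_users = 4  # assumed size of the active user base


def users_who(rows, event_name):
    """Distinct users who fired the given event at least once."""
    return {row["user"] for row in rows if row["event"] == event_name}


def users_by_property(rows, event_name, prop):
    """Distinct users who fired the event, grouped by one property value."""
    groups = defaultdict(set)
    for row in rows:
        if row["event"] == event_name:
            groups[row[prop]].add(row["user"])
    return dict(groups)


# '% users sharing (anything) to social': one event name, no union needed.
sharers = users_who(events, "Shared to Social")
share_rate = len(sharers) / total_users

# Looking at badges separately is just a group by on a property.
by_asset = users_by_property(events, "Shared to Social", "shared_asset")
print(share_rate, by_asset)
```

With names like ‘Shared Challenge Badge to Facebook’, the first metric would instead need a hand-maintained list of every share event name, which is exactly the ‘union’ that is easy to get wrong.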
All in all, it boils down to weighing the end user of your data against the skillset and resources of your data team. If you have more analytics engineer-type folks on your team, and the tools and time to massage the raw event data before it makes it into your product analytics interface of choice, you can spend less time defining the taxonomy and QAing data with the eng team. Otherwise, you will need to be more intentional and deliberate about your taxonomy. And don’t forget about event properties: as Josh Wills (former Head of Data Engineering at Slack) once said, the cheapest way to join the data is in production! It may sound daunting, but it is a bit of a muscle to build with the team (plus, if you can swing it, there are some tools out there that can make things easier, although a spreadsheet works too).
If you’ve read all this way and still don’t think I’m totally off-base, I will be more than happy to share my ‘taxonomy guidelines’ template with you! Hit me up in the comments!
Voldemort-like concept of ‘self-serve analytics’
^^^ made me lol, thank you
Hi, I'm also thinking about taxonomies. Not event-level data, though: our most common data product is a big phat table (>1000 columns from which we make selections) with columns that sometimes have unique values per row but are often less granular.
Currently we are very user focused and have short prefixes (abbreviations) for 'aspects' of the data, and sometimes a suffix like "_code" at the end. So it is very user oriented and hybrid (not many strict recipes), and sometimes confusing for ourselves.
I would like to have a first prefix based on granularity (since we are mixing so many levels into one table), then aspect (if still needed), and at the end always a clear indicator of what sort of variable it is (ID, CODE, DESC(RIPTION)). That would make it clear for ourselves and let us do the engineering / ML stuff we want to do easily.
Anyway, so far my own ramblings. I wanted to write I was interested in the template ;-)