Sixteen mistakes I made in digital analysis (and how to avoid them)
December 09, 2019
Are you interested in getting into the nitty-gritty of digital analytics, but worried that it is complex, difficult and easy to get wrong? This is a fast-forward past the stumbling blocks that I encountered.
I like to remind myself that I am in the exploration business. The faster I fail, the faster I learn and the more successful I become. Look at these mistakes as lessons, and don't be scared to make mistakes of your own.
#1: I have used corrupt data in my analyses
I have joined new teams several times without being familiar with, or while simply ignoring, the tracking history. I've drawn conclusions based on corrupted data countless times. Often the tracking had broken down entirely, or had been submitting incorrect data.
Now I always ask for a historical review of the tracking. If there is no documentation or annotation structure, I make sure we get started on one.
Analysis and conclusions
#2: I have missed seasonal variations
I have often drawn rash conclusions based on data collected over too limited a period. Sometimes you don't have access to historical data – tracking might be missing, for example – making it tempting to draw conclusions from what little data you have.
I have run A/B tests and analyses during periods that were not of significance to the product and its use. One example was an experiment to promote drama shows in a video on demand service, even though the supply of dramas is poor at the end of May. Another was analysing television listings data during Christmas week and thinking that the behaviour I was seeing was representative of the full year.
Now I ask other analysts and colleagues about which seasonal variations are important with regard to the context, industry and product before I start an analysis or experiment. I extract historical data spanning multiple years for the domain I am investigating.
Swedes appear to make their only informed wine selection around New Year's Eve (Source: Google Trends)
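The multi-year extraction can be sketched roughly as follows; this is a minimal illustration assuming pandas and made-up daily session counts, not a real dataset:

```python
import pandas as pd

# Hypothetical daily usage data spanning multiple years.
idx = pd.date_range("2016-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({"sessions": range(len(idx))}, index=idx)

# Average sessions per calendar month, one column per year: large spreads
# between years for the same month hint at seasonality (or at tracking
# changes worth investigating before drawing conclusions).
monthly = (
    df["sessions"]
    .groupby([df.index.year, df.index.month])
    .mean()
    .unstack(level=0)  # rows = month, columns = year
)
```

Lining the years up side by side like this makes it much harder to mistake a seasonal dip for a real change in behaviour.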
#3: I have passed on incorrect data
This may be the most annoying mistake, as it erodes trust and risks incorrect decisions. Data can be incorrect for a variety of reasons: errors in segments, errors in filters, corrupted tracking, bugs in the analytics tool itself, UX problems with the analytics tool, and so on. If there's one mistake you want to avoid at any cost, this is it.
For large-scale or otherwise important analyses, I often have another analyst, preferably from another team, double-check my segments and filters, and provide comments on the conclusions I draw. But it doesn't always help, because it is easy to make mistakes. If you are uncertain, communicate your uncertainty. Just as when you report on the results of usability tests and other research methods – show some humility and avoid taking a “this is the way it is” attitude.
#4: I have sub-optimised
I have been blinded by the effects of new features and content to the point that I did not see that they led to behaviours with an adverse impact on higher-level objectives.
An example would be a new feature that draws clicks but then leads to behaviours associated with a high exit rate, even though the product's goal is to increase time spent. Such a feature may also lower product loyalty compared to an alternative.
Now I measure every last experiment against multiple higher-level objectives whenever I conduct A/B testing. For new functionality, I usually follow up on higher-level metrics like loyalty, session duration, rate of return to previous pages and exit rate.
Sample results from an ongoing A/B test that appears to be going well at first sight.
Whereas in fact we cannot see any positive impact on the higher-level objective.
#5: I've missed cannibalism
Many times I have missed the fact that behaviours and clicks simply bounce around in the service when running A/B tests or adding new functionality. It may appear at first glance as if the experiment is going well and generating value, when in fact clicks are being cannibalised from behaviour that is more valuable to the business. It can also be the case that clicks shift to a functionality with a worse effect on loyalty than the one they came from.
Now I track all the elements in the view I am running A/B tests on to see how the clicks shift compared to the original version. I also try to analyse how cannibalism impacts loyalty and higher-level objectives.
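A minimal sketch of the click-shift comparison, with made-up click counts and element names purely for illustration:

```python
# Hypothetical click counts per tracked element, control vs variant.
control = {"hero": 500, "drama_rail": 300, "search": 200}
variant = {"hero": 450, "drama_rail": 420, "search": 130}

def click_share(clicks):
    """Each element's share of all clicks in the view."""
    total = sum(clicks.values())
    return {el: n / total for el, n in clicks.items()}

# Elements whose share of clicks dropped in the variant are candidates
# for cannibalisation; the follow-up question is whether the clicks they
# lost were more valuable than the clicks the new element gained.
shift = {
    el: click_share(variant)[el] - click_share(control)[el]
    for el in control
}
```

Because shares always sum to one, every gain for the new element is visibly paid for somewhere else, which is exactly the point.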
#6: I have drawn quantitative conclusions based on insufficient data
Sometimes there is simply too little data. I have often been tempted to draw quantitative conclusions and generalise in spite of this. Sometimes things have gone badly wrong. For example, there were times when I was unable to extract data in the aggregated way I wanted to. What I did then, for example, was to:
- manually review the 50 most used pages in a specific report, or
- conduct numerous spot-checks.
This can yield ideas for hypotheses. But it is also dangerous to draw general conclusions in this way. If you are looking at pronounced long-tail behaviour, it may be the case that the 50 most frequently-used views represent an insignificant behaviour and a very small percentage of total use at the same time.
Nowadays I only use insufficient quantitative data for generating hypotheses (associated with a higher risk level, which I communicate).
#7: I have drawn rash conclusions regarding cause and effect
The fact that the correlation between two variables is high is indicative of a relationship (e.g. number of conversions and number of visits from search). Many times I have drawn overly hasty conclusions about which variable is the cause and which the effect, when in actual fact the situation may be the other way around. Another difficulty arises when a third variable is the actual cause of the correlation (e.g. money spent in AdWords).
Now I try to perform a causation analysis when the overall analysis is of vital importance (possibly omitting this step for hypothesis generation at the feature level). I try to rule out third variables. I examine causality and look to make sure that changes in one variable are also reflected in the other variable.
#8: I have let preconceived notions affect my analysis
Several times I have had a hypothesis in mind that I tried to validate using quantitative data and experiments. Instead of looking at the data with an open, neutral eye, I unconsciously analysed the data in a biased manner. That way lies trouble. Once a feature ships and the results of A/B testing or quantitative analyses turn out, over time, not to hold water, you're the one holding the bag and being called on to explain why.
Nowadays I have another analyst, preferably one from outside the team, review major analyses before they are presented.
A/B testing and experiments
#9: I have run A/B tests that were too limited in scope
I have wrapped up A/B tests too early, as soon as I found something statistically significant, and drawn conclusions based on the results. The results were then communicated, creating expectations among stakeholders and clients of lasting future effects.
Several times it turned out that there was an initial level of curiosity and user engagement with new functionality, but that interest then dropped off after a few days/weeks, coupled with decreasing loyalty to the new functionality. This was especially common in cases where we had a high percentage of loyal users who used the product frequently.
In other cases, the user's change curve was so drawn-out that the experiment only captured the initial negative dip. At times, the long-term positive impact took so long to materialise that there was no time to capture it within the A/B test.
Now I make my experiments as long as I can to reduce the risk of jumping to conclusions. I monitor the trend in users' rate of return to the functionality (using a cohort chart). I do follow-up testing, and I prepare my clients to expect that such follow-up will show, over time, whether the functionality creates value or needs to be removed. Shipping a feature is not the same thing as creating value and does not mean that the functionality should necessarily remain in the product.
Sample cohort chart showing what percentage of users return to a given functionality.
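A cohort table like that can be computed from a raw event log. A minimal sketch assuming pandas and a tiny invented log of which users touched the feature on which day:

```python
import pandas as pd

# Hypothetical event log: which users used the new feature on which day.
events = pd.DataFrame({
    "user": ["a", "b", "c", "a", "b", "a"],
    "day":  [  0,   0,   0,   7,   7,  14],
})

# Cohort = the day of each user's first use of the feature.
events["cohort_day"] = events.groupby("user")["day"].transform("min")
events["week"] = (events["day"] - events["cohort_day"]) // 7

# Unique users per cohort and week, as a share of the cohort's week 0.
counts = (
    events.groupby(["cohort_day", "week"])["user"]
    .nunique()
    .unstack(fill_value=0)
)
retention = counts.div(counts[0], axis=0)
```

If the rightward decay in each row is steep, the initial curiosity effect described above is probably doing the heavy lifting in your test results.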
#10: I have drawn conclusions based on too few experiments
I have at times neglected to carry out follow-up A/B tests, for instance because a follow-up would take too long to implement or to collect data for. Compared to consistently performing follow-up tests, this significantly elevates the risk that randomness, content, time of testing, season, etc. will impact the results of the experiment.
Now I almost always perform follow-up A/B tests. If the original test and the follow-up test produce the same results, your analysis will be significantly more reliable than if you had only run one test.
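The replication check can be as simple as running the same significance test on both rounds and requiring agreement. A sketch using a standard two-proportion z-test and made-up conversion counts:

```python
from math import erf, sqrt

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rate
    (pooled two-proportion z-test, normal approximation)."""
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical original test and its follow-up: only when both point the
# same way, and both reach significance, do I treat the result as reliable.
p_original = z_test(200, 4000, 260, 4000)
p_followup = z_test(210, 4000, 255, 4000)
```

Two independent tests agreeing is far stronger evidence than one test alone, since seasonal content and plain luck rarely repeat themselves on cue.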
#11: I have communicated A/B test results in a way that anticipated future effects
I have repeatedly presented results (“Element A produced a 70% higher CTR than the original element”) in a way that led stakeholders, the project team and the client to expect similar results on release.
It is important to understand that the statistical significance only applies to element A having a higher CTR than the original. The measured size of the improvement is not reliable on its own, and content, season and other factors can also affect the level of improvement.
Nowadays I only indicate whether a variation performs better or worse than the original.
#12: I have neglected to forward A/B test data to my analytics tool
There have been times when I forgot or neglected to integrate my A/B testing tool with Google or Adobe Analytics. This leads to two problems:
- I have no data for key metrics and segments, as the A/B testing tool often does not track everything that my analytics tool is able to.
- I am left with only one source of data, which is a risk if test implementation was faulty.
Now I make sure to integrate my A/B testing tool (e.g. Optimizely) with Google or Adobe Analytics, and I follow up to make sure that the results look the same in both the A/B testing tool and the analytics tool.
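The cross-check itself can be automated crudely. A sketch with made-up per-variant session counts exported from the two tools:

```python
# Hypothetical per-variant session counts from the two data sources.
ab_tool   = {"control": 10120, "variant": 10034}
analytics = {"control":  9980, "variant": 10150}

def discrepancy(a, b):
    """Relative difference between the two sources, per variant."""
    return {k: abs(a[k] - b[k]) / max(a[k], b[k]) for k in a}

# A few percent of drift is normal (ad blockers, sampling, timing
# differences); a large gap suggests a broken integration or a
# faulty test implementation worth investigating before trusting either.
drift = discrepancy(ab_tool, analytics)
flagged = [k for k, d in drift.items() if d > 0.05]
```

The 5% threshold is an arbitrary placeholder; pick one that matches the known noise between your particular tools.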
#13: I have failed to segment my A/B tests
I have repeatedly dismissed hypotheses prematurely when reviewing the results of completed A/B tests, because on the surface the aggregate result showed no improvement whatsoever.
In point of fact, we can often identify major improvements within certain segments (first-time visitors, visitors from campaigns, mobile users, etc.) that are not visible at the aggregate level.
Beyond the ability to personalise experience, design and landing pages that segmenting affords, it is also very difficult to accomplish major positive changes for the generic visitor category when running A/B tests. Generic testing produces generic results, as someone put it.
In my opinion it's best to segment even before conducting the test. In other words, to set up different tests for each segment instead of segmenting the results. Users within a segment often have different needs. Try to understand and address them, and try to harness the optimisation potential. Don't test the same generic experience. You often get greater upside from context optimisation.
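Even when you only segment after the fact, the breakdown is a one-liner. A sketch assuming pandas, with a tiny invented per-visit result set:

```python
import pandas as pd

# Hypothetical per-visit results of one A/B test, with a segment label.
visits = pd.DataFrame({
    "variant":   ["A"] * 4 + ["B"] * 4,
    "segment":   ["new", "new", "returning", "returning"] * 2,
    "converted": [1, 0, 0, 0,   1, 1, 0, 0],
})

# The aggregate number hides where the gain actually comes from.
overall = visits.groupby("variant")["converted"].mean()

# Per-segment breakdown: in this toy data, all of B's gain sits
# entirely in the "new" visitor segment.
by_segment = (
    visits.groupby(["segment", "variant"])["converted"]
    .mean()
    .unstack()
)
```

In real data the effect is often the reverse of this toy example: the aggregate looks flat while one segment shows a large win, which is exactly what gets hypotheses dismissed prematurely.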
#14: I have underestimated the user's motivation
It is not uncommon for stakeholders and clients to attend conferences and read blogs that show amazing A/B testing results for tests run on small changes. Expectations then run high that large, lasting results will flow from the A/B test. I have repeatedly accepted assignments where the prospects for improving the impact were slight.
In many cases it has turned out that users are so motivated to solve their problems that you can get away with relatively lousy UX. Examples include booking a doctor's appointment online, registering a change of address, applying for a building permit, or buying a product available in only one place. Working on these processes doesn't have to be waste, but there is a risk that the improvements will be so minor that the optimisation investment is not cost-effective.
Now I try to begin by understanding the level of the user's motivation: whether the user can perform the process using different methods or a competitor's service, and what barriers would need to be cleared for those alternatives.
#15: I have spent analytics time on things that I cannot influence
More than once, I have accepted assignments where the customer wanted “to start doing data-driven work”. These customers had often made up their minds about what needed to be done, and it was hard to budge them from their convictions. Things like performing A/B testing or creating decorative dashboards and reports.
In fact, a lot of that work was waste. Corporate culture and politics meant that decisions were still not taken on the basis of data, only on the basis of the "right" data. Dashboards and reports ended up being used not for analysis and hypothesis generation, but for reporting purposes only. A/B tests were run on an insufficient volume of traffic. In this situation you end up performing experiments that are of limited value and insight potential.
Several of these frustrating problems can be solved with the right knowledge and coaching. But when an organisation's culture and politics throw up too big roadblocks, there's only one thing to do if you want to grow or accomplish anything: jump over to a new assignment or employer.
These days I take care to at least attempt to understand the organisation's culture and politics before deciding to accept an assignment. A good question to ask to rapidly gauge an organisation's culture is "When was the last time you removed a shipped feature from your product?". The answer to this often shows you how effectively the organisation monitors business outcome and acts on data.
#16 I have worked as a specialist outside the development team
There have been quite a few times when I worked as an analyst outside the development team, especially when I was working with multiple teams simultaneously. This usually does not end well, for several reasons.
- It makes it difficult to influence priorities in the backlog (which can impede my progress).
- All too often the result is an uphill credibility battle when you are presenting an analysis or proposing hypotheses for validation (the not invented here syndrome).
- I have missed product adjustments, bugs and insights that were only communicated within the team.
- The gap between the concept developer/UXer and me grew too wide.
Now I always try to make sure I am part of a development team. The analytics role produces maximum traction when the methods are integrated naturally into ordinary product development work. Into research, hypothesis validation, monitoring and prioritisation.
I make mistakes all the time. Which is not something I worry about as long as I learn and do not repeat them (too often). More worrisome would be if I didn't venture to try out new methods, tools or approaches to problem solving, comprehension and hypothesis generation.
Go for it. Getting there is fun.