Lean UX in Practice

Digital Strategy - SE
Valtech

October 21, 2016

“A backlog is just a list of untested hypotheses. The only acceptance criterion is business outcome.” Jeff Gothelf’s words sound smart enough. But how do things work out in practice?

I work as a consultant for SVT on the services Svt.se and Svtplay.se. These are websites with a lot of traffic and with clear and measurable targets. The culture at SVTi (which is in charge of developing digital services for SVT) encourages personal responsibility and decentralised decisions in the teams, and treats mistakes as part of the learning process. In other words, it has all the preconditions for prioritising learning over feature development. Check, check and check.

Creating a data-driven culture

Which is what it’s all about, after all. To learn what creates utility as fast as possible, rather than developing features and content using the spray and pray method. In our awesome team, we always try to demo learning, and learning alone. In other words: insights, data from experiments, and what we get out of our KPI monitoring.

Teams I worked on in the past used to demo what we had developed. A bunch of features and functions. It always felt good to show what we had been toiling on. But the focus was wrong. It established the wrong truths. That we had created value now that our features had shipped. We could create waste and get away with it. Not a good thing when you are working with valuable license money.

The effect of only demoing learning became clear after a while. Stakeholders and our team began to share common objectives. Editors wanted to run their own A/B tests and help achieve the objectives. The people commissioning work wanted estimates of the anticipated effects of various ideas before they “ordered”.

We’re not exactly big fans of dogmatic doctrines, processes or methods.

A model showing how we work with requirements

1. Requirement and initiative

The team receives a requirement. Ideally, we want to receive requirements formulated in an exploratory, Socratic style. Like: “Explore personalising recommendations in SVT Play”. Or: “What if we move the program listings from Svt.se to Svtplay.se?”

Requirements can also arise from initiatives within the team. Good ideas, in other words. Or they can result from the hacking we do during scheduled hacking days.

2. Research

How much should we research a requirement? Tough question, but the answer is simple. Enough so that you can formulate sufficiently accurate hypotheses. Is the cost to validate the hypothesis high? Research more. Do you have the ability to test hypotheses cheaply (recommended as a strategy)? Keep the research to a minimum.

We receive a large requirement. The first thing we do is calculate potential compared to the targets for the various services. Will the requirement help us reach our targets? We calculate and verify using traditional web analysis. We have metrics for almost everything, down to the last detail. But there is also valuable data we don’t have. For example, we can’t correlate total time watched with adjustments we make to the interface.

Examples of flow analysis for page types on Svtplay.se

We look at our metrics: history, seasonality and trends over time. Using monthly monitoring and all the A/B tests we perform (around 50 over the last year) we get a good sense of the magnitude of the idea’s potential. Take moving the program listings from Svt.se to Svtplay.se as an example. We have metrics on bounce rate, exits, conversions to Svtplay.se, etc. We have metrics on visitor volume over time, and are able to track the trend in searches for “program listings” over time using Google Trends.

In many cases we can help our stakeholder understand how the requirements will affect our overall targets as early as this stage. To understand the potential in the form of visitor volume and target achievement. Managing impact expectations at an early stage.
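The kind of back-of-the-envelope potential calculation described above can be sketched roughly like this. It is only an illustration: the function names, the parameters and every number in the example are hypothetical, not SVT figures or the team's actual model.

```python
def estimate_potential(monthly_visits: int, affected_share: float,
                       baseline_rate: float, expected_relative_lift: float) -> float:
    """Extra conversions per month if the idea delivers the expected lift.

    All inputs are assumptions you would take from your own web analytics.
    """
    affected_visits = monthly_visits * affected_share
    return affected_visits * baseline_rate * expected_relative_lift


def share_of_target(extra_conversions: float, monthly_target: float) -> float:
    """How much of the service's monthly target the idea could contribute."""
    return extra_conversions / monthly_target


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    extra = estimate_potential(
        monthly_visits=1_000_000,     # visits to the service per month
        affected_share=0.2,           # share of visits that see the change
        baseline_rate=0.05,           # current conversion rate on those visits
        expected_relative_lift=0.10,  # optimistic guess: +10 % relative
    )
    print(f"Extra conversions per month: {extra:.0f}")
    print(f"Share of monthly target: {share_of_target(extra, 50_000):.1%}")
```

Even a rough estimate like this is usually enough to show a stakeholder whether an idea can move the overall target at all, or only a small corner of it.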

3. Design studio

We conduct design studios if needed. This is done to capture perspectives and solution ideas from all stakeholders and team members. And to get the whole team to understand the big picture and the problem we want to solve: team alignment. Design studios can be prepared thoroughly or done ad hoc, depending on the complexity of the problem, the risk, and the cost of testing ideas.

There are a lot of good things about design studios. But sometimes they suffer from too much design by committee. There’s no definition or edge in the proposed solutions. Ill-defined concepts with no clear direction. But sometimes it turns out really well. The feedback culture in the group may affect the result. When it’s good it’s good.

4. Hypotheses

The output of the research and design studio work is hypotheses. Requirements have become hypotheses. From here on we’ll be talking about hypotheses all the time. Never about requirements. “Hypotheses” set the focus on an uncertainty that needs to be validated. “Requirements” set the focus on something that is already established as being good for the service and its users. And we don’t know that yet.

You can write hypotheses in all sorts of nerdy ways. We write them more or less however we want to. Sometimes as requirements, and sometimes in a more structured form with target groups. The most important thing is to make sure we include how the hypothesis will be validated. Then you’re good to go.
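If you do want a bit of structure, a hypothesis record could look something like the sketch below. The field names and the example wording are assumptions made for illustration; they are not the format of our requirements tool. The only point it enforces is the one above: a hypothesis is not ready until it says how it will be validated.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNTESTED = "untested"
    VALIDATED = "validated"
    NOT_VALIDATED = "not validated"


@dataclass
class Hypothesis:
    statement: str          # what we believe will happen
    validation_method: str  # how it will be tested, e.g. "A/B test"
    metric: str             # the measurement that decides the outcome
    overall_target: str     # service-level target checked alongside
    target_group: str = "all visitors"
    status: Status = Status.UNTESTED

    def is_ready(self) -> bool:
        """Ready to test only if it states how it will be validated."""
        return bool(self.validation_method.strip() and self.metric.strip())


if __name__ == "__main__":
    # Hypothetical example wording.
    h = Hypothesis(
        statement="A personalised recommendations row increases video starts",
        validation_method="A/B test on the homepage",
        metric="video starts per visit",
        overall_target="total video views",
    )
    print(h.is_ready(), h.status.value)
```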

5. MVP (Minimum Viable Product)

In the team, we discuss the absolute minimum version of what we need to build to test the hypothesis. Often you end up running one or more dumb and quick experiments in the form of A/B tests. For example, we hard-code a new recommendations row into Svtplay.se’s homepage at certain times of relevance to the test. Or we insert a semi-dynamic listings element on Svt.se’s homepage.

In other cases we develop a more complete service that we then run A/B tests on. Sometimes on a small share of visitors, sometimes on many. It depends on the volume of traffic and the risk that variations will perform so poorly that they will affect the target achievement of the entire service. We don’t want to risk reducing the number of video views by several thousand due to the validation of a hypothesis.

Example of a dumb A/B test in Optimizely that can be set up quickly

If we see potential after completing the A/B test, we almost always conduct a follow-up test. We want to reduce the risk of the results having been affected by content, timing or other factors. The same test again. Sometimes we see some potential and then adjust the proposed solutions and test again. Then we run a follow-up test again.
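To make the idea of a follow-up test concrete, here is a minimal sketch using a standard two-proportion z-test. That choice is an assumption made for illustration: it is not a description of how Optimizely calculates its results, and the function names and the 0.05 threshold are hypothetical. The sketch treats a result as replicated only when both the first test and the follow-up are significant and point in the same direction.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for variation B versus control A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance: conversion rate is 0 or 1 in both groups")
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def replicated(first, follow_up, alpha: float = 0.05) -> bool:
    """Both tests significant, and the effect has the same sign in both.

    Each argument is a tuple (conv_a, n_a, conv_b, n_b).
    """
    z1, p1 = two_proportion_z_test(*first)
    z2, p2 = two_proportion_z_test(*follow_up)
    return p1 < alpha and p2 < alpha and z1 * z2 > 0


if __name__ == "__main__":
    # Hypothetical visitor and conversion counts.
    first_test = (500, 10_000, 600, 10_000)
    follow_up_test = (480, 10_000, 590, 10_000)
    print(replicated(first_test, follow_up_test))
```

Running the follow-up at a different time, with different content on the page, is what makes the second test worth more than just collecting more data in the first one.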

We’re talking razor-thin adjustments here

Validated hypotheses then move on to the prioritisation process. Hypotheses that were not validated are discarded or rewritten. If the stakeholder wants to proceed with hypotheses that were not validated, we spell out what the consequences will be. How it will impact the service’s targets. The team makes it clear that from here on out we do not take any responsibility for how this hypothesis will impact target achievement.

(I cannot stress enough how important it is to always include the service’s overall targets in A/B tests. We often see positive results in A/B testing on the micro level, i.e. new content or features, while the overall target is affected negatively or not at all. It also frequently involves cannibalisation and the risk of suboptimisation.)
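One way to build this habit into how test results are read is to never judge the micro-level metric on its own. The sketch below is a simplified illustration with hypothetical names and verdict labels, not the team's actual decision rule. It assumes significance has already been worked out elsewhere for both metrics.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    relative_change: float  # e.g. 0.04 means +4 % versus control
    significant: bool


def verdict(micro: MetricResult, overall: MetricResult) -> str:
    """Judge a test on the overall target first, the micro metric second."""
    if overall.significant and overall.relative_change < 0:
        return "not validated: overall target hurt"
    if micro.significant and micro.relative_change > 0:
        if overall.significant and overall.relative_change > 0:
            return "validated"
        # The new feature gets used, but the service as a whole does not move.
        # Traffic may simply have been taken from something else on the page.
        return "inconclusive: micro win, overall target unchanged"
    return "not validated"


if __name__ == "__main__":
    # Hypothetical results for a new recommendations row.
    row_clicks = MetricResult("clicks on new row", 0.08, True)
    video_views = MetricResult("total video views", 0.00, False)
    print(verdict(row_clicks, video_views))
```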

6. Development

Development and shipping as per usual.

7. Follow-up over time

You might think that thanks to analysis and validation of hypotheses, our accuracy is such that we are certain of having created value. But no. We need to follow up over time. We might have missed something. Seasonality, content or something else may have affected the results. Something we didn’t think of.

We write down how to follow up on the hypothesis in our requirements tool. So that everyone is able to do it. Measuring points and how to navigate to the report.

KPI monitoring as a safety net

This process is somewhat simplified. But not by much. We work using qualitative methods when needed. We do so in the research phase, if the level of complexity is high. If the A/B test returns wacky data, we may want to understand why. But we never use qualitative methods, such as a usability test, to validate hypotheses. That would be unprofessional.

Of course sometimes we do not bother to validate hypotheses. We just forge ahead. We do that for minor requirements and in cases where we have plenty of knowledge about how the service will be affected by the requirement.

We have a gigantic (I’m not joking) follow-up sheet where we follow up on both KPIs and detailed metrics every month. That is our safety net to keep us from letting through requirements that do not create value over the long term. The follow-up document puts us in a position to be able to make tactical changes and reprioritise.

Image by Johan Larsson under CC BY 2.0