If you’re working on CRO, you likely have an ever-growing list of A/B tests in your pipeline. Some of these ideas will be good, backed by data or careful analysis. Others will be mediocre, and some you simply don’t know how to evaluate.
For most brands, it’s not possible to test everything due to limited internal bandwidth and a limited amount of traffic. It’s therefore crucial to have a framework that determines how your team will prioritise experiments, so that you test the highest-potential ideas first and find the fastest route to value.
There are a lot of models for prioritisation, most notably the PIE and ICE frameworks. Whilst we’ve found them helpful, we also believe both are lacking in one way or another.
The PIE framework might be the most widely known in the conversion optimisation space. It includes three variables:
Potential – How much improvement can be made on the pages?
Importance – How valuable is the traffic to the pages? (amount of traffic, etc.)
Ease – How complicated will the test be to implement on the page or template?
The problem with the PIE framework is that the criteria for each variable leave far too much open to interpretation. How do you objectively determine the potential of a test idea? If we knew in advance how much potential an idea had, we wouldn’t need prioritisation models. And if you’re part of a larger team and want to push your idea through, why not tack a couple of points onto Potential, since it’s a subjective number anyway? In an ideal world, frameworks would remove subjectivity. In addition, it’s hard to rate Ease, and Importance, objectively.
The ICE Score is the default prioritisation framework and was invented/popularised by GrowthHackers’ founder Sean Ellis.
Impact – What will the impact be if this works?
Confidence – How confident am I that this will work?
Ease – What is the ease of implementation?
It has a similar problem to the PIE framework: if we could guess what the impact would be, why would we test it? It also asks, “How confident am I in this idea?” Again, how could we know this in advance? As objective and “experience-based” as you’d like this rating to be, it’s almost impossible to make it consistent and objective. It’s also easy to skew if you really want to pursue an idea. Even if we try to score test ideas as accurately as we can, two of the three scores come down to gut feeling.
There’s another ICE framework, perhaps a little lesser known to the optimisation community. This one is also an acronym, which stands for:
Impact – could be measured in sales growth, cost savings, etc. Anything that benefits the company.
Cost – straightforward, how much does this idea cost to implement?
Effort – how many resources are available and how much time is required for this idea?
This ICE framework is more specific on the criteria by which you rate tests. It also makes the scale smaller: you can only score things 1 or 2, depending on whether you believe the opportunity is “high” or “low.” You then add up all the numbers and you have an aggregate score. You make decisions off that number.
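As a minimal sketch (the function and idea names here are illustrative, not part of the framework itself), this Impact/Cost/Effort scoring might be tallied like so:

```python
# Score each idea on Impact, Cost, and Effort with a 1 ("low" opportunity)
# or a 2 ("high" opportunity). Note that for Cost and Effort a "high"
# opportunity means the idea is cheap or easy, so a 2 there is good.
def ice_score(impact: int, cost: int, effort: int) -> int:
    """Sum the three binary ratings; each must be 1 or 2."""
    for rating in (impact, cost, effort):
        if rating not in (1, 2):
            raise ValueError("each rating must be 1 or 2")
    return impact + cost + effort

# Hypothetical ideas, rated by the team
ideas = {
    "simplify checkout": ice_score(2, 2, 1),  # high impact, cheap, effortful
    "new hero image": ice_score(1, 1, 2),     # low impact, costly, easy
}

# Rank ideas by aggregate score, highest first
ranking = sorted(ideas, key=ideas.get, reverse=True)
```

The coarse 1-or-2 scale is the point: it forces a high/low judgment instead of inviting false precision.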
With a binary scale like this, you avoid the error of central tendency. The smaller response scale tends to make things more accurate, too. As Jared Spool said, “anytime you’re enlarging the scale to see higher-resolution data it’s probably a flag that the data means nothing.”
This one is better, but it’s still not perfect – the potential impact is still quite subjective. And you could have many ideas that all score a 3 or 4. How do you then prioritise among those?
With the problems of other prioritisation frameworks in mind, the PXL framework was created by CXL.
This framework brings these 3 benefits:
It makes any “potential” or “impact” rating more objective
It helps to foster a data-informed culture
It makes “ease of implementation” rating more objective
A good test idea is one that can impact user behaviour. So instead of guessing what the impact might be, this framework asks you a set of questions about it.
Is the change above the fold? → Changes above the fold are noticed by more people, thus increasing the likelihood of the test having an impact
Is the change noticeable in under 5 seconds? → Show a group of people the control and then the variation(s): can they tell the difference after seeing each for 5 seconds? If not, the change is likely to have less impact
Does it add or remove anything? → Bigger changes like removing distractions or adding key information tend to have more impact
Does the test run on high traffic pages? → Relative improvement on a high traffic page results in more absolute dollars.
If all we do is discuss opinions about what to test, prioritisation becomes meaningless.
The PXL model asks everyone to bring data:
Is it addressing an issue discovered via user testing?
Is it addressing an issue discovered via qualitative feedback (surveys, polls, interviews)?
Is the hypothesis supported by mouse tracking heat maps or eye tracking?
Is it addressing insights found via digital analytics?
Asking everyone these 4 questions in weekly test discussions will quickly stop people from relying on opinions alone.
“Data is the antidote to delusion” – Alistair Croll & Benjamin Yoskovitz (authors of Lean Analytics)
We also put bounds on ease of implementation by bracketing answers according to the estimated build time. Ideally, a test developer takes part in the prioritisation discussions.
Even though developers tend to underestimate how long things will take, forcing a decision based on time makes the rating more objective – less of a shot in the dark.
We built this on a binary scale – you have to choose one answer or the other. So for most variables (unless otherwise noted), you choose either a 0 or a 1.
But we also wanted to weight certain variables because of their importance: how noticeable the change is, whether something is added or removed, and ease of implementation. For these, the scoring differs; on the Noticeability of the Change variable, for instance, you mark either a 2 or a 0.
All organisations operate differently, so it’s naive to think that the same prioritisation model is equally effective for everyone.
CXL built this model with the belief that you can and should customise the variables based on what matters to your business.
For example, maybe you’re operating in tandem with a branding or user experience team, and it’s very important that the hypothesis conforms to brand guidelines. Add it as a variable.
Maybe you’re at a startup whose acquisition engine is fuelled primarily by SEO, and your funding depends on that stream of customers. You could then add a category like “doesn’t interfere with SEO,” which might affect the scoring of some headline or copy tests.
The point is, all organisations operate under different assumptions; by customising the template, you can account for them and optimise your optimisation program.
If you have lots of test ideas, you need a way to prioritise them. How you do so matters, both for the quality of your testing and optimisation and for organisational efficiency.
The PXL model was made to eliminate as much subjectivity as possible while maintaining customisability.