Normative databases can fulfill a key role in assessing package design. After all, it’s difficult to accurately evaluate your brand’s design in a vacuum, and norms provide context to help gauge its effectiveness relative to the category or industry. In fact, action standards are sometimes established based on database comparisons precisely for this reason.
For many brands, an important question still remains: which normative database is most valuable in assessing design?
How should I assess the value of a normative database?
Some assume that a database’s value directly corresponds with its size, when in fact size is a secondary consideration. The most important elements to consider in a normative database are data quality, composition, diversity, and predictiveness, and the following key questions can guide your evaluation:
Is the database rooted in retail reality?
This may seem tongue-in-cheek, but many normative databases operate in an alternate reality. If a database includes pre-market designs that never made it to market (or were modified before launch), for example, any comparisons will be inherently flawed. After all, a “first draft” design scrapped shortly after it was tested is not the same as a final design that made it to the store shelf, and yet those are regularly compared in standard normative databases.
Designalytics’ database only includes designs that were actually launched to market, because these are the only ones worth measuring against.
Does the database offer transparency?
Many pre-market databases are “black boxes”—each study was commissioned by a client under an NDA, which means your brand has no visibility into the database’s composition, diversity, or representativeness.
Conversely, Designalytics’ normative database is fully transparent, because we conduct syndicated testing independent of any specific paid work we do with clients. Since no one has commissioned these tests, we can share the insights freely. When clients do pay us to conduct pre-market testing, those results remain entirely confidential and never enter the database.
Is there consistency among the stimuli used in the database?
We’ve all heard the term “apples to apples” as an allusion to the importance of consistency when comparing things. With most normative databases, it’s more like comparing apples to… the entire produce department at a grocery store.
The stimuli presented to consumers (such as a shelf planogram provided by each client) are wildly inconsistent—each with a varying number of products, as well as disparate brand-block facings, positions, adjacencies, and competitive-set compositions—making comparisons problematic and often misleading. For example, a pragmatic brand manager might test a realistic planogram in which their brand has only a couple of facings on a lower shelf; if the database is built from more idealized arrangements, the results will show poor performance that has nothing to do with the design’s inherent effectiveness in getting attention.
Designalytics removes the stimulus variability, thus creating a more consistent normative database where brands can count on an “apples to apples” comparison.
Is there diversity among the test results in the database?
Most normative databases built on commissioned studies are inherently imbalanced. If a brand does frequent testing with a given design measurement vendor, for example, the database will include a surplus of test results from that brand; relatedly, the brands that do not work with that vendor represent a glaring blind spot, since none of their pre-market designs will be in the database.
Designalytics’ syndicated model ensures comprehensive coverage of the design landscape. We evaluate 12 brands (the leading brands as well as top challengers) in each of 150+ CPG categories, offering in-depth data and analysis based on 20+ performance factors. This ensures diversity in our database and makes it ideal for reliable benchmarking.
To provide some context, it’s helpful to see how an actual redesign might look in two different normative databases.
One redesign, two very different results
Recently, a legacy design measurement provider used by a majority of CPG companies tested the redesign for Jet Puffed, the popular marshmallow brand. The redesign performed very well in this provider’s testing—so much so that it named Jet Puffed as one of the “most effective designs of the year” in its annual awards.
Designalytics’ testing of the same redesign yielded a diametrically opposed result—there was a nearly 30-point difference in committed purchase preference¹ favoring the old design over the updated version. This metric is our most predictive, with a 90% correlation to business outcomes, so from day one we had a very high degree of confidence that the brand’s sales would decline.
Given the chasm between our results and the other provider’s, though, we were curious. So we looked into the brand’s sales during the six months after the redesign launched and compared them with the same period in the prior year. The results? The new design prompted a share drop of nearly 4 points, equating to annualized sales losses of over $23 million².
The launch of the new design took place during the pandemic, so we wondered whether that may have been a factor in the decrease. As it turns out, though, sales for Jet Puffed were down during this period in spite of the fact that category sales were up nearly 5%. Baking and snacking were on the rise during those fraught months at the start of Covid—so all things being equal, one would expect the brand's sales to climb along with the rest of the category. That’s not what happened, and the new design was clearly a driving force behind this slump.
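For readers who want to replicate this kind of before-and-after check on their own brands (as footnote 2 suggests), here is a minimal sketch of the arithmetic. Every figure in it is invented purely for illustration—it is not Jet Puffed’s actual data—and the function name and structure are our own assumptions rather than a prescribed method.

```python
# Hypothetical sketch: a pre/post-redesign sales check using scanner data.
# All figures below are made up for illustration; substitute your own scanner data.

def year_over_year_check(
    brand_sales_before: float,     # brand $ sales, matching six-month window in the prior year
    brand_sales_after: float,      # brand $ sales, six months after the redesign launched
    category_sales_before: float,  # category $ sales, same prior-year window
    category_sales_after: float,   # category $ sales, six months post-launch
):
    """Compare brand share and growth against the category for matching six-month windows."""
    share_before = brand_sales_before / category_sales_before
    share_after = brand_sales_after / category_sales_after
    share_change_pts = (share_after - share_before) * 100

    brand_growth = brand_sales_after / brand_sales_before - 1
    category_growth = category_sales_after / category_sales_before - 1

    # Annualize by assuming the observed six-month share change persists for another six months.
    six_month_dollar_impact = (share_after - share_before) * category_sales_after
    annualized_dollar_impact = six_month_dollar_impact * 2

    return {
        "share_change_pts": round(share_change_pts, 2),
        "brand_growth_pct": round(brand_growth * 100, 1),
        "category_growth_pct": round(category_growth * 100, 1),
        "annualized_dollar_impact": round(annualized_dollar_impact),
    }

# Illustrative-only inputs (hypothetical numbers):
print(year_over_year_check(
    brand_sales_before=120_000_000,
    brand_sales_after=112_000_000,
    category_sales_before=600_000_000,
    category_sales_after=630_000_000,
))
```

In this made-up example, the brand loses roughly two share points while the category grows about 5%—the same pattern described above, where a brand declines even as its category rises.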
Think about it: This design won an award as one of the top performers in a research firm’s database. If you were comparing your brand’s design to this one, you’d draw a completely misguided (and potentially brand-busting) conclusion.
So how did our system correctly predict this outcome while another completely missed the mark? That brings us to the most important question of all when it comes to your normative database:
Can the results in a normative database be tied to sales performance?
As we noted above, it is vital to have clean, reliable, and useful data when comparing the effectiveness of designs, and that’s why Designalytics has gone to such lengths to ensure our database’s consistency, diversity, and transparency.
Yet in-market results are the ultimate barometer of success in package design. If pre-market design test results are not tied to sales performance in any way, then brands can’t realistically expect to predict business outcomes.
By testing thousands of in-market designs and redesigns and cross-referencing test performance against actual in-market outcomes (via point-of-sale scanner data), Designalytics has been able to calibrate our exercise designs, analytics, and success thresholds to align with business results.
In short, we’ve built a system that can actually predict, with greater than 90% accuracy, whether or not a given redesign will drive brand growth. We’ve also drilled down to the metric level—by conducting ongoing meta-analysis of our redesign database, we’ve correlated changes in specific metrics (when comparing old and new designs) with year-over-year sales trends following the design changes.
Why does this matter? Well, imagine having a design that ranks in the top quintile of a normative database for standout performance but only slightly above average on choice-driver communication. Is this performance indicative of growth potential? Only by understanding the correlations between performance on individual metrics and in-market business results can you confidently predict business outcomes.
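To make the idea of that meta-analysis concrete, here is a hypothetical sketch that correlates the change in a single design metric (new design minus old) with the year-over-year sales trend observed after each launch. The records and values are fabricated for illustration only; a real analysis would draw on a large database of redesigns and multiple metrics.

```python
# Illustrative sketch only: correlating pre-market metric changes with post-launch sales trends.
# The redesign records below are fabricated; they stand in for a large redesign database.
import statistics

redesigns = [
    # (change in a design metric, new vs. old, in points; year-over-year sales change after launch, %)
    (+12.0, +6.5),
    (+4.0, +1.2),
    (-8.0, -3.9),
    (+20.0, +9.1),
    (-15.0, -6.8),
    (+2.0, -0.5),
]

metric_deltas = [m for m, _ in redesigns]
sales_trends = [s for _, s in redesigns]

# Pearson correlation between metric change and subsequent sales trend (Python 3.10+).
r = statistics.correlation(metric_deltas, sales_trends)
print(f"Correlation between metric change and post-launch sales trend: r = {r:.2f}")
```

A strong correlation for one metric and a weak one for another is exactly what lets an analyst weight “standout” differently from “choice-driver communication” when judging growth potential.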
A better normative database is built on better data and predictive capability
Many design research providers have built databases that, among the other shortcomings outlined above, contain design assessments completely disconnected from in-market outcomes. Some even seem to deny that design measurement can predict sales outcomes—which is surprising, because we’re proving that it can, day in and day out.
The best part? You don’t have to take our word for it, nor do you have to wait two years for a pilot to run from project kickoff to sales tracking. We’ll show you why our normative database is better, with specific examples like the one above. Then we can help you audit past redesigns, comparing their original validation results and business outcomes with our system’s assessment. When it comes to leveraging design for growth, don’t just trust; verify.
Get in touch to see the difference a better normative database can make.
¹ Committed purchase preference is a proprietary measure of a design’s purchase-conversion potential, based on purchase choices and degree of interest relative to a favorite brand.
² The annualized impact extrapolates the observed six-month share change to a full year, assuming it would continue for another six months. Though other factors can certainly contribute to the outcome, declines in design performance negatively impact the effectiveness of almost all other factors involved. The same analysis can and should be replicated using independently sourced scanner data for your own redesign initiatives.