The data is in and your review score still matters - EEDAR

"But the 80s might be the new 90s," says Patrick Walker, EEDAR's Head of Insights and Analytics

The past several years have seen significant change in the games industry; everything from who plays games to how and where those games are paid for and consumed has been shifting and diversifying. Just perusing the topics that have dominated recent headlines - games as a service, free-to-play, Steam sales, casual mobile gamers, eSports, virtual reality - indicates how rapidly the industry is evolving.

This dynamic pace of change has led many in the industry to question whether the average of review scores, surfaced by sites such as Metacritic, is still a critical metric. Recently, the COO of Double Fine, Justin Bailey, made headlines by stating that Metacritic "doesn't really matter, as far as sales of the game."

The importance of average review scores (i.e. Metacritic scores) as a measure of game quality has always been a subject of debate. Common criticisms of the metric argue that game quality can't be summarized in a single score and that the metric has been misused by publishers as a measuring stick for incentives and business decisions.

Historically, data has shown a strong relationship between console sales and average review score, especially early in a console cycle. While the causality of this relationship has been debated (do people actually go to Metacritic, or are they just reading reviews on bigger sites like IGN and GameSpot? Is the Metacritic score actually measuring a game's quality, or some other aspect, like development budget or marketing?), it is generally agreed that a Metacritic score can provide insight into how well a game did financially. However, this is changing at the beginning of the 8th console generation.

"While the correlation between review scores and sales is still 'strong', the 8th generation has seen a decline in the strength of the relationship compared to the early 7th generation"

Several high-profile new IP releases, such as Destiny and Titanfall, have had massive financial success despite critical reception and average review scores that fell below industry expectations. A common theory is that there may be a growing disconnect between the features that actually drive sales and the features critics focus on when reviewing a game (e.g. the lack of endgame content in Destiny at launch, or the lack of a single-player campaign in Titanfall).

To see whether the data supports this growing skepticism, EEDAR analysed whether the relationship between sales and average review score is as strong at the beginning of the 8th generation as it was at the beginning of the 7th. The analysis included the titles for which that relationship has historically been strongest: HD console titles (PS3, PS4, Xbox 360, Xbox One) in the core genres, released through September 2014.

EEDAR compared the correlation between review score and sales success in the early 7th generation, the mid/late 7th generation, and the early 8th generation. Sales success was measured as normalized North American six-month physical and digital unit sales.

The data suggests there is some validity to claims that review score is less critical to sales. The correlation between two sets of numbers is measured by Pearson's r, where a correlation above 0.7 is considered very strong and a correlation between 0.4 and 0.6 moderately strong. At the beginning of the 7th generation, as consumers were encountering new kinds of games, the correlation between reviews and sales performance was especially high for new IPs; consumers trusted media sources to provide guidance on which games to buy. This relationship weakened as the generation continued, however, as consumers became more knowledgeable about the brands and experiences they enjoyed and relied less on critical reception.
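The article does not publish its calculation, but the Pearson's r it cites is straightforward to compute. A minimal sketch follows; the `scores` and `sales` pairs are illustrative made-up numbers, not EEDAR's data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical review scores and normalized sales for six titles
scores = [62, 70, 75, 82, 88, 93]
sales = [1.1, 1.8, 2.2, 3.0, 3.9, 4.6]
print(pearson_r(scores, sales))
```

On data like this, where sales rise steadily with score, r lands well above the 0.7 "very strong" threshold mentioned above.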

Interestingly, while the correlation between review scores and sales is still "strong", the 8th generation has seen a decline in the strength of the relationship compared to the early 7th generation, despite new business models and experiences. EEDAR's theory behind this weakening relationship is that the rise of smartphones and social media has created an environment in which consumers get their information from a broader range of sources.

A deeper investigation into sales success at different review score ranges reveals the score ranges and game types for which review score might not be as critical to success as it was in the past. The graphs below show sales success normalized on a scale of 1 to 5. The first graph shows sales success for HD console titles in core genres at different average review score ranges for New IPs, comparing the early 7th generation to the early 8th generation.
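EEDAR does not spell out how sales success is mapped onto the 1-to-5 scale. A simple min-max rescaling is one plausible reading; the function and the sample figures below are illustrative assumptions, not EEDAR's method:

```python
def rescale_1_to_5(values):
    """Min-max rescale raw unit sales onto a 1-to-5 sales-success scale.

    Assumes at least two distinct values: the best seller maps to 5,
    the worst to 1, and everything else falls linearly in between.
    """
    lo, hi = min(values), max(values)
    return [1 + 4 * (v - lo) / (hi - lo) for v in values]

# Illustrative six-month unit sales (millions) for five hypothetical titles
print(rescale_1_to_5([0.2, 0.5, 1.0, 2.5, 4.1]))
```

One caveat of min-max scaling, which commenter James Berg raises below, is that a single breakout hit compresses every other title toward the bottom of the scale, so very high sales look "less impactful".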

The data reveals two important trends. First, new IPs in the 8th generation that received moderately high review scores (70s to mid-80s) have outperformed titles in those score ranges in the early 7th generation. This is in line with the success of Destiny, which has an average review score in the mid-70s, and Titanfall, with an average review score in the mid-80s. Second, there has yet to be a new IP in the 8th generation with an average review score in the high 80s or 90s (87+); EEDAR considers The Last of Us Remastered an existing IP on PS4. There is still a significant sales performance penalty for titles with review scores below 70, and the penalty is stronger than it was in the early 7th generation.

The second graph shows the same data for Existing IPs - sales success for HD console titles in core genres at different average review score ranges.

The data on Existing IPs reveals a similar trend. While there is clearly a penalty for an Existing IP that scores in the 60s or even the 70s, the relationship between achieving a breakout review score in the high 80s or 90s and breakout success is weaker. For 8th-generation Existing IPs, the sales performance score is approximately the same for games with an average review score between 80 and 86 and games scoring 87 or higher.

The data for both Existing and New IPs suggests a common theme - 80s are the new 90s. While achieving a breakout review score does not appear to be as critical as it used to be, it is still important to release an HD console game that reaches a certain threshold of critically-determined quality.

The broader point is that the video game industry is changing rapidly, and understanding game performance requires leveraging and combining a broader array of data sources. EEDAR believes the average review score is still important for predicting the sales success of a title on HD consoles, and we leverage it heavily in our forecasting models and in expert-driven services that evaluate the quality of an unreleased game within the average review score framework. However, we also believe that drawing on more sources of data is more important than ever, so our models and services now heavily leverage new data sources, such as social media data, in combination with more traditional approaches.

Note on methodology: EEDAR calculates Physical and Digital sales through a combination of sources, including point of sale data from a partnership with the NPD Group and EEDAR internal models for worldwide digital and physical sales based on data from the proprietary EEDAR database.


Latest comments (10)

Dan Tubb Investment manager, Edge 7 years ago
‘Recently, the COO of Double Fine games, Justin Bailey, made headlines by stating that Metacritic "doesn't really matter, as far as sales of the game."’

I was rather baffled by that story, as for me personally it runs completely counter to how I select games. For me the real Metacritic score, and by that I mean the user score, is always the primary method of identifying good games. If a game is an indie or in a genre I like and it has a good Metacritic user score, then it is usually an instant buy for me; then I just hope I find the time to actually play it.

Some people think that user scores are bad because they are subject to review bombs, but that generally happens when they implement some FU-DRM or some such. And the pro journalists just don't have gamers' trust anymore.
James Berg Games User Researcher 7 years ago
There's a total of 4 new IPs on console that broke 87 in 2006, and possibly up to 7 in 2007 (not familiar with some of the games). I believe it's 8 in the 87+ this year, without spending a ton of time double-checking, and again, just counting console.

The margins for error on sample sizes that small are going to be pretty darn big. Can we get a look at more detailed data? I'm somewhat dubious about the claims here, especially given they're normalizing sales data (which counts high sales as being less impactful).
Jordan Lund Columnist 7 years ago
"For me the real metacritic score, and by that I mean the user score, is always the primary method of identifying good games."

o_O - For me the user score is so driven by fanboy love and rage it's a completely useless figure.
Patrick Walker VP, Insights, EEDAR 7 years ago
James, the low sample size is definitely something to take into account as there have not been that many high scoring new IPs thus far in the 8th generation. The correlations in the first graph have a higher sample size and are statistically significant. As you point out, the individual review score bands in the column graphs have a higher margin of error, especially for the high review scores. I still think there is value in investigating the potential trends in the data that may become more firmly established as the generation continues.
James Berg Games User Researcher 7 years ago
Yeah, the first graph is awesome, and really interesting. The others are indicative of a possible pattern, but without knowing the sample sizes or MoE, those results could be entirely stats being random, as is their wont ;) Still, cool direction for research.
Adam Jordan Community Management/Moderation 7 years ago
I think the best example anyone can use here is CoD vs BF.

When BF3 was against CoD: MW 56 1/2 (light-hearted joke :3) - Metacritic user scores were a mess, and the CoD score was dragged down because the reviews consisted of the lowest score possible and a tag line of "Doesn't include Dinosaurs". The other problem with user review scores (whether on Metacritic, Steam or elsewhere) is that if a game doesn't work perfectly the instant it is released, it gets hit with a low score. I can understand the frustration, but as someone mentioned, user reviews are subject to fanboyism and mob mentality. Sometimes it is just driven by technical issues. I have seen so many games get smashed by "Game doesn't work on my super1337gamingPCthatisfrom1667" reviews that it isn't even funny any more.

Metacritic pro review scores are murky at best. I particularly don't follow pro reviews purely because of an experience I had back in 2000 (give or take a few years). It was a game called Skies of Arcadia for the Dreamcast. Every gaming magazine was smashing it to pieces, giving it a 30% score, because it was the era of Final Fantasy 7 and 8.

I decided to pick up the game because I was able to get it in a buy one, get one half price sale and you know what? I couldn't put the game down. I loved SoA every second I played it and it took me 3 months to complete but it didn't end there. I continued to play it even though the story and solutions to the puzzles were clear in my mind. I then completed it within 1 month and still picked it up...between casual playing of Phantasy Star Online.

2002 came and the end of the Dreamcast; I sold mine to get an Xbox the next year, but in 2004 I bought one again from my friend. Bought Skies of Arcadia from eBay and off I went again. In 2009 I went to Germany and took my Dreamcast with me, and SoA kept my sanity in check. In 2014 my Dreamcast and SoA are still going strong... I even started a new game this year.

My point is that while that was one game that got hit with a low score, not every game that hits the 80s and 90s is fantastic (granted, some of them do earn their score... looking at you, Shadow of Mordor), but I rarely use Metacritic or pro reviews any more... haven't used them since that fateful day 14 years ago.

In fact I am quite happy to take word of mouth over any review. My friends are the review boards both pro and user. Sure, they can still tip the scale if I am still unsure about a game but if the price is right, I will shrug it off and grab it.

Such as Lichdom: Battlemage. When it was in early access and came out of it, it was getting mixed reviews and even negative "stay away" reviews on Steam. I did stay away, keeping the game on my wishlist and keeping an eye on updates. Then it got a heavy discount. £8.99 later, I grabbed it and started playing and enjoying. The metacritic score for it is 69. Not a low score but in terms of "super amazing GOTY" games, it's not a great score either.

But yeah, overall, Metacritic will always have a place in the world and in purchase decisions; I just think people will start to use it more as a final tie-breaker rather than the sole basis for a decision.
Robin Clarke Producer, AppyNation Ltd7 years ago
Yeah, user review scores are useless for ascertaining anything other than whether a product is literally defective. Anyone using them to guide buying decisions would end up with an incredibly unadventurous library.

I think EEDAR's analysis may be putting the cart before the horse. Reviewers have traditionally been reticent to stick their necks out and rate anything that isn't being pushed as an 'event' release by publishers in the 90%+ bracket. These tend to be games with the strongest commercial appeal, and it's rarer now (although it still sometimes does happen) for publishers to try to push a Driv3r-style clunker as a tentpole release.
Metacritic rating at best reports on a game's quality, it doesn't inject it with quality. If a Metacritic rating broadly tallies with sales it's because it broadly tallies with how good the game is - and how good it is remains the prime influencer of success. Lots of high reviews scores does influence a game's sales but it's not the deciding factor. And even if it were the decider, what dictates those high review scores? We're back to quality again. The players still ultimately decide everything, we're best off building our games for them on not worrying about, or basing bonuses on, Metacritic.
Joonas Laakso Production Lead, Next Games7 years ago
What Barry said above. I don't believe that users at large are looking at review scores when making their purchase decisions. Hardcore gamers who frequent gaming sites (anybody who's aware of this site's existence), sure, but that's a tiny, tiny piece of the mainstream buying audience.

There may very well be correlation between Metacritic and sales (after the fact), but sales surely are not caused by the Metacritic score. I don't understand why we even have this discussion without looking closely at the actual purchase habits of consumers. The reason local retail is still so big is that so many game sales still happen at the physical store, browsing covers (not online). If people made rational purchase decisions, like they think they do, they would all buy the cheapest online option, based on reviews. That's just not happening.
Sean Kauppinen Founder & CEO, IDEA7 years ago
If MetaCritic and EEDAR want to stay relevant, they need to start tracking sentiment within YouTube reviews and Let's Play videos, plus Twitch streams. That's where the core gets its game info and interest in this console generation. That's where data would be relevant.
