Reviews, scores, sales and playtime - a Steam perspective

Will Metacritic affect your PC game's success?

Recently EEDAR's Head of Insights, Patrick Walker, released an update in the ongoing debate on whether review scores, notably on Metacritic.com, have any influence on sales. The question is pertinent because, as Patrick notes: "common criticisms of the metric argue that game quality can't be summarized in a single score and that the metric has been misused by publishers as a measuring stick for incentives and business decisions." The debate resurfaces from time to time - it was touched upon on GamesBeat back in 2009, and more recently by Justin Bailey from Double Fine, Warren Spector and on Ars Technica.

As part of an ongoing project exploring player behavior via the Steam platform (see e.g. here and here), we decided to run a quick correlation analysis across games on Steam, focusing on sales (game ownership) and two different aggregate review/ranking scores. In addition, we wanted to explore whether review scores correlated with the actual amount of time people spend playing games on Steam.

Looking at playtime adds an interesting perspective to the debate about the potential relationship between review scores and sales: it is one thing to investigate sales and how these correlate with review scores - but do people actually play the games they buy? If our customers do not care to play the games we produce, there is at least a chance they will not be repeat customers, which is about as desirable as them not buying our games in the first place.


The dataset used here consists of records from just over 3000 Steam titles (only full games were included, i.e. no demos) and over 6 million players (about 3.5 per cent of all Steam profiles, or about 8 per cent of active accounts), covering just over 5 billion hours of playtime. The data are from Spring 2014, and thus do not reflect developments in the Fall.

There are some assumptions involved in using data from Steam - for example, some uncertainty about how Valve tracks playtime. For a breakdown, see here (scroll down to "limitations and caveats"). It should be noted that any analysis on the topic of game sales - as the debate on the topic clearly shows - needs to make some assumptions and estimates when it comes to sales figures. This adds uncertainty, but is hard to avoid given the confidentiality of sales figures.

The good thing about game ownership data from Steam is that we do not have to make sales estimates based on collecting information from a variety of sources, with all the potential sources of error and bias that this risks imposing on the data. Working on a platform like Steam means that the sales data (ownership) are readily available, and we can account for free games, demos, etc. Note that how a game got into a library - for example, whether the user in question bought it or received it as a gift from a friend - is not immediately obvious. On the negative side, we only have data from Steam sales, not e.g. mobile platforms.

It is important to note that we are not specifically looking at correlations for high-ranking, low-ranking etc. games, but general correlations across the entire scale of reviews. We will publish some results on low vs. high-ranking games later.

Finally, it should be mentioned that the work presented here, and all previous analyses on the topic of review scores and sales, including EEDAR's, is correlational in nature. This means that no causal relationships can be identified, only speculated about.

Causal relationships define why sets of variables covary, i.e. change values according to a specific relationship. The only way to establish them is via experimental research, which is tricky to apply here, as a scientific approach in this case would demand control of confounding factors, i.e. factors that could influence game sales without being related to Metacritic scores/game reviews (Christmas sales spikes, to take a currently pertinent example!).

What this means in practice is that even if it were shown that Metacritic scores correlated with the number of units sold, revenue, playtime or a similar metric we are interested in, we cannot, from a correlational analysis, tell whether it is the Metacritic scores that caused the metric to behave as it does. It may be that Metacritic scores have no systematic relationship with sales at all, or conversely that sales impact Metacritic scores in a systematic fashion. This issue is mentioned too rarely in these types of analyses, including those published earlier on the topic of review scores and sales.

There are, unfortunately, many examples of correlational analysis being misinterpreted as causal experiments, with disastrous or, in some cases, hilarious results.

Rather than just correlating with scores from Metacritic.com, we wanted to see if we could obtain additional review-type measures for games. For this, we turned to Gaugepowered, a platform used by Steam players to rate games, follow game sales and observe basic game statistics such as median playtime, number of active players and game community value. It provides a means of obtaining a review score (ranking) that is more directly influenced by the players than Metacritic's aggregate review scores are.

We harvested review scores from Metacritic.com for 1426 games in the dataset, and player ranking scores from SteamGauge.com for 1213 games. We then ran a simple Pearson correlation analysis with scores from the two sites against: a) game ownership on Steam (~sales) and b) aggregate playtime (i.e. how much time the games had actually been played).

For game ownership there is a statistically significant correlation at r=0.22 for Metacritic and r=0.25 for SteamGauge (r being the correlation coefficient), but neither of these explains much of the variance in the dataset: an r of 0.22 corresponds to r² ≈ 0.05, i.e. only about 5 per cent of the variance in ownership is accounted for. Just because a relationship between variables is statistically significant at some defined level of probability does not mean the relationship explains much of the variance, especially at sample sizes like the ones used here.
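As a quick sketch of the kind of analysis described above, Pearson's r and the variance explained (r²) can be computed as follows. The per-game figures below are entirely made up for illustration - the actual dataset is not reproduced in this article:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical review scores and ownership counts for five games
scores = [62, 71, 78, 85, 91]
owners = [12000, 90000, 30000, 150000, 60000]

r = pearson_r(scores, owners)
print(round(r, 2), round(r ** 2, 2))  # r, and the share of variance explained
```

Note how quickly r² shrinks: even an apparently respectable r of 0.22 leaves roughly 95 per cent of the variance unexplained.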

We also analysed the correlation between the total playtime of the games, weighted by the total number of players, and the two sets of scores. For SteamGauge we obtained an r-coefficient of 0.22, for Metacritic 0.06, again indicating no strong relationships. It is interesting to note that SteamGauge, the more directly player-derived ranking of the two, correlates better than Metacritic scores in both cases. It may be that rankings on platforms such as SteamGauge are more important - for games on Steam - than Metacritic scores in terms of predicting or estimating sales. Additional research will be needed to validate this idea, however.

In other words, this analysis provides no evidence of a strong relationship between Metacritic scores or SteamGauge rankings and either game ownership or playtime.

It would appear that such review scores have no, or only a minimal, relationship with whether people buy and play specific games, although this conclusion can only be drawn for the games investigated here, i.e. games on the Steam platform.

Latest comments (9)

Paul Johnson, Managing Director / Lead code monkey, Rubicon Development · 3 years ago
I think people overuse and definitely overthink these kinda scores.

If a game gets 90% it will doubtless be a good quality game: lots of content and options, stable, replayable, whatever. But you still might not like it. It could be a console FPS, which is my idea of hell.

However a 30% score probably means the game is rubbish whether you like the genre or not.

Customers don't want or need anything else.
Alex Lemco, Writer · 3 years ago
I'd be interested to see games sales data collected and lined up against the posting of positive videos from YouTube personalities such as TotalBiscuit, PewDiePie and Angry Joe.


Marty Howe, Director, Figurehead Studios · 3 years ago
What about some disclosure about which journalists got paid to write a good review, or received a free lunch, a plane ticket, a hotel room, etc.?
Dan Pearson, Business Development, Purewal Consulting · 3 years ago
Appynation has an excellent list of those sites (all mobile) which ask for payment for reviews. All reputable major sites have full disclosure lists saying whether they've had food or accommodation paid for or if reviews have taken place at a review event under controlled conditions.
Bjørn Jacobsen, Audio Design, Io Interactive · 3 years ago
Which is one of the core problems of the entire industry, IMHO.
Customers being picky enough to wait for reviews is a good thing; the only problem is that corrupt reviewers instantly mess this up.

I was glad to see so many of my non-dev gamer friends check reviews before they bought a game, to see if it was actually quality, and stop buying games from any studio or publisher they had had bad experiences with in the past, because of all the problems with bad releases earlier. It was a really good thing to witness.

Leaving out all the reviews from certain people would definitely be nice! I'd love that myself. I'm also tired of fanboy journalists who apparently can't see a single flaw in the games they love and give them 100% or 10/10 despite obvious problems.
Sigh.
Jeff Wayne, Technical Architect · 3 years ago
Personally, I highly value the Metacritic score on Steam - the vast majority of games I've bought on Steam have been swung by the Metacritic score. I place zero value on magazine reviews and celebrity/personality reviews, since they can be bought/swayed.

Whilst there are problems with Metacritic (the zealots for and against a game), somewhere in the middle there is a lot of useful information in reviews from actual players.
Patrick Walker, VP, Insights, EEDAR · 3 years ago
Cool study. This is a nice extension to the work EEDAR did because our study was based on an intentionally small sample for which there is past evidence that metacritic correlates with sales (console, core genres). Some thoughts on the results:

- I would expect a lower correlation between Steam ownership and review scores because of Steam price pulsing. Much of the ownership on Steam is driven by flash sales where prices are significantly reduced. This is often a way for people to pick up games that caught their attention at release but were not quite worth buying at a higher price point. We do much of our analysis on digital platforms with revenue as the outcome variable rather than ownership or unit sales. When a unit sale could have been at $2 or $60, it's all about revenue as the bottom line.

- The lack of a correlation between play time and review score is very interesting, and relates to the broader idea of critic reviews being a poor fit for games that operate as a service and do not launch in their final state. These games will over-index in play time and are difficult to review at launch. League of Legends is a great example: a great game in a constant state of improvement, yet it has a review score in the 70s from when it launched years ago.

And I agree with Alex, a cool extension of this research would be to add social media into the equation. EEDAR is starting to do a lot of modeling with social media data so I will probably do an article on this at some point in Q1.
Andreia Quinta, Photographer, Studio52 London · 3 years ago
@Bjørn
"Customers being picky enough to wait for reviews is a good thing, only problem is that reviewers who are corrupt mess up this instantly."
Hence why Twitch and YouTube 'let's play' videos are so successful. Game enthusiasts just trust them (a lot) more since they're alike.
Kenny Lynch, Community Rep/Moderator · 3 years ago
I am not wholly sure why people would move from media that have shaky ethics to those that have none. While it may have started simply as people talking about games they love, of course marketing comes in and exploits the situation.

The part I find interesting about the research is not how the player scores affect playtime but vice versa. This study would seem to point to the fact that the amount players play a game does not correlate with the rating they give it. This may have to do with different models of gameplay: you could play a slow MMO for a long time without thinking it good, and play a casual game like Hoplite for half an hour and love it. Still interesting, though.


