Defining Success: Why Metacritic Should Be Irrelevant
Warren Spector tackles the thorny issue of Metacritic's impact on the industry in his second exclusive column for GamesIndustry International
To qualify for column treatment here, a subject has to involve something I've been thinking about a lot - a problem I can't solve or a question I can't answer. This time, I started with a question that's plagued me for years (yes... even before the reviews started coming in on Disney Epic Mickey!). That question is "Why does everyone care so much about Metacritic?"
But as I started thinking about turning that question into a column actually worth reading - something that wouldn't sound like sour grapes or clichéd developer whining - I realized that the thing I was really thinking about, the thing we should probably all be thinking about, was bigger than that. The topic worth talking about was how we define success in gaming, how we measure it and how we use that measurement, for good and ill.
I started thinking about ratings and rankings, yes, but also about how reviews relate (or don't) to critical analysis and about how different parts of the game business balance art and commerce. Clearly, Metacritic is part of the answer, but it's just a part. I'll still talk about Metacritic, some, but, contrary to my original plan, it won't be the only thing I talk about.
So, let's talk about success.
Obviously, there's personal, individual success. You know, things like Fame (people recognize you), Respect or Significance (people acknowledge your worth and recognize your contributions to your field), Wealth (people see you driving a fancy car or - ahem! - owning several houses)... These are interesting, but they're a topic for a therapist, not a game developer like me. That's not the kind of success I want to talk about here.
What I am interested in is how we define an individual game's success. With thousands of games produced and offered to players each year, this is a vital question - for developers, for publishers, for players and even for critics and historians whose job is to explain our medium to us and to future generations. How do we balance and describe commercial success vis-à-vis critical success and what might be called significance? How do games relate to other media? And how do we communicate how a game fits into - or stands out from - the continuum of games and media?
Needless to say, these are huge questions and that hugeness allows me to rationalize the fact that I'm not close to answering any one of them, definitively. That, of course, is where all of you come in: I'll lay out my thoughts here, hazy though they may be, and then you get to tell me where I've gone wrong. Don't let me down!
Games as Commercial Art
Clearly, games are a commercial art, and success in a commercial medium requires that we focus on the "commercial" side of the equation. A game has to generate some amount of revenue and, more important, profit.
But what about the "art" side? My personal approach to game design is to offer players a variety of ways to act and then, as much as possible, allow them to define success for themselves. Why shouldn't the various stakeholders in the creation of a game have similar power? Well, clearly, they should. The term "commercial art" is nicely ambiguous, capturing the idea that individual stakeholders might have very different definitions of "success."
For example, most if not all publishers think first about revenue and profit - no surprise, it's their job to do so. That's a relatively simple measure of success, right? Did the game make money - yes/no? (In reality, the determination of profitability is surprisingly difficult; nothing's ever simple where an Excel spreadsheet is involved, but conceptually, you get what I'm saying...)
Players have a different definition, typically putting a variety of poorly defined characteristics at the top of their lists - "fun" (whatever the hell that means), "playtime" (duration of play and value for money), peer approval (we're all playing the same games and can talk about shared experiences).
A lot of developers fall in line with publishers and put revenue and profit at the top of their definitional lists as well. Others put creativity or personal expression up there.
My point is, when you go into any project, you need to know what you're shooting for. Someone gets to make that call. If it's you, your financial backers need to know what you're shooting for. If it's the biz guys, you need to know what return on investment they're expecting. Regardless, the development team needs to know. This is a minefield that must be traversed clearly and carefully. The arc of your career (even whether you have a career) hinges on understanding this and negotiating with stakeholders until you're on the same page.
I've been lucky - or foolish enough - that I've been able to define success largely for myself. And my definition rarely begins with questions of commercial appeal, sales, revenue or profit. (Okay, to be honest, I've never started there...) Before a development deal closes I tell every money person and publisher I work with (or even think about working with) one thing:
My obligation is to sell just enough copies of a game to get someone to fund my next one. (Does saying this publicly ensure that I'll never work again? Why do I do this sort of thing?)
I'm not saying that commercial considerations are beneath me. You have to make some money or it really is "career-over." But if you look at the mission statement I've used at Ion Storm and Junction Point you'll see that some things are higher priorities than the unholy trinity of Sales, Revenue and Profit.
"The fact is, few of us will ever approach that Platonic ideal of art, commerce and critical acclaim achieved in equal measures"
For example, the projects I've chosen to work on put "player expression" at the top of the list - if players aren't telling their own stories, creating unique experiences through their playstyle choices, the game just doesn't seem worth making. It doesn't matter how much money a different kind of game might make. My success definition prioritizes something above that.
Similarly critical to me is "innovation," defined as including something in the game no one has ever seen before. To my mind, it's better to fail at something new (and worth doing) than it is to succeed by executing well against a well-understood problem.
Starting a game without trying for the highest level of quality seems kind of soul-crushing, though achieving what might be called "Game of the Year quality" is rare and wondrous. And great review scores are always nice. But these come somewhere after player expression and innovation, and ahead of money. For me. Your mileage may vary.
At the end of the day, we all want our work to "succeed" in every way, at all levels, with all people. We all want to create art that sells like hotcakes, makes a ton of money and wins the admiration of consumers and critics alike. Anyone who says "I want to make games that reach the smallest possible audience" is probably lying, and anyone who says "Screw making art - I just want to make a ton of money" probably has no chance of making either.
The fact is, few of us will ever approach that Platonic ideal of art, commerce and critical acclaim achieved in equal measures. And that means each of us must decide for ourselves how to define success. For me, that means achieving a kind of self-satisfaction - knowing my team accomplished the goals we set as effectively as we could, given circumstances. For me, that means living up to the priorities captured in a personal mission statement. For me, that means doing well enough, commercially, that someone will step up and fund my next game.
But over the years, review scores - even pre-release score predictions and especially post-release aggregate review scores - have, more and more, come to be accepted as the most significant measure of success. Publishers use them to determine marketing and PR budgets, print runs, distribution plans and even developer royalties.
Pre-release, consultants are often hired these days by publishers who feel the need for impartial evaluations of games in development. Such consultants offer assessments of gameplay, story and graphics, as well as a prediction of post-release review scores. These services are typically offered by ex-reviewers whose individual opinions carried far less weight when they were writing than they do after they stop - something I'll go to my grave not understanding. To be blunt, in my experience, these consultants rarely reveal anything about gameplay, story or graphics that developers don't already know.
But publishers, who should probably trust themselves and their experience a bit more, do give great weight to these evaluations, assuming they accurately predict what post-release review scores will be. More important, they use these assessments to predict a finished game's likely level of commercial success.
Why does this matter? Well, as I said, such predictions drive marketing budgets and PR plans, as well as expenditures that determine shelf space in stores and so on. In other words, review score predictions become self-fulfilling prophecies, with games performing as expected precisely because the predictions drove the plans.
After release, the assessment of a game's success passes out of consultants' hands - even out of the hands of internal testing and evaluation resources - and into the hands of two groups with different but related agendas:
First, assessment of a game's "quality" falls into the hands of gamers who, pleased or displeased enough, may express their opinions in online forums and in conversation with friends.
In addition, of course, a game's quality is assessed by game reviewers, whose opinions appear in consumer and specialist print magazines, mainstream newspapers, and on websites and television programs.
Finally, reviews are aggregated into a single "score," determined by sites like Metacritic.
Obviously, gamers have every right to evaluate games themselves, and to share their opinions with other gamers. It would be insane to argue against that, or even to think about it too much.
Reviews and reviewers? That's a very different story. We - gamers, developers, publishers and reviewers themselves - need to think more about the purpose of reviews. Only by doing this will we get better reviews and understand how best to interpret them, as business people, creators and consumers.
In gaming, there's a widely held, if largely unspoken, belief that the function of a review is to tell players whether a game is "good" or "bad." There's usually enough description of genre, story and/or gameplay to inform players what kind of game they're dealing with. Reviewers typically offer reasons for liking or not liking a game.
But, fundamentally, game reviewers exist, it seems, to say whether a game is good or bad, with little concern for the success criteria I went on about earlier. The end result is a score of some kind - 96 out of 100... 8 out of 10... a B+... 3.5 stars... A game is good or bad.
Frankly, that's not good enough and it misses the point of reviews, to my way of thinking.
"When we put our faith in Metacritic as an impartial, scientific measure of quality, we should probably ask ourselves whether the crowd - the crowd of journalists as well as players - is really wise or just mediocre, incapable of recognizing and rewarding the new and different"
When I read reviews by pioneering film critics (Roger Ebert, Pauline Kael, Judith Crist and the like) or by any of a number of more modern film reviewers, I'm, of course, interested in their good/bad assessments. After all, these are smart people whose expertise gives them a status that makes their opinions worthwhile. However, the real value of their reviews is something very different, something more valuable - something I'd argue is largely missing from game reviews today.
That valuable thing is a consistent critical voice. Great critics don't focus on questions of good or bad. Their assessments of a film's success or failure are supported by an underlying philosophy they apply to all films. It doesn't matter if you agree or disagree with their assessment of an individual film - heck, they often disagreed with one another. What matters is that, reading them, you can weigh your own likes and dislikes to determine if you will like a film, based on their review.
Frankly, the best film reviewer for me wrote for the Austin American-Statesman many years ago. He was a bad writer and I disagreed with his reviews pretty much all the time. But that was the point - I could tell reading his reviews whether I was going to like a film or not. He had a consistent voice. He loved a kind of film I knew I didn't like and hated a kind of film I loved. He didn't tell me if a film was good or bad - he told me whether a movie was right for me, something a score or numerical rating wouldn't have done. That was a hugely valuable service.
To provide that service, a critic needs a consistent voice, a philosophy applied to all games - and these characteristics need to be clear to readers. And I see all of this as being lacking in today's game review world. An enthusiastic "This game sucks!" or "The AI is bad" or "This game gets a 4 out of 5" tells me next to nothing I need to know.
So what does any of this have to do with Metacritic, the thing that got me started thinking about this in the first place?
There's no questioning the appeal of aggregating lots of individual reviews to get at some normalized measure of that subjective thing we call "quality." In today's era of Big Data, many people feel that everything can be measured, quantified and scientifically studied. To an extent, they're right.
As one developer (who asked not to be named) put it, Metacritic "is a fine cross-section of established voices and existing trends, but it often misses things that reach other audiences or try new things. Too many of the sources only reward things they feel safe rewarding... We continue to believe there is a correct monocultural taste. Bad for discoverability and bad for diversity."
By this logic, Metacritic, at best, rewards games that are conventional and well understood by players and critics alike. New and challenging things are, by their very nature, disruptive and easily misunderstood. Aggregation of opinion, at best, offers hope and guidance to people whose goal is to maximize profitability, or who depend on the constancy and relative predictability of the status quo, but little to people whose priorities lie elsewhere (see the earlier discussion of definitions of success).
When we put our faith in Metacritic as an impartial, scientific measure of quality, we should probably ask ourselves whether the crowd - the crowd of journalists as well as players - is really wise or just mediocre, incapable of recognizing and rewarding the new and different. (Before you come back at me with arguments, look at the highest rated games and then we'll talk...)
Beyond questions of the utility of its data, there are two aspects of Metacritic's methods that undercut its credibility in my eyes and should, I think, call its accuracy and even the validity of its data into question.
First, the aggregation of data is skewed by the selection of review sites included in the aggregation. For example, with Disney Epic Mickey, I know of several perfect scores (higher even than I would have given the game!) that were simply not listed or included in the game's average. And we're talking about high-profile, well-respected sites here. By way of contrast, our worst review scores, typically from Some Guy With a Website, were integrated into the aggregation instantly. I know this sounds like sour grapes and if you want to interpret it that way, you're welcome to do so. But with future projects, team bonuses and so on hanging in the balance, I don't think it's inappropriate to ask for a public discussion of which reviews are included in Metacritic's calculations, which are not, and why.
Second, the data are skewed by the weighting and summary-conversion systems employed by the people behind Metacritic's ratings. Though I have no idea what criteria are applied or what decisions are made, my understanding is that certain reviews are given greater weight in determining a game's overall score. I applaud the concept, but without knowing exactly how the weighting is applied, the validity of the score is called into question.
Similarly, the widely discussed conversion of certain reviewers' letter grades into a numerical rating is arbitrary. And the conversion of 1-10 scales and 1-5 star ratings to Metacritic's 100-point scale introduces a host of problems. Is a B an 80? Who decides? Is a 3-star review a 60? Again, who says? And what are the ramifications of making that decision?
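To make the problem concrete, here's a minimal sketch of how much those invisible choices can matter. Everything in it is hypothetical - the outlets, the conversion tables and the weights are mine, not Metacritic's, which publishes none of this - but the arithmetic is the point:

```python
# Hypothetical illustration: the same three reviews yield different
# "aggregate scores" depending on two choices the reader never sees --
# how grades and stars map to a 100-point scale, and how outlets are weighted.

def aggregate(reviews, grade_map, star_map, weights):
    """Weighted mean of reviews after converting each to a 0-100 score."""
    total = weight_sum = 0.0
    for outlet, raw in reviews:
        if isinstance(raw, str):      # a letter grade, e.g. "B"
            score = grade_map[raw]
        elif raw <= 5:                # a star rating (1-5); crude but fine here
            score = star_map[raw]
        else:                         # already on a 100-point scale
            score = raw
        w = weights.get(outlet, 1.0)  # unlisted outlets count once
        total += score * w
        weight_sum += w
    return total / weight_sum

reviews = [("BigSite", 90), ("MagazineX", "B"), ("SmallBlog", 3)]

# Two equally defensible conversion tables...
table_a = dict(grade_map={"B": 80}, star_map={3: 60})
table_b = dict(grade_map={"B": 75}, star_map={3: 70})

# ...and two equally defensible weighting schemes.
flat   = {}                                    # everyone counts the same
tiered = {"BigSite": 2.0, "SmallBlog": 0.5}    # "important" outlets count more

for t_name, t in (("table A", table_a), ("table B", table_b)):
    for w_name, w in (("flat", flat), ("tiered", tiered)):
        print(t_name, w_name, round(aggregate(reviews, weights=w, **t), 1))
# Results range from 76.7 to 82.9 -- a six-point swing in the final score
# without a single word of any review changing.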
Now, I realize that all of these questions about data are, to an extent, evened out when one considers that the procedural limitations apply to all games equally (at least one hopes so...). But then we're back to the question of whether the aggregation of several review scores tells us anything of value. What exactly is it measuring and does this measurement reflect one, several or none of the success criteria we can apply to a game?
Again, using my own games as examples, Metacritic provides some interesting insights.
The highest rated games I've worked on are Deus Ex and the Thief games. That's great. I'm hugely proud of the teams that made those games great (at least in my eyes!). Who doesn't like great scores, the awards that flow from them and the credibility with publishers they provide?
By contrast, the Disney Epic Mickey games are the lowest rated games I've worked on. I'm still proud of the teams and of the games themselves.
And you know what? Despite significantly lower Metacritic scores, Disney Epic Mickey is the best-selling game I've ever worked on, by a substantial margin. And the sequel, Disney Epic Mickey: The Power of Two, is the second-best-selling game in a thirty-year career.
Reviewers and some gamers may have preferred Deus Ex and Thief (and I have the fan mail to prove it!), but I received substantially more fan mail - and more heartfelt messages of thanks... and more fan art... and more everything - for the Disney Epic Mickey games than for both Deus Ex games I worked on combined.
Metacritic be damned - I'll take an emotional connection with players and the praise of Disney fans any day of the week. My definition of success - notably empowering players to craft unique experiences and offering them things they've never seen or done in a game before - was well served by games that unarguably Metacritic'ed poorly.
So where does all of this leave us? I guess I'd say this:
Make sure you define success on each project you undertake as surely and precisely as you define your gameplay, technical, artistic and, yes, fiscal constraints. That definition of success will guide all your development decisions at least as much as your high-concept elevator pitch will.
If you're lucky and/or stubborn, define success for yourself and pay no heed to the desires of outside parties or the tyranny of the aggregated audience.
If you're not lucky (I assume stubbornness will be no problem for game developers!) you'll have to include collaborators in your definitional efforts, but don't let go of your agenda, even if you have to keep it hidden. Personal satisfaction depends on personal goals, and game development is too grindingly difficult to sacrifice personal satisfaction.
Whether personally or collaboratively determined, make sure everyone involved in the project understands the success criteria so results can be measured. And then make sure the appropriate measurement tools are being used. If that happens to be Metacritic, or some other review aggregation service, know that's the case going in. If the right tool is a close reading of fan mail to track player impact or emotional engagement, know that. If your goal is to influence other developers, know that, and find appropriate measures of success (citations by other developers, the opinions of trusted critical voices, etc.).
At the end of the day, understand that "success" is a word with as many meanings as there are people to define it. Choose your meaning carefully and live with the joy - and consequences - of your choice. In this way, life is like a game. If only more games were like life. Accomplishing that goal, making games more like life and encouraging others to do the same, is my definition of success - what's yours?
Warren Spector left academia in 1983 and has been making games, as well as lecturing and writing about them, ever since. You can follow him on Twitter under the user name @Warren_Spector.