User researchers in the games industry are often asked to produce "fun scores" for games in development -- overall grades for how well a game is doing and an early indicator of the review scores to come. Some publishers keep extensive databases of the fun ratings for every game they've tested, so they can say things like "this game is more fun than 95% of games." It's treated as a key metric that can be plotted on a (hopefully upward) curve towards release, charting the improving quality of the game as a whole.
As a professional games user researcher for more than 15 years, I've asked this question many times and my games have generally scored pretty well. In spite of that, I've come to consider overall fun ratings to be detrimental to the process of shipping a good game. Don't get me wrong, fun is absolutely the most important attribute of my games; it's why I make games and it's why people play my games. But I don't believe measuring the fun of a game as a whole is especially valuable in most circumstances.
"Fun is absolutely the most important attribute of my games, but measuring the fun of a game as a whole isn't valuable in most circumstances"
My objection is not that fun is difficult to measure or that we shouldn't measure it. Simply asking players how much fun they're having is a reasonable first approximation of their experience, the same way asking someone if they're too cold or too hot is a good approximation of temperature. And scores for individual users can be useful in understanding what's working or not working in a game. But where we as an industry get into trouble is averaging player scores together into a single number to represent the game as a whole.
Here's why I hate average fun scores:
It stops the conversation
Once you give a game an overall score, the conversation ends. If it's a high score, the team pats themselves on the back and tunes out all the other issues, because how bad could they be? If it's a low score, the team immediately starts looking for reasons to dismiss the entire study -- this wasn't the target audience, they didn't play long enough to get to the fun part, etc. The bad score can't be true, so the entire test must be bad.
When you give an overall score to someone's game, you're judging everything they did on the game simultaneously; you're attacking (or buffing) their self-image as a developer. With smaller, more granular scores, it's easier for a creator to separate themselves from the project.
Overall fun scores block the conversation that is the real purpose of testing a game in development. We don't just want to grade the game; we want to have an engaged, productive conversation about how it can be made better. Fun scores can inhibit that discussion.
It's not actionable
When players say a game isn't fun, there isn't a clear next step based on that statement alone. Sure, we want the game to be fun; that's what my games are for. But just knowing it isn't fun doesn't dictate a particular course of action. Does that mean it's too hard or too easy? Are players confused or bored? There's no "add more fun" button on my keyboard.
The correct response to knowing a game isn't fun is to dig down into other data, looking for clues as to what the problem is. Since you're going to be doing that anyway, the fun question becomes redundant.
It's particularly subject to bias
People always tell you to your face that the game is fun unless it's absolutely awful. Because fun is perceived to be the most important thing about a game, players who want to please their hosts will at a bare minimum say the game is fun.
"If your game is only fun when played with fun people, then it's not actually fun."
This is especially true of the kind of in-house playtesting I do, where the participants know they're talking to the people who made the game. They've just walked up to a building with the studio's name on it and entered a lobby that's usually lined with gigantic prints of art from the game and a case full of awards. It's unrealistic to expect a neutral position after that kind of build-up.
A bit of bias certainly isn't disqualifying in a playtester. People who are inclined to like a game will still have trouble figuring out the controls or navigating the game world. But an overall fun rating is exactly the sort of vague, holistic question that will be most affected by bias.
Multiplayer fun doesn't count
Even Tic-Tac-Toe is fun if it's played with other people. The most boring activities in the world can be fun if there's another human involved. The fun generated by other people is a given, something that would be equally true of any other game that group of friends could be playing. If your game is only fun when played with fun people, then it's not actually fun.
This is a particular challenge for games where the PvP mode is built first. Because AI tends to be a later addition, teams that intend to include both PvE and PvP usually build PvP first, and they can easily fool themselves into thinking the game is fun when they really just enjoy playing with their co-workers. You can somewhat offset this effect by making sure the developers can't talk to each other during an internal playtest, but there's still an inherent lift from playing with good people you know.
Now, I recognize that I'm an outlier on this topic. People in the games industry honestly want their games to be fun and expect their researchers to ask about it and provide an averaged metric.
Here are a few suggestions for ways to better handle overall fun scores:
Ask it then set it aside
There can be a lot of pressure to ask about fun. The players expect it, the team expects it, and if we don't ask it, one of them will bring it up in a way that might sabotage the other goals of the study.
Therefore, sometimes the smartest thing we can do is ask upfront for an overall evaluation of the game and then ignore it. Even if we do nothing with that data, asking it clears those general impressions out of the way and the participants' subsequent feedback will be much cleaner. They may even feel more comfortable about bringing up problems now that they've given a good overall grade.
"Sometimes the smartest thing we can do is ask upfront for an overall evaluation of the game and then ignore it"
Just because we asked it doesn't mean we have to emphasize the data from the question. Ask it early, then set it aside and dive into the details where the real work of making a better game happens. Don't make it the centerpiece of the study.
Ask it later
Games simply aren't fun to play at many points in development, and there are many parts of a game such as menus that aren't intended to be fun on their own. There are a lot of studies where fun isn't at issue, and we can simply put off asking about it until later in the development process.
Saying "we're not testing for fun yet, that's something we'll do in a few months" has the advantage of being true (we do need to ask the question eventually) and less confrontational than straight up telling someone that asking about fun isn't helpful. The person making the request of you is correct, just not right now.
Use it to help the team accept the real issues
User research on a game can sometimes feel relentlessly negative, continually pointing out what's wrong with the game. To offset this, presentation of results should open with something positive and establish that we're all buddies on the same team before digging down into everything that needs to be fixed.
Sure, the overall fun score will be higher than it probably deserves to be, but that's fine; just because it's not an accurate metric doesn't mean it can't be deployed to good effect. Put that nice, high fun score upfront to get everyone in a good mood, then go into all the details of what needs more work.
It's much easier for a development team to operate from a mindset of "it's fun but we still need to fix these things" than from "everything is awful." An inflated fun score can still serve the game by putting everyone in the right mental space for absorbing the harsher details.
Enable the "but"
Participants often say "it's fun but..." as in "it's fun but a little confusing" or "it's fun but it's not for me." The second half of those statements can be extremely useful in making a better game even if the first part isn't.
We have to make it easy for people to qualify their overall statement. Ask them the big fun question first, then ask all the little detail questions. Having given a good overall score, participants may be much more willing to criticize specific aspects of the game.
Narrow the focus
"The best way to make a game fun overall is to ignore the overall fun of the game while it's in development"
Similarly to the "fun but..." concept, by asking about a narrow section of the game we make it easier for players to say critical things without feeling like they're criticizing the game makers. Players are willing to say "the game was fun, but this mission wasn't fun" or even "this mission was fun, but that one fight at the end wasn't fun." We just have to give them the chance to rate both the whole and the parts.
Score the player, not the game
The way we make fun scores valuable is not to analyze the overall average score, but to look at what differs between the players who said the game was very fun and those who said it was merely kinda fun. Did the ones who had fun use particular weapons? Did they have more prior experience in the genre? Did they travel around a different side of the map? Each of those differences is a clue into what parts of the game are working and what needs improvement.
Mushing all the ratings together into an overall score misses the point. Different players have different experiences, and those differences hold the key to making a better game.
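As a minimal sketch of this kind of segment comparison, here's what splitting playtest responses by fun rating and contrasting the groups might look like. The data set and field names (`fun`, `weapon`, `genre_veteran`) are entirely hypothetical, invented for illustration, and the threshold for "very fun" is an arbitrary choice a researcher would tune to their own rating scale:

```python
from collections import Counter

# Hypothetical playtest responses: one dict per participant.
# All field names and values are made up for this example.
responses = [
    {"fun": 5, "weapon": "bow",   "genre_veteran": True},
    {"fun": 5, "weapon": "bow",   "genre_veteran": True},
    {"fun": 4, "weapon": "bow",   "genre_veteran": False},
    {"fun": 3, "weapon": "sword", "genre_veteran": False},
    {"fun": 2, "weapon": "sword", "genre_veteran": False},
    {"fun": 2, "weapon": "sword", "genre_veteran": True},
]

def split_by_fun(rows, threshold=4):
    """Split participants into 'very fun' and 'kinda fun' groups."""
    high = [r for r in rows if r["fun"] >= threshold]
    low = [r for r in rows if r["fun"] < threshold]
    return high, low

def compare(high, low, field):
    """Count how often each value of `field` appears in each group."""
    return (Counter(r[field] for r in high),
            Counter(r[field] for r in low))

high, low = split_by_fun(responses)
high_weapons, low_weapons = compare(high, low, "weapon")
# In this toy data, the high-fun group all used the bow and the
# low-fun group all used the sword -- the kind of difference that
# points at a specific part of the game to investigate.
```

The point of the sketch is the shape of the analysis, not the numbers: the average fun score of this group (3.5) says nothing, while the split between groups immediately suggests a hypothesis about the weapons.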
While we've been talking about fun, this argument applies to every type of global rating for a game. It doesn't matter if the game is fun as a whole, well balanced on average, or easy to learn in general. People keep playing or quit based on their current experience right now. We have to earn every second of ongoing player time, moment by moment. A game can be good overall while still having giant potholes in the player journey. "It gets better later" is cold comfort to a player who isn't having fun now.
Games user research is at its best when it focuses on helping designers achieve their vision for the game; directly grading that vision is counter to the spirit of the profession. Instead, we succeed best when we take the ultimate value of that vision as a given and frame our work in terms of discovering all the tiny places where players aren't experiencing the designers' intent. There's no prize for correctly grading the game, but making the game better has tangible rewards for the development team and the player.
It can seem paradoxical, but the best way to make a game fun overall is to ignore the overall fun of the game while it's in development. By setting aside the larger goal and perfecting the individual moments and mechanics, we produce something more than the fun of its parts.
John Hopson has been doing research in the games industry for over sixteen years and is currently the Head of Analytics at ArenaNet, makers of Guild Wars 2.
Over his career he has been the lead researcher for blockbuster games such as Halo, Age of Empires, Destiny, World of Warcraft, Overwatch, and Hearthstone. He's also the author of a number of articles on the intersection of psychology and games, including the infamous 'Behavioral Game Design'.
John holds a Ph.D. in Behavioral and Brain Sciences from Duke University and is the former chair of the IGDA's Games User Research SIG.