Elliot Callighan is a composer and sound designer, and the owner of Unlock Audio. He also teaches as adjunct faculty in the film and game programs at DePaul University in Chicago.
One of the wonderful things about sound is that it can accomplish many different things. Additionally, the same sound used in one context can have a completely different meaning in another. This is true from an emotional, informative and clarity standpoint. In my game audio classes at DePaul University, I always point out moments when we hear the same thing in games but have a very different response or reaction to them.
Sound is powerful. If you work in games, you should think about its capabilities for your project, or work with someone who understands what effects it can have and all the ways it can be used. A single sound can be doing many different things at once. With that in mind, here are eight different ways sound can be used in games:
This is probably the most straightforward entry on this list. When an action happens -- such as a character moving, using an ability, or the player selecting something in the UI -- we need to hear something that seems "appropriate" concurrently. If we don't hear something when expected, it can be one of the most powerful ways to lose that sense of suspending disbelief or "buying" into the experience. These sounds need to be present, but also need to be choreographed to the visual gesture. Starting or stopping "out of sync" is just as much of a glaring error as not having the sound at all.
Pretty much every game is brimming with sound filling this role, but for an even more visceral example check out the game, Perception.
The premise of Perception is everything we see is based on sound in the space/world. Think of it as similar to echo-location. If something isn't generating sound on its own, the only way we see it is if sound travels out, bounces off surfaces in the environment, and returns to the listener. Everything we see is based on sound, so if we see anything it's because that action/object/event has an intrinsic sound with it. For some, seeing all the sounds that populate our game worlds can help make it clear how vital the sounds are.
A very powerful intent from a design perspective is what our player is focusing on. Are they marvelling at the art or environment of a new area in an RPG with a massive world? Will they be able to make the jump from one level to another in a platformer? Do they need to be ready to dodge an enemy attack?
Most times, the auditory and visual cues work in conjunction with one another. This makes it very persuasive in telling the player that something is important and deserves their attention. However, having separate visual and auditory cues can be very powerful and have incredible effects on the player. Look at this sequence from Amnesia: The Dark Descent:
Can you imagine how boring hopping between boxes would be without hearing the splashing footsteps coming after you? Are the boxes the real focus this whole time? No, not at all. That constant auditory reminder of impending doom is so strong -- so strong that the player doesn't need to see the footsteps of the monster in the water to be utterly terrified of it.
We are used to different spaces sounding differently. If you yell in a small room, it sounds very different than yelling in an empty sports arena. Not only does it take longer for sound to reach a listener's ear in a larger space, but when we're in large spaces most of what we hear is reflected sound as opposed to direct sound.
The sound of our voices goes out in every direction, with very little of it travelling directly to a listener's ear when in a large space like an arena. A listener may still hear this sound even if it doesn't travel directly to their ears, but not until after it's bounced off a number of surfaces. This is what's called reflected sound, and it's most of what we'll hear in a large space.
"If we don't acknowledge and emulate these sound characteristics, our game worlds will never feel real"
In a small space, the listener will be closer. This means more will go directly to their ear, and the reflected sound will take less time to reach their ear. This gives a very different character to everything we hear in a small space as opposed to a large space. Additionally, the materials present in these spaces play a huge part in what sounds we hear. We hear certain types of sounds more when hard, flat surfaces are present as opposed to curved, cloth couches.
If we don't acknowledge and emulate these sound characteristics, our game worlds will never feel real. Game audio folks spend a lot of time ensuring game worlds feel real. Here is a portion of the implementation used in Hitman 2 to ensure this happened:
This is referencing the emotion a player feels while experiencing your game. While the previous point pertained to making a space feel right in terms of physics, this point is in terms of emotion.
A large space can be awe-inspiring, majestic, threatening, magical, exciting, intense -- the list could go on forever. Every sound and note we hear has the ability to give an emotional impression, but only if we want it to, and know how to execute it well. Watch the opening to Bioshock on mute. It looks creepy, but this experience lacks any sort of visceral emotional response.
Now, play the opening with the sound on and close your eyes. Pay attention to how much emotion you feel with no visual component. Of course, the real impact is achieved when we have both the visual and working together, but pay attention to where the emotional part of the experience is coming from.
My favourite example of this is Doom, as well as any Tarantino movie. More than having a sound be present for a gesture, audio can add a layer of intensity that isn't naturally there. Any team creating an experience that is highly "stylised" will have given significant thought to how the audio contributes to the overall aesthetic. It is that important.
Case in point: if id Software hadn't been incredibly thoughtful on what the nature of Doom was, the end experience could have missed the whole point of its world. Don't believe me? Listen to this:
A far cry from the intended experience of Doom, right?
This effect can be true for entire soundscapes like in Doom, but it can also be true for single ability or action sounds. There are certain sounds that just aren't appropriate for a healing spell, right? How can we communicate something positive, negative, disorienting, dangerous, all in a single sound?
Here are a number of ability sounds I created for Card Chronicles: Sentinels, which all needed to communicate the nature of the ability being used by the quality of its corresponding sound. What makes the healing spell sound feel appropriate for a healing spell?
Promote immersion (in VR)
This is very different than the contextual/narrative and space defining audio we looked at already. While you could describe both these capabilities of audio as making something in your game feel "believable," immersion is the sense of the player actually being in that space.
Immersion is getting someone to lose the sense of their physical self and feel like they are actually in another space, or occupying a body other than their own. Your attention shifts from controlling something in a digital world using your physical body to feeling as if you are actually occupying the digital world. That is a huge jump.
"How can we communicate something positive, negative, disorienting, dangerous, all in a single sound?"
Advancements and accessibility of technology have made techniques such as spatial and ambisonic audio integral to the VR experience. More than hearing something to your left or right, we can accurately simulate that qualities of sound emulating from different points in 3D space in different sized spaces, with different materials, and when you're looking in one direction versus another.
But the ultimate audio sensation of immersion is through binaural audio experiences. Not only do we hear the qualities of sound being affected by different spaces, but we can experience how the human ear perceives audio in that space as well. While ambisonic and spatial audio are incredible experiences, they are ultimately taking "believeablity" to a higher level. Ambisonics audio, in particular, has many applications in VR since it is a format of an entire sphere of sound around a point in physical space, and can be converted to playback in headphones.
However, for the most immersive audio experience, nothing beats binaural audio. This audio format makes you feel like you are actually there. If you want to better understand what this difference in experience is, grab some headphones and listen to this:
Pretty amazing, isn't it?
Now, the drawback with binaural audio is that it's generated relative to where a microphone or listener is. Since we usually move in our game environments, it's not easy to replicate accurately, and it can cause motion sickness if not used correctly. That's how powerful this stuff can be.
Setting pace as gameplay function
The most common example of this is in the use of rhythm games. If you've ever seen a serious Dance Dance Revolution tournament, you know how fast and intense these games can be. A big reason the player is able to quickly and accurately time their feet to the visual cues is because of the music giving them a constant frame of reference.
A different example of this is any sort of timer that has an associated audio cue. Many games have a timer that is ever present, but when we get to our last ten seconds or so, the timer's audio either becomes audible or is louder in the mix. It gets the point across that you need to complete an action/puzzle/objective sooner rather than later.
There are a couple of different flavours to this one, and many more than I can speak about depending on the genre and mechanics of your game. But the two that I can touch on with certainty that they'll be relevant to you are: transitioning between story/cinematic and gameplay, as well as loading screens.
Particularly in AAA titles, we are potentially switching between linear story elements and gameplay sequences regularly. In the playthroughs I've had with recent military shooters, 15 minutes of the game can have two or three moments of linear story. Using audio in conjunction with a visual effect or shift can make this transition feel effortless and seamless -- almost like playing through a movie as opposed to pausing your gameplay experience.
Another flavour of smoothing transitions is during loading screens. Developers have come up with a ton of great ways to make loading screens less of a "drag" on the experience, such as Namco having the Star Blade mini game. But sometimes a traditional loading is inevitable, and audio and music can help make these moments much more interesting. Mute the video below if you want to see how much a loading screen with audio can be a complete bore.
Let's wrap up
We've looked at eight different ways audio can enhance your game, but there many others. In order to ensure your game has an engaging experience, the audio needs hit on all of these dimensions and be purposeful in its execution. Time and thought need to be given to what you're trying to accomplish and how audio can help achieve it. Without that, a game will be missing an entire dimension of effective and engaging experience.