Digital Foundry's guide to games media assets
Maximising the impact of screens and trailers
Flashback to 1999. Computec Media UK is in the process of readying PlayStation World magazine and the editorial team I am leading is presenting dummy pages and concepts to focus groups assembled by market researchers MORI.
Part of the approach back then was to put serious thought into the way screenshots would be used in the magazine: custom framegrabbers were ordered to get the best possible image quality and the brief given to the editorial staff was simple - to capture the defining moments of every game we review in those screenshots. Our aim was to capture the essence of gaming on the page, to make the pastime seem as fun, enjoyable and exciting as possible. The results of this approach were acknowledged in our focus group testing: screenshots were actually more important to our readers than the text in showcasing the software.
You can spend days faffing around posing models, getting the camera just so and all that, but personally I've always been a fan of showing something real
Steve Lycett, Sumo Digital
In putting this feature together, it occurred to me that the lessons learned back then, combined with today's capture technology, would make for an interesting guide for developers and publishers on how to create the most dynamic, exciting and relevant media assets. Back in 1999 we wanted to use screenshots (and subsequently video, via covermounted DVDs) to show gameplay at its best - and I daresay that this is also the objective of game-makers looking to make their products as exciting as possible.
In this piece I'll be covering off the Digital Foundry approach to the acquisition of screenshots and then turning the focus onto game trailers. No-one knows better than the developer what content to include in these productions; instead I'll be covering the technical issues in making your presentations look as good as possible when they hit the internet.
Right now there are many different ways in which screenshots are created: carefully posed in-engine framebuffer dumps and paused gameplay with custom camera angles are two favourite techniques. Entire features have been written about how the majority of screenshots put out there are being carefully prepared and in many situations are not representative of the final product, but perhaps the real issue here is whether those precious, genuine moments of gameplay magic are being captured and transmitted to the intended audience.
Back in the day, the traditional technique of using framebuffer dumps was a bit of a no-brainer - analogue outputs will never match the digital precision of extracting your assets straight from VRAM, and the resolution of the PS1 and PS2 eras was such that offline upscaling made sense: the alternative was horrifically jaggy blown-up images that looked especially poor in print. CRT monitors also had a way of blending and blurring gameplay graphics, making untouched framegrabber shots quite misrepresentative of the way the game would be viewed.
Things have changed in the era of the HD console. LCD panels have replaced the CRT, every HD console comes with the digital precision of an HDMI output and most games media is now consumed over the internet - again, mostly on flatscreen displays. Where framegrabbers were once rarely up to the job of producing pro-level marketing assets, today's HDMI capture cards definitely are when utilised correctly, and the entry-level offerings make upgrading an existing PC into a recording station a relatively cheap operation.
So, why make the change at all? On console at least, the bespoke framebuffer dumping tools (Xbox Neighborhood in particular) are rather clumsy and unwieldy, relying on the user to press the "grab" button at exactly the right time: a torturous procedure that rarely yields dynamic results.
The advantage of shifting to HD capture is simple: every single frame issued by the source hardware is recorded, allowing you to go back, review and pick out the exact frame that gives the most dynamic representation of gameplay. You'd be surprised at how much can happen in 1/60th of a second.
Some of the developers Digital Foundry has worked with already use this approach.
"Typically we want to get screenshots that convey the impression of what it's like to actually play the game. You can spend days faffing around posing models, getting the camera just so and all that, but personally I've always been a fan of showing something real," says Sumo Digital's Steve Lycett.
"We'll generally set up a multiplayer game, rig one of the machines to be captured and get down to playing the game, making sure to not hold back! Once we're done swearing at each other, we'll run through the footage, pick out those special moments everyone loves, say where I've taken out one of the test team with a particularly nicely placed shot and passed them just before the finish line to steal the win. Then not only do I have some nice real game action screens to use, I can also rub someone's face in the fact that not only have I beaten them, but it's on the web for all to see."
As Lycett points out, there are other crucial advantages aside from the dynamic nature of the shots being generated: by releasing screenshots derived from actual gameplay, you are giving the audience a more authentic, honest look at the product you want them to buy and from my experience, the core gamer audience in particular will appreciate that.
If you're in the process of creating a game trailer, this approach has other significant benefits. In an industry where every man-hour counts, the ability to use the same captures for both screenshots and trailers cuts down on duplication of effort: one capture session provides the raw materials for both sets of assets. It's a lesson we learned in the games media when screenshots and coverdisc video were both required - eliminating the duplicated effort saves an enormous amount of time, which is better spent elsewhere.
In creating video game trailers, several approaches seem to have come to the fore in recent years. There is the basic CG approach, where a mood is set, where teasers are given about the content of the game. Sometimes - as in the case of the recent Dead Island video - this can give a game a certain buzz, but always the audience will be left wondering what the actual product is going to look like. If there is a massive difference between the content of the teaser and the final game, the audience may well feel cheated or short-changed.
Once media outlets have their hands on your video, you are effectively at the mercy of their encoders. The better the quality of the source video, the better the result of a second generation encode.
Gameplay trailers are the natural progression, but even then there are different levels of coverage applicable to different audiences. A cross-format project with a PC SKU instantly gives you a head-start (assuming the computer version is at the same level of development). Using the PC version allows you to capture excellent quality video at 720p on max settings, and by using a software-based tool such as FRAPS you do not even require dedicated capture hardware. In this way you can show the game looking at its best very easily - you can even create a 1080p asset, just as DICE did with its recent Battlefield 3 trailers.
However, while this approach will work for general videos aimed at a generic audience, the fact is that enthusiast players want to see the products working on their platform, and when they don't see them, they begin to worry about the quality of the game. DICE's Battlefield 3 trailers look nothing short of sensational but pretty much the only question Digital Foundry readers are asking us is how the game will look on console, shorn of its 1080p base resolution and minus the cutting edge DirectX 11 effects. I can imagine that unveiling the console versions is a future element of the ongoing marketing campaign.
Crytek employed an intriguing strategy along these lines: mainstream trailers were generated using the PC version of Crysis 2, but standalone extended gameplay segments from both Xbox 360 and PlayStation 3 SKUs were distributed too - marketing initiatives aimed specifically at the hardcore gamers. If Crytek hadn't released these videos, the chances are that games media outlets would have made their own - in this situation, the developer/publisher remains in control of the media assets being created for the game while answering the questions the audience has about how the title looks on their platform.
Digital Foundry has created a number of gameplay trailers, and our hardware has been used by developers on countless others. Our tips for getting the best-looking assets out there are fairly straightforward. As we see it, there are three major technical elements to bear in mind when it comes to the creation of video assets such as gameplay capture or trailers.
First up, it's worth bearing in mind that internet video plays out in a very different colour space to the output of the video games consoles. HDMI output uses 24-bit RGB, and most capture cards immediately convert this to 16-bit YCbCr (YUY2), resulting in a downsampling of the chroma data (something we set out to avoid with our own hardware). This occurs because YUY2 is the favoured output format of the HDMI ports of the camcorders most capture cards were designed to work with: game capture is a very niche market.
The core data then gets squeezed down again when encoding internet video, this time being converted into the 12-bit YV12 format. The chances are that your video will not be viewed at 720p, so chroma takes a hit again when re-encoded into standard def formats, emphasising the compromises.
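The chain of conversions above can be put into rough numbers. The sketch below simply does the bits-per-pixel arithmetic for the three pixel formats mentioned; the 720p frame size is used purely as an illustration.

```python
# Rough bits-per-pixel arithmetic for the pixel formats discussed above.
# Each format keeps full-resolution luma but subsamples the two chroma
# planes progressively harder.

def bits_per_pixel(luma_bits, chroma_bits, h_subsample, v_subsample):
    """Average bits per pixel: full-res luma plus two chroma planes
    subsampled horizontally and vertically."""
    return luma_bits + 2 * chroma_bits / (h_subsample * v_subsample)

# RGB24: three full-resolution 8-bit channels, no subsampling.
rgb24 = 3 * 8                          # 24 bits per pixel
# YUY2 (4:2:2): chroma halved horizontally only.
yuy2 = bits_per_pixel(8, 8, 2, 1)      # 16 bits per pixel
# YV12 (4:2:0): chroma halved in both directions.
yv12 = bits_per_pixel(8, 8, 2, 2)      # 12 bits per pixel

width, height = 1280, 720              # illustrative 720p frame
for name, bpp in (("RGB24", rgb24), ("YUY2", yuy2), ("YV12", yv12)):
    kib = width * height * bpp / 8 / 1024
    print(f"{name}: {bpp:.0f} bpp, {kib:.0f} KiB per 720p frame")
```

Half of the chroma information is gone by the time the signal leaves a typical capture card, and two thirds by the time it reaches the viewer.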
All you can do here is to be aware of the issue and be selective in the clips you use - any kind of video using pure reds or blues can look pretty poor, but you won't know how poor until the final asset is encoded. HUD components using red or blue persistent elements will show this artifact and should be avoided.
The next element to factor in is video compression. Capture and editing typically takes place using an intermediate codec such as Apple's ProRes or the superior PC/Mac cross-platform alternative, CineForm HD. However, when exporting your final video, the usual route is to use the editing system's standard h.264 compressor - with Final Cut Pro this is Quicktime. Apple's implementation of h.264 is very limited compared to the full potential of the actual spec, so for improved picture quality at the same level of bandwidth, and for more encoding options, it's highly recommended that you install the free x264 Quicktime encoder on a Final Cut Pro system. This open source h.264 encoder is swiftly becoming the industry standard (it's used by YouTube and Gaikai to name but two), and it's not only faster than Quicktime but offers visibly superior results and better compression.
Sticking with Quicktime is fine so long as you give your edit enough bandwidth to retain as much quality as possible - but how much bandwidth you require varies according to the material you are encoding. A slow moving game with muted colour schemes such as The Chronicles of Riddick or Alan Wake will require far less bandwidth than something like the colourful, action-packed Bayonetta.
So what's the solution? A quality-based encode (CRF in x264, for example) allows you to specify a set quality level that every frame will adhere to, ensuring you'll get the result you want. On the CRF scale, where lower numbers mean higher quality, 23 is the lowest quality we'd recommend for a source asset, while anything below 17 will be a waste of bandwidth - the visual refinements will go unnoticed by the human eye.
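As a concrete sketch of a quality-based encode, here is how that CRF band might be applied through ffmpeg's libx264 wrapper rather than a Quicktime export. The flags are standard ffmpeg options, but the file names and the default CRF value are illustrative assumptions, not part of any official workflow.

```python
# A hedged sketch of a constant-quality (CRF) H.264 encode via ffmpeg's
# libx264 wrapper. File names are placeholders.

def crf_encode_command(source, output, crf=18, preset="slow"):
    """Build an ffmpeg command line for a quality-based encode.

    CRF 17-23 is the band suggested above: 23 at the low-quality end,
    anything below 17 spending bandwidth on invisible refinements.
    """
    if not 17 <= crf <= 23:
        raise ValueError("CRF outside the 17-23 band recommended for source assets")
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",
        "-preset", preset,   # slower presets = better compression at the same CRF
        "-crf", str(crf),    # constant quality: every frame meets the same target
        output,
    ]

print(" ".join(crf_encode_command("trailer_720p60.mov", "trailer_final.mp4")))
```

The point of a quality-based encode is that bandwidth allocates itself: Bayonetta-style action simply consumes more bits than a slow, muted Alan Wake sequence at the same CRF.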
Once media outlets have their hands on your video, you are effectively at the mercy of their encoders (many of which are painfully poor), but the rule of thumb is that the better the quality of the source video, the better the result of a second generation encode.
The final factor to bear in mind is the effect that screen-tear has on the quality of your final video asset. Games consoles update at 60Hz, and so a torn frame is displayed for 16.66ms. As internet video assets are typically supplied at 30 frames per second, tearing can look twice as bad as it actually is - any torn frames will be displayed for twice as long as they would be during actual gameplay.
Most trailers and game captures are captured and edited at 60FPS, with the final asset then downsampled to the internet-standard 30FPS: literally every other frame is thrown away. Which alternate set of frames gets discarded is entirely arbitrary, and that can have very serious ramifications for the quality of your video.
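A toy example makes the doubling concrete; the frame labels and the even/odd split below are hypothetical, but the arithmetic is exactly the 60Hz-versus-30FPS case described above.

```python
# Why naive 60->30 fps decimation can make tearing worse: a torn frame
# lasts one 60Hz refresh during play, but if the arbitrary every-other-
# frame drop happens to keep it, it becomes a full 30 fps frame, shown
# for twice as long. Frame labels are made up for illustration.

refresh_60hz_ms = 1000 / 60   # ~16.67 ms: torn frame's lifetime in play
frame_30fps_ms = 1000 / 30    # ~33.33 ms: its lifetime in a 30 fps video

frames = ["clean0", "torn1", "clean2", "clean3"]  # hypothetical 60 fps capture
print(frames[0::2])  # one arbitrary choice: ['clean0', 'clean2'] - tear dropped
print(frames[1::2])  # the other choice:     ['torn1', 'clean3'] - tear kept
print(f"A kept torn frame shows for {frame_30fps_ms / refresh_60hz_ms:.0f}x as long")
```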
V-sync clean-up is built into Digital Foundry's performance analysis tools: since we can ascertain the exact point where a frame is tearing, it's relatively straightforward to then re-stitch the video into a coherent whole, and the comparison footage from our tools shows the improvement in visual quality you get.
While our tools automate the procedure, there is a relatively painless way to do it yourself and at the very least remove the worst of the tearing artifacts before your assets are shipped off to the media.
The majority of console games operate at a capped 30 frames per second, dropping v-sync when frame-rate dips below that target. In this case, torn frames are always immediately followed by clean frames. Even if your editing suite is discarding the good frames in creating the 720p30 asset, it is still a relatively simple procedure to ensure that only the compromised frames are being tossed away.
The first step is simply to create your trailer as per normal and export your conventional 720p30 asset. The second step is to export (preferably in a lossless format, but ProRes or CineForm will do) a 720p60 video and then re-import this into a new editing project.
Place your video on the timeline, then immediately below it place the same video on a second track with a one-frame offset (and its audio turned off) - literally just move the copy along by a single frame. Now watch your 720p30 export, see where the tearing is most obviously visible, and simply cut those areas out of the top track on the timeline. Re-export, and all the torn areas you identified should now be fixed.
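The two-track trick can be modelled in a few lines of code. Assuming a 30FPS-capped game where a torn frame is always followed by a clean one, picking the clean member of each 60FPS frame pair is exactly what cutting through to the offset track achieves; the function name and frame labels here are hypothetical.

```python
# A toy model of the one-frame-offset de-tearing trick: given a 60 fps
# capture of a 30 fps-capped game, every frame pair contains at most one
# torn frame, immediately followed by a clean one - so always keep the
# clean member of the pair when decimating to 30 fps.

def detear_60_to_30(frames):
    """frames: list of (image, is_torn) tuples at 60 fps.
    Returns one image per pair, preferring the untorn frame - the same
    effect as cutting to the offset track wherever tearing shows."""
    out = []
    for i in range(0, len(frames) - 1, 2):
        first, second = frames[i], frames[i + 1]
        out.append(second[0] if first[1] else first[0])  # swap to offset copy if torn
    return out

# Four 60 fps frames: one torn pair, one clean pair.
capture = [("f0", True), ("f1", False), ("f2", False), ("f3", True)]
print(detear_60_to_30(capture))  # ['f1', 'f2']
```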
If the game being captured tops out at 60 frames per second and exhibits tearing (Gran Turismo 5 and Bayonetta are good examples), then this technique won't work as the raw video stream will have consecutive torn frames. Short of a heroic effort in Photoshop operating on a frame-by-frame basis, you're stuck and solutions such as Digital Foundry's re-stitching algorithm really are the only way forward.
Of course, if you have a game that is running at 60Hz, that is a marketable selling point - but one that the limitations of streaming internet video cannot illustrate. There's nothing stopping you streaming your own content at 60Hz, but as our experiments using the Flash player demonstrate fairly clearly, even at standard def you cannot get anything like a reliable and consistent stream.
However, there is nothing to stop you providing 60Hz downloads from your own website or even from the PlayStation Network. While Sony's guidelines discourage the use of 60Hz video, there are one or two examples available to download on PSN - Housemarque's Super Stardust HD tips videos are encoded to the standard, for example. You can export straight from a Final Cut or Premiere Pro timeline directly into h.264 using Quicktime, but this is not the optimal solution for the reasons pointed out earlier: Apple is more concerned with video compliance across its own devices than it is with getting the most bandwidth-efficient encodes.
Our approach is to stick with x264 and use the StaxRip front-end. This provides a ready-to-run PS3 profile that targets all the strengths of Sony's decoder. Simply change the preset to "Placebo" to make the encoder do the most exhaustive search for bandwidth-saving motion vectors and then set x264 on its way. While max settings 720p60 can also be supported in the WMV files required for the Xbox Live Marketplace (Microsoft's own free Expression Encoder 4 is thankfully excellent, and it's the only show in town), unfortunately we've yet to see a single example posted on the network: a shame, because the 360 decodes this standard with no problems at all.
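For anyone working outside StaxRip's GUI, a roughly equivalent x264 command line might look like the sketch below. The profile and level values are my assumptions for a PS3-decodable 720p60 stream - they are not taken from Sony's guidelines or from the StaxRip profile itself.

```python
# A hedged command-line equivalent of the StaxRip route: x264's own CLI
# with the "placebo" preset for the most exhaustive motion search.
# Profile/level are assumed values for PS3 playback, and file names are
# placeholders.

def ps3_style_x264_command(source, output, crf=18):
    return [
        "x264",
        "--preset", "placebo",   # exhaustive search for bandwidth-saving motion vectors
        "--profile", "high",     # assumption: within the PS3 decoder's capabilities
        "--level", "4.1",        # assumption: a safe ceiling for 720p60 on PS3
        "--crf", str(crf),
        "-o", output, source,
    ]

print(" ".join(ps3_style_x264_command("trailer_720p60.y4m", "trailer_ps3.264")))
```

The placebo preset trades encoding time for compression efficiency, which is exactly the right trade for a one-off marketing asset that will be downloaded many thousands of times.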
Video game trailer production remains something of an arcane art - hopefully these tips will help in improving the quality of your final assets, and understanding what happens when they're "out there"...