Tech Focus: L.A. Noire's MotionScan Animation
Depth Analysis discuss L.A. Noire's astonishing facial animation tech
A monster hit worldwide, Team Bondi's L.A. Noire has garnered plaudits for its innovative new motion capture techniques, which represent a colossal leap forward in the fidelity of in-game acting. The developers brought over 400 actors into their Los Angeles studio, digitising every aspect of each performance via a ring of 32 cameras and turning that data into beautifully animated 3D models, which are then integrated into the game.
The result is simply remarkable. The actors in L.A. Noire are so realistically rendered that facial recognition tools from the likes of Picasa can be used to identify the performer from screenshots alone, while the animation is so authentic that lip-readers have been able to easily follow the dialogue without subtitles. MotionScan is another great example of how innovative developers are taking their tech to the next level without the need for a new generation of gaming hardware.
Part of the success of the L.A. Noire motion capture tech is down to the creation of a sister studio, Depth Analysis, who are continually improving and refining MotionScan, while offering the technology to partners throughout the gaming and motion picture businesses - and beyond.
In this interview, Digital Foundry discusses MotionScan with Oliver Bao, head of R&D at Depth Analysis, and Jennie Kong, head of communications.
Q: Depth Analysis seems to have evolved in concert with Team Bondi and the production of L.A. Noire. Was it always your intent to incorporate MotionScan into L.A. Noire right from the earliest beginnings of the game's production? I'm envisaging a set-up similar to Peter Jackson developing WETA alongside his film production company.
Jennie Kong: Depth Analysis was born alongside Team Bondi studio when L.A. Noire was conceived. It was apparent from the very beginning to the team that if the game was to potently hold its own as a true detective game, where interrogation of the game's suspects was key, then a new capture technology would be instrumental.
Team Bondi had worked with existing mocap systems back in 2004 and, understanding the limitations of each, we knew that it would have been a stretch to achieve the high degree of subtlety needed for the type of game we envisioned making. We needed a technology to reproduce performance as realistically as possible. So yes, it was always our intent to create MotionScan from the beginning, though the shape and scope of it evolved with the game. In many ways, L.A. Noire influenced the technology, and MotionScan influenced some of the game too. I guess the rest is history.
Q: To what extent did the development work for L.A. Noire help to define and improve the feature-set of MotionScan?
Jennie Kong: There were a few minor things that we worked on. A couple of examples include: idle loops (animation loops of a character in some emotional state), blending between animations, skinning to allow head movements, texture transfer between frames, etc.
Q: Across the development of MotionScan, you've moved from six to 26 to 32 cameras. What were the technological demands that prompted this increase in raw data? What do 32 cameras give you that 26 don't?
Oliver Bao: I would say redundancy mainly; should four cameras not capture properly during the shoot, we are still able to use the take. Having the extra samples also allows us to reconstruct a better 3D surface in general and to remove reconstruction noise a little more easily.
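To make the redundancy argument concrete, here is a minimal Python sketch. It assumes that cameras are grouped into fixed stereo pairs and that a pair is simply discarded if either of its cameras fails. Both the pairing scheme and the function name `surviving_pairs` are illustrative assumptions, not details of the actual MotionScan pipeline.

```python
from typing import List, Set, Tuple


def surviving_pairs(num_cameras: int, failed: Set[int]) -> List[Tuple[int, int]]:
    """Return the stereo pairs in which both cameras captured correctly.

    ASSUMPTION: cameras 0 and 1 form a pair, 2 and 3 form a pair, and so on.
    A pair is dropped entirely if either of its cameras is in `failed`.
    """
    pairs = [(i, i + 1) for i in range(0, num_cameras - 1, 2)]
    return [p for p in pairs if p[0] not in failed and p[1] not in failed]


if __name__ == "__main__":
    # Best case: the four failed cameras share two pairs.
    print(len(surviving_pairs(32, {0, 1, 2, 3})))
    # Worst case: each failed camera belongs to a different pair.
    print(len(surviving_pairs(32, {0, 2, 4, 6})))
```

Under these assumptions, a 32-camera rig that loses four cameras still keeps between 12 and 14 of its 16 pairs, which is one way to read the claim that the take remains usable.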
Q: There's been a lot of discussion about the "uncanny valley". You did a lot of testing across a number of years before the production capture of L.A. Noire began. In what ways did you need to refine and alter the tech to get a basic look that seemed "right"?
Jennie Kong: Once we had the foundation of the MotionScan rig up and running, the team worked on a number of tests with the goal of making it look as realistic as possible. Our focus then turned to ensuring that we could render cost-effectively for a video game console. When we had refined the technology and capture rig to achieve the most realistic presentation, we placed a lot of importance on getting the right "look" by fine-tuning it to match what the current-gen consoles could do, or else it wouldn't blend in with the rest of the game world.
Q: Let's talk about the process and tech used in acquiring the raw data. What type and spec of camera do you use, and why choose this model specifically?
Oliver Bao: We chose Camera-Link machine vision cameras with a resolution of 1600x1200 at 30fps back in 2005/2006. These were the best we could get back then within our price range, with each unit already calibrated for colour and gain at the factory - that is, all units are identical in image quality. With facial performance, we didn't really need high frame rates (60fps or above), and two megapixels was the highest resolution we could get. Machine vision cameras have a small form factor, are robust, work well in controlled environments, are cheaper, and are often used in remote sensing or monitoring.
Q: Do you work with lossless streams from the camera, or do you use something like the RED wavelet codec to keep the data rate manageable?
Oliver Bao: We work with lossless streams from the camera. We do perform lossless compression (around 2:1) on the capture machines before the data is written to HDD.
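For a sense of scale, the figures quoted in the interview (1600x1200, 30fps, 32 cameras, roughly 2:1 lossless compression) can be turned into a back-of-envelope data rate. The sketch below assumes one byte per pixel, as for 8-bit raw sensor output. The interview does not state the bit depth, so the results should be read as order-of-magnitude estimates only.

```python
# Rough capture data-rate estimate from the figures quoted in the interview.
# ASSUMPTION: 1 byte per pixel (8-bit raw sensor output). The interview does
# not state the bit depth, so treat the results as order-of-magnitude only.

WIDTH, HEIGHT = 1600, 1200   # per-camera resolution (from the interview)
FPS = 30                     # capture rate (from the interview)
CAMERAS = 32                 # rig size (from the interview)
COMPRESSION_RATIO = 2.0      # "around 2:1" lossless (from the interview)
BYTES_PER_PIXEL = 1          # assumption, not from the interview


def raw_bytes_per_second(cameras: int = CAMERAS) -> int:
    """Uncompressed bytes per second produced by `cameras` cameras."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * cameras


def stored_bytes_per_second(cameras: int = CAMERAS) -> float:
    """Bytes per second written to disk after lossless compression."""
    return raw_bytes_per_second(cameras) / COMPRESSION_RATIO


if __name__ == "__main__":
    mb = 1_000_000
    print(f"per camera, raw: {raw_bytes_per_second(1) / mb:.1f} MB/s")
    print(f"whole rig, raw:  {raw_bytes_per_second() / mb:.1f} MB/s")
    print(f"whole rig, disk: {stored_bytes_per_second() / mb:.1f} MB/s")
```

With the one-byte-per-pixel assumption, this works out to 57.6 MB/s per camera and roughly 1.8 GB/s across the rig before compression, or about 0.9 GB/s written to disk. A higher bit depth would scale those numbers up proportionally.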
Q: Once the image is acquired, your servers piece together a 3D map of the performance assimilated from the 32 cameras. Can you give us some idea of how this is achieved and what your software does to overcome potential errors? How important are elements like the ambient lighting in ensuring accuracy?
Oliver Bao: We look to stereo reconstruction to generate a 3D patch per camera pair. These 16 patches are then aligned to form a single point cloud, and a mesh is generated with noise filtered out as much as we can. We then fit a regular mesh on top, in conjunction with temporal filtering, to ensure smooth rendering.
The mesh sequence is textured, compressed and packaged for the client to use at their chosen settings. The capture system assumes a Lambertian surface, such that its appearance is view-independent. We lit the capture volume as flat as possible to allow re-lighting in-game later in real time.
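The Lambertian assumption mentioned here can be illustrated with a short Python sketch. In the standard Lambertian (ideal diffuse) model, the brightness of a surface point depends only on the angle between its normal and the light direction, not on where the camera is. The function and parameter names below are hypothetical and are not taken from Depth Analysis's software.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def lambert(albedo: float, normal: Vec3, light_dir: Vec3, view_dir: Vec3) -> float:
    """Ideal diffuse intensity: albedo * max(0, n . l).

    `view_dir` is accepted only to show that it plays no part in the result.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    return albedo * max(0.0, dot(n, l))


if __name__ == "__main__":
    normal, light = (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
    # Two different camera positions see the same intensity...
    print(lambert(0.8, normal, light, view_dir=(1.0, 0.0, 0.0)))
    print(lambert(0.8, normal, light, view_dir=(0.0, 1.0, 1.0)))
    # ...while moving the light changes it, which is what re-lighting relies on.
    print(lambert(0.8, normal, (0.0, 0.0, 1.0), view_dir=(1.0, 0.0, 0.0)))
```

If skin behaved exactly like this, every camera in the ring would record the same colour for a given point on the face, which simplifies matching points between views. Flat lighting at capture time then leaves the texture close to plain albedo, so a game engine can apply its own lighting afterwards.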
Q: Your cameras acquire data at 30 frames per second - very much a video games standard. Movies operate at 24fps, while film-makers like James Cameron and Peter Jackson are keen to move to 48fps or even 60fps. To what extent is Depth Analysis compatible with cutting edge movie requirements?
Oliver Bao: Our reasons for not operating above 30fps before came down mainly to cost and to the storage requirements (capacity and write speed) of our immediate video game projects. For the next rig, we are planning to move to higher frame rates.
Q: Can you foresee applications for MotionScan outside of the movie and games industries?
Jennie Kong: Indeed, any industry where training and roleplay (with talking heads) may be needed would benefit from MotionScan technologies.
Q: Can you talk a little about the integration process of the MotionScan data into L.A. Noire? Were you using the Rockstar RAGE engine? How easy was it to incorporate the new tech?
Jennie Kong: Neither Team Bondi nor Depth Analysis used Rockstar's RAGE engine in L.A. Noire. Team Bondi developed their own engine in 2004, and Depth Analysis' decompression code was provided to them to fit within that engine. This approach is how we've been leveraging MotionScan with other new clients.
Q: What are the major considerations game developers should factor into their existing tech in order to accommodate MotionScan? Would we be right in assuming that higher poly models for the character faces are essential in making the most of the system?
Oliver Bao: Depth Analysis works closely with developers to discuss the goals of their game and how MotionScan can be used to support their project. We would ensure necessary steps are taken to be compatible with their existing technology and vice versa - if they want to use a lot of close-ups of their characters, then yes, higher poly counts would be essential and MotionScan supports that level of detail.
Jennie Kong: I think the larger consideration of any studio wanting to maximise MotionScan from the ground up is to consider how they may want to tell the story through the tech and the performance of the characters. For this approach, they would need to treat the shoot more like a film, which involves a new set of considerations that other facial rigs do not pick up on. Because with MotionScan what you shoot and see is what you get in the game, the game producer and director will need to think about things such as continuity with their actors (losing or gaining weight, getting tanned and so on) and hiring great actors for the best performance.
Q: With L.A. Noire you are using direct likenesses of the actors involved. To what extent is the MotionScan data flexible? Can you remap the animation onto, say, an alien face? Or would the best approach be to put the original actor through make-up to achieve a similar effect?
Oliver Bao: Working with L.A. Noire was a straightforward process as we wanted everyone to look like themselves. The system is like filming in 3D; what you see is what you get. In our experience, it was faster to shoot more variations than to touch up the animation in post-production later.
Many customers keen to use MotionScan have already asked for retargeting and we're currently looking into it. As MotionScan strives to capture and present the most authentic capture, it would be tough for something like an alien face, because who can definitively say how an alien face is supposed to behave? But yes, we are looking at non-human capture and the challenges around presenting that.
Q: Pretty much the only real criticism of L.A. Noire's animation concerns the disconnect between body movement and the MotionScan facial data. MotionScan focuses firmly on the face - going forward, how can this situation be improved?
Jennie Kong: We would have loved to have spent more time on fine-tuning that for L.A. Noire but it wasn't feasible due to the scope of the scripting and talent involved. Moving forward, we will be developing full body capture and so anticipate that this will no longer be an issue once that technology is ready for commercial use.