Ghost in the Machine
Alex Kipman, the man behind the Kinect technology, reveals why you and Microsoft's new kit have more in common than you think
Microsoft's Kinect has been touted as an entirely new evolutionary branch of videogaming, but it's also been dismissed by some as a casual gaming dead end. Whichever side you fall on, it's an impossible development to ignore.
What powers Kinect is some unique and very powerful technology, technology which is the brainchild of a team led by Alex Kipman, Microsoft's director of incubation. GamesIndustry.biz spoke to Kipman on the eve of Kinect's US launch, in an interview which will be published in two parts.
While much of Kinect's technology remains secretive, Kipman here explains the thinking and processes behind the motion controller, and how he feels it can change the face of gaming forever.
Q: We heard a lot, when Kinect's technical specifications were first made public, about an on-board chip which meant that all processing would be self-contained, that no extra load would be placed on the 360 itself, but this has since been removed, placing processing load on the console's own chip - what was the reasoning behind that?
Alex Kipman: The answer is simple - at the end of the day there's an understanding in research that we did. 'How close are we to hitting the theoretical ceiling of the Xbox 360 with the games that we have and are creating for this generation?'
The answer is, as much as we like to talk about bits and percentages, you take a game like, I don't know, Call of Duty: Black Ops - there's a significant amount of processing, be it CPU or GPU, that still remains on the table. So after that, when we came to this revelation about games, and future games that would be coming to Xbox, we looked at it and we said - 'is it worth the trade-off to put on-board processing on the device when we think we can create magical, unique, deep, thorough experiences without it?'
That trade-off is easy, it's about the affordability of the device. From the perspective of bringing to market this amazing deal, £129.99 with Kinect Adventures, plus sensor - buy one and have your entire family play, it's a very interesting customer value proposition. We can create games which are as rich and thorough and as deep as the games which we have on our platform today and which we will have tomorrow.
Then the conversation becomes simple: you start moving into a world which says, why keep something complicated when you can make it simple? We decided to have our cake and eat it too.
Q: But does it not mean that, at some point down the line, a developer is going to hit that 85 or 90 per cent CPU capacity and say 'if I want Kinect control, I'm going to have to cut something out of my game?'
Alex Kipman: Well, not that figure of 10 or 15 per cent, we're actually in single digits, but the philosophy is correct. It's a trade-off. As we create games, you can think about the platform as a set of paints and paintbrushes. You can think about our game creators as the painters which use this palette to paint.
What Kinect brings to the table is a new set of paints and paintbrushes, it broadens the palette and allows you to do different things. Not all features are created equal, you can totally imagine a game that's using practically the entirety of the Xbox 360 and still uses identity recognition. You can have a game that uses a small vocabulary of voice recognition that will still have pretty much 100 per cent of the processing. And on and on.
You can shop, in a way, in the platform by menu, and you can choose the paint colours and paintbrushes you have. This is no different than saying, 'what physics engine, what AI engine, what graphics engine' you're going to be using. I can make the same argument that, hey, I'm going to be using Engine X off the shelf, I'm going to be giving up control over the hardware. There's some amount of resources that I give up for the price of the flexibility and the time to market of using a middleware engine.
Same thing applies here. At the end of the day you have to choose the correct set of paint colours to tell the stories you want. Now when I look at Kinect, it really allows us to create brand new experiences. Experiences that you haven't been able to see or have before. As I talk to the creative folk around the world, the people who are storytellers, both inside Microsoft and outside of Microsoft, their eyes light up. They're storytellers, they look at Kinect as a set of tools that allows them to tell stories that they've always wanted to tell and haven't been able to. Kinectimals is a great example of this.
Going forward, that remains true - as people learn to use the palette, it's the beginning of a journey, not the end. As we evolve the palette, things are going to become more and more interesting. Like looking at the past as a means to predict the future.
Just look at the evolution of Xbox Live, from where we launched on the original Xbox to where we are today. Just look at the evolution of any franchise from when they launched on 360 to where they are today. Look at Halo 2 compared to Halo: Reach. Fable to Fable III. Call of Duty to Modern Warfare or Black Ops. We didn't change the hardware, we didn't change the platform. These franchises look dramatically different today to when they started. The same is true about Kinect going forward, in terms of allowing people to tell brand new stories.
Q: Can you break down the data pipeline in terms of Kinect's latency for us a little bit please? Using USB must mean there's a baseline lag that can't be optimised any further - alongside the rest of the processing does that mean that there's a certain level of unavoidable lag?
Alex Kipman: I would say no. I'll give you a real world example and then I'll try and break it down for you. Take driving. Driving was one of the first experiences that we showed, with Burnout Paradise, it's one of the cool experiences we'll have at launch with Kinect Joyride. Driving as a genre is a genre where, if you have any noticeable lag, you can't play the game - you'll oversteer or you'll understeer. That results in a sub-optimal experience.
In this world you can be, and we are, predictive, about where we're going to be. You can use strategy. I'll just mention one, but there are many, where you can precisely understand where you're going to be before you're there.
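One simple way to picture being "predictive" is linear extrapolation: estimate how fast a tracked value is changing and project it forward by the expected delay. The sketch below is only an illustration of that general idea, not Kinect's actual method. The `Sample` type, the `predict_angle` function and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float      # timestamp in seconds
    angle: float  # tracked steering angle in degrees (hypothetical input)


def predict_angle(prev: Sample, curr: Sample, latency: float) -> float:
    """Extrapolate the angle `latency` seconds past the newest sample."""
    dt = curr.t - prev.t
    if dt <= 0:
        # No usable velocity estimate, so fall back to the latest reading.
        return curr.angle
    velocity = (curr.angle - prev.angle) / dt  # degrees per second
    return curr.angle + velocity * latency


prev = Sample(t=0.0, angle=10.0)
curr = Sample(t=0.5, angle=20.0)
print(predict_angle(prev, curr, latency=0.25))  # prints 25.0
```

A real system would presumably smooth the velocity estimate and limit how far ahead it projects, since a straight-line guess overshoots when the player changes direction.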
That's a generic answer, but let me break it apart for you in bits and pieces. The first thing you have to think about, when you're thinking about Kinect titles, is that we're moving from a digital world - a world of zeroes and ones, a world of cause and effect, of yes and no - into an analogue world, where you are the controller. In that world, where you are the controller and we're looking at the real world, understanding human speech, using motion and identity recognition, this is not a world of yes and no. It's a world of maybes.
It's not a world of true and false, this is a world of probables. From that perspective, you have to break the problem apart differently. So if you think about it, the actual human introduces, and forget about USB, the devices, anything like that, the actual human introduces lag. But differently. If you look at the physical space that you have to traverse, to move your thumb on a joypad, and you look at the physical space you have to traverse to drive a car, or punch someone, or paddle down the river - it takes you longer.
You as a human are going to take longer to traverse the real space because you're actually traversing more physical space. So the first kind of component that we think about, and have to worry about, is the actual human factor and what the human does in terms of adding lag into the system. The next one is about physics. And physics laws, well, they're laws, they're not subjective. Light only travels so fast, and there are plenty of other rules that people have come up with that we can't work around.
In the world of zeroes and ones, all you're doing is sending zeroes and ones down a pathway. In our world, we're actually perceiving the world. We are visualising the world and we're understanding the acoustic characteristics of the world. You know what, that takes longer as well. Now, pass all of this rich data to the console, where the Kinect brain lives, and there's more processing. In the world of zeroes and ones, zero means accelerate, one means brake.
In our world, as you correctly identified, there's a whole heck of science fiction turned science fact to really work in terms of our sophisticated set of algorithms that translate all of this noisy data of voice and visuals into human understanding, full body motion, identity recognition, voice recognition, and that takes time.
So when I look at the entire chain, look at what the human adds, what the physical barriers add in terms of laws of physics and what processing adds, you find out pretty quickly that simply adding these numbers up means you wouldn't be able to drive a car.
As a matter of fact you would find that there's a reason that these kinds of science fiction turned science fact technologies haven't been available before. And this is where I'll tell you, 'hey, there's been a breakthrough'. Quite a significant number of them. We've introduced them into the pipeline to essentially erase that lag - and essentially be comparable in terms of the immersion you get and the responsiveness you get.
The best way to experience this is to see it. And it can be from you using your hands or your voice to navigate the dash, and seeing this ability, precision and lag-free behaviour of the cursor, all the way to playing any one of our games from Dance Central to Joy Ride, which is my favourite example, because if it were laggy, you wouldn't be able to drive that car.
Q: We did notice, though, in one or two games, a slight lag to some of the actions. In Kinect Adventures' river rafting section for example, whilst other things, like Dance Central, seemed very responsive. Is that something inherent in the software design rather than being a hardware issue?
Alex Kipman: That's an interesting one. You're right, on the platform side, and Dance Central is a great example, there is no lag. Let me talk about Kinect Adventures for a minute, because what I'm definitely not saying is that there's a half-second of lag on Kinect Adventures.
Kinect Adventures, as a title, is designed not only to enable you to have fun, and to have fun instantaneously, but also to teach you about the platform. It's one of the titles that's supposed to be simple, fun and approachable to everyone.
From that perspective it's meant to be an easy-in, easy-out type of experience where a lot of people can be having fun together, cheering each other on. This is the game that probably went through the most playtesting, and the most usability testing to make sure we tuned the experience so that, being the packed-in title, it's something that everyone can enjoy. Something that everyone can learn from, and gain an understanding of how to use the system.
I'm not sure what you've seen, which makes it hard for me to answer, but I will say that in terms of game mechanics, those mechanics were optimised to make sure that they were instantly fun and accessible to everyone. Does that answer your question?
Q: Well, specifically we experienced some lag with the jumping.
Alex Kipman: Okay, well now I think I'm narrowing down on your problem and I can give you a more precise answer. On the jumping side, you have to think about the amount of fun people are having in the living room. This is the part where I say, we playtested the crap out of this game. If you have kids moving up and down, the last thing you want to do is have a false positive.
The last thing you want to do is have that raft or the obstacle course, or the avatar somewhere jumping when you are not jumping. If you take little kids, running around the room, this is where the human lag comes into play. You look at it and you say, 'how do I ensure, positively, that really was a jump?' How to make sure it wasn't just a crouch?
Think about the range of human motions. If I just crouch, and lift myself really quickly, how's that different from a jump? The answer is that you can detect those things, but you have to be very careful. This is where the game is tuned to make sure that you are building trust in the robustness of the platform. The last thing we wanted was to be getting a lot of false positives on River Rush because people in our playtest lab were having tons of fun, playing it.
So I'm not saying there's a half a second of lag, in that game, but I will say that, for the thing that you talked about, it's more about being intentional about design and making sure we're only actuating mechanics and actions when we're one hundred per cent sure that they've occurred in the living room.
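The trade-off described here, waiting until a jump is certain rather than reacting to the first upward movement, can be sketched as a detector that only fires after several consecutive frames agree. This is a generic debouncing illustration, not the Kinect Adventures code. The `JumpDetector` class, the hip-height input and both tuning values are assumptions made for the example.

```python
class JumpDetector:
    """Fire once per jump, and only after several frames confirm it."""

    def __init__(self, rise_threshold: float = 0.15, confirm_frames: int = 3):
        self.rise_threshold = rise_threshold  # metres above standing height (assumed)
        self.confirm_frames = confirm_frames  # frames that must agree (assumed)
        self._streak = 0
        self._fired = False

    def update(self, hip_height: float, standing_height: float) -> bool:
        """Return True only on the frame where a jump is confirmed."""
        if hip_height - standing_height > self.rise_threshold:
            self._streak += 1
        else:
            # Back at or below standing height: reset and allow a new jump.
            self._streak = 0
            self._fired = False
        if self._streak >= self.confirm_frames and not self._fired:
            self._fired = True
            return True
        return False


detector = JumpDetector()
heights = [1.0, 1.2, 1.0, 1.2, 1.2, 1.2, 1.2, 1.0]
events = [detector.update(h, standing_height=1.0) for h in heights]
print(events)  # the one-frame blip at index 1 is ignored; index 5 fires
```

Raising `confirm_frames` makes false positives rarer but delays the reaction by the same number of frames, which is the kind of deliberate design choice the answer describes.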
Let's bring it to a world we understand - let's talk philosophy for a little bit. This is no different to thinking about a racing genre. Think about the difference between a Forza and a PGR. One has an arcadey feel to it, the other one has a simulation feel to it. As a game designer I choose where I put the needle. If I'm a Forza designer then I'm going against the demographic of customers that really enjoys the simulation level of the experience. I'm going to be extremely precise, to the point that, if you're not good, you're not going to be able to drive the car.
Project Gotham Racing is made to be much more approachable. More arcadey in its driving, less simulation. It allows for a different type of experience. The same is true here, and I'll use the two examples we just talked about. Dance Central is made to be a simulation type game. You're simulating dancing - you want it to be precise, you want it to be real-time and guess what? If you're not good at it, you're just going to suck at dancing. But they also have the 'Break it Down' feature that really teaches you how to be a great dancer.
Take Kinect Adventures, the needle moves to a different side. It is made to be a fun, simple, approachable game that gets people acquainted with the platform. Those remain toolsets - paint colours and paintbrushes that game designers, the storytellers, get to choose. I think you'll see that the platform has range, that it has the range to go down a more realtime simulation, all the way out to a less simulation sort of game. That's what I think you're seeing here.
Q: You mentioned earlier that the load on the 360 CPU had been brought down to single figures in terms of percentages, previously you'd said in an interview that it would be ten to fifteen per cent. Is it true that a bit of GPU time is being used as well?
Alex Kipman: That is true.
Q: Can you tell us a little bit about what that's being used for?
Alex Kipman: Sure. One of the major key ingredients of the experience is machine learning. Machine learning in our world is defining a world of probabilities. Machine learning, particularly our kind, which is probabilistic, is not really about what you know, it's about what you don't know. It's about being able to look at the world and not see duality, zeroes and ones, but to see infinite shades of grey. To see what's probable. You should imagine that, in our machine learning piece of the brain, which is just one component of the brain, pixels go in and what you get out of it is a probability distribution of likelihood.
So a pixel may go in and what comes out of it may be - hey, this pixel? Eighty per cent chance that this pixel belongs to a foot. Sixty per cent chance it belongs to a head, twenty per cent chance that it belongs to the chest. Now this is where we chop the human body into the 48 joints which we expose to our game designers. What you see is infinite levels of probability for every pixel and if it belongs to a different body part.
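As a rough illustration of "pixel in, probabilities out", the sketch below turns raw per-part scores for a single pixel into a distribution that sums to one, then picks the most likely part. The scores, the part names and both function names are hypothetical. How Kinect actually produces and combines its per-pixel estimates is not described in the interview.

```python
from typing import Dict


def to_distribution(votes: Dict[str, float]) -> Dict[str, float]:
    """Normalise raw per-part scores for one pixel so they sum to 1."""
    total = sum(votes.values())
    if total <= 0:
        raise ValueError("need at least one positive score")
    return {part: score / total for part, score in votes.items()}


def most_likely(dist: Dict[str, float]) -> str:
    """Return the body part with the highest probability."""
    return max(dist, key=dist.get)


# Hypothetical raw scores for a single pixel.
pixel_votes = {"foot": 6.0, "head": 3.0, "chest": 1.0}
dist = to_distribution(pixel_votes)
print(dist)               # {'foot': 0.6, 'head': 0.3, 'chest': 0.1}
print(most_likely(dist))  # foot
```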
That operation is, as you can imagine, a highly, highly parallelisable operation. It's the equivalent of saying, pixel in, work through this fancy maths equation and imagine you get a positive number, a positive answer you branch right, you get a negative answer you branch left. Imagine doing this over a forest of probabilities. This is stuff where you'll get a thousand times performance improvement if you put it on the GPU rather than the CPU.
GPUs are machines designed for these types of operations. The core of our machine learning algorithm, the thing that really understands meaning, and translates a world of noise to the world of probabilities of human parts, runs on the GPU.
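The "branch right, branch left" description resembles a decision forest, and the sketch below shows that general technique on a one-dimensional row of depth values. It is a minimal illustration under stated assumptions, not Microsoft's implementation: the depth-difference feature, the offsets, the thresholds and the part labels are all invented for the example. It runs serially, whereas the point made above is that every pixel can be evaluated independently, which is why the work maps well onto a GPU.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    dist: Dict[str, float]  # probability of each body part at this leaf


@dataclass
class Split:
    offset: int       # hypothetical: which neighbouring pixel to compare with
    threshold: float  # hypothetical: depth difference that decides the branch
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def classify(tree: Node, depth: List[float], i: int) -> Dict[str, float]:
    """Walk one tree for pixel i: positive response goes right, else left."""
    node = tree
    while isinstance(node, Split):
        j = min(max(i + node.offset, 0), len(depth) - 1)  # clamp to the row
        response = depth[j] - depth[i] - node.threshold
        node = node.right if response > 0 else node.left
    return node.dist


def classify_forest(forest: List[Node], depth: List[float], i: int) -> Dict[str, float]:
    """Average the leaf distributions reached in every tree."""
    totals: Dict[str, float] = {}
    for tree in forest:
        for part, p in classify(tree, depth, i).items():
            totals[part] = totals.get(part, 0.0) + p
    return {part: p / len(forest) for part, p in totals.items()}


tree_a = Split(1, 0.5, left=Leaf({"torso": 1.0}), right=Leaf({"hand": 1.0}))
tree_b = Split(-1, 0.5, left=Leaf({"hand": 0.5, "torso": 0.5}), right=Leaf({"torso": 1.0}))
depth_row = [2.0, 2.0, 3.5, 3.5]  # made-up depth readings in metres

print(classify_forest([tree_a, tree_b], depth_row, 1))  # {'hand': 0.75, 'torso': 0.25}
```

In a trained forest the splits and leaf distributions would be learned from labelled examples rather than written by hand as they are here.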
Q: So it sounds like there are a lot of predictive aspects to the Kinect technology, that it's making some guesses and it's a matter of making those guesses as accurate as possible.
Alex Kipman: In a way, yes. A little bit differently though. The Kinect brain works in the same way as your brain, or my brain. Our brains are machines designed to essentially be this massive blob of signal to noise. We push away the noise and very quickly focus in on the signal.
I like to give the example of how you and I came into being, and then you'll understand how Kinect does the same. Imagine we have a fictitious baby. She or he is zero years old, you show this baby a human and a lion, and you say - 'here baby, tell the difference between them'. Turns out that a brand new baby cannot. Time goes by and now this baby has enough reference data, historical data, to be able to predict, next time you show it, the difference between the two.
Now show that baby a male and a female. It's not going to have any idea how to differentiate those things. Some time passes and it has enough historical data, enough training to be able to tell the difference. Now show the baby two females, a random one and one the baby knows, how does the baby tell them apart? The baby can't.
Now fast forward to you and me. The second you meet someone that you've never seen before, you instantaneously have a gigantic amount of information. You know roughly that person's age, you roughly know their size, you roughly know where they're from, whether they're male or female. You roughly know the position that they're in.
Our world works in the same way. Your brain doesn't just know everything that it sees. As you walk through the world, it's using previous historical data to essentially predict, based on probabilities, what you're seeing now. Kinect works in the same way, that's the fundamental principle. What we've done is shown it a sample of statistically significant data which allows us to comprehend the world in a way similar to the way your brain operates.
Let me tell you something slightly different. If I think of the search space - the ranges of human motion, the shapes of bodies, the conditions of living rooms worldwide, I get a search space that will equate to somewhere around ten to the power of 23. I can't even say that number. If I'm doing cause/effect based programming, that's a lot of conditions.
I could have an army of people coding, their entire lives - I still don't have a product that can recognise it. Which is why you have to break it, to move away from traditional programming, you have to create breakthroughs that really allow you to work in a different way. It's not that you code, in the system, what it's going to see, instead you teach it to recognise the real world. If you can teach it, much like your brain, when you get in front of it, it doesn't know, it has to look at previous history to predict the future. In that world you're talking about a much more fuzzy world of signal to noise - you're talking in terms of probabilities instead of actualities.
Part two of the interview with Alex Kipman will be published tomorrow.