Ghost in the Machine

Alex Kipman, the man behind the Kinect technology, reveals why you and Microsoft's new kit have more in common than you think

Feature by Dan Pearson, Contributor

Published on Nov. 3, 2010

Microsoft's Kinect has been touted as an entirely new evolutionary branch of videogaming, but it's also been dismissed by some as a casual gaming dead end. Whichever side you fall on, it's an impossible development to ignore.

What powers Kinect is some unique and very powerful technology, technology which is the brainchild of a team led by Alex Kipman, Microsoft's director of incubation. GamesIndustry.biz spoke to Kipman on the eve of Kinect's US launch, in an interview which will be published in two parts.

While much of Kinect's technology remains secretive, Kipman here explains the thinking and processes behind the motion controller, and how he feels it can change the face of gaming forever.

GamesIndustry.biz

We heard a lot, when Kinect's technical specifications were first made public, about an on-board chip which meant that all processing would be self-contained, that no extra load would be placed on the 360 itself, but this has since been removed, placing processing load on the console's own chip - what was the reasoning behind that?

Alex Kipman

The answer is simple - at the end of the day there's an understanding in research that we did. 'How close are we to hitting the theoretical ceiling of the Xbox 360 with the games that we have and are creating for this generation?'

The answer is, as much as we like to talk about bits and percentages, you take a game like, I don't know, Call of Duty: Black Ops - there's a significant amount of processing, be it CPU or GPU, that still remains on the table. So after that, when we came to this revelation about games, and future games that would be coming to Xbox, we looked at it and we said - 'is it worth the trade-off to put on-board processing on the device when we think we can create magical, unique, deep, thorough experiences without it?'

That trade-off is easy, it's about the affordability of the device. From the perspective of bringing to market this amazing deal, £129.99 with Kinect Adventures, plus sensor - buy one and have your entire family play, it's a very interesting customer value proposition. We can create games which are as rich and thorough and as deep as the games which we have on our platform today and which we will have tomorrow.

Then the conversation becomes simple: you start moving into a world which says, why keep something complicated when you can make it simple? We decided to have our cake and eat it too.

GamesIndustry.biz

But does it not mean that, at some point down the line, a developer is going to hit that 85 or 90 per cent CPU capacity and say 'if I want Kinect control, I'm going to have to cut something out of my game?'

Alex Kipman

Well, not that figure of 10 or 15 per cent, we're actually in single digits, but the philosophy is correct. It's a trade-off. As we create games, you can think about the platform as a set of paints and paintbrushes. You can think about our game creators as the painters which use this palette to paint.

What Kinect brings to the table is a new set of paints and paintbrushes, it broadens the palette and allows you to do different things. Not all features are created equal, you can totally imagine a game that's using practically the entirety of the Xbox 360 and still uses identity recognition. You can have a game that uses a small vocabulary of voice recognition that will still have pretty much 100 per cent of the processing. And on and on.

You can shop, in a way, in the platform by menu, and you can choose the paint colours and paintbrushes you have. This is no different than saying, 'what physics engine, what AI engine, what graphics engine' you're going to be using. I can make the same argument that, hey, I'm going to be using Engine X off the shelf, I'm going to be giving up control over the hardware. There's some amount of resources that I give up for the price of the flexibility and the time to market of using a middleware engine.

Same thing applies here. At the end of the day you have to choose the correct set of paint colours to tell the stories you want. Now when I look at Kinect, it really allows us to create brand new experiences. Experiences that you haven't been able to see or have before. As I talk to the creative folk around the world, the people who are storytellers, both inside Microsoft and outside of Microsoft, their eyes light up. They're storytellers, they look at Kinect as a set of tools that allows them to tell stories that they've always wanted to tell and haven't been able to. Kinectimals is a great example of this.

Going forward, that remains true - as people learn to use the palette, it's the beginning of a journey, not the end. As we evolve the palette, things are going to become more and more interesting. Like looking at the past as a means to predict the future.

Just look at the evolution of Xbox Live, from where we launched on the original Xbox to where we are today. Just look at the evolution of any franchise from when they launched on 360 to where they are today. Look at Halo 2 compared to Halo: Reach. Fable to Fable III. Call of Duty to Modern Warfare or Black Ops. We didn't change the hardware, we didn't change the platform. These franchises look dramatically different today to when they started. The same is true about Kinect going forward, in terms of allowing people to tell brand new stories.

GamesIndustry.biz

Can you break down the data pipeline in terms of Kinect's latency for us a little bit please? Using USB must mean there's a baseline lag that can't be optimised any further - alongside the rest of the processing does that mean that there's a certain level of unavoidable lag?

Alex Kipman

I would say no. I'll give you a real world example and then I'll try and break it down for you. Take driving. Driving was one of the first experiences that we showed, with Burnout Paradise, it's one of the cool experiences we'll have at launch with Kinect Joy Ride. Driving as a genre is a genre where, if you have any noticeable lag, you can't play the game - you'll oversteer or you'll understeer. That results in a sub-optimal experience.

In this world you can be, and we are, predictive, about where we're going to be. You can use strategy. I'll just mention one, but there are many, where you can precisely understand where you're going to be before you're there.

That's a generic answer, but let me break it apart for you in bits and pieces. The first thing you have to think about, when you're thinking about Kinect titles, is that we're moving from a digital world - a world of zeroes and ones, a world of cause and effect, of yes and no - into an analogue world, where you are the controller. In that world, where you are the controller and we're looking at the real world, understanding human speech, using motion and identity recognition, this is not a world of yes and no. It's a world of maybes.

It's not a world of true and false, this is a world of probables. From that perspective, you have to break the problem apart differently. So if you think about it, the actual human introduces, and forget about USB, the devices, anything like that, the actual human introduces lag. But differently. If you look at the physical space that you have to traverse, to move your thumb on a joypad, and you look at the physical space you have to traverse to drive a car, or punch someone, or paddle down the river - it takes you longer.

You as a human are going to take longer to traverse the real space because you're actually traversing more physical space. So the first kind of component that we think about, and have to worry about, is the actual human factor and what the human does in terms of adding lag into the system. The next one is about physics. And physics laws, well, they're laws, they're not subjective. Light only travels so fast, and there are plenty of other rules that people have come up with that we can't work around.

In the world of zeroes and ones, all you're doing is sending zeroes and ones down a pathway. In our world, we're actually perceiving the world. We are visualising the world and we're understanding the acoustic characteristics of the world. You know what, that takes longer as well. Now, pass all of this rich data to the console, where the Kinect brain lives, and there's more processing. In the world of zeroes and ones, zero means accelerate, one means brake.

In our world, as you correctly identified, there's a whole heck of science fiction turned science fact to really work in terms of our sophisticated set of algorithms that translate all of this noisy data of voice and visuals into human understanding, full body motion, identity recognition, voice recognition, and that takes time.

So when I look at the entire chain, look at what the human adds, what the physical barriers add in terms of laws of physics and what processing adds, you find out pretty quickly that simply adding these numbers up means you wouldn't be able to drive a car.

As a matter of fact you would find that there's a reason that these kinds of science fiction turned science fact technologies haven't been available before. And this is where I'll tell you, 'hey, there's been a breakthrough'. Quite a significant number of them. We've introduced them into the pipeline to essentially erase that lag - and essentially be comparable in terms of the immersion you get and the responsiveness you get.

The best way to experience this is to see it. And it can be from you using your hands or your voice to navigate the dash, and seeing this ability, precision and lag-free behaviour of the cursor, all the way to playing any one of our games from Dance Central to Joy Ride, which is my favourite example, because if it were laggy, you wouldn't be able to drive that car.