Tech Focus: IMG on PowerVR mobile graphics
IMG talk graphics tech featured in iPad 2 and Sony NGP
Weeks on from the iPad 2 launch, the full graphical power of the new tablet is finally coming into focus - and it's frankly monstrous, a massive statement of intent from Apple on its plans for the games market.
Apple's new A5 processor features a dual core PowerVR SGX 543 – the same graphics tech that's set to be featured in the forthcoming Sony NGP, the difference being that the new PlayStation portable will double the core count, bringing an unprecedented amount of graphical power to the mobile space.
UK-based Imagination Technologies is the engineering force behind the PowerVR graphics tech: in this interview, director of PR David Harold and business development manager Kristof Beets talk frankly about its current range of mobile processors and their capabilities, explain the importance of its support for DirectX and OpenGL standards and discuss some of the custom features found in their GPUs. They also go into depth on the scalability of their hardware and look forward to the emergence of the ARM-based version of Windows.
Finally, if you thought the performance increase between the iPad and iPad 2 GPUs was impressive, the final question we put to Imagination should help put some perspective on that...
Q: What are the key rendering differences between the PowerVR architecture and the approach taken by NVIDIA and AMD?
IMG: PowerVR graphics technology is based on a concept called Tile Based Deferred Rendering (TBDR). In contrast to the Immediate Mode Rendering (IMR) used by most graphics engines in the PC and games console worlds, TBDR has two components. The first is Tile Based Rendering, which keeps data processing on-chip by breaking the screen down into manageable tile-sized chunks.
The second part focuses on minimising the processing required to render an image as early in the processing of a scene as possible, so that only the pixels that actually will be seen by the end user consume processing resources. This approach minimises memory bandwidth and power consumption while improving processing throughput but it is more complex.
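As a rough illustration of the two ideas described in this answer, the Python sketch below bins axis-aligned rectangles into screen tiles, resolves the nearest surface per pixel inside each tile, and only then "shades" the visible pixels. It is a toy model under stated assumptions, not Imagination's hardware algorithm: the 4-pixel tile size, the rectangle format and the function names are all hypothetical, and counting colour writes is only a stand-in for real shading cost.

```python
from collections import defaultdict

TILE = 4  # tile edge in pixels (hypothetical; chosen only to keep the example small)

# Each rectangle is (x0, y0, x1, y1, depth, colour), with x1/y1 exclusive
# and a smaller depth meaning closer to the viewer.

def bin_rects(rects):
    """Pass 1: record which rectangles touch each screen tile."""
    bins = defaultdict(list)
    for rect in rects:
        x0, y0, x1, y1, _depth, _colour = rect
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(rect)
    return bins

def render_deferred(rects):
    """Pass 2: per tile, find the nearest surface per pixel, then shade once."""
    frame, shaded = {}, 0
    for (tx, ty), tile_rects in bin_rects(rects).items():
        nearest = {}  # small per-tile buffer, standing in for on-chip memory
        for x0, y0, x1, y1, depth, colour in tile_rects:
            for y in range(max(y0, ty * TILE), min(y1, (ty + 1) * TILE)):
                for x in range(max(x0, tx * TILE), min(x1, (tx + 1) * TILE)):
                    if (x, y) not in nearest or depth < nearest[(x, y)][0]:
                        nearest[(x, y)] = (depth, colour)
        for pixel, (_depth, colour) in nearest.items():
            frame[pixel] = colour  # shading happens only for visible pixels
            shaded += 1
    return frame, shaded

def render_immediate(rects):
    """Baseline: shade every fragment that passes the depth test as submitted."""
    depth_buf, frame, shaded = {}, {}, 0
    for x0, y0, x1, y1, depth, colour in rects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if (x, y) not in depth_buf or depth < depth_buf[(x, y)]:
                    depth_buf[(x, y)] = depth
                    frame[(x, y)] = colour
                    shaded += 1
    return frame, shaded
```

With a far 8x8 rectangle submitted before a near 4x4 rectangle that overlaps it, both functions produce the same image, but the immediate-mode baseline shades the covered pixels twice while the deferred version shades each screen pixel once.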
Q: A lot of people have tried to bring a TBDR solution to market, but only IMG has achieved it. Why do you think the mainstream GPU manufacturers have stuck to their traditional approaches?
IMG: Partly because although the idea sounds simple it's actually very hard to do it in practice - especially in such a way that looks like any other renderer to developers. Partly because we have a lot of the fundamental patents.
Q: SGX543 is described as a Shader Model 3 part. This is a DirectX standard, but DX doesn't apply to the mobile devices. In contrast the PICA200 in the Nintendo 3DS forgoes PC style design for a design targeting intended use in a handheld. What are the costs and advantages going with a standard born in the PC space?
IMG: We target a range of markets from phone through navigation to computing and TV. We aren't a "PC style design" but we do on some of our cores offer DX capability for the computing market - and now for future Windows based mobile devices.
The total Series5 and Series5XT portfolio enables the industry's broadest range of performance/area options, from the smallest single pipe SGX520 core up to the 64-pipe SGX543 MP16. All popular APIs and OS are supported by all SGX cores, including OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1 on Symbian, Linux, Android, WinCE/Windows Mobile and Windows 7/Vista/XP.
It's essential to understand that standardisation is critical for mass market success since standards enable content and without content hardware is of no use. The highly proprietary approach taken by PICA200 can only work in a closed console environment and even there will limit content availability.
Imagination is focused on supporting key industry standards, and for mobile parts this means Khronos APIs such as OpenGL ES. Increasingly, though, the mobile market is crossing over with the PC market, with tablet and netbook designs using the same processors, and obviously this introduces the requirement for Microsoft DX API support. Also, with game engines originating from the PC market coming down into the mobile space, such as Epic Games' Unreal Engine, support and compatibility with PC functionality becomes increasingly important as standards and requirements evolve.
Q: Given scalability and DirectX based features, will IMG be more actively targeting PC use in future such as netbooks or even desktops? We've not seen much from you since Intel licensed SGX535 for GMA500/GMA600. What about consoles and CE devices?
IMG: We already do pretty well in netbooks with devices from Asus, Acer, Sony etc. We also now have a higher end technology for the professional market from Caustic, a new part of Imagination. The scalability of forthcoming PowerVR cores should make them very suitable for the rest of the computing market too. However there's a gap between what we have as licensable IP and what our customers decide to do with that IP.
As Windows moves more into mobile and embedded devices, our experience leaves us uniquely positioned to support DX (irrespective of the CPU architecture).
Q: The PC graphics architecture space has been defined over the past decade by DirectX, and proprietary features of IHV designs have gone largely unexploited. Do you feel this is a good thing for standardisation, or a bad thing limiting innovation?
IMG: We believe in open standards, such as DX and OpenGL ES and are very active helping define those standards, working with Microsoft and as a promoter member of Khronos. We do expose some proprietary features via extensions though. When it comes to proprietary features it all depends on what they add for users whether they get used.
For example our 2bpp texture compression is very widely used, because it delivers high quality and real efficiency benefits while also reducing distribution size and memory footprint. It's a thin line to walk between true benefits that developers are eager to use and over-fragmentation of the market which damages developer uptake. For this reason we regularly survey our ecosystem partners to determine their true requirements and interests.
Q: Going forward, is SGX going to become more DirectX based, seeking optimal DX performance, or will you look for innovations you can use that would otherwise be missed in the PC space?
IMG: We essentially have two strands to our IP - one is focused on OpenGL ES, the other adds some additional silicon area for DX. We plan to continue with that differentiation. That said DX is obviously further ahead in terms of feature set, as is desktop OpenGL, and both are used as references to design the optimised embedded Khronos APIs - so some similarity is to be expected.
Over time DX might become more significant if Windows does manage to get a foothold in mobile devices. We've been shipping WHQL compliant DX9 capable mobile parts for several years and have seen an increased interest in licensing of DX10 and DX11 capable parts over the last few months.
Q: What are your thoughts on Windows looking to support ARM architecture? What are the opportunities here for IMG?
IMG: The combination of an ARM processor and our PowerVR graphics is a bit of a classic in the mobile and embedded space, so if Windows on ARM does take off we'd expect to do well out of it and we're certainly offering a roadmap that will support it very capably.
Q: IMG licenses its tech, but as far as we can gather, it doesn't actually fabricate the final hardware. In what ways does IMG work with its licensees in implementing PowerVR into the final designs to ensure best performance?
IMG: It varies. We often host engineers from our partners who come to work with us here and we send engineers to work with partners too. We often help with bring-up of new SoC and even with OEMs using those SoCs in end products. On occasion we design entire SoCs, but that's relatively rare. What generally happens is we work very closely with new partners while they get up to speed, but given the kind of partners we have, they are more than capable of picking things up quickly and then doing future designs more autonomously.
Q: With the OMAP line and A4/A5 amongst others, we're seeing integration of PowerVR tech into complete SoC solutions. Over and above the advantages in terms of battery life, does a closely integrated CPU/GPU design like this offer any performance advantages?
IMG: Almost all use of PowerVR is in an SoC for those very reasons: bandwidth and power efficiency. Historically CPU vendors have tried to push close links between CPUs and GPUs as beneficial but the reality is that for best performance CPU and GPU should be as autonomous as possible. PowerVR SGX is designed to offload the CPU as much as possible with the SGX GPU handling events locally to ensure optimal parallel processing with no direct CPU control impact.
Q: Discussing iOS Unreal Engine 3 at GDC 2010, Epic Games mentioned that one of the issues they faced was the lack of support of occlusion queries in OpenGL ES 2.0, limiting their ability to cull unseen polygons. Are you aware of any progress in this area?
IMG: It's difficult to comment on upcoming API (and hardware) specifications which have not yet been announced but occlusion queries offer interesting usage cases and are obviously being considered.
However it's also important to stress the difference between the mobile and PC spaces. In the PC space it's easy to push a lot of work to the GPU (massive bandwidth, memory and power usage) and let it handle occlusion processing. In the mobile space efficiency is king, so removing processing as early as possible (e.g. no submission at all) is even more important, and pushing all the workload to the GPU might be less efficient than advanced high-level culling on the CPU side. Still, such balance points will shift, and even in the mobile space occlusion processing on the GPU will become more efficient over time.
Q: Now seems to be the time where key IMG partners are transitioning across from the SGX535 onto the SGX543 multi-core products - aside from the multi-core angle, what are the key enhancements you've made to the basic architecture itself?
IMG: Our Series5XT architecture (SGX543/544/554) is a significant mid-life update to the Series5 architecture (SGX520/530/531/535/540), driven by market and customer feedback. Key in this feedback was increased interest in compute performance, both for GP-GPU via OpenCL and for higher-quality pixels via more complex shaders. As a result we doubled the floating point performance per pipeline in the newer cores while maintaining efficiency via co-issue (dual instruction) capabilities.
Additionally there was an increased interest in compositing User Interfaces and as a result we added dedicated hardware for YUV formats to enable optimal integration with video and camera image streaming. Most of the other changes are much lower level and focused on improving the efficiency of the design including both improved performance and further reduced bandwidth usage - a specific area of focus has been anti-aliasing and polygon throughput.
Q: What is your approach to scalability in SGX543? Are we literally looking at 2x the performance as you move from single to dual cores and upwards?
IMG: Yes, graphics cores are inherently parallel processors which work on data independently (one pixel does not impact the processing of another pixel), so performance can be scaled near-linearly. Compare that with CPUs, where adding more cores often gives a very low return because data does depend on the processing of other data elements.
It's important to note that SGX543 offers true and complete load-balance based scaling of performance across both geometry processing and pixel processing workloads - many other designs only scale pixel processing leading to unbalanced designs. Basically the hardware splits all processing tasks (geometry, pixels, GP-GPU) up into small batches which are assigned to GPU cores on demand - this results in high efficiency and avoids impact by hotspots since even if one core is very busy with a complex area of the screen the other cores will continue to process the rest of the screen. Obviously this is also designed to avoid increases in bandwidth usage per frame between multi-core and single-core processing.
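The benefit of handing out small batches on demand, rather than fixing each core's share up front, can be sketched with a simple scheduling model in Python. This is an illustrative simulation only: the batch costs, the core count and the function names are hypothetical, and it says nothing about how the SGX543 hardware scheduler is actually built.

```python
import heapq

def static_split(costs, cores):
    """Each core gets a fixed contiguous share of the batches up front.

    Returns the time at which the slowest core finishes.
    """
    share = -(-len(costs) // cores)  # ceiling division
    return max(sum(costs[i:i + share]) for i in range(0, len(costs), share))

def on_demand(costs, cores):
    """Each batch goes to whichever core becomes free first.

    Returns the time at which the last core finishes.
    """
    free_at = [0] * cores  # min-heap of the times at which each core is idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# A "hotspot": four expensive batches followed by twelve cheap ones.
costs = [8, 8, 8, 8] + [1] * 12
print(static_split(costs, 4))  # 32: one core is stuck with all the expensive work
print(on_demand(costs, 4))     # 11: total work of 44 units spread evenly over 4 cores
```

In this toy case the fixed split leaves three cores idle while one works through the hotspot, whereas on-demand assignment finishes in the ideal time of total work divided by core count.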
Q: SGX543 scales up to 16 cores - what kinds of real-life applications did you have in mind for this top-end iteration of the technology?
IMG: Anything demanding performance: console, computing etc.
Q: Is the architecture flexible enough to allow for the GPU cores to carry out non-graphics based tasks? What sort of applications can you see here?
IMG: Absolutely. SGX already has OpenCL conformance and all SGX parts are OpenCL capable. There's all kinds of things that you can use that for from game-world physics to image processing and enhancement.
Q: What anti-aliasing modes are supported in PowerVR architecture and what are the associated performance costs? Can we expect high IQ (4x MSAA) on PowerVR SGX543 MP4 as being commonplace at a 960x540 resolution?
IMG: As mentioned before, anti-aliasing (AA) was one of the key focus areas for Series5XT and the impact on performance is as low as possible without sacrificing image quality. We fully expect that AA will be enabled for the majority of content going forward due to its low impact, as can be seen on glbenchmark.com, where the difference between AA on and off amounts to small performance fluctuations due to background tasks.
Q: Stereoscopic 3D is swiftly being embraced by many different types of media, with games taking the spearhead. Does PowerVR architecture offer any specific advantages that makes 3D easier to work with?
IMG: The additional workload required for S3D places significant additional demands on the graphics processor - and PowerVR SGX is more than up to the task. PowerVR SGX graphics acceleration cores are ideally suited to S3D graphics, either using single or multi-processor cores for resolutions up to full 1080p HD, and are capable of supporting all commonly used S3D formats such as frame sequential, side-by-side, top-bottom and interlaced.
Using SGX it is possible to quickly upgrade existing 3D content to deliver full S3D, bringing new realism to 3D games and navigation, and exciting new possibilities for user interfaces in a wide range of applications. The PowerVR SGX tile-based deferred rendering architecture is ideally suited to deal with the increased demands of S3D - which include twice the geometry processing workload and commensurate increases in fill/texturing workload. The scalable nature of the SGX architecture and its ability to efficiently support multiple contexts ensure that the best possible S3D user experience can be achieved using SGX powered devices while maintaining SGX's unique low power, high performance credentials.
Q: Bearing in mind the importance of battery life, what kind of correlation is there between the clock speed of your GPUs and the fabrication process? Any real life examples you can talk about where die-shrinking the tech has allowed for faster clock speeds with the same power draw?
IMG: As Imagination delivers soft IP it can be targeted at any process technologies and this is very much an area where our customers have a lot of knowledge and unique benefits allowing them to differentiate their solutions even when based on the same GPU core.
Just as a reference, over time we have seen implementations of the same SGX core going initially from 110MHz to 200MHz and today designs are beyond 400MHz in silicon. Choosing between higher clock frequencies and higher-end cores is a key decision for our partners, and it is often shaped by their silicon process capabilities. As a result we have seen some customers double performance via clock frequency while others have doubled performance by going to a higher-end core.
Q: Imagination Technologies have experimented with some novel texture formats before. What has happened to this line of technology, and where is it headed?
IMG: Imagination has always recognised the need for high quality textures and images with a low memory footprint and low bandwidth, and in the old days we offered a Vector Quantization approach for Dreamcast which offered on average a 5:1 compression ratio.
With PowerVR MBX and SGX we offer PVRTC texture compression down to 4 bits per pixel (8:1) and 2 bits per pixel (16:1) for both RGB and RGBA formats. These PVRTC formats are very popular with developers since they offer a much better compression ratio than PC formats such as DXTC, which requires 8 bits per pixel for RGBA formats. That means PowerVR-based products can have up to 4 times lower memory footprint and bandwidth usage (= power usage) compared to competing products. Without a doubt this is an area where Imagination will continue to make investments and announcements will be made in due course.
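The ratios quoted in this answer follow directly from the bits-per-pixel figures. The short Python calculation below works them through for a single 1024x1024 texture, assuming a 32 bits-per-pixel uncompressed RGBA baseline and ignoring mipmaps; the texture size, the baseline and the names used are illustrative assumptions rather than anything stated in the interview.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Storage for one texture level, ignoring mipmaps and any file header."""
    return width * height * bits_per_pixel // 8

# Bits per pixel as quoted in the interview, plus an assumed 32bpp RGBA baseline.
FORMATS = {
    "RGBA 32bpp (uncompressed, assumed baseline)": 32,
    "DXTC with alpha, 8bpp": 8,
    "PVRTC 4bpp": 4,
    "PVRTC 2bpp": 2,
}

def footprints(width, height):
    """Map each format name to its size in bytes for the given dimensions."""
    return {name: texture_bytes(width, height, bpp) for name, bpp in FORMATS.items()}

if __name__ == "__main__":
    sizes = footprints(1024, 1024)
    baseline = sizes["RGBA 32bpp (uncompressed, assumed baseline)"]
    for name, size in sizes.items():
        print(f"{name}: {size // 1024} KiB ({baseline // size}:1)")
```

For a 1024x1024 texture this gives 4096 KiB uncompressed, 1024 KiB at 8bpp, 512 KiB at 4bpp (8:1) and 256 KiB at 2bpp (16:1), so the 2bpp format is a quarter of the size of the 8bpp one.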
Q: Out of interest, Sony says that it has a PowerVR SGX543 MP4+ inside Sony NGP... what does the plus stand for?
IMG: That's to indicate the work Sony has done to implement the graphics. What they licensed is a SGX543 MP4.
Q: You have other multi-core projects in the pipeline for the series five hardware. What advantages do they have over the SGX543?
IMG: In addition to SGX543 we have also announced SGX544 which offers the same performance characteristics but enables fully compliant DX9 Feature Level 9_3 capabilities so basically an extra bump in feature set to meet Microsoft requirements. Also available is the SGX554 which is our first 8 pipeline part (SGX543/544 have 4 processing pipelines) which offers improved compute density for customers focused on GP-GPU and shader processing since a single SGX554 would offer the same compute capability as an SGX543 MP2 but not the same geometry or pixel throughput.
This means that SGX554 offers more GFLOPS per mm2 since the design avoids overscaling the geometry and pixel capabilities versus customer requirements - basically we do not believe in "one size fits all" solutions and we thus offer our customers various options.
Q: The future of the SGX tech is in your series six "Rogue" platform. What are your overall objectives for this architecture and what kinds of products are you targeting it for?
IMG: Imagination's next generation PowerVR Series6 architecture, codenamed "Rogue", has now been licensed by multiple lead partners. Rogue delivers unrivalled GFLOPS per mm2 and per mW for all APIs. We see it crossing a very wide range of markets.
ST-Ericsson has announced that its new Nova application processors will include Imagination's next-generation PowerVR Series6 Rogue architecture but we've not really announced much detail of Rogue yet - I'm afraid it's "wait and see".
Q: Finally, we see Apple talking about a 9x performance increase from iPad 1 to iPad 2 and benchmarking of the devices sees at least a 4x "real world" boost in GPU performance. With SGX535 as the baseline, what are your performance targets for your hardware going forward? I'm sure I read somewhere you were looking at a 100x increase within five years...
IMG: Yes we are.