Back in August 2017, nVidia announced GRID 5.0, the latest version of its graphics virtualisation technology, together with two new datacentre-grade GRID graphics cards: the Tesla P6 and the Tesla P40.
This blog post aims to give the reader an overview of nVidia GRID vGPU and a brief history and overview of the cards.
For those readers who may be unfamiliar with graphics cards and the existing nVidia GRID cards let’s have a quick recap before we look at the latest ‘Pascal’ Series GRID vGPU cards.
Why do we need Graphics Cards?
If you are a gamer, you will likely be familiar with the idea of dedicated hardware, separate from your computer’s CPU and RAM, that handles the many simultaneous operations and rendering tasks that games and other graphical workloads such as 3D CAD modelling require, and you may wish to skip this section!
However, for those unfamiliar with graphics cards and why we need them: modern computer systems increasingly need dedicated graphics resource to support the graphical elements of the Windows OS itself (Windows 10 is currently 40% more graphically intensive than any other version of Windows). We also need dedicated graphics resource when running graphical applications that demand high levels of graphics processing or rendering, or that run many tasks in parallel.
This is because the computations required for these workloads are very intensive, so a processor separate from the main CPU is often a good idea. GPUs (Graphics Processing Units) also have a different internal architecture from CPUs, one that is better suited to processing graphical workloads. Graphics cards also have their own RAM, often referred to as graphics or video RAM (vRAM). This is separate from the RAM in the computer, has a slightly different architecture, and takes that load off the main system RAM.
nVidia have an excellent article on the differences between CPUs and GPUs that’s worth a read if you want more info: https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
Why do we need Graphics Cards in the Datacentre?
One of my main areas of responsibility at ComputerWorld is architecting end-user computing solutions, and desktop virtualisation in particular is a passion. I’m not going to extol the virtues of virtualising your users’ desktops here beyond saying that it’s something you should consider if you haven’t already! 😉
There is a trend towards using GPU hardware to address certain High Performance Compute requirements in the datacentre and that may be covered in a later blog post but for now I’ll focus on the VDI aspect.
Historically, a problem when looking at virtualising users’ desktops has been virtualising 3D engineering and/or design workloads, CAD applications for example, and this is where the nVidia GRID technology comes into the story.
nVidia GRID graphics cards are essentially the same as the cards you would insert into a CAD workstation or gaming PC: they are built using the same technology and are tested with the same procedures. There are other differences, but in essence they just have a lot more resources available for use. The power of an nVidia GPU is typically measured in the number of CUDA cores that it has.
More information on CUDA cores can be found here: http://www.nvidia.co.uk/object/cuda-parallel-computing-uk.html
nVidia GRID vGPU
I mentioned graphics or video RAM (vRAM) earlier, but first let’s run through the physical GPU; we’ll consider the vRAM aspect in a moment.
Virtualising Physical GPUs the old way… VMware vSGA & vDGA
Prior to the release of the nVidia GRID vGPU technology, the options for assigning physical GPU graphics resource to virtual desktops were pretty much all or nothing: the available physical GPU resource could be accessed by all users, i.e. shared (VMware vSGA), or by one user, i.e. dedicated (VMware vDGA).
In the shared scenario, all users had access to graphics resources, but if a single user started to consume a lot of resources then the other users suffered. Conversely, in the dedicated scenario only a single user had access to the available graphics resource, leaving that user with a vast amount of graphics resource and the others with none. That is good for specific user scenarios but bad when multiple users require access to the GPU.
Virtualising Physical GPUs the new way… nVidia vGPU
With the advent of vGPU, physical GPU graphics resource could be shared equally between users, with each user having the same time-sliced access to the physical GPU cores (much as physical CPU is shared on a virtualisation host). The vSGA problem of a single user consuming all the available graphics resource went away, as did the cost inefficiency of dedicating a whole physical GPU to a single user with vDGA.
In summary, vGPU was far more efficient at allocating and utilising the available resources, making graphics virtualisation much more feasible from a cost perspective.
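To make the idea of equal time-sliced access a little more concrete, here is a toy round-robin model in Python. It is only an illustration of the general principle, not a description of how the GRID scheduler is actually implemented; the function name, the VM names and the millisecond figures are all made up for the example.

```python
from collections import deque


def round_robin(demands_ms, slice_ms=1):
    """Toy equal-share scheduler (hypothetical, for illustration only).

    demands_ms maps a VM name to the GPU time (in ms) it wants.
    Returns the order in which VMs get the GPU as (vm, ms_run) tuples.
    """
    queue = deque(demands_ms.items())
    timeline = []
    while queue:
        vm, remaining = queue.popleft()
        run = min(slice_ms, remaining)
        timeline.append((vm, run))
        if remaining > run:
            # Unfinished work goes to the back of the queue, so a heavy
            # user cannot hold the GPU while others are waiting.
            queue.append((vm, remaining - run))
    return timeline


# A heavy CAD user and a light office user sharing one physical GPU.
print(round_robin({"cad": 3, "office": 1}))
```

In this model the light user gets its slice straight after the heavy user’s first slice rather than waiting for all of the heavy user’s work to finish, which is the behaviour that vSGA could not guarantee.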
Graphics/Video RAM – Frame Buffer & vGPU Profiles
Each physical GRID card has a finite amount of vRAM or, to use the correct term, ‘Frame Buffer’. The Frame Buffer allocated to each user is static and does not change: a portion of the total Frame Buffer present on the GRID card is allocated to each virtual desktop VM when the VM is powered on and remains allocated until the VM is powered off, at which point it is released and can be re-allocated to a different VM.
The Frame Buffer of a GRID card is distributed evenly between its physical GPUs, meaning each GPU has a set amount of Frame Buffer that it can use. The static Frame Buffer allocation that each VM receives is set and controlled via vGPU ‘profiles’.
vGPU profiles allow differing amounts of vGPU Frame Buffer memory to be allocated to VMs, so that each VM receives an amount of Frame Buffer appropriate to its workload.
For example, a user that runs low to moderate graphical workloads may only require 1GB of vGPU Frame Buffer, whereas a CAD engineer may require 4GB. vGPU profiles allow us to provide both on the same GRID card, provided there are two physical GPUs on the card, since each GPU can only host one type of vGPU profile. This is important to remember when capacity planning in mixed vGPU profile environments.
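The allocation rules described above (Frame Buffer reserved at power-on, released at power-off, and one profile type per physical GPU) can be sketched as a small Python model. This is a simplified illustration only: the class and method names are hypothetical, the card sizes are example values, and real placement is handled by the hypervisor and the nVidia software rather than by anything like this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhysicalGPU:
    frame_buffer_gb: int
    profile_gb: Optional[int] = None      # profile size currently hosted
    vms: set = field(default_factory=set)

    def free_gb(self):
        return self.frame_buffer_gb - len(self.vms) * (self.profile_gb or 0)

    def can_host(self, profile_gb):
        # A GPU hosts only one profile type at a time.
        same_profile = self.profile_gb in (None, profile_gb)
        return same_profile and self.free_gb() >= profile_gb


class GridCard:
    """Hypothetical model of a card whose Frame Buffer is split per GPU."""

    def __init__(self, gpu_count, gb_per_gpu):
        self.gpus = [PhysicalGPU(gb_per_gpu) for _ in range(gpu_count)]

    def power_on(self, vm, profile_gb):
        for gpu in self.gpus:
            if gpu.can_host(profile_gb):
                gpu.profile_gb = profile_gb
                gpu.vms.add(vm)          # Frame Buffer is now reserved
                return True
        return False                     # no GPU can take this profile

    def power_off(self, vm):
        for gpu in self.gpus:
            if vm in gpu.vms:
                gpu.vms.remove(vm)       # Frame Buffer is released
                if not gpu.vms:
                    gpu.profile_gb = None
                return


# Example: a two-GPU card with 8GB per GPU (illustrative figures).
card = GridCard(gpu_count=2, gb_per_gpu=8)
print(card.power_on("office-user", 1))   # lands on the first GPU
print(card.power_on("cad-engineer", 4))  # needs the second GPU
```

On a single-GPU card the second `power_on` call would fail until every 1GB VM had been powered off, which is the capacity-planning point made above.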
More information on vGPU can be found here: http://www.nvidia.com/object/grid-technology.html
Since the focus of this blog post is nVidia GRID vGPU, I’m not going to consider the cards that only support VMware vSGA and vDGA, but they are out there and still have their place for niche requirements. GRID vGPU is an nVidia technology and is only supported on the nVidia GRID cards.
One other thing to note about vGPU that isn’t discussed further here but may form a future post is that vGPU resource can be assigned to virtual RDS servers to provide dedicated graphics resource to published applications in those environments.
nVidia ‘Kepler’ K-Series GRID Cards
The Kepler series of GRID cards, that is the K1 and K2, were nVidia’s first offering of dedicated graphics hardware that could practically be shared between virtual desktops via the vGPU technology, and used the nVidia GRID v1.0 drivers.
The cards are intended for differing use-cases: the K1 is intended to give many users with low-end dedicated graphics requirements the resource they need, while the K2, with its more powerful GPUs, is aimed at users with high-end dedicated graphics requirements.
The table below provides an overview of the technical specifications for the K-Series GRID cards:
A drawback of the Kepler cards for some customers was that user density was too low to be cost-effective when virtualising many desktops that required vGPU resource.
At the time of their release, most GRID-compatible rack-mount servers could only house 2 x GRID cards, meaning that user density was capped per host.
As an example, if every user needed graphics resource then user density maxed out at 64 users with 2 x K1 cards and 512MB vGPU profiles.
User density was then halved for the same vGPU profile with K2 cards, although the physical GPU resource available was far greater, meaning more intensive graphical workloads could be run.
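For anyone who wants to see where the 64-user figure comes from, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the per-GPU figures are my assumptions (a K1 with 4 GPUs and 4GB of Frame Buffer each, a K2 with 2 GPUs and 4GB each), chosen to be consistent with the numbers quoted in this post. The sketch also ignores any other limits a real deployment might hit.

```python
def max_vgpu_users(cards_per_host, gpus_per_card, mb_per_gpu, profile_mb):
    """Upper bound on vGPU users per host from Frame Buffer alone."""
    users_per_gpu = mb_per_gpu // profile_mb
    return cards_per_host * gpus_per_card * users_per_gpu


# Assumed specs: K1 = 4 GPUs x 4GB, K2 = 2 GPUs x 4GB.
k1_host = max_vgpu_users(cards_per_host=2, gpus_per_card=4,
                         mb_per_gpu=4096, profile_mb=512)
k2_host = max_vgpu_users(cards_per_host=2, gpus_per_card=2,
                         mb_per_gpu=4096, profile_mb=512)
print(k1_host, k2_host)
```

With those assumptions the K1 host works out at 64 users and the K2 host at half that, matching the figures above.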
The result was that vGPU GRID solutions were more practical than ever from a resource allocation perspective, but some customers found the cost of the required host hardware prohibitive.
nVidia ‘Maxwell’ M-Series GRID Cards
Just over 2 years ago nVidia released the GRID 2.0 drivers alongside 3 new GRID cards, the M6, M10 and M60; specs below for info:
M6 cards deliver dedicated graphics resource for blade servers, enabling vGPU within blade centres and on hyper-converged hardware appliances, and have proven effective at providing dedicated graphics resource for the subsets of users that require it. It’s worth noting, though, that the M6’s single GPU can only host VMs with a single type of vGPU profile, meaning all vGPU-enabled VMs must have the same vGPU profile.
M10 cards doubled the user density possible with K1 cards and increased the number of CUDA cores in each of the 4 physical GPUs from 192 in the K1 to 640 in the M10, meaning that each physical GPU in the M10 had significantly more graphics processing power than those in the K1.
The M60 cards likewise doubled the user density possible with the K2 cards, increasing the Frame Buffer from 8GB per card to 16GB per card, and the number of CUDA cores per physical GPU increased by a third over the K2 card (from 1,536 to 2,048).
The increased user density made GRID vGPU virtual desktop solutions more economically feasible than ever before, and the increased processing power of the M10 cards meant that some workloads that previously required a K2 GPU from a CUDA perspective could now be run on the M10 card at a far greater user density.
The latest generations of server hardware have also increased PCIe capacity, with space for 3 x M-Series GRID cards in some hardware.
As a consequence, the possible user density per host has increased again. Taking the increased graphical requirements of the Windows OS into account, the vGPU technology and the GRID 2.0 cards and associated hardware are now where they need to be to deliver the required performance at a palatable price point for a wide variety of virtual desktop workloads.
A useful guide to nVidia certified server hardware and their GRID card capacities can be found here: http://www.nvidia.com/object/grid-certified-servers.html
nVidia GRID Licensing Requirement
Also introduced with GRID 2.0 was a licensing requirement; GRID 1.0 did not require licensing. With the M-series cards, the GRID vGPU license required was dictated by the vGPU profiles in use and there were three tiers of licensing available:
- GRID Virtual Application (for providing vGPU resource to RDS-hosted applications)
- GRID Virtual PC (Moderate graphical workloads)
- GRID Virtual Workstation* (High-end graphical workloads)
The GRID license cost should always be considered when looking into the feasibility and affordability of a GRID vGPU solution. The license required is dictated by a number of different factors, and ComputerWorld can help with identifying which license is most appropriate for your needs. A listing of vGPU profiles and their required licenses is included in a later section of this blog post.
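As a rough illustration of how the profile in use drives the license tier, the Python sketch below maps a vGPU profile name to one of the three tiers listed above. It assumes the usual profile naming, where the trailing letter gives the profile series (for example ‘M60-1Q’ or ‘M10-0B’), and the suffix-to-tier mapping is my own assumption for the example rather than something taken from this post, so always check the nVidia licensing documentation for your release.

```python
# Assumed mapping from profile series letter to license tier.
LICENSE_BY_SERIES = {
    "A": "GRID Virtual Application",
    "B": "GRID Virtual PC",
    "Q": "GRID Virtual Workstation",
}


def required_license(profile_name):
    """Return the assumed license tier for a profile, or None if unknown."""
    name = profile_name.strip().upper()
    if not name:
        return None
    return LICENSE_BY_SERIES.get(name[-1])


for profile in ("M10-2A", "M10-0B", "M60-1Q"):
    print(profile, "->", required_license(profile))
```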
nVidia ‘Pascal’ P-Series GRID Cards
Several driver updates were released in the interim between the M-Series GRID cards arriving with the v2.0 drivers and the new P-Series GRID cards arriving with the v5.0 drivers. I have included a link to the nVidia release notes below for information, but the updates mainly contained bug fixes and no major feature updates besides the nVidia Licensing Manager, which provides visibility of which VMs have GRID licenses allocated to them.
GRID Release-Notes: http://docs.nvidia.com/grid/
So, what do the new GRID cards have to offer? The table below compares the M-Series and P-Series technical specifications:
The new P6 card is for use in blade servers and some HCI; check the compatibility matrix via the certified servers link I posted earlier.
The P6 offers double the Frame Buffer of the M6, although since the P6’s supported vGPU profiles start at 1GB the maximum user density for the M6 and P6 remains the same. The P6 also increases the available GPU CUDA cores by a third over the M6 (from 1,536 to 2,048).
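The reason the density does not change is easy to check with a quick calculation. In the Python sketch below, the Frame Buffer sizes (8GB for the M6, 16GB for the P6) and the 512MB minimum profile on the M6 are my assumptions, consistent with the ‘double the Frame Buffer’ and ‘512MB profiles are no longer available’ statements in this post. The function name is hypothetical.

```python
def users_per_card(frame_buffer_mb, smallest_profile_mb):
    """Maximum VMs per single-GPU card when using the smallest profile."""
    return frame_buffer_mb // smallest_profile_mb


m6_users = users_per_card(8 * 1024, 512)     # assumed: 8GB card, 512MB profile
p6_users = users_per_card(16 * 1024, 1024)   # assumed: 16GB card, 1GB profile
print(m6_users, p6_users)
```

Doubling the Frame Buffer and doubling the smallest profile cancel each other out, so each user gets more Frame Buffer on the P6 but the headcount per card stays put.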
Taking into consideration the increased graphics resource requirements of Windows 10 in particular, the P6 may be a better fit for Windows 10 VDI desktops hosted on blade/HCI hardware.
The new P40 has nearly double the CUDA cores in its single GPU (3,840) compared with each of the M60’s GPUs (2,048 per GPU), meaning that virtualised workloads demanding very high levels of GPU resource now have what they need.
As the P40 only has a single GPU, it can only host a single type of vGPU profile (as is the case with the M6 and the P6). However, this has an added benefit: since the Frame Buffer of the card is not split between two physical GPUs, the P40 can deliver vGPU profiles of up to 24GB! Extreme graphical workloads requiring 8GB+ of Frame Buffer can now be virtualised!
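Because a P40 hosts only one profile type at a time, mixed environments need at least one card per profile size in use. The Python sketch below estimates how many P40 cards a given mix of VMs would need. It is a simplified planning aid under stated assumptions: the function name is hypothetical, it considers Frame Buffer only, and it does not check that a given profile size is actually offered by nVidia for the card.

```python
import math

P40_FRAME_BUFFER_GB = 24  # single GPU, as described above


def p40_cards_needed(demand):
    """demand maps profile size in GB to the number of VMs needing it."""
    cards = 0
    for profile_gb, vm_count in demand.items():
        if not 1 <= profile_gb <= P40_FRAME_BUFFER_GB:
            raise ValueError(f"unsupported profile size: {profile_gb}GB")
        vms_per_card = P40_FRAME_BUFFER_GB // profile_gb
        # Each profile size needs its own card(s); round up per size.
        cards += math.ceil(vm_count / vms_per_card)
    return cards


# Example mix: 30 office users on 1GB profiles, 4 CAD users on 8GB profiles.
print(p40_cards_needed({1: 30, 8: 4}))
```

In the example the 1GB users need two cards and the 8GB users need another two, even though the second card in each pair is only partly used.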
It is also worth noting that the P40, as with the P6, supports minimum vGPU profiles of 1GB; the 512MB vGPU profiles are unfortunately no longer available.
nVidia GRID Licensing
The GRID 5.0 release also brought a change to the nVidia GRID license names: the Virtual Workstation license has been renamed to the ‘Quadro Virtual Data Center Workstation’ license.
Hardware that supports nVidia GRID
One final thing to note is that, although the servers on the nVidia certified hardware listing may support GRID cards, it’s often not as simple as installing the cards into your existing servers. There are certain requirements, such as specific risers and uprated PSUs, and it’s far easier and cheaper to specify these components at the time of purchase, even if GRID cards are not being installed right away, than it is to retrofit them.