VMware vSAN - A closer look [Part 2 - Architecture and Hardware]

If you would like to read any other chapters of this blog series, use the links at the end of this post.

This post will cover vSAN architecture at a high level along with network recommendations and requirements to support vSAN. Finally, I’ll cover some of the hardware recommendations to ensure a vSAN deployment is successful.

Architecture

Since the release of vSAN 6.0 in 2015 we have had the option to choose either a hybrid or an all-flash vSAN configuration. The option you choose will depend on workload requirements, budget and feature requirements. A hybrid vSAN configuration uses a combination of magnetic disks for capacity and flash devices for cache. As you can probably guess, an all-flash vSAN uses only flash devices.

Within a vSAN host you’ll find two types of storage devices, cache and capacity, running on a supported storage controller. Typically the storage controller will be running in pass-through mode, meaning there is no RAID protection configured across the devices within the server. Yes, that’s correct: vSAN does not use hardware RAID; instead the data is protected at the software layer. vSAN uses a concept of disk groups, with a minimum of one disk group required per host. A disk group contains one cache device and up to seven capacity devices. Each host can have a maximum of 5 disk groups, meaning up to 5 cache devices and 35 capacity devices per host are supported today.
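To make the disk group limits concrete, here is a minimal Python sketch that checks a planned host layout against the numbers quoted above. The `DiskGroup` class and the function names are hypothetical and exist only for illustration; they are not part of any VMware tool or API. The sketch also assumes each disk group needs at least one capacity device.

```python
from dataclasses import dataclass

# Limits quoted in the paragraph above.
MAX_DISK_GROUPS_PER_HOST = 5
MAX_CAPACITY_DEVICES_PER_GROUP = 7


@dataclass
class DiskGroup:
    """Hypothetical model: one cache device plus N capacity devices."""
    capacity_devices: int


def validate_host(disk_groups):
    """Return a list of problems; an empty list means the layout fits the limits."""
    problems = []
    if not 1 <= len(disk_groups) <= MAX_DISK_GROUPS_PER_HOST:
        problems.append(
            f"host has {len(disk_groups)} disk groups, "
            f"expected 1 to {MAX_DISK_GROUPS_PER_HOST}"
        )
    for index, group in enumerate(disk_groups):
        # Assumption: a disk group needs at least one capacity device.
        if not 1 <= group.capacity_devices <= MAX_CAPACITY_DEVICES_PER_GROUP:
            problems.append(
                f"disk group {index} has {group.capacity_devices} capacity "
                f"devices, expected 1 to {MAX_CAPACITY_DEVICES_PER_GROUP}"
            )
    return problems


def max_devices_per_host():
    """Return (cache devices, capacity devices) for a fully populated host."""
    return (
        MAX_DISK_GROUPS_PER_HOST,
        MAX_DISK_GROUPS_PER_HOST * MAX_CAPACITY_DEVICES_PER_GROUP,
    )
```

For example, `validate_host([DiskGroup(7), DiskGroup(7)])` returns an empty list, while a disk group with eight capacity devices is reported as a problem.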

1.png

With a hybrid vSAN configuration each host within the cluster will contain at least one magnetic disk for capacity and one flash device for cache. The magnetic capacity devices in the cluster are pooled together to form the single vSAN datastore. An important point to note is cache devices do not contribute persistent capacity and are only used for caching. vSAN will use 70% of the flash capacity as a read cache and 30% as a write buffer.
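As a rough illustration of the 70/30 split, the sketch below works out how much of a hybrid cache device ends up as read cache and how much as write buffer. The function name and the returned keys are my own, and the calculation ignores any formatting overhead, so treat the output as an approximation only.

```python
# Percentages quoted above for a hybrid configuration.
READ_CACHE_PERCENT = 70
WRITE_BUFFER_PERCENT = 30


def hybrid_cache_split(cache_device_gb):
    """Split a hybrid cache device into read cache and write buffer (in GB).

    Illustrative only: ignores any overhead on the device.
    """
    if cache_device_gb <= 0:
        raise ValueError("cache device size must be positive")
    return {
        "read_cache_gb": cache_device_gb * READ_CACHE_PERCENT / 100,
        "write_buffer_gb": cache_device_gb * WRITE_BUFFER_PERCENT / 100,
    }
```

Calling `hybrid_cache_split(400)` for a 400GB flash device gives 280GB of read cache and 120GB of write buffer, none of which counts towards the capacity of the vSAN datastore.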

An all-flash vSAN configuration will contain at least one high capacity, lower endurance flash device for capacity and one lower capacity, higher endurance flash device for cache. The flash devices used in the cache tier are used only for buffering writes. A read cache is not required as the performance available from the flash devices in the capacity tier is more than sufficient.

VMware categorise flash devices based on their performance (measured in writes per second) and endurance (measured in terabytes written (TBW) over 5 years). The tables below give you an indication of performance and use cases for each category of flash device.

2.png
3.png

Networking

Before we move on to networking configuration, I’ll point out that all versions of vSAN unlock access to the vSphere Distributed Switch (vDS) regardless of the vSphere license applied to the cluster. Whilst it’s recommended to use the vDS, vSAN will also support the use of the vSphere Standard Switch (vSS). If you’re not familiar with the vDS, it provides several features not found in the vSS, one of which is Network IO Control (NIOC), which I’ll look at shortly.

Moving on to the network requirements, VMware have several support statements when configuring vSAN networking. Depending on which option you deploy, the general recommendations for bandwidth are:

  • Dedicated 1Gb for hybrid configurations (10Gb recommended)
  • Dedicated or shared 10Gb for all-flash configurations.

In addition, each host in the vSAN cluster must have a VMkernel network adapter configured for vSAN traffic, regardless of whether it contributes capacity or not. The physical switches should be enterprise grade and should have built-in redundancy features so a switch failure doesn’t bring down the entire environment.

4.png

NSX and vSAN are fully compatible as neither is dependent on the other to deliver its services. The key point to understand is that vSAN traffic needs to remain isolated and not managed by NSX. It is not supported to have any VMkernel based traffic over the VXLAN overlay network in general, and vSAN is no exception.

I previously mentioned NIOC, which may be beneficial in environments that use a form of converged networking. This is more common with recent hardware as 10Gb is becoming more affordable and servers are being deployed with fewer NICs. NIOC allows bandwidth to be controlled using share allocations, which are only applied during contention. VMware recommend the following allocation be applied for a host with dual 10Gb NICs.

  • Management traffic – 20 shares
  • Virtual Machine traffic – 30 shares
  • vMotion traffic – 50 shares
  • vSAN traffic – 100 shares

It’s not recommended to set reservations or limits as this can potentially waste bandwidth. NIOC is also preferred over hardware based solutions such as NPAR as it’s more flexible.
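To show what the share values above mean in practice, here is a simplified Python model of how a saturated uplink would be divided. It assumes a single 10Gb uplink where every active traffic type wants more bandwidth than its share allows; real NIOC behaviour depends on actual demand, so this is a sketch rather than a description of the scheduler. The dictionary keys and the function name are my own.

```python
# Share values from the list above.
RECOMMENDED_SHARES = {
    "management": 20,
    "virtual_machine": 30,
    "vmotion": 50,
    "vsan": 100,
}


def bandwidth_under_contention(shares, link_gbps, active=None):
    """Divide link_gbps among the active traffic types in proportion to shares.

    Simplified model: assumes every active traffic type is competing for
    more bandwidth than it would receive.
    """
    active = list(shares) if active is None else list(active)
    if not active:
        raise ValueError("at least one traffic type must be active")
    total_shares = sum(shares[name] for name in active)
    return {name: link_gbps * shares[name] / total_shares for name in active}
```

With all four traffic types competing on a 10Gb link, `bandwidth_under_contention(RECOMMENDED_SHARES, 10)` gives vSAN 5Gb, vMotion 2.5Gb, virtual machine traffic 1.5Gb and management 1Gb. If only vSAN and virtual machine traffic are active, passing `active=["vsan", "virtual_machine"]` shows vSAN receiving roughly 7.7Gb, which is why shares waste less bandwidth than fixed reservations or limits.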

5.png

One final point to be aware of: vSAN 6.6 no longer requires multicast support, making deployments even easier. For those of you running an earlier version, multicast support is still required until all hosts in the cluster have been upgraded to 6.6. Once this is complete, vSAN will begin to use unicast automatically.
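The upgrade rule can be expressed as a small check. The sketch below is illustrative only: the function names are hypothetical, it takes plain version strings as input rather than querying vCenter or the hosts, and it models nothing beyond the "all hosts on 6.6 or later" condition described above.

```python
def parse_version(version):
    """Turn a version string such as '6.6.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def cluster_network_mode(host_versions):
    """Return 'unicast' once every host runs vSAN 6.6 or later, else 'multicast'."""
    if not host_versions:
        raise ValueError("cluster must contain at least one host")
    if all(parse_version(version) >= (6, 6) for version in host_versions):
        return "unicast"
    return "multicast"
```

A cluster part way through an upgrade, for example `cluster_network_mode(["6.5", "6.6", "6.6"])`, still reports multicast, so the physical network must keep supporting multicast until the last host is done.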

Hardware

vSAN has been designed to run on industry standard x86 hardware and is vendor agnostic, giving you a great deal of choice when it comes to choosing your server. The key to a successful vSAN deployment is ensuring all components of the solution are supported on the VMware Compatibility Guide (VCG). This doesn’t just cover the hardware components; it also includes items such as drivers and firmware revisions. You should never place production or business critical workloads onto a hardware platform with components that are not on the VCG. If you do, expect a headache or worse further down the road.

https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan

Whilst it is possible to configure your own server, this isn’t the approach I would recommend. Many leading hardware vendors have already done the hard work for you and created validated configurations called “vSAN Ready Nodes”. These are based on workload profile and capacity requirements, with options for both all-flash and hybrid platforms. VMware provide a simple tool to assist you in choosing a hardware vendor.

http://vsanreadynode.vmware.com/RN/RN

Part three will cover how vSAN protects data using storage policies and how vSAN can utilise erasure coding to store data more efficiently. 

Take a look at the other articles within this blog series:

Part 1 - Introductions and Licensing

Part 3 – Data Availability

Part 4 – Fault Domains and Stretched Clusters

Part 5 – Failure Events

Part 6 - Compression and Deduplication

Part 7 – Monitoring and Reporting