An introduction to Cohesity (#vRetreat)

I recently attended vRetreat in London and had the opportunity to hear from one of the sponsors, Cohesity, about their solution and the latest updates. I have seen Cohesity present several times now, so I thought it was about time I passed some information on to those of you who read Define Tomorrow.

Cohesity is in the hybrid storage market, but specifically for secondary storage workloads such as backup, file services, archiving and more, aimed at enterprise-size companies.

Cohesity was founded in 2013 by Mohit Aron, who created the web-scale file system at Google. The principles behind it were infinite scalability and continuous availability, values that have been brought to the Cohesity platform. This isn’t where Mohit Aron’s history ends, however: he was also a co-founder of Nutanix.

First, we need to understand why Cohesity is concentrating on secondary storage and what the issues with secondary storage are. Secondary storage is all the data in your infrastructure that isn’t actively being used or does not have a specific performance need. Great examples include data for backups, archiving, and test and development. Finding a home for these types of data is often a struggle, with scalability, reliability, cost and performance all causing problems. While you don’t necessarily need the same levels of performance as your expensive primary storage, you do need it to be reliable and to scale as the data grows, and it will grow! I find that customers either leave this data on their primary array, which can be costly, or invest in a cheaper platform that doesn’t meet the full business requirements. This is where Cohesity steps in.

Cohesity is a scale-out solution made up of multiple x86 physical server nodes with a minimum initial configuration of three nodes. However, it is also available as a virtual appliance for ROBO (Remote Office Branch Office) situations and directly from the cloud. This is where you will start to notice the differences between Cohesity and the vast majority of its competition.

Cohesity’s own hardware is manufactured by Intel and is a blade-based solution sitting in a dumb chassis, delivering four x86 server nodes and twelve disk slots per chassis. The system scales in a web-scale fashion, a single node at a time. Each node has at least two 10Gb uplinks, dual six-core CPUs, 64GB RAM and over 800GB of flash, scalable up to 2TB depending on the model. Cohesity also works with Cisco and HPE to deliver rackmount server solutions. With all of this resource available, Cohesity is much more than a big box of dumb storage. It has a number of built-in services that make use of that resource, such as inline global deduplication and compression, backup, file services and analytics.

Cohesity runs a file system called SpanFS which, being based on the Google File System, allows for web-scale or infinite scalability. Currently, this has been tested up to 256 nodes with no flattening of performance, and that's a whopping 10PB of capacity! It is compatible with the NFS, S3 and SMB protocols; all of them can be used at the same time and on the same volume, all with global inline deduplication. Because all deduplication metadata is kept in flash, there is no penalty for rehydrating data when it is accessed from the system. The file system is intelligent: it detects the type of I/O and optimises on the fly specifically for random or sequential I/O. The name SpanFS refers to the nature of the file system spanning everything. With no master node, data is spanned across the hyperconverged nodes, across the flash and HDD storage tiers, and even out to remote cloud storage.
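
To make the idea of global inline deduplication a little more concrete, here is a minimal Python sketch. It is not how SpanFS is implemented; it is a generic illustration that assumes fixed-size chunks, SHA-256 fingerprints and an in-memory dictionary standing in for the metadata kept in flash. The class name `DedupStore` and the file names are made up for the example.

```python
import hashlib
import zlib


class DedupStore:
    """Toy content-addressed store: one shared chunk index for every file."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> compressed chunk (assumed "flash" index)
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Inline dedup: a chunk is stored only the first time it is seen,
            # no matter which file it arrives in.
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name):
        # Rehydration is a direct lookup per chunk, in recipe order.
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.files[name])


store = DedupStore(chunk_size=4096)
base_image = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
changed_image = b"A" * 4096 + b"B" * 4096 + b"D" * 4096
store.write("vm1.img", base_image)
store.write("vm2.img", changed_image)
```

The two images contain six chunks between them, but only four unique chunks end up in the store because the first two are shared. Reading either file back returns the original bytes.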


I highly recommend having a read of this whitepaper if you want to know more about the SpanFS file system.


Once data is written to the platform, global indexing and search let you look across all stored data for anything you wish to find. With legislation such as GDPR coming into place, there is no doubt that being able to search for specific data across your infrastructure when you receive a request for personal data will become a must. I would love to see this in action and understand how accurately it could assist with tasks like this across a variety of scenarios, e.g. Word documents, SQL databases etc.
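
As a rough picture of what searching an index for personal data looks like, here is a small inverted-index sketch in Python. It is purely illustrative and says nothing about how Cohesity's indexing works: it assumes the content is already plain text (it does not parse Word or database formats), and the document names and the functions `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


documents = {
    "hr/contract.txt": "Employment contract for Jane Doe",
    "sql/customers.csv": "id,name\n1,Jane Doe\n2,John Smith",
    "backups/notes.txt": "Weekly backup schedule",
}
index = build_index(documents)
matches = search(index, "jane doe")
```

Here `matches` contains the contract and the customer export but not the backup notes, which is the kind of answer a subject access request needs.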

Cohesity uses a patented snapshot technology called SnapTree, which uses a distributed redirect-on-write snapshot method. The result is no loss of performance and unlimited snapshots that are instantly available. The key technology differentiator in SnapTree is that the access path to any given data block is fixed length, so previous snapshots don’t need to be traversed to rehydrate backups. The net result is fully hydrated backups, which from a business perspective enables instant mass restores of VMs and significantly lower RTOs. Many thanks to Jason Monger of Cohesity for explaining that so eloquently.


Cohesity’s inbuilt backup solution is called DataProtect and is managed through a modern HTML5 interface. DataProtect allows you to back up VMs and files within your infrastructure using a mix of agent-based and agentless technologies, depending on where the data is stored. There is inbuilt support for VMware, Hyper-V, KVM, Microsoft SQL, Oracle, physical servers and much more. Data stored across the Cohesity platform, not just backup data, can be replicated to another physical Cohesity platform, to a virtual platform at a branch office or in the cloud, or to any other type of S3 or NFS storage.