A solid sense of Scale
HCI (Hyper Converged Infrastructure) is maturing rapidly, and now is a good time to take a closer look at the various HCI vendors: open the hood and have a look inside. Ask yourself this: is it solid?
Data Resilience has closely followed one of the contenders in the HCI space over the past six years. Anyone taking a closer look at Scale might feel that it all sounds familiar, like things you already know from other products. But there is something a little bit unique and a little bit special in each one of its parts, and when you put them together in concert the whole thing becomes spectacular. So when you have that feeling that this is just like what you already know, ask why it is different, because it is.
What Scale did was to get the basic principles right at the outset: a lean architecture makes it fast, it has to be autonomous and self-healing, and automated testing of every new line of code each night makes it robust and solid as it evolves over time. Ask your HCI vendor how many automated tests they run on new code before it enters GA. The combination of an autonomous, self-healing architecture and automated testing in turn means the HCI vendor needs fewer support staff as the number of supported customers increases. If the HCI platform is robust, the number of support staff needed to support it drops, and what you as a customer get is a solid platform. Ask your HCI vendor what their ratio of support staff to customers is, both today and historically.
Data Resilience opted for HC3 as the base for its portable data centre, the DC1, primarily for this reason. The methodology used by the developers at Scale Computing resulted in robust, resource-conscious code with a small footprint, which in turn made it possible to run on short-depth server hardware, hardware that could then be made portable. This also makes it a truly green software architecture, regardless of how the platform is applied.
There is one pivotal element in the world of Hyper Converged Infrastructure, and it has to do with the data integrity and protection mechanism. The question to ask is where the data protection logic is located: does it sit above or below the hypervisor?
I urge the reader to investigate and ask real questions about this when looking into HCI, because it matters. The position of the data integrity and protection mechanism determines how easily the system can be supported, upgraded and maintained without disruption, so it matters for how robust the system will be over time. It also greatly affects the day-to-day effectiveness of the system in terms of latency and energy consumption.
I would also recommend anyone looking at HCI to ask how resilient the system is in a dual-fault situation: for example, one broken disk in one node, and before the underpinning RAID system has healed you lose a single backplane NIC in the same or a different node. What will the impact of that dual-fault situation be?
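To make the dual-fault question concrete, here is a minimal sketch (plain Python written as an illustration, not Scale code) of a two-copy layout where a block lives on two disks in two different nodes. The node and disk names and the reachability logic are my own assumptions; the point is simply why dual backplane NICs matter in exactly this scenario.

```python
# Illustrative model only: nodes with disks and backplane NICs, and a data
# block mirrored on two disks in two different nodes. Question: after one
# disk failure plus one NIC failure, is the block still reachable?

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    backplane_nics: int = 2                 # assumed dual backplane ports
    failed_nics: int = 0
    failed_disks: set = field(default_factory=set)

    def reachable(self) -> bool:
        # A node stays reachable as long as at least one backplane NIC is up.
        return self.failed_nics < self.backplane_nics

    def disk_ok(self, disk: str) -> bool:
        return disk not in self.failed_disks

def block_available(copies, nodes) -> bool:
    """copies: list of (node_name, disk_name) pairs holding the same block."""
    return any(nodes[n].reachable() and nodes[n].disk_ok(d) for n, d in copies)

nodes = {"N1": Node("N1"), "N2": Node("N2"), "N3": Node("N3")}
copies = [("N1", "D1"), ("N2", "D1")]       # two copies on two different nodes

# Dual fault: a disk dies in N1, then N2 loses one NIC before the rebuild.
nodes["N1"].failed_disks.add("D1")
nodes["N2"].failed_nics = 1

print(block_available(copies, nodes))       # True: the second copy is reachable.
                                            # With backplane_nics=1 per node, the
                                            # same two faults would print False.
```

Run the same toy model with a single NIC per node and the block goes dark. That is the kind of answer you want your HCI vendor to be able to give you without hesitation.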
So how is Scale HC3 different? First of all, everything physical is redundant across the entire system: dual backplane and dual public network ports on each node. Each VM in a Scale system enjoys the luxury of talking directly to the storage hardware; each write IO passes only three (3) "hops" before coming to rest on the persistent media (the RSD). This is a very good point of comparison when looking at other HCI vendors.
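One way to think about the hop count is as the number of layers a write has to traverse before it is acknowledged on persistent media. The little sketch below is purely conceptual; the layer names, and the longer controller-VM path used for contrast, are my own assumptions and not taken from any vendor's documentation.

```python
# Purely conceptual hop counting; the layer names below are illustrative
# assumptions, not any vendor's documented I/O path.

DIRECT_PATH = [
    "VM virtual block device",
    "block layer below the hypervisor (a libscribe-style VSD)",
    "persistent disk (RSD), locally or on a peer node",
]

CONTROLLER_VM_PATH = [
    "VM virtual block device",
    "hypervisor storage stack",
    "controller / storage VM",
    "distributed file system layer",
    "network to the peer controller VM",
    "persistent disk",
]

for name, path in [("direct-to-hardware", DIRECT_PATH),
                   ("controller-VM style", CONTROLLER_VM_PATH)]:
    print(f"{name}: {len(path)} hops")
    for hop in path:
        print(f"  -> {hop}")
```

Every extra hop is latency paid on every single write, and energy spent shuffling the same data between layers.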
The picture above shows the elements that make up the HC3 system. A combination of Virtual Storage Devices (VSD) and Real Storage Devices (RSD) is responsible for the data integrity and protection mechanism, i.e. the characteristics and behaviour of the block device presented to the VM:
- The Placement Spec & Protection Domain stipulate how writes to the blocks on the Real Storage Devices (RSD) are organised. Currently this is a simple RAID 10 (R10) scheme with two copies: each write ends up on two different disks in two different nodes, Node 1 Disk 1 and Node 2 Disk 1 (N1D1 and N2D1 in the example above; see the sketch after this list). The Log Cabin keeps track of every movement of data and every transaction happening within and between the physical nodes.
- The Heat Map module keeps track of and analyses the relative I/O intensity across the block device and makes sure the active parts of a disk are located on flash storage (if available). Flash in a Scale HC3 system is not merely a cache but a proper storage tier.
- The State Machine, with the help of the Log Cabin consensus/AI piece, continuously monitors and acts upon deviations and anomalies from the normal state of all the different physical elements supporting the infrastructure.
- The libscribe library sits underneath KVM (the Kernel-based Virtual Machine hypervisor), so it is completely transparent to any OS running in the Virtual Machine.
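To tie the list together, here is a minimal sketch, under my own assumptions, of the two ideas above that are easiest to gloss over: a two-copy placement that always lands a block on different disks in different nodes (the N1D1/N2D1 example), and a heat map that keeps the busiest blocks on the flash tier. It is illustrative Python, not Scale's libscribe implementation.

```python
from collections import Counter

# --- Placement: two copies, always on two different nodes (illustrative) ---
def place_block(block_id, nodes):
    """Pick two (node, disk) targets on different nodes for one block.
    Round-robin on the block id; real placement logic is far more involved."""
    n1 = nodes[block_id % len(nodes)]
    n2 = nodes[(block_id + 1) % len(nodes)]
    disk = f"D{(block_id // len(nodes)) % 4 + 1}"   # pretend 4 disks per node
    return [(n1, disk), (n2, disk)]

nodes = ["N1", "N2", "N3"]
print(place_block(0, nodes))    # [('N1', 'D1'), ('N2', 'D1')] -> N1D1 and N2D1

# --- Heat map: keep the most frequently touched blocks on the flash tier ---
access_log = [7, 7, 7, 3, 3, 9, 7, 3, 1]          # block ids seen by the I/O path
heat = Counter(access_log)                         # relative intensity per block
flash_slots = 2                                    # pretend flash holds two blocks
hot_blocks = [block for block, _ in heat.most_common(flash_slots)]
print(f"Blocks promoted to flash: {hot_blocks}")   # [7, 3]
```

The real system does this continuously and transparently underneath the hypervisor, which is exactly why the VM's operating system never has to know about it.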
What is the most apparent take-away from the above?
- It has a simple, short-stroke, bare-metal feel to it. Any block-level DR (Disaster Recovery) functionality will be very efficient going forward.
- It is robust and flexible in that the placement spec is defined per VM disk, so one VM may very well have different block devices with different placement schemes, such as erasure coding (sketched after this list).
- The entire HCI infrastructure consumes only 4 GB of RAM on each node. Ask your HCI vendor for the corresponding number with all features switched on.
- In many cases the small server footprint may even positively affect the cost of licenses at the higher levels, such as databases licensed based on the number of sockets in your servers. If you can run your HCI infrastructure, including your application payload, on servers with fewer sockets, the total cost of running your IT platform drops. You win.
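Because the placement spec lives per virtual disk, you can picture the configuration roughly like the sketch below. The field names and values are hypothetical, invented for illustration; the point is only that the protection scheme is a property of each block device, not of the whole cluster.

```python
from dataclasses import dataclass

# Hypothetical shape of a per-disk placement spec. The field names are mine,
# not Scale's API; the point is only that protection is chosen per block
# device rather than per cluster.

@dataclass
class PlacementSpec:
    scheme: str              # e.g. "mirror" or "erasure"
    copies: int = 2          # used by the mirror scheme
    data_shards: int = 0     # used by an erasure-coding scheme
    parity_shards: int = 0

vm_disks = {
    "db-vm/boot":    PlacementSpec(scheme="mirror", copies=2),
    "db-vm/data":    PlacementSpec(scheme="mirror", copies=2),
    # The same VM could, in principle, carry a disk with a different scheme:
    "db-vm/archive": PlacementSpec(scheme="erasure", data_shards=4, parity_shards=2),
}

for disk, spec in vm_disks.items():
    print(disk, "->", spec)
```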
The all-inclusive Scale HC3 sports a solid HCI architecture that makes it a simple, autonomous and affordable solution ready to grow along with your business.