VMware’s Biggest Problem

VMware’s Biggest Problem

94317256

VMware’s algorithms have always done a great job with application placement onto a set of servers. It uses an API to ask about the server hardware, capabilities (e.g. CPU, memory), and current allocations, and then algorithmically selects the proper location to place the workload. It also dynamically monitors that placement. This monitoring allows VMware, for example, to automatically move applications around and power down servers that aren’t being used.

With the acquisition of Nicira and the resulting NSX direction, VMware now has similar capabilities for network devices. It uses an API to ask about the network hardware, capabilities, and then mathematically selects the proper location.

Adding the network algorithms to the server algorithms introduces a set of constraints. The main constraint is that the selection of network bandwidth needs to be ultimately bound to where the application workload will be running. An equally important constraint is that the security requirements for the application now need to be automatically extended to the network tier.

So the automated placement of workload, and the subsequent provisioning of the network for that workload, is more complex. Dynamic monitoring and movement is likewise more complicated.

Now let’s add the dynamic selection and monitoring of storage into this equation, and you’ve got one of the main problems that needs to be solved for next-generation IT infrastructures. These infrastructures are projected to run millions of applications at any one time, which eliminates all chances of trying to do IT allocation manually.  Consider VMware looking to assign new applications on top of a massive infrastructure, with servers on top, storage on the bottom, and the network in between:

SDDC

VMware’s biggest problem, therefore, is a computer science problem. It is also a problem that they are uniquely qualified to solve, given their experience at the server and the network layer.  The need for a new form of IT infrastructure is primarily driven by the fact that applications are placing an increasingly complex set of requirements down onto the infrastructure layer. With millions of apps on the horizon, VMware will start playing the role of arbiter in terms of automatic workload deployment:

The math behind the placement is anything but simple, and the ongoing monitoring to validate proper placement is yet more complex.

Believe me when I say that the internal discussions between VMware and EMC’s internal storage teams are perhaps the most fascinating algorithmic discussions I’ve heard in quite some time (ask Chuck Hollis, who contributed some of his own opinions at our last technology all-hands!). In addition to the complexities involved with modeling storage performance and reliability, you also have different storage interface types to worry about: block, object, file, SQL, HDFS, etc. Storage allocation modelling is of course greatly facilitated by the storage catalogues that I mentioned in my last post.

Automated application placement and monitoring becomes even more complex in a multi-hypervisor environment (e.g. think data centers that are deploying multiple cloud orchestration platforms – VMware, Microsoft, OpenStack – on top of the same computing infrastructure).

How does one begin to frame a solution to the problem? Here are a set of considerations that will serve as starting points for future blog posts on the subject:

  • Not all IT infrastructures will need this capability. There is likely a threshold where the number of applications and/or amount of data exceeds the capabilities of an administrator.
  • Some apps need scale-up infrastructure and some need scale-out. There will be heterogeneity in the infrastructure that VMware must account for.
  • Application workloads need to be characterized (e.g. performance, cost, availability, power consumption, etc.) in order to facilitate automated placement.
  • VMware’s “resource pools” construct needs to be extended to match all new workload characteristics.
  • Prioritization schemes can be used to resolve resource competition
  • The algorithms will revolve around sophisticated mathematical models.

In this series of posts that describe a data lake architecture, VMware is attempting to solve quite a few of the problems that I brought up in the Data Lake Addendum post:

  • Enabling the introduction of the GemFire/HDFS building block. This is the fundamentally “new” piece of the infrastructure that many analytic workloads will run on top of.
  • Designing a server-based storage strategy.
  • Tight integration with Data Protection services.
  • Providing hybrid services for private cloud implementations.
  • Integrating with software-defined storage APIs (e.g. ViPR) as part of SDDC orchestration.

Over the next few weeks I plan to amplify this discussion of SDDC orchestration.

Steve

https://stevetodd.tech

Twitter: @SteveTodd

EMC Fellow