Component Performance in Userland

I’ve mentioned in previous posts that some of our engineering teams have been experimenting with the portability of data path software from several EMC products. This has been going on for the better part of a decade. One unique aspect of these porting experiments is the ability to use CSX technology to get the assets up and running in either kernel OR user space.

Software that traditionally runs in a kernel context can, in principle, run in user space on Windows, Unix, and Linux platforms, as well as inside some of EMC’s proprietary operating systems.

This statement has a couple of ramifications. First, user-space deployment gives greater flexibility when shipping on platforms that have licensing restrictions on kernel software. Second, the performance characteristics when running in user space are going to be markedly different from those in the kernel.

It’s this second point that I’d like to dive into. Early experiments with CSX and data path component porting were feasibility experiments. Once a component was successfully ported into user space, I/O would be run against it to prove that the port worked. The next step was usually to test performance, and the initial results were usually lacking.

As components matured over the many years of integrating into the CSX environment, bottlenecks were identified and addressed. The bottlenecks a component hits on one platform (e.g. Windows user space) might differ from the ones the same component hits on another platform (e.g. Linux user space).

The bottom line is that most of these components have had several years of “practice” becoming higher-performing components (no matter what the platform, and no matter whether user space or kernel).

Once the components are leaner and faster, the next step is to assemble them into a product deployment and address any performance issues that arise. One of the best examples is the compression/de-dupe CSX component. This component can run inside Celerra (DART operating system), but it could also run as a separate process in Windows or Linux user space. When assembling a product where I/O runs through the compression software, the component “edges” usually need optimization.

In other words, the platform-specific methods of routing data into and out of components need to be examined. Where does compressed data go when it leaves the component? That depends on the overall component assembly. It could pass to another CSX-ported component (e.g. a mirrored cache, or a search-and-index engine), or it could pass through the kernel and out to disk.
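To make the "edges" idea a bit more concrete, here is a minimal Python sketch. It does not use CSX or any EMC code: `compress_stage`, `MemorySink`, and `assemble` are hypothetical names invented for this illustration, and zlib simply stands in for a real compression engine. The point is only that the component itself stays the same while the assembly decides what sits on the other side of its boundary.

```python
import zlib
from typing import Callable, List

# Hypothetical stage type for this sketch: a buffer in, a buffer out.
Stage = Callable[[bytes], bytes]


def compress_stage(data: bytes) -> bytes:
    """Stand-in for the compression component (zlib, not a real product engine)."""
    return zlib.compress(data)


class MemorySink:
    """Stand-in for whatever is downstream: another component, or a disk."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []

    def __call__(self, data: bytes) -> bytes:
        # Record the block that crossed the component boundary.
        self.blocks.append(data)
        return data


def assemble(*stages: Stage) -> Stage:
    """Chain stages so each one's output feeds the next one's input."""

    def run(data: bytes) -> bytes:
        for stage in stages:
            data = stage(data)
        return data

    return run


# One possible assembly: compression feeding a sink. Swapping the sink for a
# different stage changes the "edge" without touching compress_stage.
sink = MemorySink()
pipeline = assemble(compress_stage, sink)
payload = b"abc" * 1000
result = pipeline(payload)
```

In a real deployment the hand-off between stages is where the platform-specific cost would show up (copies, context switches, kernel crossings), which is why that boundary gets the optimization attention.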

This type of work (optimizing the boundaries and protocols between components and/or drivers) is a big area of focus right now.

In addition, there are many other tips and tricks that can help on each platform. Linux fastpath/slowpath comes to mind.

It’s an interesting exercise. The component strategy of porting, re-assembly, and re-use in a variety of product lines has followed this evolution:

  1. Feasibility
  2. Component hardening/quality
  3. Performance optimization

Steve

Information Playground

Twitter: @SteveTodd

EMC Intrapreneur