Metadata Ethos

Metadata Ethos

In a recent post I introduced the concept of “infrastructure-based metadata”. In a data center, this type of metadata falls into four categories:

  1. Metadata about configuration management of the infrastructure.
  2. Metadata relating to data center security.
  3. Metadata that facilitates data center backup and recovery.
  4. Metadata that facilitates content management activity.

I also mentioned that this type of metadata has complex relationships (cardinality) with other forms of (a) infrastructure-based metadata, (b) application content, and (c) metadata describing application content. I used the following diagram to highlight this cardinality in the context of backup and recovery metadata.

BackupMetadata

Application workloads were sending a continual stream of content (blue) and metadata (red) to block, file, and object storage systems. As more and more disparate storage systems were deployed within data centers, the amount of intrastructure-based metadata (green) began to rise exponentially.

A new wave of application workloads began to emerge, and these workloads were primarily focused around the centrality of infrastructure-based metadata. New workloads lead to new innovations, and perhaps one of the first (and toughest) problems to solve was to help customers backup and recover the data associated with their workloads.

The need for new applications for managing backup and recovery was characterized by several fundamental problems:

  1. The metadata catalogue itself. How is the infrastructure-based metadata
    for backup and recovery stored? Where are all the copies (and versions) of the
    content?  Who can access the copies and versions? The catalogue has always
    been the core of any backup application. To meet the increased scale (trillions
    of objects) and scope (search across multiple layers of metadata), backup
    applications must evolve their core.
  2. Complex data center configurations. Traditionally, backup clients generated and
    sent the metadata to the catalogue. As data scaled, customers turned to new
    protection mechanisms for better performance. This created new paths between
    the primary and secondary (the backup) systems. With so many permutations and
    paths between block, NAS, and CAS storage systems, how can the backup application
    discover and ingest the metadata into its catalogue?

Solutions to these problems would bring what my EMC colleague Stephen Manley calls a “metadata ethos” to the IT industry. The fast and reliable storage of content became table stakes for enterprise customers; the real action (and business profit) would revolve around the adoption of effective tools and processes for handling infrastructure-based metadata.

In my next post I will describe how the metadata ethos resulted in considerable backup and restore innovation across NAS and CAS systems.

Steve

https://stevetodd.tech

Twitter: @SteveTodd

EMC Fellow