Today, storage is about big boxes, from companies such as EMC and NetApp. But a number of new companies have taken up the banner of storage systems as software. With their help, developers can free themselves from the ties that bind them to on-site corporate storage systems.

Bill Roth, vice president of marketing at Nexenta and formerly at Sun Microsystems, said that new storage options from his company bring some old Sun technology back to the forefront. Sun’s Zettabyte File System (ZFS) was pushed into the open-source community even before Oracle had taken over Sun. But when Oracle acquired the company, ZFS was somewhat shunted to the side; Oracle also has the Btrfs project, which seeks to create a similarly massive file system for scaling storage over the next decades.

While Oracle still offers ZFS in its Solaris products, Nexenta uses a fork of OpenSolaris known as Illumos to build a storage-focused Solaris distribution based on ZFS.

“In many ways, NetApp and EMC are saddled with the innovator’s dilemma: They’ve got old file systems on proprietary hardware,” said Roth. “We’re seeing success with open-source software, with new file systems and commodity hardware. I don’t think the legacy vendors are going to be able to keep up.”

Another upstart software-based storage system is Ceph, which grew up in the academic world. Designed by Sage A. Weil while he was earning his doctorate in computer science at the University of California, Santa Cruz, Ceph is a flexible storage solution that can host objects, file systems or blocks. Based on the CRUSH algorithm, it is able to accommodate multiple types of storage on the same platform, making it stand out from the rest of its ilk in the OpenStack world. Originally designed as a scalable storage system, Ceph was recently submitted to OpenStack as an alternative to Swift, the OpenStack Object Storage system.

Today, Ceph includes a number of sub-projects that enable the system to scale to the petabyte range. In June, Ceph received corporate backing in the form of Inktank, a company formed to provide service and support for Ceph.

Ross Turk, vice president of community at Inktank, said that Ceph is unique because of its flexibility and robustness. “The interesting thing about Ceph is that it’s the only storage system I’ve seen that has been designed from the very first step to have no single point of failure, and the only one designed for failure as the norm,” he said. “For every 1 million nodes, a bunch will fail every day. The way the CRUSH algorithm works gets around a lot of the problems that are limiting the scalability of stuff that starts with the file system and builds on top of it.”
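
To make Turk's point about avoiding a single point of failure more concrete, the idea of computing an object's location instead of looking it up in a central table can be sketched in a few lines. The sketch below is not CRUSH itself; it uses rendezvous (highest-random-weight) hashing as a much simpler stand-in, and the `place` function and node names are hypothetical.

```python
import hashlib

def place(object_name, nodes, replicas=2):
    """Pick `replicas` nodes for an object using rendezvous hashing.

    This is a simplified stand-in for CRUSH, not the real algorithm:
    every client can compute the same answer from the node list alone,
    so no central lookup table is needed.
    """
    def score(node):
        # Hash the (object, node) pair; the highest scores win.
        digest = hashlib.sha256(f"{object_name}:{node}".encode()).hexdigest()
        return int(digest, 16)

    return sorted(nodes, key=score, reverse=True)[:replicas]

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
before = place("photo.jpg", nodes)

# Simulate the failure of the object's primary node.
survivors = [n for n in nodes if n != before[0]]
after = place("photo.jpg", survivors)

print("before failure:", before)
print("after failure: ", after)
```

When the primary node disappears, the former second choice becomes the new primary and one further node is recruited as a replica; objects that were not stored on the failed node keep their placement.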

For developers, these new storage systems increase the flexibility of their environments, and give them more deployment options. Currently, large storage systems from EMC or NetApp live in corporate data centers. They can’t move because they are tied to extremely expensive hardware. Thus, developers can use these systems internally, but cloud-hosted apps are either unable to use such storage, or must do so through a high-latency connection from the cloud into the data center.

With Ceph, Nexenta and other software-based storage systems, that on-site data store can be replicated in a cloud, using the same storage software that exists on premises.

Bryan Bogensberger, president and COO of Inktank, said, “Ceph is changing how people are able to deploy IT infrastructure because of the fact [that] it’s software and it’s architected the way it is.”

Such software-based storage options also give developers the opportunity to rewrite storage interfaces the way they see fit, said Nexenta’s Roth. “We have open APIs. We do have our own user interface, or you can write your own if you don’t like ours,” he said.

“If they want to build their own cloud, they can throw up OpenStack and use this as a storage base underneath, or they’ll have their storage structure effectively virtualized. We’ve seen a lot of people putting Hadoop on top of Nexenta, because even though the data’s replicated, failure can still be very costly. How do you limit that? You put ZFS under it to make it even more powerful.”
Ceph, too, is able to act as an underlying storage system for Hadoop. Indeed, the flexibility of these new storage systems is such that developers can now deploy them to just about any location. When compared to a world where storage is tied to specific boxes in specific data centers, it’s no wonder there is so much activity in the software-based storage world.

That’s an enticing proposition that even the hardware storage vendors are hoping to capitalize upon. To that end, the Cloud Data Management Interface (CDMI) standards will play a big part in NetApp’s plans, which the company said include both a software-based management solution, and the ability to combine cloud-hosted storage systems with on-premises hardware.

On Aug. 9, NetApp became the first CDMI participant to support the standard with the release of Storage Grid Version 9. NetApp believes CDMI will be the new standard for storage interfaces across the Web, thanks to its basis in REST. Jon Robbins, solutions marketing manager at NetApp, said, “With CDMI, there really isn’t anything that’s competing with it at this level and that has the backing that it has. All the major players that designed storage are in this. This is the one that we see that they’re truly backing. I think that does give it significant legs. We’re the first to announce it. We’ve had an open HTTP-based RESTful API, and it happened that CDMI is an HTTP-based RESTful interface, so it was easy for us to map the CDMI to our existing product.”
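
For developers wondering what an HTTP-based RESTful storage interface looks like in practice, here is a rough sketch of how a CDMI request to create a data object might be assembled. It only builds the request and sends nothing. The server URL, object path and `build_cdmi_put` helper are hypothetical, and the specification version is an assumption that would need to match whatever a given server supports.

```python
import json

CDMI_VERSION = "1.0.2"  # assumed spec version; match what your server supports

def build_cdmi_put(base_url, path, text):
    """Describe a CDMI 'create data object' request without sending it."""
    headers = {
        # CDMI clients declare the spec version and CDMI media types.
        "X-CDMI-Specification-Version": CDMI_VERSION,
        "Content-Type": "application/cdmi-object",
        "Accept": "application/cdmi-object",
    }
    # The object's content travels in the "value" field of a JSON body.
    body = json.dumps({"mimetype": "text/plain", "value": text})
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return {"method": "PUT", "url": url, "headers": headers, "body": body}

# Hypothetical endpoint and object name, for illustration only.
request = build_cdmi_put("https://storage.example.com/cdmi", "/reports/q3.txt", "hello")
print(request["method"], request["url"])
print(request["body"])
```

Because the request is ordinary HTTP with a JSON body, the same dictionary could be handed to any HTTP client library, which is part of why mapping an existing RESTful API onto CDMI can be straightforward.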

Of course, those other software-based storage solutions can also support CDMI, and thus allow IT and development teams to mix and match their storage choices across the Web and around the world. To that end, NetApp is hoping that its Storage Grid software will provide a compelling way to manage all of that data.

And this all comes back to what are, perhaps, the biggest buzzwords in technology right now: Big Data.

Richard Treadway, director of Big Data solutions marketing at NetApp, said, “If you look at the area of Big Data, it’s really no different for us. Big Data represents a disruption to the space, and it requires you to think about how you’ll deal with that data and the growth of unstructured data. While this area is pretty confusing, we’re trying to simplify it with a practical approach of things you can do today in terms of getting control of the data growth and seeing a real way to use that data to have better business outcomes. There’s also bottom-line savings because if you can deal with all that data and store it and use it more efficiently, that goes directly to your bottom line.”