Introduction to OpenStack Swift Object Storage
This article is a general introduction to Swift, OpenStack’s Object Storage service. It covers what Swift is, what kind of data you can store in it, how it controls access to your data, how it protects that data using replication or erasure codes, and where to find development resources.
What is Swift?
Conceptually, Swift is similar to Amazon’s S3 or Microsoft Azure’s Blob Storage. It allows you to store any kind of unstructured data using a web-based API.
Swift is designed to scale to billions of objects and petabytes of storage. Swift does not require any special hardware. It runs on commodity servers with normal disks. However, plugins allow it to use filers from NetApp, SolidFire, and others if desired.
The Swift software uses a distributed, shared-nothing model. Since there is no central database or node to act as a bottleneck, it scales horizontally very well: to get more capacity, you simply add more nodes. Swift stores the data redundantly. If a server or disk fails, Swift uses the data on the surviving hardware to restore full redundancy.
Physically, Swift is set up as a set of nodes in a cluster. A proxy node receives incoming requests and directs them to the storage nodes where the actual data resides. OpenStack recommends a minimum of five storage nodes for production use, so a Swift deployment requires a fair amount of hardware.
Swift stores its objects in containers. Each user’s storage account can have as many containers as they want. Each object is associated with a single container. An object is referenced by a URL of the form:
//<storage_account_name>/<container name>/<object>
One thing to note is that containers cannot be nested: you cannot have a container within a container. You can, however, simulate a pseudo-hierarchy with some extra effort by putting slashes in the object’s name and listing the container with a prefix and a delimiter.
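The pseudo-hierarchy can be illustrated without a cluster. The sketch below imitates, in plain Python, what a container listing restricted by a prefix and a delimiter returns: the objects directly under the prefix, plus one entry per simulated subdirectory. The function name and the sample object names are made up for illustration. A real listing is requested from Swift through its API rather than computed locally.

```python
def list_pseudo_dir(object_names, prefix="", delimiter="/"):
    """Imitate a container listing limited to one pseudo-directory level.

    Hypothetical helper for illustration only; it is not part of Swift.
    """
    entries = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        # A name with a further delimiter collapses into a single
        # "subdirectory" entry; otherwise the object itself is listed.
        entries.add(prefix + head + sep if sep else name)
    return sorted(entries)


# Sample object names (assumed); the slashes are just part of each name.
objects = [
    "photos/2014/beach.jpg",
    "photos/2014/city.jpg",
    "photos/index.html",
    "readme.txt",
]

print(list_pseudo_dir(objects))
print(list_pseudo_dir(objects, prefix="photos/"))
```

The container itself stays flat. The hierarchy exists only in how the names are grouped at listing time.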
What Can You Store in Swift?
Pretty much anything you want. An object can contain a document, a photo, music, a backup, a VM image, or any other piece of unstructured data. Swift is set up to let you access each object individually via its URL. It does not organize the data into tables as HBase does, or store relationships between data as MySQL does. Nor is it a filesystem. The container name is the only organization it imposes on the data.
That said, Swift is very powerful. Because every object is accessible via a URL, it can be reached from anywhere on the Internet, whether from cell phones, PCs, or a variety of other devices. And the ability to store objects can mesh very nicely with object-oriented programs that need to persist data.
How Swift Controls Access to Your Data
Although Swift can be run stand-alone, it is most commonly run as part of OpenStack. In this configuration, authentication is done through Keystone, the OpenStack authentication server. Once users have presented their credentials to Keystone, they are given an authentication token that is valid for a limited time (24 hours here, though the lifetime is configurable in Keystone). This token proves that the user is the owner of the Swift storage account. By default, only the owner can access the account.
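In practice, the token travels with each request in the X-Auth-Token HTTP header. The following is a minimal sketch of what such a request looks like, built with Python's standard library and deliberately not sent. The storage URL, container, object name, and token value are all placeholders, and the helper name is hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_object_request(storage_url, container, obj, token):
    """Build (but do not send) a GET request for a single object.

    Hypothetical helper for illustration; all argument values are assumed.
    """
    url = "{}/{}/{}".format(storage_url.rstrip("/"), quote(container), quote(obj))
    # The token obtained from Keystone is passed in the X-Auth-Token header.
    return urllib.request.Request(url, headers={"X-Auth-Token": token}, method="GET")


req = build_object_request(
    "https://swift.example.com/v1/AUTH_demo",  # placeholder storage URL
    "photos",
    "beach.jpg",
    "example-token",  # placeholder; a real token comes from Keystone
)
print(req.get_method(), req.full_url)
```

Once the token expires, the client has to authenticate with Keystone again and retry with the new token.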
The owner can define individual Access Control Lists (ACLs) for each container. The owner can grant read and/or write access to everyone, various groups, referrer hosts, or domains. For more details, see the ACL section of the API doc.
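As a rough illustration, container ACLs are set by sending the X-Container-Read and X-Container-Write headers when creating or updating a container. The sketch below only assembles those headers. The helper name and the project and user names are hypothetical, and the exact ACL element syntax accepted depends on how the cluster's authentication is configured, so treat the values as assumptions to check against the API doc.

```python
def container_acl_headers(read=None, write=None):
    """Assemble ACL headers for a container create/update request.

    Hypothetical helper; each ACL is a list of elements that Swift
    expects as one comma-separated header value.
    """
    headers = {}
    if read:
        headers["X-Container-Read"] = ",".join(read)
    if write:
        headers["X-Container-Write"] = ",".join(write)
    return headers


# Anyone may read objects and list the container (referrer-style elements).
public_read = container_acl_headers(read=[".r:*", ".rlistings"])

# One specific user in one project may write (names are made up).
team_write = container_acl_headers(write=["project_a:alice"])

print(public_read)
print(team_write)
```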
Finally, Swift has a temporary URL feature. This allows a third party to access an object using a URL that expires after a set amount of time. This is useful for things like mailing temporary download links to people.
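A temporary URL is an ordinary object URL with two extra query parameters: an expiry timestamp and an HMAC signature computed over the HTTP method, the expiry time, and the object path. The sketch below shows the idea under a few assumptions: a secret key has already been set on the account, the cluster accepts SHA-256 signatures (which digests are allowed is a deployment setting), and the host, path, and key shown are placeholders. The function name is hypothetical.

```python
import hmac
import time
from hashlib import sha256


def make_temp_url(base_url, path, key, ttl_seconds, method="GET", now=None):
    """Return a signed, time-limited URL for one object.

    Hypothetical helper; `key` is the account's temporary-URL secret and
    `now` exists only to make the result reproducible in examples.
    """
    expires = int((time.time() if now is None else now) + ttl_seconds)
    # The signature covers the method, expiry time, and object path.
    hmac_body = "{}\n{}\n{}".format(method, expires, path)
    sig = hmac.new(key.encode(), hmac_body.encode(), sha256).hexdigest()
    return "{}{}?temp_url_sig={}&temp_url_expires={}".format(
        base_url, path, sig, expires
    )


url = make_temp_url(
    "https://swift.example.com",        # placeholder host
    "/v1/AUTH_demo/photos/beach.jpg",   # placeholder object path
    "my-secret-key",                    # placeholder key
    ttl_seconds=3600,
)
print(url)
```

Because the signature includes the method and the path, a link generated for downloading one object cannot be reused to upload or to fetch a different object.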
How Swift Protects Data Using Replication or Erasure Codes
By default, Swift replicates each object three times. When storing the copies, it tries to spread them out over different servers and disks so the failure of a single component won’t cause lost data. The number of replicas for each object is configurable by the administrator.
Erasure codes use less storage than replicas. Rather than storing several complete copies of the object, the object is split into fragments and parity data is added, much as RAID does. Depending on the scheme chosen, this can reduce the storage needed for an object from 3X its size to about 1.2X. For cold data which is simply being archived, this can be a tremendous win. Rackspace has a more detailed description of Swift erasure codes here.
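The storage figures follow from simple arithmetic. With replication, the overhead equals the number of copies. With an erasure code that splits an object into k data fragments and adds m parity fragments, the overhead is (k + m) / k. The sketch below works this through. The 10 + 2 layout is only an illustrative choice that happens to give 1.2X, not a scheme this article specifies, and the function names are hypothetical.

```python
def replication_overhead(replicas):
    """Raw storage used per byte of data when keeping full copies."""
    return float(replicas)


def erasure_overhead(data_fragments, parity_fragments):
    """Raw storage used per byte of data with data + parity fragments."""
    return (data_fragments + parity_fragments) / data_fragments


object_size_gb = 100  # assumed size of the archived data

rep = replication_overhead(3)
ec = erasure_overhead(10, 2)  # illustrative 10 data + 2 parity layout

print("3 replicas:", rep, "x ->", object_size_gb * rep, "GB raw")
print("10+2 erasure code:", ec, "x ->", object_size_gb * ec, "GB raw")
```

The saving is not free: reading or rebuilding an erasure-coded object involves gathering and decoding fragments, which is one reason the approach suits cold, archived data.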
At the time of writing, erasure coding is not yet available in Swift, but the development community plans to work on it throughout the year.
Where to Find Development Resources for Swift
The OpenStack Foundation has online documentation for Swift. The Associated Projects page for Swift lists API bindings for various languages. The only officially supported binding is for Python, but there are others for PHP, Java, and C#. Finally, if you have questions, you can always ask the OpenStack community.