The Rise of Cloud Storage
Many organizations have already retired their own data centers and servers. Their servers now run in the cloud, hosted by one of the many infrastructure-as-a-service (IaaS) providers such as Amazon Web Services or Microsoft Azure.
As a first step, companies often replace their file servers with a cloud storage service. With this setup, you no longer need to synchronize storage servers across offices or worry about failing hard disks. The priority shifts to making sure every office has a reliable Internet connection to reach the cloud storage service.
Many of Canto’s customers are looking for alternative storage options for their massive amounts of digital assets. As a logical consequence, we’ve now added the ability for Cumulus to natively support the leading cloud storage service: Amazon S3.
Cumulus & Amazon S3
Cumulus is agnostic to the underlying storage mechanism, as long as it can access the assets when someone requests a file download or conversion. Cumulus can store files on local hard disks, a file server, network-attached storage (NAS) or any distributed file system mounted locally. The Cumulus add-on “S3 Asset Store” adds the capability to communicate directly with Amazon S3 buckets.
In such a setup, metadata is still stored in the Cumulus Server, which can be deployed on premises or in the cloud. The actual files are stored in Amazon S3, as shown in the picture above. A client program such as Cumulus Web Client asks the Cumulus Server to read or change metadata, but it requests the actual file from Amazon S3. This gives customers the ability to maintain a hybrid cloud environment.
In the scenario above, when a new file is uploaded, metadata is extracted on the application server hosting Cumulus Web Client. The metadata is sent to Cumulus Server and the file to Amazon S3. The same process is triggered when a new version is checked in. Again, metadata is extracted first and updated in Cumulus Server while the new version of the physical file is stored in Amazon S3.
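The upload and check-in flow described above can be sketched in a few lines of Python. This is only an illustration, not Cumulus code: the `HybridStore` class, its two dictionaries (standing in for the Cumulus Server catalog and the Amazon S3 bucket) and the `name/vN` object key scheme are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class HybridStore:
    """Toy model: one dict per storage tier (both are hypothetical stand-ins)."""
    catalog: dict = field(default_factory=dict)  # plays the role of Cumulus Server
    bucket: dict = field(default_factory=dict)   # plays the role of Amazon S3

    def check_in(self, name: str, data: bytes) -> dict:
        # Step 1: extract metadata on the application server.
        previous = self.catalog.get(name)
        version = previous["version"] + 1 if previous else 1
        metadata = {
            "name": name,
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "version": version,
            "object_key": f"{name}/v{version}",  # assumed key layout
        }
        # Step 2: the file goes to object storage, the metadata to the catalog.
        self.bucket[metadata["object_key"]] = data
        self.catalog[name] = metadata
        return metadata


store = HybridStore()
first = store.check_in("manual.pdf", b"first draft")
second = store.check_in("manual.pdf", b"second draft, revised")
print(first["version"], second["version"])
```

A first upload and a later check-in take the same path through `check_in`; only the version number and the object key differ.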
Amazon CloudFront in the mix
When the user requests the file for download or conversion, the file is first downloaded from Amazon S3 to the application server hosting Cumulus Web Client. This application server performs the requested transformation, such as converting the file to a different format or bundling several files into a ZIP file. The results are then sent to the user’s web browser.
Files are not downloaded directly from Amazon S3. Instead, Cumulus requests them from Amazon CloudFront, a globally distributed content delivery network (CDN) that ensures fast downloads worldwide. Internally, Amazon CloudFront retrieves the file from Amazon S3. If a file is downloaded several times via Amazon CloudFront, this transfer happens only once, because Amazon CloudFront caches the content it retrieved from Amazon S3 for some time.
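To make the caching behavior concrete, here is a minimal sketch of a time-limited cache sitting in front of an origin. It is not how Amazon CloudFront is implemented; the `EdgeCache` class, the `ttl_seconds` parameter and the injected clock are assumptions chosen to show why repeated downloads trigger only one transfer from the origin.

```python
import time
from typing import Callable


class EdgeCache:
    """Hypothetical edge node: serves cached bytes until the entry expires."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_origin      # stands in for the request to Amazon S3
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}              # key -> (expiry time, cached bytes)
        self.origin_fetches = 0         # counts transfers from the origin

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]             # cache hit: no origin transfer
        data = self._fetch(key)         # cache miss or expired entry
        self.origin_fetches += 1
        self._entries[key] = (now + self._ttl, data)
        return data


origin = {"photo.jpg": b"image bytes"}
cache = EdgeCache(origin.__getitem__, ttl_seconds=300)
for _ in range(5):
    cache.get("photo.jpg")
print(cache.origin_fetches)
```

Five downloads in a row reach the origin once; the next origin transfer only happens after the entry has expired.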
Amazon CloudFront servers are distributed globally, and each user is routed to a server near their current location. As a result, downloads via Amazon CloudFront are typically much faster than retrieving files from Amazon S3 directly.
If a single file is downloaded without any transformation, the file doesn’t need to go through the application server. In this situation, Cumulus Web Client and Cumulus Sites will directly retrieve the file from Amazon CloudFront to make full use of the distributed nature of Amazon CloudFront.
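The choice between the two download paths boils down to one rule, sketched below. The function name `delivery_path` and its return values are hypothetical labels for this example, not part of any Cumulus API.

```python
from typing import Optional, Sequence


def delivery_path(asset_ids: Sequence[str], conversion: Optional[str] = None) -> str:
    """Pick a download route for a request (illustrative rule only)."""
    # Conversions and multi-file ZIP bundles must be built on the application server.
    needs_processing = conversion is not None or len(asset_ids) > 1
    return "app-server" if needs_processing else "cdn-direct"


print(delivery_path(["photo.jpg"]))
print(delivery_path(["photo.jpg"], conversion="png"))
print(delivery_path(["photo.jpg", "manual.pdf"]))
```

Only the first request, a single file with no transformation, can skip the application server and be served straight from the CDN.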
Securing your assets in the cloud
Working with a publicly available cloud storage service doesn’t mean assets are available to everyone. By default, all content stored in Amazon S3 is private and nobody has access. You need to set up technical user accounts with cryptographic keys and signatures and provide this information to Cumulus. Cumulus uses this information to generate cryptographically signed URLs so that an asset can be downloaded. Such a URL is only valid for a minute, and it is practically impossible to guess.
Amazon provides a variety of mechanisms to further secure your cloud setup. For example, you can limit access to certain IP ranges, and you can enforce that all files stored in Amazon S3 are encrypted with AES-256 before they are written to the physical hard disks.
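The idea behind a short-lived signed URL can be shown with Python’s standard library. Note that this is a simplified sketch built on an HMAC over the path and expiry time. Amazon S3 and Amazon CloudFront use their own signing schemes, which this example does not reproduce. The secret key, the host name and the `expires` and `signature` query parameters are all invented for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"demo-secret-not-a-real-key"  # hypothetical shared secret


def _signature(path: str, expires: int) -> str:
    # Sign both the path and the expiry so that neither can be altered.
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, now: float, lifetime: int = 60) -> str:
    """Return a URL that is valid for `lifetime` seconds after `now`."""
    expires = int(now) + lifetime
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{base}{path}?{query}"


def is_valid(url: str, now: float) -> bool:
    """Check the signature and the expiry time of a signed URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        given = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parts.path, expires)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(given.encode(), expected.encode()) and now <= expires


url = sign_url("https://cdn.example.com", "/assets/photo.jpg", now=1000)
print(is_valid(url, now=1030), is_valid(url, now=1061))
```

The URL is accepted 30 seconds after signing and rejected once the minute has passed. Changing the path or the expiry time invalidates the signature, which is why such a URL cannot simply be guessed or extended.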
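As an illustration of these two mechanisms, the sketch below builds an Amazon S3 bucket policy as a Python dictionary and prints it as JSON. The bucket name and the IP range are placeholders, and the policy is an example to be checked against the AWS documentation before use, not a recommended production configuration. In particular, the IP restriction would need adjusting if Amazon CloudFront has to fetch from the same bucket.

```python
import json


def build_bucket_policy(bucket: str, allowed_cidrs: list) -> dict:
    """Build an example policy: IP allow-list plus mandatory AES-256 encryption."""
    object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny every request that does not come from an allowed range.
                "Sid": "DenyOutsideAllowedRanges",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", object_arn],
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
            },
            {
                # Deny uploads that do not request AES-256 server-side encryption.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
        ],
    }


# Placeholder bucket name and documentation-only IP range.
policy = build_bucket_policy("example-assets-bucket", ["203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```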
Hybrid cloud example
Let’s put everything together and look at a real-world example. A company from Silicon Valley is selling popular consumer electronics. Marketing, design and engineering departments are all located in the Valley, but device production is done by a contract manufacturer in Shenzhen, China.
The company uses Cumulus to store all product marketing material, such as product photos and videos, whitepapers, technical notes and manuals. It runs an on-premises Cumulus deployment in its Silicon Valley office to prevent any leakage of information prior to product release.
Technical specifications are transferred to a Cumulus catalog using Amazon S3 storage as soon as the contract manufacturer in Shenzhen starts production. Moving assets from an on-premises file server to Amazon S3 is a one-click action in Cumulus and can also be automated, e.g. triggered by a file status change.
The engineers in Shenzhen use a Cumulus Web Client installation hosted in Amazon EC2 in the Chinese AWS region. The Web Client retrieves all metadata from the Cumulus Server in Silicon Valley, but files are downloaded directly via Amazon CloudFront. Colleagues in the Valley also get their files from Amazon CloudFront when downloading files stored in Amazon S3. This keeps file access fast, no matter where a user is located.
Integrating Cumulus with Amazon S3 opens up many deployment opportunities for distributed companies. Companies with several offices don’t need to worry about file server synchronization. Instead, they store their files in Amazon S3 and get fast access to the files via Amazon CloudFront. Hybrid cloud scenarios are supported, in that selected assets can be moved to the cloud when needed.