Cloudbzz has written about some of the options for deploying RDBMS/SQL databases in the cloud, including:
- Do it yourself: Deploy a cloud server and install/configure/tune the database
- Service-based DBaaS: Typically, pick a boot image that already has the database installed in a default configuration. Use existing facilities for backups and/or monitoring.
- External DBaaS: Similar to the service-based approach, but probably a little easier and with more extensive or relevant management functions, and at least the prospect of cloud portability and scaling.
They list Xeround and FathomDB as the primary external DBaaS providers. Interestingly, Standing Cloud could easily enter the external DBaaS market with strong portability, flexibility and data control ratings (see the chart in the article). Our system naturally moves data among all the providers and gives the user root access to the server. We also support Drizzle and PostgreSQL, and all we would have to do is open the database port and create an external database user.
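As a rough illustration of what "open the database port and create an external database user" could involve for PostgreSQL, here is a minimal sketch. The function name `external_access_plan`, the example names, and the choice of `scram-sha-256` authentication are assumptions made for illustration, not a description of Standing Cloud's actual system. It only builds the statements and config lines as strings and does not touch a real server.

```python
import re

# Conservative whitelist for unquoted PostgreSQL identifiers (illustrative only).
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def external_access_plan(db, user, password, client_cidr, port=5432):
    """Return the statements and config lines needed to expose a PostgreSQL
    database to one external client network. Hypothetical helper."""
    for name in (db, user):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Escape single quotes for a standard SQL string literal.
    quoted_pw = password.replace("'", "''")
    return {
        # Run these as a superuser on the database server.
        "sql": [
            f"CREATE ROLE {user} WITH LOGIN PASSWORD '{quoted_pw}';",
            f"GRANT CONNECT ON DATABASE {db} TO {user};",
        ],
        # pg_hba.conf entry: allow this user from this network only.
        "pg_hba": f"host {db} {user} {client_cidr} scram-sha-256",
        # postgresql.conf: listen on external interfaces, not just localhost.
        "postgresql_conf": ["listen_addresses = '*'", f"port = {port}"],
        # Firewall rule, written as a plain description rather than real syntax.
        "firewall": f"allow tcp/{port} from {client_cidr}",
    }


if __name__ == "__main__":
    plan = external_access_plan("shop", "ext_app", "s3cret", "203.0.113.0/24")
    for statement in plan["sql"]:
        print(statement)
    print(plan["pg_hba"])
```

Restricting the `pg_hba.conf` entry and the firewall rule to a single client network, rather than opening the port to everyone, is the design choice that matters most here.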
The problem with the approach used by these DBaaS providers is that database performance ultimately depends on disk I/O, and cloud servers are not very good at high-performance disk I/O. Furthermore, cloud services that offer persistent, separate storage (such as Amazon EBS) keep that storage on network-connected devices, which are subject to network latency and bandwidth limitations.
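To see why network latency matters so much for storage, a back-of-the-envelope sketch helps. If each synchronous I/O must complete before the next one is issued, the ceiling on I/O operations per second is the number in flight divided by the time each one takes. The function name and the latency figures below are illustrative assumptions, not measurements of any provider.

```python
def max_sync_iops(service_time_ms, network_rtt_ms=0.0, queue_depth=1):
    """Upper bound on I/O operations per second when each operation takes
    service_time_ms on the device plus network_rtt_ms on the wire, with
    queue_depth operations in flight at once."""
    per_io_ms = service_time_ms + network_rtt_ms
    if per_io_ms <= 0:
        raise ValueError("total per-I/O time must be positive")
    return queue_depth * 1000.0 / per_io_ms


if __name__ == "__main__":
    # Hypothetical numbers: a 0.1 ms device, with and without a 1 ms round trip.
    local = max_sync_iops(0.1)
    networked = max_sync_iops(0.1, network_rtt_ms=1.0)
    print(f"local: {local:.0f} IOPS, networked: {networked:.0f} IOPS")
```

Under these assumed figures, adding a one-millisecond round trip to a fast device cuts the ceiling by roughly a factor of eleven, which is the kind of penalty a write-heavy database feels directly.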
GoGrid and, more recently, Rackspace offer a high-speed connection between cloud servers deployed in their cloud environment and managed servers. Thus, a hybrid architecture could be deployed in which the database runs on a "raw iron" managed server with fiber-connected RAID storage, while the Web/app servers are “in the cloud” but connected to the database via a high-speed link. This is a good architecture, but it is not a fully “cloud” architecture.
To be a full-fledged cloud architecture, the high-performance database needs to be user-provisioned. A user (or system) should not have to interact with the technical staff at the hosting provider, wait for some physical action to be taken, or commit to a longer-term contract. Cloud skeptics will ask why this is important. After all, a high-performance database is not something that customers will start and stop frequently, as they would a scalable web server tier. Chances are, a database of this kind will be deployed once and operated for a long time.
My answer is that, going forward, the “burden of proof” will be on those who insist on traditional, slow, labor-intensive ways of doing business. A dozen years ago, people still asked why anyone needed Amazon to buy books when they could just go to the bookstore. But Amazon provides not only a bigger selection but also immediate gratification. It reduces the friction of taking action, and a self-provisioned, high-performance cloud database will do the same thing.
In the early days of Microsoft’s Azure project, we heard rumors that they were going to implement an architecture similar to what I have described, although offered in more of a PaaS model. With SQL Azure, they certainly have the platform to do it, but I haven’t heard anything about its performance or whether the storage is fiber-connected to the DBMS. For a while, it seemed as though EMC was going to offer something along these lines, but nothing ever came of it. If anyone has more information, we’d love to hear from you!