Platform as a Service (PaaS) is not a trivial thing to build, deploy, or maintain: first you have to orchestrate all the services internally, then you have abstract all of that work behind a facade, and finally, you have to market, sell it, and maintain it. Not surprisingly, an investment that large has been the domain of only a few well-funded companies.
Hence, it has been interesting to watch VMware roll out their CloudFoundry service as an open source project! An entire PaaS platform, which they will also offer as a hosted service, but also available for anyone to run within their own company or datacenter - you can now run a "mini Heroku", or an "EngineYard cloud" on your own servers! But marketing aside, the engineering behind the project is also very interesting: it is orchestrated entirely in Ruby! No Erlang, no JVM's, all Ruby under the hood.
CloudFoundry: Rails, Sinatra, EventMachine
Out of the gate, CloudFoundry is able to provision and run multiple platforms and frameworks (Rails, Sinatra, Grails, node.js), as well as provision and support multiple supporting services (MySQL, Redis, RabbitMQ). In other words, the system is modular and is fairly simple to extend. For a great technical overview, checkout the webinar.
To orchestrate all of these moving components, the "brains" of the platform is a Rails 3 application (CloudController) whose role is to store the information about all users, provisioned apps, services, and maintain the state of each component. When you run your CLI (command line client) on a local machine, you are, in fact, talking to the CloudController. Interestingly, the Rails app itself is designed to run on top of the Thin web-server, and is using Ruby 1.9 fibers and async DB drivers - in other words, async Rails 3!
A companion to the Rails application is the Health Manager, which is a standalone daemon, which imports all of the CloudController ActiveRecord models, and actively compares to what is in the database against all the chatter between the remaining daemons. When a discrepancy is detected, it notifies the CloudController - simple and an effective way to keep all the distributed state information up to date.
Orchestrating the CloudFoundry Platform
The remainder of the CloudFoundry platform follows a consistent pattern: each service is a Ruby daemon which queries the CloudController when it first boots, subscribes to and publishes to a shared message bus, and also exposes several JSON endpoints for providing health and status information. Not surprisingly, all of the daemons are also powered by Ruby EventMachine under the hood, and hence use Thin and simple Rack endpoints.
The router is responsible for parsing incoming requests and redirecting the traffic to one of the provisioned applications (droplets). To do so, it maintains an internal map of registered URL's and provisioned applications responsible for each. When you provision or decommission a new app server instance, the router table is updated, and the traffic is redirected accordingly. For small deployments, one router will suffice, and in larger deployments, traffic can be load-balanced between multiple routers.
The DEA (Droplet Execution Agent) is the supervisor process responsible for provisioning new applications: it receives the query from the CloudController, sets up the appropriate platform, exports the environment variables, and launches the app server.
Finally, the services component is responsible for provisioning and managing access to resources such as MySQL, Redis, RabbitMQ, and others. Once again, very similar architecture: a gateway Ruby daemon listens to incoming requests and invokes the required start/stop and add/remove user commands. Adding a new or a custom service is as simple as implementing a custom Provisioner class.
Connecting the pieces with NATS
Each of the Ruby daemons above follows a similar pattern: on load, query the CloudController, and also expose local HTTP endpoints to provide health and status information about its own status. But how do these services communicate between each other? Well, through another Ruby-powered service, of course! NATS publish-subscribe message system is a lightweight topic router (powered by EventMachine) which connects all the pieces! When each daemon first boots, it connects to the NATS message bus, subscribes to topics it cares about (ex: provision and heartbeat signals), and also begins to publish its own heartbeats and notifications.
This architecture allows CloudFoundry to easily add and remove new routers, DEA agents, service controllers and so on. Nothing stops you from running all of the above on a single machine, or across a large cluster of servers within your own datacenter.
Distributed Systems with Ruby? Yes!
Building a distributed system with as many moving components as CloudFoundry is no small feat, and it is really interesting to see that the team behind it chose Ruby as the platform of choice. If you look under the hood, you will find Rails, Sinatra, Rack, and a lot of EventMachine code. If you ever wondered if Ruby is a viable platform to build a non-trivial distributed system, then this is great case study and a vote of confidence by VMware. Definitely worth a read through the source!