Wednesday, December 30, 2015

Life in a container: AppScale in Docker

Docker and AppScale

Docker hardly needs an introduction: it has been extremely popular lately, and for very good reasons. Built on a fairly old and well-tested Linux container technology, it makes it easy to create fresh environments for testing and development, to isolate services within their own software stack, or to deploy a complex application stack.

App Engine is another well-tested and widely appreciated technology, introduced in 2008 and in full production at Google since 2011. An estimated 6 million App Engine applications are running in production. AppScale implements the App Engine model, allowing your applications to be deployed on virtually any infrastructure. It seems only natural, then, to have Docker and AppScale working together.

Developing and Testing App Engine Applications

The App Engine model makes application development easier, since the infrastructure components are already taken care of by AppScale or Google App Engine. For example, scaling the application or the databases is already part of the model. Developers can then focus on the logic of the application and rely on well-known components to serve the APIs. The development cycle is the usual one: develop in your favorite IDE, set up a test environment, demo or confirm the feature or bug fix, merge into master, rinse and repeat. There are of course many different methodologies, but setting up a test environment is a phase that is always required (is anyone still developing with waterfall?!).

AppScale can be deployed on a single node, which simplifies the dev/test phases. The deployed system is a full environment that preserves the characteristics of the production environment (except, possibly, for the number of nodes or the amount of data). Docker is one of the infrastructures supported by AppScale, and it is particularly well suited to setting up multiple test environments. With AppScale 2.6.0 we officially released AppScale on Docker Hub, so setting up a new environment is as easy as

      docker pull appscale/appscale
      docker run -t -i appscale/appscale /bin/bash

In a few seconds you will have your own App Engine environment.

Life in a Container

Docker containers typically run a single service, such as Nginx or Cassandra. This reflects a relatively recent push toward microservices, and it helps when integrating different systems, since it isolates the software stack each service depends upon. As a result, the base images in Docker don't have the usual boot sequence, and many ancillary services aren't running because they are not needed by the desired service. You typically don't need a cron or syslog daemon to run just Cassandra or Nginx.

AppScale, on the other hand, expects a normal instance, i.e. one running many of the usual services found on a Linux box. For example, AppScale expects the ssh and syslog daemons to be running. These are typically started during the boot sequence, and virtually all Linux boxes have them running. So we created a simple script to start a dev/test environment easily; we call it FastStart, and when it detects a Docker container it starts the needed ancillary services.
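
The post does not show how FastStart recognizes a container, so the following is only a minimal Python sketch under stated assumptions: it treats the presence of `/.dockerenv`, or a `docker` entry in `/proc/1/cgroup`, as the signal, and the service list (`ssh`, `rsyslog`) is a hypothetical stand-in for whatever FastStart actually starts.

```python
import os

# Hypothetical service list: the post only names the ssh and syslog daemons,
# and the real FastStart script may start a different set.
ANCILLARY_SERVICES = ["ssh", "rsyslog"]


def running_in_docker(dockerenv="/.dockerenv", cgroup="/proc/1/cgroup"):
    """Heuristic container detection (an assumption, not FastStart's actual check)."""
    if os.path.exists(dockerenv):
        return True
    try:
        with open(cgroup) as f:
            # Under Docker, PID 1's cgroup paths usually mention "docker".
            return "docker" in f.read()
    except OSError:
        return False


def services_to_start(in_docker):
    """On a normal box the boot sequence already started these daemons."""
    return list(ANCILLARY_SERVICES) if in_docker else []
```

The returned names could then be handed to the distribution's service manager (for example `service ssh start`); both checks are common heuristics rather than a guaranteed interface.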

From within the AppScale container started with the previous commands, FastStart can be invoked as follows:

      cd
      ./appscale/scripts/ --no-demo-app

The script creates a basic AppScalefile and starts AppScale. The AppScalefile is AppScale's configuration file; it can be found and inspected in /root (the first cd command ensures it is put there). A sample application is also downloaded into /root. If you just want to test your container, you can deploy the sample application and check it out with the following command:

      appscale deploy guestbook.tar.gz

The FastStart script works on any infrastructure AppScale supports, from Vagrant/VirtualBox to AWS and GCE, and it can set up a single-node instance on any of them. If the infrastructure has the concept of public and private IPs (as GCE and AWS do), the script detects it and configures the system accordingly. On infrastructures where this detection cannot easily be done (for example Vagrant), edit the AppScalefile and set the login directive to the correct public IP.
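
Changing the login directive is a one-line edit. As an illustration, here is a small Python sketch that rewrites it in the AppScalefile text; it assumes the directive appears on its own line as `login: <ip>` (any spacing around the colon), and the helper name `set_login_ip` is hypothetical.

```python
import re


def set_login_ip(appscalefile_text, public_ip):
    """Return the AppScalefile text with the top-level login directive set.

    Assumes the directive sits on its own line ("login: <ip>" or "login : <ip>");
    if no such line exists, one is appended.
    """
    pattern = re.compile(r"^login\s*:.*$", re.MULTILINE)
    replacement = f"login: {public_ip}"
    if pattern.search(appscalefile_text):
        # A callable replacement avoids any backslash interpretation by re.sub.
        return pattern.sub(lambda _m: replacement, appscalefile_text, count=1)
    needs_newline = appscalefile_text and not appscalefile_text.endswith("\n")
    return appscalefile_text + ("\n" if needs_newline else "") + replacement + "\n"
```

For example, `set_login_ip(open("/root/AppScalefile").read(), "192.168.33.10")` (a hypothetical Vagrant address) returns the updated text, which can be written back before starting AppScale.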

Dev/Test cycle made easy

AppScale's FastStart makes it trivial to generate a new development or testing environment. With Docker this process literally takes seconds, bringing a new level of convenience to the App Engine development cycle. Try it out.

Thursday, December 3, 2015

'Scale' in AppScale

App Engine

"A powerful platform to build web and mobile apps that scale automatically" is Google's punch line for App Engine. And automatically is the keyword: it's difficult to overstate the power of a platform that lets any application react to changes in user-induced load automatically, with no intervention from a sysadmin or developer. We loved that statement so much that we wanted everyone to be able to take advantage of autoscaling, and that's why AppScale was created.

App Engine: A powerful platform to build web and mobile apps that scale automatically.

The Basics of Scaling

Google has extensive documentation on scaling App Engine applications. In it you will find references to application instances, and to how latency is the main factor in determining how many instances are needed to satisfy a specific load. Since App Engine applications run on the Google platform, the promise of infinite resources at their disposal is as true as it can get, limited perhaps only by the customer's wallet.

AppScale works in a very similar way: the application is allowed to scale up for as long as resources are available. While in Google the instance class determines the memory available to each instance, in AppScale a configuration option achieves the same. Similarly, latency is used to determine whether and how to scale the application up or down. What differs is how resources are acquired to allow the application to scale.
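
To make the latency rule concrete, here is a minimal Python sketch of a band-based decision. The thresholds, the minimum instance count, and the names are illustrative assumptions, not AppScale's actual parameters: latency above the upper bound asks for another instance, latency below the lower bound allows removing one, and anything in between holds steady.

```python
from enum import Enum


class Action(Enum):
    SCALE_UP = "scale_up"
    SCALE_DOWN = "scale_down"
    HOLD = "hold"


def latency_decision(avg_latency_ms, current_instances,
                     high_ms=500.0, low_ms=100.0, min_instances=1):
    """Map one latency observation to a scaling action (hypothetical defaults)."""
    if avg_latency_ms > high_ms:
        return Action.SCALE_UP        # requests are slow: add an instance
    if avg_latency_ms < low_ms and current_instances > min_instances:
        return Action.SCALE_DOWN      # plenty of headroom: release an instance
    return Action.HOLD                # inside the band, or already at the minimum
```

The gap between the two thresholds keeps a single borderline reading from flipping the application back and forth between scaling up and down.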

App Engine applications scale automatically based on load, whether running in Google or AppScale. Users won't notice much, except that requests are served promptly.

Within an AppScale deployment, some nodes are AppServer nodes, which means their CPUs and memory are dedicated to running application instances. Once the resources on the AppServer nodes are exhausted, and if the underlying infrastructure allows it, new nodes can be acquired as additional AppServers (up to the desired maximum). AppScale supports this on cloud environments such as GCE, AWS, OpenStack, and HP Helion Eucalyptus, and there is some experimental work for vSphere.

Scaling in AppScale

For autoscaling to work properly, AppScale needs to answer two questions: does the application need more resources, and are resources available to start a new instance? Whenever an application is uploaded to an AppScale deployment, the AppController (a component of AppScale) automatically creates a load balancer configuration for it (we use haproxy). This allows the application's instances to be added or removed with no service interruption. The AppController periodically checks the application's statistics in haproxy to see whether the application is struggling, which lets AppScale keep the application's latency in check.
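
HAProxy exposes per-proxy statistics as CSV (for example via the `show stat` command on its stats socket). The Python sketch below reads the current queue length (`qcur`) of an application's `BACKEND` row and flags the application as struggling when too many requests are queued. Using `qcur` as the signal and the threshold of 5 are assumptions for illustration; the post does not say which fields the AppController inspects.

```python
import csv
import io


def queued_requests(stats_csv, proxy_name):
    """Return the current queue length (qcur) of the BACKEND row for proxy_name."""
    text = stats_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]  # HAProxy prefixes the CSV header line with "# "
    for row in csv.DictReader(io.StringIO(text)):
        if row["pxname"] == proxy_name and row["svname"] == "BACKEND":
            return int(row["qcur"] or 0)
    raise KeyError(f"no BACKEND row for {proxy_name!r}")


def is_struggling(stats_csv, proxy_name, max_queued=5):
    # Hypothetical threshold: requests piling up in the queue mean the current
    # instances cannot keep up, so latency is rising.
    return queued_requests(stats_csv, proxy_name) > max_queued
```

The proxy name passed in would be whatever name the generated haproxy configuration gives the application.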

At a very high level, AppScale resembles a typical three-tier system: the front end handles load balancing and SSL termination, the middle tier (the AppServers) runs the application instances, and the bottom tier is the Datastore.

Within each AppScale deployment, the Login node routinely checks the available resources on each AppServer node. When a new application instance is needed, the Login node looks for an AppServer with enough free memory for that application and enough spare CPU. If no AppServer node meets the requirements, a request is sent to spawn a new virtual machine, provided the underlying infrastructure allows it.
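
The placement step can be sketched as a first-fit search over the AppServer nodes. In this minimal Python version, the `AppServerNode` fields, the 20% idle-CPU floor, and the `max_vms` cap (standing in for the desired maximum number of nodes) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AppServerNode:
    host: str
    free_memory_mb: int       # memory not yet used by application instances
    cpu_idle_percent: float   # spare CPU as last reported to the Login node


def pick_appserver(nodes: List[AppServerNode], instance_memory_mb: int,
                   min_cpu_idle: float = 20.0) -> Optional[AppServerNode]:
    """First fit: return the first node with enough memory and spare CPU."""
    for node in nodes:
        if (node.free_memory_mb >= instance_memory_mb
                and node.cpu_idle_percent >= min_cpu_idle):
            return node
    return None


def place_instance(nodes: List[AppServerNode], instance_memory_mb: int,
                   current_vms: int, max_vms: int) -> Tuple[str, Optional[str]]:
    node = pick_appserver(nodes, instance_memory_mb)
    if node is not None:
        return ("start_on", node.host)
    if current_vms < max_vms:
        return ("spawn_vm", None)     # ask the infrastructure for a new AppServer
    return ("at_capacity", None)      # no room and no more VMs allowed
```

First fit keeps the sketch short; a real scheduler might instead prefer the node with the most headroom.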

The biggest difference between Google and AppScale lies in the scaling of the datastore (the App Engine API to the integrated NoSQL database). AppScale implements the datastore API using Cassandra: the scaling we obtain has been extremely good, and we have tested it in excess of 17,000 datastore transactions per second (equivalent to over a quarter million transactions in Cassandra for that specific workload). While Google's datastore is limited only by the quota the user chooses, scaling AppScale's datastore implementation is a manual operation. The main reason is that adding nodes to a running database incurs a rebalancing cost that, at this time, needs to be weighed and controlled by an administrator.
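
Taking the two quoted figures at face value, that workload fanned each datastore transaction out into roughly fifteen Cassandra transactions:

```latex
\frac{250{,}000\ \text{Cassandra transactions/s}}{17{,}000\ \text{datastore transactions/s}} \approx 14.7
```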

The AppController monitors the application statistics by querying the load balancer information on the front end (1). The AppController can then tell the AppServers to start or stop an application instance if needed.

Tuning Scaling operations in 2.5.0

AppScale 2.4.0 and 2.5.0 bring some tuning to the scaling mechanism. In particular, the hysteresis cycle has now also been introduced for scaling instances within existing nodes. We observed that under certain loads scaling was a bit too aggressive, especially when the application takes a long time to load (for example, a complex Java application with many dependencies).
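
The post does not detail how the hysteresis cycle works. One simple way to damp oscillation, sketched here in Python under that assumption, is to act only after several consecutive checks agree; the `HysteresisScaler` class and its default window of three checks are hypothetical.

```python
from collections import deque


class HysteresisScaler:
    """Act only after `window` consecutive checks request the same action."""

    def __init__(self, window=3):
        self.window = window
        self.history = deque(maxlen=window)

    def observe(self, wants):
        """wants is "up", "down" or "hold" from one check; returns the action to take."""
        self.history.append(wants)
        if (len(self.history) == self.window
                and len(set(self.history)) == 1
                and wants != "hold"):
            self.history.clear()  # restart the count so we don't act again immediately
            return wants
        return "hold"
```

Because the history is cleared after each action, a slow-loading application gets at least a full window of checks for its new instance to come up before another change is made.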

We also increased the cool-down period for VMs started in a private cloud: we observed that in some private cloud environments the boot time can be long (depending on configuration), so we wanted to make sure we amortized the cost of starting a new instance by increasing its time to live. In choosing that time to live, we made sure to stay well within the one-hour mark, which AWS uses as its billing unit.
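
A cool-down rule of this kind fits in a few lines. In the minimal Python sketch below, the 30-minute value is a hypothetical choice kept under the one-hour billing unit, and `can_terminate` is an illustrative name rather than an AppScale function.

```python
# Hypothetical values: the post gives no exact cool-down, only that it stays
# well within AWS's one-hour billing unit.
COOL_DOWN_SECONDS = 30 * 60
AWS_BILLING_UNIT_SECONDS = 60 * 60

assert COOL_DOWN_SECONDS < AWS_BILLING_UNIT_SECONDS


def can_terminate(started_at, now, idle):
    """Release an idle VM only after it has lived through the cool-down period.

    started_at and now are timestamps in seconds (e.g. from time.time()).
    """
    return bool(idle) and (now - started_at) >= COOL_DOWN_SECONDS
```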

For any questions about AppScale, find your preferred way to reach us at