Sunday, January 13, 2013

Will My Internet Be Faster?

I have been beating the Maintainability drum lately, highlighting what the latest Eucalyptus did in that regard. I'm not done yet. This time I want to change the angle of approach, focusing more on the Map Workload to Cloud Resources step, using examples and some back-of-the-envelope calculations.


Will my Internet be Faster?

Back in the day, I helped a user go through the installation of an ancient Eucalyptus (I think it was 1.4), and after some hurdles (there was no FastStart back then), he finally got instances up and running on his two-node home setup. Then he asked "Will my Internet connection be faster now?".

I think the question points more to how Cloud Computing has been a buzzword, perceived as a panacea for all IT problems. But it is also a reminder of the need to understand the underlying hardware Infrastructure in order to achieve the desired performance. An on-premise Cloud is able to go only as fast as the underlying Infrastructure, and will have only as much capacity as the hardware supporting it. There is a tremendous flexibility that the Cloud provides, yet there is also the risk of under-performing if the Physical Infrastructure is not prepared for the workload.

Case In Point: Cloud Storage

At the end of the day, all Cloud Resources map to physical resources: RAM, CPU, network, disks. I will now focus on the Cloud Storage story because of its importance (anyone care about their data?), and because historically it is where there have been some interesting issues. In particular, Eucalyptus 3.1.2 was needed because high disk load caused seemingly random failures.
Some back-of-the-envelope, hand-waving calculations needed for this blog
From my chicken scratch of the first figure, you should see how Ephemeral Storage resides on the NCs (Node Controllers), while EBS (Elastic Block Storage) is handled by the SC (Storage Controller). Let's quickly rehash when the two are used:
  • Ephemeral: both instance-store and bfEBS (boot-from-EBS) instances use Ephemeral, although instance-store uses Ephemeral Storage also for root and swap;
  • EBS: any instance with an attached Volume uses EBS, and bfEBS uses Volumes for root and swap.
and what kind of contention there is:
  • Ephemeral: all instances running on the same NC will compete for the same local storage;
  • EBS: all instances using EBS within the same availability zone (Eucalyptus parlance for cluster) will access and use the same SC.
I used a simple spreadsheet to aid my examples. Feel free to copy it, play with it, enhance it, but please consider it a learning tool and not a real calculator: way too many variables have been left out for the sake of simplicity.

In my examples I will measure the underlying storage speed in IOPS.

IOPS values may vary dramatically. The above may be used only as an indication of the expected performance.


In the following examples, I will make the very unreasonable assumption that instances will access all their storage (both Ephemeral and EBS) equally, and that they will use it either 20% or 100% of the time. Moreover, in the 20% case an oracle minimizes the concurrent disk access of all instances (i.e. if there are fewer than five running instances, they will not compete at all and will see the full speed of the storage).

Thus one is a very light scenario, where the instances are mainly idle, while the other (100%) assumes the instances are running benchmarks. Starting instances is a fairly disk-intensive process, first because Eucalyptus needs to prepare the image to boot (which involves copying multi-GB files), and then because the OS will have to read the disk while booting. I added a column to the spreadsheet to show the impact of starting instances on the light workload.
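To make the hand-waving a bit more concrete, here is a minimal Python sketch of the kind of arithmetic the spreadsheet does. The function name and the rounding are mine, and the spreadsheet tracks more columns (like the instance-start overhead), so treat it as a toy model:

```python
# Toy model: storage shared by N instances is split among however many of
# them are hitting the disk at the same moment.

def per_instance_iops(storage_iops, sharers, busy_fraction):
    """IOPS each instance can expect from a shared pool of storage.

    busy_fraction = 1.0 is the benchmark case (everyone hammers the disk);
    busy_fraction = 0.2 is the light case, where the imaginary oracle makes
    sure only about 1 in 5 instances touches the disk at any given moment.
    """
    concurrent = max(1, round(sharers * busy_fraction))
    return storage_iops / concurrent

# Example: six EBS consumers sharing a 350 IOPS Storage Controller.
print(per_instance_iops(350, sharers=6, busy_fraction=0.2))  # oracle at work: full 350
print(per_instance_iops(350, sharers=6, busy_fraction=1.0))  # ~58 IOPS each
```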

Home setup

A small Cloud installation will most likely have the SC backed by local storage. Let's use an IOPS calculator to estimate the performance. Here I will use two Seagate Cheetah 15K RPM drives in RAID 0, which gives about 343 IOPS (I will round it to 350). For the NC, I will assume 150 IOPS, which should be a reasonably fast single disk (non-SSD).
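For the curious, the 343 IOPS figure can be roughly reproduced with the usual spindle rule of thumb (one I/O per average seek plus half a rotation); the online calculator uses slightly more refined numbers, so this is only an approximation:

```python
# Rough spindle math: one I/O costs an average seek plus half a rotation.
rotational_latency_ms = 60_000 / 15_000 / 2   # 15K RPM -> ~2.0 ms
average_seek_ms = 3.8                          # ballpark for a 15K Cheetah
single_disk_iops = 1_000 / (rotational_latency_ms + average_seek_ms)

# RAID 0 stripes the I/O across both spindles, so the IOPS roughly double.
print(round(single_disk_iops), round(2 * single_disk_iops))  # ~172, ~345
```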

For a Home setup, three NCs seem a good number to me. Each NC should have enough cores and RAM to allow more than ten instances to run (12-24 cores and 12-24 GB RAM should do). If I run one instance-store instance, one bfEBS instance, and use one Volume per NC, the very unrealistic calculator gives
Light load on the home setup: the slowest storage is still comparable to a 5400 RPM disk.

Not bad for my Home setup. Even if the instances were to run iozone on all the disks, I would still see the performance of a slow 5400 RPM disk. Now, let me create more load: four instance-store, four bfEBS, and two Volumes used per NC
The home setup with a heavier load doesn't do that well: instances may see performance as slow as a floppy drive.

That's a bit more interesting. If the instances are very light in disk usage, they will see the performance of a 7200 RPM disk, but under heavier load they will be using something barely faster than a floppy. Ouch!
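The heavy numbers are easy to sanity-check with the same kind of division (this is my reading of the scenario, so it will not match the spreadsheet column for column):

```python
# Heavy load on the home setup, per my reading of the scenario above.
nc_disk_iops, sc_raid_iops, ncs = 150, 350, 3

ephemeral_sharers = 4 + 4        # instance-store + bfEBS instances on one NC
ebs_consumers = ncs * (4 + 2)    # bfEBS roots + attached Volumes, zone-wide

print(nc_disk_iops / ephemeral_sharers)  # ~19 IOPS per instance on Ephemeral
print(sc_raid_iops / ebs_consumers)      # ~19 IOPS per consumer on EBS
```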

A More Enterprisy setup

From the previous example, it is fairly obvious why bigger installations tend to use a SAN for their SC storage back-end. For this example I will use a Dell EqualLogic, in a setup that gives 5000 IOPS. Correspondingly, the number of NCs is increased to 10.

Let's start with a light load: one instance-store, one bfEBS, and one Volume per NC (similarly to the Home setup, although now there is a total of 20 running instances).
A SAN-backed cloud with a light load: pretty good all around.

The results are pretty good, with EBS access around 250 IOPS under heavy load and very fast access under the light load. Even Ephemeral compares well with a 3.5" desktop-class disk.

Now I will run more instances: four instance-store, four bfEBS, and 4 Volumes per NC.
A SAN-backed cloud with a heavier load: EBS is now comparable to a 5400 RPM disk under heavy load.

Ephemeral still takes a beating: as in the Home setup case, there are eight different instances using the same local disk (bfEBS has access to Ephemeral too, and in my simplistic approach all disks are used at the same rate). EBS slowed down quite a bit, and now it compares to a slow desktop-class disk. Although the instances should still have enough IOPS to access storage, perhaps it is time to start thinking about adding a second availability zone to this setup.
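Again, the same back-of-the-envelope division gets in the right ballpark (my reading of the scenario, heavily simplified):

```python
# Heavy load on the SAN-backed zone, same simplistic arithmetic as before.
san_iops, ncs = 5_000, 10
ebs_consumers = ncs * (4 + 4)    # bfEBS roots + attached Volumes per NC

print(san_iops / ebs_consumers)  # ~62 IOPS each: roughly 5400 RPM territory
print(150 / (4 + 4))             # Ephemeral: a local disk still shared 8 ways
```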

Snapshots

The above examples didn't consider Snapshots at all. Snapshots allow backing up Volumes, and moving them across availability zones (a Volume can be created from a Snapshot in any availability zone). Snapshots reside on Walrus, which means that every time a Snapshot is created, a full copy of the Volume is taken on the SC and sent to Walrus. If Snapshots are frequent on this Cloud, it is easy to see how the SC, Walrus, and the Network can become taxed serving them.
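To get a feel for that tax, consider a hypothetical 100 GB Volume and a single gigabit link between the SC and Walrus (ignoring protocol overhead and the disk reads on the SC side):

```python
# Back-of-the-envelope cost of one Snapshot (hypothetical sizes).
volume_gb = 100
link_gbps = 1.0                       # gigabit link between SC and Walrus

seconds = volume_gb * 8 / link_gbps   # GB -> gigabits, then divide by line rate
print(seconds / 60)                   # ~13 minutes of sustained traffic per Snapshot
```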

Expectations

I would take all the above numbers as a best-case scenario for their respective cases. A lot of variables have been ignored, starting with the network, as well as other disk accesses. For example, Eucalyptus provides swap by default to instance-store instances, and the typical Linux installation creates swap (i.e. bfEBS instances will most likely have swap too), hence any instance running out of RAM will start bogging down the respective disk.

There was also the assumption that not only is the load independent, but the instances co-operate to make sure they play nice with the disk. Finally, in a production Cloud a certain mix of operations is to be expected, thus starting and terminating instances, creating Volumes, and creating Snapshots will increase the load on both kinds of Storage (Ephemeral and EBS) accordingly.

As I mentioned in my Maintainability post, having a proper workload example will allow you to properly test and tune the cloud to satisfy your users.

Making the Internet Faster

In the above examples, I pulled off some back-of-the-envelope calculations which do not consider the software Infrastructure at all (i.e. they don't consider Eucalyptus overhead). Eucalyptus' impact on the Physical Infrastructure has been constantly decreasing. Just to mention a few of the improvements: before Eucalyptus 3, the NC would make straight copies of multi-GB files, while now it uses device mapper to keep disk access to a bare minimum. And the SC, alongside the SAN plugins, now has the DASManager (Direct Access Storage, i.e. a partition or a disk), which allows bypassing the file system when dealing with Volumes.


There has been a nice performance boost with Eucalyptus 3, but there is still room for improvement, and no option has been left unexplored, from using distributed file systems as a back-end to employing SDN. Although Eucalyptus may not be able to make the Internet faster yet, it is for sure trying hard.

Tuesday, January 1, 2013

Maintainability and Eucalyptus

I recently blogged about the importance of Maintainability for on-premise Clouds. Within the list of steps to a successful on-premise Cloud deployment identified in that blog, Eucalyptus as IaaS software is heavily involved with the Deploy and Maintain parts.

Deploy

I already mentioned the work done to make the Eucalyptus installation easy peasy, so let me summarize it here. Eucalyptus is packaged for the main Linux distributions, so the installation is as easy as configuring the repository and doing a yum install or apt-get install. Configuring Eucalyptus is still a bit more complex than I would like, and requires registering the components with each other, but the steps can easily be automated, as demonstrated by our FastStart installation.

Although there is always room to improve, as distributed systems go, I dare to say that we are getting as easy as possible. Moreover, any good sysadmin already uses software to manage the infrastructure, so I see script-ability as the most important feature to allow easy progress with custom installations (i.e. Eucalyptus deployment recipes to use with Ansible, Chef, and Puppet).


Maintain

If you follow our development, you already know that Eucalyptus 3.2 was recently released. There are ample documents covering the release, either in general (Rich's and Marten's blogs) or for specific features (David's, Andrew's, and Kyo's blogs), but if I wear my Cloud Admin hat, the part that didn't get enough coverage is the amount of work that went into making Eucalyptus more maintainable.

Eucalyptus 3.2 fixed issues.
Eucalyptus 3.2 had 350 fixed bugs, and those are only the reported ones, since quite a few got fixed while restructuring parts of the code. Peek over the list and you will see the ones related to the new features, but there is also a large number of things done to make Eucalyptus more robust and hence maintainable. You don't believe me? Let me give you a sample:
  • reworked the inner code paths of the Storage Controller, which now prevents accidentally configuring the SC with an undesired backend;
  • added safety mechanisms to the HA functioning which prevent, or greatly reduce, the risk of split-brain Cloud Controllers;
  • more robust handling of orphan instances (this situation appears when the Node Controller is not able to relay its information all the way to the CLC in a timely manner);
  • plugged memory and database connection leaks (fairly annoying, since they required restarting components under particular use cases).
Likely you got more excited about our awesome new user console, but it's features like the ones listed above that give me the comfort of a solid Infrastructure.

User Console screenshot taken from David's blog


Are we there yet?

As I mentioned before, there is always room for improvement. The bulk of the work for 3.2 went into hardening the code, into covering all the corner cases, into improving QA coverage. I call all this the invisible work, since it is neither flashy nor apparent at cursory inspection, yet it is what allows the Infrastructure to survive the test of time.

With most of the invisible work done, what is ahead of us is easier to understand and categorize. From the list of the work scoped for 3.3, I see a lot of great new features: autoscaling, cloudwatch, and elb for example. This is still scoping work, so if you really like one feature, go and tell us, or up-vote it in our issue tracker. Yet, with all these new features, we don't lose focus on our Infrastructure roots: in particular the work on Maintenance Mode and Networking, alongside a lot of other features that will make it much easier to deal with Cloud Resources, for example vmtypes and tagging.

So, are we there yet? As I mentioned in my previous blog, work on an infrastructure is done when the infrastructure is not in use anymore, so no, we are not there yet, but for sure we are having a great ride.