Wednesday, December 5, 2012

It's Maintainability, Stupid

I have been playing with FastStart and Silvereye for a while, as well as working with the single-cloud script. I also wrote about the setup I use for testing Eucalyptus on a laptop. The latest tweets and blogs clocked a Eucalyptus installation at less than 15 minutes. Quite impressive, I have to say. Yet I think it's important not to lose perspective on the ultimate goal of an on-premise cloud.

On-Premise Cloud as Infrastructure

I see the installation of Eucalyptus (or any on-premise cloud for that matter) as a step during the deployment of a cloud, and neither the first one nor the most important one. The on-premise clouds I am referring to are IaaS, and right now I want to focus on the Infrastructure part. When I think about Infrastructure I think of satellites, the power grid, aqueducts, the Internet, and so on. There are quite a few characteristics of an Infrastructure, but one in particular comes to my mind to identify a successful one: Maintainability.

Why Maintainability?

An Infrastructure's purpose is to provide or support basic services: when infrastructure collapses, severe consequences are to be expected (think blackouts, blocked highways, etc...). An Infrastructure needs to be dependable (how can you build a house on unstable foundations?), to have a long lifespan (I can think of temporary installations only for Proofs of Concept), and to sustain a very different load or use throughout its useful life (think about the Internet, highways, etc., from their inception to what they are today). Hence it needs to adapt to the load (elasticity), to isolate and/or limit the scope of failures (resiliency), to be functioning and accessible (availability), to be inter-operable with different versions and/or similarly minded infrastructure, and to isolate operator access (minimizing human errors). In my mind the above encapsulates the essence of Maintainability. I guess I'm taking the Cloud Admin side by considering reliability and availability as part of maintainability, since an Infrastructure that is neither reliable nor available is not maintainable from the admin point of view.

Deploying On-Premise Cloud

So, how do we deploy an on-premise cloud successfully? I see most of the difficulties in the planning, preparing, and forecasting. One key element is to understand where the workload and the cloud will be in the future, and to have a path to migrate both the underlying physical system and the cloud software incrementally. Very easy, right?


Learn your Workload

In most cases an on-premise cloud is deployed for a specific application or set of applications, or departments, or group of users, or use-cases, so the workload is already implicit. Learn the needs of the workload in terms of compute, network, and storage, how parallel or spiky it is, and, if possible at all, forecast the future workload.

Capturing a sample of the workload at this stage, or creating an artificial load which mimics the real workload, is a boon for the successive stages. Although it is not always easy or possible, having a model of the workload will be very helpful to validate the physical infrastructure and the cloud deployment.
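As a rough illustration (not part of any Eucalyptus tooling), assuming the sysstat package is available on a representative host, a day's worth of usage can be sampled for later analysis:

     # sample CPU and memory every 60 seconds for 24 hours (sysstat assumed installed)
     sar -u -r 60 1440 > cpu_mem.log &
     # sample disk and network activity over the same window
     iostat -dxk 60 1440 > disk.log &
     sar -n DEV 60 1440 > net.log &

Replaying those numbers against the planned cloud sizing makes the later stages much less of a guessing game.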

Map Workload to Cloud Resources

Cloud Resources are basically three: compute (CPU and RAM used by the instances), storage, and network (bandwidth, security/isolation, IP addresses). Understanding the workload needs in terms of Cloud Resources usage will help to size the cloud appropriately.

Also, usage will most likely vary as the Cloud Users become more savvy: it's common to see a very heavy reliance on EBS (for instances and volumes) when starting to use the cloud, and a move toward instance-store once the applications become cloud-aware.


Prepare your Physical Infrastructure

With the workload defined in terms of Cloud Resources, we can start isolating the possible bottlenecks and prepare the physical infrastructure to successfully run the workload. Note that some Cloud Resources may end up using multiple physical resources at the same time: for example, boot-from-EBS instances may tax the Storage Controller (the EBS service provider) and the network (to allow the Node Controller to boot the instance) at the same time.

Also factor in the load incurred in fulfilling some operations: for example, starting an instance may have the Node Controller fetch the image from Walrus, copy it to its local disk, then start the instance. Those operations may create contention on the network (instance traffic and image transfer), on Walrus (multiple NCs asking for different images), or on the local disk (if the caching of the instances is done on the same disk where ephemeral storage resides).

The physical infrastructure also plays a very important role when thinking about scaling the cloud: if the storage cannot be easily expanded, or if the network cannot be upgraded or reconfigured, growing the cloud to meet the forecast workload may be impossible without a re-install or a long downtime.


Deploy

Well, we finally got here: with the physical infrastructure properly sized, we can start installing each component, Cloud Controller, Walrus, Cluster Controller(s), Storage Controller(s), and Node Controller(s), on their respective hosts. And yes, this step can take a lot less than half an hour, although on production installations, with multi-cluster, HA, and SANs, it may take a bit longer.


Maintain

Once the cloud is deployed, it truly becomes an infrastructure, and as such we need to ensure it stays up all the time, through upgrades (cloud software, host OS, router firmware: all should be upgradable with hopefully no downtime for the cloud as a whole), failures (failure of a machine or a component should not impact the cloud, although it may impact an instance, a pending request, etc...), expansions (adding Node Controllers, clusters, storage, network), and load spikes (the cloud should degrade gracefully and not collapse).

Any of the above steps may happen after deployment (you may need to profile some problematic workload, or to deploy some new components), yet they all fall under the Maintain umbrella, since they are all needed to ensure that the Infrastructure fires on all cylinders and becomes invisible. Once the Infrastructure has been in place long enough, its usage will be taken for granted, and the only attention it will receive is when there are deficiencies or problems (think about the power grid and how black-outs or brown-outs get in the news). That is, when your cloud is no longer maintainable, you will be in the news.

Wednesday, October 3, 2012

Cloud Storage Types

Persistent Storage sounds like a tautology to me. Used as I am (was?) to Hard Disks, USB keys, DVDs, and all other possible ways to store information, it seems that storage is persistent by definition, and only failures or human errors can cause non-persistent and catastrophic behavior. Well, in IaaS terminology, Storage comes in different flavors and can also be Ephemeral.

Storage Types

Eucalyptus follows the AWS API, and with these APIs come three Storage types:
  • Buckets -- an object store implemented by S3 -- (provided by Walrus),
  • Elastic Block Storage (EBS) Volumes (provided by the Storage Controller),
  • Ephemeral Instance Store (provided by the Node Controller).
Our Storage Team started a wiki to dig into the technical aspects of the different storage types: stay tuned on GitHub for a more in-depth technical dive into how they have been implemented.

Of the above list, two are meant to be persistent (i.e. to persist across instance termination): Volumes and Buckets. Two provide the familiar block interface (i.e. they appear and are used as Hard Disks): Volumes and Ephemeral. One is designed to be massively scalable: Buckets. And one is meant to be temporary: Ephemeral.
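To make the three types concrete, here is a quick euca2ools sketch (the zone name, volume, instance, and bucket identifiers are purely illustrative):

     # Volume: persistent block storage served by the Storage Controller
     euca-create-volume -s 5 -z cluster01
     euca-attach-volume -i i-37F04164 -d /dev/vdb vol-12AB3456
     # Bucket: object storage served by Walrus, used for example when uploading a bundled image
     euca-upload-bundle -b my-images -m image.manifest.xml
     # Ephemeral: nothing to provision; it is the disk the instance boots on, gone at termination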


Instances and Storage

Instances by default sit on Ephemeral Storage. Uploaded images (EMIs) are the master copies, and all instances will start as a fresh copy of that very image. All changes made to the instance (e.g. packages installed, configuration, application data) will disappear once the instance terminates. Notice that the termination of an instance can be voluntary (i.e. the Cloud User issues a terminate-instance command) or accidental (e.g. the hardware running the instance fails, or the software within the instance fails badly). These instances are called instance-store instances.

Instances can also use Volumes for their root file system: they are aptly called boot-from-EBS instances. In this case, at instance creation, a Volume is cloned from a specific EMI snapshot. The instance will then have exclusive access to this Volume throughout its lifetime, allowing for stopping and re-starting without loss of any changes made. The instance can be restarted wherever the Volume is available (i.e. EBS Storage is only available on a cluster, or availability zone, basis), and its performance is driven by the Volume performance (e.g. network speed to a SAN, or DAS serviced by the Storage Controller).

The main difference in Storage speed between instance-store and boot-from-EBS is a trade-off between the speed of the local disk (in the case of instance-store) and the speed of accessing a SAN (or DAS) across a network. Things get more complicated when multiple instances compete for a shared resource (e.g. common disks on a Node Controller, or network access to the SAN).


Cloud Admin

The Cloud Admin, although not a user of these Storage types, needs to have a clear deployment plan to provide enough Storage space for each type, limit contention on shared resources, and ensure that the performance and reliability meet the expected levels. An understanding of the specific load will go a long way toward sizing the cloud properly.

The deployment of Walrus and the Storage Controller, respectively the providers of Buckets and Volumes, is key to ensure the right number of nines of reliability for the Persistent Storage types. Walrus' get/put interface helps to ensure the scalability of the service, but a slow host (CPU is needed to decrypt uploaded images and to serve concurrent streams) or limited space (Walrus stores uploaded images and Buckets) can severely cripple the normal functioning of a cloud. The Storage Controller serves Volumes to a cluster, both for EBS attachment and boot from EBS: under-sizing the network between Storage Controller and Node Controllers can slow every instance's disk requests to a crawl.

Ephemeral is served by the Node Controller. Sizing the physical storage subsystem for the expected number and type of instances is needed to ensure the full load can be achieved. Also, with the current multi-core CPUs, quite a few instances can run on the same Node Controller. Too many concurrent disk requests can easily overwhelm the Node Controller's host, causing instances to time out, or unpredictable and erratic behavior: the storage subsystem needs to be properly tested for the expected concurrent load.
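For instance, a crude concurrency check with fio (assuming fio is installed; point the directory at wherever your NC keeps instance data, and tune the parameters to your expected load) already tells a lot about how the disk behaves under parallel instance I/O:

     # 8 parallel jobs doing mixed random I/O on the instances disk (parameters are illustrative)
     fio --name=nc-concurrency --directory=/var/lib/eucalyptus/instances \
         --rw=randrw --bs=4k --size=2G --numjobs=8 --group_reporting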

Cloud Application Architect

The Application Architect is reasonably isolated from the underlying hardware used to build the private cloud, insofar as the Cloud Admin has properly planned the Storage Types' availability, performance, and reliability. Thus the main decision for the Architect is which Storage Type to use and when.

Persistent vs Ephemeral 

When I started to drink champagne, I went for what I was accustomed to, that is, very well known servers, well taken care of (i.e. very persistent). In short, an environment where a server rebuild is an exceptional case. The ancient version of Eucalyptus we used then didn't have boot from EBS, so we effectively implemented it using Volumes and a chroot environment. As a bonus, backups were as easy as creating a Snapshot (euca-create-snapshot).
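For the record, such a backup is a one-liner (the volume id is illustrative), easy to drop into cron:

     euca-create-snapshot vol-12AB3456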

After a few cloud moves and upgrades (both of software and hardware), I started to embrace the idea of the chaos monkey, where no single instance is central to the service. Now I'm relying more and more on scripts to configure default images on the fly, and on Buckets to store the backups needed to recover the last good state. In the case of essential databases I would still use a combination of Volumes and Buckets for availability and backups.

I think my experience is common, and I see how administrators coming from the datacenter tend to start with the Storage is Persistent idea, looking for the comfort of boot from EBS. Administrators coming from the public cloud are already familiar with the dynamic approach of the cloud, are more comfortable with the idea that some Storage is Ephemeral, and plan accordingly for instances to be disposable.

Edited December 10, 2012
Added links for the various storage types definitions, and made it clear that S3 provides Buckets. Added Cloud Storage Types properties picture.

Wednesday, August 1, 2012

Customize Instance Libvirt Environment

Eucalyptus supports a variety of hypervisors (KVM, VMWare, Xen). Libvirt is used to control instances when Eucalyptus is configured to use KVM or Xen. Simply put, Eucalyptus generates a domain file (aptly called libvirt.xml) to start the instance: the domain file can be found in the working directory of a running instance.

Eucalyptus generates the domain file (libvirt.xml) in
response to the user action euca-run-instances. Libvirt
will then instruct the hypervisor to execute it.

Older versions of Eucalyptus (up to 2.0.3) used a helper Perl script (gen_libvirt_xml or gen_kvm_libvirt_xml) to generate the domain file. Changing the hypervisor behavior was a matter of modifying the helper script.

Eucalyptus 3 brings greater flexibility to customize the domain file. The Node Controller produces a stub xml file with all the instance-related information (the file, called instance.xml, can be found in the instance working directory). Then, using an XSL Transformation on instance.xml, Eucalyptus generates the domain file (libvirt.xml) used to start the instance. The XSL filter can be found on the Node Controller at /etc/eucalyptus/libvirt.xsl. At this point a couple of examples will clarify the process, and how it can be customized. To simplify the debugging and creation of the new filter, we suggest employing a command line XSLT processor during the development of the new libvirt.xsl (the examples below will use xsltproc).

 <?xml version="1.0" encoding="UTF-8"?>  
 <instance>  
  <hypervisor type="kvm" capability="hw" bitness="64"/>  
  <backing>  
   <root type="image"/>  
  </backing>  
  <name>i-37F04164</name>  
  <uuid>2b94d8ea-5438-4aa9-b69d-5c48781e62b7</uuid>  
  <reservation>r-DBAD4219</reservation>  
  <user>4XVKCF4WDM4NAIXBORW99</user>  
  <dnsName></dnsName>  
  <privateDnsName></privateDnsName>  
  <instancePath>/instances/work/4XVKCF4WDM4NAIXBORW99/i-37F04164</instancePath>  
  <consoleLogPath>/instances/work/4XVKCF4WDM4NAIXBORW99/i-37F04164/console.log</consoleLogPath>  
  <userData></userData>  
  <launchIndex>1</launchIndex>  
  <cores>1</cores>  
  <memoryKB>1048576</memoryKB>  
  <key isKeyInjected="false" sshKey="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPAQQZD644Jep3HWbRfv2TZxRKYSfXI6omWZV/JKnyOxJAkYS9ZxTPCeWeg/J0mguaXHVQapUYWnZkRRfJ2CAv4Yss8ya2mG9Itc3l113C1Rjiyk1YFZcDzxikauJX/r25+M32r1CUbxOnK90z16HUdOFBe78ebe/uA9P+FWCdo/qItF8VfBnKsTqrTi4pe2DP5fJnrtrsJA9vPNh+jrWcUCjN5byknGR/wgiQ0CeySeec0k7TIKKi8aIMcvEezUX0laY1kCC7WblT6HIRH6K+5VmXFmsMpdgENnakwvVIwX9MlT6scAtVRyTOCY1qz5YyK1U7pcxWs8bGyKhTQYvp 345590850920@eucalyptus.admin"/>  
  <os platform="linux" virtioRoot="true" virtioDisk="true" virtioNetwork="false"/>  
  <disks>  
   <diskPath targetDeviceType="disk" targetDeviceName="sda" targetDeviceNameVirtio="vda" targetDeviceBusVirtio="virtio" targetDeviceBus="scsi" sourceType="block">/dev/mapper/euca-4XVKCF4WDM4NAIXBORW99-i-37F04164-prt-15360none-5085ae2c</diskPath>  
  </disks>  
  <nics>  
   <nic bridgeDeviceName="eucabr558" mac="D0:0D:37:F0:41:64"/>  
  </nics>
 </instance>  
This is an example of instance.xml extracted from one of
our QA runs. Notice the comprehensive instance information
available: only a small subset will make it into libvirt.xml, but
all of it can be used in the XSL Transformation.


Example: Using Huge Pages

Huge pages can boost KVM performance. Once the Node Controller has been modified to use huge pages, the domain file needs to be modified as well, to ensure that the hypervisor will take advantage of this feature. This is as simple as adding a static stanza to the domain file, and can be easily achieved with the following addition to libvirt.xsl

          <xsl:value-of select="/instance/name"/>  
        </name>  
        <description>Eucalyptus instance <xsl:value-of select="/instance/name"/></description>  
 +       <memoryBacking>  
 +         <hugepages/>  
 +       </memoryBacking>  
        <os>  
          <xsl:choose>  
            <xsl:when test="/instance/os/@platform = 'linux' and /instance/backing/root/@type = 'image'">  
This diff shows how we configured huge pages within the domain file.
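For completeness, the host-side change hinted at above might look like this on the Node Controller (the page count is illustrative, and on a real deployment you would persist it via sysctl.conf and fstab):

     # reserve 2MiB huge pages and mount hugetlbfs so the hypervisor can use them
     echo 1024 > /proc/sys/vm/nr_hugepages
     mkdir -p /dev/hugepages
     mount -t hugetlbfs hugetlbfs /dev/hugepages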

Example: Legacy Images

RHEL 6 no longer supports the SCSI driver: this can be a problem if you are using old images with no VIRTIO driver. When VIRTIO is disabled (eucalyptus.conf variables USE_VIRTIO_ROOT and USE_VIRTIO_DISK), the generated domain file uses the SCSI bus, which results in an incompatible image for your RHEL 6 Node Controller (the image will not be able to find its own root device).
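For reference, those knobs are plain shell-style variables in eucalyptus.conf on the Node Controller (the usual 0/1 convention is assumed here):

     # excerpt from /etc/eucalyptus/eucalyptus.conf on the NC
     USE_VIRTIO_ROOT="0"
     USE_VIRTIO_DISK="0"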




 <  <os platform="linux" virtioRoot="true" virtioDisk="true" virtioNetwork="false"/>  
 ---  
 >  <os platform="linux" virtioRoot="false" virtioDisk="false" virtioNetwork="false"/>  
The diff of instance.xml generated when VIRTIO is disabled


The output of

xsltproc /etc/eucalyptus/libvirt.xsl /tmp/instance.xml


confirms that the generated libvirt.xml uses the deprecated driver.


   <disk device="disk" type="block">  
    <source dev="/dev/mapper/euca-4XVKCF4WDM4NAIXBORW99-i-37F04164-prt-15360none-5085ae2c"/>  
    <target dev="sda" bus="scsi"/>  
   </disk>  
When disabling VIRTIO, Eucalyptus defaults to SCSI.


With a simple modification of libvirt.xsl, we can add support for these legacy images:

 graziano@x220t:~/Prog/Eucalyptus/xsl$ diff libvirt.xsl /etc/eucalyptus/libvirt.xsl   
 116,117c116,117  
 <                     <cmdline>root=/dev/hda1 console=ttyS0</cmdline>  
 <                 <root>/dev/hda1</root>  
 ---  
 >                     <cmdline>root=/dev/sda1 console=ttyS0</cmdline>  
 >                 <root>/dev/sda1</root>  
 213,217c213  
 <                          <xsl:call-template name="string-replace-all">  
 <                            <xsl:with-param name="text" select="@targetDeviceName"/>  
 <                            <xsl:with-param name="replace" select="'sd'"/>  
 <                           <xsl:with-param name="by" select="'hd'"/>  
 <                     </xsl:call-template>  
 ---  
 >                          <xsl:value-of select="@targetDeviceName"/>  
 220c216  
 <                          <xsl:value-of select="'ide'"/>  
 ---  
 >                          <xsl:value-of select="@targetDeviceBus"/>  
The diff with the modified libvirt.xsl shows how some hard-coded variables
have been changed (sda1 becomes hda1), and how we hard-code the default bus
to be IDE instead of using what comes in instance.xml. Finally we need to
change every reference to block devices starting with sd to hd.

Possibly not the best XSLT to handle the task (if you have a better one, please let me know), but it does the job.

Customization And Drawbacks

The information contained in instance.xml is fairly complete, and allows a wide range of customization. For example, there could be specific rules based on the user ID, or rules to add another NIC, or rules to make some PCI device available to the instances. All of this customization may require modifications within the image itself: for example, an extra NIC would require the image to be aware of it and to configure it correctly.

Eucalyptus 3 also added hooks for the Node Controller: in /etc/eucalyptus/nc-hooks you will find an example of how to add scripts to tailor your cloud to your specific needs. Hooks get invoked at specific times during the instance staging (post-init, pre-boot, pre-adopt, and pre-clean). Hooks and XSL Transformations allow complete control of what is passed to the hypervisor, and of the environment your instances will find.
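As a harmless starting point, a hook can simply log whatever the Node Controller passes to it (a minimal sketch; check the example shipped in /etc/eucalyptus/nc-hooks for the exact calling convention before doing anything event-specific):

     #!/bin/bash
     # log every invocation with its arguments, to see which events fire and when
     echo "$(date) nc-hook invoked with: $*" >> /var/log/eucalyptus/nc-hooks.log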

One of the attractive promises of the cloud, and in particular of the hybrid cloud, is to be able to run the same images everywhere: on your cloud, on your friend's cloud, on the public cloud. A heavily customized environment may bind your images to the specific cloud you have, thus nullifying the benefit of running everywhere. As usual, with great power comes great responsibility, and the budding Cloud Administrator should be fully aware of all possible consequences.

[Edited: Aug 2, 2012]

It looks like blogspot is not the best way to discuss code: one cannot put xml in the comments. Lester had another great example. Disabling the host's page cache for EBS volumes can be done with the following in libvirt.xsl:

     <xsl:when test="/instance/hypervisor/@type='kvm' and ( /instance/os/@platform='windows' or /instance/os/@virtioRoot = 'true')">
                   <xsl:attribute name="bus">virtio</xsl:attribute>
                   <xsl:attribute name="cache">none</xsl:attribute>
                   <xsl:attribute name="dev">
                     <xsl:call-template name="string-replace-all">
                       <xsl:with-param name="text" select="@targetDeviceName"/>
                       <xsl:with-param name="replace" select="'sd'"/>
                       <xsl:with-param name="by" select="'vd'"/>
                     </xsl:call-template>
                   </xsl:attribute>

Which is then rendered as:

   <disk device="disk" type="block">
    <source dev="/dev/mapper/euca-3IRYEXXJ6OHXZXXYAFDOG-i-82CD4318-prt-04096none-11df8dda"/>

    <target bus="virtio" cache="none" dev="vdb"/>


Wednesday, July 11, 2012

Wind of Change: Eucalyptus 3.1 is here

It's been a few weeks since Eucalyptus 3.1 became (highly) available, and it is securely powering quite a few installations for the delight of Cloud Administrators and all the Cloud population. Eucalyptus 3.1 brings quite a few changes, the most obvious within the code itself (more in the Release Notes, Rich's blog, Greg's blog, Garrett's blog), but a lot is happening without.

Some of the behind-the-scenes work is targeted to support QA (more in Vic's blog, Kyo's blog) and dev-test. Our IT team, responsible for the happiness of the QA and dev teams, is the first customer of Eucalyptus, and it is always on top of the automation game (check Andrew's blog, Harold's blog, the recipes project). Eucalyptus 3.1 opened up quite a few fun configurations and possibilities, and I'm looking forward to the next release of Silvereye, or the next crazy setup they can devise.

Perhaps the most visible changes are related to the way we interact.


Find Eucalyptus code in GitHub

First and foremost, the code. We moved all our code development to git and GitHub, where you can watch it, fork it, or otherwise enjoy it in any way you like. All the old branches and releases are tagged there, if archaeology is your cup of tea. Eucalyptus 3.2 is coming fast: make sure you have Eucalyptus in your watch list.


Bug, Feature Requests, and Release Management all together

Issues, bug reports, and feature requests are the lifeblood of any project. We are bringing all of them closer to our development team, and to the release process: you can now find all of the above in one place. Apart from the usual operations (create, search, comment, follow up, watch issues), you can get high level stats, check the Road Map, follow the progress on the next release, and more. Explore it, and let us know how you like it.

Eucalyptus project pages for Debian, Ubuntu, Fedora 

We also have pages representing our progress and status for each distro: we are on alioth for Debian, we have a wiki page for Fedora, and of course we are on Launchpad. The latter has served us faithfully as bug tracker and code repository for a long time, and now we'll keep using it to interact with Ubuntu. And of course you can always join, or check out, interesting Eucalyptus related projects.


As usual, for all questions, comments, or social interactions you can find us on the forum, mailing list, or IRC. If you liked this Wind of Change, just holler.

Wednesday, June 6, 2012

A Developer Cloud

The title of this blog says it all: in this blog I will detail how I created my own private Eucalyptus cloud on my laptop, using VMs, a bridge and iptables, and of course Silvereye. The instructions may be specific to my laptop, which runs Debian Sid. The relevant specs of the laptop (a Lenovo x220t to be precise) are: 8GB of RAM, an INTEL i5-2520M CPU, and a 160GB SSD. Let me repeat: this is a developer cloud setup, which means that it will do the testing I need to do, but it doesn't run any production environment, nor is it meant to. Dustin created Cloud on a Stick, and this work was inspired by his excellent work with UEC.

The idea is simple: create 2 VMs (respectively the front-end and the node controller), attach their NICs to a local bridge, use NAT when the VMs need external connectivity, install Eucalyptus 3-devel on them, and play with it. A few reasons for the above setup:
  • 2 VMs because I want to use the MANAGED or MANAGED-NOVLAN networking modes (there is no HTML documentation yet, so get the manuals for more information on 3.1 networking modes) to take full advantage of security groups and elastic IPs;
  • a local bridge (with no physical device attached to it) because I keep changing networks between wireless and wired (and have no network during my coffee shop breaks), and because most of the time I don't need the cloud to talk to the outside world, so NATting is sufficient for me.
Let's start setting up the virtualization bits. I use KVM on my laptop, and I will take advantage of the nested virtualization capabilities of the CPU I have. The kvm_intel module does not enable it by default, so I added an x220t.conf file to /etc/modprobe.d with the following

       options kvm_intel nested=1

and reloaded the module (rmmod kvm_intel and then modprobe kvm_intel). My old laptop had an AMD Turion and nested virtualization was on by default: if you are attempting to replicate this, YMMV.
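A quick way to confirm the change took effect (the path below is exposed by the kvm_intel module once it is loaded):

       cat /sys/module/kvm_intel/parameters/nested   # should now report Y (or 1, depending on the kernel)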

Next, let's set up the bridge to nowhere. I could have used the default bridge set up by libvirt (called virbr0), but I found it easier to set up my own. I followed the QEMU Debian wiki to set up the tap devices I needed. Note how only tap0 and tap1 are enslaved to the bridge. The complete entry (excerpt from /etc/network/interfaces) is


 auto br0
 iface br0 inet static
     address 172.17.0.1
     netmask 255.255.255.0
     pre-up ip tuntap add dev tap0 mode tap user graziano
     pre-up ip link set tap0 up
     pre-up ip tuntap add dev tap1 mode tap user graziano
     pre-up ip link set tap1 up
     bridge_ports tap0 tap1
     bridge_stp off
     bridge_maxwait 0
     bridge_fd 0
     post-down ip link set tap0 down
     post-down ip tuntap del dev tap0 mode tap
     post-down ip link set tap1 down
     post-down ip tuntap del dev tap1 mode tap

When I need my VMs to reach the outside world, I ensure that IP forwarding is enabled

          echo 1 > /proc/sys/net/ipv4/ip_forward

and I use the simplest NAT rules I found:

     /sbin/iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
     /sbin/iptables -A FORWARD -i wlan0 -o br0 -m state   --state RELATED,ESTABLISHED -j ACCEPT
     /sbin/iptables -A FORWARD -i br0 -o wlan0 -j ACCEPT

The above works when I'm connected with the wireless card; I substitute wlan0 with eth0 otherwise. I didn't add any rule to the default network configuration since I don't allow the cloud to be online all the time.

I need the last few pieces before I can start the instances. I already have the Silvereye iso, so I need to create a file big enough to hold the Eucalyptus components. I don't have too much space on the disk, so I settled for 10GB and 15GB respectively for the front-end (FE) and node controller (NC). I used the following command

        dd if=/dev/zero of=nc.img count=1 bs=1G seek=10

to create both fe.img (for the front-end) and nc.img (for the node controller), adjusting the seek value for the desired size.

The last piece is to create the libvirt xml configuration for the 2 instances. Here is the front-end one:

<domain type='kvm'>
  <name>frontend</name>
  <memory unit="GiB">2</memory>
  <description>Front End</description>
  <cpu match='exact'>
    <model>core2duo</model>
    <feature policy='require' name='vmx'/>
  </cpu>
  <os>
    <type arch="x86_64">hvm</type>
    <boot dev='cdrom'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock sync="localtime"/>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/home/graziano/Prog/ImagesAndISOs/fe.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/home/graziano/silvereye.1337754931.857206.iso'/>
      <target dev='hdc'/>
    </disk>
    <interface type='ethernet'>
      <target dev="tap0" />
      <mac address='24:42:53:21:52:45'/>
    </interface>
    <graphics type='vnc' port='-1'/>
  </devices>
</domain>

Note how the flag vmx has been forced. vmx is the flag indicating hardware virtualization for INTEL processors: AMD's is svm. The NC configuration file is very similar: the network device is tap1, and I changed the MAC address, the VM name, and the file backing the disk. I didn't even need to comment out the <boot> flag after installation, since Silvereye boots from the local disk by default.
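A quick sanity check from inside the freshly booted VMs, to confirm the flag actually made it through (this is what the NC will need to run KVM instances):

        egrep -c '(vmx|svm)' /proc/cpuinfo   # anything greater than 0 means hardware virtualization is exposed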

My machine was already set up to run instances with my username (you can check the connection to libvirt with virsh list), so I can start the FE with

        virsh create frontend.xml

and similarly the NC using node.xml. The first time I booted them, I followed the steps in my previous blog on Silvereye (virsh vncdisplay is your friend when you have multiple VMs on VNC). I did ensure that the VMs were configured to use static IP assignments (I used 172.17.0.2 for the frontend and 172.17.0.3 for the NC). Because I wanted to change the network configuration and hostname ahead of time, I didn't run the Silvereye script at the first login, but I ran them afterwards (the scripts are in /usr/local/sbin in the installed VM).

A few extra tips. I installed the NC a few times: make sure you remove /root/.ssh/known_hosts when re-registering the NC if you keep the same IP between installs. If you want to play with different image sizes, make sure you have enough disk space on the NC (a few re-installs were needed for me to settle on a proper image size/NC disk size, and I wish I could use a much bigger disk for it). After the first-time installation, I prefer not to use VNC to connect to the VMs, so I added logic for the console (just before the VNC line)



    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>



and in the instances I had to add console=ttyS0 to the boot line (in /etc/default/grub for Debian and in /boot/grub/grub.conf for CentOS). After a VM reboot I can login with


    virsh console frontend
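For the record, the grub change mentioned above is roughly this on the Debian guest (a sketch; on CentOS 6 you append console=ttyS0 to the kernel line in /boot/grub/grub.conf instead):

     # /etc/default/grub
     GRUB_CMDLINE_LINUX_DEFAULT="quiet console=ttyS0"
     # then regenerate the grub configuration
     update-grub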

Finally, these are not instances: if you modify them, delete things, or make them unusable, well, they are gone. The good news is that they are very easy to re-create.

Tuesday, May 22, 2012

Playing with Silvereye

Silvereye is a bird native to Australia (I have to admit I have never seen one live), but it is also a project that allows a very quick and easy installation of Eucalyptus. The project is fairly young, but it is already very usable: you can follow its development on github. While Faststart has a very small footprint (it fits on a 1GB thumb drive) and requires a CentOS base installation, Silvereye creates a CD iso image which can be used to install bare machines.

Tonight I took the latest Silvereye for a spin, and here are my notes. Let me start with a single word: wow! I forked the latest version from github, moved the silvereye.sh script to a CentOS Virtual Machine (VM) I had on my laptop, and ran the script while I went to grab some coffee. When I came back, I had an iso file freshly baked and ready to go.

I booted a VM using the Silvereye CD (I will describe my libvirt configuration for this experiment in another blog) and connected to it with a VNC viewer to be greeted with the boot menu.
Boot menu for Eucalyptus Front-End or Eucalyptus Node Controller.
The options are to install a Eucalyptus Front-End, a Eucalyptus Node Controller, or a minimal CentOS without Eucalyptus, to boot into a rescue image, or to boot from the local hard disk. I selected the Front-End installation.

The process continued with the usual CentOS installation,
Default CentOS 6 installation.
and in a few minutes the (virtual) machine was ready to be restarted into its new OS.
CentOS 6 is now installed and ready to go.
At the first root login, Silvereye kicked in and configured Eucalyptus. I followed the few easy questions (it also allows you to reconfigure the network, DNS, and hostname if needed) and I ended up with a working Eucalyptus Front-End in no time!
Silvereye first question.
You will need the physical network layout of your machines: IP addresses used by the machines, IP addresses available for the instances, network netmask, gateways, and DNS server. Also make sure that when you configure the private network for the instances, there is no overlap with your real network.
Network configuration for Eucalyptus, and database initialization.
As a final touch, Silvereye created a Eucalyptus Machine Image (EMI) for me! Of course I could have downloaded the ones we provide, created my own, or found some Amazon-compatible images, but I have to say that having Silvereye do all the dirty work was simply gratifying.
Silvereye can create an EMI for you.
The end result was a kernel, ramdisk and image ready to use.
List of available images.
And here is where I stopped: I didn't have an NC ready to go, and my latest book was waiting for me. It took me about 20 minutes from the time I started the VM for the first time with the Silvereye CD, to having a list of EMIs.


Sunday, May 6, 2012

Euca @ UDS

In Greg's Where is Waldo game, one big event was missing: Eucalyptus is also at UDS. Harold, Brian, and I are already in Oakland ready to rock and roll. We have exciting news for our session: you can check out Brian's blog for a sneak peek of Eucalyptus 3.1 Alpha1. Your favorite on-premise IaaS is now available with HA, boot from EBS, and IAM (check the roadmap for more details).

Find us at the Eucalyptus session, or around the conference floor, to say hello or to get a t-shirt (limited supply).

Wednesday, April 4, 2012

The One-Site Project

It's my great pleasure to announce our new web site. The one-site project aimed to unify our two web sites. As previously mentioned, we started this project some time ago to address the needs of our community. We believe that the diverse interests of our community are better represented by the cloud IT roles, and the new website has been designed to help our users navigate following the interests, languages, and preferred communication channels of each role.

Navigation by Cloud IT Roles

First let me mention how the generic navigation of the site has been greatly improved. The main navigation is Learn, Eucalyptus Cloud, Participate, Services and Partners, and, if you already know where you want to go, in the footer of each page you will find direct links to subsections. The progression is obvious: starting from generic documents on cloud computing, on-premise clouds, and IT roles, going to the specifics of Eucalyptus Cloud, finally exposing how to reach our community and the Services provided, and last but not least diving into the rich and varied partner ecosystem.


Information customized on a per-role basis


Let's go back to the navigation by roles. In the Learn section there is a selection that allows each role to explore what cloud means for them. Each role page has specific links and information related to the respective competences. Let me give you examples for some of the roles represented there:
  • Cloud Administrators will find install and maintenance guides. In case they run into issues they can search our knowledge base for the specific error code, or engage other administrators to discuss the different installation options or the best hypervisor for their situation;
  • Developers can quickly reach the latest source code, look at the issues currently being worked on, and start talking about the latest patch they have been working on. Detailed information on Eucalyptus' internal structure and supported APIs is also the bread and butter for Developers;
  • Application Architects will find everything about our starter images, from what's inside and why, to how to recreate and modify them. They will want to use and modify the instance recipes and share their latest scripts to recreate their production environment.
You will find the roles highlighted throughout the web site, so, for example, under GetEucalyptus you will find options for Managers (get the free trial), Architects (FastStart for a quick proof of concept), Administrators (get the latest release), and Application Architects (get the starter images).

Find how to participate independently of your role

Participate and Resource Library are the two notable exceptions to the roles principle. While you will still find links to the cloud roles pages, these pages address our community as a single entity. Participate gathers selected blog posts from all our users, as well as giving pointers to where to find the online community. Resource Library allows you to find all the published information about Eucalyptus and cloud computing, organized by type and date.

With the new web site, you will also find a newly redesigned forum. You will find the motivation behind it in Darren's blog, and more about its functioning in the Engage blog. Our new site is now available and ready to Engage you: let us know what you like and how we can improve it for you.

Engage!

We started the journey of One some time ago. The original idea took shape and evolved into the one-site project and now into Engage. The sentiments and the ideas behind Engage have been articulated very clearly in Darren's blog. Since the time of Eucalyptus 1.5, we established forums within our web site, to allow our community to come together and discuss Eucalyptus and Cloud Computing. It has been my pleasure to help the best I could, participate in discussions, and meet new friends.

It didn't take too long before we realized how inadequate the forum, and ultimately we, were to help all our users. Limited search capabilities, repeating threads, difficult forum management, obscured error reporting, and a new field called on-premise cloud got us into long per-post sessions. And yet all we could do was the idiomatic drop in the bucket. The whole process was wrong: we needed to change tack.

That's when Darren got into the picture. His experience with multiple support organizations brought the needed fresh thinking. His efforts are now live with Engage, and these are the key points:

Intuitive user experience: type your question

  • a very intuitive user experience: just type in your question;
  • the same intuitive experience for customers and non-customers: we are doing the necessary work to ensure that customer requests meet their SLAs;
  • our customers already love the competence and promptness of our support team. Our community shares the same love, perhaps without knowing it: the support team is on IRC, on the forum, on email, and Engage allows us to track all open questions (wherever they are) more effectively;
  • modern forum tools: we can now tag an answer as 'best answer', turn it into an article, and keep track of the unanswered active threads;
  • a single Knowledge Base! Perhaps the most important point: all the articles, questions, and soon issues/bugs form a common knowledge base that a simple and powerful search mines effectively. Every time you type in a question, you are searching the knowledge base!

Likely answers or topics offered while you type limit duplicate threads



Engage has two different basic tools: the Q&A and the Article. A Q&A can be seen as a forum post: everybody can create one and everyone can answer it; it can be open ended or can have a 'best answer'. Customers can also flag it for faster escalation. Articles are the bulk of the knowledge base: we turn the most common, or important, Q&As into articles, where we can give more background information, have a common format, and expose workarounds or solutions.

Articles provide the bulk of the knowledge base


When posting a new question you will be asked for Topic, Name, Email Address, Subject, and Body. We divided the Topics based on the Cloud IT roles, following the navigation principle of our site. For a short time, you will have to enter your Name and Email Address each time. This issue will be resolved once we deploy the single sign-on mechanism (based on OpenID) we are working on.

Topics are divided based on the Cloud IT roles

This new tool comes with a cost: we will not be able to migrate our old forum posts to it. The tools are too different and we simply don't have the man-power to make this migration manually. We already pre-populated the articles with the most noteworthy topics coming from the old forum, but we probably missed some. Also, your username has not been migrated from the old web site: once we implement the single sign-on, you will be able to use your preferred OpenID provider to log in.

Please help us migrate your unanswered or favorite posts from the old forum: just go to Engage and post them there. And if you encounter any issue, don't hesitate to contact us. We do believe you will love the simplicity and completeness of what you will find at Engage, and we will be happy to hear any comment you have to improve it.

Wednesday, March 28, 2012

4 of 20

The International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) will commemorate its first 20 years with a special proceeding containing the best 20 papers. Rich Wolski, and Steve Fitzgerald, respectively founder and CTO, and VP of Technical Services at Eucalyptus Systems, will each have 2 papers published in that proceeding. Rich will have Forecasting Network Performance To Support Dynamic Scheduling Using The Network Weather Service (NWS) and Scheduling from the Perspective of the Application (AppLeS) while Steve will have Grid Information Services for Distributed Resource Sharing and Application experiences with the Globus toolkit. That is 20% of the papers!

I have been fortunate enough to have been working with them since the mid '90s, back when a cloud on a whiteboard meant the Internet, and Grid Computing was the pinnacle of Distributed Computing research. In addition to being excellent scientists, Rich and Steve have also been role models, teaching the new generations for more than 15 years. It's hardly a coincidence, then, that Rich and Steve are now pushing the boundaries of knowledge in Cloud Computing with Eucalyptus.

It's hard to overestimate the influence that Rich and Steve have within our organization. For example, in the latest Eucalyptus 3, I can find Rich's fingerprints in the overall architecture and in the attention to the internal protocols, and Steve's touch in the identity integration (LDAP) and management (EIAM). They are mentors and role models to all eucalyptoids, for their determination, insights, and skillful leadership, each with his own personable yet efficient style.

Edit: check out the full list.

Sunday, March 11, 2012

PyCon 2012

We (cpt_yesterday, gholms and obino) left Silicon Beach (aka Santa Barbara home of Eucalyptus) on Thursday and drove up to Santa Clara for my first PyCon.

Driving from Silicon Beach to Silicon Valley

I was a bit leery about dinner (just a few weeks ago cpt_yesterday and I got food poisoning in Sunnyvale), but all went well and it was great to see Mitch.


Setting up the booth

We set up the booth in the morning, just before the keynotes. The booth was a good meeting point throughout the conference to talk about Eucalyptus, get a bit of rest, or just grab one of our t-shirts. It's always rewarding to talk about how Eucalyptus is used in production or to receive thumbs up from people walking by.

Mitch and cpt_yesterday
This was my first PyCon, and being a Python n00b myself (my first Python program is a few weeks old), it was a bit intimidating to be around such a crowd. Everyone was very helpful and engaging and willing to tolerate a newbie: a very nice experience.
  
gholms likes PyCon
We got a bit distracted from the conference by PG&E: they told us on Friday night that an urgent safety fix was needed and that on Saturday they would pull the power all around our HQ. Like all good DevOps we spent quite a bit of time preparing for the outage, coordinating with the team back at HQ. Our core services run on our public Eucalyptus cloud at CoreSite (read more about the ECC for more info on our setup), but we have an internal cloud we use at HQ, and we wanted to be sure all these internal services were shut down nicely.


The stars of the conference

And of course I needed a picture of the stars of PyCon 2012.

Monday, March 5, 2012

First steps with Eutester


In one of my previous posts I mentioned our mantra: "Listen to our community and deliver quality software". To deliver quality software, it is necessary to have the QA process as a first class citizen. Our QA team created quite a spectacular infrastructure to test all sorts of combinations and configurations automatically (distro, architectures, versions, hypervisors, networking, images, etc ...) to guarantee our users regression-free releases. They also managed to have fun in the process (check out Pigeons on a Euca).

Eutester is the latest brainchild of our QA team, and this blog is about my experience writing a test using it. Eutester is a framework to create automatic tests against a Eucalyptus installation (or any cloud following the AWS API for that matter).

 These are the ingredients I needed to bake my first test:
  • a really annoying bug
  • a Eucalyptus cloud to test it against
  • the Eutester framework
  • basic python knowledge helps although it is not necessary
Let's start with the main ingredient: a really annoying bug. I picked lp:737335, annoying and small enough for me to tackle. The issue reported is that the launch permissions set by euca-modify-image-attribute are not respected in Eucalyptus 2.0. The bug also happened to have a proposed fix.

Next: reproduce the issue on a Eucalyptus cloud of the desired version. This bug is easy to reproduce: I just need to upload an image, change the launch permissions to remove all users, then try to launch it: if the instance starts, the issue is present. Since I wanted to test the proposed patch, I installed Eucalyptus 2.0.3, compiling it from source. Lester also tested it against the development branch of Eucalyptus 3.1 (check Andy's blog to work with the devel branch, and Greg's blog for more info on 3.next) and confirmed that only 2.0 is affected.
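In euca2ools terms the reproduction boils down to something like this (the EMI id, key, and instance type are placeholders, and the exact flags should be double-checked against your euca2ools version):

     # as the image owner: remove the public launch permission from the image
     euca-modify-image-attribute -l -r all emi-12345678
     # as a different, non-authorized user: on an affected 2.0 cloud this instance still starts
     euca-run-instances emi-12345678 -k mykey -t m1.small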

I then downloaded Eutester, installed it locally on my client machine, and started to hack on the test. I used virtualenv to ensure a clean, reproducible environment to work with. My first attempt at the test successfully reproduced the issue, so I gingerly submitted it for integration. The QA team is still laughing at my python skills ... They kindly reworked the test and it is now sitting in the Eutester repository. With the new test added to a sequence in our QA system I am now guaranteed that all future versions of Eucalyptus will be tested for regression against this issue.

Onward to the patch for 2.0. I created a branch with the fix, and proposed it for merging. The QA system caught my sloppy programming at the first attempt (a typo), but the second version was the win: QA passed with flying colors and the branch was eventually merged into the top of the Eucalyptus 2.0 branch.

Quite an accomplishment here: my first python script, my first Eutester test and no more lp:737335, just the perfect happy ending for this blog.

Tuesday, February 21, 2012

FOSDEM 2012

After having heard about FOSDEM for a long time, this year I had the privilege to attend it. This may be the reason for the extreme cold spell that hit Europe at that time. I joined Brian Thomason in Hamburg, where he presented Eucalyptus. They loved him, and I have to agree with them: the presentation was very interactive and Brian was very charming. We then reached Brussels by train, on the only non-heated car: needless to say, I was ready for a hot bath once we reached the hotel.

FOSDEM was as good as I had read about, and more. My interests kept me most of the time at the Cloud and Virtualization track, although I did walk into a Google Summer of Code talk, CentOS and Debian talks, and a survey on how to have HA with MySQL. I liked the HA talk (search for Ivan Zoratti's HA reloaded): there is no free lunch when deploying HA, and the talk highlights how decisions need to be taken early on about what kind of HA one wants and is willing to pay for.

The Cloud and Virtualization track was very well attended. It was good to see old friends and shake hands, from Xen's Lars, to Dave and James from the Ubuntu server team, Thierry and Rick, and more. Very good content too. I quite enjoyed the libguestfs talk, since lately I have been doing some image work, and I already have been using zerofree, but I learned that there are a lot more tools I can use to simplify my life.