Red Hat

Genome Documentation Copyright (C) © 2008 Red Hat This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses.

Red Hat and the Red Hat "Shadow Man" logo are registered trademarks of Red Hat, Inc. in the United States and other countries.

All other trademarks referenced herein are the property of their respective owners.

The GPG fingerprint of the security@redhat.com key is:

CA 20 86 86 2B D6 9D FC 65 F6 EC C4 21 91 80 CD DB 42 A6 0E

1801 Varsity Drive
        Raleigh, NC 27606-2072
        USA
        Phone: +1 919 754 3700
        Phone: 888 733 4281
        Fax: +1 919 754 3701
        PO Box 13588
        Research Triangle Park, NC 27709
        USA

Abstract

Documentation for the Genome tooling


Preface
1. Document Conventions
2. We Need Feedback!
1. Genome Appliances
1.1. Appliances
1.1.1. Cloudmaster
1.1.2. Cloudhost Appliance
1.1.3. Genome Appliance
1.2. Custom Machine Types
2. Getting Started
3. Tooling
3.1. Command Line Tools
3.1.1. genome-report
3.1.2. genome-replace-self
3.1.3. genome-bootstrap
3.1.4. genome-sync
3.2. Web Tools
3.2.1. Genome Server (genomed service)
3.2.2. Cloudmaster (cloudmasterd service)
3.2.3. Cloudhost (cloudhost service)
4. Open source technologies used with Genome
4.1. Koan
4.1.1. Background
4.1.2. Installation
4.1.3. Guest Provisioning
4.1.4. Watching the VM
4.1.5. Cleaning Up
4.1.6. Known Issues
4.2. LVM
4.3. Xen Virtualization
4.4. JBoss
4.5. Source Code Management (Git)
4.6. Configuration Management (Puppet)
4.7. General
5. Self Tests
5.1. LVM
5.2. Xen
5.3. Git
6. Debugging
6.1. Puppet
6.2. Puppetmaster
7. Contribute
7.1. Licensing
7.2. Design Axiom
7.3. Community
7.3.1. Please Be Friendly
7.3.2. Community Communication
7.4. Working With The Code
7.4.1. Checkout The Code
A. Revision History
A.1. Logging in to Genome machines
A.2. Versioning conventions
A.2.1. Handling upgrades
A.3. Managing releases with the Genome tooling
A.3.1. The "release" repository
A.3.2. Creating a superproject
A.3.3. A word on pushing superprojects
A.3.4. Branching strategy
A.3.5. What about the master branch?
Glossary

Preface

The genome is fundamental to the reliable encoding and transfer of both genetic code and data. As the project name suggests, Genome aims to be the equivalent for software systems. The project formally started in early 2008, though its origins trace back several years to real struggles within Red Hat IT in developing and deploying software.

While it may not be the perfect analogy, it is indeed fitting to say that hereditary information is stored within every IT organization. The truth is that software systems, like species, face extinction through poor replication of this information. Sadly, the knowledge required to maintain and reproduce complex systems often lives only in the form of tangled configuration scripts or, worse still, only in the minds of consulting domain experts. Transferring knowledge in such a manner is practically a recipe for building legacy systems.

Taking the biological analogy a little further, briefly imagine a world in which generations of genetic information had to be manually replicated by any number of people. Now try to imagine a different world in which genetic information could only be copied exactly, that is to say, diversity is altogether unattainable. Genome aims to solve both of these problems for IT: reproducing exceedingly complicated systems in a world where heterogeneity is more the rule than the exception.

As you begin tackling these problems for your organization it cannot be emphasized enough that the collaboration amongst teams enabled by Genome is more important than any particular tool implementation. Feel free to mutate Genome into any shape or form to solve your problems. The truth is, we readily await your patches and enjoy seeing the best ideas rise to the top.

1. Document Conventions

Certain words in this manual are represented in different fonts, styles, and weights. This highlighting indicates that the word is part of a specific category. The categories include the following:

Courier font

Courier font represents commands, file names and paths, and prompts.

When shown as below, it indicates computer output:

Desktop       about.html       logs      paulwesterberg.png
Mail          backupfiles      mail      reports

bold Courier font

Bold Courier font represents text that you are to type, such as: service jonas start

If you have to run a command as root, the root prompt (#) precedes the command:

# gconftool-2

				
italic Courier font

Italic Courier font represents a variable, such as an installation directory: install_dir/bin/

bold font

Bold font represents application programs and text found on a graphical interface.

When shown like this: OK, it indicates a button on a graphical application interface.

Additionally, the manual uses different strategies to draw your attention to pieces of information. In order of how critical the information is to you, these items are marked as follows:

Note

A note is typically information that you need to understand the behavior of the system.

Tip

A tip is typically an alternative way of performing a task.

Important

Important information is necessary, but possibly unexpected, such as a configuration change that will not persist after a reboot.

Caution

A caution indicates an act that would violate your support agreement, such as recompiling the kernel.

Warning

A warning indicates potential data loss, as may happen when tuning hardware for maximum performance.

2. We Need Feedback!

Send comments to genome-list@redhat.com. All bugs can be posted to the Genome Trac.

Chapter 1. Genome Appliances

1.1. Appliances

Appliances in the Genome environment are machines that enable the Genome tooling.

1.1.1. Cloudmaster

The primary purpose of a Cloudmaster is to group hardware together to form a cloud that hosts virtual machines. Cloudmasters report the status of cloud hosts and can show the status of all of the virtual machines running across all of the cloud hosts. Additionally, users can search for virtual machines in the cloud based on the virtual machines' names, hostnames, status, or IP Addresses.

1.1.1.1. System Requirements

CPU
1GHz
Memory
1G of RAM
System architecture
A cloudmaster appliance can be installed on either i386 or x86_64 architectures.
Storage
50G of hard drive space.

Important

Cloudmasters should only be installed on a physical machine, not a virtual machine.

1.1.1.2. Features

  • Visualization of the cloud resources.

1.1.2. Cloudhost Appliance

The Cloudhost appliance provides the virtualization hosts that make up a cloud. It hosts, controls, and provisions the virtual machines running in the Genome environment.

1.1.2.1. System Requirements

CPU

1GHz

Memory

This depends on the number of virtual machines you plan on hosting. We recommend 3G of RAM to start.

System architecture

A cloud host appliance can be installed on either i386 or x86_64 architectures.

Hardware Virtualization

When using Fedora as the distro for the cloud host, the machine must support hardware virtualization.

Storage

This depends on how many virtual machines you plan on hosting. We recommend 200G of hard drive space to start.

1.1.2.2. Features

  • Host virtual machines (Either Xen or KVM)

  • Control virtual machines (stop and start VMs)

  • Provision new virtual machines

1.1.3. Genome Appliance

The Genome Appliance is the center of development in the Genome environment. In a nutshell it is the self-contained provisioning, configuration and artifact store. For this reason Genome Appliances are generally not considered volatile.

Like all other machine types, it is designed to work as both a "bare metal" and a virtual machine. The main resource requirement that distinguishes this machine type is disk space, which is a function of the amount of content imported into cobbler.

Creating Genome Appliances

The easiest way to create a Genome Appliance is via genome-replace-self.

1.1.3.1. Minimum System Requirements

CPU
1GHz
Memory
512M RAM
System architecture
A Genome Appliance can be installed on either i386 or x86_64 architectures.
Storage
This depends on how many distros you plan on hosting in cobbler. We recommend 50G of hard drive space to start.

1.1.3.2. Features

  • Cobbler for all RPM/provisioning

  • A Puppetmaster for all configuration

  • Bare git repos for all content located under /srv/git

  • GitWeb running on http://[hostname]/git/gitweb.cgi

  • The genomed service running at http://[hostname]:8106/nodes.html

  • Apache redirects to surface the genomed service at http://[hostname]/genome

1.1.3.3. Genome Appliance cloning

The state of a particular Genome Appliance can be described by the content stored under /var/www/cobbler and /srv/git. Cloning a particular Genome Appliance is really just a matter of getting the correct bits from those locations onto a new Genome Appliance.

Aside from the simple bit replication that must be performed, there are also a few "one-off" things that need to happen:

  • Getting the puppet modules to the location where the puppetmaster can see them.

  • Setting up commit hooks for the puppet module git repositories.

  • Setting up a commit hook for the Genome documentation.

See the cookbook for more information.

1.1.3.4. Genome Appliance customization

The genome-repo RPM is designed to get users up and running with a known working configuration. There are certain custom settings users of Genome will need to configure for their environment. The two most common needs for customization are adding new Genome machine types to genomed and any extra cobbler customization.

How these customizations are managed is at the user's discretion. However, since the Genome Appliance is already controlled by puppet it makes sense in many cases to simply use it for this as well.

For this to work a puppet module named repo_extensions must be created and exist on the module path. The class that this module must define is also called repo_extensions.

Important

The reason this works is that, by default, the Genome Appliance's puppet external nodes script includes two classes: genomerepo::appliance and repo_extensions.
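A minimal sketch of such a module is shown below. The module path (/etc/puppet/modules) and the class body are assumptions for illustration only; create the module wherever your puppetmaster's modulepath actually points.

# Create the repo_extensions module skeleton (module path is an assumption)
mkdir -p /etc/puppet/modules/repo_extensions/manifests

cat > /etc/puppet/modules/repo_extensions/manifests/init.pp <<'EOF'
# The class must be named repo_extensions to match the module name
class repo_extensions {
  # Site-specific customizations go here, for example extra cobbler
  # settings or additional Genome machine types wired into genomed.
}
EOF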

1.2. Custom Machine Types

A custom machine type in the Genome environment can be roughly described as a collection of known working puppet classes matched with an operating system (or more precisely, a cobbler profile). The list of machines that can be provisioned from a given Genome Appliance can be found when using the genome-bootstrap wizard or the genomed UI.

Note

See the cookbook for more information on creating custom machine types.

Important

From Puppet's point of view these "types" are not bound to any particular OS version. You choose the OS with genome-bootstrap or when provisioning directly with Koan. This allows users to test out different OS and applications versions using the same Puppet code.

Chapter 2. Getting Started

Those who wish to get up and running quickly with Genome can simply use the Quick Start. That being said, a typical Genome environment consists of:

  • An environment to host virtual machines

  • At least one Genome Appliance ("bare metal" or virtualized)

  • A number of custom machine types which can be provisioned via genome-bootstrap.

Chapter 3. Tooling

One of the goals of Genome is not to invent new tools but rather to leverage and contribute to existing Open Source projects. This section presents the user with links to many technologies that can be considered prerequisites for contributing to Genome.

3.1. Command Line Tools

Genome provides several command-line tools for convenience.

3.1.1. genome-report

[root@bleanhar-test1 ~]# genome-report guest --help
NAME
  genome-report

SYNOPSIS
  genome-report guest (enable|disable) --cloudhost=cloudhost [options]+

PARAMETERS
  --cloudhost=cloudhost (0 -> cloudhost) 
      The cloudhost this guest is running on. 
  --help, -h

As the name suggests, this tool is used for reporting information in the Genome environment. Its main use at the moment is to report information such as the hostname and IP address to the cloudhost.

The help docs show that the cloudhost to report to can be explicitly passed in. Another option is to set cloudhost in /etc/genome/genome.conf, which is required when enabling reporting. Enabling reporting simply installs a cronjob in /etc/cron.d.

# genome-report guest enable
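If the cloudhost is not set in /etc/genome/genome.conf, it can be passed explicitly on the command line (the FQDN below is hypothetical):

# genome-report guest enable --cloudhost=cloudhost01.example.com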

Important

The setting of the cloudhost and enabling the cronjob is typically handled by the guest kickstart files in the Genome environment. The web bootstrap UI passes in this information as ksmeta to cobbler.

3.1.2. genome-replace-self

To avoid many "chicken and the egg" sorts of provisioning problems the Genome tooling provides an RPM and script called genome-replace-self. As the name suggests, this tool is a quick way to completely replace a machine. The term replace-self is borrowed from koan, and under the covers that is basically all that is happening. The script does include some helpful logic to properly install koan on whatever Red Hat based system was previously running on the machine in question.

Important

Machines set up via genome-replace-self are not always controlled by puppet. They tend to be treated more like appliances.

3.1.2.1. Usage

To use this tool the user must know the profile that will be used for the replacement. The list of available profiles can be obtained easily with koan.

Note

Ideally which profile to select should be obvious based on the names. A good practice is to have the profiles include both the architecture and operating system in the name.

3.1.2.1.1. Creating a Genome Appliance or Cloudmaster
 
$ genome-replace-self --help

Usage: genome-replace-self -[c]obbler_server -[p]rofile -[m]etadata

where options include:
    -c (required)  the cobbler server from which to provision this machine
    -p (optional)  a specific profile to use for this machine
    -m (optional)  the metadata to pass to the cobbler system

$ # Select a profile from the list this command returns
$ koan -s [Your Genome Repo machine] --list=profiles 

$ # Only certain types of machines require the -m (metadata) flag
$ genome-replace-self -c [Your Genome Repo machine] -p [Profile selected in previous step]

3.1.2.1.2. Creating a Cloudhost
$ genome-replace-self --help

Usage: genome-replace-self -[c]obbler_server -[p]rofile -[m]etadata

where options include:
    -c (required)  the cobbler server from which to provision this machine
    -p (optional)  a specific profile to use for this machine
    -m (optional)  the metadata to pass to the cobbler system

$ # Select a profile from the list this command returns
$ koan -s [Your Genome Repo machine] --list=profiles 

$ # Only certain types of machines require the -m (metadata) flag.  This example shows
$ # how a cloudhost can be configured to report to a cloudmaster 
$ genome-replace-self -c [Your Genome Repo machine FQDN] -p [Profile selected in previous step] -m cloudmaster=[A Cloudmaster FQDN]

3.1.3. genome-bootstrap

Command Line Tool

It is important to differentiate this tool from the Web User Interface for bootstrapping virtual machines in the Genome environment. This section refers specifically to the command-line interface tool called genome-bootstrap.

3.1.3.1. Background

With the introduction of virtualization, we are able to rebuild entire environments quickly; however, there is a fair amount of complexity involved in doing so. We've created a tool called genome-bootstrap that automates the process of wiring a machine up to puppet.

Originally, genome-bootstrap was designed to create virtual machines based on puppet configurations and cobbler profiles in a cloud. The intent was to allow users to build reproducible systems via a simple command-line interface.

One problem with this approach is that there is an inherent requirement to have puppet configurations "published" to a puppetmaster in order to test puppet configurations. This can quickly pollute the set of puppet configurations that might be used by several other systems at the same time.

The genome-bootstrap command-line tool has evolved to enable "remote" development on a disconnected system. The central idea is to allow users to create a machine (virtual or otherwise) that is not controlled by a puppetmaster. Then, pull down puppet configurations and iterate on development of those puppet configurations without having to publish the configurations to a central shared puppetmaster server for testing.

genome-bootstrap 1.3+

The genome-bootstrap command-line tool can still be used to provision virtual machines in a cloud, but that is now considered the "advanced" mode.

3.1.3.2. Installation

Genome Appliances and any virtual machine provisioned by a Genome Appliance probably already have genome-bootstrap installed.

If you are installing genome-bootstrap on a separate machine, like a laptop, you can easily add the Genome yum repositories and install genome-bootstrap on any machine you like. Run the following commands to create the Genome yum repository file:

# Switch to root
su -

echo """
[genome-noarch]
name=Genome (noarch)
baseurl=http://brenton.fedorapeople.org/genome/yum/Fedora-9-genome-noarch
enabled=1
gpgcheck=0

[genome-i386]
name=Genome (i386)
baseurl=http://brenton.fedorapeople.org/genome/yum/Fedora-9-genome-i386
enabled=1
gpgcheck=0
""" > /etc/yum.repos.d/genome.repo

# Install genome-bootstrap
yum install genome-bootstrap

Testing and Development repositories

The testing and development yum repositories are located under http://brenton.fedorapeople.org/genome/yum/testing and http://brenton.fedorapeople.org/genome/yum/development respectively.

3.1.3.3. Usage

The genome-bootstrap command-line interface does not need any parameters. Simply run the program and you will be guided through the bootstrap process.

The default usage of genome-bootstrap asks the user to provide a Genome Appliance's fully qualified domain name and select a machine type. This information is used to collect and download, via git clones, a set of puppet configurations. Then, genome-bootstrap generates a yaml file and a puppet script ("pp" file) representing the chosen machine type. Finally, genome-bootstrap runs puppet using the generated script.

3.1.3.4. Advanced Mode

The advanced mode comes in two flavors: local and remote.

3.1.3.4.1. Advanced Mode: Local
# genome-bootstrap advanced local --help

NAME
  genome-bootstrap

SYNOPSIS
  genome-bootstrap advanced local --module_path=module_path [options]+

DESCRIPTION
  Run puppet on the this machine instead of a VM in the cloud. This mode must run as root.

PARAMETERS
  --lib_dir=lib_dir, -l (0 ~> lib_dir) 
      The puppet libdir. Usually only needed if plugins are being used. The 
      default resolves to a directory called 'plugins' underneath the 
      'module_path' 
  --yaml=yaml, -y (0 ~> yaml) 
      YAML configuration for this machine 
  --module_path=module_path, -m (0 -> module_path) 
      The puppet modulepath to be used when running puppet. 
  --help, -h

# genome-bootstrap advanced local --module_path=/home/$USER/.genome-bootstrap/ --yaml=/home/$USER/.genome-bootstrap.yaml

The "local" advanced mode of genome-bootstrap allows a user to run puppet using the configurations downloaded by a previous run of genome-bootstrap. This makes it very easy to make changes to puppet configurations and test out those changes without having to publish the changes to a puppetmaster.

3.1.3.4.2. Advanced Mode: Remote
# genome-bootstrap advanced remote --help
NAME
  genome-bootstrap

SYNOPSIS
  genome-bootstrap advanced remote --repo=repo --cloudhost=cloudhost --email=email [options]+

PARAMETERS
  --lib_dir=lib_dir, -l (0 ~> lib_dir) 
      The puppet libdir. Usually only needed if plugins are being used. The 
      default resolves to a directory called 'plugins' underneath the 
      'module_path' 
  --yaml=yaml, -y (0 ~> yaml) 
      YAML configuration for this machine 
  --fqdn=fqdn, -f (0 ~> fqdn) 
      Fully qualified domain name of machine to be provisioned 
  --system=system, -s (0 ~> system) 
      Cobbler system name of the machine to be provisioned 
  --repo=repo, -r (0 -> repo) 
      Fully qualified domain name for the Genome repo machine to use for 
      provisioning 
  --cloudhost=cloudhost, -c (0 -> cloudhost) 
      Fully qualified domain name for the machine controlling the cloud 
  --email=email, -e (0 -> email) 
      Your email address to use to help identify the instance owner 
  --help, -h

The "remote" advanced mode of genome-bootstrap allows a user to provision a virtual machine onto a cloudhost using a predefined yaml file that describes the machine type for the new virtual machine.

YAML File

The YAML file can be piped in via stdin or specified via the --yaml option.

The important thing to remember is that the structure of the YAML fed to genome-bootstrap must be in the same format Puppet expects for its external nodes. You must know exactly which parameters are required for a given Genome machine. The nice thing is that this yaml can be obtained from genomed.
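As a rough sketch, such a file follows Puppet's external nodes layout of classes and parameters. The class name, parameter, and hostnames below are hypothetical; the YAML can then be handed to the remote mode via --yaml or piped in on stdin:

# Write a machine description in Puppet's external nodes format
cat > mymachine.yaml <<'EOF'
classes:
  - jbossas
parameters:
  activemq_server: mq.example.com
EOF

# Provision a guest in the cloud using that description
genome-bootstrap advanced remote --repo=repo.example.com \
  --cloudhost=cloudhost01.example.com --email=me@example.com \
  --fqdn=mymachine.example.com --yaml=mymachine.yaml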

3.1.3.5. Post bootstrapping

Several artifacts are created after successfully running genome-bootstrap:

/tmp/genome-bootstrap.pp

A puppet script is created in /tmp that represents a machine type. The script loads a number of puppet classes and defines a number of puppet variables that will be used to configure the machine.

/home/$USER/.genome-bootstrap.yaml

A yaml file is created in the user's home directory that represents a machine type. The yaml file contains references to a number of puppet classes and definitions of a number of puppet parameters.

/home/$USER/.genome-bootstrap

A directory is created in the user's home directory that contains all of the puppet configurations defined in the Genome Appliance specified when running genome-bootstrap.

These artifacts are put in place to make running puppet on updated configurations very easy. If you need to test updated puppet configurations, simply run:

$ genome-bootstrap advanced local --yaml=/home/$USER/.genome-bootstrap.yaml --module_path=/home/$USER/.genome-bootstrap/

This will run puppet with the given yaml file and use the puppet configurations downloaded during the bootstrap process.

3.1.4. genome-sync

The goal of genome-sync is to make the process of synchronizing git repositories from one Genome Repo to another as easy as possible. The main mode, start, guides the user through the process.

In the start mode work will be performed in a working directory. The app will then iterate over each repository, asking the user what work to perform. After this process has completed the user can publish their changes with the save mode.

3.1.4.1. Usage

The following one-liners are in no particular order.

# Start the synchronization wizard
genome-sync start --repo=[remote Repo machine]

# Hard reset to a given repository's state (This is the fastest way to 
# get up and running with a newly created repo machine).
genome-sync start quiet --repo=[remote Repo machine]

# Push content where it needs to go.  If puppet modules are updated the
# puppetmaster may need to be bounced.
genome-sync save

# Remove the working directory  
genome-sync clean

Important

genome-sync must be run as the user that owns the content under /srv. This is usually the genome user.

Note

All genome-sync modes take the --help, --verbose and --workingdir flags.

3.2. Web Tools

Important

Currently the Genome web tools are proxied by Apache. This means that whenever they are restarted it is usually a good idea to restart the httpd service as well. The most common issue arises when a Genome service is bounced and a request comes in: Apache will take the node out of rotation and it will appear that the Genome service is down.

3.2.1. Genome Server (genomed service)

The genomed service is a simple web app that serves as the canonical source of Genome machine information used in compiling puppet configurations. It's really quite simple, so it's probably best explained by simply showing the link: http://[your repo hostname]/genome/nodes.html. From there it should be easy to find the other features by exploration.

It's also worth noting that most of the resources (this is a RESTful service) have several representations. Try changing the URLs to end with xml or yaml.
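For example, the node listing can be fetched in different representations from the command line (the hostname is hypothetical):

# HTML representation
curl http://genome-repo.example.com/genome/nodes.html

# The same resource as YAML or XML
curl http://genome-repo.example.com/genome/nodes.yaml
curl http://genome-repo.example.com/genome/nodes.xml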

3.2.1.1. Features

The Genome Server web application has three main parts:

Machine Types

The "Machine Types" view of the Genome Server web application shows the list of machine types available. The default view is to simply show the list of machine types and a short description about each.

To see the details about a machine type, enter the address: http://[genome_server_hostname]/genome/machine_types/[machine_type_name].html. This will provide a view of the classes and parameters that make up that machine type.

Nodes

The "Nodes" view of the Genome Server web application shows the list of puppet external nodes that have been defined by the Genome Server. Seeing a machine's name listed on the Nodes page does not necessarily imply that the machine is running, or even still exists Nodes page does not necessarily imply that the machine is running, or even still exists. It simply means that the machine was provisioned from the Genome Server.

To see details about an individual node, click on the node name in the list. This will display a text-area with the puppet parameters and classes configured for that machine.

The text-area provides a way to change the machine's configuration without having to re-provision the virtual machine. Any changes submitted through the text-area will be used by the puppetmaster to compile the virtual machine's local configuration.

Bootstrap

The Bootstrap view of the Genome Server web application provides a way to create new virtual machines on a Cloudhost. The user interface is a simple wizard that prompts for information about the virtual machine and the host on which it will reside.

3.2.1.2. Configuration

genomed, like all of the Genome web services, is configured via /etc/genome/genome.conf. The other file of interest for this service is /etc/genome/machine_types.rb. This file is the only one that deserves special attention since it is a Domain Specific Language that gets executed by the ruby interpreter. A documented sample configuration ships with the genome-repo RPM, which should be sufficient to get up and running quickly. If changes are made to this file, genomed must be restarted.

Note

Many people bundle the /etc/genome/machine_types.rb with their repo_extensions.

3.2.2. Cloudmaster (cloudmasterd service)

The cloudmasterd is a RESTful web service running on a cloud master that provides cloud computing capabilities across one or more cloud members. The main landing page can be found at http://[hostname]/cloud/status.

The cloudmasterd service also provides a simple status page indicating the current state of the cloud members. A search functionality also enables finding guests running in the cloud.

3.2.3. Cloudhost (cloudhost service)

The cloudhost is another RESTful web service, one that takes on much of the functionality that cloudmasterd provided prior to version 1.4. It is the web service that actually issues the koan commands and is available at http://[hostname]/host/status.
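For a quick check of a cloudhost, the status page can be fetched from the command line (the hostname is hypothetical):

# View the cloudhost status page
curl http://cloudhost01.example.com/host/status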

While this service is designed to be used in conjunction with genomed and other Genome tools like genome-report, it functions well with the standard virt tooling. Users are free to use koan or virt-manager, and information will be gathered and reported in the same way.

Important

A cloudmasterd is not required for using the cloudhost. If the cloudmaster is set in /etc/genome/genome.conf, a thread will be started to periodically report its state to the cloudmaster.

Chapter 4. Open source technologies used with Genome

4.1. Koan

4.1.1. Background

Koan is a tool coming out of Red Hat Emerging Technologies that is used to provision machines from Cobbler servers. Following the unix philosophy it's very simple to use and the man page will tell you everything you need to know. For more information check out the cobbler documentation.

Note

Most provisioning with the Genome tools can be done without having to work with Koan directly. However, a good understanding of its basic operation is useful for advanced usage of the Genome tooling.

4.1.2. Installation

RPMs exist for both Fedora and RHEL (through EPEL). If your repositories are configured correctly you should simply be able to yum install koan. Koan doesn't have many dependencies so if you don't feel like adding the EPEL repo to your RHEL machine you can simply install the RPM.

Once installed you should test your installation against a cobbler server.

koan -s genome-repo.usersys.redhat.com --list=profiles
koan -s genome-repo.usersys.redhat.com --list=systems

4.1.3. Guest Provisioning

Note

genome-bootstrap now wraps Koan for provisioning virtual machines. This is only included for advanced use cases.

 
koan -s genome-repo.usersys.redhat.com --virt --virt-type=xenpv --virt-path=HostVolGroup00 --system=[your hostname]

Here the most important part is obviously the --virt flag. If you pass in a Volume Group name for --virt-path, koan will automatically create (or reuse) a logical volume in the format of [name]-disk0. With cobbler much of the configuration lies on the server side (the memory, the size of the logical volume, etc.). If you have different requirements you can either create a new cobbler profile or use the tooling that makes up Genome to achieve the desired results.

Tip

One trick to creating a guest with a larger logical volume than a cobbler profile specifies is to simply create it by hand and specify the size you desire. Koan will simply reuse that logical volume.
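A sketch of that trick, reusing the volume group from the koan example above (the guest name and size are hypothetical):

# Pre-create a 40G logical volume named for the guest; koan will reuse it
lvcreate -L 40G -n myguest-disk0 HostVolGroup00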

4.1.4. Watching the VM

During the kickstart provisioning process you can connect to the virtual framebuffer which is accessible through VNC. It's only available locally so don't try and connect from another machine. From the Xen host you should be able to use:

 
ssh -X root@YourXenHost.usersys.redhat.com
vncviewer localhost:5900

The port may vary according to how many guests you have running. To find out which ports are being used:

 
# If you are using RHEL5 less than U2
netstat -nlp | grep vnc

#otherwise
netstat -nlp | grep qemu-dm

4.1.5. Cleaning Up

If you would like to remove work performed by koan:

  • Remove the Xen configuration for the guest under /etc/xen

  • Remove the file or logical volume that backs your guest.
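A minimal cleanup sketch, assuming a guest named myguest backed by a logical volume in HostVolGroup00:

# Make sure the guest is shut down first, e.g. with xm shutdown
# Remove the Xen configuration for the guest
rm /etc/xen/myguest

# Remove the logical volume that backs the guest
lvremove /dev/HostVolGroup00/myguest-disk0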

4.1.6. Known Issues

Provisioning will fail if a config file under /etc/xen has the same name as the machine you are trying to create. The error message is fairly cryptic and says something like "machine already exists". The fix is to simply remove the config file.

4.2. LVM

LVM is used to back our virtualized guests. It is an extremely flexible and pervasive storage technology for Linux. One of the most useful features is the ability to create copy-on-write snapshots.
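For example, a copy-on-write snapshot of a guest's backing volume can be taken with a single command (the names and size are hypothetical):

# Create a 5G copy-on-write snapshot of a guest's logical volume
lvcreate -s -L 5G -n myguest-snap /dev/HostVolGroup00/myguest-disk0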

4.3. Xen Virtualization

Virtualization is a key component of the new architecture. Managing development, build and deployment environments on a variety of hardware and operating systems has always been extremely costly. We have a fairly low tolerance for inconsistencies in both environments, yet customization is critical to most developers and avoided at all costs in production. Virtualization is the technology that gives Genome isolation in both worlds. The development, build, and deployment environments can all be isolated and managed on virtual machines to enable different configurations and optimizations while still residing on the same machine. It also gives us the flexibility to modify our virtualization option as time progresses (e.g. from Xen to KVM) while keeping the core strategy consistent for the foreseeable future.

4.4. JBoss

JBoss is going to be a cornerstone of our new infrastructure. We will be using a slightly newer version of the JBoss EAP stack with components from the JBoss SOA team to incorporate the JBoss ESB.

  • JBoss Getting Started Guide

    Make sure you are comfortable with starting and stopping JBoss as well as the server configurations, deployment mechanisms, jmx console, and general filesystem layout to find logs.

  • JBoss ESB Documentation

    These documents aren't incredibly thorough at the moment, but they should give you a good initial understanding of the JBoss ESB technologies.

  • JBoss Seam Documentation

    The reference documentation for Seam 2.0 CR3 will probably be the best document to read through.

4.5. Source Code Management (Git)

In addition to refining our infrastructure, we also have needed to refine our development and deployment practices for quite a while. We need to be able to run multiple development efforts in parallel, collaborate between them, and maintain a sane state of a deployable branch (e.g. trunk). Subversion has worked in some regards but has fallen short in our ability to utilize it for multiple development streams. Complicated merges end up very error prone and have almost always resulted in production defects. Also, given the errors around branching and merging, it has been very difficult to get the development community to maintain a clean revision history and state of our production branch. Git's distributed nature will allow development to proceed in an offline fashion and result in a small number of clean patches being applied to our production branches. So in essence, be warned, stream of consciousness coding and commits will no longer be accepted.

4.6. Configuration Management (Puppet)

Puppet is a configuration management technology that will help us eliminate many of the manual steps required during releases. Today, the configuration and release process is extremely manual and becoming increasingly difficult to scale. Moving the configuration management aspects down to development allows developers to drive more automation into the release process by providing container and system configurations using a mechanism that can be deployed without modification into production. This also allows groups like Release Engineering to operate in more of a review role and reduce the manual steps they are required to deploy projects.

  • Puppet Documentation

    Since your virtual environment will be running a puppet master to configure all of your virtual machines, make sure you have an understanding of what the puppet master does as well as the templating process used to generate files. This knowledge will be key in enabling developers to make system configuration changes and to test and submit patches instead of making manual requests for various changes to be applied.

4.7. General

Two places that you should always look for documentation are:

Chapter 5. Self Tests

To walk away with a deeper understanding than just honed copy-and-paste skills when using the Cookbook, you need some knowledge about the underlying technologies. These self tests will assess your ability to use the Genome tooling.

5.1. LVM

  1. How can you find out how many free extents are in your available volume groups?

  2. What command will tell you how much free space is left on your logical volume snapshot?

  3. What happens if your snapshot becomes full?

  4. How can you determine the origin logical volume of a snapshot?

  5. If your root partition is on a logical volume that occupies all extents of your only volume group, how can you free up space for creating other logical volumes / volume groups?

  6. If you are using LVM to back a Xen guest why does simply growing the logical volume not give your guest more disk space?

  7. If your volume groups or logical volumes are not showing up at their appropriate device mount points under /dev, what command(s) can you run to create the necessary device nodes? (Useful for working with LVM in rescue mode)

5.2. Xen

  1. How can you make your Xen guests start at boot time?

  2. Explain the relationship between the Dom0 (or Domain-0) and the DomU.

  3. From a user's point of view, what are the main differences between using para-virtualization and hardware assisted virtualization?

  4. What is the name of the library that both xm and virsh use?

  5. What service must be running for this library to make hypercalls? (When you figure it out, temporarily shut it off and try running virt-install)

  6. Where does virt-install create its guest configuration files?

  7. How can you completely delete a guest from the command line? (Say, if you created it originally with virt-manager).

  8. When you are running Xen's default bridged network what is the default name for your real ethernet device?

  9. What will the effect be on your guests when running a Xen host without network connectivity? Why?

  10. Give a high-level explanation of the difference between a bridged and a routed network.

  11. If you wanted to use NAT instead of the default bridged network setup what config file would you edit?

5.3. Git

  1. Name a command that is not safe to run while other people are using your repo.

  2. Say you just made a bad commit on your private branch, how can you fix it?

  3. What does git pull do under the covers? How is that different than git fetch?

  4. What is a bare repo? How can you convert a working repo into a bare one?

  5. How many bytes does it take to create a new branch?

  6. What do the commit SHA1 sums represent?

  7. What is unique about cloning a repo to a location on the same filesystem?

  8. What is the danger in using git rebase on a public branch?

  9. How can you erase all traces of a bad commit on your private branch?

  10. How can you checkout the state of your current branch 6 commits ago?

Chapter 6. Debugging

When things go wrong with Genome look here first.

6.1. Puppet

Sometimes things go wrong when puppet configurations are applied. Most of these failures are due to timing assumptions that a particular manifest relies on. Timing issues are most often encountered during bootstrapping. Usually this is an indication that the manifest needs to be fixed (though there are cases that cannot be worked around easily). If a configuration seems to have been only half-way applied to your machine you can always force the configuration to run and watch the logging.

  • When debugging it's helpful to stop the long running Puppet service so that changes will be made to your system only when you trigger them explicitly.

    # service puppet stop
    
    

    Running Puppet manually:

    # puppetd --test
    
    

    Running Puppet manually with full debug info:

    Note

    You must stop this command with ^c

    # puppetd --debug --trace --no-daemonize
    
    

6.2. Puppetmaster

  • Stop the service

    # service puppetmaster stop
    
    

    Running Puppetmaster manually:

    # puppetmasterd --debug
    
    

    Note

    When you are done be sure to start the puppetmaster service back up.

Chapter 7. Contribute

We're excited that Genome has become a community project! There are a few things to know regarding Genome community participation.

7.1. Licensing

All Genome source and pre-built binaries are provided under the GNU General Public License, version 2.

7.2. Design Axiom

The Genome framework tries to delegate as much functionality as it can to tools designed for a particular function. To that end, any code contributed to glue tools together should be as minimal as possible to get the job done.

7.3. Community

Now that you're ready to be an active community member, here are a few directions to get you started.

7.3.1. Please Be Friendly

We strongly encourage everyone participating in the Genome community to be friendly and courteous toward other community members. Of course, being courteous is not the same as failing to constructively disagree with each other, but it does mean that we should be respectful of each other when enumerating the 42 technical reasons that a particular proposal may not be the best choice. There's never a reason to be antagonistic or dismissive toward anyone who is sincerely trying to contribute to a discussion.

7.3.2. Community Communication

The best way to participate in the community is to use the mailing list and/or the IRC channel. The mailing list is genome-list@redhat.com and the IRC channel is #genome on irc.freenode.net.

7.4. Working With The Code

If you're not familiar with the Git source code management tool, do yourself a favor and take the time to get over the learning curve. It's bliss once you 'get it'.

7.4.1. Checkout The Code

Developer Checkout URI:

                    ssh://git.fedorahosted.org/git/genome

Anonymous Checkout URI:

                    git://git.fedorahosted.org/git/genome

                    or

                    http://git.fedorahosted.org/git/genome

The Genome project code is separated into several Git repositories. The code repositories are granular so that the repositories are small and easy to work with. We have separated core tooling, core documentation, puppet configuration manifests, third party tool extensions, application code, and the website into their respective Git repositories. When you clone the Git repository from fedorahosted.org/git/genome, that is actually a supermodule which references all the git repositories hosted on gitorious.org. If you want to get all the Genome code at once, you can use the fedorahosted.org/git/genome URL.

                    # Clone the Genome supermodule
                    git clone git://git.fedorahosted.org/git/genome

                    # Move into the cloned supermodule
                    cd genome
                    
                    # Then initialize the submodules
                    git submodule init

                    # Then do the actual cloning of the remote submodules, if you already have them checked out, this will update the submodules locally
                    git submodule update

If you want to work with a specific Git repository, you can review the gitorious genome project and then use the clone URLs listed for each Git repository under the project. For example, to clone the Genome tools repository you would go to http://gitorious.org/projects/genome/repos/tools and then choose a clone URL.

                    # Clone the tools git repository
                    git clone git://gitorious.org/genome/tools

Appendix A. Revision History

Revision History
Revision 1.0    Red Hat IT

Ported documentation to publican

A.1. Logging in to Genome machines

The only interesting thing about logging into Genome machines is the root password. It is currently set in the kickstart file in our Cobbler profiles. That means if you do any provisioning with Koan in the Genome environment your root password will be password. Users can change the password to anything they like once logged in.

A.2. Versioning conventions

A Genome release will typically contain several packages. Each package is versioned with a Major, Minor and Patch number. As is to be expected, each has its own meaning.

The Major number will seldom change. It represents major shifts in tooling or architecture. For example, if Genome were to change to a different configuration engine it would be a major change.

Minor changes will occur at more frequent intervals. A specific interval has yet to be determined, but that is indeed the end goal. Monthly releases tend to work well with Fedora so that might be what happens. An example of a minor release is the shift from using /pub to /srv. This change required coordinated patches to several of the Genome tools as well as documentation. Minor releases will always strive to be backwards compatible. As with the example directory move, new options to genome-sync were added to make the transition easier. Anything that is not backwards compatible will be clearly communicated on the mailing list.

Note

Minor releases are announced on the mailing list 1 to 2 days before they become publicly available in the stable yum repositories.

The last element of the release version is the Patch. Patch releases will be backwards compatible and typically require no documentation changes and little communication. If a particular feature is interesting it may be discussed on the mailing list.

A.2.1. Handling upgrades

Related to versioning, it is worthwhile to mention strategies for upgrading the Genome tools. While the RPMs themselves will always upgrade cleanly, the example Minor update shows that some changes require planned adoption.

The good news, however, is that Genome handles this sort of problem quite nicely. Everything from the machine types, to the cloud and appliance machines can be replicated and tested in isolation. One of the main goals of Genome is to keep teams from having new versions of the tooling "forced" on them at inconvenient times.

The suggested upgrade path for both Major and Minor releases is to build a parallel environment to test the upgrade and then migrate a team to it as they are ready. genome-sync and cobbler --replicate can be used to create a parallel Genome appliance from which any machine types can be tested. DNS can also be used to make this transition seamless for users as well as provide a quick backout plan if needed.
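A rough sketch of building such a parallel appliance, assuming hypothetical hostnames (genome-sync must run as the content owner, and exact cobbler replicate flags may vary between cobbler versions):

# On the new Genome Appliance, pull all git content from the existing appliance
genome-sync start quiet --repo=old-repo.example.com
genome-sync save

# Replicate cobbler distros and profiles from the existing appliance
cobbler replicate --master=old-repo.example.com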

A.3. Managing releases with the Genome tooling

One of the challenges of working on large teams is simply keeping track of all the various forms of content that make up a project. While teams have traditionally used some sort of Source Code Management tool such as Subversion or Git, the same discipline also applies to configuration artifacts and binary dependencies.

For this reason, projects making use of the Genome tooling have the ability to track all content via git repositories. Detailed below is a process that handles bundling the state of several puppet modules, RPM dependencies and source code into one deliverable that can be tracked throughout the project lifecycle.

A.3.1. The "release" repository

The release git repository is basically just a superproject which can contain any number of submodules. This allows project dependencies to be woven together as needed.

A.3.2. Creating a superproject

A superproject is really just a normal git repository for tracking the states of other repositories.

# Create a new git repository
mkdir release
cd release
git init

Once the repository has been created submodules can be added.

# Add the submodule
git submodule add [url] [path/in/superproject/to/be/created]


At this point a new file will have been added called .gitmodules. This is the submodule configuration file. Another "file" that is created is a pseudofile that corresponds to the path created when the submodule was added. Both of these should be committed.
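For example:

# Record the new submodule in the superproject
git add .gitmodules path/in/superproject/to/be/created
git commit -m "Add submodule"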

Important

The url used when adding the submodule can be relative. This is often more desirable than hard coding the path to a specific repository. The main reason is that the content referenced by a particular release repository should actually exist in the Repo Appliance. This is a best practice that allows Repo Appliance state to be backed up with the guarantee that a project's state can be rebuilt and the machines involved can be provisioned.

Note

See the git-submodule manpage for more information.

A.3.3. A word on pushing superprojects

Typically only metadata is stored in the release superproject. For this reason copying release deliverables from one Repo Appliance to another is not as simple as using git push on only the release repository. If relative submodule paths are used (and they should be) the state referenced in all submodules must exist on a given Repo Appliance. Luckily, this is quite easy to do with genome-sync.

A.3.4. Branching strategy

Complexity, risk, and external factors all play a large role in how a particular project decides to branch. Conventions go a long way toward simplifying this process and can make projects move smoothly from development to production.

In a nutshell, the conventions are:

  • If a project is named foo then there will be a branch called foo on all git repositories touched by that project.

  • Branches that match the project name are considered to be stable and "on their way to production".

  • Using the release superproject is simply a matter of wiring up the branches for a particular project into one branch, which also bears the name of the project.

    In practice what this equates to is, after adding the submodules to a superproject, going into the submodule's directory and getting the working directory to match the desired state. If the project branch naming conventions are being followed the content can simply be fetched and then checked out.

    If the fetch/checkout process results in a change, git status at the root of the superproject will reflect the change. The changes can then be committed (on the superproject branch that corresponds to the project name).
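A sketch of that workflow for a hypothetical project named foo:

# Inside the superproject, bring a submodule to the project branch
cd path/to/submodule
git fetch origin
git checkout foo

# Back at the superproject root, record the new submodule state
cd ../..   # adjust as needed to return to the superproject root
git checkout foo
git commit -am "Update submodule state for project foo"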

Note

These conventions only need to be followed by the people who are "interfaces" between teams. The use of Repo Appliances can also aid the branching strategy in that it allows each group to determine what works best for them. For example, development and release engineering (RE) teams have different goals when it comes to managing a codebase. In development a team will be more concerned with how to balance the bugfix and feature streams of a project while RE will focus more on how moving these changes through the various environments affects other projects.

A.3.5. What about the master branch?

For most git repositories it really isn't even needed and only adds to confusion, since there is no consensus as to how branches like trunk and master should be used. The main exception with the Genome tooling is the case of the puppet module repositories. The hook that checks out the module code and puts it on the modulepath needs to know the name of a particular branch to work with. That branch is the master branch.

The normal workflow for a puppet module is to test changes on the master branch and then push changes to the project branch when they are baked.
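A sketch of that flow for a puppet module, again using a hypothetical project named foo:

# Develop and test changes on master (the branch the checkout hook uses)
git checkout master
# ... edit, commit, test ...

# Once the changes are baked, publish them to the project branch
git checkout foo
git merge master
git push origin foo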

Important

This process can be followed regardless of where in the lifecycle the change occurs. Development can test their changes, push to their project branch and then QA can push the project branch into their master. Once through QA, the code can again be pushed to a project branch where RE can take over.

Glossary

cloud appliance

A server appliance that controls a number of cloud members as func minions.

A Cloud Appliance is simply a prepackaged cloud master.

See Also cloud master.

cloud master

A server that controls a number of cloud members as func minions.

cloudmasterd

A service running on a cloud master that provides the ability to control one or more cloud members .

cloud member

A server that can host virtual machines and that is controlled by a cloud master . Cloud members are controlled through the use of func. When a cloud member is added to a cloud, it is added to the cloud master as a func minion. This allows the cloud master to take control of certain functions on the cloud member. For the purposes of Genome, this means taking control of the ability to koan new virtual machines on the cloud member.

In order for a server to become a viable cloud member, it must have been kickstarted with an appropriate cobbler profile for cloud machines. This ensures that the cloud member has the correct virtual machine hosting capabilities and storage facilities.

genome appliance

A server appliance that serves as the central controlling unit in the Genome framework.