Tuesday, May 19, 2009

Approaching a Cloud Computing Model with SOA and Virtualization

‘Cloud computing’ is getting a lot of press these days, and it is attractive to many in industry, especially to IT organizations. Because of the globally virtual nature of the cloud, some of the obvious values that cloud computing purportedly provides are not readily accessible to customers with more stringent security and privacy concerns, such as those within the Federal Government. This white paper addresses the impact of the cloud computing concept on those portions of the Federal Government that have already engaged in providing shared infrastructure services, such as those typically planned and provisioned for an enterprise SOA.

Cloud computing isn’t an entirely new concept. In fact, it could be considered an amalgamation of several computing patterns that have come into vogue and matured to become prevalent in the IT mainstream: grid, virtualization, clustering, and on-demand computing. Unless you’ve been under a rock over the last decade, you’ve undoubtedly been inundated with vendor-speak on these items, and you have likely taken a stab at leveraging them for the value they ostensibly provide. In these cases the likely cost benefit fell into two main categories: first, a smaller physical footprint (in size, cost, or both) for the requisite compute power, and second, the ability to manage many resources as one.

When looking at the recent adoption of these provisioning patterns, it is also important to understand the larger scope of what has been successful in the Federal Government to date. There have been shared backbones for supercomputing applications at NASA, Dept of Energy and DoD/DARPA for many years, as well as recently established grids such as the HHS/NCI National Cancer Grid, which reaches out to research institutions outside of the Federal Government. There has been an uptick in shared service Centers of Excellence for booking travel (GovTrip, Defense Travel Service and FedTraveler) and for Human Resources/Payroll (Dept of Interior National Business Center, USDA National Finance Center). Enterprise security has also begun to take hold, for example Dept of Defense Net-Centric Enterprise Services (NCES), which provides a common access method to all facilities and systems in the form of a CAC (Common Access Card).

As mentioned previously, the technology patterns that constitute a ‘cloud’ are mature and in use in many places today. Clustering has been around in midrange servers (Unix, Linux, Windows) for a while and has even become part of the base operating system, although third parties such as Veritas still exist with some compelling value-adds. Generally, a cluster is a communicating group of computers that appears as a single entity to the outside world and mostly offers load balancing of processes as well as availability in the case of failure. Another mainstream pattern for distributing compute power across a set of associated nodes is the grid. Grids take clustering a step further: they have a way to digest a workload and decompose it so that it runs in parallel across grouped resources. An example of mainstream grid processing is Oracle’s 11g database, where the ‘g’ is for Grid.
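The decompose-and-run-in-parallel idea behind a grid can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a local thread pool stands in for the grid's nodes, and a real grid scheduler would also handle data placement, node failure and retries.

```python
from concurrent.futures import ThreadPoolExecutor


def split_workload(items, parts):
    """Decompose a list into at most `parts` contiguous chunks of near-equal size."""
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_on_grid(items, worker, nodes=4):
    """Apply `worker` to each chunk in parallel; results come back in chunk order."""
    chunks = split_workload(items, nodes)
    # Hypothetical stand-in: each thread plays the role of one grid node.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return list(pool.map(worker, chunks))


def grid_sum(items, nodes=4):
    """Sum a list by summing each chunk on a 'node' and combining the partials."""
    return sum(run_on_grid(items, sum, nodes))
```

Calling `grid_sum(list(range(101)))` splits the numbers into four chunks, sums each chunk independently, and then combines the partial sums, which is the same shape of work a grid performs at much larger scale.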

The other piece of the cloud puzzle that has seen significant uptake is virtualization. Virtualization is the act of hosting many virtual servers on a single piece of physical infrastructure; those servers may or may not be members of the same cluster or grid. Both Oracle and IBM offer hypervisors capable of virtualizing Linux: Oracle VM, which is based on Xen, on Intel/AMD-based servers, and z/VM on IBM System z (formerly S/390) mainframes. Coupled with Oracle’s recently announced acquisition of Sun, this means that virtualizing Linux is likely to receive the same R&D attention that LPARs (IBM), vPars (HP) and Containers/Zones (Sun) have long received on the UNIX side, where they have provided such excellent manageability of large SMP servers.

Springing from the collection of these concepts are offerings like On Demand computing and Software, Platform or Infrastructure as a Service, which have recently come of age. Given the distributed nature of the resources employed to provide such services, even behind data center firewalls, SOA has been a large part of any successful foray into these offerings. The challenges in offering these kinds of services are the very items that cloud computing seeks to ameliorate. To date they have included provisioning compute resources ‘just in time’, scaling when needed under an agreed fee schedule, and defining the support model when platform, infrastructure or software is offered as a service. Some providers have been able to close these gaps by offering software developer tools at an appropriate abstraction layer, which lets them maintain control over the other layers in the infrastructure.

Perhaps because cloud offerings are still somewhat in their infancy, entire clouds have been unavailable for hours at a time, which would violate most SLAs for Federal Government systems. Beyond availability, the privacy and security required for Federal Government systems are really at the root of the challenge in offering a cloud computing model for them. The next step in realizing cloud computing capabilities is the ability to manage the total infrastructure, or fabric, that runs all of the components necessary to provide a cloud. The fabric could include a SAN and its controllers, 10 Gb Ethernet or InfiniBand, MPLS with QoS and latency requirements, VPNs, routers and firewalls, and blades with their operating systems and the application software deployed on them. To manage resources effectively in a cloud, all of these items must be defined to a level where they can be specified and provisioned at a moment’s notice, perhaps from a trigger in an actively managed infrastructure such as a CPU threshold being met.
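A threshold trigger of the kind mentioned above might look like the following Python sketch. Everything here is an assumption made for illustration: the `ScalingPolicy` name, the 80% threshold, the three-sample window and the node ceiling are hypothetical values, and the actual provisioning call is left out because it depends entirely on the fabric being managed.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    cpu_threshold: float = 0.80  # hypothetical: utilization above this counts as "hot"
    sustained_samples: int = 3   # hypothetical: consecutive hot samples required
    max_nodes: int = 8           # hypothetical ceiling set by purchasing or capacity


def should_provision(cpu_samples, policy):
    """True when the most recent samples all exceed the CPU threshold."""
    recent = cpu_samples[-policy.sustained_samples:]
    return (len(recent) == policy.sustained_samples
            and all(sample > policy.cpu_threshold for sample in recent))


def next_node_count(current_nodes, cpu_samples, policy):
    """Return the node count the fabric should move to: one more node, or unchanged."""
    if current_nodes < policy.max_nodes and should_provision(cpu_samples, policy):
        return current_nodes + 1
    return current_nodes
```

Requiring several consecutive samples above the threshold is a deliberate choice in this sketch: it keeps a momentary spike from triggering provisioning, at the cost of reacting a little later to sustained load.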

SOA applications have been reduced to a footprint that is easily configurable at deploy time through provisioning capabilities like the Open Virtualization Format and WebLogic Scripting Tool, and they can subsist on just a few form factors of blade server, so that a virtual ‘appliance’ can be created from a group of virtual machines. Once these variables have been identified and their values managed in lists (IP addresses and TCP ports of cluster controllers, SOA process definitions and connections, and so on), introducing compute power to the cluster becomes a somewhat trivial task. However, identifying the matrix of dependencies in aggregate is a non-trivial task, given that grids may contain clusters, clouds may sit on top of grids, and grids may sit on top of clouds. In the end, this distillation of the interface data is what allows for a telecommunications-like provisioning system for compute resources. Coupled with an effective blanket purchasing model that allows you to accurately forecast and stock these types of physical resources, or to provision them as services via partners, this puts you in a position to actively manage your private infrastructure the same way Amazon or EMC do.
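One way to make the dependency matrix tractable is to record, for each component, what it depends on and let a topological sort produce a safe provisioning order. The Python sketch below uses the standard library's `graphlib` module (Python 3.9 or later); the component names and the `provisioning_order` helper are hypothetical and are not taken from any particular product.

```python
from graphlib import CycleError, TopologicalSorter


def provisioning_order(depends_on):
    """Return components ordered so that each one comes after its dependencies.

    `depends_on` maps a component name to the names it depends on.
    Raises ValueError when the dependencies are circular.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # graphlib reports the offending cycle as the second element of args.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


# Hypothetical fabric: storage and network first, then blades, then the SOA tier.
fabric = {
    "san": [],
    "network": [],
    "blade_os": ["san", "network"],
    "app_server_cluster": ["blade_os"],
    "soa_processes": ["app_server_cluster"],
}
order = provisioning_order(fabric)
```

The cycle check matters here because layered arrangements such as grids on clouds on grids can easily produce circular definitions, and it is better to reject those at definition time than in the middle of provisioning.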

Many Federal Government agencies have invested in multiple data center locations for disaster recovery and have also pursued ambitious enterprise-level SOA projects. Coupling the cloud paradigms covered in this document with the understanding that even cloud computing and its provisioning can be offered through an SOA interface is enticing. Because annual budget outlays rise with every data center added, OMB and agency executives exert more pressure to use those data centers not simply as equally performing warm failover sites but in an active-active role that leverages the total investment. Cloud computing offers not only the ability to actively manage the entire infrastructure with methods that help achieve this goal, but also the capability to re-purpose compute power as needed. The possibilities include performing data warehouse aggregations in the evening hours, or even offering compute power as a dynamically shared service in a multi-tenant model to other agencies, whether they are looking for SOA or for more specialized grid computing applications.
