(Post 22/08/2006) OptimalGrid is distributed
computing middleware designed to support parallel computation on any Grid,
cluster, or intranet group of computers. This tutorial gives you a
practical introduction to OptimalGrid and shows how to design a
Grid solution to your own development problems.
Section 1. Before you start
About this tutorial
The goal of this tutorial is to give you a practical
introduction to OptimalGrid. Grid computing offers immense potential to
industry, science, and individual users, but you need an easy way to harness
that potential. OptimalGrid is an attempt to address this challenge.
OptimalGrid provides a Grid-enabled application framework
you can use for rapid and easy development of Grid applications. It's
working prototype middleware from IBM Almaden Research Center, and it's
used by applications that require large-scale computation on a Grid. OptimalGrid
hides the complexity of the underlying Grid infrastructure from the application
developer. It also provides essential autonomic features such as self-configuration,
self-healing, and self-optimization, so you don't have to invent and implement these
features yourself. The result is that you can rapidly build and deploy
applications on a Grid.
After completing the tutorial, you should:
- Understand what types of problems OptimalGrid is designed to solve
- Understand the basics of how OptimalGrid implements the solution
to these types of problems
- Have a working installation of the OptimalGrid software
- Be able to write the Java™ code to implement the solution to a specific
problem
Should I take this tutorial?
Are you an application developer with some Java programming
experience? And do you have a problem that requires more computing power
than can be utilized on a single processor? If so, then you should complete
the initial chapters that describe OptimalGrid and the problems that it
is well equipped to handle. If you determine that OptimalGrid
and your problem are a good match, then you should complete the tutorial
and learn how to use OptimalGrid to quickly implement a solution
that runs on a Grid network of computers.
Prerequisites
To complete this tutorial, you need a working knowledge
of Java technology.
Minimum platform requirements for OptimalGrid
- One or more computers (750-MHz processor)
- Java runtime 1.3 or higher
- TSpaces (included in distribution)
- 10-Mbit Ethernet to each processor
- Storage requirements are application-dependent
- Any operating system that supports Java code; tested platforms include
Linux® and Windows®.
Recommended configuration
- Linux
- Cluster of more than one machine (1-GHz processor)
- Java runtime 1.3 or higher with Java 3D extensions installed
- TSpaces (included in distribution)
- 100-Mbit Ethernet to each processor
- Storage requirements are application-dependent
Section 2. Introduction to OptimalGrid
What is OptimalGrid?
OptimalGrid -- a research prototype from IBM Almaden
Research Center -- is middleware that aims to simplify creating and managing
large-scale, connected, parallel grid applications. The OptimalGrid system
is pure Java code and runs on any operating system or collection of operating
systems that support Java technology. OptimalGrid is designed to support
parallel applications that require ongoing communication between cluster
processors or nodes.
The purpose of OptimalGrid is to optimize the performance
of a distributed Grid system given whatever resources and infrastructure
are actually available. It is in this sense that the system is "optimal."
For a high-level overview of the OptimalGrid components
and architecture, see "OptimalGrid -- Autonomic computing on the grid."
Section 3. Solving a problem using the OptimalGrid object model
What's in this section?
This section describes the OptimalGrid object model and
how it is used to describe a typical cellular automaton, finite element
problem, or other application where computational progress depends on
sharing information between nodes.
The next section shows how you would implement the Java
code to solve one of these typical problems.
OptimalGrid object model
All Finite Element Model (FEM) problems are solved numerically
by partitioning space into small finite regions or elements, where "small"
is typically defined by the smallest natural scale in the problem. Figure
1 shows a continuous solid object being turned into a discrete set of
nodes, each with specific properties.
Original Problem Cell (OPC)
To describe the OptimalGrid system's approach, it's easiest
to consider a simple two-dimensional problem, as shown in Figure 2.
Here an element A is connected to its four closest neighbors,
and each neighbor in turn is also connected to three elements plus A.
Edge elements connect to either one, two, or three elements depending
on their location. We call the smallest piece of a problem (in this case
a single element) an Original Problem Cell or OPC. In the abstract, this
is a node on the application graph that contains data, methods, and pointers
to other OPCs.
The OptimalGrid object model implements code to solve
a problem using abstract OPCs. The user implements a small set of methods
defined in an OPC Abstract Class that are unique to the particular problem
being solved. These methods describe the connectivity of the cell with
its neighbors, and they specify the calculations to be performed by the
cell using the information communicated by its neighbors.
A single OPC object is very small, requiring very little
memory to hold and little computational power to execute. An OPC can contain
0 or more entities that may flow from one OPC to neighboring OPCs and
0 or more Properties that describe properties of the OPC or its entities.
Definition: An OPC is the smallest unit
of work in the OptimalGrid system. Your problem solution is implemented
by writing the Java code to implement the behavior of the OPC and its
interaction with its neighbors.
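To make the OPC idea concrete, here is a minimal sketch of an abstract cell and one subclass, under stated assumptions. The names `SketchOpc`, `AveragingCell`, and `OpcGridSketch` are hypothetical and are not the OptimalGrid API, and the averaging rule is only a placeholder calculation. The sketch uses generics, so it assumes a newer Java runtime than the 1.3 minimum listed earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: illustrative only, not the OptimalGrid OPC class.
abstract class SketchOpc {
    private final List<SketchOpc> neighbors = new ArrayList<SketchOpc>();
    protected double value;     // data held by this cell
    private double nextValue;   // staged result, applied after every cell has computed

    SketchOpc(double initialValue) { this.value = initialValue; }

    // Connectivity: record a pointer to a neighboring cell.
    void connect(SketchOpc other) { neighbors.add(other); }
    List<SketchOpc> neighbors() { return neighbors; }

    // Problem-specific calculation that uses the neighbors' information.
    abstract double compute(List<SketchOpc> neighbors);

    void stage()  { nextValue = compute(neighbors); }
    void commit() { value = nextValue; }
}

// Placeholder problem: each cell takes the mean of its neighbors' values.
class AveragingCell extends SketchOpc {
    AveragingCell(double v) { super(v); }

    @Override
    double compute(List<SketchOpc> neighbors) {
        if (neighbors.isEmpty()) return value;
        double sum = 0.0;
        for (SketchOpc n : neighbors) sum += n.value;
        return sum / neighbors.size();
    }
}

class OpcGridSketch {
    // Build a rows x cols mesh in which each cell links to its closest neighbors.
    static AveragingCell[][] buildMesh(int rows, int cols) {
        AveragingCell[][] g = new AveragingCell[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                g[r][c] = new AveragingCell(0.0);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (r > 0)        g[r][c].connect(g[r - 1][c]);
                if (r < rows - 1) g[r][c].connect(g[r + 1][c]);
                if (c > 0)        g[r][c].connect(g[r][c - 1]);
                if (c < cols - 1) g[r][c].connect(g[r][c + 1]);
            }
        }
        return g;
    }

    // One iteration: every cell computes from the old values, then all commit.
    static void step(AveragingCell[][] g) {
        for (AveragingCell[] row : g) for (AveragingCell cell : row) cell.stage();
        for (AveragingCell[] row : g) for (AveragingCell cell : row) cell.commit();
    }

    public static void main(String[] args) {
        AveragingCell[][] mesh = buildMesh(3, 3);
        mesh[1][1].value = 4.0;
        step(mesh);
        System.out.println("top edge cell after one step: " + mesh[0][1].value);
    }
}
```

In the 3 x 3 mesh, the center cell has four neighbors, an edge cell three, and a corner cell two. The top edge cell has three neighbors, one of them the center, so after one step its value is 4.0 / 3. Staging and committing in two passes keeps every cell reading the previous iteration's values, which is one simple way to make the result independent of the order in which cells are visited.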
OPC collections
The OptimalGrid system aggregates sets of OPCs that are
connected to one another to form an OPC collection. Once created, OPC
collections are fixed in size for the lifetime of the problem.
Definition: An OPCCollection is a set
of OPC objects that are connected together. The set of OPCs contained
in an OPCCollection is assigned at problem initialization and remains
fixed.
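The fixed-membership property can be sketched as follows. This is an illustration only: `SketchOpcCollection` and `CollectionBuilderSketch` are hypothetical names, OPCs are represented by integer ids, and tiling a square mesh into equal blocks is an assumed grouping rule, not the partitioning OptimalGrid actually performs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an OPC collection; not the OptimalGrid class.
final class SketchOpcCollection {
    private final List<Integer> opcIds;   // ids of the member OPCs

    SketchOpcCollection(List<Integer> ids) {
        // Defensive copy plus unmodifiable view: membership is fixed once created.
        this.opcIds = Collections.unmodifiableList(new ArrayList<Integer>(ids));
    }

    int size() { return opcIds.size(); }
    List<Integer> opcIds() { return opcIds; }
}

class CollectionBuilderSketch {
    // Tile a width x height mesh into square blocks of side `block`.
    // Each block is a set of mutually connected cells. Assumes `block`
    // divides both dimensions evenly.
    static List<SketchOpcCollection> tile(int width, int height, int block) {
        if (block <= 0 || width % block != 0 || height % block != 0) {
            throw new IllegalArgumentException("block must divide width and height");
        }
        List<SketchOpcCollection> out = new ArrayList<SketchOpcCollection>();
        for (int by = 0; by < height; by += block) {
            for (int bx = 0; bx < width; bx += block) {
                List<Integer> ids = new ArrayList<Integer>();
                for (int y = by; y < by + block; y++)
                    for (int x = bx; x < bx + block; x++)
                        ids.add(y * width + x);   // row-major cell id
                out.add(new SketchOpcCollection(ids));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SketchOpcCollection> cs = tile(100, 100, 50);
        System.out.println(cs.size() + " collections of " + cs.get(0).size() + " OPCs");
    }
}
```

A 100 x 100 mesh tiled with 50 x 50 blocks gives 4 collections of 2,500 OPCs each. Because the id list is wrapped in an unmodifiable view, any later attempt to add or remove an OPC throws `UnsupportedOperationException`.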
Variable Problem Partition (VPP)
The piece of the problem that contains a set of OPC Collections
is defined as a Variable Problem Partition (VPP). This is illustrated
in Figure 3. A VPP is the unit of work that is distributed to a compute
node. Load balancing is accomplished by
exchanging OPC Collections between VPPs.
Definition: A Variable Problem Partition
or VPP is the set of OPC Collections assigned to a Grid compute node.
The number of OPC Collections contained in a VPP is variable.
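The following sketch shows what exchanging collections between VPPs could look like. The tutorial says only that load balancing moves OPC Collections between VPPs; the class names `SketchVpp` and `RebalanceSketch`, the use of OPC count as the load measure, and the greedy policy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical VPP: a variable list of collections, each reduced to its OPC count.
class SketchVpp {
    final String node;
    private final List<Integer> collectionSizes = new ArrayList<Integer>();

    SketchVpp(String node) { this.node = node; }

    void add(int opcCount) { collectionSizes.add(opcCount); }
    int collectionCount() { return collectionSizes.size(); }

    int opcCount() {
        int total = 0;
        for (int s : collectionSizes) total += s;
        return total;
    }

    int peekLast() { return collectionSizes.get(collectionSizes.size() - 1); }
    int removeLast() { return collectionSizes.remove(collectionSizes.size() - 1); }
}

class RebalanceSketch {
    // Assumed greedy policy: move one collection from the most loaded VPP to
    // the least loaded one, but only if that narrows the gap between them.
    static boolean moveOne(List<SketchVpp> vpps) {
        SketchVpp max = vpps.get(0);
        SketchVpp min = vpps.get(0);
        for (SketchVpp v : vpps) {
            if (v.opcCount() > max.opcCount()) max = v;
            if (v.opcCount() < min.opcCount()) min = v;
        }
        if (max == min || max.collectionCount() == 0) return false;
        int size = max.peekLast();
        // Moving `size` OPCs changes the gap to |gap - 2 * size|, which is
        // smaller than the current gap only when gap > size.
        if (max.opcCount() - min.opcCount() <= size) return false;
        min.add(max.removeLast());
        return true;
    }

    public static void main(String[] args) {
        SketchVpp a = new SketchVpp("nodeA");
        SketchVpp b = new SketchVpp("nodeB");
        for (int i = 0; i < 20; i++) a.add(2500);
        for (int i = 0; i < 12; i++) b.add(2500);
        List<SketchVpp> vpps = new ArrayList<SketchVpp>();
        vpps.add(a);
        vpps.add(b);
        int moves = 0;
        while (moveOne(vpps)) moves++;
        System.out.println(moves + " moves: " + a.opcCount() + " / " + b.opcCount());
    }
}
```

Starting from 20 and 12 collections of 2,500 OPCs, four moves leave both VPPs with 16 collections (40,000 OPCs each). The collection, not the individual OPC, is the unit that moves, which matches the definitions above.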
The OptimalGrid architecture
Figure 4 shows the OptimalGrid architecture.
The main coordinator component of the OptimalGrid system
is the Autonomic Program Manager (APM). The APM does a number of things:
- Manages the compute agents and the pieces of the problem
- Can invoke a problem builder (the component that creates the initial
problem) if the user doesn't do it manually from the console
- Assigns the initial distribution of the problem given all of the
information maintained on problems, compute agents, and the general
computing environment
- Invokes the various pluggable Rule Engines that track the events
of the OptimalGrid system and store the lessons learned for optimizing
the problem computation
Given an initial problem, such as the abstract FEM problem,
OptimalGrid automatically partitions the problem into OPC collections,
based on the problem complexity. Then, those OPC collections are grouped
into variable problem partitions, based on the available compute resources.
Suppose that, in this particular example, the FEM problem was composed
of 2 million two-dimensional elements (OPCs), using a simple square mesh.
For a given complexity, we might turn that into 800 collections of 2,500
OPCs each, with an average 16-collection VPP holding about 40,000 OPCs.
With an even distribution, 50 compute agents would each get a 40,000-OPC VPP
(with the unit of change/modification being one 2,500-OPC collection). Faster
networks (or shorter communications latency) might allow more compute
agents with smaller VPPs, while larger compute agent memories and a slower
network might require fewer, larger VPPs.
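The arithmetic in this example can be checked with a few lines of Java. This is a sketch under assumptions: `PartitionArithmeticSketch` and its methods are hypothetical helpers, and the even split shown here is only the starting point the example describes, not the policy the APM applies.

```java
// Hypothetical helper that reproduces the numbers in the example above.
class PartitionArithmeticSketch {
    // Number of collections needed to hold all OPCs (rounded up).
    static int collectionCount(int totalOpcs, int opcsPerCollection) {
        return (totalOpcs + opcsPerCollection - 1) / opcsPerCollection;
    }

    // Spread collections over agents as evenly as possible; when the split
    // is not exact, the first (collections % agents) agents get one extra.
    static int[] distribute(int collections, int agents) {
        int[] perAgent = new int[agents];
        int base = collections / agents;
        int extra = collections % agents;
        for (int i = 0; i < agents; i++) {
            perAgent[i] = base + (i < extra ? 1 : 0);
        }
        return perAgent;
    }

    public static void main(String[] args) {
        int collections = collectionCount(2000000, 2500);   // 800
        int[] vpps = distribute(collections, 50);            // 16 each
        System.out.println(collections + " collections, "
                + vpps[0] + " per VPP, "
                + (vpps[0] * 2500) + " OPCs per VPP");
    }
}
```

With 2,000,000 OPCs and 2,500 OPCs per collection, the result is 800 collections, and 50 agents each receive 16 collections, or 40,000 OPCs. If only 48 agents were available, the same 800 collections would not divide evenly: 32 agents would hold 17 collections and 16 agents would hold 16.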
(Copyright IBM Corporation)