(Post 09/12/2005)
In computer science, distributed
computing studies the coordinated
use of physically distributed computers.
As stated by Andrew S. Tanenbaum,
"Distributed systems need radically different software than centralized
systems do."
Goal
There are many different types of distributed computing systems, and there
are many challenges to overcome in designing one successfully. The main goal
of a distributed computing system is to connect users and resources in a
transparent, open, and scalable way. Ideally, such an arrangement is
drastically more fault-tolerant and more powerful than a comparable
collection of stand-alone computer systems.
Today Web Services provide the
standard protocols for connecting distributed systems.
Examples
An example of a distributed system is the World Wide Web. As you read a
web page, you are actually using the distributed system that the site is
part of. As you browse the web, your web browser running on your own
computer communicates with different web servers that provide web pages.
Possibly, your browser uses a proxy server to access the web content stored
on web servers faster and more securely. To find these servers, it also
uses the distributed domain name system. Your web browser communicates with
all of these servers over the Internet, via a system of routers which are
themselves part of a large distributed system.
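As a rough illustration of these interactions, the sketch below reduces the
browser's job to a DNS lookup followed by a single HTTP request; example.org
stands in for any web site, and a real browser of course does far more.

    import socket
    from urllib.request import urlopen

    host = "example.org"
    # The distributed domain name system maps the name to an address.
    address = socket.gethostbyname(host)
    print(host, "resolves to", address)

    # The browser's core transaction: ask a web server for a page.
    with urlopen("http://" + host + "/") as page:
        body = page.read()
        status = page.status
    print("server answered with status", status, "and", len(body), "bytes")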
Openness
Openness is the property of distributed systems such
that each subsystem is continually open to interaction with other systems
(see references). Web Services
protocols are standards which enable distributed systems to be extended
and scaled. In general, an open system that scales has an advantage over
a perfectly closed and self-contained system.
Open distributed systems must therefore meet the following challenges:
- monotonicity: Once something is published in an open distributed
system, it cannot be taken back.
- pluralism: Different subsystems of an open distributed system
include heterogeneous, overlapping and possibly conflicting information.
There is no central arbiter of truth in open distributed systems.
- unbounded nondeterminism: Asynchronously, different subsystems can come
up and go down, and communication links between them can come in and go
out. Therefore, the time it will take to complete an operation cannot be
bounded in advance (see unbounded nondeterminism, and the sketch after
this list).
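Unbounded nondeterminism is the hardest of these properties to picture, so
here is a minimal sketch of it, assuming a hypothetical subsystem reachable
at some host and port: the caller retries for as long as it takes, because
no bound on the operation's completion time can be given in advance.

    import socket
    import time

    def call_until_answered(host, port, request):
        """Keep trying a remote subsystem that may be down arbitrarily long."""
        delay = 1.0
        while True:  # no a priori bound on how many attempts are needed
            try:
                with socket.create_connection((host, port), timeout=5) as conn:
                    conn.sendall(request)
                    return conn.recv(4096)  # completes eventually, if ever
            except OSError:
                time.sleep(delay)  # subsystem down or link out; retry later
                delay = min(delay * 2, 60)  # back off, but never give up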
Scalability
A scalable system is one that can easily be altered to accommodate changes
in the number of users, resources, and computing entities assigned to it.
Scalability can be measured along three different dimensions:
- Load scalability — A distributed system should make it easy for
us to expand and contract its resource pool to accommodate heavier
or lighter loads (see the sketch below).
- Geographic scalability — A geographically scalable system is one
that maintains its usefulness and usability, regardless of how far
apart its users or resources are.
- Administrative scalability — No matter how many different organizations
need to share a single distributed system, it should still be easy
to use and manage.
Some loss of performance may occur in a system that allows
itself to scale in one or more of these dimensions.
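As a toy illustration of load scalability (referenced in the list above),
the sketch below keeps a pool of worker threads whose size can be expanded
or contracted at run time; the class is an invention for this example, not
a standard API.

    import queue
    import threading

    class ElasticPool:
        """A task pool that can grow or shrink to match the offered load."""

        def __init__(self):
            self.tasks = queue.Queue()
            self.size = 0

        def _worker(self):
            while True:
                task = self.tasks.get()
                if task is None:  # poison pill retires one worker
                    return
                task()  # run the submitted callable

        def submit(self, task):
            self.tasks.put(task)

        def resize(self, n):
            """Expand (heavier load) or contract (lighter load) the pool."""
            while self.size < n:
                threading.Thread(target=self._worker, daemon=True).start()
                self.size += 1
            while self.size > n:
                self.tasks.put(None)  # whichever worker takes this exits
                self.size -= 1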
Multiprocessor systems
A multiprocessor system
is simply a computer that has more than one CPU on its motherboard. If
the operating system is built to take advantage of this, it can run different
processes on different
CPUs, or different threads belonging to the same process.
Over the years, many different multiprocessing options have been explored
for use in distributed computing. Operating systems such as Linux already
have built-in support for this. Intel CPUs employ a technology called
Hyper-Threading that allows more than one thread (usually two) to run on
the same CPU. The recent Sun UltraSPARC T1, Athlon 64 X2, and Intel
Pentium D processors feature multiple processor cores, which also increase
the number of concurrent threads they can run.
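A short sketch of the idea, using Python's multiprocessing module: the
program starts one worker process per CPU and lets the operating system
place them on different processors; the workload (summing a range of
integers) is just a stand-in for real work.

    from multiprocessing import Pool, cpu_count

    def partial_sum(bounds):
        lo, hi = bounds
        return sum(range(lo, hi))

    if __name__ == "__main__":
        n = 10_000_000
        cores = cpu_count()
        step = n // cores
        # Split the range into one chunk per CPU.
        chunks = [(i * step, n if i == cores - 1 else (i + 1) * step)
                  for i in range(cores)]
        with Pool(cores) as pool:  # one worker process per CPU
            print(sum(pool.map(partial_sum, chunks)))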
Multicomputer systems
A multicomputer system is a system made up of several
independent computers interconnected by a telecommunications network.
Multicomputer systems can be homogeneous or heterogeneous:
A homogeneous distributed system is one where all CPUs are similar and are
connected by a single type of network. Homogeneous systems are often used
for parallel computing, a kind of distributed computing in which every
computer works on a different part of a single problem (see the sketch
below).
In contrast, a heterogeneous distributed system is one that can be made up
of all sorts of different computers, possibly with vastly differing memory
sizes, processing power, and even basic underlying architecture.
Heterogeneous systems are in widespread use today; many companies adopt
this architecture because of the speed at which hardware becomes obsolete
and the cost of upgrading a whole system at once.
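To make the homogeneous case concrete, here is a minimal sketch of one
computer farming parts of a single problem out to others over TCP; the
worker hostnames, the port, and the line-based protocol are all assumptions
made up for this example.

    import socket

    WORKERS = [("worker1.example.org", 9000),  # hypothetical machines
               ("worker2.example.org", 9000)]

    def serve(port=9000):
        """Worker side: receive "lo hi", reply with the partial sum."""
        with socket.create_server(("", port)) as srv:
            while True:
                conn, _ = srv.accept()
                with conn:
                    lo, hi = map(int, conn.makefile().readline().split())
                    conn.sendall(str(sum(range(lo, hi))).encode() + b"\n")

    def distributed_sum(n):
        """Coordinator side: give each worker one part of the problem.
        (A real system would contact the workers concurrently.)"""
        step = n // len(WORKERS)
        total = 0
        for i, (host, port) in enumerate(WORKERS):
            lo = i * step
            hi = n if i == len(WORKERS) - 1 else lo + step
            with socket.create_connection((host, port)) as conn:
                conn.sendall(f"{lo} {hi}\n".encode())
                total += int(conn.makefile().readline())
        return total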
Architecture
Various hardware and software architectures are used for distributed
computing. At a lower level, it is necessary to interconnect multiple CPUs
with some sort of network, whether that network is printed onto a circuit
board or made up of several loosely coupled devices and cables. At a higher
level, it is necessary to interconnect processes running on those CPUs with
some sort of communication system.
- Client-server — Smart client code contacts the server for data, then
formats and displays it to the user. Input at the client is committed
back to the server when it represents a permanent change (see the
sketch after this list).
- 3-tier architecture
— Three tier systems move the client intelligence to a middle tier
so that stateless clients can be used. This simplifies application
deployment. Most web applications are 3-Tier.
- N-tier architecture — N-tier typically refers to web applications
that forward their requests on to other enterprise services. This type
of application is the one most responsible for the success of
application servers.
- Tightly coupled (clustered) — typically refers to a set of highly
integrated machines that run the same process in parallel, subdividing
the task into parts that each machine computes individually, then
combining the partial results into the final output.
- Peer-to-peer — an architecture
where there is no special machine or machines that provide a service
or manage the network resources. Instead all responsibilities are
uniformly divided among all machines, known as peers.
- Service oriented — where the system is organized as a set of highly
reusable services offered through standardized interfaces.
- Mobile code — based on the architectural principle of moving
processing as close as possible to the source of the data.
- Replicated repository — where the repository is replicated across the
distributed system to support both online and offline processing,
provided the resulting lag in data updates is acceptable.
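A minimal client-server sketch (as promised in the first list item), using
only Python's standard library; the port, the path, and the payload are
arbitrary choices for this example.

    import threading
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DataHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The server owns the data; clients fetch and present it.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"42")

    server = HTTPServer(("localhost", 8080), DataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # The "smart client": fetch the data, then format it for the user.
    raw = urllib.request.urlopen("http://localhost:8080/data").read()
    print("The server's value, nicely formatted:", raw.decode())
    server.shutdown()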
Concurrency
Distributed computing implements a kind of concurrency: computations
proceed simultaneously on separate machines and coordinate through
communication.
Computing taxonomies
The types of distributed computers are based on Flynn's taxonomy of
systems: single instruction, single data (SISD); multiple instruction,
single data (MISD); single instruction, multiple data (SIMD); and multiple
instruction, multiple data (MIMD). Other taxonomies and architectures are
covered under computer architecture.
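As a loose illustration of two of these classes, the sketch below uses
NumPy as a stand-in: a scalar loop processes one datum per instruction in
the SISD style, while a vectorized operation applies one instruction to
many data elements at once, in the SIMD style.

    import numpy as np

    data = np.arange(8)

    # SISD-style: one add is issued per element, in sequence.
    sisd = [int(x) + 1 for x in data]

    # SIMD-style: a single vector add covers all eight elements at once.
    simd = data + 1

    assert list(simd) == sisd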
Computer clusters
A cluster is multiple stand-alone machines acting in
parallel across a local high speed network. Distributed computing differs
from cluster computing in
that computers in a distributed computing environment are typically not
exclusively running "group" tasks, whereas clustered computers are usually
much more tightly coupled. The difference makes distributed computing
attractive because, when properly configured, it can use computational
resources that would otherwise go unused. It can also make available
computing power on a scale that would otherwise be unattainable.
The Second Life grid is a heterogeneous
multicomputer and so are most Beowulf clusters.
Grid computing
A grid uses the resources of many separate computers connected by a
network (usually the Internet) to solve large-scale computational problems.
Most grids use idle time on many thousands of computers throughout the
world. Such arrangements permit handling of data that would otherwise
require the power of expensive supercomputers or be impossible to analyze
at all.
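The volunteer-computing pattern behind many grids can be sketched as a loop
that fetches a work unit, computes during idle time, and reports the result;
the project URL, the JSON fields, and the stand-in computation here are all
hypothetical.

    import json
    import time
    import urllib.request

    PROJECT = "http://project.example.org"  # hypothetical project server

    def volunteer_loop():
        while True:
            # Ask the project for a work unit, e.g. {"id": 7, "numbers": [...]}.
            with urllib.request.urlopen(PROJECT + "/work") as resp:
                task = json.load(resp)
            result = sum(task["numbers"])  # stand-in for the real computation
            # Report the result back to the project server.
            body = json.dumps({"id": task["id"], "result": result}).encode()
            req = urllib.request.Request(PROJECT + "/result", data=body,
                                         headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)
            time.sleep(60)  # stay idle for a while before asking again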
Distributed computing projects also often involve competition with other
distributed systems. This competition may be for prestige, or it may be a
means of enticing users to donate processing power to a specific project.
For example, stat races measure how much work a project has been able to
compute over the past day or week. This has been found to be so important
in practice that virtually all distributed computing projects offer online
statistical analyses of their performance, updated at least daily if not
in real time.
See List
of distributed computing projects for more information on specific
projects.
References
- William Kornfeld and Carl Hewitt. The Scientific Community Metaphor.
MIT AI Memo 641, January 1981.
- Carl Hewitt and Peter de Jong. Analyzing the Roles of Descriptions and
Actions in Open Systems. Proceedings of the National Conference on
Artificial Intelligence, August 1983.
- Carl Hewitt. The Challenge of Open Systems. Byte Magazine, April 1985.
- Carl Hewitt. Towards Open Information Systems Semantics. Proceedings of
the 10th International Workshop on Distributed Artificial Intelligence,
October 23-27, 1990, Bandera, Texas.
- Carl Hewitt. Open Information Systems Semantics. Journal of Artificial
Intelligence, January 1991.
People who have contributed to distributed computing research
Foundations and Principles
Gul Agha, Henry Baker, James Aspnes, Hagit Attiya, Will Clinger, Danny
Dolev, Shlomi Dolev, Michael J. Fischer, Vassos Hadzilacos, Carl Hewitt,
Leslie Lamport, Nancy Lynch, Michael Merritt, Paul Spirakis, Michel Raynal,
Sam Toueg, Aki Yonezawa
Systems
Ken Birman, Frans Kaashoek, Barbara Liskov, Andrew Tanenbaum
(Source: Wikipedia)