Cluster Computing
What Is Cluster Computing?
- Consists of many machines of the same or similar type.
  (Heterogeneous clusters are a subtype, still mostly experimental.)
- The machines are tightly coupled using dedicated network connections.
- All machines share resources such as a common home directory.
  (NFS can be a problem in very large clusters, so binaries and data must be pushed to scratch on each node.)
- The machines must trust each other so that rsh or ssh does not require a password; otherwise you would need to start the program manually on each machine.
- Software such as an MPI implementation must be installed to allow programs to run across all nodes.
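To make the passwordless-login requirement concrete, here is a minimal sketch of what a launcher has to do for each node. The host names, program path and the helper name build_launch_commands are hypothetical. The sketch only builds the ssh command lines; it does not run them. BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of prompting for a password, which is the behaviour a cluster launcher needs. In practice an MPI launcher usually takes care of this step for you.

```python
import shlex


def build_launch_commands(hosts, program, args=()):
    """Return one ssh command (as an argv list) per host.

    Hypothetical helper for illustration; real clusters normally rely on
    the launcher that ships with their MPI implementation.
    """
    # Quote each part so the remote shell sees the arguments unchanged.
    remote = " ".join(shlex.quote(part) for part in (program, *args))
    # BatchMode=yes: never prompt for a password, fail instead.
    return [["ssh", "-o", "BatchMode=yes", host, remote] for host in hosts]


if __name__ == "__main__":
    # Assumed node names and scratch path, not taken from any real cluster.
    commands = build_launch_commands(
        ["node01", "node02"], "/scratch/bin/solver", ["--steps", "100"]
    )
    for argv in commands:
        print(" ".join(argv))
```

If any node still asks for a password, the corresponding command fails immediately, which is easier to diagnose than a launcher that hangs waiting for input.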
Shared Disk Clusters
One approach to clustering
utilizes central I/O devices accessible to all computers within the cluster. We
call these systems shared-disk clusters as the I/O involved is
typically disk storage for normal files and/or databases. Shared-disk cluster
technologies include Oracle Parallel Server (OPS) and IBM's HACMP.
Shared-disk clusters rely
on a common I/O bus for disk access but do not require shared memory. Because
all nodes may concurrently write to or cache data from the central disks, a
synchronization mechanism must be used to preserve coherence of the system. An
independent piece of cluster software called the "distributed lock
manager" assumes this role.
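The role of the lock manager can be sketched in a few lines. This is a toy, single-process model and not how OPS or HACMP implement locking: threads stand in for cluster nodes, a dictionary stands in for the shared disk, and the class and function names are made up for illustration. It shows only the core idea, namely that a node must hold the lock for a block before it reads and rewrites that block.

```python
import threading


class ToyLockManager:
    """Hands out one lock per named resource (hypothetical, in-process only)."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._locks = {}

    def lock_for(self, resource):
        with self._guard:
            return self._locks.setdefault(resource, threading.Lock())


def node_worker(manager, disk, block, increments):
    """Simulated node: repeatedly read, modify and write one shared block."""
    for _ in range(increments):
        with manager.lock_for(block):    # acquire before touching the block
            disk[block] = disk[block] + 1


def run_nodes(node_count=4, increments=1000):
    manager = ToyLockManager()
    disk = {"block-0": 0}                # stand-in for the shared disk
    threads = [
        threading.Thread(target=node_worker,
                         args=(manager, disk, "block-0", increments))
        for _ in range(node_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return disk["block-0"]


if __name__ == "__main__":
    print(run_nodes())
```

Because every update happens while the lock is held, the final counter equals the number of nodes times the number of increments. The time spent acquiring and releasing the lock is the overhead that a real distributed lock manager adds to every shared access.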
Shared-disk clusters
support higher levels of system availability: if one node fails, other nodes
need not be affected. However, higher availability comes at a cost of somewhat
reduced performance in these systems because of overhead in using a lock
manager and the potential bottlenecks of shared hardware generally. Shared-disk
clusters make up for this shortcoming with relatively good scaling properties:
OPS and HACMP support eight-node systems, for example.
Shared Nothing Clusters
A second approach to
clustering is dubbed shared-nothing because it does not
involve concurrent disk accesses from multiple nodes. (In other words, these
clusters do not require a distributed lock manager.) Shared-nothing cluster
solutions include Microsoft Cluster Server (MSCS).
MSCS is an atypical
example of a shared-nothing cluster in several ways. MSCS clusters use a shared
SCSI connection between the nodes, which naturally leads some people to believe
this is a shared-disk solution. But only one server (the one that owns the
quorum resource) needs the disks at any given time, so no concurrent data
access occurs. MSCS clusters also typically include only two nodes, whereas
shared-nothing clusters in general can scale to hundreds of nodes.
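The single-owner idea can be illustrated with a small sketch. This is a simplified model under stated assumptions, not the MSCS API: the class OwnedDisk, the exception NotOwnerError and the node names are all hypothetical. Only the current owner may touch the disk, so no lock manager is needed, and failover amounts to handing ownership to another node.

```python
class NotOwnerError(Exception):
    """Raised when a node that does not own the disk tries to use it."""


class OwnedDisk:
    """A disk that exactly one node may access at a time (illustrative only)."""

    def __init__(self, owner):
        self.owner = owner
        self.blocks = {}

    def _check(self, node):
        if node != self.owner:
            raise NotOwnerError(f"{node} does not own this disk")

    def write(self, node, key, value):
        self._check(node)
        self.blocks[key] = value

    def read(self, node, key):
        self._check(node)
        return self.blocks[key]

    def fail_over(self, new_owner):
        # Ownership moves as a whole; the two nodes never share access.
        self.owner = new_owner


if __name__ == "__main__":
    disk = OwnedDisk(owner="node-a")
    disk.write("node-a", "orders", 42)
    try:
        disk.write("node-b", "orders", 43)
    except NotOwnerError as err:
        print("rejected:", err)
    disk.fail_over("node-b")             # node-a has failed
    print(disk.read("node-b", "orders"))
```

The physical connection is shared, but access is not: at every moment exactly one node passes the ownership check.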
Mirrored Disk Clusters
Mirrored-disk cluster solutions include
Legato's Vinca. Mirroring involves replicating all application data from
primary storage to a secondary backup (perhaps at a remote location) for
availability purposes. Replication occurs while the primary system is active,
although the mirrored backup system -- as in the case of Vinca -- typically
does not perform any work outside of its role as a passive standby. If a
failure occurs in the primary system, a failover process transfers control to
the secondary system. Failover can take some time, and applications can lose
state information when they are reset, but mirroring enables a fairly fast
recovery scheme requiring little operator intervention. Mirrored-disk clusters
typically include just two nodes.
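A minimal sketch of mirroring and failover, assuming synchronous replication and a purely passive standby. The class MirroredPair and its attribute names are hypothetical and do not describe how Vinca works internally. Disk writes go to both copies; in-memory application state is not mirrored, so it is lost when the application restarts on the secondary.

```python
class MirroredPair:
    """Primary plus passive standby with mirrored storage (illustrative only)."""

    def __init__(self):
        self.primary_disk = {}
        self.mirror_disk = {}
        self.session_state = {}          # in-memory only, never replicated
        self.active = "primary"

    def write(self, key, value):
        if self.active == "primary":
            self.primary_disk[key] = value
            self.mirror_disk[key] = value    # assumed synchronous replication
        else:
            self.mirror_disk[key] = value    # after failover, only the mirror

    def read(self, key):
        disk = self.primary_disk if self.active == "primary" else self.mirror_disk
        return disk[key]

    def fail_over(self):
        # The standby takes control; the application restarts without
        # its previous in-memory state.
        self.active = "secondary"
        self.session_state = {}


if __name__ == "__main__":
    pair = MirroredPair()
    pair.write("balance", 100)
    pair.session_state["cursor"] = 7
    pair.fail_over()                     # primary has failed
    print(pair.read("balance"), pair.session_state)
```

Data written before the failure survives on the mirror, while the session state has to be rebuilt, which matches the trade-off described above.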