Custom Search

Thursday, March 21, 2013

Cluster Computing


Cluster Computing

What Is Cluster Computing?
  • Consists of many of the same or similar type of machines
    (Heterogenous clusters are a subtype, still mostly experimental)
  • Tightly-coupled using dedicated network connections
  • All machines share resources such as a common home directory
    (NFS can be a problem in very large clusters, so binaries and data must be pushed to scratch on each node.)
  • They must trust each other so that rsh or ssh does not require a password,
    otherwise you would need to do a manual start on each machine.
  • Must have software such as an MPI implementation installed to allow programs to be run across all nodes
Shared Disk Clusters
One approach to clustering utilizes central I/O devices accessible to all computers within the cluster. We call these systems shared-disk clusters as the I/O involved is typically disk storage for normal files and/or databases. Shared-disk cluster technologies include Oracle Parallel Server (OPS)and IBM's HACMP.
Shared-disk clusters rely on a common I/O bus for disk access but do not require shared memory. Because all nodes may concurrently write to or cache data from the central disks, a synchronization mechanism must be used to preserve coherence of the system. An independent piece of cluster software called the "distributed lock manager" assumes this role.
Shared-disk clusters support higher levels of system availability: if one node fails, other nodes need not be affected. However, higher availability comes at a cost of somewhat reduced performance in these systems because of overhead in using a lock manager and the potential bottlenecks of shared hardware generally. Shared-disk clusters make up for this shortcoming with relatively good scaling properties: OPS and HACMP support eight-node systems, for example.
Shared Nothing Clusters
A second approach to clustering is dubbed shared-nothing because it does not involve concurrent disk accesses from multiple nodes. (In other words, these clusters do not require a distributed lock manager.) Shared-nothing cluster solutions include Microsoft Cluster Server (MSCS).
MSCS is an atypical example of a shared nothing cluster in several ways. MSCS clusters use a shared SCSI connection between the nodes, that naturally leads some people to believe this is a shared-disk solution. But only one server (the one that owns the quorum resource) needs the disks at any given time, so no concurrent data access occurs. MSCS clusters also typically include only two nodes, whereas shared nothing clusters in general can scale to hundreds of nodes.
Mirrored Disk Clusters
Mirrored-disk cluster solutions include Legato's Vinca. Mirroring involves replicating all application data from primary storage to a secondary backup (perhaps at a remote location) for availability purposes. Replication occurs while the primary system is active, although the mirrored backup system -- as in the case of Vinca -- typically does not perform any work outside of its role as a passive standby. If a failure occurs in the primary system, a failover process transfers control to the secondary system. Failover can take some time, and applications can lose state information when they are reset, but mirroring enables a fairly fast recovery scheme requiring little operator intervention. Mirrored-disk clusters typically include just two nodes.

No comments:

Post a Comment