NOTE: CentOS Enterprise Linux is built from the Red Hat Enterprise Linux source code. Other than logo and name changes CentOS Enterprise Linux is compatible with the equivalent Red Hat version. This document applies equally to both Red Hat and CentOS Enterprise Linux.

Chapter 1. Red Hat Cluster Manager Overview

Red Hat Cluster Manager allows administrators to connect separate systems (called members or nodes) together to create failover clusters that ensure application availability and data integrity under several failure conditions. Administrators can use Red Hat Cluster Manager with database applications, file sharing services, web servers, and more.

To set up a failover cluster, you must connect the nodes to the cluster hardware, and configure the nodes into the cluster environment. The foundation of a cluster is an advanced host membership algorithm. This algorithm ensures that the cluster maintains complete data integrity by using the following methods of inter-node communication:

  • Network connections between the cluster systems

  • A Cluster Configuration System daemon (ccsd) that synchronizes configuration between cluster nodes

To make an application and data highly available in a cluster, you must configure a cluster service, an application that would benefit from Red Hat Cluster Manager to ensure high availability. A cluster service is made up of cluster resources, components that can be failed over from one node to another, such as an IP address, an application initialization script, or a Red Hat GFS shared partition. Building a cluster using Red Hat Cluster Manager allows transparent client access to cluster services. For example, you can provide clients with access to highly-available database applications by building a cluster service using Red Hat Cluster Manager to manage service availability and shared Red Hat GFS storage partitions for the database data and end-user applications.
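As an illustration, a cluster service combining such resources might be declared in the /etc/cluster/cluster.conf file along the following lines. This is a hedged sketch, not a complete configuration: the service name, IP address, device path, and mount point are hypothetical, and exact element names can vary between Cluster Manager releases.

```
<!-- Hypothetical cluster service: a web server whose resources are a
     floating IP address, a shared disk partition, and an init script. -->
<rm>
  <service name="webserver" autostart="1">
    <ip address="10.0.0.50" monitor_link="1"/>
    <fs name="webdata" device="/dev/sda1" mountpoint="/var/www" fstype="ext3"/>
    <script name="httpd" file="/etc/init.d/httpd"/>
  </service>
</rm>
```

If the service fails over, all three resources move together to the new node, so clients continue to reach the same IP address and data.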

You can associate a cluster service with a failover domain, a subset of cluster nodes that are eligible to run a particular cluster service. In general, any eligible, properly-configured node can run the cluster service. However, each cluster service can run on only one cluster node at a time in order to maintain data integrity. You can specify whether or not the nodes in a failover domain are ordered by preference. You can also specify whether or not a cluster service is restricted to run only on nodes of its associated failover domain. (When associated with an unrestricted failover domain, a cluster service can be started on any cluster node in the event no member of the failover domain is available.)
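For example, an ordered, restricted failover domain might be declared in /etc/cluster/cluster.conf roughly as follows. The node names and priority values here are hypothetical; a lower priority number is conventionally the more preferred node.

```
<!-- Hypothetical failover domain: ordered by preference and
     restricted to the two listed nodes only. -->
<failoverdomains>
  <failoverdomain name="webdomain" ordered="1" restricted="1">
    <failoverdomainnode name="node1" priority="1"/>
    <failoverdomainnode name="node2" priority="2"/>
  </failoverdomain>
</failoverdomains>
```

With restricted="1", a service assigned to this domain can run only on node1 or node2; with restricted="0", those nodes would merely be preferred.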

You can set up an active-active configuration in which the members run different cluster services simultaneously, or a hot-standby configuration in which primary nodes run all the cluster services, and a backup cluster system takes over only if the primary nodes fail.

If a hardware or software failure occurs, the cluster automatically restarts the failed node's cluster services on the functional node. This cluster-service failover capability ensures that no data is lost, and there is little disruption to users. When the failed node recovers, the cluster can re-balance the cluster services across the nodes.

In addition, you can cleanly stop the cluster services running on a cluster system and then restart them on another system. This cluster-service relocation capability allows you to maintain application and data availability when a cluster node requires maintenance.

1.1. Red Hat Cluster Manager Features

Cluster systems deployed with Red Hat Cluster Manager include the following features:

No-single-point-of-failure hardware configuration

Clusters can include a dual-controller RAID array, multiple bonded network channels, and redundant uninterruptible power supply (UPS) systems to ensure that no single failure results in application down time or loss of data.

Alternatively, a low-cost cluster can be set up to provide less availability than a no-single-point-of-failure cluster. For example, you can set up a cluster with a single-controller RAID array and only a single Ethernet channel.

Certain low-cost alternatives, such as host RAID controllers, software RAID without cluster support, and multi-initiator parallel SCSI configurations are not compatible or appropriate for use as shared cluster storage.

Cluster configuration and administration framework

Red Hat Cluster Manager allows you to easily configure and administer cluster services to make resources such as applications, server daemons, and shared data highly available. To create a cluster service, you specify the resources used in the cluster service as well as the properties of the cluster service, such as the cluster service name, application initialization (init) scripts, disk partitions, mount points, and the cluster nodes on which you prefer the cluster service to run. After you add a cluster service, the cluster management software stores the information in a cluster configuration file, and the configuration data is propagated to all cluster nodes using the Cluster Configuration System (or CCS), a daemon installed on each cluster node that allows retrieval of changes to the XML-based /etc/cluster/cluster.conf configuration file.

Red Hat Cluster Manager provides an easy-to-use framework for database applications. For example, a database cluster service serves highly-available data to a database application. The application running on a cluster node provides network access to database client systems, such as Web applications. If the cluster service fails over to another node, the application can still access the shared database data. A network-accessible database cluster service is usually assigned an IP address, which is failed over along with the cluster service to maintain transparent access for clients.

The cluster-service framework can also be easily extended to other applications through the use of customized init scripts.

Cluster administration user interface

The Cluster Configuration Tool interface facilitates cluster administration and monitoring tasks, such as creating, starting, and stopping cluster services; relocating cluster services from one node to another; modifying the cluster service configuration; and monitoring the cluster nodes. The CMAN interface allows administrators to control the cluster individually on a per-node basis.

Failover domains

By assigning a cluster service to a restricted failover domain, you can limit the nodes that are eligible to run a cluster service in the event of a failover. (A cluster service that is assigned to a restricted failover domain cannot be started on a cluster node that is not included in that failover domain.) You can order the nodes in a failover domain by preference to ensure that a particular node runs the cluster service (as long as that node is active). If a cluster service is assigned to an unrestricted failover domain and none of the nodes in that failover domain are available, the cluster service can start on any available cluster node.

Data integrity assurance

To ensure data integrity, only one node can run a cluster service and access cluster-service data at a time. The use of power switches in the cluster hardware configuration enables a node to power-cycle another node before restarting that node's cluster services during the failover process. This prevents two systems from simultaneously accessing the same data and corrupting it. It is strongly recommended that fence devices (hardware or software solutions that remotely power off, shut down, and reboot cluster nodes) be used to guarantee data integrity under all failure conditions. Watchdog timers are an alternative means of ensuring correct operation of cluster-service failover.
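As a sketch, a fence device and its per-node reference might appear in /etc/cluster/cluster.conf along these lines. The APC switch address, login credentials, and node name are all hypothetical values for illustration only.

```
<!-- Hypothetical fencing setup: an APC power switch fences node1
     by power-cycling its outlet during failover. -->
<clusternodes>
  <clusternode name="node1">
    <fence>
      <method name="1">
        <device name="apc1" port="1"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice name="apc1" agent="fence_apc" ipaddr="192.168.1.100" login="apc" passwd="apc"/>
</fencedevices>
```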

Ethernet channel bonding

To monitor the health of the other nodes, each node monitors the health of the remote power switch, if any, and issues heartbeat pings over network channels. With Ethernet channel bonding, multiple Ethernet interfaces are configured to behave as one, reducing the risk of a single-point-of-failure in the typical switched Ethernet connection between systems.
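A minimal channel-bonding setup on a Red Hat-style system might look like the following sketch. The interface names, IP address, and bonding options shown are hypothetical; the relevant files are /etc/modprobe.conf and the ifcfg scripts under /etc/sysconfig/network-scripts.

```
# /etc/modprobe.conf -- load the bonding driver for bond0
# mode=1 is active-backup; miimon=100 checks link state every 100 ms
alias bond0 bonding
options bonding miimon=100 mode=1

# /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface
DEVICE=bond0
IPADDR=10.0.0.11
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- one of the slave interfaces
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

A second slave (for example eth1) would be configured the same way, so that the loss of either physical link leaves the bond0 interface operational.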

cluster-service failover capability

If a hardware or software failure occurs, the cluster takes the appropriate action to maintain application availability and data integrity. For example, if a node completely fails, a healthy node (in the associated failover domain, if used) starts the service or services that the failed node was running prior to failure. Cluster services already running on the healthy node are not significantly disrupted during the failover process.

When a failed node reboots, it can rejoin the cluster and resume running the cluster service. Depending on how the cluster services are configured, the cluster can re-balance services among the nodes.

Manual cluster-service relocation capability

In addition to automatic cluster-service failover, a cluster allows you to cleanly stop cluster services on one node and restart them on another node. You can perform planned maintenance on a node system while continuing to provide application and data availability.
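For example, before taking a node down for maintenance an administrator could relocate its services by hand. The session below is illustrative only, run on a live cluster node; the service and node names are invented.

```
# Display cluster status, including node membership and running services
clustat

# Cleanly stop the "webserver" service and restart it on node2
clusvcadm -r webserver -m node2
```

Once maintenance is complete and the node rejoins the cluster, the service can be relocated back the same way.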

Event logging facility

To ensure that problems are detected and resolved before they affect cluster-service availability, the cluster daemons log messages by using the conventional Linux syslog subsystem.

Application monitoring

The cluster infrastructure monitors the state and health of each application. Should an application-specific failure occur, the cluster automatically restarts the application, first attempting to restart it on the node on which it was initially running and, failing that, restarting it on another cluster node. You can specify which nodes are eligible to run a cluster service by assigning a failover domain to the cluster service.

1.1.1. Red Hat Cluster Manager Subsystem Overview

Table 1-1 summarizes the Red Hat Cluster Manager software subsystems and their components.

Software Subsystem Components Description
Cluster Configuration Tool system-config-cluster Command used to manage cluster configuration in a graphical setting.
Cluster Configuration System (CCS) ccs_tool Command used to create CCS archives.
  ccs_test Diagnostic and testing command that is used to retrieve information from configuration files through ccsd.
  ccsd CCS daemon that runs on all cluster nodes and provides configuration file data to cluster software.
Resource Group Manager (rgmanager) clusvcadm Command used to manually enable, disable, relocate, and restart user services in a cluster.
  clustat Command used to display the status of the cluster, including node membership and services running.
  clurgmgrd Daemon used to handle user service requests, including service start, service disable, service relocate, and service restart.
  clurmtabd Daemon used to handle clustered NFS mount tables.
Fence fence_node Command used by lock_gulmd when a fence operation is required. This command takes the name of a node and fences it based on the node's fencing configuration.
  fence_apc Fence agent for APC power switch.
  fence_bladecenter Fence agent for IBM BladeCenters with Telnet interface.
  fence_bullpap Fence agent for Bull Novascale Platform Administration Processor (PAP) Interface.
  fence_ipmilan Fence agent for machines controlled by the Intelligent Platform Management Interface (IPMI).
  fence_wti Fence agent for WTI power switch.
  fence_brocade Fence agent for Brocade Fibre Channel switch.
  fence_mcdata Fence agent for McData Fibre Channel switch.
  fence_vixel Fence agent for Vixel Fibre Channel switch.
  fence_sanbox2 Fence agent for SANBox2 Fibre Channel switch.
  fence_ilo Fence agent for HP ILO interfaces (formerly fence_rib).
  fence_gnbd Fence agent used with GNBD storage.
  fence_egenera Fence agent used with Egenera BladeFrame system.
  fence_manual Fence agent for manual interaction.
  fence_ack_manual User interface for fence_manual agent.
DLM libdlm.so.1.0.0 Library for Distributed Lock Manager (DLM) support.
  dlm.ko Kernel module that is installed on cluster nodes for Distributed Lock Manager (DLM) support.
LOCK_GULM lock_gulm.o Kernel module that is installed on GFS nodes using the LOCK_GULM lock module.
  lock_gulmd Server/daemon that runs on each node and communicates with all nodes in GFS cluster.
  libgulm.so.xxx Library for GULM lock manager support.
  gulm_tool Command that configures and debugs the lock_gulmd server.
LOCK_NOLOCK lock_nolock.o Kernel module installed on a node using GFS as a local file system.
GNBD gnbd.o Kernel module that implements the GNBD device driver on clients.
  gnbd_serv.o Kernel module that implements the GNBD server. It allows a node to export local storage over the network.
  gnbd_export Command to create, export and manage GNBDs on a GNBD server.
  gnbd_import Command to import and manage GNBDs on a GNBD client.

Table 1-1. Red Hat Cluster Manager Software Subsystem Components

 
 
Published under the terms of the GNU General Public License