HP Integrity Essentials Global Workload Manager User's Guide: A.03.00.00 > Chapter 5 Additional Configuration and Administration Tasks
Automatic Restart of gWLM’s Managed Nodes in SRDs (High Availability)
Whenever a managed node boots, the node’s gWLM agent automatically attempts to rejoin the node to its SRD, providing high availability. The only configuration steps required for this behavior are:
This feature works best when one managed node is lost at a time or all managed nodes are lost.
When a managed node boots, the gWLM agent (gwlmagent) starts automatically if GWLM_AGENT_START is set to 1 in the file /etc/rc.config.d/gwlmCtl. The agent then reads the file /etc/opt/gwlm/deployed.config to determine its CMS and attempts to contact that CMS so that the CMS can re-deploy its view of the SRD. If the CMS cannot be contacted, the SRD in the deployed.config file is deployed, as long as all nodes agree.
In general, when an SRD is disrupted because a node goes down or because of network communication problems, gWLM attempts to reform the SRD. gWLM treats the nodes in an SRD as a cluster in which one node is the master and the other nodes are nonmasters. If the master loses contact with the rest of the SRD, the remaining nodes can continue without it as a partial cluster by unanimously agreeing on a new master. If a nonmaster loses contact with the rest of the SRD, the resulting partial cluster continues operating without the lost node; the master simply omits the missing node until it becomes available again.
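The boot-time decision described above can be modeled in a few lines of Python. This is a minimal illustrative sketch, not gWLM code: it assumes gwlmCtl uses shell-style KEY=VALUE assignments, it takes CMS reachability and node agreement as plain booleans rather than performing any network checks, and the function names, outcome labels, and the handling of the "nodes disagree" case (which this guide does not describe) are all hypothetical.

```python
from __future__ import annotations

# Hypothetical outcome labels for this sketch; gWLM does not print these.
AGENT_NOT_STARTED = "agent not started"
REDEPLOY_FROM_CMS = "CMS re-deploys its view of the SRD"
DEPLOY_LOCAL_CONFIG = "deploy SRD from deployed.config"
NO_REDEPLOY = "no redeploy (nodes disagree)"


def parse_rc_config(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, assuming /etc/rc.config.d/gwlmCtl is shell-style.

    Text after '#' is treated as a comment, an optional leading 'export '
    is ignored, and surrounding single or double quotes are stripped.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line, pure comment, or not an assignment
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip("\"'")
    return values


def boot_action(rc_text: str, cms_reachable: bool, all_nodes_agree: bool) -> str:
    """Model the agent's boot-time choice as the guide describes it."""
    # The agent starts automatically only if GWLM_AGENT_START is set to 1.
    if parse_rc_config(rc_text).get("GWLM_AGENT_START") != "1":
        return AGENT_NOT_STARTED
    # First choice: the CMS named in deployed.config re-deploys the SRD.
    if cms_reachable:
        return REDEPLOY_FROM_CMS
    # Fallback: deploy the SRD saved in deployed.config if all nodes agree.
    if all_nodes_agree:
        return DEPLOY_LOCAL_CONFIG
    # Not described in the guide; treated here as "no redeploy" (assumption).
    return NO_REDEPLOY


if __name__ == "__main__":
    sample = "GWLM_AGENT_START=1  # start gwlmagent at boot\n"
    # CMS unreachable but all nodes agree: prints "deploy SRD from deployed.config"
    print(boot_action(sample, cms_reachable=False, all_nodes_agree=True))
```

Keeping the file parsing separate from the decision logic makes it easy to see that the CMS is always preferred, and that deployed.config is only a fallback when the CMS cannot be reached.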
You can configure the following SIM events regarding this automatic restart feature:
For information on enabling and viewing these events, refer to gWLM’s “Configure Events” menu. You can then view these events using the Event Lists item in the left pane of SIM.
The following sections explain how to handle some of the events.
If you see this event:
If you have an SRD containing n nodes and you get n - 1 “SRD Communication Issue” events but no “SRD Reformed with Partial Set of Nodes” events within 5 minutes of the first “SRD Communication Issue” event (assuming an allocation interval of 15 seconds), you may need to restart gwlmagent on each managed node in the affected SRD:
# /opt/gwlm/bin/gwlmagent --restart
If gWLM is unable to reform an SRD, you can manually clear the SRD, as described below.
The command discussed below is an advanced command for clearing an SRD. The recommended way to remove a host from management is the gwlm undeploy command.
Starting with A.02.50.00.x agents, you can manually clear an SRD with the following command:
# gwlm reset --host=host
where host specifies the host with the SRD to be cleared.
If the above command does not work, follow the procedure given in the next section. The following procedure clears an SRD regardless of the version of the agents in the SRD: