HP XC System Software : Administration Guide
Version 3.2.1

HP Part Number: A-XCADM-321

Published: November 2007

Abstract

This document describes the tools and procedures necessary to administer, monitor, and maintain an HP XC system running HP XC System Software Version 3.2.1.


Table of Contents

About This Document
Intended Audience
New and Changed Information in This Edition
Typographic Conventions
HP XC and Related HP Products Information
Related Information
Manpages
HP Encourages Your Comments
1 HP XC Administration Environment
Understanding Nodes, Services, and Roles
Nodes
Services
Roles
File System
Key Operating System Directories
Log Files
HP XC Command Environment
HP XC Command Set
The nodelist Parameter
Executing a Command on Multiple Nodes
Configuration and Management Database
HP XC Configuration File Guidelines
Linux Configuration Files
Configuration Files
Configuration Files in Imaged Nodes
Installation and Software Distribution
Determining the Installation Type
Software Distribution
Improved Availability
Networking
Linux Virtual Server for HP XC Cluster Alias
Network Time Protocol
Network Address Translation
Network Information Service
Modulefiles
Security
Administrator Passwords
Firewall
Secure Shell (ssh)
Recommended Administrative Tasks
2 Improved Availability
Purpose of the Availability Tool
Services Eligible for Improved Availability
Availability Sets
Determining Which Nodes Are in the Availability Set
Reconfiguring an Availability Set
HP Serviceguard Tasks
Viewing the Serviceguard Cluster Status
Moving Packages
Reimaging Nodes in the Availability Set
Transferring Control of Services
3 Starting Up and Shutting Down the HP XC System
Understanding the Node States
Starting Up the HP XC System
Starting All Nodes
Determining Which Nodes Require Imaging
Imaging and Starting Nodes
Restarting a Node for Imaging
Shutting Down the HP XC System
Shutting Down One or More Nodes
Determining a Node's Power Status
Locating a Given Node
Disabling and Enabling a Node
4 Managing and Customizing System Services
HP XC System Services
Displaying Services Information
Displaying All Services
Displaying the Nodes That Provide a Specified Service
Displaying the Services Provided by a Specified Node
Restarting a Service
Stopping a Service
Global System Services
Customizing Services and Roles
Overview of the HP XC Services Configuration
Service Configuration Sequence of Operation
Assigning Roles with the cluster_config Utility
The *config.d Directories
Configuration Scripts
Understanding Global Configuration Scripts
Advance Planning
Editing the roles_services.ini File
Creating a service.ini File
Adding a New Service
Verifying a New Service
5 Managing Licenses
License Manager and License File
Determining If the License Manager Is Running
Starting and Stopping the License Manager
Starting the License Manager
Stopping the License Manager
Restarting the License Manager
6 Managing the Configuration and Management Database
Accessing the Configuration and Management Database
Querying the Configuration and Management Database
Displaying Configuration Details
Displaying the Nodes That Provide a Specified Service
Displaying Blade Enclosure Information
Finding and Setting System Attribute Values
Backing Up the Configuration Database
Restoring the Configuration Database from a Backup File
Archiving Sensor Data from the Configuration Database
Restoring the Sensor Data from an Archive File
Purging Sensor Data from the Configuration and Management Database
Dumping the Configuration and Management Database
7 Monitoring the System
Monitoring Tools
Monitoring Strategy
Displaying System Environment Data
Monitoring Disks
Displaying System Statistics
Displaying System Sensors from the Command Line
Monitoring Processor Usage and Load from the Command Line
Monitoring Memory from the Command Line
Monitoring Paging and Swap Data from the Command Line
Logging Node Events
Understanding the Event Logging Structure
The syslog-ng.conf Rules File
Modifying the syslog-ng Rules Files
The collectl Utility
Running the collectl Utility from the Command Line
Running the collectl Utility as a Service
Running the collectl Utility in a Batch Job Submission
HP Graph
The resmon Utility
The netdump and crash Utilities
Installing Netdump and Crash
Configuring Netdump
Starting Netdump
Obtaining the Kernel Dump
Using the Crash Utility to Analyze a Kernel Dump File
Using the Crash Utility to Analyze a Live System
8 Monitoring the System with Nagios
Nagios Overview
Nagios Components
Nagios Hosts
Nagios Plug-Ins
Nagios Web Interface
Nagios Files
Using the Nagios Web Interface
Accessing the Nagios Web Interface
Using the Nagios Tactical Overview
Using the Nagios Service Detail View
Using the Nagios Service Problems View
Adjusting the Nagios Configuration
Stopping and Restarting Nagios
Updating the Nagios Configuration
Forwarding Nagios e-mail Alerts
Changing Sensor Thresholds
Adjusting the Time Allotted for Metrics Collection
Changing the Default Nagios User Name
Disabling Individual Nagios Plug-Ins
Configuring Nagios on HP XC Systems
Monitored Nagios Services
Nagios Default Settings
Understanding Nagios Alert Messages
System Event Log Monitoring
Nan Notification Aggregator and Delimiter
Nagios Report Generator Utility
9 Network Administration
Network Address Translation Administration
Network Time Protocol Service
Changing the External IP Address of a Head Node
Modifying Sendmail
Bonding Ethernet Network Interface Cards for Failover
10 Managing Patches and RPM Updates
Sources for Software Packages and Information
Downloading and Installing Patches
Rebuild Kernel Dependent Modules
Rebuilding Serviceguard Modules
11 Distributing Software Throughout the System
Overview of the Image Replication and Distribution Environment
Installing and Distributing Software Patches
Adding Software or Modifying Files on the Golden Client
Installing Additional RPMs from the HP XC System Software Installation DVD
Using File Overrides to the Golden Image
Using Per-Node Service Configuration
Determining Which Nodes Will Be Imaged
Golden Image Checksum
Updating the Golden Image
The cluster_config Utility
The updateimage Command
Exclusion Files
Ensuring That the Golden Image Is Current
Propagating the Golden Image to All Nodes
Using the Full Imaging Installation
Using the cexec Command
Maintaining a Global Service Configuration
12 Opening an IP Port in the Firewall
Open Ports
Opening Ports in the Firewall
Opening a Temporary Port in the Firewall
Opening an IP Port in the Firewall Persistently
13 Connecting to a Remote Console
Console Management Facility
Accessing a Remote Console
14 Managing Local User Accounts and Passwords
HP XC User and Group Accounts
General Procedures for Administering Local User Accounts
Adding a Local User Account
Modifying a Local User Account
Deleting a Local User Account
Configuring the ssh Keys for a User
Synchronizing the NIS Database
Changing Administrative Passwords
Changing the Superuser Password
Changing the CMDB Password
Changing the Interconnect Password
Changing the Console Port Password
Synchronizing the BMC/IPMI Password for CP6000 Systems
Changing the Nagios Administrator Password
Changing the LSF Administrator Password
15 Managing SLURM
Overview of SLURM
Configuring SLURM
Configuring SLURM System Interconnect Support
Configuring SLURM Servers
Configuring Nodes in SLURM
Configuring SLURM Partitions
Configuring SLURM Features
Propagating Resource Limits
Restricting User Access to Nodes
Job Accounting
Using the sacct Command
Disabling Job Accounting
Configuring Job Accounting
Monitoring SLURM
Draining Nodes
Configuring the SLURM Epilog Script
Maintaining the SLURM Daemon Log
Enabling SLURM to Recognize a New Node
Removing SLURM
16 Managing LSF
Standard LSF-HPC
LSF-HPC with SLURM
Integration of LSF-HPC with SLURM
Switching the Type of LSF Installed
LSF-HPC with SLURM Installation
LSF-HPC with SLURM Startup and Shutdown
Starting Up LSF-HPC with SLURM
Shutting Down LSF-HPC with SLURM
Controlling the LSF-HPC with SLURM Service
Launching Jobs with LSF-HPC with SLURM
Monitoring and Controlling LSF-HPC with SLURM Jobs
Maintaining Shell Prompts in LSF-HPC Interactive Shells
Job Accounting
LSF Daemon Log Maintenance
Load Indexes and Resource Information
LSF-HPC with SLURM Monitoring
LSF-HPC with SLURM Failover
Overview of LSF-HPC with SLURM Monitoring and Failover Support
Interplay of LSF-HPC with SLURM
Assigning the Resource Management Nodes
LSF-HPC with SLURM Failover and Running Jobs
Manual LSF-HPC with SLURM Failover
Moving SLURM and LSF Daemons to Their Backup Nodes
Enhancing LSF-HPC with SLURM
LSF-HPC with SLURM Enhancement Settings
Thresholds in LSF-HPC with SLURM and SLURM Interplay
Configuring an External Virtual Host Name for LSF-HPC with SLURM on HP XC Systems
17 Managing Modulefiles
18 Mounting File Systems
Overview of the Network File System on the HP XC System
Understanding the Global fstab File
Mounting Internal File Systems Throughout the HP XC System
Understanding the csys Utility in the Mounting Instructions
Mounting Internal File Systems
Mounting Remote File Systems
Understanding the Mounting Instructions
Mounting a Remote File System
19 Managing Software RAID Arrays
Overview of Software RAID
Software RAID-0
Software RAID-1
Installing Software RAID on the Head Node
Installing Software RAID on Client Nodes
Examining a Software RAID Array
Error Reporting
Removing Software RAID from Client Nodes
20 Using Diagnostic Tools
Using the sys_check Utility
Using the ovp Utility for System Verification
Using the dgemm Utility to Analyze Performance
Using the System Interconnect Diagnostic Tools
HP XC Diagnostic Tools for the Myrinet System Interconnect
Using Diagnostic Tools for the Quadrics System Interconnect
Using Diagnostic Tools for the InfiniBand Interconnect
Using Diagnostic Tools for the Gigabit Ethernet System Interconnect
21 Troubleshooting
General Troubleshooting
Cannot Connect to Database During Configuration
Mismatched Secure Shell Keys
NFS Mount Failure (Permission Denied)
NFS Attribute Caching on Large-Scale Systems
Stale Metrics Data
Nagios Troubleshooting
Determining the Status of the Nagios Service
Nagios Fails to Start
Nagios Log Files
Running Nagios Plug-Ins Manually
Using the nrg Command's Analyze Mode
Multiple %EXPR% Expressions Are Not Accepted in the nagios_vars.ini File
Messages Reported by Nagios
System Interconnect Troubleshooting
Myrinet System Interconnect Troubleshooting
Quadrics System Interconnect Troubleshooting
InfiniBand System Interconnect Troubleshooting
OFED Troubleshooting Procedures
Improved Availability Issues
How To Start HP Serviceguard When Only the Head Node is Running
Restart Serviceguard Quorum Server if Quorum Server Node is Re-imaged
Known Limitation if Nagios is Configured for Improved Availability
Network Restart Command Negatively Affects Serviceguard
Problem Failing Over Database Package Under Serviceguard
SLURM Troubleshooting
SLURM Configuration Issues
SLURM Run-Time Troubleshooting
LSF-HPC Troubleshooting
22 Servicing the HP XC System
Adding a Node
Replacing a Client Node
Actualizing Planned Nodes
Replacing a Server Blade Enclosure OnBoard Administrator
Replacing a System Interconnect Board in an HP CP6000 System
Software RAID Disk Replacement
Replacing a RAID Disk
Writing a Boot Block to the RAID Disk
Incorporating External Network Interface Cards
Gathering Information
Editing the platform_vars.ini File
Using the device_config Command
Updating the Database for the External Network Card
Updating the Firewall Custom Configuration
Reconfiguring the Nodes
Verifying Success
Updating the Golden Image
A Installing LSF-HPC with SLURM into an Existing Standard LSF Cluster
Assumptions
Requirement
Sample Case
HP XC Preparation
Installing LSF-HPC with SLURM
Perform Post Installation Tasks
Configuring the LSF Alias
Starting LSF on the HP XC System
Sample Running Jobs
Troubleshooting
B Installing Standard LSF on a Subset of Nodes
Requirements
Assumptions
Sample Case
Instructions
C Setting Up MPICH
Downloading the MPICH Source Files
Building MPICH on the HP XC System
Running the MPICH Self-Tests
Installing MPICH
D HP MCS Monitoring
Customizing the Configuration for Your Installation
Regenerating the Nagios MCS Configuration
Useful Administrative Commands
MCS Log Files
Nagios Plug-Ins for MCS
Glossary
Index
© 2003 Hewlett-Packard Development Company, L.P.