Using Serviceguard Extension for RAC > Chapter 3 Maintenance and Troubleshooting

Replacing Disks


The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using and on the type of Volume Manager software. For a description of replacement procedures using VERITAS VxVM or CVM, refer to the chapter on “Administering Hot-Relocation” in the VERITAS Volume Manager 3.2 Administrator’s Guide. Additional information is found in the VERITAS Volume Manager 3.2 Troubleshooting Guide.

The following paragraphs describe how to replace disks that are configured with LVM. Separate descriptions are provided for replacing a disk in an array and replacing a disk in a high availability enclosure.

Replacing a Mechanism in a Disk Array Configured with LVM

With any HA disk array configured in RAID 1 or RAID 5, refer to the array’s documentation for instructions on how to replace a faulty mechanism. After the replacement, the array itself automatically rebuilds the missing data on the new disk. No LVM activity is needed. This process is known as hot swapping the disk.

NOTE: If your LVM installation requires online replacement of disk mechanisms, the use of disk arrays may be required, because software mirroring of JBODs with MirrorDisk/UX does not permit hot swapping for disks that are activated in shared mode.

Replacing a Mechanism in an HA Enclosure Configured with Exclusive LVM

Non-Oracle data that is used by packages may be configured in volume groups that use exclusive (one-node-at-a-time) activation. If you are using exclusive activation and software mirroring with MirrorDisk/UX and the mirrored disks are mounted in a high availability disk enclosure, you can use the following steps to hot plug a disk mechanism:

  1. Identify the physical volume name of the failed disk and the name of the volume group in which it was configured. In the following examples, the volume group name is shown as /dev/vg_sg01 and the physical volume name is shown as /dev/dsk/c2t3d0. Substitute the volume group and physical volume names that are correct for your system.

  2. Identify the names of any logical volumes that have extents defined on the failed physical volume.

  3. On the node on which the volume group is currently activated, use the following command for each logical volume that has extents on the failed physical volume:

    lvreduce -m 0 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0 

  4. At this point, remove the failed disk and insert a new one. The new disk will have the same HP-UX device name as the old one.

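  5. On the node from which you issued the lvreduce command, issue the following command to restore the volume group configuration data to the newly inserted disk. vgcfgrestore(1M) takes the volume group name after the -n option and expects the raw (character) device file of the physical volume, so the path uses /dev/rdsk rather than /dev/dsk:

    vgcfgrestore -n /dev/vg_sg01 /dev/rdsk/c2t3d0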

  6. For each logical volume that had extents on the failed physical volume, issue the following command to extend the mirror to the newly inserted disk:

    lvextend -m 1 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0

  7. Finally, use the lvsync command for each logical volume that has extents on the failed physical volume. This synchronizes the extents of the new disk with the extents of the other mirror.

    lvsync /dev/vg_sg01/lvolname  
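
The command sequence in the steps above can be previewed with a small dry-run helper. This is only a sketch: the function name print_replace_cmds and the logical volume names lvol1 and lvol2 are hypothetical, and the script prints the commands instead of running them. It assumes that the raw device file is obtained by replacing dsk with rdsk in the block device path, and it passes the volume group to vgcfgrestore with -n as described in vgcfgrestore(1M).

```sh
#!/bin/sh
# Dry run: prints the commands from steps 3 through 7; nothing is executed.
# Usage: print_replace_cmds VG BLOCK_PV LVOL [LVOL ...]
# (function name and arguments are illustrative, not part of LVM)
print_replace_cmds() {
    vg=$1
    pv=$2
    shift 2
    # Assumption: the raw device path mirrors the block path (dsk -> rdsk).
    raw_pv=$(echo "$pv" | sed 's|/dsk/|/rdsk/|')

    # Step 3: remove the mirror copy held on the failed disk.
    for lv in "$@"; do
        echo "lvreduce -m 0 $vg/$lv $pv"
    done
    # Step 4 is a manual hardware action.
    echo "# replace the disk mechanism before continuing"
    # Step 5: restore the LVM configuration onto the new disk.
    echo "vgcfgrestore -n $vg $raw_pv"
    # Step 6: re-create the mirror copy on the new disk.
    for lv in "$@"; do
        echo "lvextend -m 1 $vg/$lv $pv"
    done
    # Step 7: synchronize each mirror.
    for lv in "$@"; do
        echo "lvsync $vg/$lv"
    done
}

print_replace_cmds /dev/vg_sg01 /dev/dsk/c2t3d0 lvol1 lvol2
```

Reviewing the printed list before typing the commands makes it easier to confirm that every affected logical volume appears in the lvreduce, lvextend, and lvsync steps.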

Offline Replacement of a Mechanism in an HA Enclosure Configured with Shared LVM (SLVM)

Hot plugging of disks is not supported for Oracle RAC data, which is configured in volume groups with Shared LVM (SLVM). If you need this capability, you should use disk arrays for your Oracle RAC data.

If you are using software mirroring for shared concurrent activation of Oracle RAC data with MirrorDisk/UX and the mirrored disks are mounted in a high availability disk enclosure, use the following steps to carry out offline replacement:

  1. Make a note of the physical volume name of the failed mechanism (e.g., /dev/dsk/c2t3d0).

  2. Deactivate the volume group on all nodes of the cluster:

    # vgchange -a n vg_ops

  3. Replace the bad disk mechanism with a good one.

  4. From one node, initialize the volume group information on the good mechanism using vgcfgrestore(1M), specifying the name of the volume group with the -n option and the raw (character) device file of the physical volume that is being replaced:

    # vgcfgrestore -n /dev/vg_ops /dev/rdsk/c2t3d0

  5. Activate the volume group on one node in exclusive mode:

    # vgchange -a e vg_ops

    This will synchronize the stale logical volume mirrors. This step can be time-consuming, depending on hardware characteristics and the amount of data.

  6. Deactivate the volume group:

    # vgchange -a n vg_ops

  7. Activate the volume group on all the nodes in shared mode using vgchange -a s:

    # vgchange -a s vg_ops
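
Because the offline procedure mixes commands that run on every node with commands that run on a single node, it can help to print the sequence with each line labeled by where it runs. The following is a dry-run sketch only: the function name print_slvm_replace is hypothetical, nothing is executed, and the vgcfgrestore line assumes the -n option and raw device path described in vgcfgrestore(1M).

```sh
#!/bin/sh
# Dry run: prints the offline replacement sequence; nothing is executed.
# Usage: print_slvm_replace VG_NAME RAW_PV
# (function name and labels are illustrative)
print_slvm_replace() {
    vg=$1
    raw_pv=$2
    echo "all nodes: vgchange -a n $vg"                 # step 2
    echo "operator:  replace the failed mechanism"      # step 3
    echo "one node:  vgcfgrestore -n /dev/$vg $raw_pv"  # step 4
    echo "one node:  vgchange -a e $vg"                 # step 5 (resyncs stale mirrors)
    echo "one node:  vgchange -a n $vg"                 # step 6
    echo "all nodes: vgchange -a s $vg"                 # step 7
}

print_slvm_replace vg_ops /dev/rdsk/c2t3d0
```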

Replacing a Lock Disk

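Replacing a failed lock disk mechanism is the same as replacing a data disk. If you are using a dedicated lock disk (one with no user data on it), then you need to issue only one LVM command. As with data disks, vgcfgrestore(1M) takes the volume group name after -n and the raw (character) device file of the disk:

    # vgcfgrestore -n /dev/vg_lock /dev/rdsk/c2t1d0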

After doing this, wait at least an hour, then review the syslog file for a message showing that the lock disk is healthy again.
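
One way to review the syslog file is to filter it for lines that mention the lock. The sketch below is an illustration under stated assumptions: the function name lock_disk_messages is hypothetical, the default path /var/adm/syslog/syslog.log is the usual HP-UX syslog location, and the filter is deliberately broad because this section does not give the exact wording of the message.

```sh
#!/bin/sh
# Show the most recent syslog lines that mention "lock" (case-insensitive).
# Usage: lock_disk_messages [LOGFILE]
# Assumption: the default path below is the standard HP-UX syslog file.
lock_disk_messages() {
    logfile=${1:-/var/adm/syslog/syslog.log}
    grep -i 'lock' "$logfile" | tail -n 20
}

# Example (run on a cluster node after waiting as described above):
#   lock_disk_messages
```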

On-line Hardware Maintenance with In-line SCSI Terminator

Serviceguard allows on-line SCSI disk controller hardware repairs on all cluster nodes if you use HP’s in-line terminator (C2980A) on nodes connected to the end of the shared F/W SCSI bus. The in-line terminator cable is a 0.5 meter extension cable with the terminator on the male end, which connects to the controller card for an external bus. It is used instead of the termination pack that is attached to the controller card, and it makes it possible to physically disconnect the node from the end of the F/W SCSI bus without breaking the bus's termination. (Nodes attached to the middle of a bus using a Y cable can also be detached from the bus without harm.) When using in-line terminators and Y cables, ensure that all orange-socketed termination packs are removed from the controller cards.

NOTE: You cannot use in-line terminators with internal F/W SCSI buses on D and K series systems, and you cannot use the in-line terminator with single-ended SCSI buses. You must not use an in-line terminator to connect a node to a Y cable.

Figure 3-1 “F/W SCSI Buses with In-line Terminators” shows a three-node cluster with two F/W SCSI buses. The solid line and the dotted line represent different buses, both of which have in-line terminators attached to nodes 1 and 3. Y cables are also shown attached to node 2.

Figure 3-1 F/W SCSI Buses with In-line Terminators

The use of in-line SCSI terminators allows you to do hardware maintenance on a given node by temporarily moving its packages to another node and then halting the original node while its hardware is serviced. Following the replacement, the packages can be moved back to the original node.

Use the following procedure to disconnect a node that is attached to the bus with an in-line SCSI terminator or with a Y cable:

  1. Move any packages on the node that requires maintenance to a different node.

  2. Halt the node that requires maintenance. The cluster will re-form, and activity will continue on other nodes. Packages on the halted node will switch to other available nodes if they are configured to switch.

  3. Disconnect the power to the node.

  4. Disconnect the node from the in-line terminator cable or Y cable if necessary. The other nodes accessing the bus will encounter no problems as long as the in-line terminator or Y cable remains connected to the bus.

  5. Replace or upgrade hardware on the node, as needed.

  6. Reconnect the node to the in-line terminator cable or Y cable if necessary.

  7. Reconnect power and reboot the node. If AUTOSTART_CMCLD is set to 1 in the /etc/rc.config.d/cmcluster file, the node will rejoin the cluster.

  8. If necessary, move packages back to the node from their alternate locations and restart them.
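
The software side of steps 1, 2, and 7 can be sketched as follows. This is a dry-run illustration, not a definitive procedure: the function names and the node and package names (node1, node2, pkg1) are hypothetical, and the package move is shown with the Serviceguard commands cmhaltpkg, cmrunpkg -n, and cmmodpkg -e, followed by cmhaltnode -f to halt the node. The commands are printed, not executed.

```sh
#!/bin/sh
# Returns 0 if the given configuration file sets AUTOSTART_CMCLD to 1.
# Usage: autostart_enabled /etc/rc.config.d/cmcluster
autostart_enabled() {
    grep -Eq '^[[:space:]]*AUTOSTART_CMCLD=1([^0-9]|$)' "$1"
}

# Dry run: prints commands for moving packages off a node and halting it.
# Usage: print_maintenance_cmds NODE TARGET_NODE PKG [PKG ...]
# (function names, node names, and package names are illustrative)
print_maintenance_cmds() {
    node=$1
    target=$2
    shift 2
    for pkg in "$@"; do
        echo "cmhaltpkg $pkg"              # stop the package where it runs
        echo "cmrunpkg -n $target $pkg"    # start it on the alternate node
        echo "cmmodpkg -e $pkg"            # re-enable package switching
    done
    echo "cmhaltnode -f $node"             # step 2: halt the node
}

print_maintenance_cmds node1 node2 pkg1
```

Checking autostart_enabled before powering the node down shows in advance whether the node will rejoin the cluster by itself after the reboot in step 7.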

© 2005 Hewlett-Packard Development Company, L.P.