SERVERware 4 StandAlone Edition Storage Pool Faulty Disk Replacement
When one of the disks in the storage pool fails, follow the procedure below.
When a disk fails, the pool enters the DEGRADED state.
Log in to the storage server over SSH and check the status of the pool:
~ # zpool status
  pool: NETSTOR
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        NETSTOR       DEGRADED     0     0     0
          mirror-0    ONLINE       0     0     0
            NETSTOR1  ONLINE       0     0     0
            NETSTOR2  ONLINE       0     0     0
          mirror-1    DEGRADED     0     0     0
            NETSTOR3  ONLINE       0     0     0
            NETSTOR4  UNAVAIL      0   158     0  corrupted data
The zpool status output shows that NETSTOR4 is UNAVAIL because of corrupted data:
NETSTOR4 UNAVAIL 0 158 0 corrupted data
In that case, we need to replace the disk labeled NETSTOR4 with a new one and add the new disk to the degraded mirror.
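On a pool with many disks, the failed device can be picked out of the status output automatically. The following is a minimal sketch, not part of SERVERware: the function name faulty_devices is made up for this illustration, and it assumes the config table layout shown above and treats the device states UNAVAIL, FAULTED, REMOVED and OFFLINE as failures.

```shell
# Print NAME and STATE for every device that is unavailable or failed.
# Reads `zpool status` output on stdin.
# (faulty_devices is a hypothetical helper name, not a ZFS command.)
faulty_devices() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the table.
        in_config && NF == 0 { in_config = 0 }
        # Report devices in one of the failure states.
        in_config && $2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1, $2 }
    '
}
```

Usage: zpool status NETSTOR | faulty_devices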
First, physically remove the faulty disk from the server and replace it with a new disk.
After the replacement, the new disk should appear in /dev/disk/by-id/:
# ls -lah /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 480 Srp 27 08:57 .
drwxr-xr-x 7 root root 140 Srp 27 08:13 ..
lrwxrwxrwx 1 root root 9 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN -> ../../sde
lrwxrwxrwx 1 root root 10 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part2 -> ../../sde2
lrwxrwxrwx 1 root root 10 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 Srp 27 08:13 ata-ST31000520AS_5VX0BZN0 -> ../../sda
lrwxrwxrwx 1 root root 10 Srp 27 08:13 ata-ST31000520AS_5VX0BZN0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX61A465TH1Y -> ../../sdc
lrwxrwxrwx 1 root root 10 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX61A465TH1Y-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX81EC512Y4H -> ../../sdd
lrwxrwxrwx 1 root root 10 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX81EC512Y4H-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 9 Srp 27 08:57 ata-WDC_WD10JFCX-68N6GN0_WD-WXK1E6458WKX -> ../../sdb
lrwxrwxrwx 1 root root 9 Srp 27 08:13 wwn-0x10076999618641940481x -> ../../sdd
lrwxrwxrwx 1 root root 10 Srp 27 08:13 wwn-0x10076999618641940481x-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 9 Srp 27 08:13 wwn-0x11689569317835657217x -> ../../sdc
lrwxrwxrwx 1 root root 10 Srp 27 08:13 wwn-0x11689569317835657217x-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Srp 27 08:57 wwn-0x11769037186453098497x -> ../../sdb
lrwxrwxrwx 1 root root 9 Srp 27 08:13 wwn-0x12757853320186451405x -> ../../sde
lrwxrwxrwx 1 root root 10 Srp 27 08:13 wwn-0x12757853320186451405x-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Srp 27 08:13 wwn-0x12757853320186451405x-part2 -> ../../sde2
lrwxrwxrwx 1 root root 10 Srp 27 08:13 wwn-0x12757853320186451405x-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 Srp 27 08:13 wwn-0x7847552951345238016x -> ../../sda
lrwxrwxrwx 1 root root 10 Srp 27 08:13 wwn-0x7847552951345238016x-part1 -> ../../sda1
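A blank replacement disk can also be spotted by looking for whole-disk entries that have no -partN siblings. The sketch below does this for a by-id style directory. The function name unpartitioned_disks is made up for this illustration, and it assumes the new disk carries no partitions yet; a reused disk with old partitions would not be reported.

```shell
# List whole-disk links in a by-id style directory that have no "-partN" siblings.
# Usage: unpartitioned_disks [directory]   (defaults to /dev/disk/by-id)
# (unpartitioned_disks is a hypothetical helper name.)
unpartitioned_disks() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        # Skip the partition links themselves.
        case ${link##*/} in
            *-part[0-9]*) continue ;;
        esac
        # If the glob matches nothing, $1 stays the literal pattern,
        # which is neither an existing file nor a symlink.
        set -- "$link"-part[0-9]*
        [ -e "$1" ] || [ -L "$1" ] || printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}
```

Run against the listing above, this should report only the two entries that point to sdb (the ata- name and the wwn- name of the new disk).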
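Now that we have the block device name of the new disk (/dev/sdb in this example, the entry ata-WDC_WD10JFCX-68N6GN0_WD-WXK1E6458WKX), we can create a partition table and a partition to prepare the drive for use.
Use parted to create a GPT partition table on the new drive: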
~# parted /dev/sdb --script -- mktable gpt
Next, create a partition and give it a label.
IMPORTANT: the label must be named in the following format: NETSTORx,
where “NETSTOR” is the name of the storage pool on the server and “x” is the drive number.
So, in our example (NETSTOR4):
1. NETSTOR - the name of the virtual pool on the storage server
2. 4 - the number of the disk (disk 4)
Now add the label to the new drive:
~# parted /dev/sdb --script -- mkpart "NETSTOR4" 1 -1
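Because a wrong device name or label is easy to type, it can help to validate the label and review the commands before running them. The following sketch only prints the two parted commands from this procedure and does not touch the disk. The function name print_partition_commands is made up for this illustration, and the NETSTOR prefix is the pool name from this example.

```shell
# Print the parted commands for a new pool disk after checking the label format.
# Usage: print_partition_commands <device> <label>
# (print_partition_commands is a hypothetical helper; it runs nothing itself.)
print_partition_commands() {
    dev=$1
    label=$2
    # The label must be NETSTOR followed only by digits, e.g. NETSTOR4.
    case $label in
        NETSTOR|NETSTOR*[!0-9]*)
            echo "invalid label: $label (expected NETSTOR<number>)" >&2
            return 1 ;;
        NETSTOR*) ;;
        *)
            echo "invalid label: $label (expected NETSTOR<number>)" >&2
            return 1 ;;
    esac
    printf 'parted %s --script -- mktable gpt\n' "$dev"
    printf 'parted %s --script -- mkpart "%s" 1 -1\n' "$dev" "$label"
}
```

Usage: print_partition_commands /dev/sdb NETSTOR4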
To verify the change, list the partition labels:
~ # ls -lah /dev/disk/by-partlabel/
total 0
drwxr-xr-x 2 root root 160 Srp 27 09:07 .
drwxr-xr-x 7 root root 140 Srp 27 08:13 ..
lrwxrwxrwx 1 root root 10 Srp 27 08:13 grub -> ../../sde2
lrwxrwxrwx 1 root root 10 Srp 27 08:13 NETSTOR1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Srp 27 08:13 NETSTOR2 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Srp 27 08:13 NETSTOR3 -> ../../sda1
lrwxrwxrwx 1 root root 10 Srp 27 09:07 NETSTOR4 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Srp 27 08:13 zfs-67f0216ac0cdfd49 -> ../../sde1
We have added a new disk to the system, created a partition and label. Now we have to add this disk to the storage pool.
The pool configuration still refers to the old NETSTOR4 disk by its guid, so we need to tell zpool to replace that entry with the new disk.
To do that, we first need to find the guid recorded for NETSTOR4 in the pool configuration.
We can use the zdb command to find it:
~ # zdb
NETSTOR:
    version: 5000
    name: 'NETSTOR'
    state: 0
    txg: 3690
    pool_guid: 1509362615723986299
    errata: 0
    hostname: 'PoosyNebulaC'
    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        guid: 1509362615723986299
        children[0]:
            type: 'mirror'
            id: 0
            guid: 9114691413196561361
            metaslab_array: 35
            metaslab_shift: 33
            ashift: 12
            asize: 1000197849088
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 17673948060813426502
                path: '/dev/disk/by-partlabel/NETSTOR1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15948717213456048677
                path: '/dev/disk/by-partlabel/NETSTOR2'
                whole_disk: 1
                create_txg: 4
        children[1]:
            type: 'mirror'
            id: 1
            guid: 4857650377060724882
            metaslab_array: 58
            metaslab_shift: 33
            ashift: 12
            asize: 1000198373376
            is_log: 0
            create_txg: 224
            children[0]:
                type: 'disk'
                id: 0
                guid: 7084287257001519327
                path: '/dev/disk/by-partlabel/NETSTOR3'
                whole_disk: 0
                create_txg: 224
            children[1]:
                type: 'disk'
                id: 1
                guid: 3813061267827485888
                path: '/dev/disk/by-partlabel/NETSTOR4'
                whole_disk: 0
                create_txg: 224
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
The important lines from the zdb output:
guid: 3813061267827485888 path: '/dev/disk/by-partlabel/NETSTOR4'
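Rather than reading the guid off the screen, it can be extracted from the zdb output by path. This is a minimal sketch under two assumptions: zdb prints one key per line, and the guid line of a disk comes before its path line, as in the output above. The function name vdev_guid_for_path is made up for this illustration.

```shell
# Print the guid recorded for the vdev whose path equals the given path.
# Reads `zdb` output on stdin.
# Usage: vdev_guid_for_path <path>
# (vdev_guid_for_path is a hypothetical helper name.)
vdev_guid_for_path() {
    # zdb prints paths in single quotes, so the quotes are added to the value compared.
    awk -v want="'$1'" '
        # Remember the most recent guid seen.
        $1 == "guid:" { guid = $2 }
        # When the wanted path appears, the remembered guid belongs to it.
        $1 == "path:" && $2 == want { print guid; exit }
    '
}
```

Usage: zdb | vdev_guid_for_path /dev/disk/by-partlabel/NETSTOR4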
Using this guid, we replace the old NETSTOR4 device in the pool with the new NETSTOR4 disk.
The guid identifies the old device, and the path points to the new partition.
The general form of the command is:
~# zpool replace NETSTOR <old_guid> <path_to_HDD> -f
Example:
~# zpool replace NETSTOR 3813061267827485888 /dev/disk/by-partlabel/NETSTOR4 -f
This completes the disk replacement. ZFS now resilvers the new disk in the background.
To see the status of the pool, type:
~ # zpool status
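After the replace, the pool is expected to return to the ONLINE state once resilvering finishes. A small check like the one below can be used in a script to confirm this. It is a sketch only: the function name pool_is_online is made up for this illustration, and it assumes the status output of a single pool.

```shell
# Succeed (exit status 0) if the "state:" line of `zpool status` output says ONLINE.
# Reads `zpool status` output on stdin.
# (pool_is_online is a hypothetical helper name.)
pool_is_online() {
    awk '
        $1 == "state:" { state = $2 }
        # Exit status 0 when ONLINE, 1 otherwise (including a missing state line).
        END { exit !(state == "ONLINE") }
    '
}
```

Usage: zpool status NETSTOR | pool_is_online && echo "pool is healthy"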
Now restart the swhspared daemon to update the GUI information:
~# /etc/init.d/swhspared restart