
SERVERware 4 Standalone Edition Storage Pool Faulty Disk Replacement


When one of the disks in the storage pool is damaged, the following procedure should be used.

If a disk fails, the zpool will be in the DEGRADED state.

Log in to the storage server via SSH and check the status of the zpool:

~ # zpool status 
  pool: NETSTOR
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

    NAME          STATE     READ WRITE CKSUM
    NETSTOR       DEGRADED     0     0     0
      mirror-0    ONLINE       0     0     0
        NETSTOR1  ONLINE       0     0     0
        NETSTOR2  ONLINE       0     0     0
      mirror-1    DEGRADED     0     0     0
        NETSTOR3  ONLINE       0     0     0
        NETSTOR4  UNAVAIL      0   158     0  corrupted data


In the output from zpool status, we can see that NETSTOR4 is unavailable due to corrupted data:

NETSTOR4  UNAVAIL      0   158     0  corrupted data


In this case, we need to replace the disk labeled NETSTOR4 with a new one and add it to the zpool mirror.

First, physically remove the faulty disk from the server and replace it with a new disk.
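
Before physically pulling the faulty drive, you may first take it offline so that ZFS stops trying to use it. This is an optional step, not part of the original procedure; NETSTOR is the pool and NETSTOR4 the faulty device from the example above:

~# zpool offline NETSTOR NETSTOR4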

After the replacement, we should see the new disk in /dev/disk/by-id/:

# ls -lah /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 480 Jul 27 08:57 .
drwxr-xr-x 7 root root 140 Jul 27 08:13 ..
lrwxrwxrwx 1 root root   9 Jul 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN -> ../../sde
lrwxrwxrwx 1 root root  10 Jul 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Jul 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part2 -> ../../sde2
lrwxrwxrwx 1 root root  10 Jul 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part9 -> ../../sde9
lrwxrwxrwx 1 root root   9 Jul 27 08:13 ata-ST31000520AS_5VX0BZN0 -> ../../sda
lrwxrwxrwx 1 root root  10 Jul 27 08:13 ata-ST31000520AS_5VX0BZN0-part1 -> ../../sda1
lrwxrwxrwx 1 root root   9 Jul 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX61A465TH1Y -> ../../sdc
lrwxrwxrwx 1 root root  10 Jul 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX61A465TH1Y-part1 -> ../../sdc1
lrwxrwxrwx 1 root root   9 Jul 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX81EC512Y4H -> ../../sdd
lrwxrwxrwx 1 root root  10 Jul 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX81EC512Y4H-part1 -> ../../sdd1
lrwxrwxrwx 1 root root   9 Jul 27 08:57 ata-WDC_WD10JFCX-68N6GN0_WD-WXK1E6458WKX -> ../../sdb
lrwxrwxrwx 1 root root   9 Jul 27 08:13 wwn-0x10076999618641940481x -> ../../sdd
lrwxrwxrwx 1 root root  10 Jul 27 08:13 wwn-0x10076999618641940481x-part1 -> ../../sdd1
lrwxrwxrwx 1 root root   9 Jul 27 08:13 wwn-0x11689569317835657217x -> ../../sdc
lrwxrwxrwx 1 root root  10 Jul 27 08:13 wwn-0x11689569317835657217x-part1 -> ../../sdc1
lrwxrwxrwx 1 root root   9 Jul 27 08:57 wwn-0x11769037186453098497x -> ../../sdb
lrwxrwxrwx 1 root root   9 Jul 27 08:13 wwn-0x12757853320186451405x -> ../../sde
lrwxrwxrwx 1 root root  10 Jul 27 08:13 wwn-0x12757853320186451405x-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Jul 27 08:13 wwn-0x12757853320186451405x-part2 -> ../../sde2
lrwxrwxrwx 1 root root  10 Jul 27 08:13 wwn-0x12757853320186451405x-part9 -> ../../sde9
lrwxrwxrwx 1 root root   9 Jul 27 08:13 wwn-0x7847552951345238016x -> ../../sda
lrwxrwxrwx 1 root root  10 Jul 27 08:13 wwn-0x7847552951345238016x-part1 -> ../../sda1
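
The new disk here is /dev/sdb (note the newer 08:57 timestamp and that it has no partition entries yet). If it is still unclear which device is the new one, you can cross-check models and serial numbers with lsblk. This is an optional check, not part of the original procedure:

~ # lsblk -o NAME,MODEL,SERIAL,SIZE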



Now that we have the block device name, we can create a partition table and a partition, and prepare the drive for use.

Use parted to create a GPT partition table on the new drive:

~# parted /dev/sdb --script -- mktable gpt
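
You can verify that the GPT table was written before continuing. This is an optional sanity check, not part of the original procedure:

~# parted /dev/sdb --script -- print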


Next, create a new partition label.

IMPORTANT: the label must be named in the following format: NETSTORx,
where “NETSTOR” is the pool name on the storage server and “x” is the drive number.

So, in our example (NETSTOR4):

1. NETSTOR - the virtual pool on the storage server
2. 4 - the number of the disk (disk 4)


Now add the label to the new drive:

~# parted /dev/sdb --script -- mkpart "NETSTOR4" 1 -1
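
If the new partition's symlink does not show up immediately under /dev/disk/by-partlabel/, you can ask the kernel to re-read the partition table. This is an optional step, not part of the original procedure:

~# partprobe /dev/sdb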


To see the changes, use the following command:

~ # ls -lah /dev/disk/by-partlabel/
total 0
drwxr-xr-x 2 root root 160 Jul 27 09:07 .
drwxr-xr-x 7 root root 140 Jul 27 08:13 ..
lrwxrwxrwx 1 root root  10 Jul 27 08:13 grub -> ../../sde2
lrwxrwxrwx 1 root root  10 Jul 27 08:13 NETSTOR1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Jul 27 08:13 NETSTOR2 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Jul 27 08:13 NETSTOR3 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jul 27 09:07 NETSTOR4 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Jul 27 08:13 zfs-67f0216ac0cdfd49 -> ../../sde1


We have added the new disk to the system and created a partition and a label. Now we have to add this disk to the storage pool.

The pool still identifies the faulty NETSTOR4 device by its GUID, so we need that GUID in order to tell zpool which device the new disk replaces.

We can use the zdb command to find it:

~ # zdb 
NETSTOR:
    version: 5000
    name: 'NETSTOR'
    state: 0
    txg: 3690
    pool_guid: 1509362615723986299
    errata: 0
    hostname: 'PoosyNebulaC'
    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        guid: 1509362615723986299
        children[0]:
            type: 'mirror'
            id: 0
            guid: 9114691413196561361
            metaslab_array: 35
            metaslab_shift: 33
            ashift: 12
            asize: 1000197849088
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 17673948060813426502
                path: '/dev/disk/by-partlabel/NETSTOR1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15948717213456048677
                path: '/dev/disk/by-partlabel/NETSTOR2'
                whole_disk: 1
                create_txg: 4
        children[1]:
            type: 'mirror'
            id: 1
            guid: 4857650377060724882
            metaslab_array: 58
            metaslab_shift: 33
            ashift: 12
            asize: 1000198373376
            is_log: 0
            create_txg: 224
            children[0]:
                type: 'disk'
                id: 0
                guid: 7084287257001519327
                path: '/dev/disk/by-partlabel/NETSTOR3'
                whole_disk: 0
                create_txg: 224
            children[1]:
                type: 'disk'
                id: 1
                guid: 3813061267827485888
                path: '/dev/disk/by-partlabel/NETSTOR4'
                whole_disk: 0
                create_txg: 224
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data


The important lines from the zdb output are:

guid: 3813061267827485888
path: '/dev/disk/by-partlabel/NETSTOR4'
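
On a pool with many disks, you can pull just these two lines out of the zdb listing with grep. This is a convenience, not part of the original procedure:

~ # zdb | grep -B 1 NETSTOR4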


Using this GUID, we can replace the old NETSTOR4 device with the new NETSTOR4 disk.

We run zpool replace with the old GUID and the path to the new partition:

~# zpool replace NETSTOR <old_guid> <path_to_HDD> -f


Example:

~# zpool replace NETSTOR 3813061267827485888 /dev/disk/by-partlabel/NETSTOR4 -f


This completes the disk replacement procedure. ZFS will now resilver the data onto the new disk, and the pool returns to the ONLINE state once resilvering finishes.

To see the status of the pool, type:

~ # zpool status
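
If you want to follow the resilver progress, you can, for example, run the status command under watch. This is an optional convenience, not part of the original procedure:

~ # watch -n 5 zpool status NETSTOR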


Now restart the swhspared daemon to update the information shown in the GUI:

~# /etc/init.d/swhspared restart