Replacing a dead disk in a zpool

Question

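I am running Ubuntu Server 13.04 64-bit with native ZFS. I have a zpool made up of 4 hard drives, one of which died yesterday and is no longer recognized by either the operating system or the BIOS.

Unfortunately, I only noticed the problem after the next reboot, so the drive's label is now missing and I cannot replace the disk by following the official instructions.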

zpool status hermes -x

prints

root@zeus:~# zpool status hermes -x
  pool: hermes
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        hermes                       DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  UNAVAIL      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors

I have already installed a new drive in its place (it has the label /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ).

Every one of the following commands

zpool replace hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
zpool offline hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X
zpool detach hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X

fails, for example:

root@zeus:~# zpool offline hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X
cannot offline /dev/disk/by-id/ata-ST3300831A_5NF0552X: no such device in pool

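because the dead drive's label no longer exists on the system. I also tried the commands above with the path omitted from the drive label, to no avail.

How can I replace this "ghost" disk?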

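Best answer

After endless digging tonight, I finally found the solution. The short answer: you can pass a disk's GUID to the zpool command, and the GUID persists even after the drive has been disconnected.

The long answer: I obtained the disk's GUID with the zdb command, which gave me the following output: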

root@zeus:/dev# zdb
hermes:
    version: 28
    name: 'hermes'
    state: 0
    txg: 162804
    pool_guid: 14829240649900366534
    hostname: 'zeus'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14829240649900366534
        children[0]:
            type: 'raidz'
            id: 0
            guid: 5355850150368902284
            nparity: 1
            metaslab_array: 31
            metaslab_shift: 32
            ashift: 9
            asize: 791588896768
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11426107064765252810
                path: '/dev/disk/by-id/ata-ST3300620A_5QF0MJFP-part2'
                phys_path: '/dev/gptid/73b31683-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15935140517898495532
                path: '/dev/disk/by-id/ata-ST3300831A_5NF0552X-part2'
                phys_path: '/dev/gptid/746c949a-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 7183706725091321492
                path: '/dev/disk/by-id/ata-ST3200822A_5LJ1CHMS-part2'
                phys_path: '/dev/gptid/7541115a-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 17196042497722925662
                path: '/dev/disk/by-id/ata-ST3200822A_3LJ0189C-part2'
                phys_path: '/dev/gptid/760a94ee-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
    features_for_read:

The GUID I was looking for is 15935140517898495532 (the guid of children[1], whose path names the dead drive), which allowed me to run
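Picking the GUID out of the zdb dump by eye is easy to get wrong on larger pools. The following is a minimal sketch, not part of the original answer, of a helper that reads zdb-style text on stdin and prints the GUID of the child whose path contains a given string. The function name vdev_guid_for_path is hypothetical, and the sketch assumes that each child's guid line appears before its path line, as in the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the GUID of the vdev whose 'path:' line
# contains the given substring. Reads zdb-style config text on stdin.
# Assumption: within each child, 'guid:' precedes 'path:' (as in the
# output above), so the most recently seen GUID belongs to that child.
vdev_guid_for_path() {
    awk -v needle="$1" '
        $1 == "guid:" { guid = $2 }
        $1 == "path:" && index($0, needle) { print guid; exit }
    '
}

# Usage on a live system (needs root and an imported pool):
#   zdb | vdev_guid_for_path ata-ST3300831A_5NF0552X
```

The match uses a plain substring test rather than a regular expression, so characters such as dots in a device name are taken literally.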

root@zeus:/dev# zpool offline hermes 15935140517898495532
root@zeus:/dev# zpool status
  pool: hermes
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        hermes                       DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  OFFLINE      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors

and then

root@zeus:/dev# zpool replace hermes 15935140517898495532 /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
root@zeus:/dev# zpool status
  pool: hermes
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun  9 01:44:36 2013
    408M scanned out of 419G at 20,4M/s, 5h50m to go
    101M resilvered, 0,10% done
config:

        NAME                            STATE     READ WRITE CKSUM
        hermes                          DEGRADED     0     0     0
          raidz1-0                      DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP     ONLINE       0     0     0
            replacing-1                 OFFLINE      0     0     0
              ata-ST3300831A_5NF0552X   OFFLINE      0     0     0
              ata-ST3500320AS_9QM03ATQ  ONLINE       0     0     0  (resilvering)
            ata-ST3200822A_5LJ1CHMS     ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C     ONLINE       0     0     0

errors: No known data errors

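After the resilver completed, everything worked well again. It would have been nice if the zpool man page mentioned that a disk GUID obtained via zdb can be used with the zpool command.

Edit

As durval points out below, the zdb command may produce no output at all. In that case you can try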
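The sequence used above can be summarized as a small dry-run sketch. It only prints the commands so they can be reviewed before being run by hand; the function name print_replace_plan is hypothetical, and the pool name, GUID and device path are the ones from this thread.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the offline/replace
# sequence for a vdev that is addressed by GUID instead of by device path.
print_replace_plan() {
    pool=$1
    guid=$2
    newdev=$3
    echo "zpool offline $pool $guid"
    echo "zpool replace $pool $guid $newdev"
    echo "zpool status $pool"
}

# Values taken from the example in this thread.
print_replace_plan hermes 15935140517898495532 /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
```

Printing rather than executing is a deliberate choice here: replacing the wrong vdev in a degraded raidz1 pool is hard to undo, so it seems safer to read the commands once before running them.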

zdb -l /dev/<name-of-device>

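to explicitly list information about the device (even if it is missing from the system).

Second answer

The problem is that the disks are referenced by ID rather than by device name.

Here is a workaround that should work: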
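If the label dump is long, the device's own GUID can be filtered out of it. This is a rough sketch under one assumption: that the label text contains a top-level line starting with guid: before any nested vdev_tree entries, alongside separate pool_guid: and top_guid: lines. The function name label_guid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first top-level 'guid:' value found in
# label text read from stdin. 'pool_guid:' and 'top_guid:' are skipped
# because the first field must be exactly 'guid:'.
label_guid() {
    awk '$1 == "guid:" { print $2; exit }'
}

# Usage on a live system (needs root); the device name is a placeholder:
#   zdb -l /dev/<name-of-device> | label_guid
```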

ln -s /dev/null /dev/ata-ST3300831A_5NF0552X
zpool export hermes
zpool import hermes
zpool status
# note the new device name that should appear here
zpool offline hermes xxxx
zpool replace hermes xxxx /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ

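Edit: I was 30 seconds too late...

Third answer

@Marcus: thank you for posting this excellent answer to your own question; it helped me a lot.

A few days ago I ran into a twist that may interest you (and anyone else who comes a-googling in the future): I had a cache device dropped from the pool (and marked "UNAVAIL") because of the same error (ZFS-8000-4J, "label is missing or invalid"), and every attempt to offline, remove or replace it failed with exactly the same "no such device in pool" message.

However, when I tried to apply your solution, plain "zdb" (with no arguments) did not list the device, let alone its GUID.

After some digging, I found that "zdb -l /dev/DEVICENAME" lists the GUID (read directly from the device rather than from the pool's records), and using that GUID let me do the replacement (in fact I did a "zpool offline" followed by a "zpool remove" and then a "zpool add", which worked fine).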

References

Compiled by Ubuntu Q&A; source: https://ubuntuqa.com/article/7115.html. Do not repost without permission.