

Replacing a dead disk in a zpool


Question

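I am running Ubuntu Server 13.04 64-bit with native ZFS. I have a zpool made up of 4 hard drives, one of which died yesterday and is no longer recognized by the operating system or the BIOS.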

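Unfortunately I only noticed the problem after the next reboot, so the drive's label is now missing and I cannot replace the disk by following the official instructions.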

zpool status hermes -x

prints

root@zeus:~# zpool status hermes -x
  pool: hermes
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        hermes                       DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  UNAVAIL      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors

I have already installed a new drive (it has the label /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ).

Each of the commands

zpool replace hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
zpool offline hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X
zpool detach hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X

fails, for example:

root@zeus:~# zpool offline hermes /dev/disk/by-id/ata-ST3300831A_5NF0552X
cannot offline /dev/disk/by-id/ata-ST3300831A_5NF0552X: no such device in pool

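because the dead drive's label no longer exists on the system. I also tried the commands above with the path prefix omitted from the drive label, to no avail.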

How can I replace this "ghost" disk?

Best answer

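After digging endlessly tonight, I finally found the solution. In short: you can pass the disk's GUID (which persists even after the drive has been disconnected) to the zpool command instead of its label.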

Long answer: I obtained the disk's GUID with the zdb command, which gave me the following output

root@zeus:/dev# zdb
hermes:
    version: 28
    name: 'hermes'
    state: 0
    txg: 162804
    pool_guid: 14829240649900366534
    hostname: 'zeus'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14829240649900366534
        children[0]:
            type: 'raidz'
            id: 0
            guid: 5355850150368902284
            nparity: 1
            metaslab_array: 31
            metaslab_shift: 32
            ashift: 9
            asize: 791588896768
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11426107064765252810
                path: '/dev/disk/by-id/ata-ST3300620A_5QF0MJFP-part2'
                phys_path: '/dev/gptid/73b31683-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15935140517898495532
                path: '/dev/disk/by-id/ata-ST3300831A_5NF0552X-part2'
                phys_path: '/dev/gptid/746c949a-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 7183706725091321492
                path: '/dev/disk/by-id/ata-ST3200822A_5LJ1CHMS-part2'
                phys_path: '/dev/gptid/7541115a-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 17196042497722925662
                path: '/dev/disk/by-id/ata-ST3200822A_3LJ0189C-part2'
                phys_path: '/dev/gptid/760a94ee-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
    features_for_read:

The GUID I was looking for is 15935140517898495532 (the guid of children[1], whose path names the dead drive), which enabled me to do

root@zeus:/dev# zpool offline hermes 15935140517898495532
root@zeus:/dev# zpool status
  pool: hermes
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        hermes                       DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  OFFLINE      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors

and then

root@zeus:/dev# zpool replace hermes 15935140517898495532 /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
root@zeus:/dev# zpool status
  pool: hermes
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun  9 01:44:36 2013
    408M scanned out of 419G at 20,4M/s, 5h50m to go
    101M resilvered, 0,10% done
config:

        NAME                            STATE     READ WRITE CKSUM
        hermes                          DEGRADED     0     0     0
          raidz1-0                      DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP     ONLINE       0     0     0
            replacing-1                 OFFLINE      0     0     0
              ata-ST3300831A_5NF0552X   OFFLINE      0     0     0
              ata-ST3500320AS_9QM03ATQ  ONLINE       0     0     0  (resilvering)
            ata-ST3200822A_5LJ1CHMS     ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C     ONLINE       0     0     0

errors: No known data errors
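
The status output above reports the resilver on its scan line. As a minimal sketch, assuming the wording "resilver in progress" shown above is what your ZFS version prints, a small filter can tell whether the resilver is still running. The function name resilver_in_progress is made up for this illustration.

```shell
# resilver_in_progress: read `zpool status` output on stdin and succeed
# (exit status 0) if it contains the text "resilver in progress".
# Assumption: the scan line is worded as in the output shown above.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Possible polling loop (commented out because it needs a live pool):
# while zpool status hermes | resilver_in_progress; do sleep 60; done
```

The loop simply re-reads the status once a minute and stops when the scan line no longer reports a running resilver.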

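After the resilver completed, everything worked well again. It would be nice if the zpool man page mentioned that you can pass a disk's GUID, as obtained via zdb, to the zpool command.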
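
The GUID lookup in this answer was done by eye. It can be sketched as a small filter, assuming the zdb output keeps the layout shown above, with each child's guid: line appearing before its path: line. The function name guid_for_path is hypothetical.

```shell
# guid_for_path NAME: read `zdb` configuration output on stdin and print the
# guid of the first child vdev whose path contains NAME.
# Assumption: within each child, `guid:` precedes `path:` (as in the output above).
guid_for_path() {
    awk -v name="$1" '
        $1 == "guid:" { guid = $2 }                        # remember the most recent guid
        $1 == "path:" && index($2, name) { print guid; exit }
    '
}

# Possible use (commented out because it needs root and a live pool):
# zdb | guid_for_path ata-ST3300831A_5NF0552X
```

The printed value can then be given to zpool offline and zpool replace in place of the missing label. Pass a name specific enough to match only one disk, since the first matching path wins.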

Edit

As durval points out below, the plain zdb command may not output anything. In that case you can try

zdb -l /dev/<name-of-device>

to explicitly list information about the device (even if it is already missing from the system).
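
As a rough sketch of pulling the GUID out of that label dump: the filter below assumes the label prints the device's own guid: field before the nested vdev_tree section, which is worth checking against your own output before relying on it. The function name label_guid is made up for this illustration.

```shell
# label_guid: read `zdb -l` output on stdin and print the value of the first
# line whose first field is exactly "guid:".
# `pool_guid:` and `top_guid:` have different first fields, so they are skipped.
# Assumption: the device's own guid appears before any nested vdev_tree guid.
label_guid() {
    awk '$1 == "guid:" { print $2; exit }'
}

# Possible use (commented out because it needs root and a real device):
# zdb -l /dev/<name-of-device> | label_guid
```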

Second answer

The problem is that the disks are referenced by ID rather than by device name.

Here is a workaround that should work:

ln -s /dev/null /dev/ata-ST3300831A_5NF0552X
zpool export hermes
zpool import hermes
zpool status
# note the new device name that should appear here
zpool offline hermes xxxx
zpool replace hermes xxxx /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ

Edit: I was 30 seconds late…

Third answer

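@Marcus: Thank you for posting this excellent answer to your own question; it helped me a lot.

The other day I ran into a twist that may interest you (and anyone else a-googling in the future): I had a cache device dropped from the pool (and marked "UNAVAIL") because of the same error (ZFS-8000-4J, "label is missing or invalid"), and my attempts to offline, remove, or replace it failed with exactly the same "no such device in pool" message.

However, when I tried to apply your solution, plain "zdb" (with no arguments) did not list the device at all, let alone its GUID.

After some digging, I found that "zdb -l /dev/DEVICENAME" does list the GUID (read directly from the device rather than from the pool's records), and using that GUID let me do the replacement (actually I did a "zpool offline" followed by a "zpool remove" and then a "zpool add", which worked fine).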
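
The offline, remove, add sequence for a cache device can be sketched as a dry run that only prints the commands. The pool name tank, the GUID, and the device path below are placeholders and not values from this thread. The cache keyword in zpool add is what adds the new disk as a cache device.

```shell
# print_cache_swap POOL GUID NEW_DEVICE: print (do not run) the three zpool
# commands for swapping out a cache device that is referenced by GUID.
# All three arguments are placeholders supplied by the caller.
print_cache_swap() {
    pool=$1
    guid=$2
    new=$3
    printf 'zpool offline %s %s\n' "$pool" "$guid"
    printf 'zpool remove %s %s\n' "$pool" "$guid"
    printf 'zpool add %s cache %s\n' "$pool" "$new"
}

# Hypothetical values, for illustration only:
print_cache_swap tank 1234567890 /dev/disk/by-id/NEW-CACHE-DISK
```

Printing the commands first makes it easy to check the GUID and the target device before running anything against a live pool.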
