
NAS540 hard disk failure reconstruction abnormality

RiceC Posts: 10  Junior Member
edited November 21 in Questions
Dear Sir,
After I replaced the failed hard disk, the resync progress has been stuck at 0.2%. What is the problem?


#NAS_Nov_2019

Answers

  • RiceC Posts: 10  Junior Member
    ~ $ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sdb3[4] sda3[5] sdc3[6] sdd3[2]
          17569173504 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
          [>....................]  recovery =  0.2% (16664192/5856391168) finish=16375.7min speed=5943K/sec

    md1 : active raid1 sda2[6] sdb2[5] sdc2[4] sdd2[7]
          1998784 blocks super 1.2 [4/4] [UUUU]

    md0 : active raid1 sda1[6] sdb1[5] sdc1[7] sdd1[4]
          1997760 blocks super 1.2 [4/4] [UUUU]

    unused devices: <none>

    ~ $ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sdb3[4] sda3[5](S) sdc3[6] sdd3[2](F)
          17569173504 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [_U_U]

    md1 : active raid1 sda2[6] sdb2[5] sdc2[4] sdd2[7]
          1998784 blocks super 1.2 [4/4] [UUUU]

    md0 : active raid1 sda1[6] sdb1[5] sdc1[7] sdd1[4]
          1997760 blocks super 1.2 [4/4] [UUUU]

    unused devices: <none>

  • Mijzelf Posts: 897  Heroic Warrior Member
    edited November 21
    md2 : active raid5 sdb3[4] sda3[5](S) sdc3[6] sdd3[2](F)
    Disk sdd (probably disk 4) failed while rebuilding the array. Now the array is down, as 2 disks are not enough to build the array.
    Does SMART say anything about disk 4?
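
    If checking from the command line is easier, a full report can be pulled with something like this (assuming smartctl from smartmontools is present on the firmware; otherwise the web UI's S.M.A.R.T. page shows the same data):

    smartctl -a /dev/sdd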

  • RiceC Posts: 10  Junior Member
    Dear Sir,
    The hard drive's SMART is normal.
    I suspect there is a problem with a certain block on the hard disk. Is there a solution?
    Thanks a lot.
  • Mijzelf Posts: 897  Heroic Warrior Member
    Your disk sdd has a hardware failure: sector 41578064 cannot be read. That is around 20GB, or around 16GB from the start of the data partition. As you can read in the log, the array starts re-syncing at 78 seconds, and the failure pops up at 1200 seconds, which means the array was re-syncing at about 16GB / 1122 seconds = 14.2MB/sec. That is low, so I think there are more problems with this disk. Strange that SMART is OK.
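
    For what it's worth, a quick read-only way to confirm that this sector is really unreadable (assuming 41578064 is a whole-disk LBA with 512-byte sectors, as the kernel log suggests; the command only reads, it writes nothing):

    dd if=/dev/sdd of=/dev/null bs=512 skip=41578064 count=1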

    It is possible that sector 41578064 is not in use. The RAID manager cannot know, as it operates below the filesystem level, and so it syncs everything, no matter if it is in use or not.
    So it is possible that if you re-create this (degraded) array from the command line, using --assume-clean, you can copy away all your files without triggering this error again.
    However, as the slow sync speed suggests that there is more wrong with that disk, it is possible that the disk will die during the copy.
    If your data is valuable, I think the only sane way to handle this is to make a bitwise copy from disk sdd to a new disk, using dd_rescue or a similar tool. Then re-create the degraded array manually with the new disk, using --assume-clean.
    And then you can add a new 4th disk.

    BTW, your array seems to be 16.3TiB in size. Are you running firmware <5.10? Since firmware 5.10 a volume can't exceed 16TiB.

  • RiceC Posts: 10  Junior Member
    Dear Sir,
    Thank you very much for your reply.
    I will try the dd_rescue tool to make a bitwise copy,
    followed by re-creating the degraded array with --assume-clean and then adding a new disk.
    I will report back with the results.

    The firmware currently in use is V5.21 (AATB.3).
    


  • RiceC Posts: 10  Junior Member
    Dear Sir,

    I am not sure which commands to use to restore the RAID mechanism without losing data. Can you please help?
    e.g. mdadm --create --assume-clean?
  • Mijzelf Posts: 897  Heroic Warrior Member
    Can you post the output of

    mdadm --examine /dev/sd[abcd]3
  • RiceC Posts: 10  Junior Member
    Dear Sir, 

    ~ # mdadm --examine /dev/sd[abcd]3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x2
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
    Recovery Offset : 33356152 sectors
              State : clean
        Device UUID : a40e16eb:f6263576:1bef532d:551ba599

        Update Time : Thu Dec  5 11:34:31 2019
           Checksum : 1c01ab93 - correct
             Events : 234561

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : AA.A ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 99f23022:bdaedd7f:c125470f:ef1827d9

        Update Time : Thu Dec  5 11:34:31 2019
           Checksum : 7c3d1e8b - correct
             Events : 234561

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AA.A ('A' == active, '.' == missing)
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : d12d0b15:04babef0:d036cc64:dbc69dcb

        Update Time : Thu Dec  5 11:34:31 2019
           Checksum : 27f45190 - correct
             Events : 234561

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 3
       Array State : AA.A ('A' == active, '.' == missing)
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 77d95206:b3029d75:0ee7a4e3:1c5b8cd8

        Update Time : Thu Dec  5 11:28:22 2019
           Checksum : 29e08eee - correct
             Events : 234556

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 2
       Array State : AAAA ('A' == active, '.' == missing)
    ~ #

  • Mijzelf Posts: 897  Heroic Warrior Member
    This is hard to interpret. According to this dump, the array is up, yet degraded.

    The volume was created on Tue Nov 10 15:29 2015. Today, at 11:28 (local time?), disk sdd was dropped, and the rest of the disks were last updated at 11:34. Those disks agree that they're up with 3 members.

    So according to this dump it makes no sense to re-create the array, as it's up. I don't know what to say.
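
    For reference, a compact way to compare what each member reports (just a filter over the same --examine output; assuming the BusyBox grep on the NAS supports -E):

    mdadm --examine /dev/sd[abcd]3 | grep -E 'Events|Update Time|Device Role|Array State'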
  • RiceC Posts: 10  Junior Member
    Dear Sir,

    My situation is the same as described above: sector 41578064 cannot be read. What complete commands should I use so that this error is not triggered again?

    I am making a bitwise copy from disk sdd to a new disk, using dd_rescue or a similar tool.
    I want to know which commands re-create the degraded array manually with the new disk, using --assume-clean.
    
  • Mijzelf Posts: 897  Heroic Warrior Member
    OK. You have to pull all disks except sdd and the new disk to which you want to copy.

    Download the dd_rescue package here: https://zyxel.diskstation.eu/Users/Mijzelf/Tools/ , and put it on the NAS, in /bin/. You can use WinSCP for that.
    Then open a shell on the NAS, and execute

    cd /bin/
    tar xf *.tgz

    If you run dd_rescue now, you should get a warning that you have to specify the in- and output.

    Run
    mdadm --examine /dev/sd[ab]3

    This should show the new name of sdd; it's the one with 'Device Role : Active device 2'. The other one is the new disk.

    Let's assume the old sdd is now sda and the new disk is sdb; then the command is

    dd_rescue /dev/sda /dev/sdb

    This will take several hours, maybe days, depending on the quality of sdd. You'll have to keep the terminal open all that time.
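
    If there is a risk that the SSH session drops, one possible workaround (a sketch, assuming the firmware's BusyBox provides nohup) is to run the copy in the background and watch its progress in a log file:

    nohup dd_rescue /dev/sda /dev/sdb > /tmp/dd_rescue.log 2>&1 &
    tail -f /tmp/dd_rescue.log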

    After that, remove the old sdd, put the other 2 original disks back, and repost the output of

    mdadm --examine /dev/sd[abcd]3
    cat /proc/mdstat

  • RiceC Posts: 10  Junior Member
    Dear Sir, 
    I ran dd_rescue a few days ago. My current situation: with the original 2 disks and the bit-copied hard disk installed,
    I can see the RAID when I boot, but I would have to insert a new hard disk to make it a normal four-disk RAID5 again.
    However, the error from the beginning still occurs: the new hard disk I copied with dd_rescue still gives the disk-sector error.
    Is there a command that skips the error and allows the new hard disk to rebuild the RAID?
    Thank you.

    Just like you said above:
    So it is possible that if you re-create this (degraded) array from the command line, using --assume-clean, you can copy away all your files without triggering this error again.
    
  • Mijzelf Posts: 897  Heroic Warrior Member
    OK. According to your post on 21 November, your array status went from [_UUU] when the rebuild started to [_U_U] when the hardware failure occurred. So the array has to be rebuilt from the 'Active devices' 1..3, as Active device 0 was never completely synced.
    According to your post on 5 December, the 'Active devices' 1..3 are the partitions sdb3, sdd3 and sdc3.

    The command to recreate the array with these 3 members on these roles is

    mdadm --stop /dev/md2
    mdadm --create --assume-clean --level=5  --raid-devices=4 --metadata=1.2 --chunk=64K  --layout=left-symmetric /dev/md2 missing /dev/sdb3 /dev/sdd3 /dev/sdc3

    Those are 2 lines, both starting with mdadm.
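
    After the second command, it may be wise to verify the result before doing anything else; these are read-only checks:

    cat /proc/mdstat
    mdadm --detail /dev/md2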

  • RiceC Posts: 10  Junior Member
    edited December 10
    Dear Sir,

    The following messages appeared when I ran the commands. What should I do? Thank you.

    ~ $ mdadm --stop /dev/md2
    mdadm: must be super-user to perform this action
    ~ $ sudo mdadm --stop /dev/md2
    -sh: sudo: not found
    ~ $ su root
    Password:


    BusyBox v1.19.4 (2019-09-04 14:33:19 CST) built-in shell (ash)
    Enter 'help' for a list of built-in commands.

    ~ #
    ~ # mdadm --stop /dev/md2
    mdadm: Cannot get exclusive access to /dev/md2: Perhaps a running process, mounted filesystem or active volume group?
    ~ #
    /dev/mapper # pvdisplay
      --- Physical volume ---
      PV Name               /dev/md2
      VG Name               vg_28524431
      PV Size               16.36 TiB / not usable 3.81 MiB
      Allocatable           yes (but full)
      PE Size               4.00 MiB
      Total PE              4289348
      Free PE               0
      Allocated PE          4289348
      PV UUID               2L3zxx-baO6-JlSj-Y88b-Jr5I-hBo3-i20If6

  • Mijzelf Posts: 897  Heroic Warrior Member
    Does the NAS support some eastern language? Amazing.

    Anyway, the command to deactivate the logical volume is

    vgchange -an

    which has to be executed before the mdadm --stop.
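
    So, put together, the whole sequence would look something like this (a sketch based on the commands above; vgchange -ay afterwards re-activates the volume group so the volume can be used again):

    vgchange -an
    mdadm --stop /dev/md2
    mdadm --create --assume-clean --level=5 --raid-devices=4 --metadata=1.2 --chunk=64K --layout=left-symmetric /dev/md2 missing /dev/sdb3 /dev/sdd3 /dev/sdc3
    vgchange -ay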