
NSA325v2 RAID 1 recovery

Alan Posts: 2  Junior Member
edited August 2018 in Questions
Hi,

I have had my NSA325 for a few years now. It is configured with 2x 5TB WD Red HDs in a RAID1 (mirror) configuration. Recently it reported that the RAID was degraded, and SMART was reporting that DISK 2 was bad. I replaced DISK 2 with another 5TB WD Red HD that I had, but when I go back into the web console it shows the internal volume (volume1) as Inactive. When I go to Storage->Volume, the only action I am presented with is delete. I am able to create a new volume on the new DISK 2 (as JBOD), so the NAS is seeing the new HD, but it isn't allowing me to repair the RAID volume. When I re-installed the bad DISK 2 it showed the RAID as degraded. After a few more disk-swap attempts it is now in a state where it shows as Inactive with both of the original disks. Attempting to repair it fails after a few minutes.

Any suggestions?

Thanks

Alan


Answers

  • Mijzelf Posts: 784  Heroic Warrior Member
    edited September 2018
    Can you open the telnet backdoor, login over telnet as root, and post the output of
    cat /proc/partitions
    cat /proc/mdstat
    mdadm --examine /dev/sd?2
  • Alan Posts: 2  Junior Member
    Thanks for replying.

    I had also emailed Zyxel support about this issue.  Their solution was to back up the data, destroy the raid array and re-create it.

    Not a very helpful "solution".  They didn't even ask to look at any output from mdadm to work out what was going on.

    I decided to dump the device and purchased a Synology.
  • Mr_C Posts: 10  Junior Member
    Hi - I've exactly the same problem.  Just raised a case with Zyxel.  Any suggestions would be life-saving....
  • Mr_C Posts: 10  Junior Member
    BTW, output per the above in case anyone's keen...:

    ~ $ cat /proc/partitions
    major minor  #blocks  name
       7        0     143360 loop0
       8        0 1953514584 sda
       8        1 1953512448 sda1
       8       16 1953514584 sdb
       8       17     514048 sdb1
       8       18 1952997952 sdb2
      31        0       1024 mtdblock0
      31        1        512 mtdblock1
      31        2        512 mtdblock2
      31        3        512 mtdblock3
      31        4      10240 mtdblock4
      31        5      10240 mtdblock5
      31        6      48896 mtdblock6
      31        7      10240 mtdblock7
      31        8      48896 mtdblock8

    ~ $ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1]
    md0 : inactive sdb2[2](S)
          1952996928 blocks super 1.2
    unused devices: <none>

    ~ $ mdadm --examine /dev/sd?2
    mdadm: cannot open /dev/sda2: Permission denied
    mdadm: cannot open /dev/sdb2: Permission denied
    mdadm: cannot open /dev/sdc2: Permission denied
    mdadm: cannot open /dev/sdd2: Permission denied
    mdadm: cannot open /dev/sde2: Permission denied
    mdadm: cannot open /dev/sdf2: Permission denied
    mdadm: cannot open /dev/sdg2: Permission denied
    mdadm: cannot open /dev/sdh2: Permission denied
    mdadm: cannot open /dev/sdi2: Permission denied
    mdadm: cannot open /dev/sdj2: Permission denied
    mdadm: cannot open /dev/sdk2: Permission denied
    mdadm: cannot open /dev/sdl2: Permission denied
    mdadm: cannot open /dev/sdm2: Permission denied
    mdadm: cannot open /dev/sdn2: Permission denied
    mdadm: cannot open /dev/sdo2: Permission denied
    mdadm: cannot open /dev/sdp2: Permission denied
    mdadm: cannot open /dev/sdq2: Permission denied
    mdadm: cannot open /dev/sdr2: Permission denied
    mdadm: cannot open /dev/sds2: Permission denied
    mdadm: cannot open /dev/sdt2: Permission denied
    mdadm: cannot open /dev/sdu2: Permission denied
    mdadm: cannot open /dev/sdv2: Permission denied
    mdadm: cannot open /dev/sdw2: Permission denied
    mdadm: cannot open /dev/sdx2: Permission denied
    mdadm: cannot open /dev/sdy2: Permission denied
    mdadm: cannot open /dev/sdz2: Permission denied
  • Mijzelf Posts: 784  Heroic Warrior Member
       8        0 1953514584 sda
       8        1 1953512448 sda1
       8       16 1953514584 sdb
       8       17     514048 sdb1
       8       18 1952997952 sdb2
    For some reason your new disk sda isn't repartitioned. It still contains its factory partition. The old disk, sdb, has 2 partitions: sdb1, a small 514048 kB one containing some firmware stuff, and sdb2, spanning the rest of the disk.

    Is your raid array degraded or down? If it's down, is it possible that you pulled the wrong disk?

    For mdadm you need root rights. Please repeat with
    su
    mdadm --examine /dev/sdb2

  • Mr_C Posts: 10  Junior Member
    Mijzelf - you are indeed a hero for getting back to me.

    So, the output for sdb2 is below.  In answer to the other question, I swapped out the disk which was failing (sdb1 presumably) as the volume wouldn't repair properly.  I've tried putting the failed drive back into the unit and trying to repair through the UI but this failed (hence popping in the replacement disk).

    Anyhow, output from mdadm below (sorry for being dense).  Any feedback would be awesome.  Someone from Zyxel did get back to me and suggested switching the HDDs around and trying again but this didn't work - their other suggestion was to replace the unit but I'm less keen on that one obviously.

     mdadm --examine /dev/sdb2
    /dev/sdb2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x2
         Array UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
               Name : Nelly:0  (local to host Nelly)
      Creation Time : Sun Sep 14 09:15:30 2014
         Raid Level : raid1
       Raid Devices : 2
     Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
    Recovery Offset : 294400 sectors
              State : clean
        Device UUID : a3a2e13d:5c50a491:7795795c:485d74a7
        Update Time : Mon Oct  7 22:47:26 2019
           Checksum : b6d79d5c - correct
             Events : 34784074

       Device Role : Active device 1
       Array State : AA ('A' == active, '.' == missing)

  • Mijzelf Posts: 784  Heroic Warrior Member
    I think this is the faulty disk. The raidmanager writes the array status to the header (which you dumped here), and it says  "Array State : AA ('A' == active, '.' == missing)", so according to this header the array consists of 2 healthy disks. Which means this one is the one which was dropped, and no longer updated.
    Your array went down shortly after 'Update Time : Mon Oct  7 22:47:26 2019', correct?
    Can you insert the other 'old' disk and look at its raid header?

    In answer to the other question, I swapped out the disk which was failing (sdb1 presumably)
    No. sdb is the disk, sdb1 and sdb2 are the partitions on that disk. The disk you pulled was sda. Which doesn't mean it will be sda again if you plug it in again. The first disk found on boot is sda, the 2nd one sdb. When 2 disks are inserted the detection is dependent on the slot. With only one disk inside it's always sda, no matter which slot you use.
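    Once you have both old disks back in, a quick way to see which header is the stale one (just a sketch - the sda/sdb names here are an assumption, since they depend on boot order) is to compare the Events counters as root:
    su
    mdadm --examine /dev/sda2 | grep Events
    mdadm --examine /dev/sdb2 | grep Events
    The member with the lower Events count is the one that was dropped from the array and is no longer being updated.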

  • Mr_C Posts: 10  Junior Member
    Hi again - sorry for the delayed response and thanks again for your help.  

    So, I've switched back in what I had thought to be the originally faulty disk in slot 1 (where it came from) and the replacement disk in slot 2.  The result is that I now have a red LED on slot 1 but the NAS is seemingly up.

    The UI is telling me that the volume is degraded but it doesn't want to accept a repair command from the UI - tried once, then won't take the command again despite showing degraded.

    I've rerun the mdadm commands and get the response below.  I think I pulled the correct disk last time but perhaps I should've put original slot 2 into slot 1 to make it sda and the replacement drive would then become sdb (although it's entirely possible I've misunderstood your comments above...).


    Again, any help much appreciated.

     mdadm --examine /dev/sd?2
    /dev/sda2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
               Name : Nelly:0  (local to host Nelly)
      Creation Time : Sun Sep 14 10:15:30 2014
         Raid Level : raid1
       Raid Devices : 2
     Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 90c7d407:3d4cb530:732a5902:96ebd9b1
        Update Time : Fri Oct 11 21:48:12 2019
           Checksum : d02191d0 - correct
             Events : 34785181

       Device Role : Active device 0
       Array State : AA ('A' == active, '.' == missing)
    /dev/sdb2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x2
         Array UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
               Name : Nelly:0  (local to host Nelly)
      Creation Time : Sun Sep 14 10:15:30 2014
         Raid Level : raid1
       Raid Devices : 2
     Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
    Recovery Offset : 215808 sectors
              State : clean
        Device UUID : 1e2e7d0c:2a220cdc:46664cf8:4edf3d6a
        Update Time : Fri Oct 11 21:48:12 2019
           Checksum : 2e7b48dc - correct
             Events : 34785181

       Device Role : Active device 1
       Array State : AA ('A' == active, '.' == missing)



    major minor  #blocks  name
       7        0     143360 loop0
       8        0 1953514584 sda
       8        1     514048 sda1
       8        2 1952997952 sda2
       8       16 1953514584 sdb
       8       17     514048 sdb1
       8       18 1952997952 sdb2
      31        0       1024 mtdblock0
      31        1        512 mtdblock1
      31        2        512 mtdblock2
      31        3        512 mtdblock3
      31        4      10240 mtdblock4
      31        5      10240 mtdblock5
      31        6      48896 mtdblock6
      31        7      10240 mtdblock7
      31        8      48896 mtdblock8

    cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1]
    md0 : active raid1 sdb2[2] sda2[0]
          1952996792 blocks super 1.2 [2/1] [U_]
    unused devices: <none>
       9        0 1952996792 md0






  • Mijzelf Posts: 784  Heroic Warrior Member
    I suppose the last line of '/proc/mdstat' in your post is actually the last line in '/proc/partitions'?

    The disk is partitioned, and assigned to the raid array. But the array isn't synced yet. Nor is it syncing. If it was syncing, /proc/mdstat should show a percentage. Now it only shows the second member isn't up. ([U_])
    The header of sdb2 shows 'Recovery Offset : 215808 sectors', which I think means that it has synced 215808 sectors. About 100MB, that's about 2 seconds of syncing.
    (And now I look back, your 'original' sdb2 also showed a 'Recovery Offset' of 294400 sectors. Which is about the same number. Weird)

    Did you somehow interrupt the process after initiating the rebuild? You are aware that the rebuild will take about 40000 seconds, 10 hours?

    Let's manually initiate a new rebuild, to see what happens. To do so we have to remove sdb2 (the partition on the new disk) from the array, zero out the raid header, and add it again. The raidmanager will start a new rebuild.
    mdadm --remove /dev/md0 /dev/sdb2
    mdadm --zero-superblock /dev/sdb2
    mdadm --add /dev/md0 /dev/sdb2
    After this you'll have to leave the box alone, until the sync is ready. 'cat /proc/mdstat' should show a percentage.
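    If you want to keep an eye on the rebuild without babysitting the prompt, something like this should do (a sketch - I'm assuming the box's busybox includes the watch applet, otherwise just repeat the cat by hand):
    watch -n 60 cat /proc/mdstat
    mdadm --detail /dev/md0 also shows the rebuild progress while it is running; when it's done, /proc/mdstat should show [UU] again.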



  • Mr_C Posts: 10  Junior Member
    Hello again, so, various faffing about later and:

     mdadm --remove /dev/md0 /dev/sdb2
    mdadm: hot remove failed for /dev/sdb2: Device or resource busy

    I can't seem to get past this.
    I've also tried getting the array to repair again through the UI but it got precisely nowhere before it bombed out unaided.

    Below is the output from kicking off the repair; it bombed out in the space of about a minute or two (instead of 13 hours).

    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1]
    md0 : active raid1 sdb2[2] sda2[0]
          1952996792 blocks super 1.2 [2/1] [U_]
          [>....................]  recovery =  0.0% (107840/1952996792) finish=88419.7min speed=368K/sec
    unused devices: <none>
    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1]
    md0 : active raid1 sdb2[2] sda2[0]
          1952996792 blocks super 1.2 [2/1] [U_]
    unused devices: <none>

    Any thoughts?

  • Mr_C Posts: 10  Junior Member
    edited October 13
    So, after a bit more messing about I've found the following - can I presume that the last line "spare rebuilding" is a dead giveaway as to what the wretched thing is doing? D'oh.

    ~ # mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Sun Sep 14 10:15:30 2014
         Raid Level : raid1
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
        Update Time : Sun Oct 13 17:56:11 2019
              State : clean, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1
               Name : Nelly:0  (local to host Nelly)
               UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
             Events : 34813639
        Number   Major   Minor   RaidDevice State
           0       8       18        0      active sync   /dev/sdb2
           2       8        2        1      spare rebuilding   /dev/sda2
    ~ # mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Sun Sep 14 10:15:30 2014
         Raid Level : raid1
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
        Update Time : Sun Oct 13 17:57:17 2019
              State : clean, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1
               Name : Nelly:0  (local to host Nelly)
               UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
             Events : 34813675
        Number   Major   Minor   RaidDevice State
           0       8       18        0      active sync   /dev/sdb2
           2       8        2        1      spare rebuilding   /dev/sda2

  • Mijzelf Posts: 784  Heroic Warrior Member
    'Spare rebuilding' is what it should do, and according to your /proc/mdstat it indeed started to do so, but stopped quickly.
    That behavior is reproducible, it seems. I wonder if something is wrong with the source disk, that it always stops around 100MB.

    Can you execute
    dd if=/dev/sdb2 of=/dev/null bs=16M count=64
    and see if that throws an error? This will copy the first 1000MB from /dev/sdb2 (which is the active raid member) to /dev/null (nowhere). Basically the same as a resync does, except for the destination. If this fails at +/- 100MB, your sdb disk has a problem.
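    If dd does throw an error, the kernel log should tell you which sector it choked on. Assuming dmesg is available on the stock firmware:
    dmesg | tail -n 30
    Look for read errors against the sd device logged right after the dd run.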
  • Mr_C Posts: 10  Junior Member
    This would be the error.  It is the originally faulty disk mind you.

    dd if=/dev/sdb2 of=/dev/null bs=16M count=64
    dd: /dev/sdb2: Input/output error


    I'm just hoping the rebuilding spare is going to get there....
  • Mijzelf Posts: 784  Heroic Warrior Member
    I'm just hoping the rebuilding spare is going to get there....
    Apparently not. The rebuild is interrupted each time you start it. It doesn't jump over an unreadable section.

    The original 'good' disk didn't assemble either. Assuming that disk is healthy, we can try to repair the array, which basically means creating a new header. The content remains untouched.

    Remove both disks, and put back the original 'good' disk. Double check it still doesn't assemble. According to your header, the command to create a new array is:

    mdadm --stop /dev/md0
    mdadm --create --assume-clean --level=1 --raid-devices=2 --metadata=1.2 /dev/md0 missing /dev/sda2
    The 'assume-clean' tells the raid manager not to touch the content of the array. Don't know if that actually matters with raid1, but it won't hurt. The array is built with device role 0 'missing', so you can insert your new disk there once the array is up.

    Maybe the stop command will fail. I don't know if it's actually 'up'.

    After creating the array you might have to reboot, to get the volume.
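    Once the array and the volume are back, you can put the new disk in and add it to the array. Roughly (a sketch - the sdb name and the partition layout are assumptions, so check /proc/partitions first and make sure the new disk has the small firmware partition plus a big second partition, like the old one):
    cat /proc/partitions
    mdadm --add /dev/md0 /dev/sdb2
    cat /proc/mdstat
    The --add starts a fresh rebuild onto the new disk; leave the box alone until the recovery line in /proc/mdstat is gone and it shows [UU].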

  • Mr_C Posts: 10  Junior Member
    Thanks for that.  Just to be clear though, the NAS currently has two drives - sdb2 is the faulting drive, sda2 is the drive originally in the volume currently "rebuilding".  The new drive is not in the NAS presently.  So, do I follow the process you've suggested above or switch sda2 out for the replacement drive?  I think the answer is to leave as is but I'm just conscious I'm a desperate noob and could well screw this up utterly....

    Again, you are being awesome Mijzelf - thank you for your support in this and sorry for being dense.