
NAS540 - RAID5 Crash --> Volume lost

Detlef_M Posts: 4  Junior Member
edited April 2018 in Discussions
Hi everyone,

I urgently need help recovering my RAID5 volume. When the copy process to the NAS stopped unexpectedly, I could no longer access the device, so I restarted it. Afterwards I received the following error message in the GUI. The individual HDDs show no defects, and the SMART values look fine. However, there is no option to "repair" this error. What do I have to do so that the data is not lost for good?

Thanks in advance for your help.
Detlef








#NAS_April

Comments

  • Mijzelf Posts: 1,243  Heroic Warrior Member
    Can you enable the ssh server (Control panel -> Network -> Terminal), log in as admin over ssh (on Windows you can use PuTTY for that), and post the output of
    cat /proc/mdstat

  • Detlef_M Posts: 4  Junior Member
    Hi Mijzelf,

    here is the output of the commands:
    $ cat /proc/mdstat
    # mdadm --examine /dev/sd[abcd]3

    Thanks for your efforts so far.
    Detlef
  • Mijzelf Posts: 1,243  Heroic Warrior Member
    The output of 'mdadm --examine' is interesting. Here the 'Array State' tells what the separate members 'think' about the state of their array, and 'Update Time' tells *when* they thought so.
    As you can see, 3 members say .A.A, which means only 2 valid members, and thus the array is down, as it needs at least 3 members.
    Sdc3 says AAAA, which means up and redundant. Sdc3 has an 'Update Time' of 2018-04-21 22:49:48, while the others have 2018-04-22 00:16:07.
    So somewhere after 04-21 22:49:48 sdc3 was dropped from the array, and from then on that member was no longer updated. The array was degraded.
    At 04-22 00:16:07 the array went down, because for some reason the 'Device Role' of sda3 changed from 'Active device 0' to 'spare'.
    When the copy process to the NAS stopped unexpectedly
    I guess that was at 04-22 00:16:07 UTC?

    Anyway, as you were actively copying to the NAS, the contents of sdc3 are no longer usable in this array, except in case of emergency.

    It's not clear to me why sda3 became a spare member, but as the array went down immediately, sd[abd]3 should contain a valid filesystem, except for the last writing action.
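    The fields discussed above can be pulled out of each member's superblock in one go; a small sketch (the field names are as printed by mdadm --examine, and md2 / sd[abcd]3 are the names used in this thread):

    ```shell
    # Compare 'Update Time', 'Array State' and 'Device Role'
    # across all four RAID members at a glance
    for part in /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3; do
        echo "== $part =="
        mdadm --examine "$part" | grep -E 'Update Time|Array State|Device Role'
    done
    ```

    The member whose 'Update Time' lags behind the others is the one that was dropped first.
    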

    You can re-create the array, using the same settings as were originally used:
    mdadm --stop /dev/md2
    (those are two lines, each starting with mdadm)

    Some settings are the defaults, according to https://linux.die.net/man/8/mdadm, but I add them for safety and completeness.
    Here --assume-clean tells mdadm the partitions contain a valid array, and the 'missing' keyword tells that the device with role '2' is missing. The roles 0 - 3 are assigned in the order in which you specify the partitions here.

    The sequence of the other arguments matters to mdadm, and I don't know if this is the right sequence. Fortunately, it will tell you if it's wrong, and it will also tell you what it should be.
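    For reference, a sketch of what such a re-create could look like on this box. The level, device count, member order and the 'missing' slot follow the --examine output discussed above, but the metadata version and chunk size here are assumptions — confirm both against your own 'mdadm --examine' output before running anything:

    ```shell
    # Stop the broken array first
    mdadm --stop /dev/md2

    # Re-create it in place. --assume-clean: treat the members as
    # already containing valid data; 'missing' stands in for sdc3
    # (device role 2), so roles 0,1,2,3 map to sda3,sdb3,missing,sdd3.
    # NOTE: --metadata=1.2 and --chunk=64K are assumptions here.
    mdadm --create /dev/md2 --level=5 --raid-devices=4 \
          --metadata=1.2 --chunk=64K --assume-clean \
          /dev/sda3 /dev/sdb3 missing /dev/sdd3
    ```
    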


  • Detlef_M Posts: 4  Junior Member
    edited April 2018
    I guess that was at 04-22 00:16:07 UTC?
    I don't know the exact time but the time frame fits.

    Ok, I understand most of the points you mentioned. Before I execute the given instructions, however, I have a few questions.
    1. If it doesn't work, will the data be lost forever?
    2. What happens with the "missing" sdc3? Will it be integrated back into the array?
    3. Will the status "spare" change to "active device x" (on sda3) automatically after the execution?
    4. cat /proc/mdstat shows a different sequence of partitions. How do I know that the sequence (on your second mdadm --create ... line) is correct? Will there be an error if not, or would it directly cause data loss?
    I'm sorry, but I'm not a specialist in this field.

    Thank you for your competent support!
  • Mijzelf Posts: 1,243  Heroic Warrior Member
    If it doesn't work, will the data be lost forever?
    If you (or somebody you pay for it) fail in re-assembling or re-creating the array, and you have no backups, the data is lost.
    What happens with the "missing" sdc3? Will it be integrated back into the array?
    *A* disk can be integrated, which means that all parities are calculated, and redundancy is restored. If you add this disk to the array, it will be treated as a new, empty disk. The data on the disk is useless, because it's out of sync, and the raid manager has no way to check which parts of the data are still usable.
    Will the status "spare" change to "active device x" (on sda3) automatically after the execution?

    The command builds a new array, without touching the data on the array. So it does not actually switch the status back, it generates a new 'active device x'.

    In theory it's possible to only change the status of sda3, but I don't know how. (Only changing the status of that member is not enough BTW, you also have to change the 'Array state' on all members sd[abd]3. If the members don't agree on the status of the array, it will not be assembled.)

    cat /proc/mdstat shows a different sequence of partitions. How do I know that the sequence (on your second mdadm --create ... line) is correct? Will there be an error if not, or would it directly cause data loss?
    mdstat says 'sdb3[1](S) sda3[4](S) sdd3[3](S)', which means sdb3 is disk 1 (counting from 0), sdd3 is disk 3, and sda3 is disk 4. So sda3 is the 5th disk in a 4-disk array, which automatically makes it a (hot) spare.
    As far as I see that is the same info as 'mdadm --examine' gives for the sequence. But correct me if I'm wrong.

    If you would create an array with the wrong sequence, it simply doesn't produce a valid filesystem, so it can't be mounted, and nothing will be written to it.
    You can retry with another sequence.
    Only if you specify a wrong metadata version can you get into bigger trouble. The header of a raid member can be located at the start or the end of the partition, depending on the metadata version. So if you specify the wrong one, the newly created raid header will overwrite part of the filesystem on the array.
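    Before trusting a re-created array, it is worth checking it strictly read-only first; a sketch, assuming an ext4 data volume and that the mount point /mnt/test exists (both assumptions):

    ```shell
    # Does the re-created array look sane?
    mdadm --detail /dev/md2

    # Non-destructive filesystem check (-n answers 'no' to all repair prompts)
    e2fsck -n /dev/md2

    # Mount read-only and inspect the data before writing anything
    mount -o ro /dev/md2 /mnt/test
    ls /mnt/test
    ```

    If the sequence was wrong, e2fsck and mount will fail without having changed anything, and another order can be tried.
    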
  • Detlef_M Posts: 4  Junior Member
    Thanks a lot for the detailed information.

    I executed the commands. There were no problems. After that, I could mount the new array (read-only).

    Then I ran --zero-superblock on /dev/sdc3 and added it back to the new array as a "clean" member. The recovery phase is currently running.
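    For anyone following along, the re-add steps described above would look roughly like this (device names as used in this thread):

    ```shell
    # Wipe the stale raid superblock so sdc3 joins as a fresh disk
    mdadm --zero-superblock /dev/sdc3

    # Add it back; the array then rebuilds parity onto it
    mdadm --add /dev/md2 /dev/sdc3

    # Watch the recovery progress
    cat /proc/mdstat
    ```
    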



    I hope that everything now goes through and that I can set up a share at the end. Alternatively, I will copy the data to another disk via USB.

    Thank you very much for your thorough support.
    Detlef