
NAS542 extremely slow write speed, CPU 100 %

TomasMalina  Posts: 16  Junior Member
edited January 12 in Questions
Hi, 
I have a NAS542 set up in RAID5 with 4x8TB disks. When I start a data transfer from a USB3 disk to the NAS, the write speed is extremely low (9-10MB/s). The CPU is almost constantly maxed out, and Tweaks by Mijzelf shows a load average of around 6, which is horrible. When I tested it as JBOD, the write speed was a little higher, 26-27MB/s, but again the CPU was maxed out. The disks on their own, connected directly rather than through the NAS, write at 110-120MB/s.
I can't test the speeds over ethernet right now, still waiting for a Gbit router to arrive.
(All these speeds are sequential, GB archives.)
Do I have a faulty unit?

PS more details:
The external USB3 disk is a 3TB WD formatted to NTFS and a 750GB Samsung also NTFS.
I have already tried disabling all the resource-consuming options (Twonky, thumbnails; in Tweaks I disabled python Twonky, fileye and the recycle bin manager).
The file transfer is set up using the Zyxel UI, not from a PC.

I've reset the NAS and created the disk group (RAID5) again. The data transfers seem to run a bit faster: 14.5GB copied in 9:10 minutes. After setting up and starting a file transfer, the status center and Tweaks show this (the CPU usage here is low, 20-45 %, but in the UI it is constantly 70-100 %):


(By the time I finished writing this, the load average had gone above 5 for all the time intervals.)

Mem: 957088K used, 54548K free, 0K shrd, 17248K buff, 731952K cached
CPU: 23.8% usr 28.5% sys  0.0% nic  0.0% idle 47.6% io  0.0% irq  0.0% sirq
Load average: 4.43 3.90 2.93 3/170 3396
 3082     1 root     S    38244  3.7 41.6 python /usr/local/apache/web_framework/job_queue_daemon.pyc
 1623     1 root     S     135m 13.6 12.4 python /usr/local/apache/web_framework/main_wsgi.pyc
18974  3494 nobody   S N  23304  2.2  8.3 /usr/sbin/httpd -f /etc/service_conf/httpd.conf
28928  3494 nobody   S N  22356  2.2  8.3 /usr/sbin/httpd -f /etc/service_conf/httpd.conf
12537  8445 root     R N  25928  2.5  4.1 /usr/sbin/smbd -D
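As a rough sanity check on the figure above, 14.5GB in 9:10 works out to about 26 MB/s, assuming decimal units (1GB = 1000MB). A minimal sketch of that arithmetic; `rate_mbs` is a hypothetical helper name, not something that ships on the NAS:

```shell
# Average throughput in MB/s from a size in MB and a duration given as MM:SS.
# Assumes decimal units (1 GB = 1000 MB); rate_mbs is an illustrative name.
rate_mbs() {
    size_mb=$1
    duration=$2
    awk -v mb="$size_mb" -v t="$duration" 'BEGIN {
        split(t, parts, ":")              # parts[1] = minutes, parts[2] = seconds
        secs = parts[1] * 60 + parts[2]
        printf "%.1f\n", mb / secs
    }'
}

rate_mbs 14500 9:10    # 14.5 GB copied in 9 min 10 s, prints 26.4
```

That is in the same range as the 26-27MB/s seen in the JBOD test, and well above the original 9-10MB/s.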






Best Answer

  • Mijzelf  Posts: 1,259  Paragon Member
    Accepted Answer
    I wonder whether what you are measuring is the webinterface backend itself. Close the webinterface, log in over ssh, run 'top', and do a copy action.
    Then repeat with the webinterface CPU-load page open.

    If the first run gives a significantly lower CPU load, and the difference is some python script in /usr/local/apache/, then the creation of all those fancy charts is the problem.

Answers

  • TomasMalina  Posts: 16  Junior Member
    edited January 13
    My method was to do it in the webinterface (open the file manager there, set up the transfer and then close it all) and wait for the data to copy.
    Result of 'top' from PowerShell:
    Mem: 957392K used, 54244K free, 0K shrd, 105360K buff, 635108K cached
    CPU: 23.4% usr 49.3% sys  0.3% nic 19.8% idle  1.8% io  0.0% irq  5.1% sirq
    Load average: 4.03 3.67 2.83 3/166 11115
      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
     3082     1 root     S    54504  5.3   1 42.6 python /usr/local/apache/web_framework/job_queue_daemon.pyc
    10742     2 root     RW       0  0.0   0 11.8 [md2_raid5]
    23378     2 root     SW       0  0.0   1  6.2 [usb-storage]
     9061     2 root     DW       0  0.0   1  3.7 [flush-253:1]
      298     2 root     SW       0  0.0   1  2.9 [kswapd0]
     1623     1 root     S     135m 13.6   0  2.5 python /usr/local/apache/web_framework/main_wsgi.pyc
      764     2 root     DW       0  0.0   1  1.4 [pfe_ctrl_timer]
    10700     2 root     SW       0  0.0   0  0.8 [kworker/0:4]
     7461  8445 root     S N  42276  4.1   0  0.4 /usr/sbin/smbd -D
    10736     2 root     SW       0  0.0   1  0.4 [kworker/1:1]
    I'll set up the transfer from PowerShell tomorrow; right now I'm copying data, initiated from the file commander in the webinterface.
  • eozrocwd  Posts: 48  Junior Member
    Did you try disabling "Generating thumbnail for multimedia files, which is within share folders." in the myZyXELcloud app?

  • TomasMalina  Posts: 16  Junior Member
    edited January 13
    @eozrocwd Yes, I disabled Twonky media server, thumbnails, in Tweaks I disabled python Twonky, fileye, and the recycle bin manager.

    @Mijzelf Status after initiating file transfer from powershell (e-data is my USB3 disk, i-data is the NAS folder), webinterface closed now. I don't know if I am able to see the transfer speed from powershell, but after it finishes, I'll calculate the transfer speed.

    Mem: 956800K used, 54836K free, 0K shrd, 135936K buff, 598980K cached
    CPU:  3.0% usr 70.2% sys 13.7% nic  0.3% idle  4.2% io  0.0% irq  8.4% sirq
    Load average: 4.86 3.64 2.37 5/180 25212
      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    16548 24892 admin    R     2644  0.2   1 36.3 cp -r /e-data/735674c7b37e3d20652d2c692c7c8f6f/Lab/Archives /i-data/74abab5a/Tomas1
    10742     2 root     RW       0  0.0   0 17.2 [md2_raid5]
    25196 25194 root     S N  17328  1.7   1 14.4 python /usr/local/apache/web_framework/portal/MZCA_Auto_Install.pyc
    23378     2 root     SW       0  0.0   0  8.2 [usb-storage]
    16574     2 root     DW       0  0.0   0  4.8 [flush-253:1]
      298     2 root     SW       0  0.0   1  4.4 [kswapd0]
     1623     1 root     S     135m 13.6   1  2.5 python /usr/local/apache/web_framework/main_wsgi.pyc
     5354     2 root     SW       0  0.0   0  1.0 [kworker/0:2]
      764     2 root     RW       0  0.0   1  0.9 [pfe_ctrl_timer]
    28669     2 root     SW       0  0.0   0  0.7 [kworker/0:0]
        3     2 root     SW       0  0.0   0  0.6 [ksoftirqd/0]
     1311     2 root     SW<      0  0.0   0  0.6 [loop0]
    17892     2 root     SW       0  0.0   1  0.6 [kworker/1:4]
    25165  8482 root     S     5484  0.5   1  0.4 {sshd} sshd: admin@pts/1
        9     2 root     SW       0  0.0   1  0.4 [ksoftirqd/1]
    12381     2 root     SW       0  0.0   0  0.3 [jbd2/dm-1-8]
    25171 25166 admin    R     2756  0.2   1  0.2 top
    18601     2 root     SW       0  0.0   0  0.2 [kworker/0:5]
     3082     1 root     S    54372  5.3   1  0.1 python /usr/local/apache/web_framework/job_queue_daemon.pyc
    10656     1 root     S    10416  1.0   0  0.1 /sbin/DAV_httpd -f /etc/service_conf/httpd_dav.conf
    10118     2 root     SW       0  0.0   1  0.1 [kworker/1:5]
    17185     2 root     SW       0  0.0   1  0.1 [kworker/1:3]
    22741  8445 root     S N  32036  3.1   0  0.0 /usr/sbin/smbd -D
     8471  8445 root     S N  25820  2.5   1  0.0 /usr/sbin/smbd -D
     8445     1 root     S N  25812  2.5   0  0.0 /usr/sbin/smbd -D
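Plain `cp` prints no transfer rate, so one way to get a number afterwards is to time the copy and divide the copied size by the elapsed seconds. A minimal sketch, assuming `date +%s`, `du -sk`, `sync` and `awk` are available in the NAS shell (not verified on the NAS542 firmware); `timed_copy` is a hypothetical helper name:

```shell
# Copy a directory tree and report the average rate.
# timed_copy is an illustrative helper, not part of the NAS firmware.
timed_copy() {
    src=$1
    dst=$2
    start=$(date +%s)
    cp -r "$src" "$dst" || return 1
    sync                                   # include the time to flush cached writes
    end=$(date +%s)
    kb=$(du -sk "$src" | awk '{ print $1 }')
    awk -v kb="$kb" -v s="$((end - start))" 'BEGIN {
        if (s < 1) s = 1                   # avoid dividing by zero on tiny copies
        printf "%d KB in %d s, about %.1f MB/s\n", kb, s, kb / 1024 / s
    }'
}

# Example, with placeholder paths:
# timed_copy /e-data/<usb-volume>/Lab/Archives /i-data/<nas-volume>/Tomas1
```

The one-second resolution of `date +%s` makes this meaningful only for copies that run for at least a few minutes.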
  • TomasMalina  Posts: 16  Junior Member
    edited January 13
    @Mijzelf: OK, approx. 10 % of the files have transferred and the transfer speed from USB3 to the NAS is approx. 45-50 MB/s. This is an improvement, thank you.
    However, I'm still a little worried about the load average. Even when idle it stays above 1, which doesn't look right (the file transfer terminated ~10 minutes ago). Shouldn't it be more in the region of 0.1 or something? (I did not monitor the NAS during those ~10 minutes; I left it alone and then ran the top command.)
    Mem: 947412K used, 64224K free, 0K shrd, 139644K buff, 595988K cached
    CPU:  1.9% usr  0.5% sys  0.0% nic 95.2% idle  2.4% io  0.0% irq  0.0% sirq
    Load average: 1.06 1.57 2.54 2/164 27555
  • Mijzelf  Posts: 1,259  Paragon Member
    "Shouldn't it be more in the region of 0.1 or something?"
    Yes, I think so. Is the network LED blinking like crazy?

    BTW, 1.0 is not as bad as it sounds. It's a dual core box, so 2.0 is the new 1.0.
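The rule of thumb above (divide the load average by the number of cores) can be sketched as follows. This assumes a Linux `/proc` that provides `/proc/loadavg` and lists one `processor` line per core in `/proc/cpuinfo`; `load_per_core` is a hypothetical helper name:

```shell
# Load average divided by core count; about 1.0 means every core is busy.
# load_per_core is an illustrative name, not a standard tool.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On the box itself: 1-minute load from /proc/loadavg, cores from /proc/cpuinfo.
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    cores=$(grep -c '^processor' /proc/cpuinfo || true)
    [ "$cores" -gt 0 ] || cores=1    # fall back if cpuinfo lists no processor lines
    load_per_core "$load" "$cores"
fi
```

With the idle reading from this thread, `load_per_core 1.06 2` gives 0.53.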
  • TomasMalina  Posts: 16  Junior Member
    edited January 13
    No, the LED next to the LAN port is blinking sporadically, most of the time it's just on, shining. However, it blinks once in a while even though I am not accessing the NAS.

    Oh, right, but still, 1.0 should mean something like 50 % CPU usage, right? The CPU usage is close to 0 % when idle. No matter how long I leave it alone, it doesn't drop below 1.

    Also, I have finally connected the Gbit LAN. The write speed over LAN is 60-70 MB/s (Task Manager measures 500-550Mbps) for sequential writes, so the slow USB copy was caused by starting it from the webinterface.
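The two figures agree once bits are converted to bytes: dividing Mbps by 8 gives MB/s, ignoring protocol overhead. A small sketch of the conversion (`mbps_to_mbs` is a hypothetical helper name):

```shell
# Convert a line rate in megabits per second to megabytes per second.
# Ignores protocol overhead; mbps_to_mbs is an illustrative name.
mbps_to_mbs() {
    awk -v mbps="$1" 'BEGIN { printf "%.1f\n", mbps / 8 }'
}

mbps_to_mbs 500    # lower end reported by Task Manager
mbps_to_mbs 550    # upper end
```

So 500-550Mbps corresponds to roughly 62 to 69 MB/s, which matches the 60-70 MB/s observed.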
  • Mijzelf  Posts: 1,259  Paragon Member
    "Oh, right, but still, 1.0 should mean something like 50 % CPU usage, right?"
    Yes. And if top doesn't show the culprit, I think that's because the processes are too short-lived to get a significant amount of CPU time, and so never show up in top.

    You can check that by executing

    ls -l /proc/self

    a few times. This shows the (virtual) symlink to the process PID directory of ls. If this PID makes big jumps, the system is burning PIDs fast. On a completely idle system the PID would increment by one each time.
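The repeated `ls -l /proc/self` check can be automated by sampling the PID twice and dividing the difference by the interval. A rough sketch, assuming `readlink` is available on the box; `pid_rate` and `sample_pid_rate` are hypothetical helpers, and the sampling itself consumes a few PIDs per round, so the result is only approximate:

```shell
# PIDs consumed per second between two samples taken `secs` seconds apart.
# Ignores PID wrap-around at the kernel's pid_max.
pid_rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Take two samples `secs` seconds apart and print the rate.
sample_pid_rate() {
    secs=$1
    first=$(readlink /proc/self)     # PID of this readlink process
    sleep "$secs"
    second=$(readlink /proc/self)
    pid_rate "$first" "$second" "$secs"
}

# Example: sample_pid_rate 10
```

A value close to the sampling overhead (a fraction of a PID per second over a 10 second window) would indicate an idle system; a much larger value suggests something is spawning processes in the background.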

    I'm sure there should be a way to log all starting processes, but off the top of my head I wouldn't know. A poor man's approach could be to capture the output of several 'ps' runs, and compare them to see if you caught it alive.
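The snapshot comparison could look roughly like this: dump `ps` to numbered files in a loop, then count in how many snapshots each command line appears, so that short-lived processes stand out as rare entries. A sketch only; `rare_commands` is a hypothetical helper, and it assumes each snapshot file holds one command line per line (with real `ps` output the PID and other leading columns would need stripping first, since their layout varies between `ps` implementations):

```shell
# Print command lines that appear in fewer than `limit` of the snapshot files.
# rare_commands is an illustrative helper; input files hold one command per line.
rare_commands() {
    limit=$1
    shift
    for f in "$@"; do
        sort -u "$f"                 # count each command once per snapshot
    done | sort | uniq -c | awk -v limit="$limit" '$1 < limit {
        sub(/^[ \t]*[0-9]+[ \t]+/, "")    # drop the count column
        print
    }'
}

# Example: rare_commands 3 /tmp/ps.1 /tmp/ps.2 /tmp/ps.3
```

Anything printed was alive in only some of the snapshots, which makes it a candidate for the process that keeps starting and exiting.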
  • TomasMalina  Posts: 16  Junior Member
    Ok, the PIDs keep rising by approximately 10 every 10 seconds, but sometimes I see a burst where the count jumps by tens.
    I'm trying to catch it with ps, so far I have just noticed two irregularly active processes:
    python /usr/local/apache/web_framework/portal/MZCA_Auto_Install.pyc
    /bin/sh -c /usr/bin/python /usr/local/apache/web_framework/portal/MZCA_Auto_Install.pyc
    Some other rarely occurring extra processes in the ps log are:
    sleep 10
    sh -c /usr/bin/delete_unused_semaphore.sh
    {delete_unused_s} /bin/sh /usr/bin/delete_unused_semaphore.sh
    sh -c i2cget -y 0x0 0x0a 0x07
    /usr/sbin/pwauth

    However, I have not seen (in the 4000+ logs I've looped through) any burst, the list is always the same length (same character count), apart from those minor deviations mentioned above.
  • Mijzelf  Posts: 1,259  Paragon Member
    "python /usr/local/apache/web_framework/portal/MZCA_Auto_Install.pyc"

    That is the ZyXELcloud auto installer. It runs once every 6 hours. Good catch!

    The i2cget call asks for the motherboard temperature. It's launched by /usr/sbin/wd_app, which is always running and does some maintenance. It turns on the beeper when the box is too hot. It also runs /usr/bin/delete_unused_semaphore.sh. The latter masks a bug somewhere: a process should clean up its own semaphores.


  • TomasMalina  Posts: 16  Junior Member
    Do you think this masked bug could be what's causing the higher load average, or those would be just network pings/calls (indicated by the spikes in PID number)? Would a firmware reinstall (from here I guess? ftp://ftp.zyxel.com/NAS542/firmware) help?
  • Mijzelf  Posts: 1,259  Paragon Member
    "Do you think this masked bug could be what's causing the higher load average,"

    Unlikely. It's hard to presume a link between leaking a semaphore and starting an excessive number of processes. Furthermore, my NAS520 doesn't have that problem, and it has that script too.

    "or those would be just network pings/calls"
    Maybe. But then I would expect a blinking LED. I'm not sure if a ping will burn a PID. It's handled by the kernel, and I don't know if it will start a (kernel) process for that.
    "Would a firmware reinstall help?"
    I doubt it. But it won't hurt.

  • TomasMalina  Posts: 16  Junior Member
    Okay, thank you for your help and time. 