Date: Wed, 03 Feb 2010 12:26:44 +0100
From: Attila Nagy <bra@fsn.hu>
To: Matthias Gamsjager <mgamsjager@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: Machine stops for some seconds with ZFS
Message-ID: <4B695D74.6040003@fsn.hu>
In-Reply-To: <585602e11002030239y3da31f7bkbf593a04950c351e@mail.gmail.com>
References: <4B694689.2030704@fsn.hu> <585602e11002030239y3da31f7bkbf593a04950c351e@mail.gmail.com>
Slower in what regard? In sequential read (which is meaningless here), maybe. But in random read and latency? Absolutely not. Compare these:

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1     64     64   4071    8.7      0      0    0.0   55.4| ad0s1d.eli
    0     44     44   2799    7.1      0      0    0.0   31.0| ad1s1d.eli
    1   1208   1208   1908    0.8      0      0    0.0   78.8| da0

An average consumer SATA drive can push about 120 IOPS with 8-10 ms seek time. An average consumer USB pendrive can do more than 10 times better than that (in both IOPS and latency):

SATA drive: (64/55.4)*100 = 115 IOPS, latency: about 8 ms
USB drive:  (1208/78.8)*100 = 1530 IOPS, latency: about 0.8 ms

This is the essence of Windows ReadyBoost and ZFS's L2ARC.

There is absolutely no IO (no nothing) towards the disks (be it HDD or SSD), so this is not because of the cache. (Yes, I've tried without it, and the freeze also comes without an L2ARC device.)

Matthias Gamsjager wrote:
> What's the point in having a cache device that is slower than the
> hard disks themselves?
> Could you please try the build without the slow cache device?
>
> On Wed, Feb 3, 2010 at 10:48 AM, Attila Nagy <bra@fsn.hu> wrote:
>
>> Hello,
>>
>> After a long time, I've switched back to ZFS on my desktop. It runs
>> 8-STABLE/amd64 with two SATA disks and a USB pendrive.
>> One partition from each disk is used for the zpool, which is encrypted
>> using GELI, and the pendrive is there for L2ARC:
>>         NAME            STATE     READ WRITE CKSUM
>>         data            ONLINE       0     0     0
>>           mirror        ONLINE       0     0     0
>>             ad0s1d.eli  ONLINE       0     0     0
>>             ad1s1d.eli  ONLINE       0     0     0
>>         cache
>>           da0           ONLINE       0     0     0
>>
>> Today, after 12 days of uptime, the machine froze. I could ping it from
>> a different machine, and could even open a telnet connection to its ssh
>> port, but I couldn't get the ssh banner.
>>
>> Now I'm building a 9-CURRENT kernel and world to see whether the same
>> problem persists with that, and during the make process I've noticed a
>> strange thing.
>> I build with -j4 (the machine has one dual-core CPU), so the fans are
>> screaming during the process. But every few minutes (I couldn't recognize
>> any pattern in it) the machine goes completely silent (even more silent
>> than normal), and everything halts.
>> During this, the top running on the machine can refresh itself, and I can
>> type on pass-through ssh connections (that is, I use the machine in question
>> to access other machines with ssh), but I can't open new ssh connections to
>> it, and can't start anything new (for example from an open shell).
>> ping runs seamlessly during this, and top shows the following:
>>
>> last pid: 36503;  load averages:  1.59,  3.04,  3.01   up 0+00:49:53  10:32:10
>> 97 processes:  1 running, 96 sleeping
>> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
>> Swap: 4096M Total, 4096M Free
>>
>>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>  1342 root         1  44    0  3204K   620K select 0   0:02  0.00% make
>>  1424 root         1  44    0  3204K  1036K select 0   0:01  0.00% make
>>  1280 root         1  44    0 12540K  1900K select 0   0:01  0.00% hald-addon-storage
>>  1234 haldaemon    1  44    0 24116K  4464K select 0   0:01  0.00% hald
>> 93600 root         1  44    0  3204K  1028K select 0   0:00  0.00% make
>>  1260 root         1  44    0 19704K  2688K select 0   0:00  0.00% hald-addon-mouse-sy
>> 15142 bra          1  44    0  9332K  2864K CPU0   0   0:00  0.00% top
>>  1263 root         1  44    0 12540K  1896K cgticb 0   0:00  0.00% hald-addon-storage
>> 94415 bra          1  44    0 37944K  4992K select 1   0:00  0.00% sshd
>> 35837 root         1  44    0  5252K  2424K select 1   0:00  0.00% make
>> 95361 bra          1  44    0 37944K  4992K select 1   0:00  0.00% sshd
>> 35973 root         1  44    0  3204K  1772K select 0   0:00  0.00% make
>>   608 root         1  44    0  6892K  1436K select 1   0:00  0.00% syslogd
>> 96928 root         1  44    0  3204K   728K select 0   0:00  0.00% make
>> 94369 root         1  51    0 37944K  4584K sbwait 0   0:00  0.00% sshd
>> 82631 root         1  50    0 37944K  4584K sbwait 0   0:00  0.00% sshd
>> 16304 root         1  44    0 37944K  4576K zio->i 1   0:00  0.00% sshd
>>   951 _ntp         1  44    0  6876K  1692K select 0   0:00  0.00% ntpd
>>  1238 root         1  76    0 16768K  2372K select 0   0:00  0.00% hald-runner
>>  4916 root         1  44    0  3204K   728K select 1   0:00  0.00% make
>> 95338 root         1  49    0 37944K  4584K sbwait 1   0:00  0.00% sshd
>>  1259 root         1  44    0 10280K  2712K pause  1   0:00  0.00% csh
>> 33357 bra          1  44    0 21596K  4004K select 0   0:00  0.00% ssh
>> 16405 bra          1  44    0 37944K  5012K zio->i 0   0:00  0.00% sshd
>>  1044 root         1  44    0  9104K  1796K kqread 0   0:00  0.00% master
>> 34765 root         1  76    0  8260K  1764K wait   1   0:00  0.00% sh
>> 82685 bra          1  44    0 37944K  4960K select 1   0:00  0.00% sshd
>>  1065 postfix      1  44    0  9100K  1872K kqread 0   0:00  0.00% qmgr
>>  1237 root        17  44    0 27460K  4124K waitvt 0   0:00  0.00% console-kit-daemon
>> 95362 bra          1  44    0 10216K  2612K ttyin  0   0:00  0.00% bash
>> 34764 root         1  44    0  3204K   852K select 0   0:00  0.00% make
>>  1222 root         1  49    0 21672K  1896K wait   0   0:00  0.00% login
>> 35728 root         1  44    0  3204K   860K select 0   0:00  0.00% make
>>  1064 postfix      1  44    0  9104K  1772K zio->i 1   0:00  0.00% pickup
>> 82696 bra          1  44    0 10216K  2596K wait   0   0:00  0.00% bash
>> 94417 bra          1  44    0 10216K  2596K wait   1   0:00  0.00% bash
>> 35455 root         1  44    0  3204K   744K select 0   0:00  0.00% make
>> 35774 root         1  44    0  3204K   728K select 1   0:00  0.00% make
>> 16409 bra          1  44    0 10216K  2592K ttyin  0   0:00  0.00% bash
>>  1155 root         1  44    0  7948K  1604K nanslp 0   0:00  0.00% cron
>>  1077 messagebus   1  53    0  8092K  2060K select 0   0:00  0.00% dbus-daemon
>>  1149 root         1  44    0 26012K  3960K select 1   0:00  0.00% sshd
>> 35729 root         1  76    0  8260K  1760K wait   0   0:00  0.00% sh
>>  4921 root         1  57    0  8260K  1748K wait   0   0:00  0.00% sh
>>   825 root         1  76    0 39212K  2372K lockf  1   0:00  0.00% saslauthd
>> 35460 root         1  76    0  8260K  1748K wait   0   0:00  0.00% sh
>> 34761 root         1  48    0  8260K  1740K wait   1   0:00  0.00% sh
>> 96923 root         1  50    0  8260K  1740K wait   0   0:00  0.00% sh
>>
>> As you can see, top reports that the machine is 100% idle, while a
>> make -j4 buildworld runs.
>> This lasts for a few seconds (10-20), then everything goes back to
>> normal: the fans start to scream, the build continues and I can use
>> the machine.
>> This occasional halt is new to me (I've just switched to ZFS on my
>> desktop; on a server it's harder to notice if you don't use it for
>> interactive sessions), but I have seen the final freeze on more than
>> one server.
>> How could I help to debug this, and the final one?
>>
>> Thanks,
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
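[Editor's note: the busy-normalized IOPS figures in the reply above can be checked with a couple of lines of Python. This is a minimal sketch of the arithmetic only; the numbers come from the quoted gstat sample, and `saturated_iops` is an illustrative helper, not part of any FreeBSD tool.]

```python
# Extrapolate per-device IOPS at 100% utilization from a gstat sample:
# the observed ops/s scaled up by the observed %busy.
def saturated_iops(ops_per_sec, busy_pct):
    """Estimate sustainable IOPS if the device were 100% busy."""
    return ops_per_sec / busy_pct * 100.0

# Values from the gstat output quoted above.
sata = saturated_iops(64, 55.4)    # ad0s1d.eli (SATA disk)
usb = saturated_iops(1208, 78.8)   # da0 (USB pendrive, L2ARC)

print(int(sata), int(usb))  # roughly 115 and 1530 IOPS
```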