Date: Mon, 3 Jun 2013 15:34:26 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Ross Alexander <rwa@athabascau.ca>
Cc: freebsd-stable@freebsd.org
Subject: Re: 9.1-current disk throughput stalls ?
Message-ID: <20130603223425.GA51402@icarus.home.lan>
In-Reply-To: <alpine.BSF.2.00.1306031433130.1926@autopsy.pc.athabascau.ca>
References: <alpine.BSF.2.00.1306030844360.79095@auwow.bogons> <20130603203146.GB49602@icarus.home.lan> <alpine.BSF.2.00.1306031433130.1926@autopsy.pc.athabascau.ca>
On Mon, Jun 03, 2013 at 03:48:30PM -0600, Ross Alexander wrote:
> On Mon, 3 Jun 2013, Jeremy Chadwick wrote:
>
> >1. There is no such thing as 9.1-CURRENT.  Either you meant 9.1-STABLE
> >(what should be called stable/9) or -CURRENT (what should be called
> >head).
>
> >I wrote:
> >>The oldest kernel I have that shows the syndrome is -
> >>
> >>  FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498:
> >>  Sat May 11 00:03:15 MDT 2013
> >>  toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC  amd64
>
> See above.  You're right, I shouldn't post after a 07:00 dentist's
> appt while my spouse is worrying me about the ins adjustor's report
> on the car damage :(.  Hey, I'm very fallible.  I'll try harder.
>
> >2. Is there some reason you excluded details of your ZFS setup?
> >"zpool status" would be a good start.
>
> Thanks for the useful hint as to what info you need to diagnose.
>
> One of the machines ran a 5 drive zraid-1 pool (Mnemosyne).
>
> Another was a 2 drive gmirror, in the simplest possible gpart/gmirror
> setup.  (Mnemosyne-sub-1.)
>
> The third is a 2 drive ZFS raid-1, again in the simplest possible
> gpart/gmirror manner (Aukward).
>
> The fourth is a conceptually identical 2 drive ZFS raid-1, swapping
> to a zvol (Griffon.)
>
> If you look on the FreeBSD wiki, the pages that say "bootable zfs
> gptzfsboot" and "bootable mirror" -
>
>    https://wiki.freebsd.org/RootOnZFS
>    http://www.freebsdwiki.net/index.php/RAID1,_Software,_How_to_setup
>
> Well, I just followed those in cookbook style (modulo device and pool
> names).  Didn't see any reason to be creative; I build for
> reliability, not performance.
>
> Aukward is gpart/zfs raid-1 box #1:
>
> aukward:/u0/rwa > ls -l /dev/gpt
> total 0
> crw-r-----  1 root  operator  0x91 Jun  3 10:18 vol0
> crw-r-----  1 root  operator  0x8e Jun  3 10:18 vol1
>
> aukward:/u0/rwa > zpool list -v
> NAME         SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> ult_root     111G   108G  2.53G    97%  1.00x  ONLINE  -
>   mirror     111G   108G  2.53G      -
>     gpt/vol0    -      -      -      -
>     gpt/vol1    -      -      -      -
>
> aukward:/u0/rwa > zpool status
>   pool: ult_root
>  state: ONLINE
>   scan: scrub repaired 0 in 1h13m with 0 errors on Sun May  5 04:29:30 2013
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         ult_root      ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             gpt/vol0  ONLINE       0     0     0
>             gpt/vol1  ONLINE       0     0     0
>
> errors: No known data errors
>
> (Yes, that machine has no swap.  Has NEVER had swap, has 16 GB and
> uses maybe 10% at max load.  Has been running 9.x since prerelease
> days, FWTW.  The ARC is throttled to 2 GB; zfs-stats says I never get
> near using even that.  It's just the box that drives the radios,
> a ham radio hobby machine.)
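
(As a side note, the wiki recipe you describe boils down to roughly the
following for a two-disk mirrored root -- a sketch only, using your
ada0/ada1 devices and vol0/vol1 labels, not necessarily what you typed:

    # GPT scheme, a small boot partition, and one freebsd-zfs partition per disk
    gpart create -s gpt ada0
    gpart add -t freebsd-boot -s 64k ada0
    gpart add -t freebsd-zfs -l vol0 ada0
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
    # ...repeat for ada1, with label vol1...

    # mirrored pool built from the GPT labels
    zpool create ult_root mirror gpt/vol0 gpt/vol1

I only spell it out so we agree on what "cookbook style" means here.)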
>
> Griffon is also gpart/zfs raid-1 -
>
> griffon:/u0/rwa > uname -a
> FreeBSD griffon.cs.athabascau.ca 9.1-STABLE FreeBSD 9.1-STABLE #25 r251062M:
> Tue May 28 10:39:13 MDT 2013
> toor@griffon.cs.athabascau.ca:/usr/obj/usr/src/sys/GENERIC  amd64
>
> griffon:/u0/rwa > ls -l /dev/gpt
> total 0
> crw-r-----  1 root  operator  0x7b Jun  3 08:38 disk0
> crw-r-----  1 root  operator  0x80 Jun  3 08:38 disk1
> crw-r-----  1 root  operator  0x79 Jun  3 08:38 swap0
> crw-r-----  1 root  operator  0x7e Jun  3 08:38 swap1
>
> and the pool is fat and happy -
>
> griffon:/u0/rwa > zpool status -v
>   pool: pool0
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         pool0          ONLINE       0     0     0
>           mirror-0     ONLINE       0     0     0
>             gpt/disk0  ONLINE       0     0     0
>             gpt/disk1  ONLINE       0     0     0
>
> errors: No known data errors
>
> Note that swap is through ZFS zvol;
>
> griffon:/u0/rwa > cat /etc/fstab
> # Device                Mountpoint      FStype  Options         Dump    Pass#
> #
> #
> /dev/zvol/pool0/swap    none            swap    sw              0       0
>
> pool0                   /               zfs     rw              0       0
> pool0/tmp               /tmp            zfs     rw              0       0
> pool0/var               /var            zfs     rw              0       0
> pool0/usr               /usr            zfs     rw              0       0
> pool0/u0                /u0             zfs     rw              0       0
>
> /dev/cd0                /cdrom          cd9660  ro,noauto       0       0
> /dev/ada2s1d            /mnt0           ufs     rw,noauto       0       0
> /dev/da0s1              /u0/rwa/camera  msdosfs rw,noauto       0       0
>
> The machine has 32 GB and never swaps.  It runs virtualbox loads, anything
> from one to forty virtuals (little OpenBSD images.)  Load is always light.
>
> As for the zraid-5 box (Mnemosyne), I first replaced the ZFS pool with
> a simple gpart/gmirror.  The drives gmirrored are known to be good.  That
> *also* ran like mud.  Then I downgraded to 8.4-STABLE, GENERIC kernel,
> and it's just fine now thanks.
>
> I have the five zraid-1 disks that were pulled sitting in a second 4
> core server chassis, on my desk, and they fail in that machine in the
> same way that the production box died.  I'm 150 km away and the power
> went down over the weekend at the remote site so I'll have to wait
> until tomorrow to send you those details.
>
> For now, think cut-and-paste from freebsd wiki, nothing clever,
> everything as simple as possible.  Film at 11.
>
> >3. Do any of your filesystems/pools have ZFS compression enabled, or
> >have in the past?
>
> No; disk is too cheap to bother with that.
>
> >4. Do any of your filesystems/pools have ZFS dedup enabled, or have in
> >the past?
>
> No; disk is too cheap to bother with that.
>
> >5. Does the problem go away after a reboot?
>
> It goes away for a few minutes, and then comes back on little cat feet.
> Gradual slowdown.
>
> >6. Can you provide smartctl -x output for both ada0 and ada1?  You will
> >need to install ports/sysutils/smartmontools for this.  The reason I'm
> >asking for this is there may be one of your disks which is causing I/O
> >transactions to stall for the entire pool (i.e. "single point of
> >annoyance").
>
> Been down that path, good call, Mnemosyne (zraid-1) checked clean as a
> whistle.  (Later)  Griffon checks out clean, too.  Both -x and -a.
> Aukward might have an iffy device, I will sched some self tests and
> post everything, all neatly tabulated.
>
> I've already fought a bad disk, and also just-slighly-iffy cables,
> in a ZFS context and that time was nothing like this one.
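
(To be concrete about what I'm asking for in #6 -- roughly the
following, assuming the two disks really are ada0 and ada1; adjust the
device names to suit:

    # install smartmontools from ports (a binary package works too)
    cd /usr/ports/sysutils/smartmontools && make install clean

    # full device report, one per disk; -x includes everything -a prints
    smartctl -x /dev/ada0
    smartctl -x /dev/ada1

I do not need self-test results, just the raw -x output.)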
>
> >7. Can you remove ZFS from the picture entirely (use UFS only) and
> >re-test?  My guess is that this is ZFS behaviour, particularly the ARC
> >being flushed to disk, and your disks are old/slow.  (Meaning: you have
> >16GB RAM + 4 core CPU but with very old disks).
>
> Already did that.  A gmirror 9.1 (Mnemosyne-sub-1) box slowly choked
> and died just like the ZFS instance did.  An 8.4-STABLE back-rev
> without hardware changes was the fix.
>
> Also: I noticed that when I mounted the 9.1 zraid from an 8.4 flash
> fixit disk, everything ran quickly and stably.  I did copies of about
> 635 GB worth of ~3 GB sized .pcap files out of the zraid onto a SCSI
> UFS and the ZFS disks were all about 75 to 80% busy for the ~8000
> seconds the copy was running.  No slowdowns, no stalls.
>
> BTW, I'd like to thank you for your kind interest, and please forgive
> my poor reporting skills - I'm at home, work is 150 km away, the phone
> keeps ringing, there are a lot of boxes, I'm sleep deprived, whine &
> snivel, grumble & moan ;)

All the above information is almost "too much".  There are now multiple
machines with multiple hardware devices (disks, controllers, etc.) and
different setups to try and figure out.  Each situation (each system)
needs to be analysed individually.  So let's please focus on the one you
called "aukward", because it's the one we have some details for.  Please
do not involve the other systems at this point in time.

What we know at this point:

1.  OS is amd64 [1]
2.  System uses a 4-core CPU and has 16GB RAM [1]
3.  Uses AHCI, driven by an ATI/AMD IXP700 [1]
4.  Has two disks: ada0 and ada1, both of which are very old/slow
    WD1200JD (120GB, SATA150, 8MB cache, 512-byte sectors, 7200rpm);
    I have used these disks, so I speak from experience when I say
    old/slow [1]
5.  Both disks use GPT partitioning [2], but we don't know the partition
    layout ("gpart show {ada0,ada1}" would be helpful)
6.  ZFS is involved [1][2]
7.  ZFS setup is a mirror (RAID-1-like)
8.  Root filesystem uses ZFS, but we don't know what your filesystem
    layouts look like ("zfs get all" and "df -k" would be helpful) [2]
9.  Neither compression nor dedup is used (good!!!) [2]
10. System does not use swap [2]
11. ARC is "throttled" to 2GB, but we don't know how you did this.  I
    really need to see your sysctl.conf and loader.conf tunings [2]
12. Rolling back to 8.4-STABLE (date/build unknown) apparently fixes
    your issue (I would appreciate you running that system for 72 hours,
    doing the *exact same things* that trigger the problem under
    9.1-STABLE, before making this claim) [2]
13. Rebooting the system makes I/O fast again for a little while, then
    it gradually gets worse [2]

Pending things:

i)   Need data from #5 above
ii)  Need data from #8 above
iii) Need data from #11 above
iv)  I still want to see smartctl -x output.  I do not need you to "run
     self-tests" -- respectfully, please just do what I ask.  Most
     people do not know how to interpret/understand SMART results
v)   I really wish you had not rolled this system back to 8.4-STABLE.
     For anyone to debug this, we need the system in a consistent state;
     changing kernels mid-investigation makes results hard to compare
vi)  Would appreciate seeing "sysctl -a | grep zfs" when the I/O is fast
     (immediately after a reboot is fine) and again when the I/O is very
     slow.  I do not care about "zfs-stats"
vii) dmesg would also be useful (put it up on pastebin if you want)

Please be aware the FreeBSD Wiki on ZFS is known to be outdated in many
regards.  I won't go into details.

I have so many gut feelings at this point about your problem that it is
almost unbearable.  The possibilities are near endless at this point.
Answers to the above could help narrow them down.

Finally, I wanted to briefly mention that your [2] repeatedly says
"load" with no indication of whether you mean CPU load or disk load.
Your phrasing indicates you're referring to CPU load, which is unrelated
to disk load.  For disk load, use "gstat -I500ms" (please ignore the
busy% column).  systat will not show you this in a coherent manner; you
need to have two windows up (preferably one with "top -s 1", the other
with gstat).  You may be surprised at what's going on behind the scenes
with disk load.
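
For example, something along these lines (just a sketch; the output file
names are arbitrary):

    # capture ZFS sysctls right after a reboot, while I/O is still fast...
    sysctl -a | grep zfs > zfs-sysctls-fast.txt

    # ...and again later, once the slowdown has set in
    sysctl -a | grep zfs > zfs-sysctls-slow.txt

    # two side-by-side terminals while the problem is happening:
    top -s 1         # window 1: per-process view, 1-second refresh
    gstat -I500ms    # window 2: per-device disk I/O, 500ms sampling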

[1]: http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073654.html
[2]: http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073662.html

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |