Date: Sun, 19 May 2013 18:45:35 -0700
From: Dennis Glatting <freebsd@pki2.com>
To: Paul Kraus <paul@kraus-haus.org>
Cc: Tijl Coosemans <tijl@coosemans.org>, freebsd-questions@freebsd.org
Subject: Re: More than 32 CPUs under 8.4-P
Message-ID: <1369014335.16472.60.camel@btw.pki2.com>
In-Reply-To: <B06924FB-141E-421B-96E0-CEFE37C277A5@kraus-haus.org>
References: <1368897188.16472.19.camel@btw.pki2.com> <51989FDA.5070302@coosemans.org> <1368978686.16472.25.camel@btw.pki2.com> <B06924FB-141E-421B-96E0-CEFE37C277A5@kraus-haus.org>

On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
> On May 19, 2013, at 11:51 AM, Dennis Glatting <freebsd@pki2.com> wrote:
>
> > ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
> > not hang under 8.4. This (and one other 4 socket) is a production
> > system.
>
> Can you be more specific, I have been running 9.0 and 9.1 systems with
> multi-CPU and all ZFS with no (CPU related*) issues.
>

I have (down to) ten FreeBSD/ZFS systems. Five of them are multi-socket
populated. All are AMD CPUs of the 6200 series. Two of those
multi-socket systems are simply workstations and don't do much file
I/O, so I have yet to see them fault. The remaining three perform
significant I/O in the 1-8TB (simultaneous) file range, including
sorting, compression, backup, etc. (ZFS compression is enabled on some
data sets, as is dedup on a few minor data sets). I also do iSCSI and
NFS from one of these systems.

Simply put, if I run 9.1 on those three busy systems, ZFS will
eventually hang under load (within ten hours to a few days), whereas it
does not under 8.3/8.4.

Two of those systems are 4x16 cores, one 2x16, and two 2x8 cores.

Multiple, simultaneous pbzip2 runs on individual 2-5TB ASCII files
generally cause a hang within 10-20 hours.

"Hang" means the system is alive and on the network but disk I/O has
stopped. Any command other than a statically linked executable on a
memory volume will not run (no output and no return to the command
prompt). This includes "reboot," which never really reboots.

The volumes where work is performed are typically 12-33TB RAIDz2
volumes.
For example:

root@mc:~ # zpool list disk-1
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
disk-1  16.2T  5.86T  10.4T    36%  1.32x  ONLINE  -

root@mc:~ # zpool status disk-1
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55 2013
config:

        NAME        STATE     READ WRITE CKSUM
        disk-1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
        cache
          da0       ONLINE       0     0     0

errors: No known data errors

> * I say no CPU related issues because I have run into SATA timeout
> issues with an external SATA enclosure with 4 drives (I know, SATA port
> expanders are evil, but it is my best option here). Sometimes the zpool
> hangs hard, sometimes just becomes unresponsive for a while. My "fix",
> such as it is, is to tune the zfs per vdev queue depth as follows:
>
> vfs.zfs.vdev.min_pending="3"
> vfs.zfs.vdev.max_pending="5"
>

I've not tried those. Currently, these are mine:

vfs.zfs.write_limit_override="1G"
vfs.zfs.arc_max="8G"
vfs.zfs.txg.timeout=15
vfs.zfs.cache_flush_disable=1

# Recommended from the net
# April, 2013
vfs.zfs.l2arc_norw=0            # Default is 1
vfs.zfs.l2arc_feed_again=0      # Default is 1
vfs.zfs.l2arc_noprefetch=0      # Default is 0
vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200

> The defaults are 5 and 10 respectively, and when I run with those I
> have the timeout issues, but only under very heavy I/O load. I only
> generate such load when migrating large amounts of data, which
> thankfully does not happen all that often.
>

Two days ago, when the 9.1 system hung, I was able to run a static
procstat, and in the process a message was (inadvertently?) printed on
the console saying that da0 wasn't responsive. Unfortunately I didn't
have a static camcontrol ready, so I was unable to query the device.
That said, according to the criteria from
https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS
problem, yet hung it was.

I have since (today) updated the firmware of most of the devices in
that system, and it is currently running some tasks. Most of the disks
in that system are Seagate. The devices I have not updated are three WD
disks (RAID1 OS and a swap disk), because I haven't been able to figure
out the WD firmware download, and an SSD whose manufacturer indicates
the firmware diff is minor, though I plan to go back and flash it
anyway.

If my 4x16 system ever finishes I will be updating its devices'
firmware too, but it is an 8.4-P system and doesn't give me any
trouble.

Another 4x16 system gave me ZFS trouble under 9.1, but since I
downgraded it to 8.4-P it has been stable as a rock for the past 22
days, often under heavy load.
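
For anyone wanting to compare tunable sets like the two quoted in this
message, the following is a minimal sketch, not anything used on the
systems described here. It parses loader.conf-style "name=value" lines
into a dictionary and expands size suffixes so that values such as "8G"
can be compared numerically. The function names parse_value and
parse_tunables are hypothetical, and treating K/M/G/T as powers of 1024
is an assumption.

```python
import re

# Assumption: size suffixes are binary multiples (powers of 1024).
_SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_value(raw):
    """Convert '"8G"' to a byte count and '15' to 15; return other text unchanged."""
    text = raw.strip().strip('"')
    match = re.fullmatch(r"(\d+)([KMGT]?)", text, re.IGNORECASE)
    if not match:
        return text
    number, suffix = match.groups()
    return int(number) * _SUFFIX.get(suffix.upper(), 1)


def parse_tunables(text):
    """Parse 'name=value  # comment' lines into a dict, skipping comment-only lines."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        result[name.strip()] = parse_value(value)
    return result


if __name__ == "__main__":
    sample = '''
vfs.zfs.write_limit_override="1G"
vfs.zfs.arc_max="8G"
vfs.zfs.txg.timeout=15
vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200
'''
    for name, value in sorted(parse_tunables(sample).items()):
        print(name, value)
```

The sketch only reads text; it does not query or change any running
system, so the same dictionary comparison can be applied to settings
pasted from two different machines.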