Date:        Tue, 18 Nov 2008 19:42:56 +0100
From:        Lorenzo Perone <lopez.on.the.lists@yellowspace.net>
To:          Chao Shin <quakelee@geekcn.org>
Cc:          d@delphij.net, Pawel Jakub Dawidek <pjd@freebsd.org>, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:     Re: ZFS crashes on heavy threaded environment
Message-ID:  <7BA53082-577E-4DF2-8E2A-025942C11C0A@yellowspace.net>
In-Reply-To: <op.uks5krn2hnq548@qld630>
References:  <491CE71F.2020208@delphij.net> <491CE835.4050504@delphij.net> <20081117155835.GC2101@garage.freebsd.pl> <op.uks5krn2hnq548@qld630>
For what it's worth, I have similar problems on a comparable system
(amd64/8GB, 7.1-PRERELEASE #3: Sun Nov 16 13:39:43), which I wouldn't
call heavily threaded yet (there is only one mysql51 running, plus
courier-mta/imap with at most 15 users right now). Perhaps worth a note:
Bjoern's multi-IP jail patches are applied on this system.

The setup is such that one ZFS filesystem is mounted into a jail handling
only mail (and for that: just the root of the mail files), and a script
on the main host rotates snapshots hourly, making a new one and destroying
the oldest (a simplified sketch of that rotation follows after the memory
data below).

After about 8-24 hours of production:

- mysqld is stuck in the sbwait state;
- the messages log starts filling up with
  kernel: vm_thread_new: kstack allocation failed
- almost any attempt to fork a process fails with "Cannot allocate memory".

No panic so far, at least since I introduced vfs.zfs.prefetch_disable="1".
Before that, I experienced several panics upon shutdown.

If I still have an open shell, I can send around some -TERMs and -KILLs
and halfway get back control; after that, if I do zfs umount -a, kernel
memory usage drops drastically and I can resume the services. However,
not for long: after about 1-2 hrs of production it starts complaining in
the messages log about kstack allocation failures again, and soon
thereafter it all repeats. Only rebooting gives back another 12-24 hrs
of operation.

What I've tracked down so far:

- zfs destroy'ing old snapshots definitely makes those failures pop up
  earlier;
- I've been collecting some data shortly around the memory problems,
  which I post below.

Since this is a production machine (I know, I shouldn't - but hey, you
made us lick blood and now we ended up wanting more! So, yes, I confirm,
you definitely _are_ evil! ;)), I'm almost ready to move it back to UFS.
But if it can be useful for debugging, I would be willing to set up a
zabbix agent or similar to track whichever values could be useful over
time for a day or two. If, on the other hand, these bugs (leaks, or
whatever) are likely to be solved by the recent commit, I'll just move
back to UFS until that is ported to -STABLE.

Here follows some data about kernel memory usage (strangely, I never saw
this get even halfway to 1.5 GB, but it's really almost voodoo to me, so
I leave the analysis up to others):

# kernel text: sum of the module sizes reported by kldstat (hex -> bytes)
TEXT=`kldstat | tr a-f A-F | awk 'BEGIN {print "ibase=16"}; NR > 1 {print $4}' | bc | awk '{a+=$1}; END {print a}'`
# kernel data: sum of the MemUse column of vmstat -m (KB -> bytes)
DATA=`vmstat -m | sed 's/K//' | awk '{a+=$3}; END {print a*1024}'`
TOTAL=`echo $DATA $TEXT | awk '{print $1+$2}'`

TEXT=13102280, 12.4953 MB
DATA=470022144, 448.248 MB
TOTAL=483124424, 460.743 MB

vmstat -m | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 2380
vfs.wantfreevnodes: 25000
vfs.numvnodes: 43982
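To be concrete, the rotation mentioned above is roughly along these lines;
this is a simplified sketch, not the actual script, and the snapshot naming
scheme and the retention count (KEEP=24) are just placeholders. It runs
from cron on the main host:

    #!/bin/sh
    # Hourly ZFS snapshot rotation (simplified sketch).
    FS="hkpool/mail"   # the mail filesystem
    KEEP=24            # hourly snapshots to retain (placeholder value)

    # Take a new snapshot named after the current time.
    zfs snapshot "${FS}@auto-$(date +%Y%m%d-%H%M)"

    # List this filesystem's rotation snapshots, oldest first, and
    # destroy everything beyond the newest $KEEP.
    SNAPS=$(zfs list -H -t snapshot -o name -s creation | grep "^${FS}@auto-")
    COUNT=$(echo "$SNAPS" | wc -l)
    EXCESS=$((COUNT - KEEP))
    if [ "$EXCESS" -gt 0 ]; then
        echo "$SNAPS" | head -n "$EXCESS" | xargs -n 1 zfs destroy
    fi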
As said, the box has 8 GB of RAM, the loader.conf below, and at the time
of the lockups there were about 5 GB of free userland memory available.

my loader.conf:

vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="768M"
vfs.zfs.prefetch_disable="1"

As for the filesystem, I only changed the recordsize and the mountpoint;
the rest is default:

[horkheimer:lopez] root# zfs get all hkpool/mail
NAME         PROPERTY       VALUE                  SOURCE
hkpool/mail  type           filesystem             -
hkpool/mail  creation       Fri Oct 31 13:28 2008  -
hkpool/mail  used           5.50G                  -
hkpool/mail  available      386G                   -
hkpool/mail  referenced     4.33G                  -
hkpool/mail  compressratio  1.05x                  -
hkpool/mail  mounted        yes                    -
hkpool/mail  quota          none                   default
hkpool/mail  reservation    none                   default
hkpool/mail  recordsize     4K                     local
hkpool/mail  mountpoint     /jails/mail/mail       local
hkpool/mail  sharenfs       off                    default
hkpool/mail  checksum       on                     default
hkpool/mail  compression    on                     local
hkpool/mail  atime          on                     default
hkpool/mail  devices        on                     default
hkpool/mail  exec           on                     default
hkpool/mail  setuid         on                     default
hkpool/mail  readonly       off                    default
hkpool/mail  jailed         off                    local
hkpool/mail  snapdir        hidden                 default
hkpool/mail  aclmode        groupmask              default
hkpool/mail  aclinherit     secure                 default
hkpool/mail  canmount       on                     default
hkpool/mail  shareiscsi     off                    default
hkpool/mail  xattr          off                    temporary
hkpool/mail  copies         1                      default

The pool is using a partition on a hardware RAID1:

[horkheimer:lopez] root# zpool status
  pool: hkpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        hkpool      ONLINE       0     0     0
          da0s1f    ONLINE       0     0     0

Regards, and thanks a lot for bringing us ZFS,

Lorenzo

On 18.11.2008, at 10:20, Chao Shin wrote:

> On Mon, 17 Nov 2008 23:58:35 +0800, Pawel Jakub Dawidek <pjd@freebsd.org> wrote:
>
>> On Thu, Nov 13, 2008 at 06:53:41PM -0800, Xin LI wrote:
>>> Xin LI wrote:
>>> > Hi, Pawel,
>>> >
>>> > We can still reproduce the ZFS crash (threading + heavy I/O load) on a
>>> > fresh 7.1-STABLE build, in a few minutes:
>>> >
>>> > /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i
>>> > 0 -i 1 -i 2 -i 8 -+p 70 -C
>>> >
>>> > I have included a backtrace output from my colleague who has his hands
>>> > on the test environment. Should more information be necessary,
>>> > please let us know; we are happy to provide help on this.
>>>
>>> Further data point. The system used to run with an untuned loader.conf,
>>> and my colleague just reported that with the following loader.conf the
>>> problem can be triggered sooner:
>>>
>>> vm.kmem_size_max=838860800
>>> vm.kmem_size_scale="2"
>>>
>>> The system is running FreeBSD/amd64 7.1-PRERELEASE with 8 GB of RAM
>>> and a GENERIC kernel.
>>
>> With new ZFS I get:
>>
>> Memory allocation failed:: Cannot allocate memory
>>
>> Is this expected?
>>
>
> First of all, congratulations and thanks for your work, well done!
>
> I used this command on a FreeBSD 7.1-PRERELEASE amd64 box with 8 GB of
> memory; I didn't get output like that, but a kernel panic.
> Maybe you should lower the thread count and file size, for example:
>
> /usr/local/bin/iozone -M -e -+u -T -t 64 -S 4096 -L 64 -r 4k -s 2g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>
> Actually, we used this command in July to test 8-CURRENT with the ZFS v12
> patch, and there were no more panics. So we hope ZFS v13 can be MFCed as
> soon as possible, because we really need it now.