From owner-freebsd-current@FreeBSD.ORG Tue Oct 9 21:01:04 2007
Date: Tue, 9 Oct 2007 23:00:43 +0200
From: Pawel Jakub Dawidek
To: Darren Reed
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
Message-ID: <20071009210043.GC13519@garage.freebsd.pl>
In-Reply-To: <470BD961.4000407@freebsd.org>
References: <20071005000046.GC92272@garage.freebsd.pl> <20071008121523.GM2327@garage.freebsd.pl> <470BD961.4000407@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-OS: FreeBSD 7.0-CURRENT i386

On Tue, Oct 09, 2007 at 12:41:21PM -0700, Darren Reed wrote:
> Pawel Jakub Dawidek wrote:
> >Here are some updates:
> >
> >I was able to reproduce the panic by rsyncing big files and trying
> >bonnie++ test suggested in this thread.
> >
> >Can you guys retry with this patch:
> >
> >	http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch
> >
>
> So, I have a question...
> What happens if the "for (i = 0..)" is changed to "while(1)" and
> the "panic" is subsequently removed?

I think it should stay, to give the user a hint about what's going on
instead of hanging there forever.

> It appears like the code changes the meaning of "WAIT" to "wait
> for 4 seconds" then panic if it won't work.  Previously, "WAIT" was
> not waiting at all...which could be described as a bug!

It's actually 7 seconds:)

> If I recall correctly, ZFS caches writes and does them in spurts and
> that those spurts are spaced out more than 4 seconds.  (For the
> curious, do "zpool status" and observe the gap in time between
> write activity.)
>
> If you start a large amount of I/O, it is possible that all the KVA will
> be used up and ZFS will not get a chance to flush its buffers before
> the 4s timer here expires.  Does that sound plausible?

It depends on whether the problem we see is caused by caching/delaying
writes or just by caching data for faster reads. In the latter case the
cache can simply be thrown away, which is much faster than waiting for
buffers to be flushed as in the former case. ZFS flushes buffers every
5 seconds by default, or when there is too much data, so 7 seconds
sounds reasonable.

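A minimal sketch of where the 7-second figure can come from. It assumes
that retry number i sleeps for i quarter-seconds before trying again;
that backoff is inferred from the numbers quoted in this thread, not
read from the patch. The names total_wait_ticks and TICKS_PER_SECOND are
hypothetical and exist only for this illustration.

```c
/*
 * Assumption (inferred, not taken from vm_kern.c.2.patch): iteration i of
 * the retry loop sleeps for i quarter-second ticks before the next attempt.
 */
#define TICKS_PER_SECOND 4	/* one tick = a quarter of a second */

/*
 * Worst-case total sleep, in quarter-second ticks, when the loop runs
 * `retries` iterations (i = 0 .. retries - 1) without ever succeeding.
 */
static unsigned
total_wait_ticks(unsigned retries)
{
	unsigned i, ticks = 0;

	for (i = 0; i < retries; i++)
		ticks += i;	/* iteration i contributes i ticks */
	return (ticks);
}
```

Under that assumption, total_wait_ticks(8) is 28 ticks, i.e. 7 seconds,
and total_wait_ticks(16) is 120 ticks, i.e. 30 seconds.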
> Would doubling the 8 to (say) 16 be beneficial here, to at least make
> the waiting span one ZFS flush out to disk?

Note that the user sees this as an almost complete system hang, I
think. 16 would make it wait for 30 seconds. I do agree that waiting
even 30 seconds in some extremely rare situations is better than
panicking, but I'd first see if 8 fixes the problem. In my testing
kernel I added a debug printf to see when 'i' is larger than 0 - every
value larger than 0 means a panic with the old kernel. I never observed
'i' larger than 1.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!