To: "dennis berger", "Jeremy Chadwick"
Subject: Re: still mbuf leak in 9.0 / 9.1?
Date: Sat, 18 May 2013 12:14:28 +0200
From: "Ronald Klop"
Cc: Steven Hartland, FreeBSD stable

On Fri, 17 May 2013 19:31:01 +0200, Jeremy Chadwick wrote:

> On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote:
>> Hi List,
>> I can confirm that it is the bug you mentioned, Steven.
>> Here is how I found it.
>>
>> I recorded hourly zfskern and nfsd stats, like this:
>>
>> echo "PROCSTAT" >> $reportname
>> pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname
>>
>> Luckily it crashed during the night and logged this:
>>
>> 1910 101508 nfsd nfsd: service mi_switch+0x186
>> sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1
>> uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5
>> arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c
>> dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7
>> dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67
>> dmu_read_uio+0x3f zfs_freebsd_read+0x3e3
>>
>> Maybe it would be good to merge this fix into RELENG_9_1 and distribute
>> a fix via freebsd-update. What do you think?
>>
>> best,
>> -dennis
>>
>>
>> On 16.05.2013 at 11:42, dennis berger wrote:
>>
>> > This is indeed a ZFS+NFS system and I can see that istgt and nfs are
>> > stuck in some ZIO state. Maybe it's this.
>> > Thanks for pointing it out.
>> >
>> > Is it this ZFS+NFS deadlock?
>> >
>> > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> > @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused)
>> >      mutex_enter(&arc_reclaim_thr_lock);
>> >      needfree = 1;
>> >      cv_signal(&arc_reclaim_thr_cv);
>> > -    while (needfree)
>> > -        msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
>> > +
>> > +    /*
>> > +     * It is unsafe to block here in arbitrary threads, because we can come
>> > +     * here from ARC itself and may hold ARC locks and thus risk a deadlock
>> > +     * with ARC reclaim thread.
>> > +     */
>> > +    if (curproc == pageproc) {
>> > +        while (needfree)
>> > +            msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
>> > +    }
>> >      mutex_exit(&arc_reclaim_thr_lock);
>> >      mutex_exit(&arc_lowmem_lock);
>> >  }
>> >
>> > I'll try to crash our test system. I assume that heavily stressing NFS
>> > backed by ZFS might trigger this bug?
>> >
>> > -dennis
>> >
>> >
>> > On 16.05.2013 at 00:03, Steven Hartland wrote:
>> >
>> >> ----- Original Message ----- From: "dennis berger"
>> >>> FreeBSD 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012
>> >>>
>> >>>> 3. Regarding this:
>> >>>>>> A clean shutdown isn't possible though. It hangs after vnode
>> >>>>>> cleaning, normally you would see detaching of usb devices here, or
>> >>>>>> other devices maybe?
>> >>>> Please don't conflate this with your above issue. This is almost
>> >>>> certainly unrelated. Please start a new thread about that if desired.
>> >>>
>> >>> Maybe this is a misunderstanding. Normally this system will shut down
>> >>> cleanly, of course.
>> >>> This hang only appears after the network problem above.
>> >>
>> >> If this is a ZFS system, it's a known issue which is fixed in current,
>> >> stable-9, stable-8 and the upcoming 8.4 release.
>> >>
>> >> If not and you have USB devices see if the following sysctl helps:
>> >> hw.usb.no_shutdown_wait=1
>
> I'm sorry to say it won't happen. The only updates that the -RELEASE
> branches get are for security. If you want fixes for other things, you
> need to follow/run stable branches (i.e. stable/9), otherwise you will
> need to wait until 9.2-RELEASE comes out.
>

And errata notices? Are they for security?

Ronald.
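
P.S. For anyone else chasing a similar hang: the hourly recording dennis describes above could be wrapped roughly like this. It is only a minimal sketch. The function name snapshot_stacks and its argument are made up for illustration, and the pgrep and procstat flags are simply the ones quoted in the thread, so check them against your own system first.

```sh
#!/bin/sh
# Hypothetical helper (name and argument are illustrative, not from the thread):
# append one timestamped kernel-stack snapshot to the given report file.
snapshot_stacks() {
    reportname=$1
    {
        # Header line so consecutive snapshots can be told apart.
        echo "PROCSTAT $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        # -S also matches system (kernel) processes such as zfskern,
        # as in the command quoted in the thread.
        pids=$(pgrep -S "(zfskern|nfsd)") || pids=""
        if [ -n "$pids" ]; then
            # -kk prints kernel stacks with function offsets.
            # $pids is left unquoted on purpose: one argument per PID.
            procstat -kk $pids
        else
            echo "no matching processes"
        fi
    } >> "$reportname"
}
```

Calling snapshot_stacks /var/tmp/stacks.log from an hourly cron job would give the kind of log quoted above; the path is only an example. Unlike the xargs version, this writes a line even when nothing matches, so a missing nfsd shows up in the log instead of as a silent gap.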