From: Christian Kratzer
To: Rick Macklem
Cc: freebsd-stable@freebsd.org, John Baldwin
Date: Fri, 16 Oct 2015 12:26:00 +0200 (CEST)
Subject: Re: smbfs crashes since approx. 10.1-RELEASE

Hi Rick,

looks like your latest patch nailed the issue. The box has been up for 3 days:

ck@noc3:~ % uptime
12:22PM up 3 days, 4:11, 1 user, load averages: 0.07, 0.10, 0.08
ck@noc3:~ %

If it does not crash over the weekend this seems to be it:

ck@noc3:/usr/src % svn diff sys/netsmb/smb_iod.c
Index: sys/netsmb/smb_iod.c
===================================================================
--- sys/netsmb/smb_iod.c	(revision 289211)
+++ sys/netsmb/smb_iod.c	(working copy)
@@ -659,6 +659,11 @@
 			break;
 		tsleep(&iod->iod_flags, PWAIT, "90idle", iod->iod_sleeptimo);
 	}
+
+	/* We can now safely destroy the mutexes and free the iod structure. */
+	smb_sl_destroy(&iod->iod_rqlock);
+	smb_sl_destroy(&iod->iod_evlock);
+	free(iod, M_SMBIOD);
 	mtx_unlock(&Giant);
 	kproc_exit(0);
 }
@@ -695,9 +700,6 @@
 smb_iod_destroy(struct smbiod *iod)
 {
 	smb_iod_request(iod, SMBIOD_EV_SHUTDOWN | SMBIOD_EV_SYNC, NULL);
-	smb_sl_destroy(&iod->iod_rqlock);
-	smb_sl_destroy(&iod->iod_evlock);
-	free(iod, M_SMBIOD);
 	return 0;
 }
ck@noc3:/usr/src %

Can you get this committed into current and later stable?
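The patch moves the lock teardown and the free() out of smb_iod_destroy() and into the end of the iod thread itself, so the locks are destroyed by the last thread that uses them. As a rough user-space illustration of that ownership rule only (this is not netsmb code), here is a sketch using POSIX threads. The names struct worker, struct request, worker_thread(), worker_shutdown() and shutdown_demo() are hypothetical stand-ins for struct smbiod, the shutdown event, smb_iod_thread() and smb_iod_destroy(). The sketch also assumes that the requester waits on a request object it owns, so it never touches the worker's locks once its wait returns. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shutdown event; owned by the requester. */
struct request {
	pthread_mutex_t	lock;
	pthread_cond_t	cond;
	int		done;		/* set by the worker to acknowledge */
};

/* Hypothetical stand-in for struct smbiod. */
struct worker {
	pthread_mutex_t	evlock;		/* plays the role of iod_evlock */
	pthread_cond_t	wake;		/* worker sleeps here between events */
	struct request	*pending;	/* shutdown request, NULL until posted */
};

static int workers_freed;	/* read only after pthread_join() */

/* Stand-in for smb_iod_thread(): wait for shutdown, ack it, tear down. */
static void *
worker_thread(void *arg)
{
	struct worker *w = arg;
	struct request *req;

	pthread_mutex_lock(&w->evlock);
	while (w->pending == NULL)
		pthread_cond_wait(&w->wake, &w->evlock);
	req = w->pending;
	pthread_mutex_unlock(&w->evlock);

	/* Acknowledge the synchronous request; req is not used after this. */
	pthread_mutex_lock(&req->lock);
	req->done = 1;
	pthread_cond_signal(&req->cond);
	pthread_mutex_unlock(&req->lock);

	/*
	 * Same idea as the patch: nobody else uses w any more, so the
	 * thread destroys its own locks and frees itself before exiting.
	 */
	pthread_cond_destroy(&w->wake);
	pthread_mutex_destroy(&w->evlock);
	free(w);
	workers_freed++;
	return NULL;
}

/* Stand-in for smb_iod_destroy(): post shutdown, wait for the ack, free nothing. */
static void
worker_shutdown(struct worker *w)
{
	struct request req;

	req.done = 0;
	pthread_mutex_init(&req.lock, NULL);
	pthread_cond_init(&req.cond, NULL);

	pthread_mutex_lock(&w->evlock);
	w->pending = &req;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->evlock);
	/* From here on w may already be freed: do not touch it. */

	pthread_mutex_lock(&req.lock);
	while (!req.done)
		pthread_cond_wait(&req.cond, &req.lock);
	pthread_mutex_unlock(&req.lock);

	pthread_cond_destroy(&req.cond);
	pthread_mutex_destroy(&req.lock);
}

/* Runs one worker through its lifecycle; returns how many workers freed themselves. */
int
shutdown_demo(void)
{
	struct worker *w;
	pthread_t td;
	int rc;

	w = malloc(sizeof(*w));
	if (w == NULL)
		return -1;
	pthread_mutex_init(&w->evlock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->pending = NULL;

	if (pthread_create(&td, NULL, worker_thread, w) != 0) {
		pthread_cond_destroy(&w->wake);
		pthread_mutex_destroy(&w->evlock);
		free(w);
		return -1;
	}

	worker_shutdown(w);	/* w must not be used after this returns */

	rc = pthread_join(td, NULL);	/* demo only: observe the exit */
	assert(rc == 0);
	(void)rc;
	return workers_freed;
}
```

Called once, shutdown_demo() should return 1. The pthread_join() is only there so the demo can observe that the worker freed itself; in the patched code, smb_iod_destroy() only waits for the synchronous shutdown request and does not wait for the thread to exit.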
Greetings
Christian

On Mon, 12 Oct 2015, Rick Macklem wrote:

> Christian Kratzer wrote:
>> Hi Rick,
>>
>> On Mon, 12 Oct 2015, Rick Macklem wrote:
>>
>>> Christian Kratzer wrote:
>>>> Hi Rick,
>>>>
>>>> there was also a second, more recent crash in /var/crash:
>>>>
>>>> Mon Oct 12 03:01:16 CEST 2015
>>>>
>>>> FreeBSD noc3.cksoft.de 10.2-STABLE FreeBSD 10.2-STABLE #2 r288980M: Sun Oct 11 08:37:40 CEST 2015
>>>>     ck@noc3.cksoft.de:/usr/obj/usr/src/sys/NOC  amd64
>>>>
>>>> panic: Assertion mtx_unowned(m) failed at /usr/src/sys/kern/kern_mutex.c:955
>>>>
>>> Oops, I screwed up. I should have looked at this panic assertion when
>>> you reported it before. OK, so if I understand the assertion correctly,
>>> it means that another thread has the mutex locked. If this is correct,
>>> I'll have to take another look at the code and figure out how to wait
>>> for these other threads to finish with the mutexes.
>>>
>>> I do think the patch fixes the race I saw, but there must be other
>>> races in the code.
>>>
>>> I'll take another look, but if anyone else is conversant with netsmb,
>>> feel free to jump in, because it is all new to me.
>>>
>>> Unfortunately, I won't have any way to do testing for the next month
>>> or so, so any patches I do come up with will be "try this untested..".
>>
>> That's no problem.
>>
>> Just keep the patches coming when you have time, and tell me when to
>> reset back to stable, current or whatever so we don't lose sync of the
>> status.
>>
> Well, you can try the attached one instead of the previous ones (i.e.
> against stable). It just delays destroying the mutexes until the iod
> thread is exiting.
>
> I can't quite see why the previous patches wouldn't fix it, but this one
> leaves smb_iod_main() unchanged, so it is a simpler patch and doesn't
> affect semantics except for a slight delay in destroying the mutexes.
>
>> Since the race seems to happen on unmount, I could try putting a
>> "sleep 60" into the script that does the "mount && rsync && umount"
>> magic, just before the umount. That would give anything that is slow
>> to go away a chance to release the mutexes before the umount.
>>
> If it still crashes with this patch, it might be worth a try.
>
> Or, if this patch still crashes, you could just delete the 3 lines that
> the patch moves, so the mutexes are never destroyed. This would result
> in a leak, but it would tell us if destroying these mutexes is the
> problem.
>
> Thanks for your willingness to test these, rick
>
>> Not a real fix, of course, but it might help to verify what's going on.
>>
>> Greetings
>> Christian
>>
>>
>> --
>> Christian Kratzer                   CK Software GmbH
>> Email:   ck@cksoft.de               Wildberger Weg 24/2
>> Phone:   +49 7032 893 997 - 0       D-71126 Gaeufelden
>> Fax:     +49 7032 893 997 - 9       HRB 245288, Amtsgericht Stuttgart
>> Mobile:  +49 171 1947 843           Geschaeftsfuehrer: Christian Kratzer
>> Web:     http://www.cksoft.de/
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>
>