Date:      Fri, 16 Oct 2015 12:26:00 +0200 (CEST)
From:      Christian Kratzer <ck-lists@cksoft.de>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: smbfs crashes since approx. 10.1-RELEASE
Message-ID:  <alpine.BSF.2.20.1510161223390.47677@noc1.cksoft.de>
In-Reply-To: <173739656.33429352.1444704458926.JavaMail.zimbra@uoguelph.ca>
References:  <alpine.BSF.2.20.1510051157450.16263@noc1.cksoft.de> <3563189.eDHDcCgW5L@ralph.baldwin.cx> <alpine.BSF.2.20.1510091107010.71292@noc1.cksoft.de> <358885214.31305796.1444518367048.JavaMail.zimbra@uoguelph.ca> <alpine.BSF.2.20.1510120946150.47677@noc1.cksoft.de> <alpine.BSF.2.20.1510121008010.47677@noc1.cksoft.de> <2135054744.32546564.1444653156980.JavaMail.zimbra@uoguelph.ca> <alpine.BSF.2.20.1510121552090.47677@noc1.cksoft.de> <173739656.33429352.1444704458926.JavaMail.zimbra@uoguelph.ca>

Hi Rick,

It looks like your latest patch nailed the issue. The box has been up for 3 days:

     ck@noc3:~ % uptime
     12:22PM  up 3 days,  4:11, 1 user, load averages: 0.07, 0.10, 0.08
     ck@noc3:~ %

If it does not crash over the weekend, this seems to be the fix:


ck@noc3:/usr/src % svn diff sys/netsmb/smb_iod.c
Index: sys/netsmb/smb_iod.c
===================================================================
--- sys/netsmb/smb_iod.c        (revision 289211)
+++ sys/netsmb/smb_iod.c        (working copy)
@@ -659,6 +659,11 @@
                         break;
                 tsleep(&iod->iod_flags, PWAIT, "90idle", iod->iod_sleeptimo);
         }
+
+       /* We can now safely destroy the mutexes and free the iod structure. */
+       smb_sl_destroy(&iod->iod_rqlock);
+       smb_sl_destroy(&iod->iod_evlock);
+       free(iod, M_SMBIOD);
         mtx_unlock(&Giant);
         kproc_exit(0);
  }
@@ -695,9 +700,6 @@
  smb_iod_destroy(struct smbiod *iod)
  {
         smb_iod_request(iod, SMBIOD_EV_SHUTDOWN | SMBIOD_EV_SYNC, NULL);
-       smb_sl_destroy(&iod->iod_rqlock);
-       smb_sl_destroy(&iod->iod_evlock);
-       free(iod, M_SMBIOD);
         return 0;
  }

ck@noc3:/usr/src %
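
For anyone reading this in the archive, here is a rough userland sketch of the idea behind the patch as I understand it: the thread that uses the locks is also the one that destroys them and frees the structure, right before it exits, so the thread requesting the shutdown never tears down a lock that might still be in use. This is not the netsmb code. pthreads stand in for the kernel primitives, and struct worker, worker_main() and worker_demo() are made-up names for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct smbiod; not the kernel structure. */
struct worker {
	pthread_mutex_t lock;       /* plays the role of the iod locks */
	pthread_cond_t  wake;       /* used to wake the idle worker */
	bool            shutdown;   /* set by the requester, read by the worker */
	int            *freed_flag; /* lets the demo observe that cleanup ran */
};

/* Worker thread: idles until told to shut down, then cleans up after itself. */
static void *
worker_main(void *arg)
{
	struct worker *w = arg;
	int *freed_flag;

	assert(w != NULL);
	freed_flag = w->freed_flag;

	pthread_mutex_lock(&w->lock);
	while (!w->shutdown)
		pthread_cond_wait(&w->wake, &w->lock);
	pthread_mutex_unlock(&w->lock);

	/*
	 * The worker is the last user of the locks, so it destroys them
	 * and frees the structure itself, just before exiting.
	 */
	pthread_cond_destroy(&w->wake);
	pthread_mutex_destroy(&w->lock);
	free(w);
	*freed_flag = 1;
	return (NULL);
}

/* Requester side: asks for shutdown but does NOT destroy or free anything. */
int
worker_demo(void)
{
	struct worker *w;
	pthread_t tid;
	int freed = 0;

	w = malloc(sizeof(*w));
	if (w == NULL)
		return (-1);
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->shutdown = false;
	w->freed_flag = &freed;

	if (pthread_create(&tid, NULL, worker_main, w) != 0) {
		pthread_cond_destroy(&w->wake);
		pthread_mutex_destroy(&w->lock);
		free(w);
		return (-1);
	}

	pthread_mutex_lock(&w->lock);
	w->shutdown = true;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);
	/* From here on the requester must not touch w: the worker owns it. */

	pthread_join(tid, NULL);
	return (freed ? 0 : -1);
}
```

Calling worker_demo() from a small main() and building with -pthread should be enough to try it. The point is only the ownership rule: whoever might still hold the lock is the one who gets to destroy it.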


Can you get this committed to current and later merged to stable?

Greetings
Christian



On Mon, 12 Oct 2015, Rick Macklem wrote:

> Christian Kratzer wrote:
>> Hi Rick,
>>
>> On Mon, 12 Oct 2015, Rick Macklem wrote:
>>
>>> Christian Kratzer wrote:
>>>> Hi Rick,
>>>>
>>>> there was also a second more recent crash in /var/crash
>>>>
>>>>      Mon Oct 12 03:01:16 CEST 2015
>>>>
>>>>      FreeBSD noc3.cksoft.de 10.2-STABLE FreeBSD 10.2-STABLE #2 r288980M: Sun Oct 11 08:37:40 CEST 2015
>>>>      ck@noc3.cksoft.de:/usr/obj/usr/src/sys/NOC  amd64
>>>>
>>>>      panic: Assertion mtx_unowned(m) failed at /usr/src/sys/kern/kern_mutex.c:955
>>>>
>>> Oops, I screwed up. I should have looked at this panic assertion when you
>>> reported it before. Ok, so if I understand the assertion correctly, it means
>>> that another thread has the mutex locked. If this is correct, I'll have to
>>> take another look at the code and figure out how to wait for these other
>>> threads to finish with the mutexes.
>>>
>>> I do think the patch fixes the race I saw, but there must be other races in
>>> the code.
>>>
>>> I'll take another look, but if anyone else is conversant with netsmb, feel
>>> free to jump in, because it is all new to me.
>>>
>>> Unfortunately, I won't have any way to do testing for the next month or so,
>>> so any patches I do come up with will be "try this untested..".
>>
>> That's no problem.
>>
>> Just keep the patches coming when you have time and tell me when to reset
>> back to stable, current or whatever so we don't lose sync of the status.
>>
> Well, you can try the attached one instead of the previous ones (ie. against stable).
> It just delays destroying the mutexes until the iod thread is exiting.
>
> I can't quite see why the previous patches wouldn't fix it, but this one leaves
> smb_iod_main() unchanged, so it is a simpler patch and doesn't affect semantics
> except for a slight delay in destroying the mutexes.
>
>> As it looks like the race happens on unmount, I could try putting a
>> "sleep 60" into the script that does the "mount && rsync && umount" magic,
>> just before the umount. That would allow anything that is slow to go away
>> to perhaps release the mutexes before the umount.
>>
> If it still crashes with this patch, it might be worth a try.
>
> Or, if this patch still crashes, you could just delete the 3 lines that the
> patch moves, so the mutexes are never destroyed. This would result in a leak,
> but it would tell us if destroying these mutexes is the problem.
>
> Thanks for your willingness to test these, rick
>
>> Not a real fix of course but might help to verify what's going on.
>>
>> Greetings
>> Christian
>>
>>
>> --
>> Christian Kratzer                   CK Software GmbH
>> Email:   ck@cksoft.de               Wildberger Weg 24/2
>> Phone:   +49 7032 893 997 - 0       D-71126 Gaeufelden
>> Fax:     +49 7032 893 997 - 9       HRB 245288, Amtsgericht Stuttgart
>> Mobile:  +49 171 1947 843           Geschaeftsfuehrer: Christian Kratzer
>> Web:     http://www.cksoft.de/
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>
>

-- 
Christian Kratzer                   CK Software GmbH
Email:   ck@cksoft.de               Wildberger Weg 24/2
Phone:   +49 7032 893 997 - 0       D-71126 Gaeufelden
Fax:     +49 7032 893 997 - 9       HRB 245288, Amtsgericht Stuttgart
Mobile:  +49 171 1947 843           Geschaeftsfuehrer: Christian Kratzer
Web:     http://www.cksoft.de/


