Date:      Wed, 07 Oct 2015 10:42:52 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        Christian Kratzer <ck@cksoft.de>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, freebsd-stable@freebsd.org
Subject:   Re: smbfs crashes since approx. 10.1-RELEASE
Message-ID:  <3563189.eDHDcCgW5L@ralph.baldwin.cx>
In-Reply-To: <alpine.BSF.2.20.1510070844030.16263@noc1.cksoft.de>
References:  <alpine.BSF.2.20.1510051157450.16263@noc1.cksoft.de> <2148690.gx9M0ZzrG1@ralph.baldwin.cx> <alpine.BSF.2.20.1510070844030.16263@noc1.cksoft.de>

On Wednesday, October 07, 2015 08:52:30 AM Christian Kratzer wrote:
> Hi,
> 
> On Tue, 6 Oct 2015, John Baldwin wrote:
> <snipp/>
> >> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
> >> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> >> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> >> "crashes in mtx_unlock(&Giant)" in the subject line.
> >>
> >> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
> >> remember that was once allowed, but am not sure if it still is (ie a tsleep() call
> >> while holding Giant)?
> >>
> >> Hopefully someone who knows what is special about Giant that might cause this will
> >> respond.
> >>
> >> Good luck with it, rick
> >
> > tsleep() with Giant is still allowed.  However, this sort of panic usually means
> > you unlocked a mutex you didn't hold (but without INVARIANTS enabled or you'd get
> > an assertion failure earlier).
> >
> > I don't see anything obviously wrong in smb_iod_thread() however.
> >
> > If you have the crashdump, can you please run this in kgdb:
> >
> > frame 9
> > p (struct mtx *)c
> > p *(struct mtx *)c
> 
> yes I have. Here we go:
> 
> --snipp--
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x20
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80996c7c
> stack pointer           = 0x28:0xfffffe004e79bac0
> frame pointer           = 0x28:0xfffffe004e79baf0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                          = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 12235 (smbiod172)
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff80984e30 at kdb_backtrace+0x60
> #1 0xffffffff809489e6 at vpanic+0x126
> #2 0xffffffff809488b3 at panic+0x43
> #3 0xffffffff80d4aadb at trap_fatal+0x36b
> #4 0xffffffff80d4addd at trap_pfault+0x2ed
> #5 0xffffffff80d4a47a at trap+0x47a
> #6 0xffffffff80d307f2 at calltrap+0x8
> #7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
> #8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
> #9 0xffffffff81a1b724 at smb_iod_thread+0xb4
> #10 0xffffffff8091244a at fork_exit+0x9a
> #11 0xffffffff80d30d2e at fork_trampoline+0xe
> Uptime: 1d18h34m4s
> Dumping 161 out of 999 MB:..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%
> 
> Reading symbols from /boot/kernel/smbfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/smbfs.ko.symbols
> Reading symbols from /boot/kernel/libiconv.ko.symbols...done.
> Loaded symbols for /boot/kernel/libiconv.ko.symbols
> Reading symbols from /boot/kernel/libmchain.ko.symbols...done.
> Loaded symbols for /boot/kernel/libmchain.ko.symbols
> #0  doadump (textdump=<value optimized out>) at pcpu.h:219
> 219     pcpu.h: No such file or directory.
>          in pcpu.h
> (kgdb) frame 9
> #9  0xffffffff8092ebe0 in __mtx_unlock_sleep (c=0xfffff8002f531790, opts=<value optimized out>,
>      file=0xffffffff81a25801 "%s: Can't handle disordered parameters %d:%d\n", line=1) at /usr/src/sys/kern/kern_mutex.c:791
> 791     /usr/src/sys/kern/kern_mutex.c: No such file or directory.
>          in /usr/src/sys/kern/kern_mutex.c
> Current language:  auto; currently minimal
> (kgdb) p (struct mtx *)c
> $1 = (struct mtx *) 0xfffff8002f531790
> (kgdb) p *(struct mtx *)c
> $2 = {lock_object = {lo_name = 0x6 <Address 0x6 out of bounds>, lo_flags = 0, lo_data = 0, lo_witness = 0xfffff8002f531798},
>    mtx_lock = 1444181401}

Ok, so that is a destroyed mutex.  This means it is probably not Giant, and
it might be some mutex in smb_iod_main() that shows up in smb_iod_thread() due
to inlining.

Actually, we know this from your earlier mail:

                if (evp->ev_type & SMBIOD_EV_SYNC) {
                        SMB_IOD_EVLOCK(iod);
                        wakeup(evp);
                        SMB_IOD_EVUNLOCK(iod);

Line 624 is that SMB_IOD_EVUNLOCK().

Hmm, does 'p *evp' work at frame 10?  If not, can you try building the
devel/gdb port from a recent ports tree with the 'KGDB' option enabled and
use 'kgdb710' instead of 'kgdb' to see if you can print out '*evp'?

> (kgdb)
> --snipp--
> 
> I can build a GENERIC kernel with INVARIANTS enabled on the box to see if we get a better assertion the next time this happens.

That would be great, but please keep the existing core and kernel.  We might
be able to figure this out from that still.

Also, go ahead and put this patch in and let me know if you ever see the
printf logged.  If you do, that could explain this panic (and we might need
a more involved fix to avoid memory leaks).

Index: smb_iod.c
===================================================================
--- smb_iod.c   (revision 288952)
+++ smb_iod.c   (working copy)
@@ -624,6 +624,13 @@
                        SMB_IOD_EVUNLOCK(iod);
                } else
                        free(evp, M_SMBIOD);
+               if (iod->iod_flags & SMBIOD_SHUTDOWN) {
+                       if (!STAILQ_EMPTY(&iod->iod_evlist)) {
+                               printf("%s: shutdown with pending events\n",
+                                   __func__);
+                       }
+                       return;
+               }
        }
 #if 0
        if (iod->iod_state == SMBIOD_ST_VCACTIVE) {

-- 
John Baldwin


