Date: Tue, 06 Oct 2015 17:08:54 -0700
From: John Baldwin <jhb@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Christian Kratzer <ck@cksoft.de>, freebsd-stable@freebsd.org
Subject: Re: smbfs crashes since approx. 10.1-RELEASE
Message-ID: <2148690.gx9M0ZzrG1@ralph.baldwin.cx>
In-Reply-To: <1721669289.24365403.1444083414400.JavaMail.zimbra@uoguelph.ca>
References: <alpine.BSF.2.20.1510051157450.16263@noc1.cksoft.de> <1721669289.24365403.1444083414400.JavaMail.zimbra@uoguelph.ca>

On Monday, October 05, 2015 06:16:54 PM Rick Macklem wrote:
> Christian Kratzer wrote:
> > Hi,
> >
> > I run a regular rsync job that runs from cron and copies stuff that gets
> > created on a Windows smbfs share.
> >
> > Starting about 10.1-RELEASE the VM has become unstable and started panicking.
> >
> > I have narrowed the issue down to the aforementioned rsync job.
> >
> > When I move the job to a different VM the other VM starts crashing and
> > the VM without the job becomes stable again.
> >
> > I have panics and crashinfos stored in /var/crash if anybody is interested:
> >
> > root@noc2:/var/crash # uname -a
> > FreeBSD noc2.cksoft.de 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed
> > Aug 12 15:26:37 UTC 2015
> > root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
> > root@noc2:/var/crash # freebsd-version -u
> > 10.2-RELEASE-p5
> > root@noc2:/var/crash # freebsd-version -k
> > 10.2-RELEASE
> > root@noc2:/var/crash #
> >
> > This is what I have in /var/crash/core.txt.0
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x20
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff80996c7c
> > stack pointer = 0x28:0xfffffe003d6c0ac0
> > frame pointer = 0x28:0xfffffe003d6c0af0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 1349 (smbiod10)
> > trap number = 12
> > panic: page fault
> > cpuid = 0
> > KDB: stack backtrace:
> > #0 0xffffffff80984e30 at kdb_backtrace+0x60
> > #1 0xffffffff809489e6 at vpanic+0x126
> > #2 0xffffffff809488b3 at panic+0x43
> > #3 0xffffffff80d4aadb at trap_fatal+0x36b
> > #4 0xffffffff80d4addd at trap_pfault+0x2ed
> > #5 0xffffffff80d4a47a at trap+0x47a
> > #6 0xffffffff80d307f2 at calltrap+0x8
> > #7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
> > #8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
> > #9 0xffffffff81a1b724 at smb_iod_thread+0xb4
> > #10 0xffffffff8091244a at fork_exit+0x9a
> > #11 0xffffffff80d30d2e at fork_trampoline+0xe
> > Uptime: 2h43m55s
> > Dumping 103 out of 999 MB: (CTRL-C to abort)
> > ..16%..31%..47%..62%..78%..93%
> >
> >
> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> "crashes in mtx_unlock(&Giant)" in the subject line.
>
> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
> remember that was once allowed, but am not sure if it still is (ie a tsleep() call
> while holding Giant)?
>
> Hopefully someone who knows what is special about Giant that might cause this will
> respond.
>
> Good luck with it, rick

tsleep() with Giant is still allowed. However, this sort of panic usually
means you unlocked a mutex you didn't hold (but without INVARIANTS enabled
or you'd get an assertion failure earlier). I don't see anything obviously
wrong in smb_iod_thread() however. If you have the crashdump, can you
please run this in kgdb:

frame 9
p (struct mtx *)c
p *(struct mtx *)c

-- 
John Baldwin