From owner-freebsd-stable@freebsd.org  Wed Oct  7 00:09:13 2015
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B0809D00B6
 for <freebsd-stable@mailman.ysv.freebsd.org>;
 Wed,  7 Oct 2015 00:09:13 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id D96BD6BF
 for <freebsd-stable@freebsd.org>; Wed,  7 Oct 2015 00:09:12 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 90795B918;
 Tue,  6 Oct 2015 20:09:11 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Christian Kratzer <ck@cksoft.de>, freebsd-stable@freebsd.org
Subject: Re: smbfs crashes since approx. 10.1-RELEASE
Date: Tue, 06 Oct 2015 17:08:54 -0700
Message-ID: <2148690.gx9M0ZzrG1@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <1721669289.24365403.1444083414400.JavaMail.zimbra@uoguelph.ca>
References: <alpine.BSF.2.20.1510051157450.16263@noc1.cksoft.de>
 <1721669289.24365403.1444083414400.JavaMail.zimbra@uoguelph.ca>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 06 Oct 2015 20:09:11 -0400 (EDT)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Oct 2015 00:09:13 -0000

On Monday, October 05, 2015 06:16:54 PM Rick Macklem wrote:
> Christian Kratzer wrote:
> > Hi,
> > 
> > I run a regular rsync job that runs from cron and copies stuff that gets
> > created on a Windows smbfs share.
> > 
> > Starting around 10.1-RELEASE the VM became unstable and started panicking.
> > 
> > I have narrowed the issue down to the aforementioned rsync job.
> > 
> > When I move the job to a different VM, the other VM starts crashing and
> > the VM without the job becomes stable again.
> > 
> > I have panics and crashinfos stored in /var/crash if anybody is interested:
> > 
> >      root@noc2:/var/crash # uname -a
> >      FreeBSD noc2.cksoft.de 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed
> >      Aug 12 15:26:37 UTC 2015
> >      root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
> >      root@noc2:/var/crash # freebsd-version -u
> >      10.2-RELEASE-p5
> >      root@noc2:/var/crash # freebsd-version -k
> >      10.2-RELEASE
> >      root@noc2:/var/crash #
> > 
> > This is what I have in /var/crash/core.txt.0
> > 
> >      Fatal trap 12: page fault while in kernel mode
> >      cpuid = 0; apic id = 00
> >      fault virtual address   = 0x20
> >      fault code              = supervisor read data, page not present
> >      instruction pointer     = 0x20:0xffffffff80996c7c
> >      stack pointer           = 0x28:0xfffffe003d6c0ac0
> >      frame pointer           = 0x28:0xfffffe003d6c0af0
> >      code segment            = base 0x0, limit 0xfffff, type 0x1b
> >  			    = DPL 0, pres 1, long 1, def32 0, gran 1
> >      processor eflags        = resume, IOPL = 0
> >      current process         = 1349 (smbiod10)
> >      trap number             = 12
> >      panic: page fault
> >      cpuid = 0
> >      KDB: stack backtrace:
> >      #0 0xffffffff80984e30 at kdb_backtrace+0x60
> >      #1 0xffffffff809489e6 at vpanic+0x126
> >      #2 0xffffffff809488b3 at panic+0x43
> >      #3 0xffffffff80d4aadb at trap_fatal+0x36b
> >      #4 0xffffffff80d4addd at trap_pfault+0x2ed
> >      #5 0xffffffff80d4a47a at trap+0x47a
> >      #6 0xffffffff80d307f2 at calltrap+0x8
> >      #7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
> >      #8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
> >      #9 0xffffffff81a1b724 at smb_iod_thread+0xb4
> >      #10 0xffffffff8091244a at fork_exit+0x9a
> >      #11 0xffffffff80d30d2e at fork_trampoline+0xe
> >      Uptime: 2h43m55s
> >      Dumping 103 out of 999 MB: (CTRL-C to abort)
> >      ..16%..31%..47%..62%..78%..93%
> > 
> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> "crashes in mtx_unlock(&Giant)" in the subject line.
> 
> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
> remember that was once allowed, but am not sure if it still is (i.e. a tsleep()
> call while holding Giant)?
> 
> Hopefully someone who knows what is special about Giant that might cause this will
> respond.
> 
> Good luck with it, rick

tsleep() with Giant held is still allowed.  However, this sort of panic usually
means you unlocked a mutex you didn't hold (and that the kernel was built without
INVARIANTS, since otherwise you'd have hit an assertion failure earlier).
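
As a rough userland analogy (not kernel code, and not taken from smbfs), an
error-checking pthread mutex behaves somewhat like a kernel built with
INVARIANTS: an unlock by a thread that does not hold the mutex is reported at
the call site instead of corrupting state and faulting later.  The function
names below are made up for illustration, and pthreads stands in for mtx(9)
purely as an assumption to show the idea:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userland analogy only.  PTHREAD_MUTEX_ERRORCHECK plays the role that
 * INVARIANTS plays for kernel mutexes: ownership is verified on unlock.
 */
static void
init_checked_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	assert(error == 0);
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(error == 0);
	error = pthread_mutex_init(m, &attr);
	assert(error == 0);
	(void)pthread_mutexattr_destroy(&attr);
}

/* Unlock a mutex that was never locked; returns the unlock's error code. */
int
unlock_unowned_checked(void)
{
	pthread_mutex_t m;
	int error;

	init_checked_mutex(&m);
	/* The bug: no matching lock.  The checked mutex refuses with EPERM. */
	error = pthread_mutex_unlock(&m);
	(void)pthread_mutex_destroy(&m);
	return (error);
}

/* Correctly paired lock/unlock for comparison; returns 0 on success. */
int
unlock_owned_checked(void)
{
	pthread_mutex_t m;
	int error;

	init_checked_mutex(&m);
	error = pthread_mutex_lock(&m);
	assert(error == 0);
	error = pthread_mutex_unlock(&m);
	(void)pthread_mutex_destroy(&m);
	return (error);
}
```

With a default (unchecked) mutex the first case is undefined behavior, which
is the closer analogue of a kernel without INVARIANTS: nothing complains at
the bad unlock, and the failure shows up somewhere less helpful.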

I don't see anything obviously wrong in smb_iod_thread() however.

If you have the crashdump, can you please run this in kgdb:

frame 9
p (struct mtx *)c
p *(struct mtx *)c

-- 
John Baldwin