From: Andrea Venturoli <ml@netfence.it>
Subject: Nightly disk-related panic since upgrade to 10.3
To: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Message-ID: <e923a01a-0739-1fc6-32aa-3a1658cd9e7f@netfence.it>
Date: Wed, 19 Oct 2016 08:52:24 +0200

Hello.

Last week I upgraded a 9.3/amd64 box to 10.3; since then, it has crashed 
and rebooted at least once every night.

The only exception was on Friday, when it locked up without rebooting: it 
still answered ping requests, and logins through HTTP would half work. My 
impression is that the disk subsystem was hung: ICMP kept working since 
it does no I/O, and HTTP worked as long as no disk access was required.

Today I was able to get a couple of (almost identical) dumps:

> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff804ee170 at kdb_backtrace+0x60
> #1 0xffffffff804b4576 at vpanic+0x126
> #2 0xffffffff804b4443 at panic+0x43
> #3 0xffffffff8068fd2a at softdep_deallocate_dependencies+0x6a
> #4 0xffffffff805394b5 at brelse+0x145
> #5 0xffffffff8053793c at bufwrite+0x3c
> #6 0xffffffff806ae20f at ffs_write+0x3df
> #7 0xffffffff8076d519 at VOP_WRITE_APV+0x149
> #8 0xffffffff806ec7c9 at vnode_pager_generic_putpages+0x2a9
> #9 0xffffffff8076f3b7 at VOP_PUTPAGES_APV+0xa7
> #10 0xffffffff806ea6f5 at vnode_pager_putpages+0xc5
> #11 0xffffffff806e17f8 at vm_pageout_flush+0xc8
> #12 0xffffffff806db432 at vm_object_page_collect_flush+0x182
> #13 0xffffffff806db1cd at vm_object_page_clean+0x13d
> #14 0xffffffff806dadbe at vm_object_terminate+0x8e
> #15 0xffffffff806eac60 at vnode_destroy_vobject+0x90
> #16 0xffffffff806b4232 at ufs_reclaim+0x22
> #17 0xffffffff8076e5c7 at VOP_RECLAIM_APV+0xa7


Does anyone have any insight into what might be going on?
The disks are all connected to a SAS RAID adapter using the mfi driver; I 
don't think it is a hardware issue, since it worked perfectly for years 
until I did the upgrade; also, mfiutil says everything is OK and nothing 
mfi-related is in the logs.


Some ideas come to mind, on which I could use a second opinion:

_ soft updates are broken: that would really surprise me, since I've been 
using them for years on this and several other boxes (10.3 too);

_ snapshot creation/deletion is causing this: again, I'm using that 
almost everywhere, so I don't think it can be the sole cause; besides, 
I've been able to do some dumps without trouble and I don't think 
anything was messing with snapshots at the time of the last two panics;

_ the mfi driver is broken on 10.3: this seems more plausible to me, 
since this is the only machine I have it on and it's the only one where 
I get these panics.
I found https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183618, but I 
get no "g_vfs_done()..." messages.

Any other hints?


I'd really like to find out what's going on; I'd appreciate any help 
and I'm willing to provide any useful info.

On the other hand, this is a production server, so I have to solve this 
really soon.
Some ideas come to mind, like disabling soft updates (knowing which file 
system was having trouble would help here; is there any way to know?), 
trying to enable journaling, upgrading to 10-STABLE, or building a kernel 
with INVARIANTS/WITNESS/etc., but I'd appreciate a second opinion before 
I start shooting in the dark.
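
For reference, the debugging kernel mentioned above would be built from a 
config roughly like the one below. This is only a sketch: the ident 
DEBUG10 is a placeholder name, not something already in use on this box, 
and the option list is the usual minimum rather than a tested setup.

```
include GENERIC
ident   DEBUG10             # placeholder name for this sketch

options INVARIANTS          # extra run-time consistency checks in the kernel
options INVARIANT_SUPPORT   # support code required by INVARIANTS
options WITNESS             # lock order verification
options WITNESS_SKIPSPIN    # do not check spin mutexes, to limit overhead
```

WITNESS in particular slows the system down noticeably, which matters on 
a production server.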


  bye & Thanks
	av.