From owner-freebsd-fs@FreeBSD.ORG Thu Oct 11 09:32:57 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 573A69DC;
	Thu, 11 Oct 2012 09:32:57 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 63C2E8FC08;
	Thu, 11 Oct 2012 09:32:55 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA00482;
	Thu, 11 Oct 2012 12:32:54 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TMF8P-000Opr-Ry; Thu, 11 Oct 2012 12:32:54 +0300
Message-ID: <50769243.2010208@FreeBSD.org>
Date: Thu, 11 Oct 2012 12:32:51 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: Pawel Jakub Dawidek
Subject: Re: ZFS crashing during snapdir lookup for non-existent snapshot...
References: <5075E3E0.7060706@FreeBSD.org> <0A6567E7-3BA5-4F27-AEB2-1C00EDE00641@chittenden.org> <5075EDDD.4030008@FreeBSD.org> <5075FA8E.10200@FreeBSD.org>
In-Reply-To: <5075FA8E.10200@FreeBSD.org>
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: "freebsd-fs@freebsd.org", Sean Chittenden
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Thu, 11 Oct 2012 09:32:57 -0000

on 11/10/2012 01:45 Andriy Gapon said the following:
>
> [restoring mailing list cc]
>
> on 11/10/2012 00:58 Sean Chittenden said the following:
>>>> I don't have a dump from this particular system, only the backtrace from the crash.
>>>> The system is ZFS only and I only have a ZFS swapdir. :-/
>>>>
>>>> I still have the kernel, so I can poke at the code and the compiled kernel
>>>> (kernel.symbols). What are you looking for? -sc
>>>>
>>>
>>> list *zfsctl_snapdir_lookup+0x124 in kgdb
>>
>> (kgdb) list *zfsctl_snapdir_lookup+0x124
>> 0xffffffff816e9384 is in zfsctl_snapdir_lookup
>> (/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:992).
>> 987			*direntflags = ED_CASE_CONFLICT;
>> 988	#endif
>> 989		}
>> 990
>> 991		mutex_enter(&sdp->sd_lock);
>> 992		search.se_name = (char *)nm;
>> 993		if ((sep = avl_find(&sdp->sd_snaps, &search, &where)) != NULL) {
>> 994			*vpp = sep->se_root;
>> 995			VN_HOLD(*vpp);
>> 996			err = traverse(vpp, LK_EXCLUSIVE | LK_RETRY);
>
> It seems that the problem is a Solaris-ism that remained in the code.
> I think that zfsctl_snapdir_inactive should not destroy sdp; that should be the
> job of vop_reclaim.  Otherwise, if the vnode is re-activated, its v_data points
> to freed memory.
> In particular, I have this scenario in mind:
> - one thread, T1, performs a vput-ish operation which leads to vop_inactive on a
>   current vnode that represents ".zfs/snapshot"
> - at the same time, T2 executes a lookup that goes into zfsctl_root_lookup
> - let's assume that at some point T1 is at the very start of
>   zfsctl_snapdir_inactive; it holds just the vnode lock
> - at the same time, T2 is in gfs_dir_lookup->gfs_dir_lookup_static and holds
>   gfs_dir_lock
> - so T2 finds the 'snapshot' static entry in gfsd_static[]
> - T2 finds the cached vnode and adds a reference to it
> - T2 does gfs_dir_unlock and returns the vnode
> - now T1 proceeds through zfsctl_snapdir_inactive and destroys the v_data
>   (without even clearing the pointer)
> - T2 uses the vnode and gets a crash
>
> Possible resolutions:
> - make vop_inactive a noop and have vop_reclaim call the current inactive methods
> - check v_usecount in gfs_file_inactive after gfs_dir_lock is obtained and bail
>   out if it is > 0 (somewhat similar to what zfs_zinactive does)
> - something else?

An easy way to reproduce the problem in one form or another is to run many
instances of the following in parallel:

while true; do ls -l /pool/fs/.zfs/ >/dev/null; done

Here is another panic that is a variation of the above scenario.  A duplicate
gfs_vop_inactive is called after a "harmless" vop_pathconf call (one that
doesn't touch the vnode).
In this case the "shares" entry appears to be a random victim:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x18
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff825fe7dd
stack pointer		= 0x28:0xffffff80e040b800
frame pointer		= 0x28:0xffffff80e040b830
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 712 (ls)
trap number		= 12
panic: page fault
cpuid = 1
curthread: 0xfffffe0003d8a9a0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff802d2bba = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff805596fa = kdb_backtrace+0x3a
panic() at 0xffffffff8051c2a6 = panic+0x266
trap_fatal() at 0xffffffff8070741d = trap_fatal+0x3ad
trap_pfault() at 0xffffffff8070756c = trap_pfault+0x12c
trap() at 0xffffffff80707d19 = trap+0x4f9
calltrap() at 0xffffffff806ef903 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff825fe7dd, rsp = 0xffffff80e040b800, rbp = 0xffffff80e040b830 ---
gfs_vop_inactive() at 0xffffffff825fe7dd = gfs_vop_inactive+0x1d
VOP_INACTIVE_APV() at 0xffffffff80782fb4 = VOP_INACTIVE_APV+0x114
vinactive() at 0xffffffff805c84ad = vinactive+0x15d
vputx() at 0xffffffff805ca962 = vputx+0x4d2
vput() at 0xffffffff805ca9ce = vput+0xe
kern_pathconf() at 0xffffffff805cd44e = kern_pathconf+0x10e
sys_lpathconf() at 0xffffffff805cd4aa = sys_lpathconf+0x1a
amd64_syscall() at 0xffffffff80706953 = amd64_syscall+0x313
Xfast_syscall() at 0xffffffff806efbe7 = Xfast_syscall+0xf7

-- 
Andriy Gapon