From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: Harald Schmalzbauer
Cc: stable@freebsd.org, daichi@freebsd.org, Pavel Polyakov
Date: Fri, 2 Nov 2012 14:21:18 +0000
Subject: Re: lock violation in unionfs (9.0-STABLE r230270)

On Wed, Oct 31, 2012 at 11:11 AM, Harald Schmalzbauer wrote:
> Attilio Rao wrote on 29.10.2012 23:02 (localtime):
>> On Mon, Oct 29, 2012 at 7:37 PM, Harald Schmalzbauer wrote:
>>> Attilio Rao wrote on 27.10.2012 23:07 (localtime):
>>>> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao wrote:
>>>>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao wrote:
>>>>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer wrote:
>>>>>>> Attilio Rao wrote on 09.08.2012 20:26 (localtime):
>>>>>>>> On 8/8/12, Harald Schmalzbauer wrote:
>>>>>>>>> Pavel Polyakov wrote on 06.03.2012 11:20 (localtime):
>>>>>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>>>>>
>>>>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>>>>>> exclusive locked but should be
>>>>>>>>>>>> KDB: enter: lock violation
>>>>>>>>>>> Pavel,
>>>>>>>>>>> can you give a spin to this patch?:
>>>>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>>>>>
>>>>>>>>>>> I think that the unlocking is due at that point as the vnode lock can
>>>>>>>>>>> be switched later on.
>>>>>>>>>>>
>>>>>>>>>>> Let me know what you think about it and what the test does.
>>>>>>>>>> Thanks!
>>>>>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so
>>>>>>>>>> late.
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>>>>>> for the issue or has it just not been PR-sent and thus forgotten?
>>>>>>>> Can you and Pavel try the attached patch?
>>>>>>>> Unfortunately I had no time to test it, I just made it in 5 free
>>>>>>>> mins from a non-FreeBSD workstation,
>>>>>>> Sorry, couldn't test earlier, but now I did:
>>>>>>> With this patch applied the machine hangs without debug kernel and the
>>>>>>> latter gives the following panic:
>>>>>>> System call nmount returning with the following locks held:
>>>>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @ src/sys/fs/unionfs/union_vnops.c:1938
>>>>>>> panic: witness_warn
>>>>>>> cpuid = 0
>>>>>>> KDB: stack backtrace:
>>>>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at db_trace_self_wrapper+0x26
>>>>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f, esp = 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>>>>>> KDB: enter: panic
>>>>>>> [ thread pid 86 tid 100054 ]
>>>>>>> Stopped at kdb_enter+0x3a: movl $0,kdb_why
>>>>>>> db> bt
>>>>>>> Tracing pid 86 tid 100054 td 0xc541b000
>>>>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panic+0x190
>>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>>
>>>>>>> Hmm, I guess I forgot to install kernel debug symbols...
>>>>>>> Coming back if I have more
>>>>>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>>>>>> It basically expects the vnode to return locked in the same way
>>>>>> requested by the preceding namei() (when that happens) but when you do
>>>>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>>>>> Hello,
>>>>> the following patch should work out the issues around unionfs_nodeget() a bit:
>>>>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>>>>>
>>>>> Unfortunately unionfs code is rather messy in the lookup path about
>>>>> locking requirements, so following what needs to be done there is a bit
>>>>> difficult.
>>>>> I have no way to test this patch, so it is just test-compiled at the
>>>>> moment, but I would need you to also test the lookup path (so directory
>>>>> "ls", find(1) on the whole unionfs volume, etc.) to validate it
>>>>> somehow.
>>>> On second thought, I think that locking in lookup (and also other
>>>> operations) is so fragile and difficult to follow that it makes all
>>>> vnops real locking landmines.
>>>> I think that the following patch fixes the insmntque insertion and
>>>> follows the old approach well enough to be committed separately:
>>>> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch
>>>>
>>> Unfortunately I have no idea about all those locking strategies and
>>> implementations.
>>> Applying unionfs_nodeget3.patch results in:
>>> sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
>>> sys/fs/unionfs/union_subr.c:332: error: expected statement before ')' token
>>> *** [union_subr.o] Error code 1
>>>
>>> I guess there is a typo in this chunk:
>>> @@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
>>>
>>>          vref(vp);
>>>      } else
>>>          *vpp = vp;
>>> -
>>> -unionfs_nodeget_out:
>>> -    if (lkflags & LK_TYPE_MASK)
>>> -        vn_lock(vp, lkflags | LK_RETRY);
>>> -
>>> +    if (lkflags & LK_TYPE_MASK) {
>>> +        if (lkflags == LK_SHARED))
>>> ----------------------------------^
>>> +            vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
>>> +    } else
>>> +        VOP_UNLOCK(vp, LK_RELEASE);
>>>      return (0);
>>>  }
>>>
>>> After removing the second right parenthesis the kernel compiles.
>>> But it still crashes:
>>> panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
>>> cpuid = 1
>>> KDB: stack backtrace:
>>> ...
>>> If you can use the bt info I'll transcribe - no serial console available :-(
>>>
>>> Am I right that I should only apply _one_ unionfs-patchX.patch
>>> (unionfs_nodeget3.patch in that case)?
>> Yes, only that one.
>> Can you please do "bt" from DDB and take a picture of your screen with a camera?
>
> Ok, now I had a reason to take some time finding out how ESXi handles
> serial ports ;-) It's quite easy and very flexible, so no problem
> setting up a debug console.
> Please find attached the backtrace.
> Do I have to load any symbols? It's not very informative what I see, right?

Hi Harry,
well done. Can you please back out the prior patch and try this one instead?:
http://www.freebsd.org/~attilio/unionfs_nodeget4.patch

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
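
For readers following the thread, the lock handoff discussed around the quoted unionfs_nodeget3.patch hunk can be sketched as a small user-space model. This is only an illustration, assuming the vnode is held exclusively while it is inserted with insmntque() and is then handed back in whatever state the caller asked for. The enum names and model_nodeget_finish() are hypothetical; they are not the kernel's LK_* flags, and the function is not taken from any of the patches.

```c
#include <assert.h>

/*
 * Illustrative lock requests for this user-space model only.
 * These are NOT the kernel's LK_* flag values.
 */
enum model_req { REQ_NONE, REQ_SHARED, REQ_EXCLUSIVE };

/* Lock state of the modelled vnode. */
enum model_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };

/*
 * Model of the tail of a nodeget-style routine (hypothetical helper).
 * The vnode must enter exclusively locked, because the thread states
 * that insmntque() only accepts an exclusive lock. It is then handed
 * back in the state the caller requested.
 */
enum model_state
model_nodeget_finish(enum model_state held, enum model_req req)
{
	/* Precondition: the insertion was done under an exclusive lock. */
	assert(held == ST_EXCLUSIVE);

	switch (req) {
	case REQ_EXCLUSIVE:
		return (ST_EXCLUSIVE);	/* keep the lock as it is */
	case REQ_SHARED:
		return (ST_SHARED);	/* models a downgrade */
	default:
		return (ST_UNLOCKED);	/* no lock requested: release */
	}
}
```

The point of the model is the asymmetry: the lock is always taken exclusively first and only relaxed afterwards, so a caller asking for a shared lock never causes the insertion itself to run under a shared lock.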