From: Guido Falsi
Date: Sat, 25 Oct 2014 17:02:29 +0200
To: FreeBSD FS
Cc: Glen Barber, freebsd-stable@FreeBSD.org
Subject: Re: panic: detach with active requests on 10.1-RC3
Message-ID: <544BBB85.2020909@madpilot.net>
References: <544A538F.6060202@FreeBSD.org>
In-Reply-To: <544A538F.6060202@FreeBSD.org>

On 10/24/14 15:26, Guido Falsi wrote:
> Hi,
>
> I'm making some experiments with 10.1-RC3 on alix boards as hardware
> using NanoBSD.
>
> By mounting and umounting UFS filesystems I have seen umount constantly
> hanging hard in a deadlock. I have tested on two boards with two
> distinct compactflash disks with same results. This was not happening
> with 10.0-RELEASE.
>
> I have build a 10.1-RC3 kernel with full debugging and caused the
> problem to happen, I got this:
>
> root@qtest:~ [0]# umount /cfg
> panic: detach with active requests
> KDB: stack backtrace:
> db_trace_self_wrapper(c0968053,c08ea7f0,c2d48800,c23d6bc8,c0536a16,...)
> at db_trace_self_wrapper+0x2d/frame 0xc23d6b98
> kdb_backtrace(c09639e1,c09fa7e8,c095761d,c23d6c54,c095761d,...) at
> kdb_backtrace+0x30/frame 0xc23d6c00
> vpanic(c09fa682,100,c095761d,c23d6c54,c23d6c54,...) at vpanic+0x80/frame
> 0xc23d6c24
> kassert_panic(c095761d,c09575b3,c2d7acc0,4c7,c2d7acc0,...) at
> kassert_panic+0xe9/frame 0xc23d6c48
> g_detach(c2d7acc0,4,c095725c,1c2,c09c8d5c,...) at g_detach+0x1d3/frame
> 0xc23d6c64
> g_wither_washer(c09f7df4,0,c0956544,124,0,...) at
> g_wither_washer+0x109/frame 0xc23d6c90
> g_run_events(0,c23d6d08,c095d42a,3dc,0,...) at g_run_events+0x40/frame
> 0xc23d6ccc
> fork_exit(c05c4e60,0,c23d6d08) at fork_exit+0x7f/frame 0xc23d6cf4
> fork_trampoline() at fork_trampoline+0x8/frame 0xc23d6cf4
> --- trap 0, eip = 0, esp = 0xc23d6d40, ebp = 0 ---
> KDB: enter: panic
> [ thread pid 12 tid 100006 ]
> Stopped at kdb_enter+0x3d: movl $0,kdb_why
> db>
>

I tried to investigate some more by myself.
Maybe what I found is obvious to anyone with decent VFS knowledge, anyway:

After some fumbling around I did:

db> show geom 0xc2e98b40
consumer: 0xc2e98b40
  class: VFS (0xc09c8d5c)
  geom: ffs.ada0s3 (0xc3293600)
  provider: ada0s3 (0xc2e7e200)
  access: r0w0e0
  flags: 0x0030
  nstart: 19
  nend: 18

This shows nstart != nend, while g_detach asserts them to be the same.

Going up the chain of providers, I find that its providers also have
nstart - nend == 1:

db> show geom 0xc2e9b7c0
consumer: 0xc2e9b7c0
  class: PART (0xc09c96b0)
  geom: ada0 (0xc2e7e780)
  provider: ada0 (0xc2e7e500)
  access: r2w0e0
  flags: 0x0030
  nstart: 1430
  nend: 1429

db> show geom 0xc2e7e500
provider: ada0 (0xc2e7e500)
  class: DISK (0xc09c8890)
  geom: ada0 (0xc2e7e580)
  mediasize: 4017807360
  sectorsize: 512
  stripesize: 0
  stripeoffset: 0
  access: r2w0e0
  flags: (0x0030)
  error: 0
  nstart: 2085
  nend: 2084
  consumer: 0xc2e9a700 (ada0), access=r0w0e0, flags=0x0030
  consumer: 0xc2e9b480 (ada0), access=r0w0e0, flags=0x0030
  consumer: 0xc2e9b7c0 (ada0), access=r2w0e0, flags=0x0030

Looking at the code, these values are touched only in g_io_request()
and g_io_deliver() respectively, so this now looks like a GEOM problem.

In fact, the only commit that touched those functions between the 10.0
and 10.1 branches is r260385, which merged quite a few things. I tried
reverting it to test without it, but "svn merge -c -260385 ." generated
a few conflicts that I'm unable to resolve, so I need some guidance even
to perform this simple test.

>
> The machine is sitting there, I am connected with serial console, anyone
> willing to help me debug this further? I really know very little about
> kernel debugging. If necessary I can also make myself available via IRC
> or Jabber.
>
> It looks like this has some similarities with what was reported here:
>
> https://lists.freebsd.org/pipermail/freebsd-fs/2014-September/020035.html
>
> I also tested with head (including r272130) and it does deadlock the same.
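
To make the nstart/nend reasoning above concrete, here is a small
user-space toy model in Python. It is only a sketch and not kernel code:
the Endpoint class and the function names are hypothetical stand-ins, and
it assumes (as the analysis above suggests) that g_io_request() bumps
nstart, g_io_deliver() bumps nend, and g_detach() insists that the two
are equal. The counts 19 and 18 are the ones from the VFS consumer in the
ddb output.

```python
# Toy model of GEOM request accounting; illustrative only, not kernel code.
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical stand-in for a GEOM consumer or provider."""
    name: str
    nstart: int = 0  # requests issued
    nend: int = 0    # requests completed

    def in_flight(self) -> int:
        # Requests that were started but never delivered back.
        return self.nstart - self.nend


def io_request(consumer: Endpoint, provider: Endpoint) -> None:
    """Models g_io_request(): count the request as started on both ends."""
    consumer.nstart += 1
    provider.nstart += 1


def io_deliver(consumer: Endpoint, provider: Endpoint) -> None:
    """Models g_io_deliver(): count the request as completed on both ends."""
    consumer.nend += 1
    provider.nend += 1


def detach(consumer: Endpoint) -> None:
    """Models the check in g_detach(): fail while requests are outstanding."""
    if consumer.nstart != consumer.nend:
        raise AssertionError("detach with active requests")


cp = Endpoint("ffs.ada0s3 consumer")
pp = Endpoint("ada0s3")

for _ in range(19):      # 19 requests issued, as in the ddb output
    io_request(cp, pp)
for _ in range(18):      # only 18 completed: one request is lost
    io_deliver(cp, pp)

try:
    detach(cp)
    result = "detached"
except AssertionError as exc:
    result = str(exc)

print(result, cp.in_flight())
```

In this model a single request that is issued but never delivered is
enough to make every later detach fail, and the same off-by-one shows up
on each endpoint the request passed through, which matches the pattern
seen on the VFS consumer, the PART consumer and the DISK provider.
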

After the analysis above, I think there really is no similarity with
the problem reported by bdrewery.

-- 
Guido Falsi