From: Henri Hennebert
To: Andriy Gapon, freebsd-stable@FreeBSD.org
Cc: Konstantin Belousov
Subject: Re: Freebsd 11.0 RELEASE - ZFS deadlock
Date: Mon, 14 Nov 2016 10:35:20 +0100

On 11/14/2016 10:07, Andriy Gapon wrote:
> On 13/11/2016 15:28, Henri Hennebert wrote:
>> On 11/13/2016 11:06, Andriy Gapon wrote:
>>> On 12/11/2016 14:40, Henri Hennebert wrote:
> [snip]
>
> Could you please show 'info local' in frame 14?
> I expected that 'nd' variable would be defined there and it may contain some
> useful information.
>

No luck there:

(kgdb) fr 14
#14 0xffffffff80636838 in kern_statat (td=0xfffff80009ba0500, flag=<optimized out>,
    fd=-100, path=0x0, pathseg=<optimized out>, sbp=<optimized out>,
    hook=0x800e2a388) at /usr/src/sys/kern/vfs_syscalls.c:2160
2160            if ((error = namei(&nd)) != 0)
(kgdb) info local
rights = <optimized out>
nd = <optimized out>
error = <optimized out>
sb = <optimized out>
(kgdb)

>> I also tried to get information from the execve of the other threads:
>>
>> for tid 101250:
>> (kgdb) fr 10
>> #10 0xffffffff80508ccc in sys_execve (td=0xfffff800b6429000,
>> uap=0xfffffe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
>> 218             error = kern_execve(td, &args, NULL);
>> (kgdb) print *uap
>> $4 = {fname_l_ = 0xfffffe010184fb80 "`\220\217\002\b", fname = 0x8028f9060
>> <Address 0x8028f9060 out of bounds>,
>>  fname_r_ = 0xfffffe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010184fb88
>> "`¶ÿÿÿ\177", argv = 0x7fffffffb660,
>>  argv_r_ = 0xfffffe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010184fb90
>> "\bÜÿÿÿ\177", envv = 0x7fffffffdc08,
>>  envv_r_ = 0xfffffe010184fb98 ""}
>> (kgdb)
>>
>> for tid 101243:
>>
>> (kgdb) f 15
>> #15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500,
>> uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
>> 218             error = kern_execve(td, &args, NULL);
>> (kgdb) print *uap
>> $5 = {fname_l_ = 0xfffffe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0
>> <Address 0x80285cfc0 out of bounds>,
>>  fname_r_ = 0xfffffe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010182cb88
>> "`¶ÿÿÿ\177", argv = 0x7fffffffb660,
>>  argv_r_ = 0xfffffe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010182cb90
>> "\bÜÿÿÿ\177", envv = 0x7fffffffdc08,
>>  envv_r_ = 0xfffffe010182cb98 ""}
>> (kgdb)
>
> I think that you see garbage in those structures because they contain pointers
> to userland data.
>
> Hmm, I've just noticed another interesting thread:
> Thread 668 (Thread 101245):
> #0  sched_switch (td=0xfffff800b642aa00, newtd=0xfffff8000285f000,
> flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
> #1  0xffffffff80561ae2 in mi_switch (flags=<optimized out>, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:455
> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
> /usr/src/sys/kern/subr_sleepqueue.c:646
> #3  0xffffffff805614b1 in _sleep (ident=<optimized out>, lock=<optimized out>,
> priority=<optimized out>, wmesg=0xffffffff809c51bc "vmpfw", sbt=0,
> pr=<optimized out>, flags=<optimized out>) at
> /usr/src/sys/kern/kern_synch.c:229
> #4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40,
> wmesg=<optimized out>) at /usr/src/sys/vm/vm_page.c:753
> #5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
> msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
> #6  0xffffffff80886be9 in vm_fault_hold (map=<optimized out>,
> vaddr=<optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
> /usr/src/sys/vm/vm_fault.c:495
> #7  0xffffffff80885448 in vm_fault (map=0xfffff80011d66000,
> vaddr=<optimized out>, fault_type=4 '\004', fault_flags=<optimized out>) at
> /usr/src/sys/vm/vm_fault.c:273
> #8  0xffffffff808d3c49 in trap_pfault (frame=0xfffffe0101836c00, usermode=1) at
> /usr/src/sys/amd64/amd64/trap.c:741
> #9  0xffffffff808d3386 in trap (frame=0xfffffe0101836c00) at
> /usr/src/sys/amd64/amd64/trap.c:333
> #10 0xffffffff808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236

This thread belongs to another program from the news system:

668  Thread 101245 (PID=49124: innfeed) sched_switch (td=0xfffff800b642aa00,
newtd=0xfffff8000285f000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

>
> I strongly suspect that this is the thread that we were looking for.
> I think that it has the vnode lock in the shared mode while trying to fault in a
> page.
>
> Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'?
> It'd be interesting to understand why this thread is waiting here.
> So, please also print '*fs.m' and '*fs.object'.

No luck :-(

(kgdb) fr 6
#6  0xffffffff80886be9 in vm_fault_hold (map=<optimized out>, vaddr=<optimized out>,
    fault_type=4 '\004', fault_flags=0, m_hold=0x0)
    at /usr/src/sys/vm/vm_fault.c:495
495                     vm_page_sleep_if_busy(fs.m, "vmpfw");
(kgdb) print fs
Cannot access memory at address 0xffff00001fa0
(kgdb)

Henri