From: "Ronald Klop"
To: "Konstantin Belousov", "Rick Macklem"
Cc: freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic in nfs on arm
Date: Sun, 26 Oct 2014 13:11:53 +0100
In-Reply-To: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>
References: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>
List-Id: "Porting FreeBSD to ARM processors."

On Sun, 26 Oct 2014 13:00:29 +0100, Rick Macklem wrote:

> Kostik wrote:
>> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
>> > Ronald Klop wrote:
>> > > Hi,
>> > >
>> > > I got a panic on my arm computer while building a port with /usr/ports
>> > > mounted from my FreeBSD-10-STABLE/amd64 machine.
>> > >
>> > > This is the machine which panicked:
>> > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
>> > > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG arm
>> > >
>> > >
>> > > Tracing pid 90295 tid 100119 td 0xc5f8c960
>> > > db_trace_self() at db_trace_self
>> > >     pc = 0xc0bb12c8 lr = 0xc0bb1354 (db_trace_thread+0x50)
>> > >     sp = 0xdf29e5d0 fp = 0xc3e07120
>> > > db_trace_thread() at db_trace_thread+0x50
>> > >     pc = 0xc0bb1354 lr = 0xc0936314 (db_command_init+0x5a4)
>> > >     sp = 0xdf29e630 fp = 0xc3e07120
>> > > db_command_init() at db_command_init+0x5a4
>> > >     pc = 0xc0936314 lr = 0xc0935ad0 (db_skip_to_eol+0x484)
>> > >     sp = 0xdf29e648 fp = 0xc3e07120
>> > >     r4 = 0xc0c8d350 r5 = 0x00000000
>> > > db_skip_to_eol() at db_skip_to_eol+0x484
>> > >     pc = 0xc0935ad0 lr = 0xc0935c38 (db_command_loop+0x5c)
>> > >     sp = 0xdf29e6e8 fp = 0xc3e07120
>> > >     r4 = 0xdf29e6fc r5 = 0xc0c8d64c
>> > >     r6 = 0x3cd90e75 r7 = 0x00000000
>> > >     r8 = 0x00000001 r10 = 0x600000d3
>> > > db_command_loop() at db_command_loop+0x5c
>> > >     pc = 0xc0935c38 lr = 0xc0937f80 (X_db_sym_numargs+0xec)
>> > >     sp = 0xdf29e6f0 fp = 0xc3e07120
>> > > X_db_sym_numargs() at X_db_sym_numargs+0xec
>> > >     pc = 0xc0937f80 lr = 0xc0a6f0c0 (kdb_trap+0x94)
>> > >     sp = 0xdf29e808 fp = 0xc3e07120
>> > >     r4 = 0xdf29e8f8
>> > > kdb_trap() at kdb_trap+0x94
>> > >     pc = 0xc0a6f0c0 lr = 0xc0bc1d60 (badaddr_read+0x274)
>> > >     sp = 0xdf29e828 fp = 0xc3e07120
>> > >     r4 = 0xdf29e8f8 r5 = 0x00000001
>> > >     r6 = 0x3cd90e75 r7 = 0xc5f8c960
>> > >     r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
>> > > badaddr_read() at badaddr_read+0x274
>> > >     pc = 0xc0bc1d60 lr = 0xc0bc1e98 (badaddr_read+0x3ac)
>> > >     sp = 0xdf29e840 fp = 0xc3e07120
>> > >     r4 = 0xc5f8c960 r5 = 0xdf29e8f8
>> > >     r6 = 0x3cd90e05
>> > > badaddr_read() at badaddr_read+0x3ac
>> > >     pc = 0xc0bc1e98 lr = 0xc0bc2278 (data_abort_handler+0x10c)
>> > >     sp = 0xdf29e858 fp = 0xc3e07120
>> > >     r4 = 0xc0cd8af8 r5 = 0xffff1004
>> > > data_abort_handler() at data_abort_handler+0x10c
>> > >     pc = 0xc0bc2278 lr = 0xc0bb2f40 (exception_exit)
>> > >     sp = 0xdf29e8f8 fp = 0xc3e07120
>> > >     r4 = 0xffffffff r5 = 0xffff1004
>> > >     r6 = 0x3cd90e05 r7 = 0xc0e0ea48
>> > >     r8 = 0x0000000f r9 = 0x00000101
>> > >     r10 = 0x0000001d
>> > > exception_exit() at exception_exit
>> > >     pc = 0xc0bb2f40 lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
>> > >     sp = 0xdf29e948 fp = 0xc3e07120
>> > >     r0 = 0xba9b9127 r1 = 0x8b3de5fb
>> > >     r2 = 0xc61c1fc8 r3 = 0xba9b9126
>> > >     r4 = 0x00000000 r5 = 0xc61c1fc8
>> > >     r6 = 0x3cd90e05 r7 = 0xc0e0ea48
>> > >     r8 = 0x0000000f r9 = 0x00000101
>> > >     r10 = 0x0000001d r12 = 0x00000000
>> > > uma_reclaim() at uma_reclaim+0x24c
>> > This looks to me like a crash in uma_reclaim() and I find UMA
>> > way too obscure to understand.
>> >
>> > I have no idea if it might be related, but alc@ put a fix for low
>> > memory situations in r272071 (or maybe it's r272221?).
>> >
>> > Might be worth trying a slightly newer kernel to see if the
>> > problem still occurs.
>> >
>> > And hopefully someone more conversant with UMA (or this stack
>> > trace) can help more.
>> >
>> > rick
>> >
>> > >     pc = 0xc0b8db4c lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
>> > >     sp = 0xdf29e978 fp = 0xdf29ec10
>> > >     r4 = 0xc3e071d8 r5 = 0xc0e0ea00
>> > >     r6 = 0xc3e07120 r7 = 0x00000000
>> > >     r8 = 0x00000102 r9 = 0xdf29ecf8
>> > >     r10 = 0xc61c0760
>> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
>> uma_reclaim() is not called from uma_zalloc().
>> I think there is some issue with ddb on arm, which means that
>> the backtrace is not useful. See below for one more.
>>
> Yeah, I noticed that and the one below (i.e. I knew the stack dump
> wasn't correct). I kinda hoped it was right w.r.t. the crash
> happening in uma_reclaim() { which only seems to be called from
> the pageout daemon? }, so that doesn't match up with the thread.
>
> Also, I couldn't see what the panic message actually was. Is it
> this one at the bottom:
> Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> or was that what happened when you tried to crash dump?
>
> Btw, nfscl_nget() does call uma_zalloc(M_WAITOK), but it doesn't hold a
> mutex when it does this.
>
> rick

Hi,

The non-sleepable lock message is not the original panic. It appeared
when I dumped the memory to dumpdev from the debugger.

I don't have the original panic message. It was no longer on the serial
output. Is it possible to have the debugger print it again?

I have already rebooted the machine. Let's see if it happens again someday.

Ronald.

>> > >     pc = 0xc0b8c800 lr = 0xc09e1df0 (nfscl_nget+0x308)
>> > >     sp = 0xdf29e990 fp = 0xdf29ec10
>> > >     r4 = 0x9bb9fa43 r5 = 0x00000000
>> > >     r6 = 0xc550dce8 r7 = 0xc3edaa00
>> > >     r8 = 0xc3ebbac0
>> > > nfscl_nget() at nfscl_nget+0x308
>> > >     pc = 0xc09e1df0 lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
>> > >     sp = 0xdf29e9d8 fp = 0xdf29ea10
>> > >     r4 = 0xc550dce8 r5 = 0x00000000
>> > >     r6 = 0xc550dcf8 r7 = 0xdf29ecf8
>> > >     r8 = 0xdf29ec6c r9 = 0x00000000
>> > >     r10 = 0xdf29ed28
>> > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
>> > >     pc = 0xc09da69c lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
>> > >     sp = 0xdf29ec40 fp = 0xbffff620
>> > >     r4 = 0xc0c95c68 r5 = 0xdf29ec6c
>> > >     r6 = 0x00000001 r7 = 0x00020284
>> > >     r8 = 0xffffff9c r9 = 0x00200800
>> > >     r10 = 0xc5f8c960
>> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
>> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
>> esp. without intervening frame.
>>
>> > >     pc = 0xc0bdae44 lr = 0xc0aca614 (kern_mkdirat+0x18c)
>> > >     sp = 0xdf29ec50 fp = 0xbffff620
>> > >     r4 = 0xdf29ed28 r5 = 0xdf29ec90
>> > >     r6 = 0x00000000
>> > > kern_mkdirat() at kern_mkdirat+0x18c
>> > >     pc = 0xc0aca614 lr = 0xc0aca684 (kern_mkdir+0x24)
>> > >     sp = 0xdf29ede0 fp = 0xbffff620
>> > >     r4 = 0x00020290 r5 = 0xc5f8c960
>> > >     r6 = 0x00000000 r7 = 0xc5f7f000
>> > >     r8 = 0x00000000 r10 = 0x00013640
>> > > kern_mkdir() at kern_mkdir+0x24
>> > >     pc = 0xc0aca684 lr = 0xc0aca6a8 (sys_mkdir+0x1c)
>> > >     sp = 0xdf29edf0 fp = 0xbffff620
>> > > sys_mkdir() at sys_mkdir+0x1c
>> > >     pc = 0xc0aca6a8 lr = 0xc0bc2884 (swi_handler+0x254)
>> > >     sp = 0xdf29edf8 fp = 0xbffff620
>> > > swi_handler() at swi_handler+0x254
>> > >     pc = 0xc0bc2884 lr = 0xc0bb2ed0 (swi_exit)
>> > >     sp = 0xdf29ee60 fp = 0xbffff620
>> > >     r4 = 0x00020290 r5 = 0x2085e8e0
>> > >     r6 = 0x00020284 r7 = 0x00000088
>> > >     r8 = 0x00000001
>> > > swi_exit() at swi_exit
>> > >     pc = 0xc0bb2ed0 lr = 0xc0bb2ed0 (swi_exit)
>> > >     sp = 0xdf29ee60 fp = 0xbffff620
>> > > Unable to unwind further
>> > >
>> > >
>> > > Unfortunately dumping the kernel core also panicked.
>> > > db> dump
>> > > Physical memory: 507 MB
>> > > Dumping 74 MB: 71 67 63
>> > > vm_fault(0xc4147000, 0, 1, 0) -> 0
>> > > Fatal kernel mode data abort: 'Translation Fault (P)'
>> > > trapframe: 0xdf29e0b8
>> > > FSR=00000017, FAR=00000014, spsr=a00000d3
>> > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
>> > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
>> > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
>> > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
>> > >
>> > > panic: Fatal abort
>> > > Uptime: 3d18h30m32s
>> > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
>> > > _______________________________________________
>> > > freebsd-fs@freebsd.org mailing list
>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > To unsubscribe, send any mail to
>> > > "freebsd-fs-unsubscribe@freebsd.org"
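
A rough way to sanity-check a pasted ddb backtrace like the one quoted in this thread is to compare each frame's saved lr with the next frame's pc. The Python sketch below is only an illustration and not part of any FreeBSD tool: the names parse_frames and chain_breaks are hypothetical, and the regular expressions assume the exact "func() at func+0xNN" / "pc = ... lr = ... (sym+0xNN)" layout shown in the trace. A mismatch is not by itself proof of a bad unwind, because at a trap frame the faulting pc and the saved lr can legitimately differ; it only marks the places worth a closer look.

```python
import re

# Frame header, e.g. "kdb_trap() at kdb_trap+0x94" (offset is optional).
FRAME_RE = re.compile(r"^(\w+)\(\) at (\w+)(?:\+0x[0-9a-f]+)?$")
# Register line, e.g. "pc = 0x... lr = 0x... (symbol+0x1c)".
PCLR_RE = re.compile(
    r"pc = (0x[0-9a-f]+)\s+lr = (0x[0-9a-f]+)\s*\((\w+)(?:\+0x[0-9a-f]+)?\)"
)


def strip_quote(line):
    """Drop leading mail quote markers ('>' and spaces) and trailing space."""
    return line.lstrip("> ").strip()


def parse_frames(text):
    """Collect one dict per frame: function name, pc, lr, and lr symbol."""
    frames = []
    for raw in text.splitlines():
        line = strip_quote(raw)
        header = FRAME_RE.match(line)
        if header:
            frames.append(
                {"func": header.group(1), "pc": None, "lr": None, "lr_sym": None}
            )
            continue
        regs = PCLR_RE.search(line)
        # Attach only the first pc/lr line that follows a frame header.
        if regs and frames and frames[-1]["pc"] is None:
            frames[-1].update(
                pc=int(regs.group(1), 16),
                lr=int(regs.group(2), 16),
                lr_sym=regs.group(3),
            )
    return frames


def chain_breaks(frames):
    """Return (frame, next frame) name pairs where lr != next frame's pc."""
    breaks = []
    for cur, nxt in zip(frames, frames[1:]):
        if cur["lr"] is not None and nxt["pc"] is not None:
            if cur["lr"] != nxt["pc"]:
                breaks.append((cur["func"], nxt["func"]))
    return breaks


# Excerpt copied from the quoted trace (sp/fp and other register lines trimmed).
sample = """\
>> > > data_abort_handler() at data_abort_handler+0x10c
>> > >     pc = 0xc0bc2278 lr = 0xc0bb2f40 (exception_exit)
>> > > exception_exit() at exception_exit
>> > >     pc = 0xc0bb2f40 lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
>> > > uma_reclaim() at uma_reclaim+0x24c
>> > >     pc = 0xc0b8db4c lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
>> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
>> > >     pc = 0xc0b8c800 lr = 0xc09e1df0 (nfscl_nget+0x308)
"""

print(chain_breaks(parse_frames(sample)))
```

On this excerpt the only pair reported is exception_exit followed by uma_reclaim, which is the exception boundary. The check says nothing about whether the symbol names themselves were resolved correctly, so it does not settle the question raised above about the trace being unreliable on arm.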