From owner-freebsd-stable@FreeBSD.ORG Sun Aug 14 14:53:54 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 458D7106566C; Sun, 14 Aug 2011 14:53:54 +0000 (UTC) (envelope-from prvs=120731b379=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 962ED8FC13; Sun, 14 Aug 2011 14:53:53 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sun, 14 Aug 2011 15:42:54 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 14 Aug 2011 15:42:54 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014595600.msg; Sun, 14 Aug 2011 15:42:53 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=120731b379=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> Date: Sun, 14 Aug 2011 15:43:26 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: 
freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Aug 2011 14:53:54 -0000 ----- Original Message ----- From: "Andriy Gapon" > > Maybe test it on couple of machines first just in case I overlooked something > essential, although I have a report from another use that the patch didn't break > anything for him (it was tested for an unrelated issue). We've got this running on a ~40 machines and just had the first panic since the update. Unfortunately it doesn't seem to have changed anything :( We have 352 thread entries starting with:- #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available. 23 with:- cpustop_handler () at atomic.h:285 and 16 with:- #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562 The main message being:- panic: double fault GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15 Fatal double fault rip = 0xffffffff8053b691 rsp = 0xffffff8d8f356fb0 rbp = 0xffffff8d8f357210 cpuid = 2; apic id = 02 panic: double fault cpuid = 2 KDB: stack backtrace: #0 0xffffffff803bb75e at kdb_backtrace+0x5e #1 0xffffffff8038956e at panic+0x2ae #2 0xffffffff805802b6 at dblfault_handler+0x96 #3 0xffffffff8056900d at Xdblfault+0xad stack: 0xffffff8d8f357000, 4 rsp = 0xffffff800009ae10 Uptime: 2d21h6m18s Physical memory: 49132 MB Dumping 17080 MB: 17065... 
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. done. Loaded symbols for /boot/kernel/opensolaris.ko Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/linprocfs.ko Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/nullfs.ko #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.) at /usr/src/sys/kern/sched_ule.c:1858 1858 cpuid = PCPU_GET(cpuid); (kgdb) #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.) at /usr/src/sys/kern/sched_ule.c:1858 #1 0xffffffff80391a99 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:451 #2 0xffffffff803c5112 in sleepq_timedwait (wchan=0xffffffff8083e080, pri=68) at /usr/src/sys/kern/subr_sleepqueue.c:644 #3 0xffffffff80391efb in _sleep (ident=0xffffffff8083e080, lock=0x0, priority=Variable "priority" is not available.) at /usr/src/sys/kern/kern_synch.c:230 #4 0xffffffff8053ebc9 in scheduler (dummy=Variable "dummy" is not available.) at /usr/src/sys/vm/vm_glue.c:807 #5 0xffffffff80341767 in mi_startup () at /usr/src/sys/kern/init_main.c:254 #6 0xffffffff8016efdc in btext () at /usr/src/sys/amd64/amd64/locore.S:81 #7 0xffffffff80863dc8 in sleepq_chains () #8 0xffffffff80848ae0 in cpu_top () #9 0x0000000000000000 in ?? () #10 0xffffffff8083e4e0 in proc0 () #11 0xffffffff80bb3b90 in ?? () #12 0xffffffff80bb3b38 in ?? () #13 0xffffff0012d838c0 in ?? () #14 0xffffffff803aeb19 in sched_switch (td=0x0, newtd=0x0, flags=Variable "flags" is not available.) 
at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)

There are some indications that stopping jails could be the
cause of the panics, so on one test box I've added INVARIANTS
to see if anything shows up from that.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Sun Aug 14 23:45:02 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16BB4106564A; Sun, 14 Aug 2011 23:45:02 +0000 (UTC) (envelope-from prvs=120731b379=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 666118FC1C; Sun, 14 Aug 2011 23:45:00 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 00:44:00 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 00:44:00 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014600121.msg; Mon, 15 Aug 2011 00:43:59 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=120731b379=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Attilio Rao" , "Jeremy
Chadwick" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><44DD20E1CFA949E8A1B15B3847769DCB@multiplay.co.uk><20110811092858.GA94514@icarus.home.lan> Date: Mon, 15 Aug 2011 00:44:34 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@freebsd.org, Andriy Gapon Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Aug 2011 23:45:02 -0000

----- Original Message ----- From: "Attilio Rao"

> Anyway, we really would need much more information in order to take
> proactive action.
> Would it be possible to get access to one of the panicking machines? Is it
> always the same panic that is happening, or does it vary (like: once
> page fault, once fatal double fault, once fatal trap, etc.)?

They are always double faults, 99% of the time with no additional info.
We've seen one mention of java on one of the machines, but the vmcore
didn't seem to mention anything to do with that after the dump.

My colleague informs me that when he did the upgrade to add in the
scheduler stop patch, pretty much every machine panicked when shutting
the java servers down, which is essentially a jail stop.

I've also had two panics when rebooting my test machine to change
kernel settings, although this could be a side effect of the
scheduler patch?

This single test machine is now running with the following non-standard
settings:-
options INVARIANTS
options INVARIANT_SUPPORT
options DDB
options KSTACK_PAGES=12

I've got several vmcores from a number of different machines but none
seem to be of any use, as they don't seem to list any thread that caused
the panic, i.e. no mention of dump or fault. Is there something else in
particular I should be looking for?

Circumstantial evidence seems to indicate that uptime may be a factor:
machines with under 2 days of uptime seem much less likely to panic.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 08:31:44 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C9A281065674 for ; Mon, 15 Aug 2011 08:31:44 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 01AEF8FC17 for ; Mon, 15 Aug 2011 08:31:43 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA09764; Mon, 15 Aug 2011 11:31:40 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QssaB-000750-OY; Mon, 15 Aug 2011 11:31:39 +0300 Message-ID: <4E48D967.9060804@FreeBSD.org> Date: Mon, 15 Aug 2011 11:31:35 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110706 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> In-Reply-To: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 08:31:45 -0000 on 14/08/2011 17:43 Steven Hartland said the 
following: > ----- Original Message ----- From: "Andriy Gapon" >> >> Maybe test it on couple of machines first just in case I overlooked something >> essential, although I have a report from another use that the patch didn't break >> anything for him (it was tested for an unrelated issue). > > We've got this running on a ~40 machines and just had the first panic > since the update. Unfortunately it doesn't seem to have changed anything :( > > We have 352 thread entries starting with:- > #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, > flags=Variable "flags" is not available. > 23 with:- > cpustop_handler () at atomic.h:285 > and 16 with:- > #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562 I would like to get a full output of thread apply all bt. > The main message being:- > panic: double fault > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15 So this line, does it indicate a shutdown of a jail or of the whole system? > Fatal double fault > rip = 0xffffffff8053b691 Can you please provide output of 'list *0xffffffff8053b691' in kgdb? 
> rsp = 0xffffff8d8f356fb0
> rbp = 0xffffff8d8f357210
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
> #1 0xffffffff8038956e at panic+0x2ae
> #2 0xffffffff805802b6 at dblfault_handler+0x96
> #3 0xffffffff8056900d at Xdblfault+0xad

I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
smarter than the trivial stack(9) printer.

> stack: 0xffffff8d8f357000, 4

One thing I can say is that this looks like a double fault because of stack
exhaustion (the most typical cause): the rsp value is below td_kstack.

Can you please also provide the following information:
p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is the value of the KSTACK_PAGES option (amd64 default is 4) and
PAGE_SIZE is 4096.

> rsp = 0xffffff800009ae10
[snip]
> There are some indications that stopping jails could be the
> cause of the panics, so on one test box I've added INVARIANTS
> to see if anything shows up from that.

OK.

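The stack-exhaustion reasoning above can be checked with a little arithmetic. What follows is only a minimal Python sketch, not something taken from the kernel sources: it assumes that the "stack: 0xffffff8d8f357000, 4" line reports td_kstack (the lowest address of the thread's kernel stack) and the number of stack pages, and the names stack_bounds and describe are made up for illustration.

```python
# Values copied from the panic message quoted in this thread.
PAGE_SIZE = 4096      # amd64 page size, as stated above
KSTACK_PAGES = 4      # amd64 default; assumed to be the "4" in the "stack:" line

td_kstack = 0xffffff8d8f357000  # assumed: lowest address of the kernel stack
rsp = 0xffffff8d8f356fb0        # stack pointer reported with the double fault


def stack_bounds(base, pages, page_size=PAGE_SIZE):
    """Return (lowest address, one-past-highest address) of a kernel stack."""
    return base, base + pages * page_size


def describe(rsp, base, pages):
    """Say where rsp lies relative to the stack; the stack grows downwards."""
    low, high = stack_bounds(base, pages)
    if rsp < low:
        return "rsp is %d bytes below the stack base (stack overflow)" % (low - rsp)
    if rsp >= high:
        return "rsp is at or above the stack top"
    return "rsp is inside the stack, %d bytes above the base" % (rsp - low)


low, high = stack_bounds(td_kstack, KSTACK_PAGES)
print("stack: %#x .. %#x" % (low, high))
print(describe(rsp, td_kstack, KSTACK_PAGES))
```

With the numbers from this panic, rsp ends up 0x50 (80) bytes below the stack base, which is what "rsp value is below td_kstack" refers to. The upper bound computed here is also the address that the kgdb expression above subtracts one struct pcb from.
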
-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 10:45:16 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C4151065673; Mon, 15 Aug 2011 10:45:16 +0000 (UTC) (envelope-from prvs=1208040d95=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id EB69E8FC1C; Mon, 15 Aug 2011 10:45:15 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 11:33:24 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 11:33:24 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014604141.msg; Mon, 15 Aug 2011 11:33:23 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> Date: Mon, 15 Aug 2011 11:34:02 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: 
freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 10:45:16 -0000

----- Original Message ----- From: "Andriy Gapon"
>> We have 352 thread entries starting with:-
>> #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
>> flags=Variable "flags" is not available.
>> 23 with:-
>> cpustop_handler () at atomic.h:285
>> and 16 with:-
>> #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
>
> I would like to get a full output of thread apply all bt.

http://blog.multplay.co.uk/dropzone/freebsd/panic-2011-08-14-1524.txt

>> The main message being:-
>> panic: double fault
>>
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15
>
> So this line, does it indicate a shutdown of a jail or of the whole system?

This specific panic was caused by me running "reboot" after all jails (~40)
were shut down, which is slightly different from what my colleague was
seeing last Friday, where the machines were panicking when the jails
themselves were stopped. I may have a crash from one of these if needed.

>> Fatal double fault
>> rip = 0xffffffff8053b691
>
> Can you please provide output of 'list *0xffffffff8053b691' in kgdb?

(kgdb) list *0xffffffff8053b691
0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234             /*
235              * Find the backing store object and offset into it to begin the
236              * search.
237              */
238             fs.map = map;
239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
241             if (result != KERN_SUCCESS) {
242                     if (result != KERN_PROTECTION_FAILURE ||
243                             (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {

>
>> rsp = 0xffffff8d8f356fb0
>> rbp = 0xffffff8d8f357210
>> cpuid = 2; apic id = 02
>> panic: double fault
>> cpuid = 2
>> KDB: stack backtrace:
>> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
>> #1 0xffffffff8038956e at panic+0x2ae
>> #2 0xffffffff805802b6 at dblfault_handler+0x96
>> #3 0xffffffff8056900d at Xdblfault+0xad
>
> I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
> here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
> smarter than the trivial stack(9) printer.

I've added this into the kernel on my test machine and will try
to get it to panic over the next few days. It seems to need a few days
of uptime before the panics start happening. This is in addition to
increasing KSTACK_PAGES to 12; if you believe this may be stack
exhaustion, do you want me to remove this increase?

>> stack: 0xffffff8d8f357000, 4
>
> One thing I can say is that this looks like a double fault because of stack
> exhaustion (the most typical cause): the rsp value is below td_kstack.
>
> Can you please also provide the following information:
> p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
> where KSTACK_PAGES is the value of the KSTACK_PAGES option (amd64 default is 4) and
> PAGE_SIZE is 4096.

(kgdb) p *((struct pcb *)((char *)0xffffff8d8f357000 + 4 * 4096) - 1) $1 = {pcb_r15 = -2138686968, pcb_r14 = -1070655224792, pcb_r13 = 0, pcb_r12 = -1070655225856, pcb_rbp = -491518580864, pcb_rsp = -491518580952, pcb_rbx = -1099195460512, pcb_rip = -2143622375, pcb_fsbase = 34365428376, pcb_gsbase = 0, pcb_kgsbase = 0, pcb_cr0 = 0, pcb_cr2 = 0, pcb_cr3 = 12406784, pcb_cr4 = 0, pcb_dr0 = 0, pcb_dr1 = 0, pcb_dr2 = 0, pcb_dr3 = 0, pcb_dr6 = 0, pcb_dr7 = 0, pcb_flags = 0, pcb_initial_fpucw = 895, pcb_onfault = 0x0, pcb_gs32sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx = 0, sd_long = 0, sd_def32 = 0, sd_gran = 0, sd_hibase = 0}, pcb_tssp = 0x0, pcb_save = 0xffffff8d8f35ae00, pcb_full_iret = 0 '\0', pcb_gdt = {rd_limit = 0, rd_base = 0}, pcb_idt = {rd_limit = 0, rd_base = 0}, pcb_ldt = {rd_limit = 0, rd_base = 0}, pcb_tr = 0, pcb_user_save = {sv_env = {en_cw = 895, en_sw = 0, en_tw = 0 '\0', en_zero = 0 '\0', en_opcode = 0, en_rip = 0, en_rdp = 0, en_mxcsr = 8096, en_mxcsr_mask = 65535}, sv_fp = {{fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad = "\000\000\000\000\000"}}, sv_xmm = {{xmm_bytes = "\000\000\000\b\030\212rA\000\000\000\000\000\000\000"}, { xmm_bytes = '\0' } }, sv_pad = '\0' }} Thanks for your help on this, as its way over 
my head ;-) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 12:00:06 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EAF1106566C for ; Mon, 15 Aug 2011 12:00:05 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EFE758FC0C for ; Mon, 15 Aug 2011 12:00:04 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA12470; Mon, 15 Aug 2011 15:00:00 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E490A3F.1000205@FreeBSD.org> Date: Mon, 15 Aug 2011 14:59:59 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> In-Reply-To: <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 12:00:06 -0000

on 15/08/2011 13:34 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon"
>> I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
>> here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
>> smarter than the trivial stack(9) printer.
>
> I've added this into the kernel on my test machine and will try
> to get it to panic over the next few days. It seems to need a few days
> of uptime before the panics start happening. This is in addition to
> increasing KSTACK_PAGES to 12; if you believe this may be stack
> exhaustion, do you want me to remove this increase?

Yes, I think it would make sense to change KSTACK_PAGES to the default value.
But, OTOH, if you can afford to have DDB in a few more machines, then it would be
interesting to compare behavior with different stack sizes.

BTW, if you don't want your machines to sit at the ddb prompt after a panic, then
you'd also need either the KDB_UNATTENDED option or to set debug.debugger_on_panic=0.

-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 12:14:43 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8FF861065675 for ; Mon, 15 Aug 2011 12:14:43 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DA9C38FC16 for ; Mon, 15 Aug 2011 12:14:42 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA12721; Mon, 15 Aug 2011 15:14:40 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E490DAF.1080009@FreeBSD.org> Date: Mon, 15 Aug 2011 15:14:39 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> In-Reply-To: <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 12:14:43 -0000 on 15/08/2011 13:34 Steven Hartland said the following: > (kgdb) list *0xffffffff8053b691 > 
0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239). > 234 /* > 235 * Find the backing store object and offset into it to begin the > 236 * search. > 237 */ > 238 fs.map = map; > 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry, > 240 &fs.first_object, &fs.first_pindex, &prot, &wired); > 241 if (result != KERN_SUCCESS) { > 242 if (result != KERN_PROTECTION_FAILURE || > 243 (fault_flags & VM_FAULT_WIRE_MASK) != > VM_FAULT_USER_WIRE) { > Interesting... thanks! Can you please also additionally provide (lengthy) output of x/512a 0xffffff8d8f356fb0 ? -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 12:52:21 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCE181065672; Mon, 15 Aug 2011 12:52:21 +0000 (UTC) (envelope-from prvs=1208040d95=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 378718FC18; Mon, 15 Aug 2011 12:52:20 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 13:51:07 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 13:51:07 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014605358.msg; Mon, 15 Aug 2011 13:51:06 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> 
<4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> Date: Mon, 15 Aug 2011 13:51:44 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 12:52:21 -0000

----- Original Message ----- From: "Andriy Gapon"

> on 15/08/2011 13:34 Steven Hartland said the following:
>> (kgdb) list *0xffffffff8053b691
>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>> 234             /*
>> 235              * Find the backing store object and offset into it to begin the
>> 236              * search.
>> 237              */
>> 238             fs.map = map;
>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>> 241             if (result != KERN_SUCCESS) {
>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>> 243                             (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {
>>
>
> Interesting... thanks!
> Can you please also additionally provide (lengthy) output of x/512a
> 0xffffff8d8f356fb0 ?

Sorry, I'm not sure I follow you there?
Do you mean any of the following:- (kgdb) x/512a 0xffffff8d8f35b000: Cannot access memory at address 0xffffff8d8f35b000 (kgdb) list *0xffffff8d8f356fb0 No source file for address 0xffffff8d8f356fb0. or: (kgdb) x/512a 0xffffff8d8f356fb0 0xffffff8d8f356fb0: Cannot access memory at address 0xffffff8d8f356fb0 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 13:20:05 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EDF4B1065673 for ; Mon, 15 Aug 2011 13:20:05 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3C8298FC17 for ; Mon, 15 Aug 2011 13:20:04 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA13741; Mon, 15 Aug 2011 16:20:02 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E491D01.1090902@FreeBSD.org> Date: Mon, 15 Aug 2011 16:20:01 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> 
<4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> In-Reply-To: <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 13:20:06 -0000 on 15/08/2011 15:51 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" > > >> on 15/08/2011 13:34 Steven Hartland said the following: >>> (kgdb) list *0xffffffff8053b691 >>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239). >>> 234 /* >>> 235 * Find the backing store object and offset into it to begin the >>> 236 * search. >>> 237 */ >>> 238 fs.map = map; >>> 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry, >>> 240 &fs.first_object, &fs.first_pindex, &prot, &wired); >>> 241 if (result != KERN_SUCCESS) { >>> 242 if (result != KERN_PROTECTION_FAILURE || >>> 243 (fault_flags & VM_FAULT_WIRE_MASK) != >>> VM_FAULT_USER_WIRE) { >>> >> >> Interesting... thanks! >> Can you please also additionally provide (lengthy) output of x/512a >> 0xffffff8d8f356fb0 ? > > Sorry I'm not sure I follow your their? It seems that you got me correctly :) > Do you mean any of the following:- > (kgdb) x/512a > 0xffffff8d8f35b000: Cannot access memory at address 0xffffff8d8f35b000 > > (kgdb) list *0xffffff8d8f356fb0 > No source file for address 0xffffff8d8f356fb0. 
> > or: > (kgdb) x/512a 0xffffff8d8f356fb0 > 0xffffff8d8f356fb0: Cannot access memory at address 0xffffff8d8f356fb0 Can you please try this (the last command) with 0xffffff8d8f357210 instead? -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 14:57:26 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0CFD1065675; Mon, 15 Aug 2011 14:57:26 +0000 (UTC) (envelope-from prvs=1208040d95=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C99618FC1E; Mon, 15 Aug 2011 14:57:25 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 15:55:51 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 15:55:51 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014606518.msg; Mon, 15 Aug 2011 15:55:50 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> 
<4E491D01.1090902@FreeBSD.org> Date: Mon, 15 Aug 2011 15:56:27 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 14:57:26 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Monday, August 15, 2011 2:20 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > on 15/08/2011 15:51 Steven Hartland said the following: >> ----- Original Message ----- From: "Andriy Gapon" >> >> >>> on 15/08/2011 13:34 Steven Hartland said the following: >>>> (kgdb) list *0xffffffff8053b691 >>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239). >>>> 234 /* >>>> 235 * Find the backing store object and offset into it to begin the >>>> 236 * search. >>>> 237 */ >>>> 238 fs.map = map; >>>> 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry, >>>> 240 &fs.first_object, &fs.first_pindex, &prot, &wired); >>>> 241 if (result != KERN_SUCCESS) { >>>> 242 if (result != KERN_PROTECTION_FAILURE || >>>> 243 (fault_flags & VM_FAULT_WIRE_MASK) != >>>> VM_FAULT_USER_WIRE) { >>>> >>> >>> Interesting... thanks! >>> Can you please also additionally provide (lengthy) output of x/512a >>> 0xffffff8d8f356fb0 ? >> >> Sorry I'm not sure I follow your their? 
> > It seems that you got me correctly :) > >> Do you mean any of the following:- >> (kgdb) x/512a >> 0xffffff8d8f35b000: Cannot access memory at address 0xffffff8d8f35b000 >> >> (kgdb) list *0xffffff8d8f356fb0 >> No source file for address 0xffffff8d8f356fb0. >> >> or: >> (kgdb) x/512a 0xffffff8d8f356fb0 >> 0xffffff8d8f356fb0: Cannot access memory at address 0xffffff8d8f356fb0 > > Can you please try this (the last command) with 0xffffff8d8f357210 instead? (kgdb) x/512a 0xffffff8d8f357210 0xffffff8d8f357210: 0xffffff8d8f357280 0xffffffff805807d3 0xffffff8d8f357220: 0x0 0xffffff8d8f357370 0xffffff8d8f357230: 0xffffff06b7f9c000 0x30 0xffffff8d8f357240: 0x100000000 0x0 0xffffff8d8f357250: 0x0 0x9 0xffffff8d8f357260: 0xc 0xffffff8d8f357370 0xffffff8d8f357270: 0xffffff06b7f9c000 0x0 0xffffff8d8f357280: 0xffffff8d8f357360 0xffffffff80580e0f 0xffffff8d8f357290: 0x0 0x0 0xffffff8d8f3572a0: 0x80074e49e 0x2 0xffffff8d8f3572b0: 0x80071cba0 0x80071cdc0 0xffffff8d8f3572c0: 0x80071c9a0 0x0 0xffffff8d8f3572d0: 0x0 0x0 0xffffff8d8f3572e0: 0x0 0x0 0xffffff8d8f3572f0: 0x0 0x0 0xffffff8d8f357300: 0x80074e49e 0x1 0xffffff8d8f357310: 0x80071cba0 0x80071cdc0 0xffffff8d8f357320: 0x80071c9a0 0x0 0xffffff8d8f357330: 0x0 0x4 0xffffff8d8f357340: 0xffffff070b5a48c0 0xffffff06b7f9c000 0xffffff8d8f357350: 0x0 0xffffffff8083e920 0xffffff8d8f357360: 0xffffff8d8f357430 0xffffffff80568f04 0xffffff8d8f357370: 0xffffff070b5a48c0 0x3 0xffffff8d8f357380: 0xffffff8d8f357440 0x0 0xffffff8d8f357390: 0xffffff8d8f357440 0x30 0xffffff8d8f3573a0: 0xffffff06b7f9c000 0x4 0xffffff8d8f3573b0: 0xffffff8d8f357430 0xffffffff8083e920 0xffffff8d8f3573c0: 0xffffff06b7f9c000 0xffffff070b5a48c0 0xffffff8d8f3573d0: 0xffffff06b7f9c000 0x0 0xffffff8d8f3573e0: 0xffffffff8083e920 0x1b00130000000c 0xffffff8d8f3573f0: 0x30 0x3b003b00000001 0xffffff8d8f357400: 0x0 0xffffffff80384632 0xffffff8d8f357410: 0x20 0x10206 0xffffff8d8f357420: 0xffffff8d8f357430 0x28 0xffffff8d8f357430: 0xffffff8d8f357450 0xffffffff80384681 
0xffffff8d8f357440: 0x4 0xffffff070b5a48c0 0xffffff8d8f357450: 0xffffff8d8f357500 0xffffffff80543ffd 0xffffff8d8f357460: 0xffffff8d8f357470 0xffffff8d8f3576d8 0xffffff8d8f357470: 0xffffff8d8f357500 0xffffffff80544ef8 0xffffff8d8f357480: 0xffffff070b5a49b8 0x0 0xffffff8d8f357490: 0x8 0xffffff06b7f9c000 0xffffff8d8f3574a0: 0xffffff06b7f9c000 0xffffff8d8f3576d8 0xffffff8d8f3574b0: 0xffffff8d8f3576d0 0xffffff8d8f3576e8 0xffffff8d8f3574c0: 0x0 0xffffff8d8f3576e0 0xffffff8d8f3574d0: 0x100000001 0x1 0xffffff8d8f3574e0: 0xffffff06b7f9c000 0x1 0xffffff8d8f3574f0: 0x0 0xffffffff8083e920 0xffffff8d8f357500: 0xffffff8d8f357770 0xffffffff8053c723 0xffffff8d8f357510: 0xffffff8d8f35773f 0xffffff8d8f357738 0xffffff8d8f357520: 0x80085e4f9 0x80085e4f8 0xffffff8d8f357530: 0xffffff06b7f9c000 0xffffff8d8f3576e0 0xffffff8d8f357540: 0xffffff8d8f3576e8 0xffffff8d8f3576d0 0xffffff8d8f357550: 0xffffff8d8f3576d8 0x80085e4f9 0xffffff8d8f357560: 0x80085e4f9 0x80085e4f9 0xffffff8d8f357570: 0x80085e4f9 0x80085e4f9 0xffffff8d8f357580: 0x80085e4f9 0x80085e4f9 0xffffff8d8f357590: 0x80085e4f9 0x80085e4f9 0xffffff8d8f3575a0: 0x80085e4f9 0x734210 0xffffff8d8f3575b0: 0x10000000001 0x80073ada0 0xffffff8d8f3575c0: 0x0 0xffffffff8083e920 0xffffff8d8f3575d0: 0x80073aec0 0x1 0xffffff8d8f3575e0: 0x6967614d00000000 0x454e4f4e 0xffffff8d8f3575f0: 0x0 0x0 0xffffff8d8f357600: 0x0 0x0 0xffffff8d8f357610: 0x0 0xfffd 0xffffff8d8f357620: 0x200 0x200 0xffffff8d8f357630: 0x200 0x200 0xffffff8d8f357640: 0x200 0x200 0xffffff8d8f357650: 0x200 0x200 0xffffff8d8f357660: 0x200 0x24200 0xffffff8d8f357670: 0x4200 0x4200 0xffffff8d8f357680: 0x4200 0x4200 0xffffff8d8f357690: 0x200 0x200 0xffffff8d8f3576a0: 0x200 0x200 0xffffff8d8f3576b0: 0x200 0x200 0xffffff8d8f3576c0: 0x200 0x200 0xffffff8d8f3576d0: 0x200 0x200 0xffffff8d8f3576e0: 0xffffffff8083e920 0xffffffff8083e920 0xffffff8d8f3576f0: 0x200 0x0 0xffffff8d8f357700: 0x0 0x200 0xffffff8d8f357710: 0x200 0x200 0xffffff8d8f357720: 0x64000 0x42800 0xffffff8d8f357730: 0x42800 
0x42800 0xffffff8d8f357740: 0x42800 0xffffff070b5a48c0 0xffffff8d8f357750: 0xffffff06b7f9c000 0x4 0xffffff8d8f357760: 0x0 0xffffffff8083e920 0xffffff8d8f357770: 0xffffff8d8f3577e0 0xffffffff805807d3 0xffffff8d8f357780: 0x42800 0xffffff8d8f3578d0 0xffffff8d8f357790: 0xffffff06b7f9c000 0x30 0xffffff8d8f3577a0: 0x100050c00 0x50c01 0xffffff8d8f3577b0: 0x50c02 0x9 0xffffff8d8f3577c0: 0xc 0xffffff8d8f3578d0 0xffffff8d8f3577d0: 0xffffff06b7f9c000 0x0 0xffffff8d8f3577e0: 0xffffff8d8f3578c0 0xffffffff80580e0f 0xffffff8d8f3577f0: 0x42800 0x42800 0xffffff8d8f357800: 0x42800 0x42800 0xffffff8d8f357810: 0x42800 0x42800 0xffffff8d8f357820: 0x42800 0x5890a 0xffffff8d8f357830: 0x5890b 0x5890c 0xffffff8d8f357840: 0x5890d 0x5890e 0xffffff8d8f357850: 0x5890f 0x48900 0xffffff8d8f357860: 0x48900 0x48900 0xffffff8d8f357870: 0x48900 0x48900 0xffffff8d8f357880: 0x48900 0x48900 0xffffff8d8f357890: 0x48900 0x4 0xffffff8d8f3578a0: 0xffffff070b5a48c0 0xffffff06b7f9c000 0xffffff8d8f3578b0: 0x0 0xffffffff8083e920 0xffffff8d8f3578c0: 0xffffff8d8f357990 0xffffffff80568f04 0xffffff8d8f3578d0: 0xffffff070b5a48c0 0x3 0xffffff8d8f3578e0: 0xffffff8d8f3579a0 0x0 0xffffff8d8f3578f0: 0xffffff8d8f3579a0 0x30 0xffffff8d8f357900: 0xffffff06b7f9c000 0x4 0xffffff8d8f357910: 0xffffff8d8f357990 0xffffffff8083e920 0xffffff8d8f357920: 0xffffff06b7f9c000 0xffffff070b5a48c0 0xffffff8d8f357930: 0xffffff06b7f9c000 0x0 0xffffff8d8f357940: 0xffffffff8083e920 0x1b00130000000c 0xffffff8d8f357950: 0x30 0x3b003b00000001 0xffffff8d8f357960: 0x0 0xffffffff80384632 0xffffff8d8f357970: 0x20 0x10206 0xffffff8d8f357980: 0xffffff8d8f357990 0x28 0xffffff8d8f357990: 0xffffff8d8f3579b0 0xffffffff80384681 0xffffff8d8f3579a0: 0x4 0xffffff070b5a48c0 0xffffff8d8f3579b0: 0xffffff8d8f357a60 0xffffffff80543ffd 0xffffff8d8f3579c0: 0xffffff8d8f3579d0 0xffffff8d8f357c38 0xffffff8d8f3579d0: 0xffffff8d8f357a60 0xffffffff80544ef8 0xffffff8d8f3579e0: 0xffffff070b5a49b8 0x0 0xffffff8d8f3579f0: 0x41900 0xffffff06b7f9c000 0xffffff8d8f357a00: 
0xffffff06b7f9c000 0xffffff8d8f357c38 0xffffff8d8f357a10: 0xffffff8d8f357c30 0xffffff8d8f357c48 0xffffff8d8f357a20: 0x0 0xffffff8d8f357c40 0xffffff8d8f357a30: 0x0 0x1 0xffffff8d8f357a40: 0xffffff06b7f9c000 0x1 0xffffff8d8f357a50: 0x0 0xffffffff8083e920 0xffffff8d8f357a60: 0xffffff8d8f357cd0 0xffffffff8053c723 0xffffff8d8f357a70: 0xffffff8d8f357c9f 0xffffff8d8f357c98 0xffffff8d8f357a80: 0x0 0x0 0xffffff8d8f357a90: 0xffffff06b7f9c000 0xffffff8d8f357c40 0xffffff8d8f357aa0: 0xffffff8d8f357c48 0xffffff8d8f357c30 0xffffff8d8f357ab0: 0xffffff8d8f357c38 0x0 0xffffff8d8f357ac0: 0x0 0x0 0xffffff8d8f357ad0: 0x0 0x0 0xffffff8d8f357ae0: 0x0 0x0 0xffffff8d8f357af0: 0x0 0x0 0xffffff8d8f357b00: 0x0 0x0 0xffffff8d8f357b10: 0x1 0x0 0xffffff8d8f357b20: 0x0 0xffffffff8083e920 0xffffff8d8f357b30: 0x0 0x1 0xffffff8d8f357b40: 0x0 0x0 0xffffff8d8f357b50: 0x0 0x0 0xffffff8d8f357b60: 0x0 0x0 0xffffff8d8f357b70: 0x0 0x0 0xffffff8d8f357b80: 0x0 0x0 0xffffff8d8f357b90: 0x0 0x0 0xffffff8d8f357ba0: 0x0 0x0 0xffffff8d8f357bb0: 0x0 0x0 0xffffff8d8f357bc0: 0x0 0x0 0xffffff8d8f357bd0: 0x0 0x0 0xffffff8d8f357be0: 0x0 0x0 0xffffff8d8f357bf0: 0x0 0x0 0xffffff8d8f357c00: 0x0 0x0 0xffffff8d8f357c10: 0x0 0x0 0xffffff8d8f357c20: 0x0 0x0 0xffffff8d8f357c30: 0x0 0x0 0xffffff8d8f357c40: 0xffffffff8083e920 0xffffffff8083e920 0xffffff8d8f357c50: 0x0 0x0 0xffffff8d8f357c60: 0x0 0x0 0xffffff8d8f357c70: 0x0 0x0 0xffffff8d8f357c80: 0x0 0x0 0xffffff8d8f357c90: 0x0 0x0 0xffffff8d8f357ca0: 0x0 0xffffff070b5a48c0 0xffffff8d8f357cb0: 0xffffff06b7f9c000 0x4 0xffffff8d8f357cc0: 0x0 0xffffffff8083e920 0xffffff8d8f357cd0: 0xffffff8d8f357d40 0xffffffff805807d3 0xffffff8d8f357ce0: 0x0 0xffffff8d8f357e30 0xffffff8d8f357cf0: 0xffffff06b7f9c000 0x30 0xffffff8d8f357d00: 0x100000000 0x0 0xffffff8d8f357d10: 0x0 0x9 0xffffff8d8f357d20: 0xc 0xffffff8d8f357e30 0xffffff8d8f357d30: 0xffffff06b7f9c000 0x0 0xffffff8d8f357d40: 0xffffff8d8f357e20 0xffffffff80580e0f 0xffffff8d8f357d50: 0x0 0x0 0xffffff8d8f357d60: 0x0 0x0 0xffffff8d8f357d70: 
0x0 0x0 0xffffff8d8f357d80: 0x0 0x0 0xffffff8d8f357d90: 0x0 0x0 0xffffff8d8f357da0: 0x0 0x0 0xffffff8d8f357db0: 0x0 0x0 0xffffff8d8f357dc0: 0x0 0x0 0xffffff8d8f357dd0: 0x0 0x0 0xffffff8d8f357de0: 0x0 0x0 0xffffff8d8f357df0: 0x0 0x4 0xffffff8d8f357e00: 0xffffff070b5a48c0 0xffffff06b7f9c000 0xffffff8d8f357e10: 0x0 0xffffffff8083e920 0xffffff8d8f357e20: 0xffffff8d8f357ef0 0xffffffff80568f04 0xffffff8d8f357e30: 0xffffff070b5a48c0 0x3 0xffffff8d8f357e40: 0xffffff8d8f357f00 0x0 0xffffff8d8f357e50: 0xffffff8d8f357f00 0x30 0xffffff8d8f357e60: 0xffffff06b7f9c000 0x4 0xffffff8d8f357e70: 0xffffff8d8f357ef0 0xffffffff8083e920 0xffffff8d8f357e80: 0xffffff06b7f9c000 0xffffff070b5a48c0 0xffffff8d8f357e90: 0xffffff06b7f9c000 0x0 0xffffff8d8f357ea0: 0xffffffff8083e920 0x1b00130000000c 0xffffff8d8f357eb0: 0x30 0x3b003b00000001 0xffffff8d8f357ec0: 0x0 0xffffffff80384632 0xffffff8d8f357ed0: 0x20 0x10206 0xffffff8d8f357ee0: 0xffffff8d8f357ef0 0x28 0xffffff8d8f357ef0: 0xffffff8d8f357f10 0xffffffff80384681 0xffffff8d8f357f00: 0x4 0xffffff070b5a48c0 0xffffff8d8f357f10: 0xffffff8d8f357fc0 0xffffffff80543ffd 0xffffff8d8f357f20: 0xffffff8d8f357f30 0xffffff8d8f358198 0xffffff8d8f357f30: 0xffffff8d8f357fc0 0xffffffff80544ef8 0xffffff8d8f357f40: 0xffffff070b5a49b8 0x0 0xffffff8d8f357f50: 0x6c 0xffffff06b7f9c000 0xffffff8d8f357f60: 0xffffff06b7f9c000 0xffffff8d8f358198 0xffffff8d8f357f70: 0xffffff8d8f358190 0xffffff8d8f3581a8 0xffffff8d8f357f80: 0x0 0xffffff8d8f3581a0 0xffffff8d8f357f90: 0x5d0000005c 0x1 0xffffff8d8f357fa0: 0xffffff06b7f9c000 0x1 0xffffff8d8f357fb0: 0x0 0xffffffff8083e920 0xffffff8d8f357fc0: 0xffffff8d8f358230 0xffffffff8053c723 0xffffff8d8f357fd0: 0xffffff8d8f3581ff 0xffffff8d8f3581f8 0xffffff8d8f357fe0: 0x7100000070 0x7300000072 0xffffff8d8f357ff0: 0xffffff06b7f9c000 0xffffff8d8f3581a0 0xffffff8d8f358000: 0xffffff8d8f3581a8 0xffffff8d8f358190 0xffffff8d8f358010: 0xffffff8d8f358198 0x7f0000007e 0xffffff8d8f358020: 0x8100000080 0x8300000082 0xffffff8d8f358030: 0x8500000084 
0x8700000086 0xffffff8d8f358040: 0x8900000088 0x8b0000008a 0xffffff8d8f358050: 0x8d0000008c 0x8f0000008e 0xffffff8d8f358060: 0x9100000090 0x92 0xffffff8d8f358070: 0x9500000001 0x9700000096 0xffffff8d8f358080: 0x0 0xffffffff8083e920 0xffffff8d8f358090: 0x9d0000009c 0x1 0xffffff8d8f3580a0: 0xa100000000 0xa3000000a2 0xffffff8d8f3580b0: 0xa5000000a4 0xa7000000a6 0xffffff8d8f3580c0: 0xa9000000a8 0xab000000aa 0xffffff8d8f3580d0: 0xad000000ac 0xaf000000ae 0xffffff8d8f3580e0: 0xb1000000b0 0xb2 0xffffff8d8f3580f0: 0xb5000000b4 0xb7000000b6 0xffffff8d8f358100: 0xb9000000b8 0xbb000000ba 0xffffff8d8f358110: 0xbd000000bc 0xbf000000be 0xffffff8d8f358120: 0xc1000000c0 0xc3000000c2 0xffffff8d8f358130: 0xc5000000c4 0xc7000000c6 0xffffff8d8f358140: 0xc9000000c8 0xcb000000ca 0xffffff8d8f358150: 0xcd000000cc 0xcf000000ce 0xffffff8d8f358160: 0xd1000000d0 0xd3000000d2 0xffffff8d8f358170: 0xd5000000d4 0xd7000000d6 0xffffff8d8f358180: 0xd9000000d8 0xdb000000da 0xffffff8d8f358190: 0xdd000000dc 0xdf000000de 0xffffff8d8f3581a0: 0xffffffff8083e920 0xffffffff8083e920 0xffffff8d8f3581b0: 0xe5000000e4 0x0 0xffffff8d8f3581c0: 0xe900000000 0xeb000000ea 0xffffff8d8f3581d0: 0xed000000ec 0xef000000ee 0xffffff8d8f3581e0: 0xf1000000f0 0xf3000000f2 0xffffff8d8f3581f0: 0xf5000000f4 0xf7000000f6 0xffffff8d8f358200: 0xf9000000f8 0xffffff070b5a48c0 (kgdb) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 15:36:35 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0624106566B; Mon, 15 Aug 2011 15:36:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 019128FC0A; Mon, 15 Aug 2011 15:36:34 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA15475; Mon, 15 Aug 2011 18:36:31 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E493CFE.6010207@FreeBSD.org> Date: Mon, 15 Aug 2011 18:36:30 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> In-Reply-To: <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Mon, 15 Aug 2011 15:36:36 -0000 on 15/08/2011 17:56 Steven Hartland said the following: > > ----- Original Message ----- From: "Andriy Gapon" > To: "Steven Hartland" > Cc: > Sent: Monday, August 15, 2011 2:20 PM > Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > > >> on 15/08/2011 15:51 Steven Hartland said the following: >>> ----- Original Message ----- From: "Andriy Gapon" >>> >>> >>>> on 15/08/2011 13:34 Steven Hartland said the following: >>>>> (kgdb) list *0xffffffff8053b691 >>>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239). >>>>> 234 /* >>>>> 235 * Find the backing store object and offset into it to begin the >>>>> 236 * search. >>>>> 237 */ >>>>> 238 fs.map = map; >>>>> 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry, >>>>> 240 &fs.first_object, &fs.first_pindex, &prot, &wired); >>>>> 241 if (result != KERN_SUCCESS) { >>>>> 242 if (result != KERN_PROTECTION_FAILURE || >>>>> 243 (fault_flags & VM_FAULT_WIRE_MASK) != >>>>> VM_FAULT_USER_WIRE) { >>>>> >>>> >>>> Interesting... thanks! [snip] > (kgdb) x/512a 0xffffff8d8f357210 This is not conclusive, but that stack looks like the following recursive chain: vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault So I suspect that increasing kernel stack size won't help here much. Where does this chain come from? I have no answer at the moment, maybe other developers could help here. I suspect that we shouldn't be getting that trap in vm_map_growstack or should handle it in a different way. 
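[Editorial sketch, not part of the original message.] The recursive pattern described above can be checked mechanically against the quoted `x/512a` output. The following is a minimal sketch under one assumption: the kernel uses frame pointers, so the word at each frame address is the caller's saved rbp and the next word is the return address. The helper names (`parse_dump`, `walk_frames`, `shortest_period`) are hypothetical, and `DUMP` holds only the frame-pointer rows copied from the dump in this thread. No symbol names are assigned to the return addresses, since the thread does not resolve them.

```python
import re

# Frame-pointer rows copied from the x/512a output quoted in this thread.
# Each row: address, saved rbp of the caller, return address.
DUMP = """\
0xffffff8d8f357210: 0xffffff8d8f357280 0xffffffff805807d3
0xffffff8d8f357280: 0xffffff8d8f357360 0xffffffff80580e0f
0xffffff8d8f357360: 0xffffff8d8f357430 0xffffffff80568f04
0xffffff8d8f357430: 0xffffff8d8f357450 0xffffffff80384681
0xffffff8d8f357450: 0xffffff8d8f357500 0xffffffff80543ffd
0xffffff8d8f357500: 0xffffff8d8f357770 0xffffffff8053c723
0xffffff8d8f357770: 0xffffff8d8f3577e0 0xffffffff805807d3
0xffffff8d8f3577e0: 0xffffff8d8f3578c0 0xffffffff80580e0f
0xffffff8d8f3578c0: 0xffffff8d8f357990 0xffffffff80568f04
0xffffff8d8f357990: 0xffffff8d8f3579b0 0xffffffff80384681
0xffffff8d8f3579b0: 0xffffff8d8f357a60 0xffffffff80543ffd
0xffffff8d8f357a60: 0xffffff8d8f357cd0 0xffffffff8053c723
0xffffff8d8f357cd0: 0xffffff8d8f357d40 0xffffffff805807d3
0xffffff8d8f357d40: 0xffffff8d8f357e20 0xffffffff80580e0f
0xffffff8d8f357e20: 0xffffff8d8f357ef0 0xffffffff80568f04
0xffffff8d8f357ef0: 0xffffff8d8f357f10 0xffffffff80384681
0xffffff8d8f357f10: 0xffffff8d8f357fc0 0xffffffff80543ffd
0xffffff8d8f357fc0: 0xffffff8d8f358230 0xffffffff8053c723
"""


def parse_dump(text):
    """Map every dumped address to the 8-byte word stored there."""
    memory = {}
    for line in text.splitlines():
        # Drop optional <symbol+offset> annotations before matching hex fields.
        line = re.sub(r"<[^>]*>", "", line)
        fields = re.findall(r"0x[0-9a-fA-F]+", line)
        if len(fields) < 2:
            continue
        base = int(fields[0], 16)
        for index, word in enumerate(fields[1:]):
            memory[base + 8 * index] = int(word, 16)
    return memory


def walk_frames(memory, frame_pointer):
    """Follow saved-rbp links and return (frame_pointer, return_address) pairs."""
    frames = []
    while frame_pointer in memory and frame_pointer + 8 in memory:
        frames.append((frame_pointer, memory[frame_pointer + 8]))
        next_pointer = memory[frame_pointer]
        if next_pointer <= frame_pointer:
            break  # callers must live at higher addresses; stop on a bad link
        frame_pointer = next_pointer
    return frames


def shortest_period(items):
    """Return the smallest p with items[i] == items[i + p] for all i, else None."""
    for period in range(1, len(items) // 2 + 1):
        if all(items[i] == items[i + period] for i in range(len(items) - period)):
            return period
    return None


memory = parse_dump(DUMP)
frames = walk_frames(memory, 0xffffff8d8f357210)
returns = [return_address for _, return_address in frames]
period = shortest_period(returns)

if __name__ == "__main__":
    print("frames walked:", len(frames))
    print("return addresses repeat every", period, "frames")
    if period:
        print("stack bytes per repetition:", frames[period][0] - frames[0][0])
```

If the frame-pointer assumption holds, a repeating sequence of return addresses with a fixed stride between repetitions is what one would expect from a call chain that re-enters itself, which is at least consistent with the chain suggested above. It does not say why the chain starts.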
> 0xffffff8d8f357210: 0xffffff8d8f357280 0xffffffff805807d3 > 0xffffff8d8f357220: 0x0 0xffffff8d8f357370 > 0xffffff8d8f357230: 0xffffff06b7f9c000 0x30 > 0xffffff8d8f357240: 0x100000000 0x0 > 0xffffff8d8f357250: 0x0 0x9 > 0xffffff8d8f357260: 0xc 0xffffff8d8f357370 > 0xffffff8d8f357270: 0xffffff06b7f9c000 0x0 > 0xffffff8d8f357280: 0xffffff8d8f357360 0xffffffff80580e0f > 0xffffff8d8f357290: 0x0 0x0 > 0xffffff8d8f3572a0: 0x80074e49e 0x2 > 0xffffff8d8f3572b0: 0x80071cba0 0x80071cdc0 > 0xffffff8d8f3572c0: 0x80071c9a0 0x0 > 0xffffff8d8f3572d0: 0x0 0x0 > 0xffffff8d8f3572e0: 0x0 0x0 > 0xffffff8d8f3572f0: 0x0 0x0 > 0xffffff8d8f357300: 0x80074e49e 0x1 > 0xffffff8d8f357310: 0x80071cba0 0x80071cdc0 > 0xffffff8d8f357320: 0x80071c9a0 0x0 > 0xffffff8d8f357330: 0x0 0x4 > 0xffffff8d8f357340: 0xffffff070b5a48c0 0xffffff06b7f9c000 > 0xffffff8d8f357350: 0x0 0xffffffff8083e920 > 0xffffff8d8f357360: 0xffffff8d8f357430 0xffffffff80568f04 > 0xffffff8d8f357370: 0xffffff070b5a48c0 0x3 > 0xffffff8d8f357380: 0xffffff8d8f357440 0x0 > 0xffffff8d8f357390: 0xffffff8d8f357440 0x30 > 0xffffff8d8f3573a0: 0xffffff06b7f9c000 0x4 > 0xffffff8d8f3573b0: 0xffffff8d8f357430 0xffffffff8083e920 > 0xffffff8d8f3573c0: 0xffffff06b7f9c000 0xffffff070b5a48c0 > 0xffffff8d8f3573d0: 0xffffff06b7f9c000 0x0 > 0xffffff8d8f3573e0: 0xffffffff8083e920 0x1b00130000000c > 0xffffff8d8f3573f0: 0x30 0x3b003b00000001 > 0xffffff8d8f357400: 0x0 0xffffffff80384632 > 0xffffff8d8f357410: 0x20 0x10206 > 0xffffff8d8f357420: 0xffffff8d8f357430 0x28 > 0xffffff8d8f357430: 0xffffff8d8f357450 0xffffffff80384681 > 0xffffff8d8f357440: 0x4 0xffffff070b5a48c0 > 0xffffff8d8f357450: 0xffffff8d8f357500 0xffffffff80543ffd > > 0xffffff8d8f357460: 0xffffff8d8f357470 0xffffff8d8f3576d8 > 0xffffff8d8f357470: 0xffffff8d8f357500 0xffffffff80544ef8 > [trim] -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 16:03:08 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx2.freebsd.org (mx2.freebsd.org 
[IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 931DA1065670 for ; Mon, 15 Aug 2011 16:03:08 +0000 (UTC) (envelope-from ae@FreeBSD.org) Received: from [127.0.0.1] (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 9E1CE1508EA; Mon, 15 Aug 2011 16:03:05 +0000 (UTC) Message-ID: <4E49430D.10609@FreeBSD.org> Date: Mon, 15 Aug 2011 20:02:21 +0400 From: "Andrey V. Elsukov" User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.17) Gecko/20110429 Thunderbird/3.1.10 MIME-Version: 1.0 To: Kevin Oberman References: In-Reply-To: X-Enigmail-Version: 1.1.2 OpenPGP: id=10C8A17A Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig05A5EE2C5BDC0AE357693AFC" Cc: "freebsd-stable@freebsd.org Stable" Subject: Re: GPT boot blocks, booting and booteasy X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 16:03:08 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig05A5EE2C5BDC0AE357693AFC Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable On 10.08.2011 07:12, Kevin Oberman wrote: > I have /boot/pmbr loaded into the PMBR and gptboot into the > freebsd-boot partition. I'll > admit that I did this by rote and don't understand how these two files > interact with the > UEFI BIOS to get the loader started. I'm not really certain that I > even need both. >=20 > Is it possible to build a "custom" booteasy boot system with boot0cfg > or some other tool > so I can select d ifferent bootable partition or my other disk which > is sliced in the traditional > fashion? Can anyone point me to any information on how the boot > process works with GPT? 
PMBR is a simple variant of MBR which does know enough to parse GPT partition table and how to load bootcode from the "freebsd-boot" partition. Then gptboot does search bootable UFS partition. At this time we do not have any bootcodes like booteasy for GPT. But you can try to use bootme and bootonce GPT attributes (see gpart(8)). Also you can use grub boot loader. --=20 WBR, Andrey V. Elsukov --------------enig05A5EE2C5BDC0AE357693AFC Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) iQEcBAEBAgAGBQJOSUMTAAoJEAHF6gQQyKF6B20IAK4akiXcjdsauGiFw9zCulFx fIQBUqN6T7Zjq3HX5FBx2695S9ScsSI/nzKi1I+sXCZcMXf75bIF07WPXJRqdD8W Aw/CAIvBqglvHEA0Edt5Ov1J3z2qoIWERG4bCPgryKK1GxSQ58yLWv4I734HHyiI oZUOORwr3tLnkDQf0ZxZCMXtJhNDy5fi9/Vy7ZI0cOf5BjPzHXDYzHBBSN9VodfT jLhVCI0dP5tCjZYo2SdxzSBg/GTh3LO9xlDxZDVhVG1JipELJuPUw1EbUPM3I3me rmaZ9CC4VG/8y0ea6U/1TP4XKNYyZDoQ37x26poxOLOtpMv4J11+Isv1zQR6aYU= =raVf -----END PGP SIGNATURE----- --------------enig05A5EE2C5BDC0AE357693AFC-- From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 16:14:29 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1490C106564A; Mon, 15 Aug 2011 16:14:29 +0000 (UTC) (envelope-from prvs=1208040d95=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 564A48FC19; Mon, 15 Aug 2011 16:14:27 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 17:13:11 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 17:13:10 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled 
version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014607352.msg; Mon, 15 Aug 2011 17:13:08 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <94438CD02F1447EAB4889D16BFC610B5@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E493CFE.6010207@FreeBSD.org> Date: Mon, 15 Aug 2011 17:13:43 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 16:14:29 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Monday, August 15, 2011 4:36 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > on 15/08/2011 17:56 Steven Hartland said the following: >> >> ----- Original Message ----- From: "Andriy Gapon" >> 
To: "Steven Hartland" >> Cc: >> Sent: Monday, August 15, 2011 2:20 PM >> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE >> >> >>> on 15/08/2011 15:51 Steven Hartland said the following: >>>> ----- Original Message ----- From: "Andriy Gapon" >>>> >>>> >>>>> on 15/08/2011 13:34 Steven Hartland said the following: >>>>>> (kgdb) list *0xffffffff8053b691 >>>>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239). >>>>>> 234 /* >>>>>> 235 * Find the backing store object and offset into it to begin the >>>>>> 236 * search. >>>>>> 237 */ >>>>>> 238 fs.map = map; >>>>>> 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry, >>>>>> 240 &fs.first_object, &fs.first_pindex, &prot, &wired); >>>>>> 241 if (result != KERN_SUCCESS) { >>>>>> 242 if (result != KERN_PROTECTION_FAILURE || >>>>>> 243 (fault_flags & VM_FAULT_WIRE_MASK) != >>>>>> VM_FAULT_USER_WIRE) { >>>>>> >>>>> >>>>> Interesting... thanks! > [snip] >> (kgdb) x/512a 0xffffff8d8f357210 > > This is not conclusive, but that stack looks like the following recursive chain: > vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault > So I suspect that increasing kernel stack size won't help here much. > Where does this chain come from? I have no answer at the moment, maybe other > developers could help here. I suspect that we shouldn't be getting that trap in > vm_map_growstack or should handle it in a different way. > Just in case its relevant I've checked other crashes and all rip entries point to: vm_fault (/usr/src/sys/vm/vm_fault.c:239). 

A more typical layout, from a selection of machines, is:-

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86ccf8ffb0
rbp = 0xffffff86ccf90210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 2d21h25m4s
Physical memory: 24555 MB
Dumping 4184 MB:...
----
Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cc742fb0
rbp = 0xffffff86cc743210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 2d4h30m58s
Physical memory: 24555 MB
Dumping 5088 MB:...
----
Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86caeabfb0
rbp = 0xffffff86caeac210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 3d1h56m45s
Physical memory: 24555 MB
Dumping 4690 MB:...
----
Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cb1c7fb0
rbp = 0xffffff86cb1c8210
cpuid = 4; apic id = 04
panic: double fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 1d13h41m19s
Physical memory: 24555 MB
Dumping 3626 MB:...
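
For anyone wanting to compare a larger pile of these reports without reading them by eye, here is a rough sketch (not part of the original report) that pulls the rip/rsp/rbp values out of saved message-buffer text and tallies them. The function names and the SAMPLE text are made up for illustration; it assumes only the "rip = ... rsp = ... rbp = ..." layout shown in the reports above, and it says nothing about why the fault happens.

```python
import re
from collections import Counter

# Matches the three registers in the layout used by the reports above.
# \s+ allows the values to be separated by spaces or by newlines.
REGS = re.compile(
    r"rip = (0x[0-9a-f]+)\s+rsp = (0x[0-9a-f]+)\s+rbp = (0x[0-9a-f]+)"
)


def parse_double_faults(text):
    """Return a list of (rip, rsp, rbp) integer tuples found in text."""
    return [tuple(int(v, 16) for v in m.groups()) for m in REGS.finditer(text)]


def summarize(text):
    """Tally distinct rip values and the distinct rbp - rsp distances."""
    faults = parse_double_faults(text)
    rips = Counter(hex(rip) for rip, _, _ in faults)
    gaps = sorted({rbp - rsp for _, rsp, rbp in faults})
    return rips, gaps


# Hypothetical input: the register lines of the first two reports above.
SAMPLE = """
Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86ccf8ffb0
rbp = 0xffffff86ccf90210
----
Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cc742fb0
rbp = 0xffffff86cc743210
"""

rips, gaps = summarize(SAMPLE)
for rip, count in rips.items():
    print(rip, "seen", count, "time(s)")
print("rbp - rsp distances:", [hex(g) for g in gaps])
```

If every report comes back with a single rip and a single rbp - rsp distance, the crashes are at least consistent with each other; the addresses would still need to be resolved with kgdb's list command against the matching kernel.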

And in case any of the changes to loader.conf or sysctl.conf are relevant
here they are:-

[loader.conf]
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
# fix swap zone exhausted, increase kern.maxswzone
kern.maxswzone=67108864
# Reduce the minimum arc level we want our apps to have the memory
vfs.zfs.arc_min="512M"
[/loader.conf]

[sysctl.conf]
vfs.read_max=32
net.inet.tcp.inflight.enable=0
net.inet.tcp.sendspace=65536
kern.ipc.maxsockbuf=524288
kern.maxfiles=50000
kern.ipc.nmbclusters=51200
[/sysctl.conf]

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Mon Aug 15 20:34:08 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE087106566B for ; Mon, 15 Aug 2011 20:34:08 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 951C28FC0C for ; Mon, 15 Aug 2011 20:34:08 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 49C1E46B06; Mon, 15 Aug 2011 16:34:08 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D8C718A02E; Mon, 15 Aug 2011 16:34:07 -0400 (EDT) From: John Baldwin To: freebsd-stable@freebsd.org Date: Mon, 15 Aug 2011 16:16:44 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References:
<4846F699-215D-4408-BD3C-4860305BF6B8@transactionware.com> In-Reply-To: <4846F699-215D-4408-BD3C-4860305BF6B8@transactionware.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Message-Id: <201108151616.44880.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 15 Aug 2011 16:34:07 -0400 (EDT) Cc: Jan Mikkelsen Subject: Re: Patch to puc(4) to support Moxa CP-112UL board X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 20:34:08 -0000

On Wednesday, August 10, 2011 7:55:18 pm Jan Mikkelsen wrote:
> Hi,
>
> I have added these device IDs to pucdata.c to support the Moxa CP-112UL board family.
>
> Should I submit a problem report, or is there an easier way to get the patch merged?
>
> (I care about 8-STABLE at the moment ...)
>
> Thanks,
>
> Jan Mikkelsen

Committed to HEAD, will MFC in a week or so, thanks!

--
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 00:36:47 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CDE1E106564A for ; Tue, 16 Aug 2011 00:36:47 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8E5628FC12 for ; Tue, 16 Aug 2011 00:36:47 +0000 (UTC) Received: by yib19 with SMTP id 19so4017104yib.13 for ; Mon, 15 Aug 2011 17:36:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=x2ncZz/CexHzeXtiwuWIGJ7e0cJIcN82nV1g7Ii+7JI=; b=xLChC7QFPpCjLZnnjGoqaWX7/pbtZD+AqdUOyUGNUhLZDTFYvyT89hLuhd19MAOY2B ycAuBXwd2zPnZJztn+S06sGLBvapq8JhGXF1ax8IfuzvUQS7Zpjn3EPtQSsinh3CGs8b sOnNOc06eFJnFmnWSJoSDuBWjoF2aO6K5m3+g= MIME-Version: 1.0 Received: by 10.150.215.2 with SMTP id n2mr5388754ybg.152.1313455006792; Mon, 15 Aug 2011 17:36:46 -0700 (PDT) Received: by 10.151.98.3 with HTTP; Mon, 15 Aug 2011 17:36:46 -0700 (PDT) In-Reply-To: <4E49430D.10609@FreeBSD.org> References: <4E49430D.10609@FreeBSD.org> Date: Mon, 15 Aug 2011 17:36:46 -0700 Message-ID: From: Kevin Oberman To: "Andrey V. Elsukov" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-stable@freebsd.org Stable" Subject: Re: GPT boot blocks, booting and booteasy X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 00:36:47 -0000

2011/8/15 Andrey V.
Elsukov :
> On 10.08.2011 07:12, Kevin Oberman wrote:
>> I have /boot/pmbr loaded into the PMBR and gptboot into the
>> freebsd-boot partition. I'll
>> admit that I did this by rote and don't understand how these two files
>> interact with the
>> UEFI BIOS to get the loader started. I'm not really certain that I
>> even need both.
>>
>> Is it possible to build a "custom" booteasy boot system with boot0cfg
>> or some other tool
>> so I can select different bootable partition or my other disk which
>> is sliced in the traditional
>> fashion? Can anyone point me to any information on how the boot
>> process works with GPT?
>
> PMBR is a simple variant of MBR which does know enough to parse GPT
> partition table and how to load bootcode from the "freebsd-boot"
> partition. Then gptboot does search bootable UFS partition.
> At this time we do not have any bootcodes like booteasy for GPT.
> But you can try to use bootme and bootonce GPT attributes (see
> gpart(8)). Also you can use grub boot loader.

Andrey,

Thanks for the response. The 'bootme' and 'bootonce' attributes look to
solve some issues. Looks like I might need to have a bios-boot partition
to use grub, but I may give it a shot. On the whole, the advantages of GPT
are such that I would love to see FreeBSD move to make it the standard
partitioning scheme, though I understand this will not be easy until/unless
Windows develops full GPT support.

Just having more than 4 partitions as opposed to having to sub-partition a
real partition (slice) is very nice.
--
R.
Kevin Oberman, Network Engineer - Retired E-mail: kob6558@gmail.com From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 06:44:38 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3A87106566B for ; Tue, 16 Aug 2011 06:44:38 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 980028FC12 for ; Tue, 16 Aug 2011 06:44:38 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7G6LsCJ033597 for ; Mon, 15 Aug 2011 23:21:54 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4A0C81.7020501@rawbw.com> Date: Mon, 15 Aug 2011 23:21:53 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 06:44:38 -0000 I have dual COM port pci card: none7@pci0:8:1:0: class=0x070002 card=0x32534348 chip=0x32534348 rev=0x10 hdr=0x00 class = simple comms subclass = UART bar [10] = type I/O Port, range 32, base 0xe880, size 8, enabled bar [14] = type I/O Port, range 32, base 0xe800, size 8, enabled Manufacturer 0x4348 isn't recognized by http://www.pcidatabase.com. It was purchased from China through ebay. How to make it to work in 8.2-STABLE? 
Yuri From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 07:48:15 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F89E1065670 for ; Tue, 16 Aug 2011 07:48:15 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id 3467A8FC15 for ; Tue, 16 Aug 2011 07:48:15 +0000 (UTC) Received: from delta.delphij.net (c-76-102-50-245.hsd1.ca.comcast.net [76.102.50.245]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id E7048139CA; Tue, 16 Aug 2011 00:48:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1313480895; bh=pOgxblFWQSI2Vjlkv5hOrnrwYqBIdVy+9Yk02rdPmCk=; h=Message-ID:Date:From:Reply-To:MIME-Version:To:Subject:References: In-Reply-To:Content-Type; b=vKoVRUQ4SwuNuERXxzZ+mI9PbgCrPe+mW7oInWzvxqR8olnTdR8BREa6XWsYgBpsW X6AdkYbIXaov22RUfPIT6E26LT8Dv41TL1dwUy1S7gN5GHbVzkLjN6WBE2ony6G8Wl JGoMsEPsemMpwwDepX6HaNPaK3KF0m+RENqnm/Uw= Message-ID: <4E4A20BE.3060603@delphij.net> Date: Tue, 16 Aug 2011 00:48:14 -0700 From: Xin LI Organization: The FreeBSD Project MIME-Version: 1.0 To: freebsd-stable@freebsd.org References: <4E4A0C81.7020501@rawbw.com> In-Reply-To: <4E4A0C81.7020501@rawbw.com> OpenPGP: id=3FCA37C1; url=http://www.delphij.net/delphij.asc Content-Type: multipart/mixed; boundary="------------070402070505040801070209" Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 07:48:15 -0000 This is a multi-part message in MIME format. 
--------------070402070505040801070209 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 08/15/11 23:21, Yuri wrote: > I have dual COM port pci card: none7@pci0:8:1:0: class=0x070002 > card=0x32534348 chip=0x32534348 rev=0x10 hdr=0x00 class = simple > comms subclass = UART bar [10] = type I/O Port, range 32, base > 0xe880, size 8, enabled bar [14] = type I/O Port, range 32, base > 0xe800, size 8, enabled > > Manufacturer 0x4348 isn't recognized by http://www.pcidatabase.com. > It was purchased from China through ebay. > > How to make it to work in 8.2-STABLE? A wild guess... (You gotta to provide more details rather than just PCI IDs). My guess is that it's using these chips: http://www.winchiphead.com/product/ch365detail.htm http://www.winchiphead.com/product/ch353detail.htm It didn't talked about possible cards' configuration so I used BAR0, which could be 0x14, 0x18, etc. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! 
Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iQEcBAEBCAAGBQJOSiC9AAoJEATO+BI/yjfB5oAH/R0yt8Zx3HDVOXA5jUOXzlWl A+XCmbaau4MNhOtiyVJ8sWERE1CukgQeIE7DWze1rJ6YU7bTXKAgoRbqVJsfiAbH CEhLx+Y2T7HLow9ZojCGrqk6ydrGxheWIyf2AM7nTORZQdEUceEWGLE4GMXJghTp Y4udsGfSRqa+1O7tTOpechDi5jtG/cW+dDFeyZqVo0AjfS78D10wEqoiudloIkBd IAEyy7JGCU/R6AM+DhHHm0dIT68MkHxULOpTLy0GxxzJecWruknqd+h+V36Q3X+h brg2isOawCGLhWgzCDXVZXwJWIXA28RaRmDPeZRNv5TKUESmZEenR8lEpH7ji+s= =KUoE -----END PGP SIGNATURE----- --------------070402070505040801070209 Content-Type: text/plain; name="uart.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="uart.diff" Index: sys/dev/uart/uart_bus_pci.c =================================================================== --- sys/dev/uart/uart_bus_pci.c (revision 224900) +++ sys/dev/uart/uart_bus_pci.c (working copy) @@ -111,6 +111,7 @@ { 0x1415, 0x950b, 0xffff, 0, "Oxford Semiconductor OXCB950 Cardbus 16950 UART", 0x10, 16384000 }, { 0x151f, 0x0000, 0xffff, 0, "TOPIC Semiconductor TP560 56k modem", 0x10 }, +{ 0x4348, 0x3253, 0xffff, 0, "WinChipHead Dual Port RS-232", 0x10 }, { 0x9710, 0x9820, 0x1000, 1, "NetMos NM9820 Serial Port", 0x10 }, { 0x9710, 0x9835, 0x1000, 1, "NetMos NM9835 Serial Port", 0x10 }, { 0x9710, 0x9865, 0xa000, 0x1000, "NetMos NM9865 Serial Port", 0x10 }, --------------070402070505040801070209-- From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 09:01:44 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0242B106566C for ; Tue, 16 Aug 2011 09:01:44 +0000 (UTC) (envelope-from damian.jagosz@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 86C0A8FC17 for ; Tue, 16 Aug 2011 09:01:43 +0000 (UTC) Received: by fxe4 with SMTP id 4so5287418fxe.13 for ; Tue, 16 Aug 2011 02:01:42 -0700 (PDT) DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=from:content-type:content-transfer-encoding:subject:date:message-id :to:mime-version:x-mailer; bh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=; b=k01WzM81K6rTnaCaf+3+/hjyujXWYvpvAQRCrr54pOo2Wn3OLg+ES4pb72SL/m5aey X6/pcUKS//+sSsl3mI120lzjvC64ygTq7nRcweUpBKC/yXkHmQDMlHKyAnonB5PVyb3A ayVlExkM+9Jp9XFcQnULOIp75G8cack71irUE= Received: by 10.223.61.79 with SMTP id s15mr5761025fah.117.1313483980590; Tue, 16 Aug 2011 01:39:40 -0700 (PDT) Received: from [192.168.10.197] ([31.134.59.96]) by mx.google.com with ESMTPS id f12sm1019690fai.25.2011.08.16.01.39.37 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 16 Aug 2011 01:39:39 -0700 (PDT) From: Damian Jagosz Content-Type: text/plain Content-Transfer-Encoding: 7bit Date: Tue, 16 Aug 2011 10:39:33 +0200 Message-Id: <101CCD1C-AAFB-4638-91B0-46D085C71B11@gmail.com> To: freebsd-stable@freebsd.org Mime-Version: 1.0 (Apple Message framework v1244.3) X-Mailer: Apple Mail (2.1244.3) Subject: off X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 09:01:44 -0000 From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 09:25:41 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5C7D106566C for ; Tue, 16 Aug 2011 09:25:41 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id CF8A88FC0C for ; Tue, 16 Aug 2011 09:25:41 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7G9PSsK061099; Tue, 16 Aug 2011 02:25:29 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: 
<4E4A3788.3030605@rawbw.com> Date: Tue, 16 Aug 2011 02:25:28 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: d@delphij.net References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net> In-Reply-To: <4E4A20BE.3060603@delphij.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org, Xin LI Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 09:25:42 -0000 On 08/16/2011 00:48, Xin LI wrote: > A wild guess... (You gotta to provide more details rather than just PCI > IDs). > > My guess is that it's using these chips: > > http://www.winchiphead.com/product/ch365detail.htm > http://www.winchiphead.com/product/ch353detail.htm > > It didn't talked about possible cards' configuration so I used BAR0, > which could be 0x14, 0x18, etc. Actually, the main chip there is CH352L. Plus there are two more chips ST75185C, one per COM port. Your patch made this pci device to connect to uart driver: uart2@pci0:8:1:0. uart2: <16550 or compatible> port 0xe880-0xe887,0xe800-0xe807 irq 17 at device 1.0 on pci8 uart2: [FILTER] Also new devices showed up: /dev/cuau2 /dev/cuau2.init /dev/cuau2.lock /dev/ttyu2 /dev/ttyu2.init /dev/ttyu2.lock cuau2 is probably the same as COM port. I don't have an easy way to check now. I believe adding another entry with 0x14 would add the second COM port. Thank you! 
Yuri From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 15:57:22 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F046106566C for ; Tue, 16 Aug 2011 15:57:22 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 32B7F8FC1F for ; Tue, 16 Aug 2011 15:57:22 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id CDB0246B06; Tue, 16 Aug 2011 11:57:21 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6AC688A02E; Tue, 16 Aug 2011 11:57:21 -0400 (EDT) From: John Baldwin To: freebsd-stable@freebsd.org Date: Tue, 16 Aug 2011 11:57:20 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net> <4E4A3788.3030605@rawbw.com> In-Reply-To: <4E4A3788.3030605@rawbw.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108161157.20890.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 16 Aug 2011 11:57:21 -0400 (EDT) Cc: Yuri , d@delphij.net, Xin LI Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 15:57:22 -0000 On Tuesday, August 16, 2011 5:25:28 am Yuri wrote: > On 08/16/2011 00:48, Xin LI wrote: > > A wild guess... (You gotta to provide more details rather than just PCI > > IDs). 
> >
> > My guess is that it's using these chips:
> >
> > http://www.winchiphead.com/product/ch365detail.htm
> > http://www.winchiphead.com/product/ch353detail.htm
> >
> > It didn't talked about possible cards' configuration so I used BAR0,
> > which could be 0x14, 0x18, etc.
>
> Actually, the main chip there is CH352L. Plus there are two more chips
> ST75185C, one per COM port.
>
> Your patch made this pci device to connect to uart driver: uart2@pci0:8:1:0.
>
> uart2: <16550 or compatible> port 0xe880-0xe887,0xe800-0xe807 irq 17 at
> device 1.0 on pci8
> uart2: [FILTER]
>
> Also new devices showed up:
> /dev/cuau2
> /dev/cuau2.init
> /dev/cuau2.lock
> /dev/ttyu2
> /dev/ttyu2.init
> /dev/ttyu2.lock
>
> cuau2 is probably the same as COM port. I don't have an easy way to
> check now.
> I believe adding another entry with 0x14 would add the second COM port.

For multiport devices you will want to add an entry to sys/dev/puc/pucdata.c
and use the puc driver instead of patching uart directly. Perhaps this:

Index: pucdata.c
===================================================================
--- pucdata.c (revision 224898)
+++ pucdata.c (working copy)
@@ -862,6 +862,13 @@ const struct puc_cfg puc_pci_devices[] = {
     .config_function = puc_config_syba
   },
 
+  {
+    0x4348, 0x3253, 0xffff, 0,
+    "WinChipHead Dual Port RS-232",
+    DEFAULT_RCLK,
+    PUC_PORT_2S, 0x10, 4, 0,
+  },
+
   { 0x6666, 0x0001, 0xffff, 0,
     "Decision Computer Inc, PCCOM 4-port serial",
     DEFAULT_RCLK,

--
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 19:36:31 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 85AA2106566B for ; Tue, 16 Aug 2011 19:36:31 +0000 (UTC) (envelope-from petros.fraser@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 4539E8FC08 for ; Tue, 16 Aug 2011 19:36:31 +0000 (UTC)
Received: by mail-qy0-f175.google.com with SMTP id 4so1879692qyk.13 for ; Tue, 16 Aug 2011 12:36:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=u5/OWYaX+GSU6cKF2Qw3nWe14s3kfC04axhGR4Y3kqU=; b=fH+EsLTmJWE3urrGfycIPprHzSKGeQ0DfN2WdsqfygZB7spbF2sWtH28dCSnI6LsB0 R0FJUfFtqMlxMJ4lrwvWb9sJs909Pwr0gxV51/1ZhHfibaP1fH2wxwJ/crDKcBYS3Wti opyRWP1j0VBlChQRfMY/tD6KlgrzCvMZUfPE8= MIME-Version: 1.0 Received: by 10.52.93.98 with SMTP id ct2mr94179vdb.314.1313521847351; Tue, 16 Aug 2011 12:10:47 -0700 (PDT) Received: by 10.52.184.225 with HTTP; Tue, 16 Aug 2011 12:10:47 -0700 (PDT) Date: Tue, 16 Aug 2011 14:10:47 -0500 Message-ID: From: Peter Fraser To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: Upgrade to 7.4 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 19:36:31 -0000

Hi All

I just ran freebsd-update to upgrade from 7.0 to 7.4. I figured everything
went ok. This is what I did:

1. freebsd-update upgrade -r 7.4-RELEASE
2. freebsd-update install
3. shutdown -r now
4. freebsd-update install
5. shutdown -r now

The system came back up ok, but now if I run another freebsd-update fetch,
I get the error below:

config_IDSIgnorePaths: not found
Error processing configuration file, line 26:
==> IDSIgnorePaths /usr/share/man/cat

Is this an error I need to worry about? If so, how can I correct it?

From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 19:53:22 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8C58106566B for ; Tue, 16 Aug 2011 19:53:22 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 70FA48FC14 for ; Tue, 16 Aug 2011 19:53:22 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7GJrHNH031330; Tue, 16 Aug 2011 12:53:17 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4ACAAD.3030506@rawbw.com> Date: Tue, 16 Aug 2011 12:53:17 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net> <4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org> In-Reply-To: <201108161157.20890.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: d@delphij.net, freebsd-stable@freebsd.org, Xin LI Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 19:53:22 -0000 On 08/16/2011 08:57, John Baldwin wrote: > For multiport devices you will want to add an entry to sys/dev/puc/pucdata.c > and use the puc driver instead of patching uart directly. Perhaps this: John, I did what you suggested: puc0: port 0xe880-0xe887,0xe800-0xe807 irq 17 at device 1.0 on pci8 But it doesn't show up as a serial device and tty. 
Yuri From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 20:30:25 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0308106567E for ; Tue, 16 Aug 2011 20:30:25 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 117778FC1A for ; Tue, 16 Aug 2011 20:30:24 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA10423; Tue, 16 Aug 2011 23:30:22 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QtQHF-000Afe-SH; Tue, 16 Aug 2011 23:30:21 +0300 Message-ID: <4E4AD35C.7020504@FreeBSD.org> Date: Tue, 16 Aug 2011 23:30:20 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110706 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> In-Reply-To: <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 
2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 20:30:25 -0000 on 15/08/2011 17:56 Steven Hartland said the following: > (kgdb) x/512a 0xffffff8d8f357210 [snip] Can you please also provide the following for this core? list *vm_map_growstack+93 list *lim_cur+17 list *lim_rlimit+18 Also, it would be interesting to get panic output with DDB option. -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 20:37:35 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0C7C1065680; Tue, 16 Aug 2011 20:37:35 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) by mx1.freebsd.org (Postfix) with ESMTP id A25818FC16; Tue, 16 Aug 2011 20:37:35 +0000 (UTC) Received: from delta.delphij.net (drawbridge.ixsystems.com [206.40.55.65]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by anubis.delphij.net (Postfix) with ESMTPSA id 69C4813EB2; Tue, 16 Aug 2011 13:37:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis; t=1313527055; bh=ZlDdj2pqbMzUDJklO0LSOYV6MB7rQU57e/5RmMdoLM8=; h=Message-ID:Date:From:Reply-To:MIME-Version:To:CC:Subject: References:In-Reply-To:Content-Type; b=gY2aTO7Z65z8qWGrM3R7IAJ/mRb5HiXGdhXZevg8SvacTVMiVG8BCZKCx14m4bJLI yQQpwYIV7YjeUroaZ7aiQtzkV5xLeatB2OqS65smeFaqLEAUprhtLJwPh2ntfiLS5y dt8E2fVrWqqgIUCkjekV8gJT0fYDx9dbNQdUCN4g= Message-ID: <4E4AD50E.6050906@delphij.net> Date: Tue, 16 Aug 2011 13:37:34 -0700 From: Xin LI Organization: The FreeBSD Project MIME-Version: 1.0 To: Yuri References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net> <4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org> 
<4E4ACAAD.3030506@rawbw.com> In-Reply-To: <4E4ACAAD.3030506@rawbw.com> OpenPGP: id=3FCA37C1; url=http://www.delphij.net/delphij.asc Content-Type: multipart/mixed; boundary="------------060306060307040906060005" Cc: d@delphij.net, freebsd-stable@freebsd.org, John Baldwin Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 20:37:35 -0000 This is a multi-part message in MIME format. --------------060306060307040906060005 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 08/16/11 12:53, Yuri wrote: > On 08/16/2011 08:57, John Baldwin wrote: >> For multiport devices you will want to add an entry to >> sys/dev/puc/pucdata.c and use the puc driver instead of patching >> uart directly. Perhaps this: > > John, > > I did what you suggested: puc0: port > 0xe880-0xe887,0xe800-0xe807 irq 17 at device 1.0 on pci8 > > But it doesn't show up as a serial device and tty. I found a datasheet: http://wch-ic.com/download/down.asp?id=116 (English) and http://winchiphead.com/download/CH352/CH352DS1.PDF (Chinese) And I think John's patch is right, I've added a new PCI ID for it though, found from the datasheet. Did you have uart(4) in your kernel (remove my old patch)? Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! 
Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iQEcBAEBCAAGBQJOStUOAAoJEATO+BI/yjfBSw0IANPaoND+0Xa2QtueAxI8Qa42 V86MiUnaZopRb0coiWf8dQNk+nIlayVuFstC9+77zC9NEEu1O7Mp8T4n2Bx2N7WP jtsevUnLJq6lIyo0jYRTf4x84eYd1VDBduHqsWbI0B7aMArgfNtHvPV0qUD9Emrn 4yR6I3/tmO3sX3+cWcggYC4s3DIm7XidiyT/6lcWilsmy2QkQlw00HoAkoKl0V4m DBkKHkmOB2oTUYadpBOKCt6HvdI29xWYF+1zN/sE0B3XwTy+Q1pp4Uq5KiBUyJi3 tNF533Z7COh/mog/Z9cpGpLSRJpWQgI2uCY7gAHZRAMT2+7k1AqkdNPWTJPXoCk= =CcI6 -----END PGP SIGNATURE----- --------------060306060307040906060005 Content-Type: text/plain; name="puc.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="puc.diff" Index: sys/dev/puc/pucdata.c =================================================================== --- sys/dev/puc/pucdata.c (revision 224912) +++ sys/dev/puc/pucdata.c (working copy) @@ -862,6 +862,20 @@ const struct puc_cfg puc_pci_devices[] = { .config_function = puc_config_syba }, + { + 0x4348, 0x3253, 0x4348, 0x3253, + "WinChipHead Dual Port RS-232", + DEFAULT_RCLK, + PUC_PORT_2S, 0x10, 4, 0, + }, + + { + 0x4348, 0x5053, 0x4348, 0x5053, + "WinChipHead RS-232 and Printer port", + DEFAULT_RCLK, + PUC_PORT_1S1P, 0x10, 4, 0, + }, + { 0x6666, 0x0001, 0xffff, 0, "Decision Computer Inc, PCCOM 4-port serial", DEFAULT_RCLK, --------------060306060307040906060005-- From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 20:54:33 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BDC5B106566C; Tue, 16 Aug 2011 20:54:33 +0000 (UTC) (envelope-from prvs=1209a97202=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 0B4998FC13; Tue, 16 Aug 2011 20:54:32 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Tue, 16 Aug 2011 21:42:41 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 16 Aug 2011 21:42:40 +0100 
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014625368.msg; Tue, 16 Aug 2011 21:42:40 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1209a97202=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <6A7238AED44542A880B082A40304D940@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> Date: Tue, 16 Aug 2011 21:43:21 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 20:54:33 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: 
Tuesday, August 16, 2011 9:30 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

> on 15/08/2011 17:56 Steven Hartland said the following:
>> (kgdb) x/512a 0xffffff8d8f357210
> [snip]
>
> Can you please also provide the following for this core?
> list *vm_map_growstack+93
> list *lim_cur+17
> list *lim_rlimit+18
>
> Also, it would be interesting to get panic output with DDB option.

Here's the info:-

(kgdb) list *vm_map_growstack+93
0xffffffff80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
3300            struct uidinfo *uip;
3301
3302    Retry:
3303            PROC_LOCK(p);
3304            stacklim = lim_cur(p, RLIMIT_STACK);
3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
3306            PROC_UNLOCK(p);
3307
3308            vm_map_lock_read(map);
3309
(kgdb) list *lim_cur+17
0xffffffff80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
1145    rlim_t
1146    lim_cur(struct proc *p, int which)
1147    {
1148            struct rlimit rl;
1149
1150            lim_rlimit(p, which, &rl);
1151            return (rl.rlim_cur);
1152    }
1153
1154    /*
(kgdb) list *lim_rlimit+18
0xffffffff80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
1160    {
1161
1162            PROC_LOCK_ASSERT(p, MA_OWNED);
1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
1164                ("request for invalid resource limit"));
1165            *rlp = p->p_limit->pl_rlimit[which];
1166            if (p->p_sysent->sv_fixlimit != NULL)
1167                    p->p_sysent->sv_fixlimit(rlp, which);
1168    }
1169

I've yet to have the machine with DDB + expanded stack panic.

I plan to leave it a day or so more then try a reboot to see if that
triggers it. If not I'll drop the stack back down to 4 and see if that
enables us to get another panic.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 20:57:05 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 185D1106564A for ; Tue, 16 Aug 2011 20:57:05 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E30978FC08 for ; Tue, 16 Aug 2011 20:57:04 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 90ED446B09; Tue, 16 Aug 2011 16:57:04 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 0F98A8A02F; Tue, 16 Aug 2011 16:57:04 -0400 (EDT) From: John Baldwin To: freebsd-stable@freebsd.org Date: Tue, 16 Aug 2011 16:57:03 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E4A0C81.7020501@rawbw.com> <201108161157.20890.jhb@freebsd.org> <4E4ACAAD.3030506@rawbw.com> In-Reply-To: <4E4ACAAD.3030506@rawbw.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108161657.03574.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 16 Aug 2011 16:57:04 -0400 (EDT) Cc: Yuri , d@delphij.net, Xin LI Subject: Re: How to use unrecognized COM port card? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 20:57:05 -0000 On Tuesday, August 16, 2011 3:53:17 pm Yuri wrote: > On 08/16/2011 08:57, John Baldwin wrote: > > For multiport devices you will want to add an entry to sys/dev/puc/pucdata.c > > and use the puc driver instead of patching uart directly. Perhaps this: > > John, > > I did what you suggested: > puc0: port 0xe880-0xe887,0xe800-0xe807 > irq 17 at device 1.0 on pci8 > > But it doesn't show up as a serial device and tty. Hmmm, can you get devinfo -v output? Specifically there should be two children of puc0 and they should have extra data specifying what type of port each child device is. -- John Baldwin From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 20:57:30 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5C731065701; Tue, 16 Aug 2011 20:57:30 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 9D2DA8FC14; Tue, 16 Aug 2011 20:57:30 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7GKvQAb081091; Tue, 16 Aug 2011 13:57:26 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4AD9B6.2030001@rawbw.com> Date: Tue, 16 Aug 2011 13:57:26 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: d@delphij.net References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net> <4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org> <4E4ACAAD.3030506@rawbw.com> <4E4AD50E.6050906@delphij.net> In-Reply-To: 
<4E4AD50E.6050906@delphij.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org, Xin LI , John Baldwin Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 20:57:30 -0000 On 08/16/2011 13:37, Xin LI wrote: > And I think John's patch is right, I've added a new PCI ID for it > though, found from the datasheet. Did you have uart(4) in your kernel > (remove my old patch)? Yes, uart(4) is in kernel and puc(4) is the loaded module. I think this might be a problem that puc(4) is a module loaded later and that's why serial device isn't registered. I found the reference to the similar situation with some other card that got cured when puc(4) was compiled into kernel. (http://www.adras.com/Quadtech-DSC-100-PCI-dual-serial-port-on-8-0R-i386.t6999-79.html) I have yet to try building puc(4) into kernel, but the way how I have it now is the default in GENERIC. Should uart(4) instead be removed from kernel and made loadable too to prevent such initialization order issue? Or what would be the right fix? Have too much stuff in kernel isn't right too. uart probably isn't used by 99% of users. 
Yuri From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 20:59:46 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63D4F1065672; Tue, 16 Aug 2011 20:59:46 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 4E8578FC08; Tue, 16 Aug 2011 20:59:46 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7GKxhck081442; Tue, 16 Aug 2011 13:59:43 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4ADA3E.1070309@rawbw.com> Date: Tue, 16 Aug 2011 13:59:42 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <4E4A0C81.7020501@rawbw.com> <201108161157.20890.jhb@freebsd.org> <4E4ACAAD.3030506@rawbw.com> <201108161657.03574.jhb@freebsd.org> In-Reply-To: <201108161657.03574.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: d@delphij.net, freebsd-stable@freebsd.org, Xin LI Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 20:59:46 -0000 On 08/16/2011 13:57, John Baldwin wrote: > Hmmm, can you get devinfo -v output? Specifically there should be two > children of puc0 and they should have extra data specifying what type of port > each child device is. 
> Here is the only reference to puc0 in devinfo -v output: pcib8 pnpinfo vendor=0x8086 device=0x244e subvendor=0x1043 subdevice=0x82d4 class=0x060401 at slot=30 function=0 handle=\_SB_.PCI0.P0P1 pci8 pcm0 pnpinfo vendor=0x1274 device=0x5000 subvendor=0x4942 subdevice=0x4c4c class=0x040100 at slot=0 function=0 puc0 pnpinfo vendor=0x4348 device=0x3253 subvendor=0x4348 subdevice=0x3253 class=0x070002 at slot=1 function=0 isab0 pnpinfo vendor=0x8086 device=0x3a16 subvendor=0x1043 subdevice=0x82d4 class=0x060100 at slot=31 function=0 handle=\_SB_.PCI0.SBRG isa0 orm0 Yuri From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 23:08:17 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 233281065670 for ; Tue, 16 Aug 2011 23:08:17 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id EB0088FC12 for ; Tue, 16 Aug 2011 23:08:16 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 911A046B3B; Tue, 16 Aug 2011 19:08:16 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2B3688A02E; Tue, 16 Aug 2011 19:08:16 -0400 (EDT) From: John Baldwin To: Yuri Date: Tue, 16 Aug 2011 19:03:43 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E4A0C81.7020501@rawbw.com> <201108161657.03574.jhb@freebsd.org> <4E4ADA3E.1070309@rawbw.com> In-Reply-To: <4E4ADA3E.1070309@rawbw.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108161903.43881.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 16 Aug 2011 19:08:16 -0400 (EDT) Cc: d@delphij.net, 
freebsd-stable@freebsd.org, Xin LI Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 23:08:17 -0000

On Tuesday, August 16, 2011 4:59:42 pm Yuri wrote:
> On 08/16/2011 13:57, John Baldwin wrote:
> > Hmmm, can you get devinfo -v output? Specifically there should be two
> > children of puc0 and they should have extra data specifying what type of port
> > each child device is.
> >
>
> Here is the only reference to puc0 in devinfo -v output:
>
> pcib8 pnpinfo vendor=0x8086 device=0x244e subvendor=0x1043
> subdevice=0x82d4 class=0x060401 at slot=30 function=0 handle=\_SB_.PCI0.P0P1
> pci8
> pcm0 pnpinfo vendor=0x1274 device=0x5000 subvendor=0x4942
> subdevice=0x4c4c class=0x040100 at slot=0 function=0
> puc0 pnpinfo vendor=0x4348 device=0x3253 subvendor=0x4348
> subdevice=0x3253 class=0x070002 at slot=1 function=0
> isab0 pnpinfo vendor=0x8086 device=0x3a16 subvendor=0x1043
> subdevice=0x82d4 class=0x060100 at slot=31 function=0 handle=\_SB_.PCI0.SBRG
> isa0
> orm0
>

Ugh, the dumb driver deletes ports if they don't probe, which is a ridiculous
thing for it to do.

-- John Baldwin From owner-freebsd-stable@FreeBSD.ORG Tue Aug 16 23:08:17 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58A4B1065672; Tue, 16 Aug 2011 23:08:17 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2EA2B8FC13; Tue, 16 Aug 2011 23:08:17 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id D087D46B43; Tue, 16 Aug 2011 19:08:16 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6AFF28A02F; Tue, 16 Aug 2011 19:08:16 -0400 (EDT) From: John Baldwin To: Yuri Date: Tue, 16 Aug 2011 19:08:15 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E4A0C81.7020501@rawbw.com> <4E4AD50E.6050906@delphij.net> <4E4AD9B6.2030001@rawbw.com> In-Reply-To: <4E4AD9B6.2030001@rawbw.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108161908.15840.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 16 Aug 2011 19:08:16 -0400 (EDT) Cc: Xin LI , Marcel Moolenaar , d@delphij.net, freebsd-stable@freebsd.org Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 23:08:17 -0000 On Tuesday, August 16, 2011 4:57:26 pm Yuri wrote: > On 08/16/2011 13:37, Xin LI wrote: > > And I think John's patch is right, I've added a new PCI ID for it > > though, found from the datasheet. 
Did you have uart(4) in your kernel
> > (remove my old patch)?
>
> Yes, uart(4) is in kernel and puc(4) is the loaded module. I think this
> might be a problem that puc(4) is a module loaded later and that's why
> serial device isn't registered. I found the reference to the similar
> situation with some other card that got cured when puc(4) was compiled
> into kernel.
> (http://www.adras.com/Quadtech-DSC-100-PCI-dual-serial-port-on-8-0R-i386.t6999-79.html)
>
> I have yet to try building puc(4) into kernel, but the way how I have it
> now is the default in GENERIC. Should uart(4) instead be removed from
> kernel and made loadable too to prevent such initialization order issue?
> Or what would be the right fix? Have too much stuff in kernel isn't
> right too. uart probably isn't used by 99% of users.

Err, uart is in _lots_ of machines (just about every rack-mounted x86 server
I've ever used). The real bug here is the uart driver and the way it is
compiled into the kernel. It should just always include the 'puc' attachment
I believe, or do so if any of the busses supported by 'puc' are compiled in.
The puc attachment for uart is really tiny, and KOBJ is used in new-bus
specifically so that attachments don't require the full bus driver to be
present.
Something like this:

Index: files
===================================================================
--- files (revision 224879)
+++ files (working copy)
@@ -1842,7 +1842,7 @@ dev/uart/uart_bus_fdt.c optional uart fdt
 dev/uart/uart_bus_isa.c optional uart isa
 dev/uart/uart_bus_pccard.c optional uart pccard
 dev/uart/uart_bus_pci.c optional uart pci
-dev/uart/uart_bus_puc.c optional uart puc
+dev/uart/uart_bus_puc.c optional uart puc | uart pccard | uart pci
 dev/uart/uart_bus_scc.c optional uart scc
 dev/uart/uart_core.c optional uart
 dev/uart/uart_dbg.c optional uart gdb

-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 01:24:11 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C9965106566C for ; Wed, 17 Aug 2011 01:24:11 +0000 (UTC) (envelope-from janm@transactionware.com) Received: from midgard.transactionware.com (mail2.transactionware.com [203.14.245.36]) by mx1.freebsd.org (Postfix) with SMTP id 3173F8FC13 for ; Wed, 17 Aug 2011 01:24:10 +0000 (UTC) Received: (qmail 65962 invoked by uid 907); 17 Aug 2011 01:24:09 -0000 Received: from jmmacpro.transactionware.com (HELO jmmacpro.transactionware.com) (192.168.1.33) by midgard.transactionware.com (qpsmtpd/0.82) with ESMTP; Wed, 17 Aug 2011 11:24:09 +1000 Mime-Version: 1.0 (Apple Message framework v1244.3) Content-Type: text/plain; charset=iso-8859-1 From: Jan Mikkelsen In-Reply-To: <4E4AD9B6.2030001@rawbw.com> Date: Wed, 17 Aug 2011 11:24:09 +1000 Content-Transfer-Encoding: quoted-printable Message-Id: <16D60EA7-85C7-486D-A722-50299407DC69@transactionware.com> References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net> <4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org> <4E4ACAAD.3030506@rawbw.com> <4E4AD50E.6050906@delphij.net> <4E4AD9B6.2030001@rawbw.com> To: Yuri X-Mailer: Apple Mail (2.1244.3) Cc: freebsd-stable@freebsd.org, d@delphij.net, John
Baldwin , Xin LI Subject: Re: How to use unrecognized COM port card? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 01:24:11 -0000

On 17/08/2011, at 6:57 AM, Yuri wrote:

> On 08/16/2011 13:37, Xin LI wrote:
>> And I think John's patch is right, I've added a new PCI ID for it
>> though, found from the datasheet. Did you have uart(4) in your kernel
>> (remove my old patch)?
>
> Yes, uart(4) is in kernel and puc(4) is the loaded module. I think this might be a problem that puc(4) is a module loaded later and that's why serial device isn't registered. I found the reference to the similar situation with some other card that got cured when puc(4) was compiled into kernel. (http://www.adras.com/Quadtech-DSC-100-PCI-dual-serial-port-on-8-0R-i386.t6999-79.html)
>
> I have yet to try building puc(4) into kernel, but the way how I have it now is the default in GENERIC. Should uart(4) instead be removed from kernel and made loadable too to prevent such initialization order issue? Or what would be the right fix? Have too much stuff in kernel isn't right too. uart probably isn't used by 99% of users.

For my recent Moxa 2 port serial card addition, I had to include puc in the kernel config; it didn't work as a module.

Jan.

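For reference, a minimal sketch of what including puc in the kernel config might look like. The config name MYKERNEL is hypothetical, and the fragment assumes a config(5) that supports the include directive and a GENERIC that does not already list puc. It illustrates the approach described above and is not a verified fix for any particular card.

```
# sys/amd64/conf/MYKERNEL (hypothetical name)
include		GENERIC
ident		MYKERNEL

# Compile puc(4) in instead of loading it as a module, so that
# uart(4) is built together with its puc attachment
# (dev/uart/uart_bus_puc.c is "optional uart puc").
device		puc
```

With such a config, the kernel would be rebuilt and installed in the usual way (make buildkernel KERNCONF=MYKERNEL, then make installkernel KERNCONF=MYKERNEL).
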
From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 11:12:35 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7FCE71065678 for ; Wed, 17 Aug 2011 11:12:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id ABCBF8FC0C for ; Wed, 17 Aug 2011 11:12:34 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA22120; Wed, 17 Aug 2011 14:12:31 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4BA21F.6010805@FreeBSD.org> Date: Wed, 17 Aug 2011 14:12:31 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> In-Reply-To: <6A7238AED44542A880B082A40304D940@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 11:12:36 -0000 on 16/08/2011 23:43 Steven Hartland said the following: > > ----- Original Message ----- From: "Andriy Gapon" > To: "Steven Hartland" > Cc: > Sent: Tuesday, August 16, 2011 9:30 PM > Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > > >> on 15/08/2011 17:56 Steven Hartland said the following: >>> (kgdb) x/512a 0xffffff8d8f357210 >> [snip] >> >> Can you please also provide the following for this core? >> list *vm_map_growstack+93 >> list *lim_cur+17 >> list *lim_rlimit+18 >> >> Also, it would be interesting to get panic output with DDB option. > > Here's the info:- > > (kgdb) list *vm_map_growstack+93 > 0xffffffff80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305). > 3300 struct uidinfo *uip; > 3301 > 3302 Retry: > 3303 PROC_LOCK(p); > 3304 stacklim = lim_cur(p, RLIMIT_STACK); > 3305 vmemlim = lim_cur(p, RLIMIT_VMEM); > 3306 PROC_UNLOCK(p); > 3307 > 3308 vm_map_lock_read(map); > 3309 > (kgdb) list *lim_cur+17 > 0xffffffff80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150). > 1145 rlim_t > 1146 lim_cur(struct proc *p, int which) > 1147 { > 1148 struct rlimit rl; > 1149 > 1150 lim_rlimit(p, which, &rl); > 1151 return (rl.rlim_cur); > 1152 } > 1153 > 1154 /* > (kgdb) list *lim_rlimit+18 > 0xffffffff80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165). > 1160 { > 1161 > 1162 PROC_LOCK_ASSERT(p, MA_OWNED); > 1163 KASSERT(which >= 0 && which < RLIM_NLIMITS, > 1164 ("request for invalid resource limit")); > 1165 *rlp = p->p_limit->pl_rlimit[which]; > 1166 if (p->p_sysent->sv_fixlimit != NULL) > 1167 p->p_sysent->sv_fixlimit(rlp, which); > 1168 } > 1169 > > I've yet to have the machine with DDB + expanded stack panic. > > I plan to leave it a day or so more then try a reboot to see if that > triggers it. If not I'll drop the stack back down to 4 and see if that > enables us to get another panic. 
OK, thank you for continuing to debug this!

Another request: could you please execute the following commands in kgdb on the
above core file?

define allpcpu
  set $i = 0
  while ($i <= mp_maxid)
    p *cpuid_to_pcpu[$i]
    set $i = $i + 1
  end
end
allpcpu

A little bit later I will send you another patch that, I hope, will produce better
diagnostics for this crash (without DDB in kernel).

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 11:26:55 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32EB1106564A for ; Wed, 17 Aug 2011 11:26:55 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 75DBD8FC0A for ; Wed, 17 Aug 2011 11:26:54 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA22369; Wed, 17 Aug 2011 14:26:51 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4BA57B.6050407@FreeBSD.org> Date: Wed, 17 Aug 2011 14:26:51 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org>
<6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> In-Reply-To: <4E4BA21F.6010805@FreeBSD.org> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 11:26:55 -0000

on 17/08/2011 14:12 Andriy Gapon said the following:
> A little bit later I will send you another patch that, I hope, will produce better
> diagnostics for this crash (without DDB in kernel).

The patch:

Index: sys/amd64/amd64/trap.c
===================================================================
--- sys/amd64/amd64/trap.c (revision 224782)
+++ sys/amd64/amd64/trap.c (working copy)
@@ -198,6 +198,10 @@
 	PCPU_INC(cnt.v_trap);
 	type = frame->tf_trapno;
 
+	if ((uintptr_t)frame->tf_rip >= (uintptr_t)&lim_rlimit
+	    && (uintptr_t)frame->tf_rip < (uintptr_t)&lim_rlimit + 40)
+		panic("trap in lim_rlimit");
+
 #ifdef SMP
 	/* Handler for NMI IPIs used for stopping CPUs. */
 	if (type == T_NMI) {

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 12:26:10 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 752571065678; Wed, 17 Aug 2011 12:26:10 +0000 (UTC) (envelope-from prvs=1210f20b9f=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8D7208FC2A; Wed, 17 Aug 2011 12:26:08 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 13:14:29 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 13:14:29 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014633673.msg; Wed, 17 Aug 2011 13:14:27 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk>
<4E4BA21F.6010805@FreeBSD.org> Date: Wed, 17 Aug 2011 13:15:04 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 12:26:10 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Wednesday, August 17, 2011 12:12 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > on 16/08/2011 23:43 Steven Hartland said the following: >> >> ----- Original Message ----- From: "Andriy Gapon" >> To: "Steven Hartland" >> Cc: >> Sent: Tuesday, August 16, 2011 9:30 PM >> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE >> >> >>> on 15/08/2011 17:56 Steven Hartland said the following: >>>> (kgdb) x/512a 0xffffff8d8f357210 >>> [snip] >>> >>> Can you please also provide the following for this core? >>> list *vm_map_growstack+93 >>> list *lim_cur+17 >>> list *lim_rlimit+18 >>> >>> Also, it would be interesting to get panic output with DDB option. >> >> Here's the info:- >> >> (kgdb) list *vm_map_growstack+93 >> 0xffffffff80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305). >> 3300 struct uidinfo *uip; >> 3301 >> 3302 Retry: >> 3303 PROC_LOCK(p); >> 3304 stacklim = lim_cur(p, RLIMIT_STACK); >> 3305 vmemlim = lim_cur(p, RLIMIT_VMEM); >> 3306 PROC_UNLOCK(p); >> 3307 >> 3308 vm_map_lock_read(map); >> 3309 >> (kgdb) list *lim_cur+17 >> 0xffffffff80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150). 
>> 1145 rlim_t >> 1146 lim_cur(struct proc *p, int which) >> 1147 { >> 1148 struct rlimit rl; >> 1149 >> 1150 lim_rlimit(p, which, &rl); >> 1151 return (rl.rlim_cur); >> 1152 } >> 1153 >> 1154 /* >> (kgdb) list *lim_rlimit+18 >> 0xffffffff80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165). >> 1160 { >> 1161 >> 1162 PROC_LOCK_ASSERT(p, MA_OWNED); >> 1163 KASSERT(which >= 0 && which < RLIM_NLIMITS, >> 1164 ("request for invalid resource limit")); >> 1165 *rlp = p->p_limit->pl_rlimit[which]; >> 1166 if (p->p_sysent->sv_fixlimit != NULL) >> 1167 p->p_sysent->sv_fixlimit(rlp, which); >> 1168 } >> 1169 >> >> I've yet to have the machine with DDB + expanded stack panic. >> >> I plan to leave it a day or so more then try a reboot to see if that >> triggers it. If not I'll drop the stack back down to 4 and see if that >> enables us to get another panic. > > OK, thank you for continuing to debug this! No thank you for the help :) > Another request: could you please execute the following commands in kgdb on the > above core file? > > define allpcpu > set $i = 0 > while ($i <= mp_maxid) > p *cpuid_to_pcpu[$i] > set $i = $i + 1 > end > end > allpcpu Here's the output. 
$1 = {pc_curthread = 0xffffff0012d708c0, pc_idlethread = 0xffffff0012d838c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000149d00, pc_switchtime = 564139965450231, pc_switchticks = 247796551, pc_cpuid = 0, pc_cpumask = 1, pc_other_cpus = 16777214, pc_allcpu = {sle_next = 0x0}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1246344506, v_trap = 121031682, v_syscall = 2590785278, v_intr = 866415, v_soft = 174249227, v_vm_faults = 24640099, v_cow_faults = 2606934, v_cow_optim = 678, v_zfod = 19177479, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 24007, v_vnodeout = 41, v_vnodepgsin = 24007, v_vnodepgsout = 322, v_intrans = 7300, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 25056637, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 35906, v_vforks = 21218, v_rforks = 0, v_kthreads = 20, v_forkpages = 9357854, v_vforkpages = 4445028, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {9035196, 1438, 426481, 1091491, 22402335}, pc_device = 0xffffff0012da2700, pc_netisr = 0xffffff0012cfe500, pc_rm_queue = {rmq_next = 0xffffffff808af550, rmq_prev = 0xffffffff808af550}, pc_dynamic = 3737856, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808af400, pc_curpmap = 0xffffff0012d74ef8, pc_tssp = 0xffffffff808ae700, pc_commontssp = 0xffffffff808ae700, pc_rsp0 = -549754462976, pc_scratch_rsp = 140737488348968, pc_apic_id = 0, pc_acpi_id = 1, pc_fs32p = 0xffffffff808ad530, pc_gs32p = 0xffffffff808ad538, pc_ldt = 0xffffffff808ad578, pc_tss = 0xffffffff808ad568, pc_cmci_mask = 364} $2 = {pc_curthread = 0xffffff0012d85000, pc_idlethread = 0xffffff0012d85000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, 
pc_curpcb = 0xffffff80001bcd00, pc_switchtime = 564139964769035, pc_switchticks = 247796551, pc_cpuid = 1, pc_cpumask = 2, pc_other_cpus = 16777213, pc_allcpu = {sle_next = 0xffffffff808af400}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 457697994, v_trap = 61700571, v_syscall = 670428238, v_intr = 298981, v_soft = 58852682, v_vm_faults = 7228810, v_cow_faults = 442573, v_cow_optim = 116, v_zfod = 6082240, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5151, v_vnodeout = 50, v_vnodepgsin = 5151, v_vnodepgsout = 397, v_intrans = 5575, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 8282005, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 10015, v_vforks = 11459, v_rforks = 0, v_kthreads = 0, v_forkpages = 2626771, v_vforkpages = 2444076, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8641747, 395, 532547, 157762, 23624411}, pc_device = 0xffffff0012da2600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808af7d0, rmq_prev = 0xffffffff808af7d0}, pc_dynamic = 18446743526093297920, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808af680, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae768, pc_commontssp = 0xffffffff808ae768, pc_rsp0 = -549753991936, pc_scratch_rsp = 140737488339432, pc_apic_id = 1, pc_acpi_id = 13, pc_fs32p = 0xffffffff808ad598, pc_gs32p = 0xffffffff808ad5a0, pc_ldt = 0xffffffff808ad5e0, pc_tss = 0xffffffff808ad5d0, pc_cmci_mask = 0} $3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8f35ad00, pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2, pc_cpumask = 4, 
pc_other_cpus = 16777211, pc_allcpu = {sle_next = 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap = 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308, v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod = 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238, v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857, v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094, 693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808afa50, rmq_prev = 0xffffffff808afa50}, pc_dynamic = 18446743526093326592, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808af900, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae7d0, pc_commontssp = 0xffffffff808ae7d0, pc_rsp0 = -491518579456, pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p = 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, pc_ldt = 0xffffffff808ad648, pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8} $4 = {pc_curthread = 0xffffff0012d858c0, pc_idlethread = 0xffffff0012d858c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff80001b2d00, pc_switchtime = 564139960864579, pc_switchticks = 247796549, pc_cpuid = 3, pc_cpumask = 8, pc_other_cpus = 16777207, pc_allcpu = {sle_next = 0xffffffff808af900}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
375825838, v_trap = 57311463, v_syscall = 571437816, v_intr = 126334, v_soft = 46300913, v_vm_faults = 6398769, v_cow_faults = 365115, v_cow_optim = 101, v_zfod = 5434860, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 6044, v_vnodeout = 16, v_vnodepgsin = 6044, v_vnodepgsout = 128, v_intrans = 5456, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 7824928, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 8975, v_vforks = 11796, v_rforks = 0, v_kthreads = 0, v_forkpages = 2359166, v_vforkpages = 2604538, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8374580, 378, 189751, 113208, 24278945}, pc_device = 0xffffff0012eee600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808afcd0, rmq_prev = 0xffffffff808afcd0}, pc_dynamic = 18446743526093355264, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808afb80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae838, pc_commontssp = 0xffffffff808ae838, pc_rsp0 = -549754032896, pc_scratch_rsp = 140737488341768, pc_apic_id = 3, pc_acpi_id = 14, pc_fs32p = 0xffffffff808ad668, pc_gs32p = 0xffffffff808ad670, pc_ldt = 0xffffffff808ad6b0, pc_tss = 0xffffffff808ad6a0, pc_cmci_mask = 36} $5 = {pc_curthread = 0xffffff0016ef7460, pc_idlethread = 0xffffff0012d7e000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8d249d00, pc_switchtime = 564139958831726, pc_switchticks = 247796548, pc_cpuid = 4, pc_cpumask = 16, pc_other_cpus = 16777199, pc_allcpu = {sle_next = 0xffffffff808afb80}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 806444301, v_trap = 81626382, v_syscall = 1826349511, v_intr = 123653, v_soft = 144961951, v_vm_faults = 9705936, 
v_cow_faults = 966760, v_cow_optim = 329, v_zfod = 7605338, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 7070, v_vnodeout = 38, v_vnodepgsin = 7070, v_vnodepgsout = 298, v_intrans = 6176, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 10505534, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 16806, v_vforks = 12551, v_rforks = 0, v_kthreads = 0, v_forkpages = 4380008, v_vforkpages = 2702450, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8548739, 486, 66166, 155291, 24186180}, pc_device = 0xffffff0012eee500, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808aff50, rmq_prev = 0xffffffff808aff50}, pc_dynamic = 18446743526093383936, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808afe00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae8a0, pc_commontssp = 0xffffffff808ae8a0, pc_rsp0 = -491553252096, pc_scratch_rsp = 140737488347240, pc_apic_id = 4, pc_acpi_id = 3, pc_fs32p = 0xffffffff808ad6d0, pc_gs32p = 0xffffffff808ad6d8, pc_ldt = 0xffffffff808ad718, pc_tss = 0xffffffff808ad708, pc_cmci_mask = 44} $6 = {pc_curthread = 0xffffff0016d40460, pc_idlethread = 0xffffff0012d7e460, pc_fpcurthread = 0xffffff0016d40460, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8d47ed00, pc_switchtime = 564139958865046, pc_switchticks = 247796548, pc_cpuid = 5, pc_cpumask = 32, pc_other_cpus = 16777183, pc_allcpu = {sle_next = 0xffffffff808afe00}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 328647871, v_trap = 51585678, v_syscall = 541729242, v_intr = 195389, v_soft = 45565082, v_vm_faults = 5629366, v_cow_faults = 317486, v_cow_optim = 82, v_zfod = 4813949, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4750, v_vnodeout = 16, v_vnodepgsin = 4750, v_vnodepgsout = 125, v_intrans = 4461, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 6982766, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7103, v_vforks = 10517, v_rforks = 0, v_kthreads = 0, v_forkpages = 1858863, v_vforkpages = 2375468, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8156742, 242, 143793, 754, 24655331}, pc_device = 0xffffff0012eee400, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b01d0, rmq_prev = 0xffffffff808b01d0}, pc_dynamic = 18446743526093412608, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0080, pc_curpmap = 0xffffff00285acd70, pc_tssp = 0xffffffff808ae908, pc_commontssp = 0xffffffff808ae908, pc_rsp0 = -491550937856, pc_scratch_rsp = 140737488338216, pc_apic_id = 5, pc_acpi_id = 15, pc_fs32p = 0xffffffff808ad738, pc_gs32p = 0xffffffff808ad740, pc_ldt = 0xffffffff808ad780, pc_tss = 0xffffffff808ad770, pc_cmci_mask = 44} $7 = {pc_curthread = 0xffffff0012d7e8c0, pc_idlethread = 0xffffff0012d7e8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff80001a3d00, pc_switchtime = 564139963916274, pc_switchticks = 247796550, pc_cpuid = 6, pc_cpumask = 64, pc_other_cpus = 16777151, pc_allcpu = {sle_next = 0xffffffff808b0080}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 571134015, v_trap = 71997786, v_syscall = 1463142320, v_intr = 279742, v_soft = 132911942, v_vm_faults = 7791389, v_cow_faults = 708630, v_cow_optim = 253, v_zfod = 6277796, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 6392, v_vnodeout = 39, v_vnodepgsin = 6392, v_vnodepgsout = 312, v_intrans 
= 5737, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 9292572, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 11420, v_vforks = 9776, v_rforks = 0, v_kthreads = 0, v_forkpages = 2973042, v_vforkpages = 2103188, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8387371, 350, 74084, 53153, 24441904}, pc_device = 0xffffff0012eee300, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b0450, rmq_prev = 0xffffffff808b0450}, pc_dynamic = 18446743526093441280, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0300, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae970, pc_commontssp = 0xffffffff808ae970, pc_rsp0 = -549754094336, pc_scratch_rsp = 140737488347240, pc_apic_id = 16, pc_acpi_id = 4, pc_fs32p = 0xffffffff808ad7a0, pc_gs32p = 0xffffffff808ad7a8, pc_ldt = 0xffffffff808ad7e8, pc_tss = 0xffffffff808ad7d8, pc_cmci_mask = 44} $8 = {pc_curthread = 0xffffff0012d7f000, pc_idlethread = 0xffffff0012d7f000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff800019ed00, pc_switchtime = 564139961406818, pc_switchticks = 247796549, pc_cpuid = 7, pc_cpumask = 128, pc_other_cpus = 16777087, pc_allcpu = {sle_next = 0xffffffff808b0300}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 249485946, v_trap = 42612704, v_syscall = 513323841, v_intr = 158985, v_soft = 49793772, v_vm_faults = 4953550, v_cow_faults = 288574, v_cow_optim = 66, v_zfod = 4279446, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5127, v_vnodeout = 18, v_vnodepgsin = 5127, v_vnodepgsout = 144, v_intrans = 4430, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, 
v_pfree = 0, v_tfree = 6781637, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7191, v_vforks = 8106, v_rforks = 0, v_kthreads = 0, v_forkpages = 1911571, v_vforkpages = 1791118, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7834102, 189, 67132, 7190, 25048249}, pc_device = 0xffffff0012eee200, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b06d0, rmq_prev = 0xffffffff808b06d0}, pc_dynamic = 18446743526093469952, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0580, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae9d8, pc_commontssp = 0xffffffff808ae9d8, pc_rsp0 = -549754114816, pc_scratch_rsp = 140737488341768, pc_apic_id = 17, pc_acpi_id = 16, pc_fs32p = 0xffffffff808ad808, pc_gs32p = 0xffffffff808ad810, pc_ldt = 0xffffffff808ad850, pc_tss = 0xffffffff808ad840, pc_cmci_mask = 44} $9 = {pc_curthread = 0xffffff0012d7f460, pc_idlethread = 0xffffff0012d7f460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000199d00, pc_switchtime = 564139961215887, pc_switchticks = 247796549, pc_cpuid = 8, pc_cpumask = 256, pc_other_cpus = 16776959, pc_allcpu = {sle_next = 0xffffffff808b0580}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 464956409, v_trap = 64334946, v_syscall = 1027059020, v_intr = 0, v_soft = 93052690, v_vm_faults = 6917455, v_cow_faults = 567595, v_cow_optim = 160, v_zfod = 5697686, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4817, v_vnodeout = 36, v_vnodepgsin = 4817, v_vnodepgsout = 285, v_intrans = 5954, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 7425675, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, 
v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9992, v_vforks = 8433, v_rforks = 0, v_kthreads = 0, v_forkpages = 2600081, v_vforkpages = 1793522, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7987187, 281, 218255, 10560, 24740579}, pc_device = 0xffffff0012eee100, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b0950, rmq_prev = 0xffffffff808b0950}, pc_dynamic = 18446743526093498624, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0800, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aea40, pc_commontssp = 0xffffffff808aea40, pc_rsp0 = -549754135296, pc_scratch_rsp = 140737488349480, pc_apic_id = 18, pc_acpi_id = 5, pc_fs32p = 0xffffffff808ad870, pc_gs32p = 0xffffffff808ad878, pc_ldt = 0xffffffff808ad8b8, pc_tss = 0xffffffff808ad8a8, pc_cmci_mask = 44} $10 = {pc_curthread = 0xffffff0012d7f8c0, pc_idlethread = 0xffffff0012d7f8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000194d00, pc_switchtime = 564139962352563, pc_switchticks = 247796549, pc_cpuid = 9, pc_cpumask = 512, pc_other_cpus = 16776703, pc_allcpu = {sle_next = 0xffffffff808b0800}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 274982887, v_trap = 31399978, v_syscall = 444601703, v_intr = 3206219, v_soft = 40294508, v_vm_faults = 4841563, v_cow_faults = 271228, v_cow_optim = 62, v_zfod = 4257729, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3551, v_vnodeout = 18, v_vnodepgsin = 3551, v_vnodepgsout = 144, v_intrans = 4426, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 5832072, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, 
v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 6130, v_vforks = 6921, v_rforks = 0, v_kthreads = 0, v_forkpages = 1593321, v_vforkpages = 1503680, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7446553, 129, 81200, 80535, 25348445}, pc_device = 0xffffff0012eee000, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b0bd0, rmq_prev = 0xffffffff808b0bd0}, pc_dynamic = 18446743526093527296, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0a80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aeaa8, pc_commontssp = 0xffffffff808aeaa8, pc_rsp0 = -549754155776, pc_scratch_rsp = 140737488350536, pc_apic_id = 19, pc_acpi_id = 17, pc_fs32p = 0xffffffff808ad8d8, pc_gs32p = 0xffffffff808ad8e0, pc_ldt = 0xffffffff808ad920, pc_tss = 0xffffffff808ad910, pc_cmci_mask = 44} $11 = {pc_curthread = 0xffffff0012d81000, pc_idlethread = 0xffffff0012d81000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff800018fd00, pc_switchtime = 564139960703995, pc_switchticks = 247796549, pc_cpuid = 10, pc_cpumask = 1024, pc_other_cpus = 16776191, pc_allcpu = {sle_next = 0xffffffff808b0a80}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 409239430, v_trap = 63190665, v_syscall = 916205775, v_intr = 0, v_soft = 84751808, v_vm_faults = 5931079, v_cow_faults = 475456, v_cow_optim = 101, v_zfod = 4924752, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3365, v_vnodeout = 31, v_vnodepgsin = 3365, v_vnodepgsout = 248, v_intrans = 5616, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 6799603, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7789, 
v_vforks = 7721, v_rforks = 0, v_kthreads = 0, v_forkpages = 2032846, v_vforkpages = 1672147, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7664964, 263, 457601, 10909, 24823125}, pc_device = 0xffffff0012e73e00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b0e50, rmq_prev = 0xffffffff808b0e50}, pc_dynamic = 18446743526093555968, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0d00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aeb10, pc_commontssp = 0xffffffff808aeb10, pc_rsp0 = -549754176256, pc_scratch_rsp = 140737488349208, pc_apic_id = 20, pc_acpi_id = 6, pc_fs32p = 0xffffffff808ad940, pc_gs32p = 0xffffffff808ad948, pc_ldt = 0xffffffff808ad988, pc_tss = 0xffffffff808ad978, pc_cmci_mask = 12} $12 = {pc_curthread = 0xffffff0012d7c000, pc_idlethread = 0xffffff0012d7c000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff800018ad00, pc_switchtime = 564139964659179, pc_switchticks = 247796550, pc_cpuid = 11, pc_cpumask = 2048, pc_other_cpus = 16775167, pc_allcpu = {sle_next = 0xffffffff808b0d00}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 286462305, v_trap = 43327710, v_syscall = 585895149, v_intr = 0, v_soft = 58132961, v_vm_faults = 4529044, v_cow_faults = 253158, v_cow_optim = 51, v_zfod = 3997600, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2889, v_vnodeout = 14, v_vnodepgsin = 2889, v_vnodepgsout = 112, v_intrans = 4397, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 5501089, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 5059, v_vforks = 7478, v_rforks = 0, v_kthreads = 0, v_forkpages = 1317908, v_vforkpages = 1695236, v_rforkpages = 0, 
v_kthreadpages = 0}, pc_cp_time = {6940693, 83, 34106, 26044, 25955936}, pc_device = 0xffffff0012e73d00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b10d0, rmq_prev = 0xffffffff808b10d0}, pc_dynamic = 18446743526093584640, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b0f80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aeb78, pc_commontssp = 0xffffffff808aeb78, pc_rsp0 = -549754196736, pc_scratch_rsp = 140737488341352, pc_apic_id = 21, pc_acpi_id = 18, pc_fs32p = 0xffffffff808ad9a8, pc_gs32p = 0xffffffff808ad9b0, pc_ldt = 0xffffffff808ad9f0, pc_tss = 0xffffffff808ad9e0, pc_cmci_mask = 32} $13 = {pc_curthread = 0xffffff0012d7c460, pc_idlethread = 0xffffff0012d7c460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000185d00, pc_switchtime = 564139964135679, pc_switchticks = 247796550, pc_cpuid = 12, pc_cpumask = 4096, pc_other_cpus = 16773119, pc_allcpu = {sle_next = 0xffffffff808b0f80}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1452228911, v_trap = 138992214, v_syscall = 3481775635, v_intr = 0, v_soft = 170473572, v_vm_faults = 25297505, v_cow_faults = 2415176, v_cow_optim = 725, v_zfod = 20124416, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 20013, v_vnodeout = 39, v_vnodepgsin = 20013, v_vnodepgsout = 303, v_intrans = 8910, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 26563945, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 32999, v_vforks = 20923, v_rforks = 0, v_kthreads = 0, v_forkpages = 8640201, v_vforkpages = 4383325, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {9099202, 1744, 94283, 697, 23760936}, pc_device = 0xffffff0012e73c00, 
pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b1350, rmq_prev = 0xffffffff808b1350}, pc_dynamic = 18446743526093613312, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b1200, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aebe0, pc_commontssp = 0xffffffff808aebe0, pc_rsp0 = -549754217216, pc_scratch_rsp = 140737488348728, pc_apic_id = 32, pc_acpi_id = 7, pc_fs32p = 0xffffffff808ada10, pc_gs32p = 0xffffffff808ada18, pc_ldt = 0xffffffff808ada58, pc_tss = 0xffffffff808ada48, pc_cmci_mask = 40} $14 = {pc_curthread = 0xffffff081149b460, pc_idlethread = 0xffffff0012d7c8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8e5a9d00, pc_switchtime = 564139964136603, pc_switchticks = 247796550, pc_cpuid = 13, pc_cpumask = 8192, pc_other_cpus = 16769023, pc_allcpu = {sle_next = 0xffffffff808b1200}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 476914812, v_trap = 74744115, v_syscall = 936915699, v_intr = 0, v_soft = 80021803, v_vm_faults = 7523961, v_cow_faults = 390299, v_cow_optim = 133, v_zfod = 6338882, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3069, v_vnodeout = 31, v_vnodepgsin = 3069, v_vnodepgsout = 242, v_intrans = 7411, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 8400243, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9846, v_vforks = 12050, v_rforks = 0, v_kthreads = 0, v_forkpages = 2609276, v_vforkpages = 2565142, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8441142, 622, 122914, 718, 24391466}, pc_device = 0xffffff0012e73b00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b15d0, rmq_prev = 0xffffffff808b15d0}, pc_dynamic = 
18446743526093641984, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b1480, pc_curpmap = 0xffffff017af0a440, pc_tssp = 0xffffffff808aec48, pc_commontssp = 0xffffffff808aec48, pc_rsp0 = -491532935936, pc_scratch_rsp = 140737488347000, pc_apic_id = 33, pc_acpi_id = 19, pc_fs32p = 0xffffffff808ada78, pc_gs32p = 0xffffffff808ada80, pc_ldt = 0xffffffff808adac0, pc_tss = 0xffffffff808adab0, pc_cmci_mask = 44} $15 = {pc_curthread = 0xffffff0012d7d000, pc_idlethread = 0xffffff0012d7d000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff800017bd00, pc_switchtime = 564139961068041, pc_switchticks = 247796549, pc_cpuid = 14, pc_cpumask = 16384, pc_other_cpus = 16760831, pc_allcpu = {sle_next = 0xffffffff808b1480}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1106841263, v_trap = 97803361, v_syscall = 2661991003, v_intr = 0, v_soft = 167213229, v_vm_faults = 11378004, v_cow_faults = 1160991, v_cow_optim = 428, v_zfod = 8894351, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 12399, v_vnodeout = 42, v_vnodepgsin = 12399, v_vnodepgsout = 333, v_intrans = 8773, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 11845093, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 18853, v_vforks = 12152, v_rforks = 0, v_kthreads = 0, v_forkpages = 4923904, v_vforkpages = 2572782, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8264202, 687, 502611, 825769, 23363593}, pc_device = 0xffffff0012e73a00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b1850, rmq_prev = 0xffffffff808b1850}, pc_dynamic = 18446743526093670656, pc_monitorbuf = '\0' , pc_prvspace = 
0xffffffff808b1700, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aecb0, pc_commontssp = 0xffffffff808aecb0, pc_rsp0 = -549754258176, pc_scratch_rsp = 140737488332648, pc_apic_id = 34, pc_acpi_id = 8, pc_fs32p = 0xffffffff808adae0, pc_gs32p = 0xffffffff808adae8, pc_ldt = 0xffffffff808adb28, pc_tss = 0xffffffff808adb18, pc_cmci_mask = 296} $16 = {pc_curthread = 0xffffff0012d7d460, pc_idlethread = 0xffffff0012d7d460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000176d00, pc_switchtime = 564139964473317, pc_switchticks = 247796550, pc_cpuid = 15, pc_cpumask = 32768, pc_other_cpus = 16744447, pc_allcpu = {sle_next = 0xffffffff808b1700}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 403303982, v_trap = 62431250, v_syscall = 772577344, v_intr = 0, v_soft = 66350085, v_vm_faults = 6382252, v_cow_faults = 350483, v_cow_optim = 113, v_zfod = 5469113, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4512, v_vnodeout = 25, v_vnodepgsin = 4512, v_vnodepgsout = 190, v_intrans = 7276, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 7254252, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9383, v_vforks = 9353, v_rforks = 0, v_kthreads = 0, v_forkpages = 2458231, v_vforkpages = 2018955, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8065725, 424, 321250, 606, 24568857}, pc_device = 0xffffff0012e73900, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b1ad0, rmq_prev = 0xffffffff808b1ad0}, pc_dynamic = 18446743526093699328, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b1980, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aed18, pc_commontssp = 0xffffffff808aed18, 
pc_rsp0 = -549754278656, pc_scratch_rsp = 140737488341304, pc_apic_id = 35, pc_acpi_id = 20, pc_fs32p = 0xffffffff808adb48, pc_gs32p = 0xffffffff808adb50, pc_ldt = 0xffffffff808adb90, pc_tss = 0xffffffff808adb80, pc_cmci_mask = 68} $17 = {pc_curthread = 0xffffff0012d7d8c0, pc_idlethread = 0xffffff0012d7d8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000171d00, pc_switchtime = 564139964351635, pc_switchticks = 247796550, pc_cpuid = 16, pc_cpumask = 65536, pc_other_cpus = 16711679, pc_allcpu = {sle_next = 0xffffffff808b1980}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 877084980, v_trap = 88142770, v_syscall = 2088145984, v_intr = 0, v_soft = 177372820, v_vm_faults = 7362548, v_cow_faults = 753161, v_cow_optim = 299, v_zfod = 5798775, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3826, v_vnodeout = 21, v_vnodepgsin = 3826, v_vnodepgsout = 168, v_intrans = 8678, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 8584916, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 11478, v_vforks = 9188, v_rforks = 0, v_kthreads = 0, v_forkpages = 2993442, v_vforkpages = 1934018, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {9852924, 500, 2875093, 71501, 20156844}, pc_device = 0xffffff0012e73800, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b1d50, rmq_prev = 0xffffffff808b1d50}, pc_dynamic = 18446743526093728000, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b1c00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aed80, pc_commontssp = 0xffffffff808aed80, pc_rsp0 = -549754299136, pc_scratch_rsp = 140737488347240, pc_apic_id = 36, pc_acpi_id = 9, pc_fs32p = 
0xffffffff808adbb0, pc_gs32p = 0xffffffff808adbb8, pc_ldt = 0xffffffff808adbf8, pc_tss = 0xffffffff808adbe8, pc_cmci_mask = 8} $18 = {pc_curthread = 0xffffff0016b94000, pc_idlethread = 0xffffff0012d71460, pc_fpcurthread = 0xffffff0016b94000, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8d389d00, pc_switchtime = 564139958856111, pc_switchticks = 247796548, pc_cpuid = 17, pc_cpumask = 131072, pc_other_cpus = 16646143, pc_allcpu = {sle_next = 0xffffffff808b1c00}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 381936054, v_trap = 50211651, v_syscall = 515466461, v_intr = 0, v_soft = 45237711, v_vm_faults = 6414094, v_cow_faults = 389134, v_cow_optim = 137, v_zfod = 5507950, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5530, v_vnodeout = 53, v_vnodepgsin = 5530, v_vnodepgsout = 424, v_intrans = 6725, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 7461119, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 10518, v_vforks = 8957, v_rforks = 0, v_kthreads = 0, v_forkpages = 2736040, v_vforkpages = 1935285, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7863823, 191, 438619, 69173, 24585056}, pc_device = 0xffffff0012e73700, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b1fd0, rmq_prev = 0xffffffff808b1fd0}, pc_dynamic = 18446743526093756672, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b1e80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aede8, pc_commontssp = 0xffffffff808aede8, pc_rsp0 = -491551941376, pc_scratch_rsp = 140737488341208, pc_apic_id = 37, pc_acpi_id = 21, pc_fs32p = 0xffffffff808adc18, pc_gs32p = 0xffffffff808adc20, pc_ldt = 0xffffffff808adc60, pc_tss = 
0xffffffff808adc50, pc_cmci_mask = 36} $19 = {pc_curthread = 0xffffff0012d718c0, pc_idlethread = 0xffffff0012d718c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000167d00, pc_switchtime = 564139961112754, pc_switchticks = 247796549, pc_cpuid = 18, pc_cpumask = 262144, pc_other_cpus = 16515071, pc_allcpu = {sle_next = 0xffffffff808b1e80}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 766728401, v_trap = 80727218, v_syscall = 2226067386, v_intr = 0, v_soft = 164836456, v_vm_faults = 7049688, v_cow_faults = 590626, v_cow_optim = 179, v_zfod = 5751647, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5189, v_vnodeout = 18, v_vnodepgsin = 5189, v_vnodepgsout = 141, v_intrans = 8491, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 8391955, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9985, v_vforks = 8855, v_rforks = 0, v_kthreads = 0, v_forkpages = 2607970, v_vforkpages = 1888180, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8073713, 395, 119748, 2812, 24760194}, pc_device = 0xffffff0012e73600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b2250, rmq_prev = 0xffffffff808b2250}, pc_dynamic = 18446743526093785344, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b2100, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aee50, pc_commontssp = 0xffffffff808aee50, pc_rsp0 = -549754340096, pc_scratch_rsp = 140737488340200, pc_apic_id = 48, pc_acpi_id = 10, pc_fs32p = 0xffffffff808adc80, pc_gs32p = 0xffffffff808adc88, pc_ldt = 0xffffffff808adcc8, pc_tss = 0xffffffff808adcb8, pc_cmci_mask = 44} $20 = {pc_curthread = 0xffffff0016b95000, pc_idlethread = 0xffffff0012d7b000, 
pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8d37ad00, pc_switchtime = 564139964352202, pc_switchticks = 247796550, pc_cpuid = 19, pc_cpumask = 524288, pc_other_cpus = 16252927, pc_allcpu = {sle_next = 0xffffffff808b2100}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 273765031, v_trap = 40409221, v_syscall = 606284542, v_intr = 0, v_soft = 60032824, v_vm_faults = 4751488, v_cow_faults = 263767, v_cow_optim = 85, v_zfod = 4135690, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4928, v_vnodeout = 17, v_vnodepgsin = 4928, v_vnodepgsout = 130, v_intrans = 6643, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 6264634, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7126, v_vforks = 6926, v_rforks = 0, v_kthreads = 0, v_forkpages = 1865103, v_vforkpages = 1510438, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7556063, 190, 97828, 19877, 25282904}, pc_device = 0xffffff0012eef700, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b24d0, rmq_prev = 0xffffffff808b24d0}, pc_dynamic = 18446743526093814016, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b2380, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aeeb8, pc_commontssp = 0xffffffff808aeeb8, pc_rsp0 = -491552002816, pc_scratch_rsp = 140737488347448, pc_apic_id = 49, pc_acpi_id = 22, pc_fs32p = 0xffffffff808adce8, pc_gs32p = 0xffffffff808adcf0, pc_ldt = 0xffffffff808add30, pc_tss = 0xffffffff808add20, pc_cmci_mask = 44} $21 = {pc_curthread = 0xffffff0012d7b460, pc_idlethread = 0xffffff0012d7b460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff800015dd00, pc_switchtime = 564139966988296, pc_switchticks 
= 247796551, pc_cpuid = 20, pc_cpumask = 1048576, pc_other_cpus = 15728639, pc_allcpu = {sle_next = 0xffffffff808b2380}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 650673237, v_trap = 75330345, v_syscall = 1869508277, v_intr = 0, v_soft = 159703865, v_vm_faults = 6100783, v_cow_faults = 476018, v_cow_optim = 144, v_zfod = 5037626, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2676, v_vnodeout = 30, v_vnodepgsin = 2676, v_vnodepgsout = 240, v_intrans = 8171, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 7518937, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7717, v_vforks = 7894, v_rforks = 0, v_kthreads = 0, v_forkpages = 2012738, v_vforkpages = 1684979, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7882266, 330, 49708, 171, 25024387}, pc_device = 0xffffff0012eef600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b2750, rmq_prev = 0xffffffff808b2750}, pc_dynamic = 18446743526093842688, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b2600, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aef20, pc_commontssp = 0xffffffff808aef20, pc_rsp0 = -549754381056, pc_scratch_rsp = 140737488347000, pc_apic_id = 50, pc_acpi_id = 11, pc_fs32p = 0xffffffff808add50, pc_gs32p = 0xffffffff808add58, pc_ldt = 0xffffffff808add98, pc_tss = 0xffffffff808add88, pc_cmci_mask = 44} $22 = {pc_curthread = 0xffffff0012d7b8c0, pc_idlethread = 0xffffff0012d7b8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000158d00, pc_switchtime = 564139962266028, pc_switchticks = 247796549, pc_cpuid = 21, pc_cpumask = 2097152, pc_other_cpus = 14680063, 
pc_allcpu = {sle_next = 0xffffffff808b2600}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 241921442, v_trap = 27867529, v_syscall = 396426755, v_intr = 0, v_soft = 36203414, v_vm_faults = 4579069, v_cow_faults = 243971, v_cow_optim = 80, v_zfod = 4045607, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2750, v_vnodeout = 13, v_vnodepgsin = 2750, v_vnodepgsout = 104, v_intrans = 6546, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 6134243, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 5441, v_vforks = 7475, v_rforks = 0, v_kthreads = 0, v_forkpages = 1420342, v_vforkpages = 1634961, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7064507, 140, 130489, 35805, 25725921}, pc_device = 0xffffff0012eef500, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b29d0, rmq_prev = 0xffffffff808b29d0}, pc_dynamic = 18446743526093871360, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b2880, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aef88, pc_commontssp = 0xffffffff808aef88, pc_rsp0 = -549754401536, pc_scratch_rsp = 140737488349080, pc_apic_id = 51, pc_acpi_id = 23, pc_fs32p = 0xffffffff808addb8, pc_gs32p = 0xffffffff808addc0, pc_ldt = 0xffffffff808ade00, pc_tss = 0xffffffff808addf0, pc_cmci_mask = 44} $23 = {pc_curthread = 0xffffff0012d70000, pc_idlethread = 0xffffff0012d70000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8000153d00, pc_switchtime = 564139963254342, pc_switchticks = 247796550, pc_cpuid = 22, pc_cpumask = 4194304, pc_other_cpus = 12582911, pc_allcpu = {sle_next = 0xffffffff808b2880}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 561077805, v_trap = 70503560, 
v_syscall = 1547897389, v_intr = 0, v_soft = 145263466, v_vm_faults = 5402089, v_cow_faults = 373167, v_cow_optim = 88, v_zfod = 4593818, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3032, v_vnodeout = 16, v_vnodepgsin = 3032, v_vnodepgsout = 128, v_intrans = 7615, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 6367193, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 5531, v_vforks = 6954, v_rforks = 0, v_kthreads = 0, v_forkpages = 1440742, v_vforkpages = 1477372, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7664569, 311, 54855, 173, 25236954}, pc_device = 0xffffff0012eef400, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b2c50, rmq_prev = 0xffffffff808b2c50}, pc_dynamic = 18446743526093900032, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b2b00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808aeff0, pc_commontssp = 0xffffffff808aeff0, pc_rsp0 = -549754422016, pc_scratch_rsp = 140737488347256, pc_apic_id = 52, pc_acpi_id = 12, pc_fs32p = 0xffffffff808ade20, pc_gs32p = 0xffffffff808ade28, pc_ldt = 0xffffffff808ade68, pc_tss = 0xffffffff808ade58, pc_cmci_mask = 44} $24 = {pc_curthread = 0xffffff0012d70460, pc_idlethread = 0xffffff0012d70460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff800014ed00, pc_switchtime = 564139962421094, pc_switchticks = 247796549, pc_cpuid = 23, pc_cpumask = 8388608, pc_other_cpus = 8388607, pc_allcpu = {sle_next = 0xffffffff808b2b00}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 206202024, v_trap = 21901993, v_syscall = 456089216, v_intr = 0, v_soft = 44500078, v_vm_faults = 4394323, v_cow_faults = 229085, v_cow_optim = 71, 
v_zfod = 3915765, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2733, v_vnodeout = 5, v_vnodepgsin = 2733, v_vnodepgsout = 40, v_intrans = 6360, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 5866068, v_page_size = 0, v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 4299, v_vforks = 5814, v_rforks = 0, v_kthreads = 0, v_forkpages = 1116482, v_vforkpages = 1250752, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7132190, 176, 32115, 131, 25792250}, pc_device = 0xffffff0012eef300, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b2ed0, rmq_prev = 0xffffffff808b2ed0}, pc_dynamic = 18446743526093928704, pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808b2d80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808af058, pc_commontssp = 0xffffffff808af058, pc_rsp0 = -549754442496, pc_scratch_rsp = 140737488347256, pc_apic_id = 53, pc_acpi_id = 24, pc_fs32p = 0xffffffff808ade88, pc_gs32p = 0xffffffff808ade90, pc_ldt = 0xffffffff808aded0, pc_tss = 0xffffffff808adec0, pc_cmci_mask = 44} > A little bit later I will send you another patch that, I hope, will produce better > diagnostics for this crash (without DDB in kernel). Kernel with the patch is now installed on the test machine. I've taken DDB, INVARIANTS and STACK changes out for now. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 12:56:35 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B8075106566B for ; Wed, 17 Aug 2011 12:56:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DF6458FC18 for ; Wed, 17 Aug 2011 12:56:34 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA24369; Wed, 17 Aug 2011 15:56:32 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4BBA7F.30907@FreeBSD.org> Date: Wed, 17 Aug 2011 15:56:31 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> In-Reply-To: <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 12:56:35 -0000

on 17/08/2011 15:15 Steven Hartland said the following:
>> define allpcpu
>> set $i = 0
>> while ($i <= mp_maxid)
>> p *cpuid_to_pcpu[$i]
>> set $i = $i + 1
>> end
>> end
>> allpcpu
>
> Here's the output.

[snip]

> $3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460,
> pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8f35ad00,
> pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2,
> pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next =
> 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap =
> 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308,
> v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod =
> 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout
> = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238,
> v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0,
> v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380,
> v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
> v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0,
> v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count =
> 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
> v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857,
> v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842,
> v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094,
> 693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0,
> pc_rm_queue = {rmq_next = 0xffffffff808afa50, rmq_prev = 0xffffffff808afa50},
> pc_dynamic = 18446743526093326592,
> pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808af900,
> pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae7d0, pc_commontssp =
> 0xffffffff808ae7d0, pc_rsp0 = -491518579456,
> pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p =
> 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, pc_ldt = 0xffffffff808ad648,
> pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8}

[snip]

Thank you.
A few more questions:
1. more kgdb info for the core:
p *(cpuid_to_pcpu[2]->pc_curthread)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit)
2. do you have any additional patches in your source tree besides those debugging patches that I provided to you?
3. do you have any thirdparty/out-of-tree kernel modules?
4. could you please send me your kernel config?

--
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 13:54:42 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ECED51065670 for ; Wed, 17 Aug 2011 13:54:42 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id A92CE8FC0A for ; Wed, 17 Aug 2011 13:54:42 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id DEEFA28429 for ; Wed, 17 Aug 2011 15:35:16 +0200 (CEST) Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id 0DFED28424 for ; Wed, 17 Aug 2011 15:35:10 +0200 (CEST) Message-ID: <4E4BC38D.1050808@quip.cz> Date: Wed, 17 Aug 2011 15:35:09 +0200 From:
Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Subject: can not boot from RAIDZ with 8-STABLE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 13:54:43 -0000

I tried an mfsBSD installation on a Dell T110 with a PERC H200A and 4x 500GB SATA disks.

If I create a zpool with RAIDZ, the boot immediately hangs with the following error:

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS
ZFS: unexpected object set type 0
ZFS: unexpected object set type 0

FreeBSD/x86 boot
Default: tank0:/boot/kernel/kernel
boot:
ZFS: unexpected object set type 0

FreeBSD/x86 boot
Default: tank0:/boot/kernel/kernel
boot:

The system is FreeBSD 8.2-STABLE #0: Sat Aug 13 20:33:31 CEST 2011 GENERIC amd64
Built from sources from Aug 13 2011.

An identical system boots fine from an external (USB) drive, and I can use the data on the RAIDZ zpool tank0 without any problems. So the pool and disks are fine; only the boot fails.

Disks (da0 - da3) are using GPT:

=>        34  976773101  da0  GPT  (465G)
          34        128    1  freebsd-boot  (64k)
         162    8388608    2  freebsd-swap  (4.0G)
     8388770  964689920    3  freebsd-zfs  (460G)
   973078690    3694445       - free -  (1.8G)

I also tried to create the pool manually instead of using the script from mfsBSD, but the result is the same.
This was my manual method:

gpart create -s GPT da0
gpart add -b 34 -s 128 -t freebsd-boot da0
gpart add -s 4g -t freebsd-swap -l swap0 da0
gpart add -s 460g -t freebsd-zfs -l tank0 da0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

gpart create -s GPT da1
gpart add -b 34 -s 128 -t freebsd-boot da1
gpart add -s 4g -t freebsd-swap -l swap1 da1
gpart add -s 460g -t freebsd-zfs -l tank1 da1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1

gpart create -s GPT da2
gpart add -b 34 -s 128 -t freebsd-boot da2
gpart add -s 4g -t freebsd-swap -l swap2 da2
gpart add -s 460g -t freebsd-zfs -l tank2 da2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2

gpart create -s GPT da3
gpart add -b 34 -s 128 -t freebsd-boot da3
gpart add -s 4g -t freebsd-swap -l swap3 da3
gpart add -s 460g -t freebsd-zfs -l tank3 da3
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3

gmirror label -F -h -b load gmswap0 /dev/gpt/{swap0,swap1,swap2,swap3}

zpool create -O mountpoint=/mnt -O atime=off -O setuid=off -O canmount=off tank0 raidz /dev/gpt/tank0 /dev/gpt/tank1 /dev/gpt/tank2 /dev/gpt/tank3

zfs create -o mountpoint=legacy -o setuid=on tank0/root
zpool set bootfs=tank0/root tank0

(...then zfs create for about 10 filesystems according to http://blogs.freebsdish.org/pjd/2010/08/06/from-sysinstall-to-zfs-only-configuration/ )

zfs set mountpoint=/ system

(...then rsync data from external USB disk with working system...)

And after reboot, the same error as above.

Does anybody have any suggestions?

Miroslav Lachman

PS: I can't try 8.2-RELEASE, because it has no support for the PERC H200A; that support was committed after the release.
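
The per-disk gpart steps in the manual method above differ only in the disk index, so they could be generated by a loop. What follows is a minimal dry-run sketch that only prints the commands; the function names and the DISKS variable are hypothetical and not part of the original message, and the sizes and labels are copied from the commands above.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the per-disk gpart commands.
# gpart_cmds, all_gpart_cmds and DISKS are hypothetical names for illustration.

# Print the five gpart commands for disk index $1
# (device da$1, labels swap$1 and tank$1).
gpart_cmds() {
    n=$1
    echo "gpart create -s GPT da$n"
    echo "gpart add -b 34 -s 128 -t freebsd-boot da$n"
    echo "gpart add -s 4g -t freebsd-swap -l swap$n da$n"
    echo "gpart add -s 460g -t freebsd-zfs -l tank$n da$n"
    echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da$n"
}

# Print the commands for every index in DISKS (default: 0 1 2 3).
all_gpart_cmds() {
    for idx in ${DISKS:-0 1 2 3}; do
        gpart_cmds "$idx"
    done
}

all_gpart_cmds
```

Printing rather than executing keeps the sketch harmless on a machine with live disks; the output can be reviewed first and, if it matches the intended layout, fed to a root shell.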
From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 13:56:12 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1842A1065676; Wed, 17 Aug 2011 13:56:12 +0000 (UTC) (envelope-from prvs=1210f20b9f=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 51A868FC20; Wed, 17 Aug 2011 13:56:10 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 14:55:32 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 14:55:32 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014634579.msg; Wed, 17 Aug 2011 14:55:31 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> 
<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> Date: Wed, 17 Aug 2011 14:56:10 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 13:56:12 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Wednesday, August 17, 2011 1:56 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > on 17/08/2011 15:15 Steven Hartland said the following: >>> define allpcpu >>> set $i = 0 >>> while ($i <= mp_maxid) >>> p *cpuid_to_pcpu[$i] >>> set $i = $i + 1 >>> end >>> end >>> allpcpu >> >> Here's the output. 
> [snip] >> $3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460, >> pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8f35ad00, >> pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2, >> pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next = >> 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap = >> 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308, >> v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod = >> 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout >> = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238, >> v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0, >> v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380, >> v_page_size = 0, v_page_count = 0, v_free_reserved = 0, >> v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, >> v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = >> 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, >> v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857, >> v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842, >> v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094, >> 693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0, >> pc_rm_queue = {rmq_next = 0xffffffff808afa50, rmq_prev = 0xffffffff808afa50}, >> pc_dynamic = 18446743526093326592, >> pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808af900, >> pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae7d0, pc_commontssp = >> 0xffffffff808ae7d0, pc_rsp0 = -491518579456, >> pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p = >> 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, pc_ldt = 0xffffffff808ad648, >> pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8} > [snip] > > Thank you. 
> A few more questions: > 1. more kgdb info for the core: > p *(cpuid_to_pcpu[2]->pc_curthread) > p *(cpuid_to_pcpu[2]->pc_curthread->td_proc) > p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit) > (kgdb) p *(cpuid_to_pcpu[2]->pc_curthread) $1 = {td_lock = 0xffffffff8084a440, td_proc = 0xffffff070b5a48c0, td_plist = {tqe_next = 0x0, tqe_prev = 0xffffff070b5a48d0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff8084a688}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xffffff0296460900}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xffffff8d8fb5c8b0}, td_cpuset = 0xffffff0012d65dc8, td_sel = 0xffffff0a1b76c700, td_sleepqueue = 0xffffff0296460900, td_turnstile = 0xffffff05f31d8000, td_umtxq = 0xffffff05513d9780, td_tid = 102057, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xffffff06b7f9c0a0}, sq_proc = 0xffffff070b5a48c0, sq_flags = 1}, td_flags = 6, td_inhibitors = 0, td_pflags = 0, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 2 '\002', td_oncpu = 2 '\002', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 998, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 1, td_ucred = 0xffffff0551cf9900, td_estcpu = 0, td_slptick = 0, td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 2068, ru_ixrss = 5280, ru_idrss = 19296, ru_isrss = 6144, ru_minflt = 5015, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 241, ru_msgrcv = 2076, ru_nsignals = 1, ru_nvcsw = 2264, ru_nivcsw = 159}, td_incruntime = 4257692, td_runtime = 487523210, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 4, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {16384, 0, 0, 0}}, td_generation = 2423, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 
0, td_profil_addr = 0, td_profil_ticks = 0, td_name = "httpd", '\0' , td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = { ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', td_pri_class = 3 '\003', td_user_pri = 128 '\200', td_base_user_pri = 128 '\200', td_pcb = 0xffffff8d8f35ad00, td_state = TDS_RUNNING, td_retval = {0, 8}, td_slpcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xffffff800088ce00}}, c_time = 247622368, c_arg = 0xffffff06b7f9c000, c_func = 0xffffffff803c4bd0 , c_lock = 0x0, c_flags = 16, c_cpu = 13}, td_frame = 0xffffff8d8f35ac40, td_kstack_obj = 0xffffff0a51ee5e58, td_kstack = 18446743582190956544, td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, td_unused3 = 0, td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70}, td_sched = 0xffffff06b7f9c428, td_ar = 0x0, td_syscalls = 129862, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = {rux_runtime = 483265518, rux_uticks = 7, rux_sticks = 17, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, td_map_def_user = 0x0} (kgdb) p *(cpuid_to_pcpu[2]->pc_curthread->td_proc) $2 = {p_list = {le_next = 0xffffff0653ff78c0, le_prev = 0xffffffff80841b48}, p_threads = {tqh_first = 0xffffff06b7f9c000, tqh_last = 0xffffff06b7f9c010}, p_slock = {lock_object = {lo_name = 0xffffffff806323c0 "process slock", lo_flags = 720896, lo_data 
= 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xffffff0551cf9900, p_fd = 0x0, p_fdtol = 0x0, p_stats = 0xffffff04ea565600, p_limit = 0x0, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = { tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xffffff070b5a49b8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xffffff0a663a1000, p_flag = 268443904, p_state = PRS_NORMAL, p_pid = 78097, p_hash = {le_next = 0x0, le_prev = 0xffffff800021c888}, p_pglist = {le_next = 0xffffff00285c5460, le_prev = 0xffffff0afa9b8988}, p_pptr = 0xffffff0afa9b88c0, p_sibling = {le_next = 0xffffff00285c5460, le_prev = 0xffffff0afa9b89b0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xffffffff806323b3 "process lock", lo_flags = 21168128, lo_data = 10, lo_witness = 0x0}, mtx_lock = 18446743003054325761}, p_ksi = 0xffffff0016738bd0, p_sigqueue = {sq_signals = {__bits = {16384, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0xffffff033829d070, tqh_last = 0xffffff033829d070}, sq_proc = 0xffffff070b5a48c0, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xffffffff8083e920, p_swtick = 89392056, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, p_rux = {rux_runtime = 483265518, rux_uticks = 7, rux_sticks = 17, rux_iticks = 0, rux_uu = 61934, rux_su = 150412, rux_tu = 212347}, p_crux = {rux_runtime = 80058539464, rux_uticks = 2914, rux_sticks = 1778, rux_iticks = 0, rux_uu = 21847439, rux_su = 13330387, rux_tu = 35177827}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0x0, p_lock = 11, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, 
p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0xffffff06b7f9c000, p_boundary_count = 0, p_pendingcnt = 1, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 802000, p_comm = "httpd", '\0' , p_pgrp = 0xffffff05f3928080, p_sysent = 0xffffffff807fe180, p_args = 0xffffff0a8ad5e600, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff803586e0 , kl_unlock = 0xffffffff803586b0 , kl_assert_locked = 0xffffffff80355380 , kl_assert_unlocked = 0xffffffff80355390 , kl_lockarg = 0xffffff070b5a49b8}, p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 1, p_peers = 0x0, p_leader = 0xffffff070b5a48c0, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xffffff070b5a4d20, p_ktr = {stqh_first = 0x0, stqh_last = 0xffffff070b5a4cf0}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0, p_pwait = {cv_description = 0xffffffff80632b87 "ppwait", cv_waiters = 0}} (kgdb) p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit) Cannot access memory at address 0x0 > 2. do you have any additional patches in your source tree besides those debugging > patches that I provided to you? Yes, in this build we have:- 1. tcp_reass.c-logdebug+missingsegment-20110811-lstewart.patch (fixes tcp stalling) http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass.c-logdebug%2bmissingsegment-20110811-lstewart.diff 2. libz.patch (disables assembly optimisations in libz as it causes application crashes) 3. 
udp6_usrreq.c.patch (fixes ipv4 on ipv6 sockets)
   http://svnweb.freebsd.org/base/head/sys/netinet6/udp6_usrreq.c?r1=220463&r2=220462&pathrev=220463
4. cam-timeout-fix.patch (fixes overflow in cam timeouts)
   http://codelabs.ru/fbsd/patches/cam/CAM-properly-convert-timeout-to-ticks.diff
5. ixgbe.c.patch & ixgbe.h.patch (fixes ipconfig disconnecting link)
6. stop_scheduler_on_panic.8.x.patch (your first patch)
7. panic-info.patch (your second patch)

The only patches of these present when we initially noticed the problem were #2, #3 & #5 (but these machines are not using this driver).

> 3. do you have any thirdparty/out-of-tree kernel modules?

Nope, our kernel is compiled with a load of drivers disabled and then the following:-

device ahci
makeoptions MODULES_OVERRIDE="linux linprocfs acpi nullfs unionfs accf_http if_lagg opensolaris zfs ipmi i2c"
options COMPAT_LINUX32
options DEVICE_POLLING

N.B. although device polling is compiled in, it's not used on any of these machines.

> 4. could you please send me your kernel config?

See direct email, as not sure it will go to the list.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 14:14:39 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8108B106564A for ; Wed, 17 Aug 2011 14:14:39 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id EA6C58FC17 for ; Wed, 17 Aug 2011 14:14:38 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7HEERj2095723 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 17 Aug 2011 17:14:33 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4E4BCCC3.60601@digsys.bg> Date: Wed, 17 Aug 2011 17:14:27 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110720 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-stable@freebsd.org References: <4E4BC38D.1050808@quip.cz> In-Reply-To: <4E4BC38D.1050808@quip.cz> Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: can not boot from RAIDZ with 8-STABLE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 14:14:39 -0000

On 17.08.11 16:35, Miroslav Lachman wrote:
> I tried mfsBSD installation on Dell T110 with PERC H200A and 4x 500GB
> SATA disks. If I create zpool with RAIDZ, the boot immediately hangs
> with following error:
>

Maybe the BIOS does not see all drives at boot?

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 17:39:02 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6A7421065672; Wed, 17 Aug 2011 17:39:02 +0000 (UTC) (envelope-from hrs@FreeBSD.org) Received: from mail.allbsd.org (gatekeeper-int.allbsd.org [IPv6:2001:2f0:104:e002::2]) by mx1.freebsd.org (Postfix) with ESMTP id E10CB8FC0A; Wed, 17 Aug 2011 17:39:01 +0000 (UTC) Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp [125.175.94.28]) (authenticated bits=128) by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HHcdTb013376; Thu, 18 Aug 2011 02:38:49 +0900 (JST) (envelope-from hrs@FreeBSD.org) Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0) by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HHcaNH039802; Thu, 18 Aug 2011 02:38:38 +0900 (JST) (envelope-from hrs@FreeBSD.org) Date: Thu, 18 Aug 2011 02:38:32 +0900 (JST) Message-Id: <20110818.023832.373949045518579359.hrs@allbsd.org> To: mike@sentex.net From: Hiroki Sato In-Reply-To: <4E15A08C.6090407@sentex.net> References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530 FFD7 4F2C D3D8 2793 CF2D X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Multipart/Signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="--Security_Multipart(Thu_Aug_18_02_38_32_2011_300)--" Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org X-Virus-Status: Clean X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3 (mail.allbsd.org [133.31.130.32]); Thu, 18 Aug 2011 02:38:54 +0900 (JST) X-Spam-Status: No, score=-102.6 required=13.0 tests=BAYES_00, CONTENT_TYPE_PRESENT,DIRECTOCNDYN,RCVD_IN_RP_RNBL,SPF_SOFTFAIL, USER_IN_WHITELIST autolearn=no 
version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on gatekeeper.allbsd.org Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 17:39:02 -0000 ----Security_Multipart(Thu_Aug_18_02_38_32_2011_300)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, Mike Tancsa wrote in <4E15A08C.6090407@sentex.net>: mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote: mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote: mi> >> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock mi> >> is the sched lock N, on busy 8-core box recently upgraded to the mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace mi> >> for the owner thread was not available. mi> >> mi> >> I was unable to make any conclusion from the data that was present. mi> >> If the situation is reproducable, you coulld try to revert r221937. This mi> >> is pure speculation, though. mi> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937 mi> > unless there is any extra debugging you want me to add to the kernel mi> > instead ? I am also suffering from a reproducible panic on an 8-STABLE box, an NFS server with heavy I/O load. I could not get a kernel dump because this panic locked up the machine just after it occurred, but according to the stack trace it was the same as posted one. Switching to an 8.2R kernel can prevent this panic. Any progress on the investigation? 
-- spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long panic: spin lock held too long cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 panic() at panic+0x187 _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39 _mtx_lock_spin() at _mtx_lock_spin+0x9e sched_add() at sched_add+0x117 setrunnable() at setrunnable+0x78 sleepq_signal() at sleepq_signal+0x7a cv_signal() at cv_signal+0x3b xprt_active() at xprt_active+0xe3 svc_vc_soupcall() at svc_vc_soupcall+0xc sowakeup() at sowakeup+0x69 tcp_do_segment() at tcp_do_segment+0x25e7 tcp_input() at tcp_input+0xcdd ip_input() at ip_input+0xac netisr_dispatch_src() at netisr_dispatch_src+0x7e ether_demux() at ether_demux+0x14d ether_input() at ether_input+0x17d em_rxeof() at em_rxeof+0x1ca em_handle_que() at em_handle_que+0x5b taskqueue_run_locked() at taskqueue_run_locked+0x85 taskqueue_thread_loop() at taskqueue_thread_loop+0x4e fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe -- -- Hiroki ----Security_Multipart(Thu_Aug_18_02_38_32_2011_300)-- Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEABECAAYFAk5L/JgACgkQTyzT2CeTzy3bGgCgtnOCsXiCHd7Ghg5RReen9Q4/ FU4AoKIlZkp/sSlduoEme4rspSG7ZQWR =8Yer -----END PGP SIGNATURE----- ----Security_Multipart(Thu_Aug_18_02_38_32_2011_300)---- From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 17:52:07 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50E451065670 for ; Wed, 17 Aug 2011 17:52:07 +0000 (UTC) (envelope-from sterling@camdensoftware.com) Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com [174.123.46.202]) by mx1.freebsd.org (Postfix) with ESMTP id 1833E8FC1F for ; Wed, 17 Aug 2011 17:52:06 +0000 (UTC) Received: 
from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203] helo=_HOSTNAME_) by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1QtkHb-0004li-Lu for freebsd-stable@FreeBSD.org; Wed, 17 Aug 2011 10:51:40 -0700 Received: by _HOSTNAME_ (sSMTP sendmail emulation); Wed, 17 Aug 2011 10:52:01 -0700 Date: Wed, 17 Aug 2011 10:52:01 -0700 From: Chip Camden To: freebsd-stable@FreeBSD.org Message-ID: <20110817175201.GB1973@libertas.local.camdensoftware.com> Mail-Followup-To: freebsd-stable@FreeBSD.org References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ftEhullJWpWg/VHq" Content-Disposition: inline In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org> User-Agent: Mutt/1.4.2.3i Company: Camden Software Consulting URL: http://camdensoftware.com X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - camdensoftware.com X-Source: X-Source-Args: X-Source-Dir: Cc: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 17:52:07 -0000 --ftEhullJWpWg/VHq Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Quoth Hiroki Sato on Thursday, 18 August 2011: > Hi, >=20 > Mike Tancsa wrote > in 
<4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> mi> >> for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937. This
> mi> >> is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead?
>
> I am also suffering from a reproducible panic on an 8-STABLE box, an
> NFS server with heavy I/O load. I could not get a kernel dump
> because this panic locked up the machine just after it occurred, but
> according to the stack trace it was the same as the posted one.
> Switching to an 8.2R kernel can prevent this panic.
>
> Any progress on the investigation?
>
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --
>
> -- Hiroki

I'm also getting similar panics on 8.2-STABLE. Locks up everything and I have to power off. Once, I happened to be looking at the console when it happened and copied down the following:

Sleeping thread (tid 100037, pid 0) owns a non-sleepable lock
panic: sleeping thread
cpuid=1

Another time I got:

lock order reversal:
1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:296
2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587

I didn't copy down the traceback.

These panics seem to hit when I'm doing heavy WAN I/O. I can go for about a day without one as long as I stay away from the web or even chat. Last night this system copied a backup of 35GB over the local network without failing, but as soon as I hopped onto Firefox this morning, down she went.
I don't know if that's coincidence or useful data.

I didn't get to say "Thanks" to Eitan Adler for attempting to help me with this on Monday night. Thanks, Eitan!

--
.O. | Sterling (Chip) Camden | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91 | http://chipstips.com

--ftEhullJWpWg/VHq Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQEcBAEBAgAGBQJOS//BAAoJEIpckszW26+Rb80H/3/7eQlINeIaoLUz6iE2dSG8 /7Eoyt87VSs1H8XUYVPD+tiYXFgvpz6zu49zkXTcNwS/kwgJjzMHngEeY3eKom8v 6iaWilwe12nrkDOdkJZXB4kml6WTa71VkAlpC0hUJHuPD+trriZfSdJKDBwOXaA/ rJzp25k0TZU+BlJQJr3eXGPP1L/KjxSPLbIeowGWpV7ZPcRQRm3JerAGcn3f38ud PR4cBwVKHcPYzLm8ZAQLL99QJy5ZqyTWjLVE16Erc2AUyD1coURH2X6w3JtJ4mQ2 YBQhdREV1tchj/mvM30b/xnozcjTZuHDOoXpZgGPxKAQqDRG3Y7FG5jc33yELjg= =OoPD -----END PGP SIGNATURE----- --ftEhullJWpWg/VHq--

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 18:26:15 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6E09B106566C; Wed, 17 Aug 2011 18:26:15 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1-6.sentex.ca [IPv6:2607:f3e0:0:1::12]) by mx1.freebsd.org (Postfix) with ESMTP id 17CA88FC13; Wed, 17 Aug 2011 18:26:15 +0000 (UTC) Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a]) by smarthost1.sentex.ca (8.14.4/8.14.4) with ESMTP id p7HIQBwZ025744; Wed, 17 Aug 2011 14:26:11 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <4E4C07C8.9090909@sentex.net> Date: Wed, 17 Aug 2011 14:26:16 -0400 From: Mike Tancsa Organization: Sentex Communications User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7 MIME-Version: 1.0 To: Hiroki Sato References:
<20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org> X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.71 on IPv6:2607:f3e0:0:1::12 Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 18:26:15 -0000 On 8/17/2011 1:38 PM, Hiroki Sato wrote: > Any progress on the investigation? Unfortunately, I cannot reproduce it yet with a debugging kernel :( ---Mike > > -- > spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long > panic: spin lock held too long > cpuid = 1 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > kdb_backtrace() at kdb_backtrace+0x37 > panic() at panic+0x187 > _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39 > _mtx_lock_spin() at _mtx_lock_spin+0x9e > sched_add() at sched_add+0x117 > setrunnable() at setrunnable+0x78 > sleepq_signal() at sleepq_signal+0x7a > cv_signal() at cv_signal+0x3b > xprt_active() at xprt_active+0xe3 > svc_vc_soupcall() at svc_vc_soupcall+0xc > sowakeup() at sowakeup+0x69 > tcp_do_segment() at tcp_do_segment+0x25e7 > tcp_input() at tcp_input+0xcdd > ip_input() at ip_input+0xac > netisr_dispatch_src() at netisr_dispatch_src+0x7e > ether_demux() at ether_demux+0x14d > ether_input() at ether_input+0x17d > em_rxeof() at em_rxeof+0x1ca > em_handle_que() at em_handle_que+0x5b > taskqueue_run_locked() at taskqueue_run_locked+0x85 > taskqueue_thread_loop() at taskqueue_thread_loop+0x4e > 
fork_exit() at fork_exit+0x11f > fork_trampoline() at fork_trampoline+0xe > -- > > -- Hiroki -- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 18:37:02 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 71F11106566B; Wed, 17 Aug 2011 18:37:02 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 0C67F8FC18; Wed, 17 Aug 2011 18:37:01 +0000 (UTC) Received: by yxn22 with SMTP id 22so46128yxn.13 for ; Wed, 17 Aug 2011 11:37:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=trAAOYcmYpZlnYgok46AMUtjvZHFC2k++U47mGVVj2U=; b=sqUGJAaE4KDjM/3jS82nUBzMIjgmSNj8I0ZNcNjFyakDCmUxDTnN3Kf5tjoJuOYfoE 7ZWafvuExXnIBNA+PqhF8NQg5qyTSTF+/ta2Bg0nLy37QYvJswGp7npoS+BAJgNE2+O9 dLCECfFvvihEYnE93cW0AuajGmL6EESACfJU8= MIME-Version: 1.0 Received: by 10.236.75.228 with SMTP id z64mr4420989yhd.68.1313606221377; Wed, 17 Aug 2011 11:37:01 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.33 with HTTP; Wed, 17 Aug 2011 11:37:01 -0700 (PDT) In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org> References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> Date: Wed, 17 Aug 2011 20:37:01 +0200 X-Google-Sender-Auth: teH8Tr77CO5VlnvGwZYAJJyguaI Message-ID: From: Attilio Rao To: Hiroki Sato Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: quoted-printable Cc: kostikbel@gmail.com, freebsd-stable@freebsd.org, avg@freebsd.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 18:37:02 -0000

2011/8/17 Hiroki Sato :
> Hi,
>
> Mike Tancsa wrote
>  in <4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> mi> >> for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937. This
> mi> >> is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead?
>
>  I am also suffering from a reproducible panic on an 8-STABLE box, an
>  NFS server with heavy I/O load. I could not get a kernel dump
>  because this panic locked up the machine just after it occurred, but
>  according to the stack trace it was the same as the posted one.
>  Switching to an 8.2R kernel can prevent this panic.
>
>  Any progress on the investigation?

Hiroki,
how easily can you reproduce it?

It would be important to have a DDB textdump with this information:
- bt
- ps
- show allpcpu
- alltrace

Alternatively, a coredump taken with the stop cpu patch, which Andriy can provide.
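
One way to have those four commands captured automatically at panic time is a ddb(8) script. The fragment below is only a minimal sketch and is not taken from this thread: it assumes a kernel built with options KDB and DDB, a configured dump device, and ddb_enable="YES" in /etc/rc.conf so that /etc/ddb.conf is loaded at boot. The kdb.enter.panic script name and the textdump and capture commands are the ones described in ddb(4) and textdump(4).

```
# /etc/ddb.conf (assumed location; loaded by the ddb rc script)
# On panic: switch to a textdump, capture the output of the requested
# commands, then write the dump and reboot.
script kdb.enter.panic=textdump set; capture on; bt; ps; show allpcpu; alltrace; capture off; call doadump; reset
```

After the reboot, savecore(8) should leave a textdump.tar.N under /var/crash, with the captured command output in ddb.txt inside it.
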
Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 18:57:30 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF24C1065670 for ; Wed, 17 Aug 2011 18:57:30 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 7A3138FC17 for ; Wed, 17 Aug 2011 18:57:30 +0000 (UTC) Received: by wyh15 with SMTP id 15so1128899wyh.13 for ; Wed, 17 Aug 2011 11:57:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=tSOqYllt8iTeFJUjPd66NgJ7HSw32N+s/TlwTImshDQ=; b=kH6UOyC6Q+S8sGdk4gxQHLTPwDxemMuUsMK48DmIUyWdCKY0KI0QVu8fLwEtNCjqkO k/kk8pSBT/0DMTN+vlkiiMg0UgO4uV0xJp187T6CKX6al9B3IeAKzLtFb13FKzGFhAMi IrtU+AX4QXAxKKCgez7NhBYGZlZ1XlniCr8Bo= MIME-Version: 1.0 Received: by 10.217.6.81 with SMTP id x59mr1160358wes.50.1313607449443; Wed, 17 Aug 2011 11:57:29 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.216.181.210 with HTTP; Wed, 17 Aug 2011 11:57:29 -0700 (PDT) In-Reply-To: <4E4BCCC3.60601@digsys.bg> References: <4E4BC38D.1050808@quip.cz> <4E4BCCC3.60601@digsys.bg> Date: Wed, 17 Aug 2011 11:57:29 -0700 X-Google-Sender-Auth: wjA8PTrtbTVbXuJtpIlzsr1lheA Message-ID: From: Artem Belevich To: Miroslav Lachman <000.fbsd@quip.cz> Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-stable@freebsd.org, Daniel Kalchev Subject: Re: can not boot from RAIDZ with 8-STABLE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 18:57:30 -0000

2011/8/17 Daniel Kalchev :
> On 17.08.11 16:35, Miroslav Lachman wrote:
>>
>> I tried mfsBSD installation on Dell T110 with PERC H200A and 4x 500GB SATA
>> disks. If I create zpool with RAIDZ, the boot immediately hangs with
>> following error:
>>
> Maybe the BIOS does not see all drives at boot?

Indeed. On one of my systems the BIOS only allows access to the first four HDDs in its boot priority list. What's especially annoying is that the BIOS keeps rearranging the boot list every time a new device is added or removed, or if the SATA controller card is moved to another slot. Every time it happens I have to go back and rearrange the drives so that my RAIDZ drives are at the top of the list.

If you can boot off CD or USB, how many drives does the bootloader report just before it gets to the menu?

--Artem

From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 19:40:57 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E7A4F1065670; Wed, 17 Aug 2011 19:40:56 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 71DD28FC08; Wed, 17 Aug 2011 19:40:56 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 3C9B828427; Wed, 17 Aug 2011 21:40:55 +0200 (CEST) Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id 64D5A28424; Wed, 17 Aug 2011 21:40:54 +0200 (CEST) Message-ID: <4E4C1945.5030504@quip.cz> Date: Wed, 17 Aug 2011 21:40:53 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Artem Belevich References:
<4E4BC38D.1050808@quip.cz> <4E4BCCC3.60601@digsys.bg> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org, Daniel Kalchev Subject: Re: can not boot from RAIDZ with 8-STABLE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 19:40:57 -0000

Artem Belevich wrote:
> 2011/8/17 Daniel Kalchev:
>> On 17.08.11 16:35, Miroslav Lachman wrote:
>>>
>>> I tried mfsBSD installation on Dell T110 with PERC H200A and 4x 500GB SATA
>>> disks. If I create zpool with RAIDZ, the boot immediately hangs with
>>> following error:
>>>
>> Maybe the BIOS does not see all drives at boot?
>
> Indeed. On one of my systems BIOS only allows access to the first four
> HDDs in the BIOS' boot priority list. What's especially annoying is
> that BIOS keeps rearranging boot list every time new device is added or
> removed or if SATA controller card is moved to another slot. Every
> time it happens I have to go back and rearrange the drives so that my
> RAIDZ drives are on top of the list.
>
> If you can boot off CD or USB how many drives does bootloader report
> just before it gets to the menu?

Thank you guys, you are right. The BIOS provides only 1 disk to the loader! I checked it from the loader prompt with lsdev (booted from an external USB HDD).

So I will try to make a small zpool mirror for root and boot (if a ZFS mirror can be made of 4 providers instead of two) and the rest will be in RAIDZ. If that fails, I will go my old way with an internal USB flash disk with UFS for booting and a RAIDZ of 4 disks for storage, as I did a few years ago with 7.0 or 7.1.

Thank you again!
Miroslav Lachman From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 19:44:37 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 31279106564A; Wed, 17 Aug 2011 19:44:37 +0000 (UTC) (envelope-from hrs@FreeBSD.org) Received: from mail.allbsd.org (gatekeeper-int.allbsd.org [IPv6:2001:2f0:104:e002::2]) by mx1.freebsd.org (Postfix) with ESMTP id A08688FC17; Wed, 17 Aug 2011 19:44:36 +0000 (UTC) Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp [125.175.94.28]) (authenticated bits=128) by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HJiEp5092450; Thu, 18 Aug 2011 04:44:24 +0900 (JST) (envelope-from hrs@FreeBSD.org) Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0) by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HJiAHT041636; Thu, 18 Aug 2011 04:44:12 +0900 (JST) (envelope-from hrs@FreeBSD.org) Date: Thu, 18 Aug 2011 04:33:32 +0900 (JST) Message-Id: <20110818.043332.27079545013461535.hrs@allbsd.org> To: attilio@FreeBSD.org From: Hiroki Sato In-Reply-To: References: <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530 FFD7 4F2C D3D8 2793 CF2D X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Multipart/Signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="--Security_Multipart(Thu_Aug_18_04_33_32_2011_840)--" Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org X-Virus-Status: Clean X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3 (mail.allbsd.org [133.31.130.32]); Thu, 18 Aug 2011 04:44:29 +0900 (JST) X-Spam-Status: No, score=-102.2 required=13.0 tests=BAYES_00, CONTENT_TYPE_PRESENT,DIRECTOCNDYN,MIMEQENC,QENCPTR2,RCVD_IN_RP_RNBL, SPF_SOFTFAIL,USER_IN_WHITELIST autolearn=no version=3.3.1 
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on gatekeeper.allbsd.org Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 19:44:37 -0000 ----Security_Multipart(Thu_Aug_18_04_33_32_2011_840)-- Content-Type: Text/Plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable

Attilio Rao wrote in :

at> 2011/8/17 Hiroki Sato :
at> > Hi,
at> >
at> > Mike Tancsa wrote
at> >  in <4E15A08C.6090407@sentex.net>:
at> >
at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
at> > mi> >>
at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
at> > mi> >> for the owner thread was not available.
at> > mi> >>
at> > mi> >> I was unable to make any conclusion from the data that was present.
at> > mi> >> If the situation is reproducible, you could try to revert r221937. This
at> > mi> >> is pure speculation, though.
at> > mi> >
at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
at> > mi> > unless there is any extra debugging you want me to add to the kernel
at> > mi> > instead?
at> >
at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
at> >  NFS server with heavy I/O load. I could not get a kernel dump
at> >  because this panic locked up the machine just after it occurred, but
at> >  according to the stack trace it was the same as the posted one.
at> >  Switching to an 8.2R kernel can prevent this panic.
at> > at> > Any progress on the investigation? at> at> Hiroki, at> how easilly can you reproduce it? It takes 5-10 hours. I installed another kernel for debugging just now, so I think I will be able to collect more detail information in a couple of days. at> It would be important to have a DDB textdump with these informations: at> - bt at> - ps at> - show allpcpu at> - alltrace at> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide. Okay, I will post them once I can get another panic. Thanks! -- Hiroki ----Security_Multipart(Thu_Aug_18_04_33_32_2011_840)-- Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEABECAAYFAk5MF4wACgkQTyzT2CeTzy0Z6gCgluxIPrG308LTbGGysww6wQ4R 4TsAnj2fiZoQOXYk0jycI9e3TPKTFcpy =lTzB -----END PGP SIGNATURE----- ----Security_Multipart(Thu_Aug_18_04_33_32_2011_840)---- From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 20:21:48 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA4E8106567A; Wed, 17 Aug 2011 20:21:47 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 339038FC20; Wed, 17 Aug 2011 20:21:45 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA00935; Wed, 17 Aug 2011 23:21:43 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QtmcR-000Du9-7E; Wed, 17 Aug 2011 23:21:43 +0300 Message-ID: <4E4C22D6.6070407@FreeBSD.org> Date: Wed, 17 Aug 2011 23:21:42 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110817 Thunderbird/6.0 MIME-Version: 1.0 To:
freebsd-jail@FreeBSD.org, freebsd-hackers@FreeBSD.org References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> In-Reply-To: <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> X-Enigmail-Version: 1.2.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org, Steven Hartland Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 20:21:48 -0000 Thanks to the debug that Steven provided and to the help that I received from Kostik, I think that now I understand the basic mechanics of this panic, but, unfortunately, not the details of its root cause. It seems like everything starts with some kind of a race between terminating processes in a jail and termination of the jail itself. This is where the details are very thin so far. 
What we see is that a process (http) is in exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT flag is set and even past the place where p_limit is freed and reset to NULL. At that place the thread calls prison_proc_free(), which calls prison_deref(). Then, we see that in prison_deref() the thread gets a page fault because of what seems like a NULL pointer dereference. That's just the start of the problem and its root cause. Then, trap_pfault() gets invoked and, because addresses close to NULL look like userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes on to call vm_map_growstack. First thing that vm_map_growstack does is a call to lim_cur(), but because p_limit is already NULL, that call results in a NULL pointer dereference and a page fault. Goto the beginning of this paragraph. So we get this recursion of sorts, which only ends when a stack is exhausted and a CPU generates a double-fault. So, of course, Steven is interested in finding and fixing the root cause. I hope we will get to that with some help from the "prison guards" :-) But I also would like to use this opportunity to discuss how we can make it easier to debug such issue as this. I think that this problem demonstrates that when we treat certain junk in kernel address value as a userland address value, we throw additional heaps of irrelevant stuff on top of an actual problem. One solution could be to use a special flag that would mark all actual attempts to access userland address (e.g. setting the flag on entrance to copyin and clearing it upon return), so that in the page fault handler we could distinguish actual faults on userland addresses from faults on garbage kernel addresses. I am sure that there could be other clever techniques to catch such garbage addresses early. 
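To make the loop described above concrete, here is a small user-space toy model in C. It is only a sketch under stated assumptions: the struct, the enum and the function name are invented for illustration and are not kernel identifiers, and the fixed depth limit merely stands in for the kernel stack running out. It shows how a kernel-mode fault on a garbage low address keeps re-faulting once the limits are gone, and how a per-thread flag marking intentional userland accesses would let the handler give up on the first fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for this toy model; not kernel names. */
enum fault_result { FAULT_HANDLED, FAULT_FATAL, FAULT_DOUBLE };

/* Stand-in for the size of the kernel stack, counted in nested faults. */
#define MODEL_STACK_DEPTH 64

struct model_thread {
	bool user_access_expected;	/* proposed flag: set around copyin-like code */
	bool limits_present;		/* false once p_limit was freed in exit1() */
};

/*
 * Model of a kernel-mode page fault on an address that looks like a
 * userland address.  depth counts the nested faults taken so far.
 */
static enum fault_result
model_fault(const struct model_thread *td, bool check_flag, int depth)
{
	assert(td != NULL);
	if (depth >= MODEL_STACK_DEPTH)
		return (FAULT_DOUBLE);	/* stack exhausted */
	if (check_flag && !td->user_access_expected)
		return (FAULT_FATAL);	/* garbage pointer: stop at once */
	if (!td->limits_present) {
		/* Growing the stack reads the limits, which faults again. */
		return (model_fault(td, check_flag, depth + 1));
	}
	return (FAULT_HANDLED);
}
```

With the flag check disabled, an exiting thread whose limits are already gone ends in the modeled double fault; with the check enabled, the same thread is reported as a fatal fault on the very first trap.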
-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 21:04:49 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B180106566C for ; Wed, 17 Aug 2011 21:04:49 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.emeryville.ca.mail.comcast.net (qmta07.emeryville.ca.mail.comcast.net [76.96.30.64]) by mx1.freebsd.org (Postfix) with ESMTP id 229B48FC0C for ; Wed, 17 Aug 2011 21:04:48 +0000 (UTC) Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74]) by qmta07.emeryville.ca.mail.comcast.net with comcast id MZ471h00C1bwxycA7Z4kZc; Wed, 17 Aug 2011 21:04:44 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta18.emeryville.ca.mail.comcast.net with comcast id MZ4M1h00u1t3BNj8eZ4NGT; Wed, 17 Aug 2011 21:04:22 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 04CF4102C1A; Wed, 17 Aug 2011 14:04:47 -0700 (PDT) Date: Wed, 17 Aug 2011 14:04:47 -0700 From: Jeremy Chadwick To: freebsd-stable@FreeBSD.org Message-ID: <20110817210446.GA49737@icarus.home.lan> References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> <20110817175201.GB1973@libertas.local.camdensoftware.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110817175201.GB1973@libertas.local.camdensoftware.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 21:04:49 -0000 On Wed, Aug 17, 2011 at 10:52:01AM -0700, Chip Camden wrote: > Quoth Hiroki Sato on 
Thursday, 18 August 2011: > > Hi, > > > > Mike Tancsa wrote > > in <4E15A08C.6090407@sentex.net>: > > > > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote: > > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote: > > mi> >> > > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock > > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the > > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace > > mi> >> for the owner thread was not available. > > mi> >> > > mi> >> I was unable to make any conclusion from the data that was present. > > mi> >> If the situation is reproducable, you coulld try to revert r221937. This > > mi> >> is pure speculation, though. > > mi> > > > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937 > > mi> > unless there is any extra debugging you want me to add to the kernel > > mi> > instead ? > > > > I am also suffering from a reproducible panic on an 8-STABLE box, an > > NFS server with heavy I/O load. I could not get a kernel dump > > because this panic locked up the machine just after it occurred, but > > according to the stack trace it was the same as posted one. > > Switching to an 8.2R kernel can prevent this panic. > > > > Any progress on the investigation? 
> > > > -- > > spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long > > panic: spin lock held too long > > cpuid = 1 > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > > kdb_backtrace() at kdb_backtrace+0x37 > > panic() at panic+0x187 > > _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39 > > _mtx_lock_spin() at _mtx_lock_spin+0x9e > > sched_add() at sched_add+0x117 > > setrunnable() at setrunnable+0x78 > > sleepq_signal() at sleepq_signal+0x7a > > cv_signal() at cv_signal+0x3b > > xprt_active() at xprt_active+0xe3 > > svc_vc_soupcall() at svc_vc_soupcall+0xc > > sowakeup() at sowakeup+0x69 > > tcp_do_segment() at tcp_do_segment+0x25e7 > > tcp_input() at tcp_input+0xcdd > > ip_input() at ip_input+0xac > > netisr_dispatch_src() at netisr_dispatch_src+0x7e > > ether_demux() at ether_demux+0x14d > > ether_input() at ether_input+0x17d > > em_rxeof() at em_rxeof+0x1ca > > em_handle_que() at em_handle_que+0x5b > > taskqueue_run_locked() at taskqueue_run_locked+0x85 > > taskqueue_thread_loop() at taskqueue_thread_loop+0x4e > > fork_exit() at fork_exit+0x11f > > fork_trampoline() at fork_trampoline+0xe > > -- > > > > -- Hiroki > > > I'm also getting similar panics on 8.2-STABLE. Locks up everything and I > have to power off. Once, I happened to be looking at the console when it > happened and copied dow the following: > > Sleeping thread (tif 100037, pid 0) owns a non-sleepable lock > panic: sleeping thread > cpuid=1 No idea, might be relevant to the thread. > Another time I got: > > lock order reversal: > 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfr_vnops.c:296 > 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587 > > I didn't copy down the traceback. "snaplk" refers to UFS snapshots. The above must have been typed in manually as well, due to some typos in filenames as well. 
Either this is a different problem, or if everyone in this thread is doing UFS snapshots (dump -L, mksnap_ffs, etc.) and having this problem happen then I recommend people stop using UFS snapshots. I've ranted about their unreliability in the past (years upon years ago -- still seems valid) and just how badly they can "wedge" a system. This is one of the many (MANY!) reasons why we use rsnapshot/rsync instead. The atime clobbering issue is the only downside. I don't see what this has to do with "heavy WAN I/O" unless you're doing something like dump-over-ssh, in which case see the above paragraph. > These panics seem to hit when I'm doing heavy WAN I/O. I can go for > about a day without one as long as I stay away from the web or even chat. > Last night this system copied a backup of 35GB over the local network > without failing, but as soon as I hopped onto Firefox this morning, down > she went. I don't know if that's coincidence or useful data. > > I didn't get to say "Thanks" to Eitan Adler for attempting to help me > with this on Monday night. Thanks, Eitan! -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 21:10:54 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8105106566C; Wed, 17 Aug 2011 21:10:54 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 454A08FC16; Wed, 17 Aug 2011 21:10:53 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7HLAnic075382 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 18 Aug 2011 00:10:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p7HLAm2E007269; Thu, 18 Aug 2011 00:10:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7HLAmDw007268; Thu, 18 Aug 2011 00:10:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 18 Aug 2011 00:10:48 +0300 From: Kostik Belousov To: Andriy Gapon Message-ID: <20110817211048.GZ17489@deviant.kiev.zoral.com.ua> References: <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="MPHowW9WJBu+8Ajw" 
Content-Disposition: inline In-Reply-To: <4E4C22D6.6070407@FreeBSD.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org, freebsd-jail@freebsd.org, Steven Hartland , freebsd-stable@freebsd.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 21:10:54 -0000 --MPHowW9WJBu+8Ajw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote: [skip] > But I also would like to use this opportunity to discuss how we can > make it easier to debug such issue as this. I think that this problem > demonstrates that when we treat certain junk in kernel address value > as a userland address value, we throw additional heaps of irrelevant > stuff on top of an actual problem. One solution could be to use a > special flag that would mark all actual attempts to access userland > address (e.g. setting the flag on entrance to copyin and clearing it > upon return), so that in the page fault handler we could distinguish > actual faults on userland addresses from faults on garbage kernel > addresses. I am sure that there could be other clever techniques to > catch such garbage addresses early. We already have such mechanism, the kernel code aware of the usermode page access sets pcb_onfault. See the end of trap_pfault() handler. In fact, we can catch it earlier, before even calling vm_fault(). BTW, I think this is esp. 
useful in the combination with the support for the SMEP in recent Intel CPUs.

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov
Date: Thu Aug 18 00:08:50 2011 +0300

    Assert that the exiting process does not return to usermode.
    On x86, do not call vm_fault() when the kernel is not prepared to
    handle unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
 			goto nogo;
 
 		map = &vm->vm_map;
+
+		/*
+		 * When accessing a usermode address, kernel must be
+		 * ready to accept the page fault, and provide a
+		 * handling routine. Since accessing the address
+		 * without the handler is a bug, do not try to handle
+		 * it normally, and panic immediately.
+		 */
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
 			goto nogo;
 
 		map = &vm->vm_map;
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 
 	CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
 	    td->td_name);
+	KASSERT((p->p_flag & P_WEXIT) == 0,
+	    ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
 	/* Check that we called signotify() enough.
*/ --MPHowW9WJBu+8Ajw Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk5MLlgACgkQC3+MBN1Mb4hyewCgpKYy+yhG+S3bXm5A324n/C8+ 6lIAoPRTszmVWdyBQqw5vhJUnpNbhluY =i6E1 -----END PGP SIGNATURE----- --MPHowW9WJBu+8Ajw-- From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 21:46:37 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EBCB106566B for ; Wed, 17 Aug 2011 21:46:37 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id BB7528FC19 for ; Wed, 17 Aug 2011 21:46:36 +0000 (UTC) Received: by wyh15 with SMTP id 15so1247555wyh.13 for ; Wed, 17 Aug 2011 14:46:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=SxOl6JX6C4cq52TyTWQht8ogs0dbX150pNhutlfhSRI=; b=eJuWHmH06uzORVpuR+HXzyvLGFNUkLS5M9faoZ+SMhE6h699WVL0aZD+u/AztH1QLO 2Bx132Xr5ke8wXqhwCfkGSSI32ZCUqlho+VUxpNKN1fp6Xruka3pr135VWJfb0UoX2Wt mKzQfln3O0YozCLeLons0SLs3uB0c4U3wTR9A= MIME-Version: 1.0 Received: by 10.216.90.19 with SMTP id d19mr4769374wef.35.1313617595556; Wed, 17 Aug 2011 14:46:35 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.216.181.210 with HTTP; Wed, 17 Aug 2011 14:46:35 -0700 (PDT) In-Reply-To: <4E4C1945.5030504@quip.cz> References: <4E4BC38D.1050808@quip.cz> <4E4BCCC3.60601@digsys.bg> <4E4C1945.5030504@quip.cz> Date: Wed, 17 Aug 2011 14:46:35 -0700 X-Google-Sender-Auth: YKhN0SAOFEDL08GS1o1N-lzjkUk Message-ID: From: Artem Belevich To: Miroslav Lachman <000.fbsd@quip.cz> Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-stable@freebsd.org, Daniel Kalchev Subject: Re: can not boot from RAIDZ with 8-STABLE 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 21:46:37 -0000 On Wed, Aug 17, 2011 at 12:40 PM, Miroslav Lachman <000.fbsd@quip.cz> wrote: > Thank you guys, you are right. The BIOS provides only 1 disk to the loader! > I checked it from loader prompt by lsdev (booted from USB external HDD). > > So I will try to make a small zpool mirror for root and boot (if ZFS mirror > can be made of 4 providers instead of two) and the rest will be in RAIDZ. > > If that fails, I will go my old way with internal USB flash disk with UFS > for booting and RAIDZ of 4 disks for storage as I did it few years ago with > 7.0 or 7.1. You seem to be booting from disks attached to some sort of add-on card. Sometimes those have per-disk 'bootable' option in their own extension ROM. You may investigate yours. Perhaps all you need to do is just tweak controller settings. 
--Artem From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 23:15:54 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 016FF106564A; Wed, 17 Aug 2011 23:15:54 +0000 (UTC) (envelope-from prvs=1210f20b9f=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id E01B98FC1B; Wed, 17 Aug 2011 23:15:52 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 00:15:17 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 00:15:17 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014640704.msg; Thu, 18 Aug 2011 00:15:16 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" , , References: 
<47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> Date: Thu, 18 Aug 2011 00:15:56 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 23:15:54 -0000 ----- Original Message ----- From: "Andriy Gapon" > Thanks to the debug that Steven provided and to the help that I received from > Kostik, I think that now I understand the basic mechanics of this panic, but, > unfortunately, not the details of its root cause. > > It seems like everything starts with some kind of a race between terminating > processes in a jail and termination of the jail itself. This is where the > details are very thin so far. 
What we see is that a process (http) is in > exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT > flag is set and even past the place where p_limit is freed and reset to NULL. > At that place the thread calls prison_proc_free(), which calls prison_deref(). > Then, we see that in prison_deref() the thread gets a page fault because of what > seems like a NULL pointer dereference. That's just the start of the problem and > its root cause.

That's interesting. Are you using http as an example, or is that something that's been gleaned from debugging our output? I ask as there's only one process running in each of our jails, and that's a single java process.

Now, given your description, there may be something I can add that may help clarify what the cause could be. In a nutshell, the jail manager we're using will attempt to resurrect the jail from a dying state in a few specific scenarios. Here's an example:

1. jail restart requested
2. jail is stopped, so the java process is killed off, but active tcp sessions may prevent the timely full shutdown of the jail.
3. if an existing jail is detected, i.e. a dying jail from #2, instead of starting a new jail we attach to the old one and exec the new java process.
4. if an existing jail isn't detected, i.e. where there were no hanging tcp sessions and #2 cleanly shut down the jail, a new jail is created, attached to, and the java process exec'ed.

The system uses static jailids, so it's possible to determine whether an existing jail for this "service" exists or not. This prevents duplicate services as well as making services easy to identify by their jailid.

So what we could be seeing is a race between the jail shutdown and the attach of the new process?
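For illustration only, the restart steps 1-4 above can be written down as a tiny decision function in C. This is a hypothetical sketch, not the jail manager's real code: the enum and function names are made up, no jail(2) calls are involved, and the "alive" case is an assumption based on the statement that static jail IDs prevent duplicate services.

```c
#include <assert.h>

/* State of the jail with the service's static jail ID (model only). */
enum model_jail_state {
	MODEL_JAIL_ABSENT,	/* fully shut down, no such jail left */
	MODEL_JAIL_DYING,	/* processes gone, tcp sessions keep it around */
	MODEL_JAIL_ALIVE	/* assumed: service is still running */
};

enum restart_action {
	ACTION_CREATE_NEW,		/* step 4: create, attach, exec */
	ACTION_REATTACH_DYING,		/* step 3: attach to old jail, exec */
	ACTION_REFUSE_DUPLICATE		/* assumed: never start a second copy */
};

/* Decide what to do after the stop in step 2 has been attempted. */
static enum restart_action
choose_restart_action(enum model_jail_state state_after_stop)
{
	switch (state_after_stop) {
	case MODEL_JAIL_ABSENT:
		return (ACTION_CREATE_NEW);
	case MODEL_JAIL_DYING:
		return (ACTION_REATTACH_DYING);
	case MODEL_JAIL_ALIVE:
		return (ACTION_REFUSE_DUPLICATE);
	}
	assert(!"unknown jail state");
	return (ACTION_REFUSE_DUPLICATE);
}
```

The suspected race sits in the MODEL_JAIL_DYING branch: the old jail may finish going away while the new process is being attached to it.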
Now man 2 jail seems to indicate this is a valid use case for jail_set, as it documents its support for JAIL_DYING as a valid option for flags, but I suspect it's something quite out of the ordinary to actually do, which may be why this panic hasn't been seen before now.

As some background, the reason we use static jailids is to ensure only one instance of the jailed service is running, and the reason we re-attach to the dying jail is so that jails can be restarted in a timely manner. Without the re-attach we would need to wait for all tcp sessions which have been aborted to time out.

> So, of course, Steven is interested in finding and fixing the root cause. I > hope we will get to that with some help from the "prison guards" :-)

Does the above potentially explain how we're getting to the situation which generates the panic? If so, we can certainly look at using alternatives to the current design to work around this issue. Flagging the jail as permanent and using manual process management and additional external locking to prevent duplicates is what instantly springs to mind.

Regards Steve

================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 00:01:11 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A6C13106566B for ; Thu, 18 Aug 2011 00:01:11 +0000 (UTC) (envelope-from sterling@camdensoftware.com) Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com [174.123.46.202]) by mx1.freebsd.org (Postfix) with ESMTP id 6D7EB8FC1C for ; Thu, 18 Aug 2011 00:01:11 +0000 (UTC) Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203] helo=_HOSTNAME_) by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1Qtq2l-0007YQ-C2 for freebsd-stable@FreeBSD.org; Wed, 17 Aug 2011 17:00:44 -0700 Received: by _HOSTNAME_ (sSMTP sendmail emulation); Wed, 17 Aug 2011 17:01:05 -0700 Date: Wed, 17 Aug 2011 17:01:05 -0700 From: Chip Camden To: freebsd-stable@FreeBSD.org Message-ID: <20110818000105.GC2489@libertas.local.camdensoftware.com> Mail-Followup-To: freebsd-stable@FreeBSD.org References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> <20110817175201.GB1973@libertas.local.camdensoftware.com> <20110817210446.GA49737@icarus.home.lan> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="VywGB/WGlW4DM4P8" Content-Disposition: inline In-Reply-To: <20110817210446.GA49737@icarus.home.lan> User-Agent: Mutt/1.4.2.3i Company: Camden Software Consulting URL: http://camdensoftware.com X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender 
Address Domain - camdensoftware.com X-Source: X-Source-Args: X-Source-Dir: Cc: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 00:01:11 -0000 --VywGB/WGlW4DM4P8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Quoth Jeremy Chadwick on Wednesday, 17 August 2011: > > > > I'm also getting similar panics on 8.2-STABLE. Locks up everything and I > > have to power off. Once, I happened to be looking at the console when it > > happened and copied dow the following: > > > > Sleeping thread (tif 100037, pid 0) owns a non-sleepable lock > > panic: sleeping thread > > cpuid=1 > > No idea, might be relevant to the thread. > > > Another time I got: > > > > lock order reversal: > > 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfr_vnops.c:296 > > 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587 > > > > I didn't copy down the traceback. > > "snaplk" refers to UFS snapshots. The above must have been typed in > manually as well, due to some typos in filenames as well. > > Either this is a different problem, or if everyone in this thread is > doing UFS snapshots (dump -L, mksnap_ffs, etc.) and having this problem > happen then I recommend people stop using UFS snapshots. I've ranted > about their unreliability in the past (years upon years ago -- still > seems valid) and just how badly they can "wedge" a system. This is one > of the many (MANY!) reasons why we use rsnapshot/rsync instead. The > atime clobbering issue is the only downside. > If I'm doing UFS snapshots, I didn't know it.
Yes, everything was copied manually because it only displays on the console and the keyboard does not respond after that point. So I copied first to paper, then had to decode my lousy handwriting to put it in an email. Sorry for the scribal errors. -- .O. | Sterling (Chip) Camden | http://camdensoftware.com ..O | sterling@camdensoftware.com | http://chipsquips.com OOO | 2048R/D6DBAF91 | http://chipstips.com --VywGB/WGlW4DM4P8 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQEcBAEBAgAGBQJOTFZBAAoJEIpckszW26+RRtsH/jPEUungeBO9a3idYOTECrqg BsEo0zyyyz76sd3bkyVVx5QNRlfAygoxhReUsD1r6GC9QhapR0m91qUD1bYNK3yv wCxKp3bCOCbh4HOG5efwDFBKisfKLRKjyQp2SQ7d2R+RHO6fsk9VHvrPS6LQ3skH AJ0fvCUd+0GCpvKsLHzV+MqrGJpiMdz2dwPpo+Jwv+EzGZ8H2gJwrzZD4OUAkGC4 gXBqT+YTiJLNQIOr0dteYO037yymUxYRqB9q8lbNcl6RKp3s1NHQWUU3IhDJjeSL 5qTCr9j9wSOomxBCskWXsy6XzEdmc3dzMPBS95D5zbZWDYxl5JXFAE8hLKanWkw= =cyZM -----END PGP SIGNATURE----- --VywGB/WGlW4DM4P8-- From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 00:16:41 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D2A9106564A; Thu, 18 Aug 2011 00:16:41 +0000 (UTC) (envelope-from hrs@FreeBSD.org) Received: from mail.allbsd.org (gatekeeper-int.allbsd.org [IPv6:2001:2f0:104:e002::2]) by mx1.freebsd.org (Postfix) with ESMTP id 716368FC08; Thu, 18 Aug 2011 00:16:40 +0000 (UTC) Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp [125.175.94.28]) (authenticated bits=128) by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7I0GI0T059114; Thu, 18 Aug 2011 09:16:28 +0900 (JST) (envelope-from hrs@FreeBSD.org) Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0) by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7I0GDqV044396; Thu, 18 Aug 2011 09:16:15 +0900 (JST) (envelope-from hrs@FreeBSD.org) Date: Thu, 18 Aug 2011 09:16:00
+0900 (JST) Message-Id: <20110818.091600.831954331552558249.hrs@allbsd.org> To: attilio@FreeBSD.org From: Hiroki Sato In-Reply-To: <20110818.043332.27079545013461535.hrs@allbsd.org> References: <20110818.023832.373949045518579359.hrs@allbsd.org> <20110818.043332.27079545013461535.hrs@allbsd.org> X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530 FFD7 4F2C D3D8 2793 CF2D X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org X-Virus-Status: Clean X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3 (mail.allbsd.org [133.31.130.32]); Thu, 18 Aug 2011 09:16:33 +0900 (JST) X-Spam-Status: No, score=-102.2 required=13.0 tests=BAYES_00, CONTENT_TYPE_PRESENT,DIRECTOCNDYN,MIMEQENC,QENCPTR2,RCVD_IN_RP_RNBL, SPF_SOFTFAIL,USER_IN_WHITELIST autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on gatekeeper.allbsd.org Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 00:16:41 -0000

Hiroki Sato wrote
 in <20110818.043332.27079545013461535.hrs@allbsd.org>:

hr> Attilio Rao wrote
hr> in :
hr>
hr> at> 2011/8/17 Hiroki Sato :
hr> at> > Hi,
hr> at> >
hr> at> > Mike Tancsa wrote
hr> at> > in <4E15A08C.6090407@sentex.net>:
hr> at> >
hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
hr> at> > mi> >>
hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
hr> at> > mi> >> for the owner thread was not available.
hr> at> > mi> >>
hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
hr> at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
hr> at> > mi> >> is pure speculation, though.
hr> at> > mi> >
hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
hr> at> > mi> > instead ?
hr> at> >
hr> at> > I am also suffering from a reproducible panic on an 8-STABLE box, an
hr> at> > NFS server with heavy I/O load. I could not get a kernel dump
hr> at> > because this panic locked up the machine just after it occurred, but
hr> at> > according to the stack trace it was the same as posted one.
hr> at> > Switching to an 8.2R kernel can prevent this panic.
hr> at> >
hr> at> > Any progress on the investigation?
hr> at>
hr> at> Hiroki,
hr> at> how easilly can you reproduce it?
hr>
hr> It takes 5-10 hours. I installed another kernel for debugging just
hr> now, so I think I will be able to collect more detail information in
hr> a couple of days.
hr>
hr> at> It would be important to have a DDB textdump with these informations:
hr> at> - bt
hr> at> - ps
hr> at> - show allpcpu
hr> at> - alltrace
hr> at>
hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
hr>
hr> Okay, I will post them once I can get another panic. Thanks!

I got the panic with a crash dump this time.
The result of bt, ps, allpcpu, and traces can be found at the following URL: http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt -- Hiroki From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 00:35:38 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C73D9106566B; Thu, 18 Aug 2011 00:35:38 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 5D32F8FC0C; Thu, 18 Aug 2011 00:35:38 +0000 (UTC) Received: by yxn22 with SMTP id 22so254950yxn.13 for ; Wed, 17 Aug 2011 17:35:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=nwJCiqzwXhEcptHUzCuSNX6Tqtwr9zjUxyoSseNFsi8=; b=qQrWvcZ9eyLkxMPJGE9od7N9AFoXHEpM+5hs+lMTqK9Cv5sighZh+dw5fYXAD3QIsl WDV1H/80GNMSiWqjkLLrzsgPnV6VlCsYE92Ahuxnx4jUa2M0DGs5lG3Qp08WeBYghNcY d983McfPYNhLo8AeJ5jQUU+JdU2lwEaVrzl/A= MIME-Version: 1.0 Received: by 10.236.170.9 with SMTP id o9mr11588yhl.43.1313627737497; Wed, 17 Aug 2011 17:35:37 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.33 with HTTP; Wed, 17 Aug 2011 17:35:37 -0700 (PDT) In-Reply-To: <20110818.091600.831954331552558249.hrs@allbsd.org> References: <20110818.023832.373949045518579359.hrs@allbsd.org> <20110818.043332.27079545013461535.hrs@allbsd.org> <20110818.091600.831954331552558249.hrs@allbsd.org> Date: Thu, 18 Aug 2011 02:35:37 +0200 X-Google-Sender-Auth: MCw4hh_Hde0OfacevQCtfvzP3CU Message-ID: From: Attilio Rao To: Hiroki Sato Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: kostikbel@gmail.com, freebsd-stable@freebsd.org, avg@freebsd.org Subject: Re: panic: spin lock held too long (RELENG_8 from 
today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 00:35:39 -0000 2011/8/18 Hiroki Sato : > Hiroki Sato wrote > =C2=A0in <20110818.043332.27079545013461535.hrs@allbsd.org>: > > hr> Attilio Rao wrote > hr> =C2=A0 in : > hr> > hr> at> 2011/8/17 Hiroki Sato : > hr> at> > Hi, > hr> at> > > hr> at> > Mike Tancsa wrote > hr> at> > =C2=A0in <4E15A08C.6090407@sentex.net>: > hr> at> > > hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote: > hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote: > hr> at> > mi> >> > hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", t= he spinlock > hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgrade= d to the > hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so t= he stack trace > hr> at> > mi> >> for the owner thread was not available. > hr> at> > mi> >> > hr> at> > mi> >> I was unable to make any conclusion from the data that w= as present. > hr> at> > mi> >> If the situation is reproducable, you coulld try to reve= rt r221937. This > hr> at> > mi> >> is pure speculation, though. > hr> at> > mi> > > hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and = revert r221937 > hr> at> > mi> > unless there is any extra debugging you want me to add to= the kernel > hr> at> > mi> > instead =C2=A0? > hr> at> > > hr> at> > =C2=A0I am also suffering from a reproducible panic on an 8-STA= BLE box, an > hr> at> > =C2=A0NFS server with heavy I/O load. =C2=A0I could not get a k= ernel dump > hr> at> > =C2=A0because this panic locked up the machine just after it oc= curred, but > hr> at> > =C2=A0according to the stack trace it was the same as posted on= e. > hr> at> > =C2=A0Switching to an 8.2R kernel can prevent this panic. 
> hr> at> > > hr> at> > =C2=A0Any progress on the investigation? > hr> at> > hr> at> Hiroki, > hr> at> how easilly can you reproduce it? > hr> > hr> =C2=A0It takes 5-10 hours. =C2=A0I installed another kernel for debug= ging just > hr> =C2=A0now, so I think I will be able to collect more detail informati= on in > hr> =C2=A0a couple of days. > hr> > hr> at> It would be important to have a DDB textdump with these informati= ons: > hr> at> - bt > hr> at> - ps > hr> at> - show allpcpu > hr> at> - alltrace > hr> at> > hr> at> Alternatively, a coredump which has the stop cpu patch which Andr= yi can provide. > hr> > hr> =C2=A0Okay, I will post them once I can get another panic. =C2=A0Than= ks! > > =C2=A0I got the panic with a crash dump this time. =C2=A0The result of bt= , ps, > =C2=A0allpcpu, and traces can be found at the following URL: > > =C2=A0http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt I'm not sure I understand it, is also a corefile available? If yes, where I could get it? (with the relevant sources and kernel.debug). Thanks, Attilio --=20 Peace can only be achieved by understanding - A. 
Einstein From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 01:04:35 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 656BA106564A; Thu, 18 Aug 2011 01:04:35 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id DFF838FC08; Thu, 18 Aug 2011 01:04:34 +0000 (UTC) Received: by yxn22 with SMTP id 22so265810yxn.13 for ; Wed, 17 Aug 2011 18:04:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=n4cvQcucnODsJVGi9I5dVUPScuIcb3TwlppDmN49vl8=; b=epFRImjgWwckQCIxi9uEJIJvXJ44SaCcTcHEh8BBxmkJN6WEDFePl0iA/XdZul7toT HxeW81vbc4wkGqAXfe2HPix3sCrRMbh1aqsLRrRdaC2LFEbvTKIhY8e5Pn78xXeBalqU yegfacVxvQFlNq2MGb1j/XF7vvcHr3oXgSVjA= MIME-Version: 1.0 Received: by 10.236.182.66 with SMTP id n42mr113076yhm.128.1313629474018; Wed, 17 Aug 2011 18:04:34 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.33 with HTTP; Wed, 17 Aug 2011 18:04:32 -0700 (PDT) In-Reply-To: <20110818.091600.831954331552558249.hrs@allbsd.org> References: <20110818.023832.373949045518579359.hrs@allbsd.org> <20110818.043332.27079545013461535.hrs@allbsd.org> <20110818.091600.831954331552558249.hrs@allbsd.org> Date: Thu, 18 Aug 2011 03:04:32 +0200 X-Google-Sender-Auth: k_UDqQniEWum2a7YNdHBkktTkYU Message-ID: From: Attilio Rao To: Hiroki Sato Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-stable@freebsd.org, sterling@camdensoftware.com, avg@freebsd.org, Nick Esborn , kostikbel@gmail.com, mdtansca@freebsd.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 01:04:35 -0000

2011/8/18 Hiroki Sato :
> Hiroki Sato wrote
> in <20110818.043332.27079545013461535.hrs@allbsd.org>:
>
> hr> Attilio Rao wrote
> hr> in :
> hr>
> hr> at> 2011/8/17 Hiroki Sato :
> hr> at> > Hi,
> hr> at> >
> hr> at> > Mike Tancsa wrote
> hr> at> > in <4E15A08C.6090407@sentex.net>:
> hr> at> >
> hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> hr> at> > mi> >>
> hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> hr> at> > mi> >> for the owner thread was not available.
> hr> at> > mi> >>
> hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> hr> at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
> hr> at> > mi> >> is pure speculation, though.
> hr> at> > mi> >
> hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> hr> at> > mi> > instead ?
> hr> at> >
> hr> at> > I am also suffering from a reproducible panic on an 8-STABLE box, an
> hr> at> > NFS server with heavy I/O load. I could not get a kernel dump
> hr> at> > because this panic locked up the machine just after it occurred, but
> hr> at> > according to the stack trace it was the same as posted one.
> hr> at> > Switching to an 8.2R kernel can prevent this panic.
> hr> at> >
> hr> at> > Any progress on the investigation?
> hr> at>
> hr> at> Hiroki,
> hr> at> how easilly can you reproduce it?
> hr>
> hr> It takes 5-10 hours. I installed another kernel for debugging just
> hr> now, so I think I will be able to collect more detail information in
> hr> a couple of days.
> hr>
> hr> at> It would be important to have a DDB textdump with these informations:
> hr> at> - bt
> hr> at> - ps
> hr> at> - show allpcpu
> hr> at> - alltrace
> hr> at>
> hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
> hr>
> hr> Okay, I will post them once I can get another panic. Thanks!
>
> I got the panic with a crash dump this time. The result of bt, ps,
> allpcpu, and traces can be found at the following URL:
>
> http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

Actually, I think I see the bug here.

In callout_cpu_switch(), if a low-priority thread is migrating the
callout and gets preempted after the outgoing CPU's queue lock is released
(and is only scheduled again much later), we get this problem.

In order to fix this bug it could be enough to use a critical section,
but I think this should really be interrupt safe, thus I'd wrap them
up with spinlock_enter()/spinlock_exit(). Fortunately
callout_cpu_switch() should be called rarely, and we already do
expensive locking operations in callout, thus we should not have a
problem performance-wise.

Can the guys I also CC'ed here try the following patch, with all the
initial kernel options that were leading you to the deadlock? (Thus
revert any debugging patch/option you added for the moment.)
http://www.freebsd.org/~attilio/callout-fixup.diff

Please note that this patch is for STABLE_8; if you can confirm the
good result I'll commit to -CURRENT and then backmerge as soon as
possible.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 01:29:52 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 075391065672 for ; Thu, 18 Aug 2011 01:29:52 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.westchester.pa.mail.comcast.net (qmta07.westchester.pa.mail.comcast.net [76.96.62.64]) by mx1.freebsd.org (Postfix) with ESMTP id A67CA8FC0C for ; Thu, 18 Aug 2011 01:29:51 +0000 (UTC) Received: from omta24.westchester.pa.mail.comcast.net ([76.96.62.76]) by qmta07.westchester.pa.mail.comcast.net with comcast id Mcft1h0061ei1Bg57dVre2; Thu, 18 Aug 2011 01:29:51 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta24.westchester.pa.mail.comcast.net with comcast id MdVo1h01U1t3BNj3kdVqvu; Thu, 18 Aug 2011 01:29:51 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 842E0102C1A; Wed, 17 Aug 2011 18:29:47 -0700 (PDT) Date: Wed, 17 Aug 2011 18:29:47 -0700 From: Jeremy Chadwick To: freebsd-stable@FreeBSD.org Message-ID: <20110818012947.GA53983@icarus.home.lan> References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org> <20110817175201.GB1973@libertas.local.camdensoftware.com> <20110817210446.GA49737@icarus.home.lan> <20110818000105.GC2489@libertas.local.camdensoftware.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110818000105.GC2489@libertas.local.camdensoftware.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 
01:29:52 -0000 On Wed, Aug 17, 2011 at 05:01:05PM -0700, Chip Camden wrote: > Quoth Jeremy Chadwick on Wednesday, 17 August 2011: > > > > > > I'm also getting similar panics on 8.2-STABLE. Locks up everything and I > > > have to power off. Once, I happened to be looking at the console when it > > > happened and copied dow the following: > > > > > > Sleeping thread (tif 100037, pid 0) owns a non-sleepable lock > > > panic: sleeping thread > > > cpuid=1 > > > > No idea, might be relevant to the thread. > > > > > Another time I got: > > > > > > lock order reversal: > > > 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfr_vnops.c:296 > > > 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587 > > > > > > I didn't copy down the traceback. > > > > "snaplk" refers to UFS snapshots. The above must have been typed in > > manually as well, due to some typos in filenames as well. > > > > Either this is a different problem, or if everyone in this thread is > > doing UFS snapshots (dump -L, mksnap_ffs, etc.) and having this problem > > happen then I recommend people stop using UFS snapshots. I've ranted > > about their unreliability in the past (years upon years ago -- still > > seems valid) and just how badly they can "wedge" a system. This is one > > of the many (MANY!) reasons why we use rsnapshot/rsync instead. The > > atime clobbering issue is the only downside. > > > > If I'm doing UFS snapshots, I didn't know it. The backtrace indicates that a UFS snapshot is being made -- which causes the state to be set to string "snaplk", which is then honoured in vfs_vnops.c. You can see for yourself: grep -r snaplk /usr/src/sys. So yes, I'm inclined to believe something on your system is doing UFS snapshot generation. Whether or not other people are doing it as well is a different story. > Yes, everything was copied manually because it only displays on the > console and the keyboard does not respond after that point. 
So I > copied first to paper, then had to decode my lousy handwriting to put > it in an email. Sorry for the scribal errors. That sounds more or less like what I saw with UFS snapshots: the system would go catatonic in one way or another. It wouldn't "hard lock" (as in if you had powered it off, etc.), it would "live lock" (as in the kernel was wedged or held up/spinning doing something). I never saw a panic as a result of UFS snapshots, only what I described here. TL;DR -- Your system appears to be making UFS snapshots, and that situation is possibly (likely?) unrelated to the sleeping thread issue you see that causes a panic. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 02:55:55 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 72DDB106564A for ; Thu, 18 Aug 2011 02:55:55 +0000 (UTC) (envelope-from sterling@camdensoftware.com) Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com [174.123.46.202]) by mx1.freebsd.org (Postfix) with ESMTP id 386458FC14 for ; Thu, 18 Aug 2011 02:55:55 +0000 (UTC) Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203] helo=_HOSTNAME_) by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1Qtslr-0001Mh-3w for freebsd-stable@freebsd.org; Wed, 17 Aug 2011 19:55:28 -0700 Received: by _HOSTNAME_ (sSMTP sendmail emulation); Wed, 17 Aug 2011 19:55:50 -0700 Date: Wed, 17 Aug 2011 19:55:50 -0700 From: Chip Camden To: freebsd-stable@freebsd.org Message-ID: <20110818025550.GA1971@libertas.local.camdensoftware.com> Mail-Followup-To: freebsd-stable@freebsd.org References: <20110818.023832.373949045518579359.hrs@allbsd.org> 
<20110818.043332.27079545013461535.hrs@allbsd.org> <20110818.091600.831954331552558249.hrs@allbsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="LQksG6bCIzRHxTLp" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Company: Camden Software Consulting URL: http://camdensoftware.com X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - camdensoftware.com X-Source: X-Source-Args: X-Source-Dir: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 02:55:55 -0000 --LQksG6bCIzRHxTLp Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Quoth Attilio Rao on Thursday, 18 August 2011: > 2011/8/18 Hiroki Sato : > > Hiroki Sato wrote > > =A0in <20110818.043332.27079545013461535.hrs@allbsd.org>: > > > > hr> Attilio Rao wrote > > hr> =A0 in : > > hr> > > hr> at> 2011/8/17 Hiroki Sato : > > hr> at> > Hi, > > hr> at> > > > hr> at> > Mike Tancsa wrote > > hr> at> > =A0in <4E15A08C.6090407@sentex.net>: > > hr> at> > > > hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote: > > hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote: > > hr> at> > mi> >> > > hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long",= the spinlock > > hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgra= ded to the > > hr> at> > mi> >> stable/8. 
Unfortunately, machine hung dumping core, so= the stack trace > > hr> at> > mi> >> for the owner thread was not available. > > hr> at> > mi> >> > > hr> at> > mi> >> I was unable to make any conclusion from the data that= was present. > > hr> at> > mi> >> If the situation is reproducable, you coulld try to re= vert r221937. This > > hr> at> > mi> >> is pure speculation, though. > > hr> at> > mi> > > > hr> at> > mi> > Another crash just now after 5hrs uptime. I will try an= d revert r221937 > > hr> at> > mi> > unless there is any extra debugging you want me to add = to the kernel > > hr> at> > mi> > instead =A0? > > hr> at> > > > hr> at> > =A0I am also suffering from a reproducible panic on an 8-STAB= LE box, an > > hr> at> > =A0NFS server with heavy I/O load. =A0I could not get a kerne= l dump > > hr> at> > =A0because this panic locked up the machine just after it occ= urred, but > > hr> at> > =A0according to the stack trace it was the same as posted one. > > hr> at> > =A0Switching to an 8.2R kernel can prevent this panic. > > hr> at> > > > hr> at> > =A0Any progress on the investigation? > > hr> at> > > hr> at> Hiroki, > > hr> at> how easilly can you reproduce it? > > hr> > > hr> =A0It takes 5-10 hours. =A0I installed another kernel for debugging= just > > hr> =A0now, so I think I will be able to collect more detail informatio= n in > > hr> =A0a couple of days. > > hr> > > hr> at> It would be important to have a DDB textdump with these informa= tions: > > hr> at> - bt > > hr> at> - ps > > hr> at> - show allpcpu > > hr> at> - alltrace > > hr> at> > > hr> at> Alternatively, a coredump which has the stop cpu patch which An= dryi can provide. > > hr> > > hr> =A0Okay, I will post them once I can get another panic. =A0Thanks! > > > > =A0I got the panic with a crash dump this time. 
The result of bt, ps,
> > allpcpu, and traces can be found at the following URL:
> >
> > http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt
>
> Actually, I think I see the bug here.
>
> In callout_cpu_switch() if a low priority thread is migrating the
> callout and gets preempted after the outcoming cpu queue lock is left
> (and scheduled much later) we get this problem.
>
> In order to fix this bug it could be enough to use a critical section,
> but I think this should be really interrupt safe, thus I'd wrap them
> up with spinlock_enter()/spinlock_exit(). Fortunately
> callout_cpu_switch() should be called rarely and also we already do
> expensive locking operations in callout, thus we should not have
> problem performance-wise.
>
> Can the guys I also CC'ed here try the following patch, with all the
> initial kernel options that were leading you to the deadlock? (thus
> revert any debugging patch/option you added for the moment):
> http://www.freebsd.org/~attilio/callout-fixup.diff
>
> Please note that this patch is for STABLE_8, if you can confirm the
> good result I'll commit to -CURRENT and then backmarge as soon as
> possible.
>
> Thanks,
> Attilio
>

Thanks, Attilio. I've applied the patch and removed the extra debug options
I had added (though keeping debug symbols). I'll let you know if I
experience any more panics.

Regards,
-- 
.O.
| Sterling (Chip) Camden | http://camdensoftware.com =2E.O | sterling@camdensoftware.com | http://chipsquips.com OOO | 2048R/D6DBAF91 | http://chipstips.com --LQksG6bCIzRHxTLp Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQEcBAEBAgAGBQJOTH82AAoJEIpckszW26+Rm0oH/3Ikeau8F1c55yqTjMh6X78B /3yTy68BsfBwD/VeA00Q/cpxlCafovUeP8WwXPE9mNkdR9Rhf1VuU7K1iLOtbGHe F+UJ/rB8rNPUNxezCqo2kzoMhx2o9NbCiZPW9toyL1lW/pa/B5/lToma8BnbxzOH 2LBSU/8+HU8YphqXr4hPEPFxWUx74tSvieHOEBI1/GVZea2vpUrInO7cfqQ3DzLE /6vnvb0KVfhQjTeeApdFen46eS2mbPl+PtMKGv3C7Ctle+Bv2hm3QhoIc8DCOTTE 9lBdByd2lozIUK+bsc2DMg/+keoW9h1MRVcaNRASOhdx1L6QId6ULdg9Z5QO2G8= =jONj -----END PGP SIGNATURE----- --LQksG6bCIzRHxTLp-- From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 08:16:53 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D4F4106566C for ; Thu, 18 Aug 2011 08:16:53 +0000 (UTC) (envelope-from melifaro@ipfw.ru) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 026B38FC08 for ; Thu, 18 Aug 2011 08:16:53 +0000 (UTC) Received: from dhcp170-36-red.yandex.net ([95.108.170.36]) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1QtxmT-000Nxd-DC; Thu, 18 Aug 2011 12:16:49 +0400 Message-ID: <4E4CCA6C.8020408@ipfw.ru> Date: Thu, 18 Aug 2011 12:16:44 +0400 From: "Alexander V. 
Chernikov" User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.16) Gecko/20110120 Thunderbird/3.0.11 MIME-Version: 1.0 To: perryh@pluto.rain.com References: <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> In-Reply-To: <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org, daniel@digsys.bg Subject: Re: 32GB limit per swap device? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 08:16:53 -0000

On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
> Chuck Swiger wrote:
>
>> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
>>> I am trying to set up 64GB partitions for swap for a system that
>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
>>> 8-stable as of today I get:
>>>
>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
>>>
>>> Is there workaround for this limitation?

Another interesting question:

The swap pager operates in page-sized blocks (PAGE_SIZE=4k on common
architectures). The block device size is passed to swaponsomething() as a
number of _disk_ blocks (e.g. in units of DEV_BSIZE=512). After that, the
maximum object count check of the kernel b-lists (on top of which the swap
pager is built) is enforced.

The (possible) problem is that the real object count we will operate on is
not the value passed to swaponsomething(), so the limit is checked in the
wrong units. We should check the b-list limit against the
(X * DEV_BSIZE / PAGE_SIZE) value, which is roughly (X / 8), so we should
be able to address 32*8=256G.
The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c (revision 223877)
+++ vm/swap_pager.c (working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
     u_long mblocks;
 
     /*
+     * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+     * First chop nblks off to page-align it, then convert.
+     *
+     * sw->sw_nblks is in page-sized chunks now too.
+     */
+    nblks &= ~(ctodb(1) - 1);
+    nblks = dbtoc(nblks);
+
+    /*
      * If we go beyond this, we get overflows in the radix
      * tree bitmap code.
      */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
             mblocks);
         nblks = mblocks;
     }
-    /*
-     * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-     * First chop nblks off to page-align it, then convert.
-     *
-     * sw->sw_nblks is in page-sized chunks now too.
-     */
-    nblks &= ~(ctodb(1) - 1);
-    nblks = dbtoc(nblks);
 
     sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
     sp->sw_vp = vp;

(This moves the page recalculation before the b-list check.)

Can someone comment on this?

>>
>> Apparently, the 32GB swapspace limit is per swap area; you can add
>> up to 4 swap areas so create two or three 32GB swap partitions.
>
> Will that enable a 64GB dump? In 8.1, dumpon(8) says:

The kernel swap pager and the dump facility are completely unrelated to each
other. The only possible relation is that the dumpon rc script searches for
the first swap device in fstab to tell the kernel it should dump to that
device.

>
> The dumpon utility is used to specify a device where the kernel
> can save a crash dump in the case of a panic.
> ...
> For most systems the size of the specified dump device must be
> at least the size of physical memory.
> ...
> The dumpon utility will refuse to enable a dump device which is
> smaller than the total amount of physical memory as reported by
> the hw.physmem sysctl(8) variable.
>
> Note the use of the singluar: "a device" and "the specified device".
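
The unit arithmetic behind the swap-size question in this message can be sketched as follows. This is only an illustration in Python, not the kernel code: the constants are the values quoted in the thread (DEV_BSIZE=512, PAGE_SIZE=4k, and the 67108864 cap from the warning), and the function names capacity_gib() and disk_blocks_to_pages() are hypothetical helpers made up for this sketch.

```python
DEV_BSIZE = 512        # bytes per disk block, as quoted in the thread
PAGE_SIZE = 4096       # bytes per page on common architectures, per the thread
MAX_UNITS = 67108864   # cap printed by the kernel warning quoted above

GIB = 1024 ** 3


def capacity_gib(units, unit_size):
    """Return how many GiB are covered by `units` chunks of `unit_size` bytes."""
    return units * unit_size / GIB


def disk_blocks_to_pages(nblks):
    """Chop a DEV_BSIZE block count down to whole pages, then convert to pages."""
    blocks_per_page = PAGE_SIZE // DEV_BSIZE
    return (nblks - nblks % blocks_per_page) // blocks_per_page


if __name__ == "__main__":
    # The same cap, interpreted as disk blocks and as pages.
    print(capacity_gib(MAX_UNITS, DEV_BSIZE), "GiB if the cap counts disk blocks")
    print(capacity_gib(MAX_UNITS, PAGE_SIZE), "GiB if the cap counts pages")
```

Under these assumptions a 64 GiB partition converts to a page count well below the cap, which is the point of checking the limit after the conversion rather than before it.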
> _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 08:47:28 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B92F106564A for ; Thu, 18 Aug 2011 08:47:28 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 218848FC0A for ; Thu, 18 Aug 2011 08:47:27 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7I8lQAc037584 for ; Thu, 18 Aug 2011 01:47:27 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4CD19E.5070108@rawbw.com> Date: Thu, 18 Aug 2011 01:47:26 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: WD Advanced Format: do I need to do something special? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 08:47:28 -0000 WD has sectors of the size 4kB in their latest hard drives, which is different from the traditional 512B. http://www.wdc.com/advformat http://wdc.custhelp.com/app/answers/detail/a_id/5655 These articles assert that something special should be done in OS to enable high performance of such drives. For ex. WD recommends to install some latest drivers of particular version. But what about FreeBSD? 
Should it be configured in some special way too for these drive to perform well? Is it aware of 4kB sector size? Yuri From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 09:17:28 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EE7A8106566C for ; Thu, 18 Aug 2011 09:17:28 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta08.emeryville.ca.mail.comcast.net (qmta08.emeryville.ca.mail.comcast.net [76.96.30.80]) by mx1.freebsd.org (Postfix) with ESMTP id D599B8FC16 for ; Thu, 18 Aug 2011 09:17:28 +0000 (UTC) Received: from omta14.emeryville.ca.mail.comcast.net ([76.96.30.60]) by qmta08.emeryville.ca.mail.comcast.net with comcast id MlHQ1h0021HpZEsA8lHQdR; Thu, 18 Aug 2011 09:17:24 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta14.emeryville.ca.mail.comcast.net with comcast id MlHT1h0011t3BNj8alHTup; Thu, 18 Aug 2011 09:17:27 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 409C9102C1A; Thu, 18 Aug 2011 02:17:27 -0700 (PDT) Date: Thu, 18 Aug 2011 02:17:27 -0700 From: Jeremy Chadwick To: Yuri Message-ID: <20110818091727.GA61715@icarus.home.lan> References: <4E4CD19E.5070108@rawbw.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E4CD19E.5070108@rawbw.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: WD Advanced Format: do I need to do something special? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 09:17:29 -0000 On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote: > WD has sectors of the size 4kB in their latest hard drives, which is > different from the traditional 512B. 
> http://www.wdc.com/advformat
> http://wdc.custhelp.com/app/answers/detail/a_id/5655
>
> These articles assert that something special should be done in OS to
> enable high performance of such drives. For ex. WD recommends to
> install some latest drivers of particular version.
> But what about FreeBSD? Should it be configured in some special way
> too for these drive to perform well?
> Is it aware of 4kB sector size?

The below advice still applies. Do not skim the page, read it.

http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html

You will therefore have to go through some manual rigmarole (preferably
with gpart(8)) to ensure performance. If you plan on using the disks in
ZFS, you get to go through some extra rigmarole.

Also be aware that mixed physical sector sizes on things like RAID (and
possibly ZFS?) may result in abysmal performance. I just got done
assisting a user on a forum who had horrible performance on his 2-disk
RAID-1 array driven by an Intel ICH9R using Intel's native RST driver
under 64-bit Windows.

How/why? He bought two drives, both WD10EADS (not a typo). However, one
drive was WD10EADS-65M2BX (firmware 01.00A01, 512 byte physical, 512
byte logical) while the other was WD10EADS-11M2B1 (firmware 80.00A80,
4096 byte physical, 512 byte logical). He replaced the WD10EADS-65M2BX
drive with another 4KB physical drive and his performance problem
disappeared.

I only point this out because this could happen to any user. "Oh I need
to get a replacement WD10EADS drive for my system... what the heck?!?"
This is going to confuse a lot of people, and caught me by surprise when
I saw it. Shame on Western Digital for not adjusting the model string!

By comparison, the WD "EARS"-model drives have always been 4KByte
physical / 512 byte logical. The logical size is set to 512 to ensure
full compatibility with existing and legacy OSes.

I'm dreading the day the WD Caviar Black models succumb to all this
nonsense.
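
Why alignment matters on a drive with 4096-byte physical and 512-byte logical sectors can be shown with a small sketch. It assumes exactly that geometry and models only the address arithmetic, not any real driver behaviour; the function names are hypothetical.

```python
# Sketch: which physical sectors a write touches on a 512e drive.
# Assumes 512-byte logical and 4096-byte physical sectors.
LOGICAL = 512
PHYSICAL = 4096
RATIO = PHYSICAL // LOGICAL  # logical sectors per physical sector


def physical_sectors_touched(start_lba, count):
    """Number of physical sectors covered by count logical sectors."""
    first = start_lba // RATIO
    last = (start_lba + count - 1) // RATIO
    return last - first + 1


def needs_read_modify_write(start_lba, count):
    """True if the write only partly covers some physical sector."""
    return start_lba % RATIO != 0 or (start_lba + count) % RATIO != 0


if __name__ == "__main__":
    # A 4 KiB write (8 logical sectors) at two different start LBAs.
    for lba in (63, 64):
        print(lba, physical_sectors_touched(lba, 8),
              needs_read_modify_write(lba, 8))
```

In this model a 4 KiB write starting at LBA 63 straddles two physical sectors and covers neither completely, while the same write at LBA 64 covers exactly one.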
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 09:21:21 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CABC9106566C; Thu, 18 Aug 2011 09:21:21 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AB12E8FC22; Thu, 18 Aug 2011 09:21:20 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA10730; Thu, 18 Aug 2011 12:21:17 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4CD98C.1000301@FreeBSD.org> Date: Thu, 18 Aug 2011 12:21:16 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> 
<4019027648B5493AAC4B654BD821DE88@multiplay.co.! uk> In-Reply-To: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 09:21:21 -0000 on 18/08/2011 02:15 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" > >> Thanks to the debug that Steven provided and to the help that I received from >> Kostik, I think that now I understand the basic mechanics of this panic, but, >> unfortunately, not the details of its root cause. >> >> It seems like everything starts with some kind of a race between terminating >> processes in a jail and termination of the jail itself. This is where the >> details are very thin so far. What we see is that a process (http) is in >> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT >> flag is set and even past the place where p_limit is freed and reset to NULL. >> At that place the thread calls prison_proc_free(), which calls prison_deref(). >> Then, we see that in prison_deref() the thread gets a page fault because of what >> seems like a NULL pointer dereference. That's just the start of the problem and >> its root cause. > > Thats interesting, are you using http as an example or is that something thats > been gleaned from the debugging of our output? I ask as there's only one process > running in each of our jails and thats a single java process. 
It's from the debug data: p_comm = "httpd" I also would like to ask you to revert the last patch that I sent you (with tf_rip comparisons) and try the patch from Kostik instead. Given what we suspect about the problem, can please also try to provoke the problem by e.g. doing frequent jail restarts or something else that supposedly should hit the bug. -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 09:28:11 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5CA9B1065677 for ; Thu, 18 Aug 2011 09:28:11 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1FE598FC0A for ; Thu, 18 Aug 2011 09:28:10 +0000 (UTC) Received: by gwb15 with SMTP id 15so895286gwb.13 for ; Thu, 18 Aug 2011 02:28:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=LZd0CnVhQzZ+BCDJsG6KurDrxeOi40qJiirjkHrqR8M=; b=mwMnFnNP5kH9QDzBGawGPrPvM4OUEA3RfGB5kifEboisQGyyCUt45wYLp3uWbq5BDe IHvn21Kw8U0SpVlDyQkccQhgzsKpu2OiCoaH0CLmSr4PYXsdTIROJbPsrqP0wFqrf7DF 2nX/j/RomDuyAYHQH+Zb4GRAgeHmprAHpmAmM= MIME-Version: 1.0 Received: by 10.151.157.11 with SMTP id j11mr446965ybo.392.1313658029060; Thu, 18 Aug 2011 02:00:29 -0700 (PDT) Received: by 10.150.136.11 with HTTP; Thu, 18 Aug 2011 02:00:29 -0700 (PDT) In-Reply-To: <4E4CD19E.5070108@rawbw.com> References: <4E4CD19E.5070108@rawbw.com> Date: Thu, 18 Aug 2011 02:00:29 -0700 Message-ID: From: Xin LI To: Yuri Content-Type: text/plain; charset=UTF-8 Cc: freebsd-stable@freebsd.org Subject: Re: WD Advanced Format: do I need to do something special? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 09:28:11 -0000

Hi,

On Thu, Aug 18, 2011 at 1:47 AM, Yuri wrote:
> WD has sectors of the size 4kB in their latest hard drives, which is
> different from the traditional 512B.
> http://www.wdc.com/advformat
> http://wdc.custhelp.com/app/answers/detail/a_id/5655
>
> These articles assert that something special should be done in OS to enable
> high performance of such drives. For ex. WD recommends to install some
> latest drivers of particular version.
> But what about FreeBSD? Should it be configured in some special way too for
> these drive to perform well?
> Is it aware of 4kB sector size?

The FreeBSD driver detects 4k drives. At this time, as far as I know, all AF
drives on the market advertise 512-byte sectors rather than 4k (mostly for
compatibility with BIOS, etc). If they advertised 4k sectors natively, you
would not have to do anything special, but currently you need to make sure:

 - FS partitions start at a 4k boundary;
 - The FS is aware of the 4k sector size, e.g. through gnop -S 4k for ZFS,
   which will remember this so you don't have to do that at a later time.
   For UFS you may want to specify a larger fragment size and block size
   (4k/32k for example).

Some newly developed applications like FreeNAS already detect this and make
the adjustment for you by default. We need to check and make sure that our
base system tools, especially the installer, would do that though.

Cheers,
--
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!
Live free or die From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 09:55:39 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DBF6F106566C for ; Thu, 18 Aug 2011 09:55:38 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id C88B08FC16 for ; Thu, 18 Aug 2011 09:55:38 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7I9tbTB049134; Thu, 18 Aug 2011 02:55:38 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4CE199.8030104@rawbw.com> Date: Thu, 18 Aug 2011 02:55:37 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: Jeremy Chadwick References: <4E4CD19E.5070108@rawbw.com> <20110818091727.GA61715@icarus.home.lan> In-Reply-To: <20110818091727.GA61715@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org Subject: Re: WD Advanced Format: do I need to do something special? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 09:55:39 -0000 On 08/18/2011 02:17, Jeremy Chadwick wrote: > The below advice still applies. Do not skim the page, read it. > > http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html > > You will therefore have to go through some manual rigmarole (preferably > with gpart(8)) to ensure performance. If you plan on using the disks in > ZFS, you get to go through some extra rigmarole. I didn't know about such extra actions that are required and just created ZFS pool. 
zdb -C shows ashift as 9. I read it as meaning that sector size if 512bytes (wrong!). But I tested the 25GB file writing/reading speed on the middle tracks and it seems reasonable: WR 55MB/s RD 107MB/s So can I get even better speeds if it was aware of 4k sector? Yuri From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 10:09:13 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5BB1F106564A for ; Thu, 18 Aug 2011 10:09:13 +0000 (UTC) (envelope-from marc@blackend.org) Received: from smtp6-g21.free.fr (unknown [IPv6:2a01:e0c:1:1599::15]) by mx1.freebsd.org (Postfix) with ESMTP id C86738FC17 for ; Thu, 18 Aug 2011 10:09:11 +0000 (UTC) Received: from emphyrio.blackend.org (unknown [88.179.1.53]) by smtp6-g21.free.fr (Postfix) with ESMTP id 228F6822A5; Thu, 18 Aug 2011 12:09:03 +0200 (CEST) Received: from emphyrio.blackend.org (localhost [127.0.0.1]) by emphyrio.blackend.org (8.14.5/8.14.4) with ESMTP id p7IAAZkF002328; Thu, 18 Aug 2011 12:10:35 +0200 (CEST) (envelope-from marc@emphyrio.blackend.org) Received: (from marc@localhost) by emphyrio.blackend.org (8.14.5/8.14.4/Submit) id p7IAAYAg002327; Thu, 18 Aug 2011 12:10:34 +0200 (CEST) (envelope-from marc) Date: Thu, 18 Aug 2011 12:10:34 +0200 From: Marc Fonvieille To: Yuri Message-ID: <20110818101034.GA1958@emphyrio.blackend.org> References: <4E4CD19E.5070108@rawbw.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E4CD19E.5070108@rawbw.com> X-Useless-Header: blackend.org X-Operating-System: FreeBSD 8.2-STABLE User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: WD Advanced Format: do I need to do something special? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 10:09:13 -0000 On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote: > WD has sectors of the size 4kB in their latest hard drives, which is > different from the traditional 512B. > http://www.wdc.com/advformat > http://wdc.custhelp.com/app/answers/detail/a_id/5655 > > These articles assert that something special should be done in OS to > enable high performance of such drives. For ex. WD recommends to install > some latest drivers of particular version. > But what about FreeBSD? Should it be configured in some special way too > for these drive to perform well? > Is it aware of 4kB sector size? > I own that (I'm running 8-STABLE): ada0 at ahcich2 bus 0 scbus2 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) which has 4kB sectors but says "512 byte sectors" :) I use the whole disk for the FreeBSD slice, I aligned all partitions on a multiple of 8 sectors (512*8=4096). 
By default fdisk(8) uses a default offset of 63 sectors:

******* Working on device /dev/ada0 *******
parameters extracted from in-core disklabel are:
cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 1953525105 (953869 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

Look at the "start 63" statement. Instead of fixing the fdisk(8) behavior, I
just edited my bsdlabel(8) table to compensate:

# /dev/ada0s1:
8 partitions:
#         size     offset    fstype   [fsize bsize bps/cpg]
  a:    4194304         17    4.2BSD        0     0     0
  b:    8388608    4194321      swap
  c: 1953525105          0    unused        0     0         # "raw" part, don't edit
  d:   16777216   12582929    4.2BSD        0     0     0
  e: 1924163584   29360145    4.2BSD        0     0     0

The important part is the offset 17, which corrects the fdisk(8) offset
(16+1: the slice starts at sector 63, and 63 + 17 = 80, a multiple of 8).
The remaining offsets are calculated from the sizes I gave for the
partitions (in MB, which can be divided by 8).

Then I used newfs(8) with the option "-f 4096".

There's another painful issue with this disk: the automatic head-parking
after a few seconds. I disabled it (with wdidle3) because after 2 months of
use, I was at more than 35000 head-parkings...
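
The offset arithmetic described in this message can be checked mechanically. The sketch below takes the slice start (63) and the bsdlabel offsets from the message and assumes the label offsets are relative to the start of the slice, which is how the "16+1" remark reads; the helper names are hypothetical.

```python
# Sketch: check that slice start + label offset lands on a 4 KiB boundary.
# Values are copied from the fdisk/bsdlabel output in the message;
# treating label offsets as slice-relative is an assumption.
SLICE_START = 63
LABEL_OFFSETS = {"a": 17, "b": 4194321, "d": 12582929, "e": 29360145}
SECTORS_PER_4K = 4096 // 512


def absolute_lba(slice_start, offset):
    """Absolute start sector of a partition inside the slice."""
    return slice_start + offset


def is_4k_aligned(lba):
    """True if the 512-byte sector number sits on a 4 KiB boundary."""
    return lba % SECTORS_PER_4K == 0


if __name__ == "__main__":
    for name, offset in sorted(LABEL_OFFSETS.items()):
        lba = absolute_lba(SLICE_START, offset)
        print(name, lba, "aligned" if is_4k_aligned(lba) else "MISALIGNED")
```

With these numbers every partition starts on a multiple of 8 sectors, whereas the slice start of 63 on its own does not.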
-- Marc From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 10:47:04 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0AC10106566C; Thu, 18 Aug 2011 10:47:04 +0000 (UTC) (envelope-from prvs=12111cb08a=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 312988FC18; Thu, 18 Aug 2011 10:47:02 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 11:35:21 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 11:35:21 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014645776.msg; Thu, 18 Aug 2011 11:35:20 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12111cb08a=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Andriy Gapon" References: uk> <4E4CD98C.1000301@FreeBSD.org> Date: Thu, 18 Aug 2011 11:35:58 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 
2011 10:47:04 -0000 ----- Original Message ----- From: "Andriy Gapon" >> Thats interesting, are you using http as an example or is that something thats >> been gleaned from the debugging of our output? I ask as there's only one process >> running in each of our jails and thats a single java process. > > > It's from the debug data: p_comm = "httpd" Hmm, there's only one httpd thats ever run on the machine and thats not in the jail its on the raw machine. > I also would like to ask you to revert the last patch that I sent you (with tf_rip > comparisons) and try the patch from Kostik instead. Sure. > Given what we suspect about the problem, can please also try to provoke the > problem by e.g. doing frequent jail restarts or something else that supposedly > should hit the bug. I've tried doing this for quite some days on the test machine, but I've been unable to provoke it, will continue to try. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 11:11:08 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32E58106566C; Thu, 18 Aug 2011 11:11:08 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 21EA88FC1B; Thu, 18 Aug 2011 11:11:06 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA12584; Thu, 18 Aug 2011 14:11:04 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4CF347.6030908@FreeBSD.org> Date: Thu, 18 Aug 2011 14:11:03 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: uk> <4E4CD98C.1000301@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 11:11:08 -0000 on 18/08/2011 13:35 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" >>> Thats interesting, are you using http as an example or is that something thats >>> been gleaned from the debugging of our output? I ask as there's only one process >>> running in each of our jails and thats a single java process. 
>> >> >> It's from the debug data: p_comm = "httpd" > > Hmm, there's only one httpd thats ever run on the machine and thats not in the jail > its on the raw machine. Probably I have mistakenly assumed that the 'prison' in prison_derefer() has something to do with an actual jail, while it could have been just prison0 where all non-jailed processes belong. -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 11:26:06 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17692106566C; Thu, 18 Aug 2011 11:26:06 +0000 (UTC) (envelope-from prvs=12111cb08a=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 08B508FC16; Thu, 18 Aug 2011 11:26:04 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 12:24:30 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 12:24:30 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014646198.msg; Thu, 18 Aug 2011 12:24:29 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12111cb08a=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Andriy Gapon" References: uk> <4E4CD98C.1000301@FreeBSD.org> <4E4CF347.6030908@FreeBSD.org> Date: Thu, 18 Aug 2011 12:25:12 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE 
V6.00.2900.6109 Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 11:26:06 -0000 ----- Original Message ----- From: "Andriy Gapon" > Probably I have mistakenly assumed that the 'prison' in prison_derefer() has > something to do with an actual jail, while it could have been just prison0 where > all non-jailed processes belong. That makes sense as this particular panic was caused by a machine reboot, which is slightly different from the more common jail panic we're seeing. Doesn't help with our reproduction scenario though unfortunately. If we don't have any joy reproducing on our single test machine I'll have this kernel rolled out across a portion of the farm, which should mean we see the panic results in a few days time. I understand there's a risk involved in this but, its important for us to determine the cause and get a confirmed fix, as well as being able to prove that the panic fix works which will help everyone in the long run. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 13:58:09 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29C121065676; Thu, 18 Aug 2011 13:58:09 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 6DA6C8FC1C; Thu, 18 Aug 2011 13:58:08 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 99C2828426; Thu, 18 Aug 2011 15:58:06 +0200 (CEST) Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id A345328424; Thu, 18 Aug 2011 15:58:05 +0200 (CEST) Message-ID: <4E4D1A6C.7060604@quip.cz> Date: Thu, 18 Aug 2011 15:58:04 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Artem Belevich References: <4E4BC38D.1050808@quip.cz> <4E4BCCC3.60601@digsys.bg> <4E4C1945.5030504@quip.cz> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org, Daniel Kalchev Subject: Re: can not boot from RAIDZ with 8-STABLE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 13:58:09 -0000 Artem Belevich wrote: > On Wed, Aug 17, 2011 at 12:40 PM, Miroslav Lachman<000.fbsd@quip.cz> wrote: >> Thank you guys, you are right. The BIOS provides only 1 disk to the loader! >> I checked it from loader prompt by lsdev (booted from USB external HDD). 
>>
>> So I will try to make a small zpool mirror for root and boot (if ZFS mirror
>> can be made of 4 providers instead of two) and the rest will be in RAIDZ.
>>
>> If that fails, I will go my old way with internal USB flash disk with UFS
>> for booting and RAIDZ of 4 disks for storage as I did it few years ago with
>> 7.0 or 7.1.
>
> You seem to be booting from disks attached to some sort of add-on
> card. Sometimes those have per-disk 'bootable' option in their own
> extension ROM. You may investigate yours. Perhaps all you need to do
> is just tweak controller settings.

The advanced controller settings allow me to choose which disk will be
bootable, but I can mark just one of them, not all.

So my working setup is made from 2 pools. The first is a 4-way ZFS mirror
for / (root), the second is RAIDZ for the rest (plus swap made on top of
gmirrored partitions).

Each disk has the following partitions:

# gpart show da0
=>       34  976773101  da0  GPT  (465G)
         34        128    1  freebsd-boot  (64k)
        162    8388608    2  freebsd-swap  (4.0G)
    8388770   20971520    3  freebsd-zfs  (10G)
   29360290  943718400    4  freebsd-zfs  (450G)
  973078690    3694445       - free -  (1.8G)

# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
sys   9.94G   781M  9.17G     7%  1.00x  ONLINE  -
tank  1.75T  4.77G  1.75T     0%  1.00x  ONLINE  -

Filesystem                 Size  Mounted on
sys/root                   9.8G  /
devfs                      1.0k  /dev
tank/tmp                   1.3T  /tmp
tank/usr/home              1.3T  /usr/home
tank/usr/home/quip         1.3T  /usr/home/quip
tank/usr/local             1.3T  /usr/local
tank/usr/obj               1.3T  /usr/obj
tank/usr/ports             1.3T  /usr/ports
tank/usr/ports/distfiles   1.3T  /usr/ports/distfiles
tank/usr/ports/packages    1.3T  /usr/ports/packages
tank/usr/src               1.3T  /usr/src
tank/var/amavis            1.3T  /var/amavis
tank/var/audit             1.3T  /var/audit
tank/var/crash             1.3T  /var/crash
tank/var/db                1.3T  /var/db
tank/var/db/mysql          1.3T  /var/db/mysql
tank/var/log               1.3T  /var/log
tank/var/mail              1.3T  /var/mail
tank/var/tmp               1.3T  /var/tmp
tank/var/virusmails        1.3T  /var/virusmails
tank/vol0                  1.3T  /vol0

I hope this helps somebody with a similar problem.
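
The gpart layout listed in this message can be sanity-checked with a few lines of arithmetic. This sketch copies the start and size columns from the "gpart show da0" output and assumes 512-byte sectors; the helper names are hypothetical and nothing here calls gpart itself.

```python
# Sketch: verify that the partitions listed by "gpart show da0" are
# contiguous and that their sizes match the reported figures.
# Sector size of 512 bytes is an assumption.
SECTOR = 512
TABLE_START, TABLE_SIZE = 34, 976773101
ROWS = [
    (34, 128, "freebsd-boot"),
    (162, 8388608, "freebsd-swap"),
    (8388770, 20971520, "freebsd-zfs"),
    (29360290, 943718400, "freebsd-zfs"),
    (973078690, 3694445, "free"),
]


def is_contiguous(rows, table_start, table_size):
    """True if each row starts where the previous one ended."""
    expected = table_start
    for start, size, _label in rows:
        if start != expected:
            return False
        expected = start + size
    return expected == table_start + table_size


def size_gib(sectors, sector_size=SECTOR):
    """Size of a run of sectors in GiB."""
    return sectors * sector_size / 2 ** 30


if __name__ == "__main__":
    print("contiguous:", is_contiguous(ROWS, TABLE_START, TABLE_SIZE))
    for start, size, label in ROWS:
        print(label, start, round(size_gib(size), 2), "GiB")
```

The same loop is an easy place to add further checks, for example whether each start sector is a multiple of 8 if the disks turn out to have 4 KiB physical sectors.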
Miroslav Lachman From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 14:31:28 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E2C4F106566B; Thu, 18 Aug 2011 14:31:28 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id B50648FC1F; Thu, 18 Aug 2011 14:31:27 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA15396; Thu, 18 Aug 2011 17:31:23 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4D222E.2090802@FreeBSD.org> Date: Thu, 18 Aug 2011 17:31:10 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland , freebsd-jail@FreeBSD.org References: uk> <4E4CD98C.1000301@FreeBSD.org> <4E4CF347.6030908@FreeBSD.org> In-Reply-To: <4E4CF347.6030908@FreeBSD.org> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-hackers , freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 14:31:29 -0000 on 18/08/2011 14:11 Andriy Gapon said the following: > Probably I have mistakenly assumed that the 'prison' in prison_derefer() has > something to do with an actual jail, while it could have been just prison0 where > all non-jailed processes belong. 
So, indeed:

(kgdb) p $2->p_ucred->cr_prison
$10 = (struct prison *) 0xffffffff807d5080
(kgdb) p &prison0
$11 = (struct prison *) 0xffffffff807d5080
(kgdb) p *$2->p_ucred->cr_prison
$12 = {pr_list = {tqe_next = 0x0, tqe_prev = 0x0}, pr_id = 0, pr_ref = 398, pr_uref = 0, pr_flags = 386,
  pr_children = {lh_first = 0x0}, pr_sibling = {le_next = 0x0, le_prev = 0x0}, pr_parent = 0x0,
  pr_mtx = {lock_object = {lo_name = 0xffffffff8063007c "jail mutex", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
  pr_task = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0, ta_context = 0x0},
  pr_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}},
  pr_cpuset = 0xffffff0012d65dc8, pr_vnet = 0x0, pr_root = 0xffffff00166ebce8, pr_ip4s = 0, pr_ip6s = 0, pr_ip4 = 0x0, pr_ip6 = 0x0,
  pr_sparep = {0x0, 0x0, 0x0, 0x0}, pr_childcount = 0, pr_childmax = 999999, pr_allow = 127, pr_securelevel = -1, pr_enforce_statfs = 0,
  pr_spare = {0, 0, 0, 0, 0}, pr_hostid = 3251597242, pr_name = "0", '\0' , pr_path = "/", '\0' , pr_hostname = "censored", '\0' ,
  pr_domainname = '\0' , pr_hostuuid = "54443842-0054-2500-902c-0025902c3cb0", '\0' }

Also, let's consider this code:

    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

The most suspicious thing is that pr_uref is zero in the debug data. With INVARIANTS we would hit the "prison0 pr_uref=0" KASSERT. Then, because this is prison0 and because pr_uref reached zero, tpr gets assigned to NULL.
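To make the walk concrete, here is a small userland model of the PD_DEUREF loop. It is a sketch only: the struct is cut down to the two fields that matter, the mutex calls and the KASSERT are left out, and instead of actually following a NULL pr_parent the model returns NULL to mark the point where the kernel would go on to touch tpr->pr_mtx. The names model_prison and deuref_walk are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct prison: only what the loop reads and writes. */
struct model_prison {
    struct model_prison *pr_parent; /* NULL for the top-level prison (prison0) */
    int pr_uref;                    /* user reference count */
};

/*
 * Drop one user reference from pr and walk towards the root for as long as
 * the count reaches zero.  Returns the prison where the walk stopped, or
 * NULL if the count of a parentless prison hit zero, i.e. the case where
 * the real loop would continue with tpr == NULL.
 */
static struct model_prison *
deuref_walk(struct model_prison *pr)
{
    struct model_prison *tpr;

    assert(pr != NULL);
    for (tpr = pr;; tpr = tpr->pr_parent) {
        if (--tpr->pr_uref > 0)
            break;          /* this prison still has user references */
        if (tpr->pr_parent == NULL)
            return NULL;    /* walked off the top of the hierarchy */
    }
    return tpr;
}
```

Calling deuref_walk() on a parentless prison whose pr_uref is 1 returns NULL, which corresponds to the prison0 state seen in the dump, while a jail whose parent still holds user references simply returns the parent.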
And then because tpr != pr we try to execute mtx_unlock(&tpr->pr_mtx). That's where the NULL pointer deref happens. So, now the big question is how/why we reached pr_uref == 0. -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 17:04:26 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E011D106564A for ; Thu, 18 Aug 2011 17:04:26 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 438BF8FC15 for ; Thu, 18 Aug 2011 17:04:25 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA17015; Thu, 18 Aug 2011 20:04:11 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4D460A.2080100@FreeBSD.org> Date: Thu, 18 Aug 2011 20:04:10 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Andrew Boyer References: In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org, Eugene Grosbein , Vishal.Shah@netapp.com, Hans Petter Selasky , Jeremiah Lott , Steven Hartland Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 17:04:27 -0000 on 12/08/2011 22:59 Andrew Boyer said the following: > Re: panic: bufwrite: buffer is not busy??? 
(originally on freebsd-net) > > Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable) > > Re: System hang in USB umass module while processing panic (originally on > freebsd-usb) > > Hello Andriy and Hans, > > Sorry for tying in so many discussions on this topic, but I think I have an > explanation for the problems we have been reporting* with hanging coredumps on > multicore systems on 8.2-RELEASE, and it has implications for Andriy's proposed > scheduler patch** and for USB. > > In today's 8.X and 9.X branches, nothing that I can find stops the other CPUs when > the kernel panics, but many parts of the locking code get disabled (grep on > 'panicstr'). The 'bufwrite: buffer is not busy???' panic is caused by the syncer > encountering an error. If that happens when it's on the dumping CPU everything > hangs. If it's running on a different CPU, it will be blocked and hidden by the > panic_cpu spinlock in panic(), and the dump continues, polling every attached > keyboard for a Ctl-C. > > But, the new 8.X USB stack relies on multithreading. (The new stack is the > variable that broke coredumps for us in the 7.1->8.2 transition, I think.) SVN > 224223 fixes a hang that would happen when dumpsys() polls the USB keyboard (IPMI > KVM, in our case). That helps, but it only gets as far as usb_process(), where it > hangs in a loop around a cv_wait() call. This is easy to reproduce by adding code > to the watchdog to break into the debugger if panicstr is set. > > I am experimenting with Andriy's patch** to stop the scheduler and it seems to be > most of the way there, stopping the CPUs and disabling the rest of locking. There > are a few places that still reference panicstr, but that's minor. These are the > changes I made to the patch: > * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is true, so > that we don't hang up in USB. 
ukbd_yield() locks up in DROP_GIANT(), and if you > skip ukbd_yield(), usbd_transfer_poll() locks up trying to drop mutexes. Hmm, this is a little bit unexpected. I thought that with the patch all the mutex/lock operations would be skipped. Can you please check which locks give you the trouble and why? I would like to improve the patch, so that all lock operations are by-passed (whether locking or unlocking). > * Changed the call to spinlock_enter() back to critical_enter(), so that > interrupts stay enabled and the hardclock still functions. Not sure if I like this idea in general. > * Added code in the beginning of panic() to switch to CPU 0, so that we're able > to service the hardclock interrupts and so that watchdog panics get through. Also I wouldn't like switching a panic thread to a different CPU as that messes with a lot of state and is not safe for an arbitrary context. Also, can you please clarify what you meant by "watchdog panics get through"? Are you talking about SW_WATCHDOG specifically? > This has worked 100% for me so far, although anyone using a USB keyboard or dump > device would still be out of luck. > > Thoughts? It seems like stopping all of the other CPUs is the right thing to do > on a panic (what are they doing otherwise?). Are the USB issues fixable? If > Andriy's patch get committed it might just involve short-circuiting all of the > locking in the polling path, but I haven't gotten that far yet. I bet dumping to > NFS will have the same problem. I think that no subsystem should rely on working scheduling and interrupts in the post-panic world. In fact, all the code for skipping locking is just a giant hack/workaround in my opinion. Ideally, all the subsystems that can be expected to be called after a panic should be aware of that and should check for that. So they should not attempt any locking or switching threads or rebinding CPUs or expect interrupts, etc.
The environment should mirror early boot where we have only one CPU, only one thread, no interrupts, only polling. If you can help Hans to figure out what is wrong with the USB subsystem in this respect that would help us all. Thank you for your testing and feedback! -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 20:09:38 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 807AB1065672; Thu, 18 Aug 2011 20:09:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 786778FC0A; Thu, 18 Aug 2011 20:09:37 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA18855; Thu, 18 Aug 2011 23:09:36 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Qu8uF-000H9C-QY; Thu, 18 Aug 2011 23:09:35 +0300 Message-ID: <4E4D717F.3090802@FreeBSD.org> Date: Thu, 18 Aug 2011 23:09:35 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110817 Thunderbird/6.0 MIME-Version: 1.0 To: freebsd-hackers@FreeBSD.org References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org>
<6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> In-Reply-To: <4E4C22D6.6070407@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 20:09:38 -0000 on 17/08/2011 23:21 Andriy Gapon said the following: > It seems like everything starts with some kind of a race between terminating > processes in a jail and termination of the jail itself. This is where the > details are very thin so far. What we see is that a process (http) is in > exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT > flag is set and even past the place where p_limit is freed and reset to NULL. > At that place the thread calls prison_proc_free(), which calls prison_deref(). > Then, we see that in prison_deref() the thread gets a page fault because of what > seems like a NULL pointer dereference. That's just the start of the problem and > its root cause. > > Then, trap_pfault() gets invoked and, because addresses close to NULL look like > userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes > on to call vm_map_growstack. First thing that vm_map_growstack does is a call > to lim_cur(), but because p_limit is already NULL, that call results in a NULL > pointer dereference and a page fault. Goto the beginning of this paragraph. > > So we get this recursion of sorts, which only ends when a stack is exhausted and > a CPU generates a double-fault. 
BTW, does anyone have an idea why the thread in question would "disappear" from the kgdb's point of view? (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid $3 = 102057 (kgdb) tid 102057 invalid tid info threads also doesn't list the thread. Is it because the panic happened while the thread was somewhere in exit1()? Is there an easy way to examine its stack in this case? -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 20:11:46 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 79E591065674; Thu, 18 Aug 2011 20:11:46 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1A8BA8FC08; Thu, 18 Aug 2011 20:11:45 +0000 (UTC) Received: by yib19 with SMTP id 19so2062868yib.13 for ; Thu, 18 Aug 2011 13:11:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=mJIC3dzSZFEfm+d2PPNPw14QLDbIUvhvYDXvpi+2xIs=; b=EEnO+Rzf3v4M5H2oK39bX1ICVx+TjHgVcpuYSLAx7JdLCTGDefHPWI0jZsA+SGiDeR MVd0NLPHbR9Mxtwh0kMu4xwL7Nc80S8M92vIzSpVhjYHmPsI8j/83BEQxriB/Qlw6Df4 sAFhZWoqMqR6hQ2w1rPVGiCptnAIxMw0m1YEA= MIME-Version: 1.0 Received: by 10.236.143.5 with SMTP id k5mr1139332yhj.9.1313698305236; Thu, 18 Aug 2011 13:11:45 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.33 with HTTP; Thu, 18 Aug 2011 13:11:44 -0700 (PDT) In-Reply-To: <4E4D717F.3090802@FreeBSD.org> References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org>
<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org> Date: Thu, 18 Aug 2011 22:11:44 +0200 X-Google-Sender-Auth: i75Ofelh7IObcWFwDsnjQQzRudA Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 20:11:46 -0000 2011/8/18 Andriy Gapon : > on 17/08/2011 23:21 Andriy Gapon said the following: >> >> It seems like everything starts with some kind of a race between >> terminating >> processes in a jail and termination of the jail itself. This is where the >> details are very thin so far. What we see is that a process (http) is in >> exit(2) syscall, in exit1() function actually, and past the place where >> P_WEXIT >> flag is set and even past the place where p_limit is freed and reset to >> NULL. >> At that place the thread calls prison_proc_free(), which calls >> prison_deref(). >> Then, we see that in prison_deref() the thread gets a page fault because >> of what >> seems like a NULL pointer dereference. That's just the start of the >> problem and >> its root cause.
>> >> Then, trap_pfault() gets invoked and, because addresses close to NULL look >> like >> userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn >> goes >> on to call vm_map_growstack. First thing that vm_map_growstack does is a >> call >> to lim_cur(), but because p_limit is already NULL, that call results in a >> NULL >> pointer dereference and a page fault. Goto the beginning of this >> paragraph. >> >> So we get this recursion of sorts, which only ends when a stack is >> exhausted and >> a CPU generates a double-fault. > > BTW, does anyone has an idea why the thread in question would "disappear" > from > the kgdb's point of view? > > (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid > $3 = 102057 > (kgdb) tid 102057 > invalid tid > > info threads also doesn't list the thread. > > Is it because the panic happened while the thread was somewhere in exit1()? > is there an easy way to examine its stack in this case? Yes it is likely it. 'tid' command should lookup the tid_to_thread() table (or similar name) which returns NULL, which means the thread has passed beyond the point it was in the lookup table. Attilio -- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 21:27:27 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E07E21065674; Thu, 18 Aug 2011 21:27:27 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe01.c2i.net [212.247.154.2]) by mx1.freebsd.org (Postfix) with ESMTP id 31E8B8FC16; Thu, 18 Aug 2011 21:27:26 +0000 (UTC) X-Cloudmark-Score: 0.000000 [] X-Cloudmark-Analysis: v=1.1 cv=yfIOS+81wnQIz0UwZPDdWOvE/jQxEvyI9Z1xC25I9wc= c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10 a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10 a=i9M/sDlu2rpZ9XS819oYzg==:17 a=oOb7PSFi1HuzztIfN6YA:9 a=XpotWNjDgNPmgIl742UA:7 a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117 Received: from [188.126.198.129] (account mc467741@c2i.net HELO laptop002.hselasky.homeunix.org) by mailfe01.swip.net (CommuniGate Pro SMTP 5.2.19) with ESMTPA id 168437914; Thu, 18 Aug 2011 23:27:25 +0200 From: Hans Petter Selasky To: Andriy Gapon Date: Thu, 18 Aug 2011 23:24:58 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; ) References: <4E4D460A.2080100@FreeBSD.org> In-Reply-To: <4E4D460A.2080100@FreeBSD.org> X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5 %V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC( :AuzV9:.hESm-x4h240C`9=w MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108182324.58276.hselasky@c2i.net> Cc: freebsd-stable@freebsd.org, Eugene Grosbein , Andrew Boyer , Vishal.Shah@netapp.com, Jeremiah Lott , Steven Hartland Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 21:27:28 
-0000 On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote: > If you can help Hans to figure out what you is wrong with USB subsystem in > this respect that would help us all. Hi, usb_busdma.c: /* we use "mtx_owned()" instead of this function */ usb_busdma.c: owned = mtx_owned(uptag->mtx); usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; usb_hub.c: if (mtx_owned(&bus->bus_mtx)) { usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) { usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) { usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) { usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) { One fix you will need to do, if mtx_owned is not giving correct value is: static void usbd_callback_wrapper(struct usb_xfer_queue *pq) { struct usb_xfer *xfer = pq->curr; struct usb_xfer_root *info = xfer->xroot; USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED); if (!mtx_owned(info->xfer_mtx)) { The above "if" should be anded with && !paniced && !dumping ... or maybe the new not scheduling variable is good for this purpose? /* * Cases that end up here: * #if USB_HAVE_BUSDMA if (mtx_owned(xfer->xroot->xfer_mtx)) { struct usb_xfer_queue *pq; This case is more like a BUS-DMA error case, and is not so important to execute. 
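The suggested change to the check in usbd_callback_wrapper() can be modeled in plain C as below. This is a hedged sketch, not the USB code: model_mtx_owned(), model_panicked and model_dumping are hypothetical stand-ins for mtx_owned(), the panic state and the dump-in-progress state, and which kernel variable should really be tested is exactly the open question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mutex: we only track ownership. */
struct model_mtx {
    bool owned;
};

/* Hypothetical stand-ins for "the kernel has panicked" and "a dump is running". */
static bool model_panicked;
static bool model_dumping;

static bool
model_mtx_owned(const struct model_mtx *m)
{
    assert(m != NULL);
    return m->owned;
}

/*
 * Models the proposed condition: take the "mutex not owned" branch only
 * while the system is running normally.  After a panic or during a dump
 * the ownership test is not trusted, so the branch is skipped.
 */
static bool
take_unowned_branch(const struct model_mtx *xfer_mtx)
{
    return !model_mtx_owned(xfer_mtx) && !model_panicked && !model_dumping;
}
```

In normal operation an unowned mutex still selects the branch; once either flag is set the function returns false regardless of what the ownership check says, which is the behaviour the extra && terms are meant to give.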
--HPS From owner-freebsd-stable@FreeBSD.ORG Thu Aug 18 23:11:09 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B373106564A for ; Thu, 18 Aug 2011 23:11:09 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 3E5E18FC0C for ; Thu, 18 Aug 2011 23:11:08 +0000 (UTC) Received: by gxk28 with SMTP id 28so2174010gxk.13 for ; Thu, 18 Aug 2011 16:11:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=ngyvh7D8CEvIKeuadPquGSvXdSfj0bRwNPIWTyTDoxI=; b=bppoBXYMtDvN0/hUUEhuIVHa0r30vdXzJScqDmBB4H8BIzDgpGnZaGSilpCCW3x6xx Jz1LwMx+lkwaZgiuZvYCbM52y+ud1KWFeZJyg7A8OWQjdn5GQfdxqjxyjmF5LCNSD5oV UUWSh8cwrdm9RxWmWZZeOZCH5EgGLpKZVQJbE= MIME-Version: 1.0 Received: by 10.150.74.10 with SMTP id w10mr1398144yba.224.1313709068244; Thu, 18 Aug 2011 16:11:08 -0700 (PDT) Received: by 10.151.98.3 with HTTP; Thu, 18 Aug 2011 16:11:08 -0700 (PDT) In-Reply-To: <20110818101034.GA1958@emphyrio.blackend.org> References: <4E4CD19E.5070108@rawbw.com> <20110818101034.GA1958@emphyrio.blackend.org> Date: Thu, 18 Aug 2011 16:11:08 -0700 Message-ID: From: Kevin Oberman To: Marc Fonvieille Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Yuri , freebsd-stable@freebsd.org Subject: Re: WD Advanced Format: do I need to do something special? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 23:11:09 -0000 On Thu, Aug 18, 2011 at 3:10 AM, Marc Fonvieille wrote: > On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote: >> WD has sectors of the size 4kB in their latest hard drives, which is >> different from the traditional 512B. >> http://www.wdc.com/advformat >> http://wdc.custhelp.com/app/answers/detail/a_id/5655 >> >> These articles assert that something special should be done in OS to >> enable high performance of such drives. For ex. WD recommends to install >> some latest drivers of particular version. >> But what about FreeBSD? Should it be configured in some special way too >> for these drive to perform well? >> Is it aware of 4kB sector size? >> > > I own that (I'm running 8-STABLE): > > ada0 at ahcich2 bus 0 scbus2 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > > which has 4kB sectors but says "512 byte sectors" :) > > I use the whole disk for the FreeBSD slice, I aligned all partitions on > a multiple of 8 sectors (512*8=4096).
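The "multiple of 8 sectors" rule can be checked mechanically. The sketch below assumes 512-byte logical and 4096-byte physical sectors, and that bsdlabel offsets are relative to the slice start (63 in the fdisk output quoted in this message), so the absolute sector is slice start plus offset. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOGICAL_SECTOR  512u   /* what the drive reports */
#define PHYSICAL_SECTOR 4096u  /* what the drive really uses (assumed) */

/* True if an absolute LBA (in 512-byte sectors) starts on a 4 kB boundary. */
static bool
is_4k_aligned(uint64_t abs_lba)
{
    return (abs_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0;
}

/*
 * Smallest partition offset >= min_off (relative to slice_start) whose
 * absolute position is 4 kB aligned.
 */
static uint64_t
first_aligned_offset(uint64_t slice_start, uint64_t min_off)
{
    const uint64_t per = PHYSICAL_SECTOR / LOGICAL_SECTOR; /* 8 sectors */
    uint64_t abs_lba = slice_start + min_off;
    uint64_t rem = abs_lba % per;

    assert(PHYSICAL_SECTOR % LOGICAL_SECTOR == 0);
    if (rem != 0)
        abs_lba += per - rem;   /* round up to the next boundary */
    return abs_lba - slice_start;
}
```

For a slice starting at sector 63, sector 63 itself is not aligned, first_aligned_offset(63, 16) yields 17, and 63 + 17 = 80 is a multiple of 8, which matches the "16+1" offset described in the quoted text.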
>
> By default fdisk(8) uses a 63 sectors default offset:
>
> ******* Working on device /dev/ada0 *******
> parameters extracted from in-core disklabel are:
> cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)
>
> Figures below won't work with BIOS for partitions not in cyl 1
> parameters to be used for BIOS calculations are:
> cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)
>
> Media sector size is 512
> Warning: BIOS sector numbering starts with sector 1
> Information from DOS bootblock is:
> The data for partition 1 is:
> sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
>     start 63, size 1953525105 (953869 Meg), flag 80 (active)
>         beg: cyl 0/ head 1/ sector 1;
>         end: cyl 1023/ head 15/ sector 63
> The data for partition 2 is:
>
> The data for partition 3 is:
>
> The data for partition 4 is:
>
>
>
> Look at "start 63" statement.  Instead of fixing fdisk(8) behavior, I just
> correctly edited my bsdlabel(8) table:
>
> # /dev/ada0s1:
> 8 partitions:
> #          size     offset    fstype   [fsize bsize bps/cpg]
>   a:    4194304         17    4.2BSD        0     0     0
>   b:    8388608    4194321      swap
>   c: 1953525105          0    unused        0     0     # "raw" part, don't edit
>   d:   16777216   12582929    4.2BSD        0     0     0
>   e: 1924163584   29360145    4.2BSD        0     0     0
>
>
> The important part is the offset 17 to correct the fdisk(8) offset (16+1
> to align the previous 63).  The remaining offsets are calculated from the
> size I gave for the partitions (in MB, which can be divided by 8).
> Then I used newfs(8) with the option "-f 4096".
>
>
> There's another painful issue with this disk: the automatic head-parking
> after few seconds.
> I disabled it (with wdidle3) cause after 2 months of
> use, I was at more than 35000 head-parkings...

I'd strongly suggest avoiding fdisk(8) and using gpart(8) on 8 and above.
It has an alignment option that makes this all just work and also allows
the use of GPT formatting. (Watch out for GPT on any system that needs to
run 32-bit Windows.)

gpart create -s gpt ada1
gpart bootcode -b /boot/pmbr ada1
gpart add -t freebsd-boot -a 4 -s 128 -b 40 ada1
gpart bootcode -p /boot/gptboot -i 1 ada1
gpart add -t freebsd-ufs -a 4 -s 2097152 ada1
gpart add -t freebsd-swap -a 4 -s 8388608 ada1
gpart add -t freebsd-ufs -a 4 -s 10485760 ada1
gpart add -t freebsd-ufs -a 4 -s 1048576 ada1
gpart add -t freebsd-ufs -a 4 ada1

This will give you a disk with a 1G root, 4G swap, 5G var, .5G tmp and the
remainder for usr. You can adjust these as you feel appropriate.

I would suggest a careful reading of the gpart(8) man page, as well, just
so you understand what is going on. You might find the Wikipedia entry for
"GUID Partition Table" interesting if you want to go the GPT route.

You can also use gpart create -s mbr to create a traditional MBR
slice/partition setup. There are several on-line articles detailing this
operation.
-- 
R.
Kevin Oberman, Network Engineer - Retired E-mail: kob6558@gmail.com From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 00:29:01 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9AFFD106564A; Fri, 19 Aug 2011 00:29:01 +0000 (UTC) (envelope-from hrs@FreeBSD.org) Received: from mail.allbsd.org (gatekeeper-int.allbsd.org [IPv6:2001:2f0:104:e002::2]) by mx1.freebsd.org (Postfix) with ESMTP id C36848FC15; Fri, 19 Aug 2011 00:29:00 +0000 (UTC) Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp [125.175.94.28]) (authenticated bits=128) by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7J0SOkw048646; Fri, 19 Aug 2011 09:28:34 +0900 (JST) (envelope-from hrs@FreeBSD.org) Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0) by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7J0SK8W078115; Fri, 19 Aug 2011 09:28:21 +0900 (JST) (envelope-from hrs@FreeBSD.org) Date: Fri, 19 Aug 2011 09:28:11 +0900 (JST) Message-Id: <20110819.092811.1087267565626420460.hrs@allbsd.org> To: attilio@FreeBSD.org From: Hiroki Sato In-Reply-To: <20110818025550.GA1971@libertas.local.camdensoftware.com> References: <20110818.091600.831954331552558249.hrs@allbsd.org> <20110818025550.GA1971@libertas.local.camdensoftware.com> X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530 FFD7 4F2C D3D8 2793 CF2D X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Multipart/Signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="--Security_Multipart(Fri_Aug_19_09_28_11_2011_956)--" Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org X-Virus-Status: Clean X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3 (mail.allbsd.org [133.31.130.32]); Fri, 19 Aug 2011 09:28:40 +0900 (JST) X-Spam-Status: No, score=-102.6 required=13.0 tests=BAYES_00, 
CONTENT_TYPE_PRESENT,DIRECTOCNDYN,RCVD_IN_RP_RNBL,SPF_SOFTFAIL, USER_IN_WHITELIST autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on gatekeeper.allbsd.org Cc: freebsd-stable@FreeBSD.org, sterling@camdensoftware.com, avg@FreeBSD.org, Nick Esborn , kostikbel@gmail.com, mdtansca@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 00:29:01 -0000 ----Security_Multipart(Fri_Aug_19_09_28_11_2011_956)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Chip Camden wrote in <20110818025550.GA1971@libertas.local.camdensoftware.com>: st> Quoth Attilio Rao on Thursday, 18 August 2011: st> > In callout_cpu_switch() if a low priority thread is migrating the st> > callout and gets preempted after the outcoming cpu queue lock is left st> > (and scheduled much later) we get this problem. st> > st> > In order to fix this bug it could be enough to use a critical section, st> > but I think this should be really interrupt safe, thus I'd wrap them st> > up with spinlock_enter()/spinlock_exit(). Fortunately st> > callout_cpu_switch() should be called rarely and also we already do st> > expensive locking operations in callout, thus we should not have st> > problem performance-wise. st> > st> > Can the guys I also CC'ed here try the following patch, with all the st> > initial kernel options that were leading you to the deadlock? (thus st> > revert any debugging patch/option you added for the moment): st> > http://www.freebsd.org/~attilio/callout-fixup.diff st> > st> > Please note that this patch is for STABLE_8, if you can confirm the st> > good result I'll commit to -CURRENT and then backmarge as soon as st> > possible. 
st> > st> > Thanks, st> > Attilio st> > st> st> Thanks, Attilio. I've applied the patch and removed the extra debug st> options I had added (though keeping debug symbols). I'll let you know if st> I experience any more panics. No panic for 20 hours at this moment, FYI. For my NFS server, I think another 24 hours would be sufficient to confirm the stability. I will see how it works... -- Hiroki ----Security_Multipart(Fri_Aug_19_09_28_11_2011_956)-- Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEABECAAYFAk5NrhsACgkQTyzT2CeTzy1O/ACeJPyJpjyI8X68PscHDXRU7iXu 8M0An23TY3RL9ZPaL1R+FCLHmhe9Mqi7 =FHX7 -----END PGP SIGNATURE----- ----Security_Multipart(Fri_Aug_19_09_28_11_2011_956)---- From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 00:38:07 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DC74106566B for ; Fri, 19 Aug 2011 00:38:07 +0000 (UTC) (envelope-from sterling@camdensoftware.com) Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com [174.123.46.202]) by mx1.freebsd.org (Postfix) with ESMTP id F39218FC0C for ; Fri, 19 Aug 2011 00:38:06 +0000 (UTC) Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203] helo=_HOSTNAME_) by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1QuD61-0004Fs-P0; Thu, 18 Aug 2011 17:37:38 -0700 Received: by _HOSTNAME_ (sSMTP sendmail emulation); Thu, 18 Aug 2011 17:38:00 -0700 Date: Thu, 18 Aug 2011 17:37:59 -0700 From: Chip Camden To: Hiroki Sato Message-ID: <20110819003759.GC54831@libertas.local.camdensoftware.com> Mail-Followup-To: Hiroki Sato , attilio@FreeBSD.org, kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org, mdtansca@FreeBSD.org, Nick Esborn References: <20110818.091600.831954331552558249.hrs@allbsd.org> 
<20110818025550.GA1971@libertas.local.camdensoftware.com> <20110819.092811.1087267565626420460.hrs@allbsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="f+W+jCU1fRNres8c" Content-Disposition: inline In-Reply-To: <20110819.092811.1087267565626420460.hrs@allbsd.org> User-Agent: Mutt/1.4.2.3i Company: Camden Software Consulting URL: http://camdensoftware.com X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - camdensoftware.com X-Source: X-Source-Args: X-Source-Dir: Cc: freebsd-stable@FreeBSD.org, avg@FreeBSD.org, attilio@FreeBSD.org, Nick Esborn , kostikbel@gmail.com, mdtansca@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 00:38:07 -0000 --f+W+jCU1fRNres8c Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Quoth Hiroki Sato on Friday, 19 August 2011:
> Chip Camden wrote
> in <20110818025550.GA1971@libertas.local.camdensoftware.com>:
>
> st> Quoth Attilio Rao on Thursday, 18 August 2011:
> st> > In callout_cpu_switch() if a low priority thread is migrating the
> st> > callout and gets preempted after the outcoming cpu queue lock is left
> st> > (and scheduled much later) we get this problem.
> st> >
> st> > In order to fix this bug it could be enough to use a critical section,
> st> > but I think this should be really interrupt safe, thus I'd wrap them
> st> > up with spinlock_enter()/spinlock_exit(). Fortunately
> st> > callout_cpu_switch() should be called rarely and also we already do
> st> > expensive locking operations in callout, thus we should not have
> st> > problem performance-wise.
> st> >
> st> > Can the guys I also CC'ed here try the following patch, with all the
> st> > initial kernel options that were leading you to the deadlock? (thus
> st> > revert any debugging patch/option you added for the moment):
> st> > http://www.freebsd.org/~attilio/callout-fixup.diff
> st> >
> st> > Please note that this patch is for STABLE_8, if you can confirm the
> st> > good result I'll commit to -CURRENT and then backmarge as soon as
> st> > possible.
> st> >
> st> > Thanks,
> st> > Attilio
> st> >
> st>
> st> Thanks, Attilio. I've applied the patch and removed the extra debug
> st> options I had added (though keeping debug symbols). I'll let you know if
> st> I experience any more panics.
>
> No panic for 20 hours at this moment, FYI. For my NFS server, I
> think another 24 hours would be sufficient to confirm the stability.
> I will see how it works...
>
> -- Hiroki

Likewise:

$ uptime
5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63

So far, so good (knocks on head).

-- 
.O. | Sterling (Chip) Camden | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91 | http://chipstips.com

--f+W+jCU1fRNres8c Content-Type: application/pgp-signature Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJOTbBnAAoJEIpckszW26+RT+AIAIRMa07BhoVaRBq3lz1dVcsq
zh+G7945FXqbD+0hhv+/4T75mbtzSG4l72dhlwGWNUZg70hZKqEUfNzQs3meSquR
wmVCi3NH0cu5jIAZqvDWCvU8BigBn2GRjN/sXl5GCsGrZFi50kZXWKmgzTyDVrIM
iwva8366ceK36QfodupVgxSs7ifDt8Jl3tLSdXHdacf17BceW2mETwOVvmd13LXQ
BVOxFE7Qmk7xYXqrt3dj+E/gtO21R31EL3XJYx7prev534eNF99pn1GZCaj2By1Q
B1iG4SfXMgYtzHpqSGniENX8RAhaCJmpFZDrIebnawel2rPMPFHuzJLc5hKp6eE=
=lxLO
-----END PGP SIGNATURE-----
--f+W+jCU1fRNres8c--

From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 01:28:05 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 788BB106564A for ; Fri, 19 Aug 2011 01:28:05 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45]) by mx1.freebsd.org (Postfix) with ESMTP id 676968FC1C for ; Fri, 19 Aug 2011 01:28:05 +0000 (UTC) Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1]) (authenticated bits=0) by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7J1S46E028644 for ; Thu, 18 Aug 2011 18:28:04 -0700 (PDT) (envelope-from yuri@rawbw.com) Message-ID: <4E4DBC24.1070007@rawbw.com> Date: Thu, 18 Aug 2011 18:28:04 -0700 From: Yuri User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110716 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-stable@freebsd.org References: <4E4CD19E.5070108@rawbw.com> In-Reply-To: <4E4CD19E.5070108@rawbw.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: WD Advanced Format: do I need to do something special? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 01:28:05 -0000

Following instructions here
(http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html)
I destroyed my previous ZFS pool with 512 byte sectors and did this:

gnop create -S 4096 /dev/ad4
zpool create mypool /dev/ad4.nop
zfs create mypool/mydir
zpool export mypool
gnop destroy /dev/ad4.nop
zpool import mypool

Now this command 'zdb -C data | grep ashift' shows ashift=12 (4096 byte
sectors).
However, when I begin to copy a lot of files into /mypool/mydir, the online
radio player gets severely affected. Sound gets interrupted all the time.
Interruptions stop 1-2 secs after I stop copying.
This didn't happen with sector size 512 bytes.

What is wrong?

Yuri

From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 02:37:39 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 384341065670 for ; Fri, 19 Aug 2011 02:37:39 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 7FE2815169F for ; Fri, 19 Aug 2011 02:36:51 +0000 (UTC) Date: Thu, 18 Aug 2011 19:36:50 -0700 (PDT) From: Doug Barton To: freebsd-stable@FreeBSD.org Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-message-flag: Outlook -- Not just for spreading viruses anymore! 
OpenPGP: id=1A1ABC84 Organization: http://SupersetSolutions.com/ MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: Subject: crash on 8.2-RELEASE amd64, high-traffic squid server X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 02:37:39 -0000 Howdy, I have some high-traffic squid servers, most of which are running a flavor of RELENG_7 very successfully, but one that I've been evaluating 8.x on has had a lot of problems. Most recently we had the crash below twice in the last 2 weeks. Same exact backtrace. Any suggestions on where to look would be appreciated. Thanks, Doug #0 doadump () at pcpu.h:224 224 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump () at pcpu.h:224 #1 0xffffffff803ec4be in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419 #2 0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:592 #3 0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available. ) at /usr/src/sys/amd64/amd64/trap.c:783 #4 0xffffffff8069aab9 in trap (frame=0xffffff800012f650) at /usr/src/sys/amd64/amd64/trap.c:592 #5 0xffffffff80682e84 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224 #6 0xffffffff80698896 in bcopy () at /usr/src/sys/amd64/amd64/support.S:124 #7 0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0, m=0xffffff010b815300, n=0xffffff006baa3700) at /usr/src/sys/kern/uipc_sockbuf.c:779 #8 0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0, m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534 #9 0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available. 
) at /usr/src/sys/netinet/tcp_input.c:2588 #10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available. ) at /usr/src/sys/netinet/tcp_input.c:1029 #11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300) at /usr/src/sys/netinet/ip_input.c:787 #12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available. ) at /usr/src/sys/net/netisr.c:917 #13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000, m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894 #14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000, m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753 #15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98, done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293 #16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available. ) at /usr/src/sys/dev/e1000/if_em.c:1482 #17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800) at /usr/src/sys/kern/subr_taskqueue.c:250 #18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available. ) at /usr/src/sys/kern/subr_taskqueue.c:387 #19 0xffffffff803c30f8 in fork_exit ( callout=0xffffffff80429c00 , arg=0xffffff80005a8748, frame=0xffffff800012fc40) at /usr/src/sys/kern/kern_fork.c:845 #20 0xffffffff8068334e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:565 #21 0x0000000000000000 in ?? () #22 0x0000000000000000 in ?? () #23 0x0000000000000000 in ?? () #24 0x0000000000000000 in ?? () #25 0x0000000000000000 in ?? () #26 0x0000000000000000 in ?? () #27 0x0000000000000000 in ?? () #28 0x0000000000000000 in ?? () #29 0x0000000000000000 in ?? () #30 0x0000000000000000 in ?? () #31 0x0000000000000000 in ?? () #32 0x0000000000000000 in ?? () #33 0x0000000000000000 in ?? () #34 0x0000000000000000 in ?? () #35 0x0000000000000000 in ?? () #36 0x0000000000000000 in ?? () #37 0x0000000000000000 in ?? () #38 0x0000000000000000 in ?? 
() #39 0x0000000000000000 in ?? () #40 0x0000000000000000 in ?? () #41 0x0000000000000000 in ?? () #42 0x0000000000000000 in ?? () #43 0x0000000000000000 in ?? () #44 0x0000000000000000 in ?? () #45 0xffffffff8095ac00 in affinity () #46 0x0000000000000000 in ?? () #47 0x0000000000000000 in ?? () #48 0xffffff0002d2d8c0 in ?? () #49 0xffffff800012f320 in ?? () #50 0xffffff800012f2c8 in ?? () #51 0xffffff0002c59000 in ?? () #52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00, newtd=0xffffff80005a8748, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1852 Previous frame inner to this frame (corrupt stack?) (kgdb) -- Nothin' ever doesn't change, but nothin' changes much. -- OK Go Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/ From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 03:04:08 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4241C106564A for ; Fri, 19 Aug 2011 03:04:08 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta13.emeryville.ca.mail.comcast.net (qmta13.emeryville.ca.mail.comcast.net [76.96.27.243]) by mx1.freebsd.org (Postfix) with ESMTP id 289338FC0C for ; Fri, 19 Aug 2011 03:04:07 +0000 (UTC) Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74]) by qmta13.emeryville.ca.mail.comcast.net with comcast id N2zm1h0021bwxycAD343qu; Fri, 19 Aug 2011 03:04:03 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta18.emeryville.ca.mail.comcast.net with comcast id N33d1h00m1t3BNj8e33d5S; Fri, 19 Aug 2011 03:03:38 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 2E624102C1A; Thu, 18 Aug 2011 20:04:05 -0700 (PDT) Date: Thu, 18 Aug 2011 20:04:05 -0700 From: Jeremy Chadwick To: Doug Barton Message-ID: <20110819030405.GA83032@icarus.home.lan> References: MIME-Version: 1.0 
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@FreeBSD.org, "Vogel, Jack" Subject: Re: crash on 8.2-RELEASE amd64, high-traffic squid server X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 03:04:08 -0000 On Thu, Aug 18, 2011 at 07:36:50PM -0700, Doug Barton wrote: > Howdy, > > I have some high-traffic squid servers, most of which are running a > flavor of RELENG_7 very successfully, but one that I've been > evaluating 8.x on has had a lot of problems. Most recently we had > the crash below twice in the last 2 weeks. Same exact backtrace. Any > suggestions on where to look would be appreciated. > > > Thanks, > > Doug > > #0 doadump () at pcpu.h:224 > 224 pcpu.h: No such file or directory. > in pcpu.h > (kgdb) #0 doadump () at pcpu.h:224 > #1 0xffffffff803ec4be in boot (howto=260) > at /usr/src/sys/kern/kern_shutdown.c:419 > #2 0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available. > ) > at /usr/src/sys/kern/kern_shutdown.c:592 > #3 0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available. 
> ) > at /usr/src/sys/amd64/amd64/trap.c:783 > #4 0xffffffff8069aab9 in trap (frame=0xffffff800012f650) > at /usr/src/sys/amd64/amd64/trap.c:592 > #5 0xffffffff80682e84 in calltrap () > at /usr/src/sys/amd64/amd64/exception.S:224 > #6 0xffffffff80698896 in bcopy () > at /usr/src/sys/amd64/amd64/support.S:124 > #7 0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0, > m=0xffffff010b815300, n=0xffffff006baa3700) > at /usr/src/sys/kern/uipc_sockbuf.c:779 > #8 0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0, > m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534 > #9 0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available. > ) > at /usr/src/sys/netinet/tcp_input.c:2588 > #10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available. > ) > at /usr/src/sys/netinet/tcp_input.c:1029 > #11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300) > at /usr/src/sys/netinet/ip_input.c:787 > #12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available. > ) > at /usr/src/sys/net/netisr.c:917 > #13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000, > m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894 > #14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000, > m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753 > #15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98, > done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293 > #16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available. > ) > at /usr/src/sys/dev/e1000/if_em.c:1482 > #17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800) > at /usr/src/sys/kern/subr_taskqueue.c:250 > #18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available. 
> ) > at /usr/src/sys/kern/subr_taskqueue.c:387 > #19 0xffffffff803c30f8 in fork_exit ( > callout=0xffffffff80429c00 , > arg=0xffffff80005a8748, frame=0xffffff800012fc40) > at /usr/src/sys/kern/kern_fork.c:845 > #20 0xffffffff8068334e in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:565 > #21 0x0000000000000000 in ?? () > #22 0x0000000000000000 in ?? () > #23 0x0000000000000000 in ?? () > #24 0x0000000000000000 in ?? () > #25 0x0000000000000000 in ?? () > #26 0x0000000000000000 in ?? () > #27 0x0000000000000000 in ?? () > #28 0x0000000000000000 in ?? () > #29 0x0000000000000000 in ?? () > #30 0x0000000000000000 in ?? () > #31 0x0000000000000000 in ?? () > #32 0x0000000000000000 in ?? () > #33 0x0000000000000000 in ?? () > #34 0x0000000000000000 in ?? () > #35 0x0000000000000000 in ?? () > #36 0x0000000000000000 in ?? () > #37 0x0000000000000000 in ?? () > #38 0x0000000000000000 in ?? () > #39 0x0000000000000000 in ?? () > #40 0x0000000000000000 in ?? () > #41 0x0000000000000000 in ?? () > #42 0x0000000000000000 in ?? () > #43 0x0000000000000000 in ?? () > #44 0x0000000000000000 in ?? () > #45 0xffffffff8095ac00 in affinity () > #46 0x0000000000000000 in ?? () > #47 0x0000000000000000 in ?? () > #48 0xffffff0002d2d8c0 in ?? () > #49 0xffffff800012f320 in ?? () > #50 0xffffff800012f2c8 in ?? () > #51 0xffffff0002c59000 in ?? () > #52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00, > newtd=0xffffff80005a8748, flags=Variable "flags" is not available. > ) > at /usr/src/sys/kern/sched_ule.c:1852 > Previous frame inner to this frame (corrupt stack?) > (kgdb) CC'ing Jack Vogel here, since I see em(4) is involved. 
Jack will probably want this data from the system: # uname -a (hostname can be XXX'd out) # dmesg (particularly the emX entries and driver version) # pciconf -lvbc (specifically the emX entries and related data) # ifconfig -a (IPs and MACs can be X'd out; mainly interested in options and other pieces) # netstat -m (if possible from a system which has been up a while and is a likely crash candidate) # vmstat -i (same condition as netstat -m) There isn't enough data above for me to determine what's going on, but from the stack trace it looks like sbcompress() may be given some data which is null or inaccessible. The source for that hasn't been touched directly in a while. The TCP stack/code, however, has been (since 8.2-RELEASE for sure). I think em(4) has as well. This may end up being a case where running RELENG_8 is the fix, but I'd love to be able to say that for certain. "bt full" would be helpful but the above indicates the kernel might not have debugging symbols included in it? I've seen this kind of output even on a system with "makeoptions DEBUG=-g" in its kernel config before though. Never was sure how to deal with that problem. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 05:16:12 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 081BA106566B; Fri, 19 Aug 2011 05:16:12 +0000 (UTC) (envelope-from ae@FreeBSD.org) Received: from mail.kirov.so-ups.ru (ns.kirov.so-ups.ru [178.74.170.1]) by mx1.freebsd.org (Postfix) with ESMTP id A50768FC12; Fri, 19 Aug 2011 05:16:10 +0000 (UTC) Received: from kas30pipe.localhost (localhost.kirov.so-cdu.ru [127.0.0.1]) by mail.kirov.so-ups.ru (Postfix) with SMTP id 0D7B0B801B; Fri, 19 Aug 2011 09:00:22 +0400 (MSD) Received: from kirov.so-cdu.ru (unknown [172.21.81.1]) by mail.kirov.so-ups.ru (Postfix) with ESMTP id 03517B8008; Fri, 19 Aug 2011 09:00:22 +0400 (MSD) Received: by ns.kirov.so-cdu.ru (Postfix, from userid 1010) id D3800B8F0A; Fri, 19 Aug 2011 09:00:15 +0400 (MSD) Received: from [10.118.3.52] (elsukov.kirov.oduur.so [10.118.3.52]) by ns.kirov.so-cdu.ru (Postfix) with ESMTP id 9F605B8F04; Fri, 19 Aug 2011 09:00:15 +0400 (MSD) Message-ID: <4E4DEDDB.6060201@FreeBSD.org> Date: Fri, 19 Aug 2011 09:00:11 +0400 From: "Andrey V. Elsukov" User-Agent: Mozilla Thunderbird 1.5 (FreeBSD/20051231) MIME-Version: 1.0 To: Kevin Oberman References: <4E4CD19E.5070108@rawbw.com> <20110818101034.GA1958@emphyrio.blackend.org> In-Reply-To: X-Enigmail-Version: 1.3 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig74A1FA41A34B9FA8A9C4CB9D" X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0284], KAS30/Release X-SpamTest-Info: Not protected Cc: Yuri , freebsd-stable@freebsd.org Subject: Re: WD Advanced Format: do I need to do something special? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 05:16:12 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig74A1FA41A34B9FA8A9C4CB9D Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable

On 19.08.2011 3:11, Kevin Oberman wrote:
> I'd strongly suggest avoiding fdisk(8) and using gpart(8) on 8 and
> above. It has an
> alignment option that makes this all just work and also allows the use of GPT
> formatting. (Watch out for GPT on any system that needs to run 32-bit Windows.)
>
> gpart create -s gpt ada1
> gpart bootcode -b /boot/pmbr ada1
> gpart add -t freebsd-boot -a 4 -s 128 -b 40 ad0
> gpart bootcode -p /boot/gptboot -i 1 ad0
> gpart add -t freebsd-ufs -a 4 -s 2097152 ada1
> gpart add -t freebsd-swap -a 4 -s 8388608 ada1
> gpart add -t freebsd-ufs -a 4 -s 10485760 ada1
> gpart add -t freebsd-ufs -a 4 -s 1048576 ada1
> gpart add -t freebsd-ufs -a 4 ada1

If you are using gpart with the -a option you don't need to specify exact numbers.
And if you want to align your partition to 4096 bytes you should use "-a 4k" or
"-a 8". E.g.
# gpart add -t freebsd-boot -a 4k -s 64k ad0

-- 
WBR, Andrey V. 
Elsukov --------------enig74A1FA41A34B9FA8A9C4CB9D Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJOTe3fAAoJEAHF6gQQyKF6QPsIALO6JwNVmk8GnWYIsCgshZNB KEAg/DwXlBpapfGONuIXv+F6Db4ydeKeWvZouINc6W9xx4qgrmUwOFs6oVi6tO0d bQUg2wB6QFHufCdC5Ndfb9RMYZZLKjAfhCnOYEj/8G1SHPoaPOFUBJ+qd4JBwSvq 3M9nMEsNlOyyLsBvti1sPresvypwv3JQrzvGW7XEPUsbU0+VgUEeoIXLXGfMWDgR z0V45ErMLzN2oc5Le3l9617m4SM5INUpWEZuOU5iHBAYXoTlglsaGscmQPKd9aTt GK0cs3nlu5xeH2BvmJtbUcmCL8z4vPy700aAu0EUnMdFvHvqTreh0s/bmrWMSiY= =cvvN -----END PGP SIGNATURE----- --------------enig74A1FA41A34B9FA8A9C4CB9D-- From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 12:14:02 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EE108106566B; Fri, 19 Aug 2011 12:14:02 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C011E8FC12; Fri, 19 Aug 2011 12:14:02 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 69FD446B35; Fri, 19 Aug 2011 08:14:02 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id E7B738A02F; Fri, 19 Aug 2011 08:14:01 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Fri, 19 Aug 2011 08:14:00 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org> In-Reply-To: <4E4D717F.3090802@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit 
Message-Id: <201108190814.00885.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Fri, 19 Aug 2011 08:14:02 -0400 (EDT) Cc: freebsd-stable@freebsd.org, Andriy Gapon Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 12:14:03 -0000

On Thursday, August 18, 2011 4:09:35 pm Andriy Gapon wrote:
> on 17/08/2011 23:21 Andriy Gapon said the following:
> > It seems like everything starts with some kind of a race between terminating
> > processes in a jail and termination of the jail itself. This is where the
> > details are very thin so far. What we see is that a process (http) is in
> > exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
> > flag is set and even past the place where p_limit is freed and reset to NULL.
> > At that place the thread calls prison_proc_free(), which calls prison_deref().
> > Then, we see that in prison_deref() the thread gets a page fault because of what
> > seems like a NULL pointer dereference. That's just the start of the problem and
> > its root cause.
> >
> > Then, trap_pfault() gets invoked and, because addresses close to NULL look like
> > userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
> > on to call vm_map_growstack. First thing that vm_map_growstack does is a call
> > to lim_cur(), but because p_limit is already NULL, that call results in a NULL
> > pointer dereference and a page fault. Goto the beginning of this paragraph.
> >
> > So we get this recursion of sorts, which only ends when a stack is exhausted and
> > a CPU generates a double-fault.
>
> BTW, does anyone has an idea why the thread in question would "disappear" from
> the kgdb's point of view?
>
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
>
> info threads also doesn't list the thread.
>
> Is it because the panic happened while the thread was somewhere in exit1()?

Yes, it is a bug in kgdb that it only walks allproc and not zombproc. Try this:

Index: kthr.c
===================================================================
--- kthr.c	(revision 224879)
+++ kthr.c	(working copy)
@@ -73,11 +73,52 @@ kgdb_thr_first(void)
 	return (first);
 }
 
+static void
+kgdb_thr_add_procs(uintptr_t paddr)
+{
+	struct proc p;
+	struct thread td;
+	struct kthr *kt;
+	CORE_ADDR addr;
+
+	while (paddr != 0) {
+		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
+			warnx("kvm_read: %s", kvm_geterr(kvm));
+			break;
+		}
+		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
+		while (addr != 0) {
+			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
+			    sizeof(td)) {
+				warnx("kvm_read: %s", kvm_geterr(kvm));
+				break;
+			}
+			kt = malloc(sizeof(*kt));
+			kt->next = first;
+			kt->kaddr = addr;
+			if (td.td_tid == dumptid)
+				kt->pcb = dumppcb;
+			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
+			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
+				kt->pcb = (uintptr_t)stoppcbs +
+				    sizeof(struct pcb) * td.td_oncpu;
+			else
+				kt->pcb = (uintptr_t)td.td_pcb;
+			kt->kstack = td.td_kstack;
+			kt->tid = td.td_tid;
+			kt->pid = p.p_pid;
+			kt->paddr = paddr;
+			kt->cpu = td.td_oncpu;
+			first = kt;
+			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
+		}
+		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	}
+}
+
 struct kthr *
 kgdb_thr_init(void)
 {
-	struct proc p;
-	struct thread td;
 	long cpusetsize;
 	struct kthr *kt;
 	CORE_ADDR addr;
@@ -113,37 +154,11 @@ kgdb_thr_init(void)
 
 	stoppcbs = kgdb_lookup("stoppcbs");
 
-	while (paddr != 0) {
-		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
-			warnx("kvm_read: %s", kvm_geterr(kvm));
-			break;
-		}
-		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
-		while (addr != 0) {
-			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
-			    sizeof(td)) {
-				warnx("kvm_read: %s", kvm_geterr(kvm));
-				break;
-			}
-			kt = malloc(sizeof(*kt));
-			kt->next = first;
-			kt->kaddr = addr;
-			if (td.td_tid == dumptid)
-				kt->pcb = dumppcb;
-			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
-			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
-				kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu;
-			else
-				kt->pcb = (uintptr_t)td.td_pcb;
-			kt->kstack = td.td_kstack;
-			kt->tid = td.td_tid;
-			kt->pid = p.p_pid;
-			kt->paddr = paddr;
-			kt->cpu = td.td_oncpu;
-			first = kt;
-			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
-		}
-		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	kgdb_thr_add_procs(paddr);
+	addr = kgdb_lookup("zombproc");
+	if (addr != 0) {
+		kvm_read(kvm, addr, &paddr, sizeof(paddr));
+		kgdb_thr_add_procs(paddr);
 	}
 	curkthr = kgdb_thr_lookup_tid(dumptid);
 	if (curkthr == NULL)

> is there an easy way to examine its stack in this case?

Hmm, you can use something like this from my kgdb macros.

For amd64:

# Do a backtrace given %rip and %rbp as args
define bt
    set $_rip = $arg0
    set $_rbp = $arg1
    set $i = 0
    while ($_rbp != 0 || $_rip != 0)
        printf "%2d: pc ", $i
        if ($_rip != 0)
            x/1i $_rip
        else
            printf "\n"
        end
        if ($_rbp == 0)
            set $_rip = 0
        else
            set $fr = (struct amd64_frame *)$_rbp
            set $_rbp = $fr->f_frame
            set $_rip = $fr->f_retaddr
            set $i = $i + 1
        end
    end
end

document bt
Given values for %rip and %rbp, perform a manual backtrace.
end

define btf
    bt $arg0.tf_rip $arg0.tf_rbp
end

document btf
Do a manual backtrace from a specified trapframe.
end

For i386:

# Do a backtrace given %eip and %ebp as args
define bt
    set $_eip = $arg0
    set $_ebp = $arg1
    set $i = 0
    while ($_ebp != 0 || $_eip != 0)
        printf "%2d: pc ", $i
        if ($_eip != 0)
            x/1i $_eip
        else
            printf "\n"
        end
        if ($_ebp == 0)
            set $_eip = 0
        else
            set $fr = (struct i386_frame *)$_ebp
            set $_ebp = $fr->f_frame
            set $_eip = $fr->f_retaddr
            set $i = $i + 1
        end
    end
end

document bt
Given values for %eip and %ebp, perform a manual backtrace.
end

define btf
    bt $arg0.tf_eip $arg0.tf_ebp
end

document btf
Do a manual backtrace from a specified trapframe.
end

-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 12:55:30 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD1C5106566B; Fri, 19 Aug 2011 12:55:30 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1-6.sentex.ca [IPv6:2607:f3e0:0:1::12]) by mx1.freebsd.org (Postfix) with ESMTP id 7BFBF8FC13; Fri, 19 Aug 2011 12:55:30 +0000 (UTC) Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a]) by smarthost1.sentex.ca (8.14.4/8.14.4) with ESMTP id p7JCtSbd054974; Fri, 19 Aug 2011 08:55:28 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <4E4E5D49.4040502@sentex.net> Date: Fri, 19 Aug 2011 08:55:37 -0400 From: Mike Tancsa Organization: Sentex Communications User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7 MIME-Version: 1.0 To: Hiroki Sato , attilio@FreeBSD.org, kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org, Nick Esborn References: <20110818.091600.831954331552558249.hrs@allbsd.org> <20110818025550.GA1971@libertas.local.camdensoftware.com> <20110819.092811.1087267565626420460.hrs@allbsd.org> <20110819003759.GC54831@libertas.local.camdensoftware.com> In-Reply-To: <20110819003759.GC54831@libertas.local.camdensoftware.com> X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.71 on IPv6:2607:f3e0:0:1::12 Cc: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: 
, X-List-Received-Date: Fri, 19 Aug 2011 12:55:30 -0000 On 8/18/2011 8:37 PM, Chip Camden wrote: >> st> Thanks, Attilio. I've applied the patch and removed the extra debug >> st> options I had added (though keeping debug symbols). I'll let you know if >> st> I experience any more panics. >> >> No panic for 20 hours at this moment, FYI. For my NFS server, I >> think another 24 hours would be sufficient to confirm the stability. >> I will see how it works... >> >> -- Hiroki > > Likewise: > > $ uptime > 5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63 > > So far, so good (knocks on head). > 0(ns4)% uptime 8:55AM up 22:39, 3 users, load averages: 0.01, 0.00, 0.00 0(ns4)% So far so good for me too ---Mike -- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 15:06:17 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEBE6106566B for ; Fri, 19 Aug 2011 15:06:17 +0000 (UTC) (envelope-from sterling@camdensoftware.com) Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com [174.123.46.202]) by mx1.freebsd.org (Postfix) with ESMTP id 8341F8FC0A for ; Fri, 19 Aug 2011 15:06:17 +0000 (UTC) Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203] helo=_HOSTNAME_) by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1QuQeE-0004Hf-S5 for freebsd-stable@FreeBSD.org; Fri, 19 Aug 2011 08:05:52 -0700 Received: by _HOSTNAME_ (sSMTP sendmail emulation); Fri, 19 Aug 2011 08:06:12 -0700 Date: Fri, 19 Aug 2011 08:06:12 -0700 From: Chip Camden To: freebsd-stable@FreeBSD.org Message-ID: <20110819150612.GA34969@libertas.local.camdensoftware.com> Mail-Followup-To: freebsd-stable@FreeBSD.org 
References: <20110818.091600.831954331552558249.hrs@allbsd.org> <20110818025550.GA1971@libertas.local.camdensoftware.com> <20110819.092811.1087267565626420460.hrs@allbsd.org> <20110819003759.GC54831@libertas.local.camdensoftware.com> <4E4E5D49.4040502@sentex.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="vkogqOf2sHV7VnPd" Content-Disposition: inline In-Reply-To: <4E4E5D49.4040502@sentex.net> User-Agent: Mutt/1.4.2.3i Company: Camden Software Consulting URL: http://camdensoftware.com X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - camdensoftware.com X-Source: X-Source-Args: X-Source-Dir: Cc: Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 15:06:17 -0000 --vkogqOf2sHV7VnPd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Quoth Mike Tancsa on Friday, 19 August 2011: > On 8/18/2011 8:37 PM, Chip Camden wrote: > > >> st> Thanks, Attilio. I've applied the patch and removed the extra debug > >> st> options I had added (though keeping debug symbols). I'll let you know if > >> st> I experience any more panics. > >> > >> No panic for 20 hours at this moment, FYI. For my NFS server, I > >> think another 24 hours would be sufficient to confirm the stability. > >> I will see how it works...
> >> > >> -- Hiroki > > > > Likewise: > > > > $ uptime > > 5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63 > > > > So far, so good (knocks on head). > > > > > 0(ns4)% uptime > 8:55AM up 22:39, 3 users, load averages: 0.01, 0.00, 0.00 > 0(ns4)% > > > So far so good for me too > > ---Mike > > -- > ------------------- > Mike Tancsa, tel +1 519 651 3400 > Sentex Communications, mike@sentex.net > Providing Internet services since 1994 www.sentex.net > Cambridge, Ontario Canada http://www.tancsa.com/ > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" Still up and running here. 8:02AM up 1 day, 12:10, 4 users, load averages: 0.08, 0.26, 0.52 After the panics began, I never went more than 12 hours without one before applying this patch. I think you nailed it, Attilio. Or at least, you moved it. -- .O.
| Sterling (Chip) Camden | http://camdensoftware.com ..O | sterling@camdensoftware.com | http://chipsquips.com OOO | 2048R/D6DBAF91 | http://chipstips.com --vkogqOf2sHV7VnPd Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQEcBAEBAgAGBQJOTnvkAAoJEIpckszW26+RbXAH/RmLvrpkuuZU7wUAaXpN/jC/ t6x6ZMWUDJId2AlH3SIORFFDw2VvSQOxck14hZvGGHhBNYsqtfrdrAHi4cZual6S Lv6hlcCN4asS52wsKCqoBvOasF5xZV+3L+0RARRhwO8kBNh2zKJ0jHYsQijBa/gw xc1LLE4MPPETQGvZe/yxIQuC/oO5Sdo+zW85g6/8XX84ydDrEZqPSwbPbmtGrj3S +vGMmexfnhlslgVlHboPnYIOnwRQKMkLb5oM7xejbx4yl6jn8qHtAFo+ltNftj4D 6vhQ/5AsNWimmHdsj/ZGTcTgM537k7gKSgYQvmpJolqdjqJ7hrFZOCRW7ewmPdI= =S8yN -----END PGP SIGNATURE----- --vkogqOf2sHV7VnPd-- From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 16:28:08 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 82BE3106566C; Fri, 19 Aug 2011 16:28:08 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 609F08FC17; Fri, 19 Aug 2011 16:28:06 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA02123; Fri, 19 Aug 2011 19:28:05 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4E8F15.5030301@FreeBSD.org> Date: Fri, 19 Aug 2011 19:28:05 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org> <201108190814.00885.jhb@freebsd.org> In-Reply-To: <201108190814.00885.jhb@freebsd.org> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc:
freebsd-hackers@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 16:28:08 -0000 on 19/08/2011 15:14 John Baldwin said the following: > Yes, it is a bug in kgdb that it only walks allproc and not zombproc. Try this: The patch worked perfectly well for me, thank you! > Index: kthr.c > =================================================================== > --- kthr.c (revision 224879) > +++ kthr.c (working copy) > @@ -73,11 +73,52 @@ kgdb_thr_first(void) > return (first); > } > > +static void > +kgdb_thr_add_procs(uintptr_t paddr) > +{ > + struct proc p; > + struct thread td; > + struct kthr *kt; > + CORE_ADDR addr; > + > + while (paddr != 0) { > + if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) { > + warnx("kvm_read: %s", kvm_geterr(kvm)); > + break; > + } > + addr = (uintptr_t)TAILQ_FIRST(&p.p_threads); > + while (addr != 0) { > + if (kvm_read(kvm, addr, &td, sizeof(td)) != > + sizeof(td)) { > + warnx("kvm_read: %s", kvm_geterr(kvm)); > + break; > + } > + kt = malloc(sizeof(*kt)); > + kt->next = first; > + kt->kaddr = addr; > + if (td.td_tid == dumptid) > + kt->pcb = dumppcb; > + else if (td.td_state == TDS_RUNNING && stoppcbs != 0 && > + CPU_ISSET(td.td_oncpu, &stopped_cpus)) > + kt->pcb = (uintptr_t)stoppcbs + > + sizeof(struct pcb) * td.td_oncpu; > + else > + kt->pcb = (uintptr_t)td.td_pcb; > + kt->kstack = td.td_kstack; > + kt->tid = td.td_tid; > + kt->pid = p.p_pid; > + kt->paddr = paddr; > + kt->cpu = td.td_oncpu; > + first = kt; > + addr = (uintptr_t)TAILQ_NEXT(&td, td_plist); > + } > + paddr = (uintptr_t)LIST_NEXT(&p, p_list); > + } > +} > + > struct kthr * > kgdb_thr_init(void) > { > - struct proc p; > - struct thread td; > long cpusetsize; > struct kthr *kt; > 
CORE_ADDR addr; > @@ -113,37 +154,11 @@ kgdb_thr_init(void) > > stoppcbs = kgdb_lookup("stoppcbs"); > > - while (paddr != 0) { > - if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) { > - warnx("kvm_read: %s", kvm_geterr(kvm)); > - break; > - } > - addr = (uintptr_t)TAILQ_FIRST(&p.p_threads); > - while (addr != 0) { > - if (kvm_read(kvm, addr, &td, sizeof(td)) != > - sizeof(td)) { > - warnx("kvm_read: %s", kvm_geterr(kvm)); > - break; > - } > - kt = malloc(sizeof(*kt)); > - kt->next = first; > - kt->kaddr = addr; > - if (td.td_tid == dumptid) > - kt->pcb = dumppcb; > - else if (td.td_state == TDS_RUNNING && stoppcbs != 0 && > - CPU_ISSET(td.td_oncpu, &stopped_cpus)) > - kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu; > - else > - kt->pcb = (uintptr_t)td.td_pcb; > - kt->kstack = td.td_kstack; > - kt->tid = td.td_tid; > - kt->pid = p.p_pid; > - kt->paddr = paddr; > - kt->cpu = td.td_oncpu; > - first = kt; > - addr = (uintptr_t)TAILQ_NEXT(&td, td_plist); > - } > - paddr = (uintptr_t)LIST_NEXT(&p, p_list); > + kgdb_thr_add_procs(paddr); > + addr = kgdb_lookup("zombproc"); > + if (addr != 0) { > + kvm_read(kvm, addr, &paddr, sizeof(paddr)); > + kgdb_thr_add_procs(paddr); > } > curkthr = kgdb_thr_lookup_tid(dumptid); > if (curkthr == NULL) > >> is there an easy way to examine its stack in this case? > > Hmm, you can use something like this from my kgdb macros. Oh, I completely forgot about them. I hope I will remember where to search for the tricks next time I need them :-) Thank you again! 
> For amd64: > > # Do a backtrace given %rip and %rbp as args > define bt > set $_rip = $arg0 > set $_rbp = $arg1 > set $i = 0 > while ($_rbp != 0 || $_rip != 0) > printf "%2d: pc ", $i > if ($_rip != 0) > x/1i $_rip > else > printf "\n" > end > if ($_rbp == 0) > set $_rip = 0 > else > set $fr = (struct amd64_frame *)$_rbp > set $_rbp = $fr->f_frame > set $_rip = $fr->f_retaddr > set $i = $i + 1 > end > end > end > > document bt > Given values for %rip and %rbp, perform a manual backtrace. > end > > define btf > bt $arg0.tf_rip $arg0.tf_rbp > end > > document btf > Do a manual backtrace from a specified trapframe. > end > > For i386: > > # Do a backtrace given %eip and %ebp as args > define bt > set $_eip = $arg0 > set $_ebp = $arg1 > set $i = 0 > while ($_ebp != 0 || $_eip != 0) > printf "%2d: pc ", $i > if ($_eip != 0) > x/1i $_eip > else > printf "\n" > end > if ($_ebp == 0) > set $_eip = 0 > else > set $fr = (struct i386_frame *)$_ebp > set $_ebp = $fr->f_frame > set $_eip = $fr->f_retaddr > set $i = $i + 1 > end > end > end > > document bt > Given values for %eip and %ebp, perform a manual backtrace. > end > > define btf > bt $arg0.tf_eip $arg0.tf_ebp > end > > document btf > Do a manual backtrace from a specified trapframe. 
> end > -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 16:32:19 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0AB08106566C for ; Fri, 19 Aug 2011 16:32:19 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5A21C8FC16 for ; Fri, 19 Aug 2011 16:32:17 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA02153; Fri, 19 Aug 2011 19:32:13 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E4E900D.8010506@FreeBSD.org> Date: Fri, 19 Aug 2011 19:32:13 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Hans Petter Selasky References: <4E4D460A.2080100@FreeBSD.org> <201108182324.58276.hselasky@c2i.net> In-Reply-To: <201108182324.58276.hselasky@c2i.net> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 16:32:19 -0000 on 19/08/2011 00:24 Hans Petter Selasky said the following: > On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote: >> If you can help Hans to figure out what you is wrong with USB subsystem in >> this respect that would help us all. > > Hi, > > usb_busdma.c: /* we use "mtx_owned()" instead of this function */ > usb_busdma.c: owned = mtx_owned(uptag->mtx); > usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 
0 : 1; > usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; > usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; > usb_hub.c: if (mtx_owned(&bus->bus_mtx)) { > usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) { > usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) { > usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) { > usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) { > > One fix you will need to do, if mtx_owned is not giving correct value is: First, could you please clarify what is the correct, or rather - expected, value in this case. It's not immediately clear to me if we should consider all locks as owned or un-owned in a situation where all locks are actually skipped behind the scenes. Maybe USB code should explicitly check for that condition as to not make any unsafe assumptions. Second, it's not clear to me what the above list actually represents in the context of this discussion. > static void > usbd_callback_wrapper(struct usb_xfer_queue *pq) > { > struct usb_xfer *xfer = pq->curr; > struct usb_xfer_root *info = xfer->xroot; > > USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED); > if (!mtx_owned(info->xfer_mtx)) { > > The above "if" should be anded with && !paniced && !dumping ... or maybe the > new not scheduling variable is good for this purpose? > > /* > * Cases that end up here: > * > > #if USB_HAVE_BUSDMA > if (mtx_owned(xfer->xroot->xfer_mtx)) { > struct usb_xfer_queue *pq; > > > This case is more like a BUS-DMA error case, and is not so important to > execute. 
> > --HPS -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 16:44:39 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EE0D106564A; Fri, 19 Aug 2011 16:44:38 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id CA6068FC13; Fri, 19 Aug 2011 16:44:37 +0000 (UTC) Received: by gyd10 with SMTP id 10so2717217gyd.13 for ; Fri, 19 Aug 2011 09:44:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=0Lz5H6GCIvnE1/LQqKZyiedNy77sAEIyOH0cmJVDOFk=; b=aBdz3xeAtXi14nYrzdTfbYnnuhNqjufzZl/IQ9oX2bGTTZNnmH34Yd1WsZIPayJECW KDX3eLfZRxwYbItBWAvsxE58zhqHvdqB7vtaJClFX4Fx0mQOJotTU2R1foquCqQtbepG XTtxiWrQ8g2CvxKbJ/Vb8kVORyJdwvRQtliaQ= MIME-Version: 1.0 Received: by 10.236.80.9 with SMTP id j9mr8972365yhe.94.1313772277318; Fri, 19 Aug 2011 09:44:37 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.33 with HTTP; Fri, 19 Aug 2011 09:44:37 -0700 (PDT) In-Reply-To: References: Date: Fri, 19 Aug 2011 18:44:37 +0200 X-Google-Sender-Auth: -paXys0lGuRJVgp-FLwxqxKuPdc Message-ID: From: Attilio Rao To: Andrew Boyer Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-stable@freebsd.org, Eugene Grosbein , Vishal.Shah@netapp.com, Andriy Gapon , Hans Petter Selasky , Jeremiah Lott , Steven Hartland Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 16:44:39 -0000 2011/8/12 
Andrew Boyer : > Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net) > Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable) > Re: System hang in USB umass module while processing panic (originally on freebsd-usb) > > Hello Andriy and Hans, > > Sorry for tying in so many discussions on this topic, but I think I have an explanation for the problems we have been reporting* with hanging coredumps on multicore systems on 8.2-RELEASE, and it has implications for Andriy's proposed scheduler patch** and for USB. > > In today's 8.X and 9.X branches, nothing that I can find stops the other CPUs when the kernel panics, but many parts of the locking code get disabled (grep on 'panicstr'). The 'bufwrite: buffer is not busy???' panic is caused by the syncer encountering an error. If that happens when it's on the dumping CPU everything hangs. If it's running on a different CPU, it will be blocked and hidden by the panic_cpu spinlock in panic(), and the dump continues, polling every attached keyboard for a Ctl-C. > > But, the new 8.X USB stack relies on multithreading. (The new stack is the variable that broke coredumps for us in the 7.1->8.2 transition, I think.) SVN 224223 fixes a hang that would happen when dumpsys() polls the USB keyboard (IPMI KVM, in our case). That helps, but it only gets as far as usb_process(), where it hangs in a loop around a cv_wait() call. This is easy to reproduce by adding code to the watchdog to break into the debugger if panicstr is set. > > I am experimenting with Andriy's patch** to stop the scheduler and it seems to be most of the way there, stopping the CPUs and disabling the rest of locking. There are a few places that still reference panicstr, but that's minor.
These are the changes I made to the patch: > * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is true, so that we don't hang up in USB. ukbd_yield() locks up in DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll() locks up trying to drop mutexes. > * Changed the call to spinlock_enter() back to critical_enter(), so that interrupts stay enabled and the hardclock still functions. Which spinlock_enter() are you referring to here? I think that having fast interrupt handlers running during panic/shutdown is something we should avoid like hell. Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 21:09:10 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C40E1065679 for ; Fri, 19 Aug 2011 21:09:10 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id 028A58FC08 for ; Fri, 19 Aug 2011 21:09:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id C69B450A09 for ; Fri, 19 Aug 2011 20:50:02 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 29vy5iYDDqPK for ; Fri, 19 Aug 2011 21:50:02 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 63A34509F3 for ; Fri, 19 Aug 2011 20:50:02 +0000 (UTC) From: Dan Langille Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Fri, 19 Aug 2011 16:50:01 -0400 Message-Id: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> To:
freebsd-stable@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Subject: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 21:09:10 -0000 System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011 After a recent power failure, I'm seeing this in my logs: Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors And gmirror reports: # gmirror status Name Status Components mirror/gm0 DEGRADED ad0 (100%) ad2 I think the solution is: gmirror rebuild Comments? Searching on that error message, I was led to believe that identifying the bad sector and running dd to read it would cause the HDD to reallocate that bad block. http://smartmontools.sourceforge.net/badblockhowto.html However, since ad2 is one half of a gmirror, I don't think this is the best approach. Comments? More information: smartd, gpart, dh, diskinfo, and fdisk output at http://beta.freebsddiary.org/smart-fixing-bad-sector.php also: # gmirror list Geom name: gm0 State: DEGRADED Components: 2 Balance: round-robin Slice: 4096 Flags: NONE GenID: 0 SyncID: 1 ID: 3362720654 Providers: 1. Name: mirror/gm0 Mediasize: 40027028992 (37G) Sectorsize: 512 Mode: r6w5e14 Consumers: 1. Name: ad0 Mediasize: 40027029504 (37G) Sectorsize: 512 Mode: r1w1e1 State: SYNCHRONIZING Priority: 0 Flags: DIRTY, SYNCHRONIZING GenID: 0 SyncID: 1 Synchronized: 100% ID: 949692477 2.
Name: ad2 Mediasize: 40027029504 (37G) Sectorsize: 512 Mode: r1w1e1 State: ACTIVE Priority: 0 Flags: DIRTY, BROKEN GenID: 0 SyncID: 1 ID: 3585934016 -- Dan Langille - http://langille.org From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 21:52:02 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C6781106564A for ; Fri, 19 Aug 2011 21:52:02 +0000 (UTC) (envelope-from cswiger@mac.com) Received: from asmtpout025.mac.com (asmtpout025.mac.com [17.148.16.100]) by mx1.freebsd.org (Postfix) with ESMTP id AFFAC8FC08 for ; Fri, 19 Aug 2011 21:52:02 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from [17.153.44.144] by asmtp025.mac.com (Oracle Communications Messaging Exchange Server 7u4-20.01 64bit (built Nov 21 2010)) with ESMTPSA id <0LQ7000ZO3DZCG70@asmtp025.mac.com> for freebsd-stable@freebsd.org; Fri, 19 Aug 2011 14:51:37 -0700 (PDT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.4.6813,1.0.211,0.0.0000 definitions=2011-08-19_08:2011-08-19, 2011-08-19, 1970-01-01 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 ipscore=0 suspectscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx engine=6.0.2-1012030000 definitions=main-1108190264 From: Chuck Swiger In-reply-to: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> Date: Fri, 19 Aug 2011 14:51:34 -0700 Message-id: <65474D95-F56F-4DC7-8029-BA7166C4E46F@mac.com> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> To: Dan Langille X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Fri, 19 Aug 2011 21:52:02 -0000 On Aug 19, 2011, at 1:50 PM, Dan Langille wrote: > Searching on that error message, I was led to believe that identifying the bad sector and > running dd to read it would cause the HDD to reallocate that bad block. > > http://smartmontools.sourceforge.net/badblockhowto.html > > However, since ad2 is one half of a gmirror, I don't think this is the best approach. > > Comments? Reading the underlying failing drive with dd will help identify any other questionable sectors. However, your drive temps are too high-- many vendors call out either 50C or 55C as the point where drive reliability becomes significantly degraded. Regards, -- -Chuck From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 23:21:31 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F335F106564A for ; Fri, 19 Aug 2011 23:21:30 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.westchester.pa.mail.comcast.net (qmta04.westchester.pa.mail.comcast.net [76.96.62.40]) by mx1.freebsd.org (Postfix) with ESMTP id A28F88FC08 for ; Fri, 19 Aug 2011 23:21:30 +0000 (UTC) Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71]) by qmta04.westchester.pa.mail.comcast.net with comcast id NPJP1h0021YDfWL54PMWHa; Fri, 19 Aug 2011 23:21:30 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta20.westchester.pa.mail.comcast.net with comcast id NPMS1h0191t3BNj3gPMTnv; Fri, 19 Aug 2011 23:21:29 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 5054F102C1A; Fri, 19 Aug 2011 16:21:25 -0700 (PDT) Date: Fri, 19 Aug 2011 16:21:25 -0700 From: Jeremy Chadwick To: Dan Langille Message-ID: <20110819232125.GA4965@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 23:21:31 -0000 On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: > System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011 > > After a recent power failure, I'm seeing this in my logs: > > Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors I doubt this is related to a power failure. > Searching on that error message, I was led to believe that identifying the bad sector and > running dd to read it would cause the HDD to reallocate that bad block. > > http://smartmontools.sourceforge.net/badblockhowto.html This is incorrect (meaning you've misunderstood what's written there). Unreadable LBAs can be a result of the LBA being actually bad (as in uncorrectable), or the LBA being marked "suspect". In either case the LBA will return an I/O error when read. If the LBAs are marked "suspect", the drive will perform re-analysis of the LBA (to determine if the LBA can be read and the data re-mapped, or if it cannot then the LBA is marked uncorrectable) when you **write** to the LBA. The above smartd output doesn't tell me much. Providing actual SMART attribute data (smartctl -a) for the drive would help. The brand of the drive, the firmware version, and the model all matter -- every drive behaves a little differently. Furthermore, if the LBA is re-analysed and determined to be uncorrectable -- regardless of remapping -- this doesn't actually fix I/O errors on a filesystem level. 
The filesystem itself (and more often than not in the data section of the file/inode, so things like fsck can't work around this) can still contain references to the LBA which is uncorrectable, and will still continue to return I/O errors when read. There has to be a way to tell the filesystem, when formatted, "avoid use of this LBA". How UFS/FFS handles this is unknown to me. I know of badsect(8) but I don't know if this works. "Transparent" remapping I have never seen work except on SSDs. If you want me to step you through the procedure of re-testing the LBAs (assuming they're suspect and not uncorrectable) I can do so, just ask. Finding the suspect LBAs can be done using a dd loop (I wrote a shell script for this), or using "smartctl -t select,0-max /dev/XXX" and let the drive's internal selective test see if it can find them. From there it's an issue of submitting a write request to the LBA and seeing what happens (I do this via dd as well, but the parameters you pass it are very specific, e.g. don't mix up/misunderstand seek vs. skip). I've assisted with this time and time again for folks on forums with varying success. I've also found some models of drives which claim there's suspect LBAs yet an internal surface scan passes with no issues (and these are drives which I myself have, the only difference between my drives and the individuals' drive is firmware, which leads me to believe a bug on some drives in the field). -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 23:53:50 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C22BB1065670; Fri, 19 Aug 2011 23:53:50 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 560328FC08; Fri, 19 Aug 2011 23:53:50 +0000 (UTC) Received: by gwb15 with SMTP id 15so2335350gwb.13 for ; Fri, 19 Aug 2011 16:53:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Xuj6PAUpGO/iake7/sudbaeNchPxrBxs8fUdliE4DYQ=; b=HC/Hu5uIrhD9LJE3SFIMZyz15FZ7mDSnf1mvDABpnP1CwxZXh2/2B/mf5xww+C5M8a cC2VrebwGsJcIep4JwD2wA5I3a+hmtg2OtUFOXcEA61SKgebD3Foe4DONj3LoHQ96rqM oT0M8wfcZ35IvJGIVGfo8hM7A58W6LOL0raeA= MIME-Version: 1.0 Received: by 10.236.116.40 with SMTP id f28mr2295591yhh.60.1313798029787; Fri, 19 Aug 2011 16:53:49 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.33 with HTTP; Fri, 19 Aug 2011 16:53:49 -0700 (PDT) In-Reply-To: <4E4E5D49.4040502@sentex.net> References: <20110818.091600.831954331552558249.hrs@allbsd.org> <20110818025550.GA1971@libertas.local.camdensoftware.com> <20110819.092811.1087267565626420460.hrs@allbsd.org> <20110819003759.GC54831@libertas.local.camdensoftware.com> <4E4E5D49.4040502@sentex.net> Date: Sat, 20 Aug 2011 01:53:49 +0200 X-Google-Sender-Auth: qEZlhSvUegqFgRJ99Kef9GaLXuU Message-ID: From: Attilio Rao To: Mike Tancsa Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: kostikbel@gmail.com, Nick Esborn , freebsd-stable@freebsd.org, avg@freebsd.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 23:53:50 -0000 If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours. Attilio 2011/8/19 Mike Tancsa : > On 8/18/2011 8:37 PM, Chip Camden wrote: > >>> st> Thanks, Attilio. I've applied the patch and removed the extra debug >>> st> options I had added (though keeping debug symbols). I'll let you know if >>> st> I experience any more panics. >>> >>> No panic for 20 hours at this moment, FYI. For my NFS server, I >>> think another 24 hours would be sufficient to confirm the stability. >>> I will see how it works... >>> >>> -- Hiroki >> >> Likewise: >> >> $ uptime >> 5:37PM up 21:45, 5 users, load averages: 0.68, 0.45, 0.63 >> >> So far, so good (knocks on head). >> > > > 0(ns4)% uptime > 8:55AM up 22:39, 3 users, load averages: 0.01, 0.00, 0.00 > 0(ns4)% > > > So far so good for me too > > ---Mike > > -- > ------------------- > Mike Tancsa, tel +1 519 651 3400 > Sentex Communications, mike@sentex.net > Providing Internet services since 1994 www.sentex.net > Cambridge, Ontario Canada http://www.tancsa.com/ > -- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 00:14:57 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F203106566B for ; Sat, 20 Aug 2011 00:14:57 +0000 (UTC) (envelope-from db@db.net) Received: from diana.db.net (diana.db.net [66.113.102.10]) by mx1.freebsd.org (Postfix) with ESMTP id 1B9488FC0A for ; Sat, 20 Aug 2011 00:14:56 +0000 (UTC) Received: from night.db.net (localhost [127.0.0.1]) by diana.db.net (Postfix) with ESMTP id 158F62282A; Fri, 19 Aug 2011 17:48:20 -0600 (MDT) Received: by night.db.net (Postfix, from userid 1000) id 0FA996533; Fri, 19 Aug 2011 19:57:19 -0400 (EDT) Date: Fri, 19 Aug 2011 19:57:19 -0400 From: Diane Bruce To: Dan Langille Message-ID: <20110819235719.GA64220@night.db.net> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> User-Agent: Mutt/1.4.2.3i Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 00:14:57 -0000 On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: > System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011 > > After a recent power failure, I'm seeing this in my logs: > > Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors > Personally, I'd replace that drive now. > Searching on that error message, I was led to believe that identifying the bad sector and > running dd to read it would cause the HDD to reallocate that bad block. No, as otherwise mentioned (Hi Jeremy!) 
you need to read and write the block. This could buy you a few more days or a few more weeks. Personally, I would not wait. Your call. > Comments? ... > Dan Langille - http://langille.org - Diane -- - db@FreeBSD.org db@db.net http://www.db.net/~db Why leave money to our children if we don't leave them the Earth? From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 00:51:03 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56142106566B for ; Sat, 20 Aug 2011 00:51:03 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1912F8FC18 for ; Sat, 20 Aug 2011 00:51:02 +0000 (UTC) Received: by gwb15 with SMTP id 15so2352159gwb.13 for ; Fri, 19 Aug 2011 17:51:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=mthNhNDly5UeuehxGgrRZJoGGvFlzBzKGLh3/jP2/RU=; b=P2lh2vxgDkC8yPuWt2HSEIrPrLYgHu6CYUMTIi9OS1p+mhxx736x9V//SlygrMtUc7 O6ni74t7P7dnbjWwRVc8hAwzDmvk8bBcXBCiGo/rwm8fKEr7cS1JCpEzjLnkhWWaUGHK GdIjR3M9A6c51UVO6SdlarDDTNqlK1iPZzGu8= MIME-Version: 1.0 Received: by 10.150.236.9 with SMTP id j9mr36820ybh.167.1313801462341; Fri, 19 Aug 2011 17:51:02 -0700 (PDT) Received: by 10.151.98.3 with HTTP; Fri, 19 Aug 2011 17:51:02 -0700 (PDT) In-Reply-To: <20110819235719.GA64220@night.db.net> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819235719.GA64220@night.db.net> Date: Fri, 19 Aug 2011 17:51:02 -0700 Message-ID: From: Kevin Oberman To: Dan Langille Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 00:51:03 -0000 On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce wrote: > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011 >> >> After a recent power failure, I'm seeing this in my logs: >> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors >> > > Personally, I'd replace that drive now. > >> Searching on that error message, I was led to believe that identifying the bad sector and >> running dd to read it would cause the HDD to reallocate that bad block. > > No, as otherwise mentioned (Hi Jeremy!) you need to read and write the > block. This could buy you a few more days or a few more weeks. Personally, > I would not wait. Your call. > While I largely agree, it depends on several factors as to whether I'd replace the drive. First, what does SMART show other than these errors? If the reported statistics look generally good, and considering that you have a mirror with one "good" copy of the blocks in question, the impact is zero unless the other drive fails. That is why the blocks need to be re-written so that they will be re-located on the drive. Second, how critical is the data? The mirror gives good integrity, but you also need good backups. If the data MUST be on-line with high reliability, buy a replacement drive. You need to look at cost-benefit (or really the cost of replacement vs. cost of failure). It's worth mentioning that all drives have bad blocks. Most are hard bad blocks and are re-mapped before the drive is shipped, but marginal bad blocks can and do slip through to customers and it is entirely possible that the drive is just fine for the most part and replacing it is really a waste of money.
Only you can make the call, but if further bad blocks show up in the near term, I'll go along with recommending replacement. -- R. Kevin Oberman, Network Engineer - Retired E-mail: kob6558@gmail.com From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 01:14:08 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EAD31065670 for ; Sat, 20 Aug 2011 01:14:08 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta12.westchester.pa.mail.comcast.net (qmta12.westchester.pa.mail.comcast.net [76.96.59.227]) by mx1.freebsd.org (Postfix) with ESMTP id CFDC28FC13 for ; Sat, 20 Aug 2011 01:14:07 +0000 (UTC) Received: from omta01.westchester.pa.mail.comcast.net ([76.96.62.11]) by qmta12.westchester.pa.mail.comcast.net with comcast id NRCw1h0020EZKEL5CRE8ze; Sat, 20 Aug 2011 01:14:08 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta01.westchester.pa.mail.comcast.net with comcast id NRE61h0191t3BNj3MRE70T; Sat, 20 Aug 2011 01:14:08 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4D774102C1A; Fri, 19 Aug 2011 18:14:05 -0700 (PDT) Date: Fri, 19 Aug 2011 18:14:05 -0700 From: Jeremy Chadwick To: Kevin Oberman Message-ID: <20110820011405.GA20330@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819235719.GA64220@night.db.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org, Dan Langille Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 01:14:08 -0000 On Fri, Aug 19, 2011 at 05:51:02PM -0700, Kevin Oberman wrote: > On Fri, Aug 19, 2011
at 4:57 PM, Diane Bruce wrote: > > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: > >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011 > >> > >> After a recent power failure, I'm seeing this in my logs: > >> > >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors > >> > > > > Personally, I'd replace that drive now. > > > >> Searching on that error message, I was led to believe that identifying the bad sector and > >> running dd to read it would cause the HDD to reallocate that bad block. > > > > No, as otherwise mentioned (Hi Jeremy!) you need to read and write the > > block. This could buy you a few more days or a few more weeks. Personally, > > I would not wait. Your call. > > > > While I largely agree, it depends on several factors as to whether I'd > replace the drive. > > First, what does SMART show other than these errors? If the reported > statistics look generally good, and considering that you have a mirror with > one "good" copy of the blocks in question, the impact is zero unless > the other drive fails. That is why the blocks need to be re-written so > that they will be re-located on the drive. > > Second, how critical is the data? The mirror gives good integrity, but > you also need good backups. If the data MUST be on-line with high > reliability, buy a replacement drive. You need to look at cost-benefit > (or really the cost of replacement vs. cost of failure). > > It's worth mentioning that all drives have bad blocks. Most are hard > bad blocks and are re-mapped before the drive is shipped, but marginal > bad blocks can and do slip through to customers and it is entirely > possible that the drive is just fine for the most part and replacing > it is really a waste of money. > > Only you can make the call, but if further bad blocks show up in the > near term, I'll go along with recommending replacement. I can expand a bit on this.
With ATA/SATA and SCSI disks, there's a factory default list of LBAs which are bad (referred to as the "physical defect list"). Everyone by now is familiar with this. With SCSI disks there are "grown defects", a drive-managed AND user-managed list of LBAs which are considered bad. Whether these LBAs were correctable (remapped) or not is tracked by SMART on SCSI. I can provide many examples of this if people want to see what it looks like (we have quite a collection of Fujitsu disks at my workplace. They're one of a few vendors I more or less boycott). With SCSI, you can clear the grown defect list with ease. Some drives support clearing the physical defect list too, but doing that requires a *true* low-level format to be done afterward. In the case you issue a SCSI FORMAT command, any grown defects (as the drive encounters them) will be "merged" with the physical defect list. When the FORMAT is done, the drive will report 0 grown defects. Again, I can confirm this exact behaviour with our Fujitsu disks at my workplace; it's easy to get a list of the physical and grown defects with SCSI. With ATA/SATA disks it's a different story: it seems to vary from vendor to vendor and model to model. The established theory is that the drive has a list of spare LBAs for remappings, which is managed entirely by the drive itself -- and not reported back to the user via SMART or any other means. This happens entirely without user intervention, and (on repetitive errors) might show up as the drive stalling on some I/O or other oddities. These situations are not reported back to the OS either -- it's entirely 100% transparent to the user. When an ATA/SATA disk begins reporting errors back via SMART, or to the OS (e.g. I/O error), on certain LBA accesses, then the theory is that the spare LBA list used by the drive internally has been exhausted, and it will begin using a different spare list (or an extension of the existing spares; I'm not sure).
What Diane's getting at (Hi Diane!) is that since the drive is already at the stage/point of reporting errors back to the OS and SMART, it means the drive has experienced problems (which it worked around) prior to this point in time. Hence her recommendation to replace the drive. What I still have a bit of trouble stomaching these days is whether or not the above theories are still used *today* in practise on SATA disks. Part of me is inclined to believe that **any** errors are reported to SMART and the OS, and the remapping is reported via SMART, etc.; i.e. there's no more "transparent" anything. The problem is that I don't have a good way to confirm/deny this. Oh what I'd give for good engineering contacts within Western Digital and Seagate... These days, I replace drives depending upon their age (Power_On_Hours) combined with how many errors are seen and what kind of errors. For example, if I have a drive that's been in operation for 20,000 hours and it now has 2 bad LBAs, I can accept that. If I have a drive that's been in operation for 48 hours and it has 30 errors, that drive is getting RMA'd. When I get new or RMA'd/refurbished drives, I test them before putting them to use. I do a read-only surface scan using SMART ("smartctl -t select,0-max /dev/XXX") and let that finish. Assuming no errors are shown in the selective scan log, I then proceed with a full disk zero ("dd if=/dev/zero of=/dev/XXX bs=64k"). When finished I check SMART for any errors. If there are any, I RMA the drive -- or if it's been RMA'd already, I get angry at the vendor. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 01:39:22 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 06479106566B for ; Sat, 20 Aug 2011 01:39:22 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id CD7218FC0A for ; Sat, 20 Aug 2011 01:39:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id 17AB250A09; Sat, 20 Aug 2011 01:39:21 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dptRzAcxz9Zj; Sat, 20 Aug 2011 02:39:20 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 9E15A50A06 ; Sat, 20 Aug 2011 01:39:20 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Dan Langille In-Reply-To: <20110819232125.GA4965@icarus.home.lan> Date: Fri, 19 Aug 2011 21:39:17 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 01:39:22 -0000 On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote: > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: >> System in question: FreeBSD 8.2-STABLE #3: 
Thu Mar 3 04:52:04 GMT 2011 >> >> After a recent power failure, I'm seeing this in my logs: >> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors > > I doubt this is related to a power failure. > >> Searching on that error message, I was led to believe that identifying the bad sector and >> running dd to read it would cause the HDD to reallocate that bad block. >> >> http://smartmontools.sourceforge.net/badblockhowto.html > > This is incorrect (meaning you've misunderstood what's written there). > > Unreadable LBAs can be a result of the LBA being actually bad (as in > uncorrectable), or the LBA being marked "suspect". In either case the > LBA will return an I/O error when read. > > If the LBAs are marked "suspect", the drive will perform re-analysis of > the LBA (to determine if the LBA can be read and the data re-mapped, or > if it cannot then the LBA is marked uncorrectable) when you **write** to > the LBA. > > The above smartd output doesn't tell me much. Providing actual SMART > attribute data (smartctl -a) for the drive would help. The brand of the > drive, the firmware version, and the model all matter -- every drive > behaves a little differently. Information such as this?
http://beta.freebsddiary.org/smart-fixing-bad-sector.php -- Dan Langille - http://langille.org From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 01:53:10 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0467D1065670; Sat, 20 Aug 2011 01:53:10 +0000 (UTC) (envelope-from hrs@FreeBSD.org) Received: from mail.allbsd.org (gatekeeper-int.allbsd.org [IPv6:2001:2f0:104:e002::2]) by mx1.freebsd.org (Postfix) with ESMTP id 819D48FC0A; Sat, 20 Aug 2011 01:53:07 +0000 (UTC) Received: from alph.allbsd.org ([IPv6:2001:2f0:104:e010:862b:2bff:febc:8956]) (authenticated bits=128) by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7K1qlhV028971; Sat, 20 Aug 2011 10:52:57 +0900 (JST) (envelope-from hrs@FreeBSD.org) Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0) by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7K1qivo090479; Sat, 20 Aug 2011 10:52:45 +0900 (JST) (envelope-from hrs@FreeBSD.org) Date: Sat, 20 Aug 2011 10:52:29 +0900 (JST) Message-Id: <20110820.105229.834911491934932780.hrs@allbsd.org> To: attilio@FreeBSD.org From: Hiroki Sato In-Reply-To: References: <20110819003759.GC54831@libertas.local.camdensoftware.com> <4E4E5D49.4040502@sentex.net> X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530 FFD7 4F2C D3D8 2793 CF2D X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Multipart/Signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="--Security_Multipart(Sat_Aug_20_10_52_29_2011_674)--" Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org X-Virus-Status: Clean X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.3 (mail.allbsd.org [IPv6:2001:2f0:104:e001::32]); Sat, 20 Aug 2011 10:52:57 +0900 (JST) X-Spam-Status: No, score=-104.6 required=13.0 tests=BAYES_00, CONTENT_TYPE_PRESENT, RDNS_NONE, SPF_SOFTFAIL,
USER_IN_WHITELIST autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on gatekeeper.allbsd.org Cc: kostikbel@gmail.com, nick@desert.net, freebsd-stable@FreeBSD.org, avg@FreeBSD.org Subject: Re: panic: spin lock held too long (RELENG_8 from today) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 01:53:10 -0000 ----Security_Multipart(Sat_Aug_20_10_52_29_2011_674)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attilio Rao wrote in : at> If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours. Running fine for 45 hours so far. Please go ahead! -- Hiroki ----Security_Multipart(Sat_Aug_20_10_52_29_2011_674)-- Content-Type: application/pgp-signature Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEABECAAYFAk5PE10ACgkQTyzT2CeTzy3lWwCfUKrro8MGV4zpxKks9mpTEPZS OfsAoNeFETyjH+4n+IJZdwwF5ITdjNHB =JoJG -----END PGP SIGNATURE----- ----Security_Multipart(Sat_Aug_20_10_52_29_2011_674)---- From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 03:24:41 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F8D0106566B for ; Sat, 20 Aug 2011 03:24:41 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta05.emeryville.ca.mail.comcast.net (qmta05.emeryville.ca.mail.comcast.net [76.96.30.48]) by mx1.freebsd.org (Postfix) with ESMTP id 350BC8FC08 for ; Sat, 20 Aug 2011 03:24:40 +0000 (UTC) Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44]) by qmta05.emeryville.ca.mail.comcast.net with comcast id NTLZ1h0010x6nqcA5TQc9p; Sat, 20 Aug 2011 03:24:36 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by 
omta12.emeryville.ca.mail.comcast.net with comcast id NTQZ1h01A1t3BNj8YTQaet; Sat, 20 Aug 2011 03:24:34 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 8ACEF102C1A; Fri, 19 Aug 2011 20:24:38 -0700 (PDT) Date: Fri, 19 Aug 2011 20:24:38 -0700 From: Jeremy Chadwick To: Dan Langille Message-ID: <20110820032438.GA21925@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 03:24:41 -0000 On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote: > > On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote: > > > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote: > >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011 > >> > >> After a recent power failure, I'm seeing this in my logs: > >> > >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors > > > > I doubt this is related to a power failure. > > > >> Searching on that error message, I was led to believe that identifying the bad sector and > >> running dd to read it would cause the HDD to reallocate that bad block. > >> > >> http://smartmontools.sourceforge.net/badblockhowto.html > > > > This is incorrect (meaning you've misunderstood what's written there). > > > > Unreadable LBAs can be a result of the LBA being actually bad (as in > > uncorrectable), or the LBA being marked "suspect". In either case the > > LBA will return an I/O error when read. 
> > > > If the LBAs are marked "suspect", the drive will perform re-analysis of > > the LBA (to determine if the LBA can be read and the data re-mapped, or > > if it cannot then the LBA is marked uncorrectable) when you **write** to > > the LBA. > > > > The above smartd output doesn't tell me much. Providing actual SMART > > attribute data (smartctl -a) for the drive would help. The brand of the > > drive, the firmware version, and the model all matter -- every drive > > behaves a little differently. > > Information such as this? http://beta.freebsddiary.org/smart-fixing-bad-sector.php Yes, perfect. Thank you. First thing first: upgrade smartmontools to 5.41. Your attributes will be the same after you do this (the drive is already in smartmontools' internal drive DB), but I often have to remind people that they really need to keep smartmontools updated as often as possible. The changes between versions are vast; this is especially important for people with SSDs (I'm responsible for submitting some recent improvements for Intel 320 and 510 SSDs). Anyway, the drive (albeit an old PATA Maxtor) appears to have three anomalies: 1) One confirmed reallocated LBA (SMART attribute 5) 2) One "suspect" LBA (SMART attribute 197) 3) A very high temperature of 51C (SMART attribute 194). If this drive is in an enclosure or in a system with no fans this would be understandable, otherwise this is a bit high. My home workstation which has only one case fan has a drive with more platters than your Maxtor, and it idles at ~38C. Possibly this drive has been undergoing constant I/O recently (which does greatly increase drive temperature)? Not sure. I'm not going to focus too much on this one. The SMART error log also indicates an LBA failure at the 26000 hour mark (which is 16 hours prior to when you did smartctl -a /dev/ad2). Whether that LBA is the remapped one or the suspect one is unknown. The LBA was 5566440. The SMART tests you did didn't really amount to anything; no surprise. 
Short and long tests usually do not test the surface of the disk. There are some drives which do it on a long test, but as I said before, everything varies from drive to drive. Furthermore, on this model of drive, you cannot do a surface scan via SMART. Bummer. That's indicated in the "Offline data collection capabilities" section at the top, where it reads: No Selective Self-test supported. So you'll have to use the dd method. This takes longer than if surface scanning were supported by the drive, but is acceptable. I'll get to how to go about that in a moment. The reallocated LBA cannot be dealt with aside from re-creating the filesystem and telling it not to use the LBA. I see no flags in newfs(8) that indicate a way to specify LBAs to avoid. And we don't know what LBA it is so we can't refer to it right now anyway. As I said previously, I have no idea how UFS/FFS deals with this. Using fsck(8) is not sufficient; fsck does not attempt reading every LBA on the disk or every LBA that makes up the data portions of an inode. It only examines the "structure" of the filesystem. Is it possible the remapped LBA lived within a structure region and not data? Yes. Is it likely? Given the size of the disk, probably not. As mentioned previously too, there's badsect(8) but I don't know if it works correctly on present-day FreeBSD, if it works with larger drives, on 64-bit, etc... You get the idea. Plus as I said I don't know what LBA to tell it to avoid. You also need to keep something in mind: the terms "sector" and "LBA" are in some ways interchangeable and in other ways aren't. I use the term LBA because nobody in their right mind uses CHS addressing any more. badsect(8) claims it wants sectors, which I want to assume are LBAs. I hope someone familiar with UFS/FFS can explain how to go about this process for UFS/FFS. As for ZFS (because I know someone will ask) -- AFAIK there is no mechanism to deal with excluding certain LBAs from use.
The attitude is that disks are cheap; if you see errors, replace the disk. I agree with this attitude. You can "deal with" the error with ZFS if the pool consists of a mirror or raidzN, but you'll never be able to rid yourself of seeing R/W/CKSUM errors or possible I/O timeouts when accessing those LBAs. That's just how it goes. Anyway -- as for the "suspect" LBA -- we can absolutely determine what this one is and submit a write request to it to see if it turns out to be bad (uncorrectable) or if it's remappable. If remapped, see above explanation. Below is a script I wrote for scanning disks with dd. See script comments for how to use it. Quite simple. Things to note about the script because I'm 100% certain people will get all spun up about it: 1) It assumes 512-byte LBAs. Using this on an SSD or a 4KB-sector drive is probably not wise. 2) It's slow ("unintelligent"). This is by choice -- I wanted to keep it simple. It reads 512 bytes at a time, rather than reading larger chunks (e.g. 64k) and then "working down" to a smaller size when it encounters a read error to determine what LBA is responsible. I wanted something that "just worked" and wasn't fancy. There may be alternate utilities out there which do this (dd_rescue?). 3) I needed something that worked on Solaris and FreeBSD regardless of disk type. We use PATA, SATA, and SCSI disks at my workplace, and smartmontools really needs an overhaul for SATA on Solaris; so, shell scripting for the win. 4) I needed something that didn't depend on third-party tools I had to compile or deal with (see #5 though). 5) The hashbang refers to bash, though there aren't "bash-isms" in the script. The reason for this is Solaris; /bin/sh there is a non-evolved travesty that I loathe, so I write everything using /usr/local/bin/bash. You could, on FreeBSD, change this to /bin/sh and it should just work.
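For a feel of the sector-by-sector approach described in point 2, here is a minimal sketch. It is not the bad_block_scan script itself: the function name scan_lbas is made up for illustration, and it assumes 512-byte logical sectors and a dd that exits non-zero when a read fails.

```shell
#!/bin/sh
# Sketch only: read LBAs one at a time and print the ones that fail.
# Assumes 512-byte logical sectors, as the original script does.
# scan_lbas is a hypothetical helper name, not part of bad_block_scan.
scan_lbas() {
    dev=$1      # device or file to read, e.g. /dev/ad2
    lba=$2      # first LBA to read
    last=$3     # last LBA to read (inclusive)
    while [ "$lba" -le "$last" ]; do
        # skip= counts in units of bs, so with bs=512 it is the LBA number
        if ! dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null; then
            echo "$lba"
        fi
        lba=$((lba + 1))
    done
}
```

Something like "scan_lbas /dev/ad2 5566000 5567000" would walk only the region around the LBA from the SMART error log, which is far quicker than a whole-disk pass at one sector per dd invocation.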
That said: http://jdc.parodius.com/freebsd/bad_block_scan If you run this on your ad2 drive, I'm hoping what you'll find are two LBAs which can't be read -- one will be the remapped LBA and one will be the "suspect" LBA. If you only get one LBA error then that's fine too, and will be the "suspect" LBA. Once you have the LBA(s), you can submit writes to them to get the drive to re-analyse them (assuming they're "suspect"): dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=NNNNN Where XXX is the device and NNNNN is the LBA number. If this works properly, the dd command should sit there for a little bit (as the drive does its re-analysis magic) and then should complete. You'll want to check SMART stats after that; you should see Current_Pending_Sector drop to 0. If Offline_Uncorrectable incremented then the LBA could not be re-read/remapped. If Reallocated_Sector_Ct incremented then you now have a total of 2 LBAs which are remapped. In the case of remapping, you get to deal with the UFS/FFS thing above. To get the stats to update in this situation you *might* (but probably not) have to run "smartctl -t offline /dev/XXX". You might also be wondering "that dd command writes 512 bytes of zero to that LBA; what about the old data that was there, in the case that the drive remaps the LBA?" This is a great question, and one I've never actually taken the time to answer because at this present time I have absolutely *no* bad disks in my possession. I'm under the impression that the write does in fact write zeros if the LBA is remapped, but that might not be true at all. I've been waiting to test this for quite some time and document it/write about it. I still suggest you replace the drive, although given its age I doubt you'll be able to find a suitable replacement. I tend to keep disks like this around for testing/experimental purposes and not for actual use. Good luck! 
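The before/after comparison described above can be sketched roughly as follows. Treat it as an illustration under assumptions rather than a tested procedure: smart_raw and classify_write are hypothetical helper names, and the awk field numbers assume the usual ten-column attribute table printed by "smartctl -A" (name in column 2, raw value in column 10).

```shell
#!/bin/sh
# Sketch only. smart_raw and classify_write are hypothetical helper names.

# Print the raw value of one attribute from `smartctl -A` output on stdin.
# Assumes the usual ten-column attribute table: column 2 is the attribute
# name and column 10 is the raw value.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Compare counters recorded before and after writing to the suspect LBA.
# Arguments: Reallocated_Sector_Ct before and after, then
# Offline_Uncorrectable before and after.
classify_write() {
    realloc_before=$1; realloc_after=$2
    uncorr_before=$3;  uncorr_after=$4
    if [ "$uncorr_after" -gt "$uncorr_before" ]; then
        echo "uncorrectable"        # the LBA could not be re-read/remapped
    elif [ "$realloc_after" -gt "$realloc_before" ]; then
        echo "remapped"             # the drive used a spare for this LBA
    else
        echo "no-counter-change"    # neither counter moved
    fi
}
```

In use, one would record "smartctl -A /dev/XXX | smart_raw Reallocated_Sector_Ct" (and likewise Offline_Uncorrectable) before the dd write, repeat afterwards, and pass the four numbers to classify_write.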
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 03:47:38 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A1C7C106564A for ; Sat, 20 Aug 2011 03:47:38 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id 5E0ED8FC13 for ; Sat, 20 Aug 2011 03:47:38 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.5/8.14.5) with ESMTP id p7K3lbQ3082870; Fri, 19 Aug 2011 21:47:37 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.5/8.14.5/Submit) with ESMTP id p7K3lbws082867; Fri, 19 Aug 2011 21:47:37 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Fri, 19 Aug 2011 21:47:37 -0600 (MDT) From: Warren Block To: Chuck Swiger In-Reply-To: <65474D95-F56F-4DC7-8029-BA7166C4E46F@mac.com> Message-ID: References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <65474D95-F56F-4DC7-8029-BA7166C4E46F@mac.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (wonkity.com [127.0.0.1]); Fri, 19 Aug 2011 21:47:37 -0600 (MDT) Cc: freebsd-stable@freebsd.org, Dan Langille Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 03:47:38 -0000 On Fri, 19 Aug 2011, Chuck Swiger wrote: > Reading the underlying failing drive 
with dd will help identify any > other questionable sectors. However, your drive temps are too high-- > many vendors call out either 50C or 55C as the point where drive > reliability becomes significantly degraded. The high temperature could be due to impending drive failure. I've seen that exact situation with a failing WD notebook drive. Lots of read failures, and it got very hot. The same model replacement drive ran normally, just warm. From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 06:43:38 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A530106566C for ; Sat, 20 Aug 2011 06:43:38 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 0501A8FC13 for ; Sat, 20 Aug 2011 06:43:37 +0000 (UTC) Received: from digsys236-136.pip.digsys.bg (digsys236-136.pip.digsys.bg [193.68.136.236]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7K6hO2n009616 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 20 Aug 2011 09:43:30 +0300 (EEST) (envelope-from daniel@digsys.bg) Mime-Version: 1.0 (Apple Message framework v1244.3) Content-Type: text/plain; charset=us-ascii From: Daniel Kalchev In-Reply-To: <20110820032438.GA21925@icarus.home.lan> Date: Sat, 20 Aug 2011 09:43:23 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <65623662-0232-4599-B633-6D207A4CF15A@digsys.bg> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1244.3) Cc: freebsd-stable@freebsd.org, Dan Langille Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 06:43:38 -0000 On Aug 20, 2011, at 06:24 , Jeremy Chadwick wrote: > You might also be wondering "that dd command writes 512 bytes of zero to > that LBA; what about the old data that was there, in the case that the > drive remaps the LBA?" If you write zeros at OS level to an LBA, you will end up with zeros at that LBA. What else did you expect??? The already remapped LBAs in ATA are not visible anymore to the user/OS. You get a perfectly readable sector. Of course not at the original location, but as you confirmed we are done with CHS addressing. The pending bad sectors are almost always 'corrected', that is, remapped when you write to that LBA. So your script will find only one readable sector and that will be the sector that is pending reallocation. It may be that writing zeros to all free space, like dd if=/dev/zero of=/filesystem/zero bs=1m; rm /filesystem/zero is enough to remap the pending bad block and not have any unreadable sectors. But if the unreadable sector is in a file or directory -- bad luck -- these will need to be rewritten. Once upon a time, BSD/OS had wonderful disk 'repair' utility. It could detect failing disks by reading every sector (had nice visual), or could re-write the drive by reading and writing back every sector. On bad blocks it would retry lots of times and eventually average what was read (with error). Having said that, I doubt modern ATA drives will let anything be read by the pending bad block, but.. who knows. 
Daniel From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 10:02:32 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03FB9106564A; Sat, 20 Aug 2011 10:02:32 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 157338FC08; Sat, 20 Aug 2011 10:02:30 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA10374; Sat, 20 Aug 2011 13:02:27 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QuiNn-000NrY-IE; Sat, 20 Aug 2011 13:02:27 +0300 Message-ID: <4E4F8631.1070300@FreeBSD.org> Date: Sat, 20 Aug 2011 13:02:25 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.! 
uk> In-Reply-To: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 10:02:32 -0000 on 18/08/2011 02:15 Steven Hartland said the following: > In a nutshell the jail manager we're using will attempt to resurrect the jail > from a dieing state in a few specific scenarios. > > Here's an exmaple:- > 1. jail restart requested > 2. jail is stopped, so the java processes is killed off, but active tcp sessions > may prevent the timely full shutdown of the jail. > 3. if an existing jail is detected, i.e. a dieing jail from #2, instead of > starting a new jail we attach to the old one and exec the new java process. > 4. if an existing jail isnt detected, i.e. where there where not hanging tcp > sessions and #2 cleanly shutdown the jail, a new jail is created, attached to > and the java exec'ed. > > The system uses static jailid's so its possible to determine if an existing > jail for this "service" exists or not. This prevents duplicate services as > well as making services easy to identify by their jailid. > > So what we could be seeing is a race between the jail shutdown and the attach > of the new process? Not a jail expert at all, but a few suggestions... First, wouldn't the 'persist' jail option simplify your life a little bit? Second, you may want to try to monitor value of prison0.pr_uref variable (e.g. via kgdb) while executing various scenarios of what you do now. If after finishing a certain scenario you end up with a value lower than at the start of scenario, then this is the troublesome one. 
Please note that prison0.pr_uref is composed from a number of non-jailed processes plus a number of top-level jails. So take this into account when comparing prison0.pr_uref values - it's better to record the initial value when no jails are started and it's important to keep the number of non-jailed processes the same (or to account for its changes). -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 10:10:55 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C3F0106564A; Sat, 20 Aug 2011 10:10:55 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 91EF98FC17; Sat, 20 Aug 2011 10:10:44 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA10423; Sat, 20 Aug 2011 13:10:42 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QuiVl-000Nrj-VF; Sat, 20 Aug 2011 13:10:41 +0300 Message-ID: <4E4F8821.80108@FreeBSD.org> Date: Sat, 20 Aug 2011 13:10:41 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Steven Hartland References: 
<47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> <4E4F8631.1070300@FreeBSD.org> In-Reply-To: <4E4F8631.1070300@FreeBSD.org> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 10:10:55 -0000 on 20/08/2011 13:02 Andriy Gapon said the following: > on 18/08/2011 02:15 Steven Hartland said the following: >> In a nutshell the jail manager we're using will attempt to resurrect the jail >> from a dieing state in a few specific scenarios. >> >> Here's an exmaple:- >> 1. jail restart requested >> 2. jail is stopped, so the java processes is killed off, but active tcp sessions >> may prevent the timely full shutdown of the jail. >> 3. if an existing jail is detected, i.e. 
a dieing jail from #2, instead of >> starting a new jail we attach to the old one and exec the new java process. >> 4. if an existing jail isnt detected, i.e. where there where not hanging tcp >> sessions and #2 cleanly shutdown the jail, a new jail is created, attached to >> and the java exec'ed. >> >> The system uses static jailid's so its possible to determine if an existing >> jail for this "service" exists or not. This prevents duplicate services as >> well as making services easy to identify by their jailid. >> >> So what we could be seeing is a race between the jail shutdown and the attach >> of the new process? > > Not a jail expert at all, but a few suggestions... > > First, wouldn't the 'persist' jail option simplify your life a little bit? > > Second, you may want to try to monitor value of prison0.pr_uref variable (e.g. > via kgdb) while executing various scenarios of what you do now. If after > finishing a certain scenario you end up with a value lower than at the start of > scenario, then this is the troublesome one. > Please note that prison0.pr_uref is composed from a number of non-jailed > processes plus a number of top-level jails. So take this into account when > comparing prison0.pr_uref values - it's better to record the initial value when > no jails are started and it's important to keep the number of non-jailed > processes the same (or to account for its changes). BTW, I suspect the following scenario, but I am not able to verify it either via testing or in the code: - last process in a dying jail exits - pr_uref of the jail reaches zero - pr_uref of prison0 gets decremented - you attach to the jail and resurrect it - but pr_uref of prison0 stays decremented Repeat this enough times and prison0.pr_uref reaches zero. To reach zero even sooner just kill enough of non-jailed processes. 
-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 11:15:15 2011 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E9A2106566B for ; Sat, 20 Aug 2011 11:15:15 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id E189B8FC12 for ; Sat, 20 Aug 2011 11:15:14 +0000 (UTC) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id EBC7615346B for ; Sat, 20 Aug 2011 13:15:13 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zrnB3hqJ4huR for ; Sat, 20 Aug 2011 13:15:12 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195] (unknown [IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195]) by mail.digiware.nl (Postfix) with ESMTP id F2A33153433 for ; Sat, 20 Aug 2011 13:15:11 +0200 (CEST) Message-ID: <4E4F973D.9070706@digiware.nl> Date: Sat, 20 Aug 2011 13:15:09 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20110812 Thunderbird/6.0 MIME-Version: 1.0 To: "stable@freebsd.org" Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Remote installing X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 11:15:15 -0000 Hi, Today I felt like living dangerously and wanted to upgrade a backup server from i386 to amd64. Just to see if we could. And otherwise I'd scrap it and install from usb-stick. So I have my server running amd64 build GENERIC. 
export /, /var, /usr on the server to be upgraded. But upgrading world does hit a snag early on: ---- empty changed flags expected "schg" found "none" not modified: Operation not supported ---- This is probably where some program wants to set the immutable flag on /var/tmp/empty... But it looks like NFS does not grok that. Now I have seen plenty of suggestions to do it this way, but never saw anybody come back with this complaint.... So I must be omitting something ?? --WjW From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 11:26:34 2011 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E6941106566C for ; Sat, 20 Aug 2011 11:26:34 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id A8DC28FC0A for ; Sat, 20 Aug 2011 11:26:34 +0000 (UTC) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id 9A04A153434 for ; Sat, 20 Aug 2011 13:26:33 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id A4Y-3r82JPf1 for ; Sat, 20 Aug 2011 13:26:31 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195] (unknown [IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195]) by mail.digiware.nl (Postfix) with ESMTP id AEA2E153433 for ; Sat, 20 Aug 2011 13:26:31 +0200 (CEST) Message-ID: <4E4F99E4.8060009@digiware.nl> Date: Sat, 20 Aug 2011 13:26:28 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20110812 Thunderbird/6.0 MIME-Version: 1.0 To: "stable@freebsd.org" References: <4E4F973D.9070706@digiware.nl> In-Reply-To: <4E4F973D.9070706@digiware.nl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 
7bit Cc: Subject: Re: Remote installing X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 11:26:35 -0000 On 2011-08-20 13:15, Willem Jan Withagen wrote: > Hi, > > Today I felt like living dangerously and wanted to upgrade a backup server > from i386 to amd64. Just to see if we could. > And otherwise I'd scrap it and install from usb-stick. > > So I have my server running amd64 build GENERIC. > export /, /var, /usr on the server to be upgraded. > > But upgrading world does hit a snag early on: > > ---- > empty changed > flags expected "schg" found "none" not modified: Operation not supported > ---- > > This is probably where some program wants to set the immutable flag on > /var/tmp/empty... > > But it looks like NFS does not grok that. > > Now I have seen plenty of suggestions to do it this way, but never saw anybody > come back with this complaint.... > > So I must be omitting something ?? I looked at the work errors. ----------- cd /mnt/; rm -f /mnt/sys; ln -s usr/src/sys sys cd /mnt/usr/share/man/en.ISO8859-1; ln -sf ../man* . ln: ./man1: Permission denied ln: ./man1aout: Permission denied ln: ./man2: Permission denied ln: ./man3: Permission denied ln: ./man4: Permission denied ln: ./man5: Permission denied ln: ./man6: Permission denied ln: ./man7: Permission denied ln: ./man8: Permission denied ln: ./man9: Permission denied --------- This comes from the distrib-dirs target in etc. Why would an ln -sf like that fail.... 
the filesystems are exported with -maproot=0 --WjW From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 13:24:51 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05984106566B; Sat, 20 Aug 2011 13:24:51 +0000 (UTC) (envelope-from prvs=12137168ef=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 1125A8FC0C; Sat, 20 Aug 2011 13:24:49 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 14:13:34 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 14:13:34 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014672793.msg; Sat, 20 Aug 2011 14:13:34 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <4E55CB4A4F694A7997FEBDF9EADF87F5@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: 
<47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> <4E4C22D6.6070407@FreeBSD.org> <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk> <4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@ FreeBSD.org> Date: Sat, 20 Aug 2011 14:14:15 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 13:24:51 -0000 ----- Original Message ----- From: "Andriy Gapon" > BTW, I suspect the following scenario, but I am not able to > verify it either via testing or in the code: > - last process in a dying jail exits > - pr_uref of the jail reaches zero > - pr_uref of prison0 gets decremented > - you attach to the jail and resurrect it > - but pr_uref of prison0 stays 
decremented > > Repeat this enough times and prison0.pr_uref reaches zero. > To reach zero even sooner just kill enough of non-jailed processes. Ahh, now that explains all of the panic scenarios we have experienced: 1. A jail stop / start causing the panic, but only after at least a few days' worth of uptime. Here what we're seeing is enough "leak" of pr_uref from the restarted jails to decrement prison0.pr_uref to 0 even with all the standard unjailed processes still running. 2. A machine reboot, after all jails have been stopped, but after less uptime than in #1. In this case we haven't seen enough leakage to decrement prison0.pr_uref to 0 given the number of prison0 processes, but it has been incorrectly decremented, so as soon as the reboot kicks in and prison0 processes start exiting, prison0.pr_uref gets further decremented and again hits 0 when it shouldn't. Now, if this is the case, we should be able to confirm it with a little more info. 1. What exactly does pr_uref represent? 2. Can its expected value be calculated from other details of the system, i.e. the number of running processes and the number of running jails? If we can calculate the value that prison0.pr_uref should have, then by examining the machines we have which have been up for a while, we should be able to confirm whether an incorrect value is present on them and hence prove this is the case. Ideally a little script to run in kgdb to test this would be the best way to go. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
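Taking at face value the description given earlier in this thread (prison0.pr_uref is the number of non-jailed processes plus the number of top-level jails), the consistency check asked about here could be sketched roughly as below. This is only an illustration in Python, not a kgdb script: the function names are made up for this sketch, and the three inputs (the observed counter, the process count and the jail count) are assumed to have been collected by hand from the machine being examined.

```python
def expected_prison0_uref(non_jailed_processes, top_level_jails):
    """Expected prison0.pr_uref, ASSUMING it is simply the sum described in the thread."""
    return non_jailed_processes + top_level_jails


def uref_deficit(observed_uref, non_jailed_processes, top_level_jails):
    """How far the observed counter sits below the expected value (0 means consistent)."""
    expected = expected_prison0_uref(non_jailed_processes, top_level_jails)
    return expected - observed_uref


if __name__ == "__main__":
    # Hypothetical numbers: 60 non-jailed processes, 4 top-level jails,
    # and a prison0.pr_uref value of 57 read from the running kernel.
    print(uref_deficit(57, 60, 4))
```

A positive deficit on a long-running machine would be consistent with the suspected leak; a deficit of zero would argue against it, provided the assumed composition of pr_uref is right.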
From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 13:37:57 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC2571065672; Sat, 20 Aug 2011 13:37:57 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe01.c2i.net [212.247.154.2]) by mx1.freebsd.org (Postfix) with ESMTP id 32A6F8FC19; Sat, 20 Aug 2011 13:37:56 +0000 (UTC) X-Cloudmark-Score: 0.000000 [] X-Cloudmark-Analysis: v=1.1 cv=ELIg/Y9mGCPhWMyRcSlygtjWSLZJE4Mi+f/g6oC4Nzw= c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10 a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10 a=i9M/sDlu2rpZ9XS819oYzg==:17 a=_baaTq9UrmHnDPmMK9MA:9 a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117 Received: from [188.126.198.129] (account mc467741@c2i.net HELO laptop002.hselasky.homeunix.org) by mailfe01.swip.net (CommuniGate Pro SMTP 5.2.19) with ESMTPA id 169096590; Sat, 20 Aug 2011 15:37:52 +0200 From: Hans Petter Selasky To: Andriy Gapon Date: Sat, 20 Aug 2011 15:35:24 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; ) References: <201108182324.58276.hselasky@c2i.net> <4E4E900D.8010506@FreeBSD.org> In-Reply-To: <4E4E900D.8010506@FreeBSD.org> X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5 %V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC( :AuzV9:.hESm-x4h240C`9=w MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108201535.24061.hselasky@c2i.net> Cc: freebsd-stable@freebsd.org Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 13:37:57 -0000 On Friday 19 August 2011 18:32:13 Andriy Gapon wrote: > on 19/08/2011 00:24 Hans 
Petter Selasky said the following: > > On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote: > >> If you can help Hans to figure out what you is wrong with USB subsystem > >> in this respect that would help us all. > > > > Hi, > > > > usb_busdma.c: /* we use "mtx_owned()" instead of this function */ > > usb_busdma.c: owned = mtx_owned(uptag->mtx); > > usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; > > usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; > > usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; > > usb_hub.c: if (mtx_owned(&bus->bus_mtx)) { > > usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) { > > usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) { > > usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) { > > usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) { > > > One fix you will need to do, if mtx_owned is not giving correct value is: > First, could you please clarify what is the correct, or rather - expected, > value in this case. It's not immediately clear to me if we should > consider all locks as owned or un-owned in a situation where all locks are > actually skipped behind the scenes. > Maybe USB code should explicitly check for that condition as to not make > any unsafe assumptions. > > Second, it's not clear to me what the above list actually represents in the > context of this discussion. Hi, The mtx_owned() is not only used to assert mutex ownership, but also to figure out which context the function is being called from. If the correct mutex is not locked already we postpone the work until later. In the panic case, there is no way to postpone work, so this check should be skipped in case of panic, because there is no other thread to put work to. 
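The pattern described here (use lock ownership to decide whether to do the work now or postpone it, and skip that decision during a panic because nothing is left to pick up postponed work) can be shown with a small toy model. This is a Python sketch, not the USB stack's code: ToyMutex, dispatch and the panicking flag are hypothetical names standing in for the kernel mutex, the USB callback path and a panicstr-style check.

```python
class ToyMutex:
    """Tiny stand-in for a kernel mutex that remembers which thread holds it."""

    def __init__(self):
        self.owner = None

    def lock(self, thread):
        self.owner = thread

    def unlock(self):
        self.owner = None

    def owned_by(self, thread):
        # Plays the role of mtx_owned() for the calling thread.
        return self.owner == thread


def dispatch(mtx, thread, work, deferred, panicking=False):
    """Run work immediately if the caller holds mtx, otherwise queue it.

    When panicking is true the ownership check is skipped and the work
    runs right away, since no other thread will drain the deferred queue.
    """
    if panicking or mtx.owned_by(thread):
        work()
        return "ran"
    deferred.append(work)
    return "deferred"
```

In normal operation a caller without the lock ends up with its work in the deferred list; with panicking=True the same call runs the work directly.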
--HPS From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 15:51:03 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10E9E106566C; Sat, 20 Aug 2011 15:51:03 +0000 (UTC) (envelope-from prvs=12137168ef=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 2364E8FC15; Sat, 20 Aug 2011 15:51:01 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 16:50:27 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 16:50:27 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014673864.msg; Sat, 20 Aug 2011 16:50:26 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: 
<47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@Fre eBSD.org> Date: Sat, 20 Aug 2011 16:51:50 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 15:51:03 -0000 ----- Original Message ----- From: "Andriy Gapon" > BTW, I suspect the following scenario, but I am not able to verify it either via > testing or in the code: > - last process in a dying jail exits > - pr_uref of the jail reaches zero > - pr_uref of prison0 gets decremented > - you attach to the jail and resurrect it > - but pr_uref of prison0 stays decremented 
> > Repeat this enough times and prison0.pr_uref reaches zero. > To reach zero even sooner just kill enough of non-jailed processes. I've just checked a number of the panic dumps from the past few days and they all have prison0.pr_uref = 0, which confirms the cause of the panic. I've tried scripting continuous jail starts and stops, but even after thousands of iterations I have been unable to trigger this on my test machine, so I'm going to dig into the jail code to see if I can find, via inspection, how it's incorrectly decrementing prison0. Regards Steve 
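The suspected sequence quoted in this message can be turned into a toy reference-count model to see how quickly such a leak would drain prison0. The sketch below is Python and models only the counting argument: ToyPrison and the helper functions are hypothetical names, not the kernel's jail code, and the unbalanced attach is the unverified suspicion from the thread rather than confirmed behaviour.

```python
class ToyPrison:
    """Minimal stand-in for a prison: only the user reference count is modelled."""

    def __init__(self, uref=0, parent=None):
        self.uref = uref
        self.parent = parent


def start_jail(parent):
    """Create a jail holding one process; the parent counts the new live jail."""
    parent.uref += 1
    return ToyPrison(uref=1, parent=parent)


def last_process_exits(jail):
    """Drop the jail's last user reference and, at zero, the parent's reference to it."""
    jail.uref -= 1
    if jail.uref == 0:
        jail.parent.uref -= 1


def attach_to_dying_jail(jail, balanced=False):
    """Resurrect a jail whose counter reached zero.

    balanced=False models the suspected behaviour: the parent is not re-referenced.
    balanced=True models what consistent accounting would look like in this toy.
    """
    was_dead = jail.uref == 0
    jail.uref += 1
    if balanced and was_dead:
        jail.parent.uref += 1


def restarts_until_zero(non_jailed_processes, balanced=False, limit=1000):
    """Count stop/resurrect cycles until prison0's counter hits zero, else None."""
    prison0 = ToyPrison(uref=non_jailed_processes)
    jail = start_jail(prison0)
    for cycle in range(1, limit + 1):
        last_process_exits(jail)
        if prison0.uref == 0:
            return cycle
        attach_to_dying_jail(jail, balanced=balanced)
    return None
```

In this model each stop/resurrect cycle costs prison0 one reference, so the number of cycles needed grows with the number of non-jailed processes. A plain stop followed by a fresh start never goes through the unbalanced attach, which would fit the observation that continuous start/stop loops did not reproduce the panic.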
From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 16:46:06 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C76F2106566C for ; Sat, 20 Aug 2011 16:46:06 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1C5138FC12 for ; Sat, 20 Aug 2011 16:46:05 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA12920; Sat, 20 Aug 2011 19:46:00 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QuogK-000O3S-6U; Sat, 20 Aug 2011 19:46:00 +0300 Message-ID: <4E4FE4C5.9030305@FreeBSD.org> Date: Sat, 20 Aug 2011 19:45:57 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Hans Petter Selasky References: <201108182324.58276.hselasky@c2i.net> <4E4E900D.8010506@FreeBSD.org> <201108201535.24061.hselasky@c2i.net> In-Reply-To: <201108201535.24061.hselasky@c2i.net> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 16:46:06 -0000 on 20/08/2011 16:35 Hans Petter Selasky said the following: > On Friday 19 August 2011 18:32:13 Andriy Gapon wrote: >> on 19/08/2011 00:24 Hans Petter Selasky said the following: >>> On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote: >>>> If you can help Hans to figure out what you is 
wrong with USB subsystem >>>> in this respect that would help us all. >>> >>> Hi, >>> >>> usb_busdma.c: /* we use "mtx_owned()" instead of this function */ >>> usb_busdma.c: owned = mtx_owned(uptag->mtx); >>> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; >>> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; >>> usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1; >>> usb_hub.c: if (mtx_owned(&bus->bus_mtx)) { >>> usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) { >>> usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) { >>> usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) { >>> usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) { >> >>> One fix you will need to do, if mtx_owned is not giving correct value is: >> First, could you please clarify what is the correct, or rather - expected, >> value in this case. It's not immediately clear to me if we should >> consider all locks as owned or un-owned in a situation where all locks are >> actually skipped behind the scenes. >> Maybe USB code should explicitly check for that condition as to not make >> any unsafe assumptions. >> >> Second, it's not clear to me what the above list actually represents in the >> context of this discussion. > > Hi, > > The mtx_owned() is not only used to assert mutex ownership, but also to figure > out which context the function is being called from. If the correct mutex is > not locked already we postpone the work until later. In the panic case, there > is no way to postpone work, so this check should be skipped in case of panic, > because there is no other thread to put work to. Now I see, but I still cannot draw a conclusion... So what would you suggest - should the USB code explicitly check for panicstr (or SCHEDULER_STOPPED in the future)? Or what should mtx_owned() return in that case - true or false? 
-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 16:48:30 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E35821065677; Sat, 20 Aug 2011 16:48:30 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EE0C48FC1C; Sat, 20 Aug 2011 16:48:29 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA12940; Sat, 20 Aug 2011 19:48:27 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Quoig-000O3V-LU; Sat, 20 Aug 2011 19:48:26 +0300 Message-ID: <4E4FE55A.9000101@FreeBSD.org> Date: Sat, 20 Aug 2011 19:48:26 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Steven Hartland References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org> 
<4E4F8821.80108@Fre eBSD.org> <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> In-Reply-To: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 16:48:31 -0000 on 20/08/2011 18:51 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" > >> BTW, I suspect the following scenario, but I am not able to verify it either via >> testing or in the code: >> - last process in a dying jail exits >> - pr_uref of the jail reaches zero >> - pr_uref of prison0 gets decremented >> - you attach to the jail and resurrect it >> - but pr_uref of prison0 stays decremented >> >> Repeat this enough times and prison0.pr_uref reaches zero. >> To reach zero even sooner just kill enough of non-jailed processes. > > I've just checked across a number of the panic dumps from the > past few days and they all have prison0.pr_uref = 0 which confirms > the cause of the panic. > > I've tried scripting continuous jail start stops, but even after 1000's > of iterations have been unable to trigger this on my test machine, so > I'm going to dig into the jail code to see if I can find out how its > incorrectly decrementing prison0 via inspection. Steve, thanks for doing this! I'll reiterate my suspicion just in case - I think that you should look for the cases where you stop a jail, but then re-attach and resurrect the jail before it's completely dead. 
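The suspected sequence above can be modelled in a few lines of userland C. This is only a toy sketch of the hypothesis, not the kernel's jail code: struct toy_prison, toy_uref_drop(), toy_resurrect_leaky() and toy_cycles() are hypothetical names, and the assumption that resurrection bumps only the jail's own counter is exactly the part that has not been verified.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a prison with a user reference count (not struct prison). */
struct toy_prison {
	struct toy_prison *parent;
	int uref;
};

/*
 * Drop one user reference. When a prison's count reaches zero, the
 * drop propagates to its parent, as in the suspected scenario.
 */
static void
toy_uref_drop(struct toy_prison *pr)
{
	for (; pr != NULL; pr = pr->parent) {
		assert(pr->uref > 0);
		if (--pr->uref > 0)
			break;
	}
}

/*
 * ASSUMED buggy resurrection: the dying jail gets its reference back,
 * but the parent's earlier decrement is never undone.
 */
static void
toy_resurrect_leaky(struct toy_prison *pr)
{
	pr->uref++;
}

/* Run n stop/resurrect cycles and return the parent's count afterwards. */
static int
toy_cycles(int parent_uref, int n)
{
	struct toy_prison p0 = { NULL, parent_uref };
	struct toy_prison jail = { &p0, 1 };
	int i;

	assert(n >= 0 && n <= parent_uref);
	for (i = 0; i < n; i++) {
		toy_uref_drop(&jail);		/* last process in the jail exits */
		toy_resurrect_leaky(&jail);	/* re-attach before it is fully dead */
	}
	return (p0.uref);
}
```

Under these assumptions each cycle costs the parent one reference, so toy_cycles(3, 3) leaves the parent at zero, which would be consistent with prison0.pr_uref = 0 in the dumps.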
-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 16:56:53 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77D68106564A; Sat, 20 Aug 2011 16:56:53 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe03.c2i.net [212.247.154.66]) by mx1.freebsd.org (Postfix) with ESMTP id CD8EE8FC1C; Sat, 20 Aug 2011 16:56:52 +0000 (UTC) X-Cloudmark-Score: 0.000000 [] X-Cloudmark-Analysis: v=1.1 cv=Ic1eHMOXbQHcCvhs/sz3xt2crOpE4ZQ8e7+3c6x+FwY= c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10 a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10 a=i9M/sDlu2rpZ9XS819oYzg==:17 a=SkclzYauFElIErlVZ5wA:9 a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117 Received: from [188.126.198.129] (account mc467741@c2i.net HELO laptop002.hselasky.homeunix.org) by mailfe03.swip.net (CommuniGate Pro SMTP 5.2.19) with ESMTPA id 804488; Sat, 20 Aug 2011 18:56:49 +0200 From: Hans Petter Selasky To: Andriy Gapon Date: Sat, 20 Aug 2011 18:54:21 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; ) References: <201108201535.24061.hselasky@c2i.net> <4E4FE4C5.9030305@FreeBSD.org> In-Reply-To: <4E4FE4C5.9030305@FreeBSD.org> X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5 %V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC( :AuzV9:.hESm-x4h240C`9=w MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108201854.21180.hselasky@c2i.net> Cc: freebsd-stable@freebsd.org Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 16:56:53 -0000 On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote: > 
SCHEDULER_STOPPED The USB code needs to check for the SCHEDULER_STOPPED and cold at the present moment. If this state can be set during bootup, and cleared at the same time like "cold", it would be very good. --HPS From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 17:09:09 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 487091065670 for ; Sat, 20 Aug 2011 17:09:09 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8FD658FC08 for ; Sat, 20 Aug 2011 17:09:08 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA13081; Sat, 20 Aug 2011 20:09:04 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Qup2d-000O4E-Ph; Sat, 20 Aug 2011 20:09:03 +0300 Message-ID: <4E4FEA2E.7050209@FreeBSD.org> Date: Sat, 20 Aug 2011 20:09:02 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Hans Petter Selasky References: <201108201535.24061.hselasky@c2i.net> <4E4FE4C5.9030305@FreeBSD.org> <201108201854.21180.hselasky@c2i.net> In-Reply-To: <201108201854.21180.hselasky@c2i.net> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 17:09:09 -0000 on 20/08/2011 19:54 Hans Petter Selasky said the following: > On Saturday 20 
August 2011 18:45:57 Andriy Gapon wrote: >> SCHEDULER_STOPPED > > The USB code needs to check for the SCHEDULER_STOPPED and cold at the present > moment. If this state can be set during bootup, and cleared at the same time > like "cold", it would be very good. Sorry again - not sure if I follow. SCHEDULER_STOPPED is supposed to be set on panic and never be reset. It's like a mirror of 'cold' in a sense. -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 17:21:15 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 484E71065670; Sat, 20 Aug 2011 17:21:15 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe07.c2i.net [212.247.154.194]) by mx1.freebsd.org (Postfix) with ESMTP id 9EF1E8FC0A; Sat, 20 Aug 2011 17:21:14 +0000 (UTC) X-Cloudmark-Score: 0.000000 [] X-Cloudmark-Analysis: v=1.1 cv=ND3JYWI3bJ4ZiXLCJEAs5I5grFUWsY+sOY5HCnTiTok= c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10 a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10 a=i9M/sDlu2rpZ9XS819oYzg==:17 a=_c88UKqYe0xIkabUrEwA:9 a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117 Received: from [188.126.198.129] (account mc467741@c2i.net HELO laptop002.hselasky.homeunix.org) by mailfe07.swip.net (CommuniGate Pro SMTP 5.2.19) with ESMTPA id 168343860; Sat, 20 Aug 2011 19:21:12 +0200 From: Hans Petter Selasky To: Andriy Gapon Date: Sat, 20 Aug 2011 19:18:43 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; ) References: <201108201854.21180.hselasky@c2i.net> <4E4FEA2E.7050209@FreeBSD.org> In-Reply-To: <4E4FEA2E.7050209@FreeBSD.org> X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5 %V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC( :AuzV9:.hESm-x4h240C`9=w MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: 
<201108201918.43978.hselasky@c2i.net> Cc: freebsd-stable@freebsd.org Subject: Re: USB/coredump hangs in 8 and 9 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 17:21:15 -0000

On Saturday 20 August 2011 19:09:02 Andriy Gapon wrote:
> on 20/08/2011 19:54 Hans Petter Selasky said the following:
> > On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
> >> SCHEDULER_STOPPED
> >
> > The USB code needs to check for the SCHEDULER_STOPPED and cold at the
> > present moment. If this state can be set during bootup, and cleared at
> > the same time like "cold", it would be very good.
>
> Sorry again - not sure if I follow.
> SCHEDULER_STOPPED is supposed to be set on panic and never be reset. It's
> like a mirror of 'cold' in a sense.

OK. Then you should add a test "&& !SCHEDULER_STOPPED" where I pointed out:

static void
usbd_callback_wrapper(struct usb_xfer_queue *pq)
{
	struct usb_xfer *xfer = pq->curr;
	struct usb_xfer_root *info = xfer->xroot;

	USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
	if (!mtx_owned(info->xfer_mtx) && !SCHEDULER_STOPPED) {
		/*
		 * Cases that end up here:
		 *

And also ensure that no mutex asserts can trigger further panics.
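To make the effect of the extra "&& !SCHEDULER_STOPPED" test concrete, here is a minimal userland sketch of the decision it encodes. It is not the USB stack: the two boolean parameters are hypothetical stand-ins for mtx_owned(info->xfer_mtx) and the proposed scheduler-stopped state, and enum cb_action and callback_action() are names invented for this illustration.

```c
#include <stdbool.h>

/*
 * Userland model only. The two flags stand in for
 * mtx_owned(info->xfer_mtx) and the proposed SCHEDULER_STOPPED state;
 * nothing here is the real kernel API.
 */
enum cb_action {
	CB_DEFER,	/* hand the callback to another thread for later */
	CB_RUN_NOW	/* run the callback in the current context */
};

static enum cb_action
callback_action(bool xfer_mtx_owned, bool scheduler_stopped)
{
	/*
	 * Defer only when the transfer mutex is not held AND the
	 * scheduler is still running. After a panic there is no other
	 * thread to pick up deferred work, so the callback runs inline.
	 */
	if (!xfer_mtx_owned && !scheduler_stopped)
		return (CB_DEFER);
	return (CB_RUN_NOW);
}
```

With this shape the answer to "what should mtx_owned() return during a panic" stops mattering for this call site: once the scheduler is stopped, the work is never postponed regardless of the ownership result.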
--HPS From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 17:34:45 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 405BC106566C for ; Sat, 20 Aug 2011 17:34:45 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id D43198FC0A for ; Sat, 20 Aug 2011 17:34:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id EBC5850A09; Sat, 20 Aug 2011 17:34:43 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id glPSpkeHR3wf; Sat, 20 Aug 2011 18:34:43 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 6B74B50A06 ; Sat, 20 Aug 2011 17:34:43 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Dan Langille In-Reply-To: <20110820032438.GA21925@icarus.home.lan> Date: Sat, 20 Aug 2011 13:34:41 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 17:34:45 -0000 On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote: > On Fri, Aug 19, 2011 
at 09:39:17PM -0400, Dan Langille wrote:
>>
>> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
>>
>>> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>>>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>>>>
>>>> After a recent power failure, I'm seeing this in my logs:
>>>>
>>>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
>>>
>>> I doubt this is related to a power failure.
>>>
>>>> Searching on that error message, I was led to believe that identifying the bad sector and
>>>> running dd to read it would cause the HDD to reallocate that bad block.
>>>>
>>>> http://smartmontools.sourceforge.net/badblockhowto.html
>>>
>>> This is incorrect (meaning you've misunderstood what's written there).
>>>
>>> Unreadable LBAs can be a result of the LBA being actually bad (as in
>>> uncorrectable), or the LBA being marked "suspect". In either case the
>>> LBA will return an I/O error when read.
>>>
>>> If the LBAs are marked "suspect", the drive will perform re-analysis of
>>> the LBA (to determine if the LBA can be read and the data re-mapped, or
>>> if it cannot then the LBA is marked uncorrectable) when you **write** to
>>> the LBA.
>>>
>>> The above smartd output doesn't tell me much. Providing actual SMART
>>> attribute data (smartctl -a) for the drive would help. The brand of the
>>> drive, the firmware version, and the model all matter -- every drive
>>> behaves a little differently.
>>
>> Information such as this? http://beta.freebsddiary.org/smart-fixing-bad-sector.php
>
> Yes, perfect. Thank you. First thing first: upgrade smartmontools to
> 5.41. Your attributes will be the same after you do this (the drive is
> already in smartmontools' internal drive DB), but I often have to remind
> people that they really need to keep smartmontools updated as often as
> possible.
The changes between versions are vast; this is especially
> important for people with SSDs (I'm responsible for submitting some
> recent improvements for Intel 320 and 510 SSDs).

Done.

> Anyway, the drive (albeit an old PATA Maxtor) appears to have three
> anomalies:
>
> 1) One confirmed reallocated LBA (SMART attribute 5)
>
> 2) One "suspect" LBA (SMART attribute 197)
>
> 3) A very high temperature of 51C (SMART attribute 194). If this drive
> is in an enclosure or in a system with no fans this would be
> understandable, otherwise this is a bit high. My home workstation which
> has only one case fan has a drive with more platters than your Maxtor,
> and it idles at ~38C. Possibly this drive has been undergoing constant
> I/O recently (which does greatly increase drive temperature)? Not sure.
> I'm not going to focus too much on this one.

This is an older system. I suspect insufficient ventilation. I'll look at
getting a new case fan, if not some HDD fans.

> The SMART error log also indicates an LBA failure at the 26000 hour mark
> (which is 16 hours prior to when you did smartctl -a /dev/ad2). Whether
> that LBA is the remapped one or the suspect one is unknown. The LBA was
> 5566440.
>
> The SMART tests you did didn't really amount to anything; no surprise.
> short and long tests usually do not test the surface of the disk. There
> are some drives which do it on a long test, but as I said before,
> everything varies from drive to drive.
>
> Furthermore, on this model of drive, you cannot do a surface scans via
> SMART. Bummer. That's indicated in the "Offline data collection
> capabilities" section at the top, where it reads:
>
> No Selective Self-test supported.
>
> So you'll have to use the dd method. This takes longer than if surface
> scanning was supported by the drive, but is acceptable. I'll get to how
> to go about that in a moment.

FWIW, I've done a dd read of the entire suspect disk already. Just two errors.
From the URL mentioned above:

[root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
dd: /dev/ad2: Input/output error
2717+0 records in
2717+0 records out
2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
dd: /dev/ad2: Input/output error
38170+1 records in
38170+1 records out
40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
[root@bast:~] #

That seems to indicate two problems. Are those the values I should be using
with dd?

I did some more precise testing:

# time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440
dd: /dev/ad2: Input/output error
9+0 records in
9+0 records out
4608 bytes transferred in 5.368668 secs (858 bytes/sec)

real 0m5.429s
user 0m0.000s
sys 0m0.010s

NOTE: that's 9 blocks later than mentioned in smartctl

The above generated this in /var/log/messages:

Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA status=51 error=40 LBA=5566449

> [stuff snipped]
> That said:
>
> http://jdc.parodius.com/freebsd/bad_block_scan
>
> If you run this on your ad2 drive, I'm hoping what you'll find are two
> LBAs which can't be read -- one will be the remapped LBA and one will be
> the "suspect" LBA. If you only get one LBA error then that's fine too,
> and will be the "suspect" LBA.
> Once you have the LBA(s), you can submit writes to them to get the drive
> to re-analyse them (assuming they're "suspect"):
>
> dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=NNNNN
>
> Where XXX is the device and NNNNN is the LBA number.
>
> If this works properly, the dd command should sit there for a little bit
> (as the drive does its re-analysis magic) and then should complete.

ad2 is part of a gmirror with ad0. Does this change things?

I haven't tried the dd yet.

>
> You'll want to check SMART stats after that; you should see
> Current_Pending_Sector drop to 0. If Offline_Uncorrectable incremented
> then the LBA could not be re-read/remapped.
It did increment:

197 Current_Pending_Sector 0x0032 100 100 020 Old_age Always - 2 [was 1]

> If Reallocated_Sector_Ct
> incremented then you now have a total of 2 LBAs which are remapped.

It did increment:

$ diff smarctl.1 smarctl.3 | grep Reallocated_Sector_Ct
< 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 1
> 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 2

Full output of smartctl has been appended to
http://beta.freebsddiary.org/smart-fixing-bad-sector.php

> In
> the case of remapping, you get to deal with the UFS/FFS thing above.
> To get the stats to update in this situation you *might* (but probably
> not) have to run "smartctl -t offline /dev/XXX".

I didn't try that...

>
> You might also be wondering "that dd command writes 512 bytes of zero to
> that LBA; what about the old data that was there, in the case that the
> drive remaps the LBA?" This is a great question, and one I've never
> actually taken the time to answer because at this present time I have
> absolutely *no* bad disks in my possession. I'm under the impression
> that the write does in fact write zeros if the LBA is remapped, but that
> might not be true at all. I've been waiting to test this for quite some
> time and document it/write about it.
>
> I still suggest you replace the drive, although given its age I doubt
> you'll be able to find a suitable replacement. I tend to keep disks
> like this around for testing/experimental purposes and not for actual
> use.

I have several unused 80GB HDD I can place into this system. I think that's
what I'll wind up doing. But I'd like to follow this process through and
get it documented for future reference.
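For the record, the numbers in the two dd runs above line up with the LBA the kernel logged. The sketch below only does the arithmetic; it assumes 512-byte sectors and that bs=1m means 1048576 bytes, and the function names are invented for this note rather than taken from any tool.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 512u	/* assumed sector size, matching bs=512 above */

/*
 * dd reported "N+0 records in" before the error, so the unreadable
 * sector lies somewhere in record N (counting from 0). This returns
 * the first LBA covered by that record.
 */
static uint64_t
failed_block_first_lba(uint64_t records_in, uint64_t bs_bytes)
{
	assert(bs_bytes % SECTOR_BYTES == 0);
	return (records_in * (bs_bytes / SECTOR_BYTES));
}

/* Last LBA covered by the record that failed. */
static uint64_t
failed_block_last_lba(uint64_t records_in, uint64_t bs_bytes)
{
	return (failed_block_first_lba(records_in, bs_bytes) +
	    bs_bytes / SECTOR_BYTES - 1);
}

/*
 * With bs=512 each record is one sector, so the failing LBA is the
 * iseek offset plus the number of records read before the error.
 */
static uint64_t
failing_lba_after_seek(uint64_t iseek_lba, uint64_t records_in)
{
	return (iseek_lba + records_in);
}
```

For the first run (2717 records at bs=1m) this gives the range 5564416 to 5566463, and for the second run (iseek=5566440, 9 records) it gives 5566449, which is the LBA in the READ_DMA failure message and falls inside that range. That is the value that would go into seek= for the single-sector write.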
--=20 Dan Langille - http://langille.org From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 17:41:56 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2BEA8106564A; Sat, 20 Aug 2011 17:41:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id BD7428FC14; Sat, 20 Aug 2011 17:41:55 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7KHflxW068168 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Aug 2011 20:41:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p7KHflhw008180; Sat, 20 Aug 2011 20:41:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7KHflc6008179; Sat, 20 Aug 2011 20:41:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 20 Aug 2011 20:41:47 +0300 From: Kostik Belousov To: alc@freebsd.org Message-ID: <20110820174147.GW17489@deviant.kiev.zoral.com.ua> References: <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> <4E4CCA6C.8020408@ipfw.ru> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="gfR41eDGUhhc/UyZ" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, 
DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-stable@freebsd.org, perryh@pluto.rain.com, "Alexander V. Chernikov" , daniel@digsys.bg Subject: Re: 32GB limit per swap device? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 17:41:56 -0000 --gfR41eDGUhhc/UyZ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: > On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov wrote: >=20 > > On 10.08.2011 19:16, perryh@pluto.rain.com wrote: > > > >> Chuck Swiger wrote: > >> > >> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: > >>> > >>>> I am trying to set up 64GB partitions for swap for a system that > >>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on > >>>> 8-stable as of today I get: > >>>> > >>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit > >>>> > >>>> Is there workaround for this limitation? > >>>> > >>> > > Another interesting question: > > > > swap pager operates in page blocks (PAGE_SIZE=3D4k on common arch). > > > > Block device size in passed to swaponsomething() in number of _disk_ bl= ocks > > (e.g. in DEV_BSIZE=3D512). After that, kernel b-lists (on top of which= swap > > pager is build) maximum objects check is enforced. > > > > The (possible) problem is that real object count we will operate on is = not > > the value passed to swaponsomething() since it is calculated in wrong u= nits. > > > > we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value wh= ich > > is rough (X / 8) so we should be able to address 32*8=3D256G. 
> > > > The code should look like this: > > > > Index: vm/swap_pager.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D**=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D**=3D=3D=3D=3D=3D=3D=3D > > --- vm/swap_pager.c (revision 223877) > > +++ vm/swap_pager.c (working copy) > > @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_lo= ng > > u_long mblocks; > > > > /* > > + * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunk= s. > > + * First chop nblks off to page-align it, then convert. > > + * > > + * sw->sw_nblks is in page-sized chunks now too. > > + */ > > + nblks &=3D ~(ctodb(1) - 1); > > + nblks =3D dbtoc(nblks); > > + > > + /* > > > > * If we go beyond this, we get overflows in the radix > > * tree bitmap code. > > */ > > @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_lo= ng > > mblocks); > > nblks =3D mblocks; > > } > > - /* > > - * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunk= s. > > - * First chop nblks off to page-align it, then convert. > > - * > > - * sw->sw_nblks is in page-sized chunks now too. > > - */ > > - nblks &=3D ~(ctodb(1) - 1); > > - nblks =3D dbtoc(nblks); > > > > sp =3D malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO); > > sp->sw_vp =3D vp; > > > > > > (move pages recalculation before b-list check) > > > > > > Can someone comment on this? > > > > > I believe that you are correct. Have you tried testing this change on a > large swap device? I probably agree too, but I am in the process of re-reading the swap code, and I do not quite believe in the limit. When the initial code was committed, our daddr_t was 32bit, I checked the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression right now is that we only utilize the low 32bits of daddr_t. Esp. 
interesting looks the following typedef: typedef uint32_t u_daddr_t; /* unsigned disk address */ which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff. I wonder whether we could just use full 64bit and de-facto remove the limitation on the swap partition size. --gfR41eDGUhhc/UyZ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk5P8dsACgkQC3+MBN1Mb4gKdwCeK7fVc2QYLxELDvVNP+xeDEdQ bk8An2aneYCGFD/rDi0TA2tSjFHD5Srd =Eikm -----END PGP SIGNATURE----- --gfR41eDGUhhc/UyZ-- From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 17:54:49 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 598F0106566C for ; Sat, 20 Aug 2011 17:54:49 +0000 (UTC) (envelope-from ml@os2.kiev.ua) Received: from s1.sdv.com.ua (s1.sdv.com.ua [77.120.97.61]) by mx1.freebsd.org (Postfix) with ESMTP id 14C9F8FC1A for ; Sat, 20 Aug 2011 17:54:48 +0000 (UTC) Received: from 90-105-243-80.cust.centrio.cz ([80.243.105.90] helo=[192.168.100.107]) by s1.sdv.com.ua with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1Qupkn-0007Nu-BH; Sat, 20 Aug 2011 20:54:43 +0300 Message-ID: <4E4FF4D6.1090305@os2.kiev.ua> Date: Sat, 20 Aug 2011 19:54:30 +0200 From: Alex Samorukov User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11 MIME-Version: 1.0 To: Dan Langille References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-SA-Score: -1.0 Cc: freebsd-stable@freebsd.org, Jeremy Chadwick Subject: Re: bad sector in 
gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 17:54:49 -0000

You can run a long self-test with smartmontools (-t long). You can then get
the failed sector number from the self-test log (-l selftest) and use dd to
write zeros to that specific sector. I also highly recommend setting up
smartd as a daemon to monitor the number of reallocated sectors. If it
grows again, it is a good time to retire this disk.

> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
> dd: /dev/ad2: Input/output error
> 2717+0 records in
> 2717+0 records out
> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
> dd: /dev/ad2: Input/output error
> 38170+1 records in
> 38170+1 records out
> 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
> [root@bast:~] #
>
> That seems to indicate two problems. Are those the values I should be using
> with dd?
> From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:01:33 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 616D1106566B for ; Sat, 20 Aug 2011 18:01:33 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 206B28FC13 for ; Sat, 20 Aug 2011 18:01:32 +0000 (UTC) Received: by yib19 with SMTP id 19so3272232yib.13 for ; Sat, 20 Aug 2011 11:01:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=BxwVeO12cVrdIaF5/1P2XlGzxmKj97ajQnDOuvXV/BU=; b=Ot6GmqJnnbdVRt98QP92Tp5pCQlRTg2D/pNyx1PSPE89rXcRhLo6T+5orWs7zLSm6U 4ZXrM8JCN/4pEX5NQE6fufyqWZ11w/ynoidPIz8y5baVvdoubIB4SJbOFY5Mt1LCdiQ+ PwhsVAPv3XE2xfN00X6Jd01Tol7xSRXhrJLIw= MIME-Version: 1.0 Received: by 10.42.137.2 with SMTP id w2mr668882ict.116.1313861609094; Sat, 20 Aug 2011 10:33:29 -0700 (PDT) Received: by 10.231.192.20 with HTTP; Sat, 20 Aug 2011 10:33:29 -0700 (PDT) In-Reply-To: <4E4CCA6C.8020408@ipfw.ru> References: <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> <4E4CCA6C.8020408@ipfw.ru> Date: Sat, 20 Aug 2011 12:33:29 -0500 Message-ID: From: Alan Cox To: "Alexander V. Chernikov" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Kostik Belousov , freebsd-stable@freebsd.org, perryh@pluto.rain.com, daniel@digsys.bg Subject: Re: 32GB limit per swap device? 
X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: alc@freebsd.org List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:01:33 -0000 On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov wrote: > On 10.08.2011 19:16, perryh@pluto.rain.com wrote: > >> Chuck Swiger wrote: >> >> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: >>> >>>> I am trying to set up 64GB partitions for swap for a system that >>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on >>>> 8-stable as of today I get: >>>> >>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit >>>> >>>> Is there workaround for this limitation? >>>> >>> > Another interesting question: > > swap pager operates in page blocks (PAGE_SIZE=4k on common arch). > > Block device size in passed to swaponsomething() in number of _disk_ blocks > (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of which swap > pager is build) maximum objects check is enforced. > > The (possible) problem is that real object count we will operate on is not > the value passed to swaponsomething() since it is calculated in wrong units. > > we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value which > is rough (X / 8) so we should be able to address 32*8=256G. > > The code should look like this: > > Index: vm/swap_pager.c > ==============================**==============================**======= > --- vm/swap_pager.c (revision 223877) > +++ vm/swap_pager.c (working copy) > @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long > u_long mblocks; > > /* > + * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks. > + * First chop nblks off to page-align it, then convert. > + * > + * sw->sw_nblks is in page-sized chunks now too. 
> + */ > + nblks &= ~(ctodb(1) - 1); > + nblks = dbtoc(nblks); > + > + /* > > * If we go beyond this, we get overflows in the radix > * tree bitmap code. > */ > @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long > mblocks); > nblks = mblocks; > } > - /* > - * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks. > - * First chop nblks off to page-align it, then convert. > - * > - * sw->sw_nblks is in page-sized chunks now too. > - */ > - nblks &= ~(ctodb(1) - 1); > - nblks = dbtoc(nblks); > > sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO); > sp->sw_vp = vp; > > > (move pages recalculation before b-list check) > > > Can someone comment on this? > > I believe that you are correct. Have you tried testing this change on a large swap device? Alan From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:04:17 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 093C61065675 for ; Sat, 20 Aug 2011 18:04:17 +0000 (UTC) (envelope-from db@db.net) Received: from diana.db.net (diana.db.net [66.113.102.10]) by mx1.freebsd.org (Postfix) with ESMTP id E53638FC15 for ; Sat, 20 Aug 2011 18:04:16 +0000 (UTC) Received: from night.db.net (localhost [127.0.0.1]) by diana.db.net (Postfix) with ESMTP id 850E62283B; Sat, 20 Aug 2011 11:55:14 -0600 (MDT) Received: by night.db.net (Postfix, from userid 1000) id 589C96914; Sat, 20 Aug 2011 14:04:15 -0400 (EDT) Date: Sat, 20 Aug 2011 14:04:15 -0400 From: Diane Bruce To: Dan Langille Message-ID: <20110820180415.GA74553@night.db.net> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> User-Agent: 
Mutt/1.4.2.3i Cc: freebsd-stable@freebsd.org, Jeremy Chadwick Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:04:17 -0000 On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote: > On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote: > > > On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote: ... > >> Information such as this? http://beta.freebsddiary.org/smart-fixing-bad-sector.php ... > > 3) A very high temperature of 51C (SMART attribute 194). If this drive > > is in an enclosure or in a system with no fans this would be ... eh? What's the temperature of the second drive? ... > This is an older system. I suspect insufficient ventilation. I'll look at getting > a new case fan, if not some HDD fans. ... > > I still suggest you replace the drive, although given its age I doubt Older drive and errors starting to happen, replace ASAP. > > you'll be able to find a suitable replacement. I tend to keep disks > > like this around for testing/experimental purposes and not for actual > > use. > > I have several unused 80GB HDD I can place into this system. I think that's > what I'll wind up doing. But I'd like to follow this process through and get it documented > for future reference. If the data is valuable, the sooner the better. It's actually somewhat saner if the two drives are not from the same lot. > -- > Dan Langille - http://langille.org > - Diane -- - db@FreeBSD.org db@db.net http://www.db.net/~db Why leave money to our children if we don't leave them the Earth? 
From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:16:00 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA4FA1065670 for ; Sat, 20 Aug 2011 18:16:00 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8ACBE8FC1B for ; Sat, 20 Aug 2011 18:16:00 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id B740750A09; Sat, 20 Aug 2011 18:15:59 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NNGCgOb+uOFQ; Sat, 20 Aug 2011 19:15:59 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 3D67E50A06 ; Sat, 20 Aug 2011 18:15:59 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Dan Langille In-Reply-To: <4E4FF4D6.1090305@os2.kiev.ua> Date: Sat, 20 Aug 2011 14:15:57 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <2AB04C16-FF20-467E-9508-AF35CB6323BC@langille.org> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <4E4FF4D6.1090305@os2.kiev.ua> To: Alex Samorukov X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org, Jeremy Chadwick Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:16:00 
-0000

On Aug 20, 2011, at 1:54 PM, Alex Samorukov wrote:

>> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
>> dd: /dev/ad2: Input/output error
>> 2717+0 records in
>> 2717+0 records out
>> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
>> dd: /dev/ad2: Input/output error
>> 38170+1 records in
>> 38170+1 records out
>> 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
>> [root@bast:~] #
>>
>> That seems to indicate two problems. Are those the values I should be using
>> with dd?
>>
>
> You can run long self-test in smartmontools (-t long). Then you can get failed sector number from the smartmontools (-l selftest) and then you can use DD to write zero to the specific sector.

Already done: http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Search for 786767

Or did you mean something else?

That doesn't seem to map to a particular sector though... I ran it for a while...

# time dd of=/dev/null if=/dev/ad2 bs=512 iseek=786767
^C4301949+0 records in
4301949+0 records out
2202597888 bytes transferred in 780.245828 secs (2822954 bytes/sec)

real 13m0.256s
user 0m22.087s
sys 3m24.215s

> Also i am highly recommending to setup smartd as daemon and to monitor number of relocated sectors. If they will grow again - then it is a good time to utilize this disk.

It is running, but with nothing custom in the .conf file.
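
[Archive note] The byte counts in the dd output above can be turned into a sector range with plain arithmetic. The sketch below is only an illustration: the helper names are made up, 512-byte sectors are assumed, and it assumes that the record count dd printed at the first error covers contiguous successful reads from offset 0.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of dd or smartmontools.

SECTOR_SIZE=512   # assumed logical sector size of the disk

# First LBA of the dd block that failed, given the number of full records
# read before the error and the dd block size in bytes.
first_lba_of_failed_block() {
    records=$1
    bs=$2
    echo $(( records * bs / SECTOR_SIZE ))
}

# Number of sectors covered by one dd block of the given size.
sectors_per_block() {
    echo $(( $1 / SECTOR_SIZE ))
}

# Values from the first error above: 2717 full records of 1 MiB each.
start=$(first_lba_of_failed_block 2717 1048576)
count=$(sectors_per_block 1048576)

echo "suspect LBA range: $start .. $(( start + count - 1 ))"
# Only prints a candidate command; it does not touch any disk.
echo "narrow it down with: dd if=/dev/ad2 of=/dev/null bs=$SECTOR_SIZE iseek=$start count=$count conv=noerror"
```

Under those assumptions the first failing 1 MiB block starts at LBA 5564416. The interrupted bs=512 run started at LBA 786767 and read 4301949 sectors, so it stopped around LBA 5088715, which would be short of that block.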
-- 
Dan Langille - http://langille.org

From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:17:39 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76BCA106567A for ; Sat, 20 Aug 2011 18:17:39 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id 47F8B8FC1D for ; Sat, 20 Aug 2011 18:17:39 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id C589750A09; Sat, 20 Aug 2011 18:17:38 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id w91gjHj2BT9y; Sat, 20 Aug 2011 19:17:38 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 607D150A06 ; Sat, 20 Aug 2011 18:17:38 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Dan Langille In-Reply-To: <20110820180415.GA74553@night.db.net> Date: Sat, 20 Aug 2011 14:17:37 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <20110820180415.GA74553@night.db.net> To: Diane Bruce X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org, Jeremy Chadwick Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:17:39
-0000

On Aug 20, 2011, at 2:04 PM, Diane Bruce wrote:

> On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
>> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
>>
>>> On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
> ...
>>>> Information such as this? http://beta.freebsddiary.org/smart-fixing-bad-sector.php
> ...
>>> 3) A very high temperature of 51C (SMART attribute 194). If this drive
>>> is in an enclosure or in a system with no fans this would be
>
> ...
>
> eh? What's the temperature of the second drive?

Roughly the same:

[root@bast:/home/dan/tmp] # smartctl -a /dev/ad2 | grep -i temp
194 Temperature_Celsius 0x0022 080 076 042 Old_age Always - 51
[root@bast:/home/dan/tmp] # smartctl -a /dev/ad0 | grep -i temp
194 Temperature_Celsius 0x0022 081 074 042 Old_age Always - 49
[root@bast:/home/dan/tmp] #

FYI, when I first set up smartd, I questioned those values. The HDD in question, at the time, did not feel hot to the touch.

>
> ...
>
>> This is an older system. I suspect insufficient ventilation. I'll look at getting
>> a new case fan, if not some HDD fans.
>
> ...
>
>>> I still suggest you replace the drive, although given its age I doubt
>
> Older drive and errors starting to happen, replace ASAP.
>
>>> you'll be able to find a suitable replacement. I tend to keep disks
>>> like this around for testing/experimental purposes and not for actual
>>> use.
>>
>> I have several unused 80GB HDD I can place into this system. I think that's
>> what I'll wind up doing. But I'd like to follow this process through and get it documented
>> for future reference.
>
> If the data is valuable, the sooner the better.
> It's actually somewhat saner if the two drives are not from the same lot.

Noted.
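
[Archive note] The temperature readings quoted earlier in this message can be pulled out of the attribute table mechanically. The following is a minimal sketch, assuming the usual ten-column attribute layout in which the raw value is the tenth field; the function names and the 45C limit are illustrative assumptions, not values taken from the thread or from smartmontools.

```sh
#!/bin/sh
# Sketch only: smart_temp_raw and check_temp are hypothetical helper names.

# Print the raw value (10th column) of SMART attribute 194 from
# attribute-table lines read on stdin.
smart_temp_raw() {
    awk '$1 == 194 { print $10; exit }'
}

# Compare the raw temperature against a limit in degrees Celsius.
# The limit is the caller's choice; nothing in the thread fixes a number.
check_temp() {
    limit=$1
    temp=$(smart_temp_raw)
    if [ -z "$temp" ]; then
        echo "attribute 194 not found"
        return 1
    fi
    if [ "$temp" -gt "$limit" ]; then
        echo "WARN ${temp}C exceeds ${limit}C"
    else
        echo "OK ${temp}C"
    fi
}

# Sample input: the ad2 attribute line quoted in this message.
printf '%s\n' '194 Temperature_Celsius 0x0022 080 076 042 Old_age Always - 51' | check_temp 45
```

On a live system the same function could be fed from `smartctl -A /dev/ad2 | check_temp 45`, with the limit adjusted to whatever the drive's data sheet suggests.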
-- 
Dan Langille - http://langille.org

From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:23:30 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C6852106566B; Sat, 20 Aug 2011 18:23:30 +0000 (UTC) (envelope-from marquis@roble.com) Received: from mx5.roble.com (mx5.roble.com [206.40.34.5]) by mx1.freebsd.org (Postfix) with ESMTP id B7EB08FC08; Sat, 20 Aug 2011 18:23:30 +0000 (UTC) Received: from mx5.roble.com (mx5.roble.com [206.40.34.5]) by mx5.roble.com (Postfix) with ESMTP id 2FCA867899; Sat, 20 Aug 2011 11:10:31 -0700 (PDT) Date: Sat, 20 Aug 2011 11:10:31 -0700 (PDT) From: Roger Marquis To: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org In-Reply-To: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> References: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Message-Id: <20110820182330.C6852106566B@hub.freebsd.org> Cc: Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:23:30 -0000

>> Repeat this enough times and prison0.pr_uref reaches zero.
>> To reach zero even sooner just kill enough of non-jailed processes.

Interesting. We've been getting kernel panics in -stable but with only one jail started at boot without being restarted.

Are you using SAS drives by any chance? Setting ethernet polling and HZ? How about softupdates, gmirror, and/or anything in sysctl.conf?

Roger Marquis From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:30:46 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ECA10106567A; Sat, 20 Aug 2011 18:30:46 +0000 (UTC) (envelope-from prvs=12137168ef=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 431938FC22; Sat, 20 Aug 2011 18:30:45 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 19:30:11 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 19:30:11 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014675318.msg; Sat, 20 Aug 2011 19:30:09 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Roger Marquis" , , References: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> <20110820182330.C6852106566B@hub.freebsd.org> Date: Sat, 20 Aug 2011 19:31:11 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 
18:30:47 -0000

----- Original Message ----- From: "Roger Marquis" To: ; Sent: Saturday, August 20, 2011 7:10 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

>>> Repeat this enough times and prison0.pr_uref reaches zero.
>>> To reach zero even sooner just kill enough of non-jailed processes.
>
> Interesting. We've been getting kernel panics in -stable but with only
> one jail started at boot without being restarted.
>
> Are you using SAS drives by any chance? Setting ethernet polling and HZ?
> How about softupdates, gmirror, and/or anything in sysctl.conf?

If you're not restarting things it may be unrelated.

No SAS; polling is compiled in but no devices have it active, and we are using ZFS only.

Are you seeing a double fault panic?

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:35:01 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 329DB1065677 for ; Sat, 20 Aug 2011 18:35:01 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.emeryville.ca.mail.comcast.net (qmta04.emeryville.ca.mail.comcast.net [76.96.30.40]) by mx1.freebsd.org (Postfix) with ESMTP id 14B7A8FC14 for ; Sat, 20 Aug 2011 18:35:00 +0000 (UTC) Received: from omta09.emeryville.ca.mail.comcast.net ([76.96.30.20]) by qmta04.emeryville.ca.mail.comcast.net with comcast id NiZg1h0020S2fkCA4iawSK; Sat, 20 Aug 2011 18:34:56 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta09.emeryville.ca.mail.comcast.net with comcast id Niar1h00Z1t3BNj8ViasCe; Sat, 20 Aug 2011 18:34:54 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id C8CCC102C1A; Sat, 20 Aug 2011 11:34:56 -0700 (PDT) Date: Sat, 20 Aug 2011 11:34:56 -0700 From: Jeremy Chadwick To: Alex Samorukov Message-ID: <20110820183456.GA38317@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <4E4FF4D6.1090305@os2.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E4FF4D6.1090305@os2.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org, Dan Langille Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:35:01 -0000 On Sat, Aug 20, 2011 at 07:54:30PM +0200, Alex Samorukov wrote: > You can run long self-test in smartmontools (-t long). 
Then you can > get failed sector number from the smartmontools (-l selftest) and > then you can use DD to write zero to the specific sector. This is inaccurate advice. I covered this in my reply already as well: http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063665.html Quote: "The SMART tests you did didn't really amount to anything; no surprise. short and long tests usually do not test the surface of the disk. There are some drives which do it on a long test, but as I said before, everything varies from drive to drive." TL;DR version: smartctl -t long != smartctl -t select. The OP's drive does not support selective scans (-t select), and long turned up nothing (no surprise there either). So, using dd to find the bad LBAs is the only choice he has. > Also i am highly recommending to setup smartd as daemon and to monitor > number of relocated sectors. If they will grow again - then it is a > good time to utilize this disk. You have to know what you're looking at and how to interpret the data smartd gives you for it to be useful. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:36:33 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C910106564A for ; Sat, 20 Aug 2011 18:36:33 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta01.westchester.pa.mail.comcast.net (qmta01.westchester.pa.mail.comcast.net [76.96.62.16]) by mx1.freebsd.org (Postfix) with ESMTP id 1D8898FC17 for ; Sat, 20 Aug 2011 18:36:32 +0000 (UTC) Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72]) by qmta01.westchester.pa.mail.comcast.net with comcast id NibJ1h0021ZXKqc51icZx5; Sat, 20 Aug 2011 18:36:33 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta21.westchester.pa.mail.comcast.net with comcast id NicP1h01q1t3BNj3hicV8Q; Sat, 20 Aug 2011 18:36:31 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id AEA5C102C1A; Sat, 20 Aug 2011 11:36:22 -0700 (PDT) Date: Sat, 20 Aug 2011 11:36:22 -0700 From: Jeremy Chadwick To: Dan Langille Message-ID: <20110820183622.GA38427@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:36:33 -0000 Dan, I will respond to your reply sometime tomorrow. I do not have time to review the Email today (~7.7KBytes), but will have time tomorrow. 
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:36:58 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2ABC610656B6 for ; Sat, 20 Aug 2011 18:36:58 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh5.mail.rice.edu (mh5.mail.rice.edu [128.42.199.32]) by mx1.freebsd.org (Postfix) with ESMTP id E88268FC12 for ; Sat, 20 Aug 2011 18:36:57 +0000 (UTC) Received: from mh5.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh5.mail.rice.edu (Postfix) with ESMTP id 212CB29021B; Sat, 20 Aug 2011 13:20:05 -0500 (CDT) X-Virus-Scanned: by amavis-2.6.4 at mh5.mail.rice.edu, auth channel Received: from mh5.mail.rice.edu ([127.0.0.1]) by mh5.mail.rice.edu (mh5.mail.rice.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id VY-Q6Bmihokg; Sat, 20 Aug 2011 13:20:05 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh5.mail.rice.edu (Postfix) with ESMTPSA id 5869E2901AB; Sat, 20 Aug 2011 13:20:04 -0500 (CDT) Message-ID: <4E4FFAD3.4090706@rice.edu> Date: Sat, 20 Aug 2011 13:20:03 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.2.17) Gecko/20110620 Thunderbird/3.1.10 MIME-Version: 1.0 To: Kostik Belousov References: <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> <4E4CCA6C.8020408@ipfw.ru> <20110820174147.GW17489@deviant.kiev.zoral.com.ua> In-Reply-To: <20110820174147.GW17489@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=ISO-8859-1; 
format=flowed Content-Transfer-Encoding: 7bit Cc: alc@freebsd.org, freebsd-stable@freebsd.org, perryh@pluto.rain.com, "Alexander V. Chernikov" , daniel@digsys.bg Subject: Re: 32GB limit per swap device? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:36:58 -0000 On 08/20/2011 12:41, Kostik Belousov wrote: > On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: >> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikovwrote: >> >>> On 10.08.2011 19:16, perryh@pluto.rain.com wrote: >>> >>>> Chuck Swiger wrote: >>>> >>>> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: >>>>>> I am trying to set up 64GB partitions for swap for a system that >>>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on >>>>>> 8-stable as of today I get: >>>>>> >>>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit >>>>>> >>>>>> Is there workaround for this limitation? >>>>>> >>> Another interesting question: >>> >>> swap pager operates in page blocks (PAGE_SIZE=4k on common arch). >>> >>> Block device size in passed to swaponsomething() in number of _disk_ blocks >>> (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of which swap >>> pager is build) maximum objects check is enforced. >>> >>> The (possible) problem is that real object count we will operate on is not >>> the value passed to swaponsomething() since it is calculated in wrong units. >>> >>> we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value which >>> is rough (X / 8) so we should be able to address 32*8=256G. 
>>> >>> The code should look like this: >>> >>> Index: vm/swap_pager.c >>> ==============================**==============================**======= >>> --- vm/swap_pager.c (revision 223877) >>> +++ vm/swap_pager.c (working copy) >>> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long >>> u_long mblocks; >>> >>> /* >>> + * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks. >>> + * First chop nblks off to page-align it, then convert. >>> + * >>> + * sw->sw_nblks is in page-sized chunks now too. >>> + */ >>> + nblks&= ~(ctodb(1) - 1); >>> + nblks = dbtoc(nblks); >>> + >>> + /* >>> >>> * If we go beyond this, we get overflows in the radix >>> * tree bitmap code. >>> */ >>> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long >>> mblocks); >>> nblks = mblocks; >>> } >>> - /* >>> - * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks. >>> - * First chop nblks off to page-align it, then convert. >>> - * >>> - * sw->sw_nblks is in page-sized chunks now too. >>> - */ >>> - nblks&= ~(ctodb(1) - 1); >>> - nblks = dbtoc(nblks); >>> >>> sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO); >>> sp->sw_vp = vp; >>> >>> >>> (move pages recalculation before b-list check) >>> >>> >>> Can someone comment on this? >>> >>> >> I believe that you are correct. Have you tried testing this change on a >> large swap device? > I probably agree too, but I am in the process of re-reading the swap code, > and I do not quite believe in the limit. > I'm uncertain whether the current limit, "0x40000000 / BLIST_META_RADIX", is exact or not, but I doubt that it is too large. > When the initial code was committed, our daddr_t was 32bit, I checked > the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression > right now is that we only utilize the low 32bits of daddr_t. > > Esp. 
interesting looks the following typedef:
> typedef uint32_t u_daddr_t; /* unsigned disk address */
> which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
>
> I wonder whether we could just use full 64bit and de-facto remove the
> limitation on the swap partition size.

I would rather argue first that the subr_blist code should not be using daddr_t at all. The code is abusing daddr_t and defining u_daddr_t to represent things that are not disk addresses. Instead, it should either define its own type or directly use (u)int*_t. Then, as for choosing between 32 and 64 bits, I'm skeptical of using this structure for managing more than 32 bits worth of blocks, given the amount of RAM it will use.

From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:40:24 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C6101065672 for ; Sat, 20 Aug 2011 18:40:24 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1D7F68FC0A for ; Sat, 20 Aug 2011 18:40:23 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id 73A2750A09; Sat, 20 Aug 2011 18:40:23 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JG+6dJqrxBhn; Sat, 20 Aug 2011 19:40:23 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 3225F50A06 ; Sat, 20 Aug 2011 18:40:23 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Dan Langille In-Reply-To: <20110820183622.GA38427@icarus.home.lan> Date: Sat, 20 Aug 2011 14:40:21 -0400
Content-Transfer-Encoding: 7bit Message-Id: <04B6AC2F-A1F5-42B9-B0D2-D2840DFE7917@langille.org> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <20110820183622.GA38427@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:40:24 -0000 On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote: > Dan, I will respond to your reply sometime tomorrow. I do not have time > to review the Email today (~7.7KBytes), but will have time tomorrow. No worries. Thank you. -- Dan Langille - http://langille.org From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:43:24 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F07C106566C for ; Sat, 20 Aug 2011 18:43:24 +0000 (UTC) (envelope-from ml@os2.kiev.ua) Received: from s1.sdv.com.ua (s1.sdv.com.ua [77.120.97.61]) by mx1.freebsd.org (Postfix) with ESMTP id 19DEF8FC18 for ; Sat, 20 Aug 2011 18:43:23 +0000 (UTC) Received: from 90-105-243-80.cust.centrio.cz ([80.243.105.90] helo=[192.168.100.107]) by s1.sdv.com.ua with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1QuqVr-0009Nq-PW; Sat, 20 Aug 2011 21:43:21 +0300 Message-ID: <4E50003D.30803@os2.kiev.ua> Date: Sat, 20 Aug 2011 20:43:09 +0200 From: Alex Samorukov User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11 MIME-Version: 1.0 To: Jeremy Chadwick References: 
<1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <4E4FF4D6.1090305@os2.kiev.ua> <20110820183456.GA38317@icarus.home.lan> In-Reply-To: <20110820183456.GA38317@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-SA-Score: -1.0 Cc: freebsd-stable@freebsd.org, Dan Langille Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:43:24 -0000 > "The SMART tests you did didn't really amount to anything; no surprise. > short and long tests usually do not test the surface of the disk. There > are some drives which do it on a long test, but as I said before, > everything varies from drive to drive." > It is not correct statement, sorry. Long test trying to read all the data from surface (and doing some other things). // one of the smartmontools developers and sysutils/smartmontools maintainer. From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 18:46:13 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D01D10656E2; Sat, 20 Aug 2011 18:46:13 +0000 (UTC) (envelope-from melifaro@ipfw.ru) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 159938FC0A; Sat, 20 Aug 2011 18:46:13 +0000 (UTC) Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1QuqYa-0005fe-Sb; Sat, 20 Aug 2011 22:46:09 +0400 Message-ID: <4E500014.6030800@ipfw.ru> Date: Sat, 20 Aug 2011 22:42:28 +0400 From: "Alexander V. 
Chernikov" User-Agent: Thunderbird 2.0.0.24 (X11/20100515) MIME-Version: 1.0 To: Alan Cox References: <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> <4E4CCA6C.8020408@ipfw.ru> <20110820174147.GW17489@deviant.kiev.zoral.com.ua> <4E4FFAD3.4090706@rice.edu> In-Reply-To: <4E4FFAD3.4090706@rice.edu> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Kostik Belousov , alc@freebsd.org, perryh@pluto.rain.com, freebsd-stable@freebsd.org, daniel@digsys.bg Subject: Re: 32GB limit per swap device? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 18:46:13 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alan Cox wrote: > On 08/20/2011 12:41, Kostik Belousov wrote: >> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: >>> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. >>> Chernikovwrote: >>> >>>> On 10.08.2011 19:16, perryh@pluto.rain.com wrote: >>>> >>>>> Chuck Swiger wrote: >>>>> >>>>> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote: >>>>>>> I am trying to set up 64GB partitions for swap for a system that >>>>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on >>>>>>> 8-stable as of today I get: >>>>>>> >>>>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit >>>>>>> >>>>>>> Is there workaround for this limitation? >>>>>>> >>>> Another interesting question: >>>> >>>> swap pager operates in page blocks (PAGE_SIZE=4k on common arch). >>>> >>>> Block device size in passed to swaponsomething() in number of _disk_ >>>> blocks >>>> (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of >>>> which swap >>>> pager is build) maximum objects check is enforced. 
>>>> >>>> The (possible) problem is that real object count we will operate on >>>> is not >>>> the value passed to swaponsomething() since it is calculated in >>>> wrong units. >>>> >>>> we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value >>>> which >>>> is rough (X / 8) so we should be able to address 32*8=256G. >>>> >>>> The code should look like this: >>>> >>>> Index: vm/swap_pager.c >>>> ==============================**==============================**======= >>>> --- vm/swap_pager.c (revision 223877) >>>> +++ vm/swap_pager.c (working copy) >>>> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, >>>> u_long >>>> u_long mblocks; >>>> >>>> /* >>>> + * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd >>>> chunks. >>>> + * First chop nblks off to page-align it, then convert. >>>> + * >>>> + * sw->sw_nblks is in page-sized chunks now too. >>>> + */ >>>> + nblks&= ~(ctodb(1) - 1); >>>> + nblks = dbtoc(nblks); >>>> + >>>> + /* >>>> >>>> * If we go beyond this, we get overflows in the radix >>>> * tree bitmap code. >>>> */ >>>> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, >>>> u_long >>>> mblocks); >>>> nblks = mblocks; >>>> } >>>> - /* >>>> - * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd >>>> chunks. >>>> - * First chop nblks off to page-align it, then convert. >>>> - * >>>> - * sw->sw_nblks is in page-sized chunks now too. >>>> - */ >>>> - nblks&= ~(ctodb(1) - 1); >>>> - nblks = dbtoc(nblks); >>>> >>>> sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO); >>>> sp->sw_vp = vp; >>>> >>>> >>>> (move pages recalculation before b-list check) >>>> >>>> >>>> Can someone comment on this? >>>> >>>> >>> I believe that you are correct. Have you tried testing this change on a >>> large swap device? I will try tomorrow. >> I probably agree too, but I am in the process of re-reading the swap >> code, >> and I do not quite believe in the limit. 
>>
>
> I'm uncertain whether the current limit, "0x40000000 /
> BLIST_META_RADIX", is exact or not, but I doubt that it is too large.

It is not exact. It is a rough estimate of
sizeof(blmeta_t) * X < 4G (blist_create() assumes malloc() is not
able to allocate more than 4G; I'm not sure if that is true these days).
X is the number of blocks we need to store. The actual number, however, is
X / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...), but it does not
differ from X very much.

blist can be seen as a tree of radix trees, with the metainformation for all
those radix trees allocated by a single allocation, which imposes this
limit. The metainformation is used to find free blocks more quickly.

A single linear allocation is required to advance to the next radix tree on
the same level very fast:

* * * * *
** ** ** ** **
********************
^^^
A rough schema with 3 levels in the tree and BLIST_META_RADIX=2 (instead
of 16).

>
>> When the initial code was committed, our daddr_t was 32bit, I checked
>> the RELENG_4 sources. Current code uses int64_t for daddr_t. My
>> impression
>> right now is that we only utilize the low 32bits of daddr_t.
>>
>> Esp. interesting looks the following typedef:
>> typedef uint32_t u_daddr_t; /* unsigned disk address */
>> which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
>>
>> I wonder whether we could just use full 64bit and de-facto remove the
>> limitation on the swap partition size.

This will double the size of struct blmeta_t and cause 2*X memory usage for
every swap configuration.

>
> I would rather argue first that the subr_list code should not be using
> daddr_t at all. The code is abusing daddr_t and defining u_daddr_t to
> represent things that are not disk addresses. Instead, it should either
> define its own type or directly use (u)int*_t. Then, as for choosing
> between 32 and 64 bits, I'm skeptical of using this structure for
> managing more than 32 bits worth of blocks, given the amount of RAM it
> will use.
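
The unit mismatch discussed in this thread can be checked with plain shell arithmetic. The following is only a sketch, not kernel code: the constants are the values quoted in the thread (BLIST_META_RADIX=16, DEV_BSIZE=512, PAGE_SIZE=4k), all variable names are made up for illustration, and the page-alignment masking done by the real code is ignored because a 64 GiB device is already page-aligned.

```sh
#!/bin/sh
# Sketch only: constants as quoted in the thread, variable names hypothetical.
BLIST_META_RADIX=16
DEV_BSIZE=512
PAGE_SIZE=4096
GIB=$((1024 * 1024 * 1024))

# 0x40000000 / BLIST_META_RADIX, written in decimal for portability.
limit=$((1073741824 / BLIST_META_RADIX))

# Swap size covered if the limit counts 512-byte disk blocks vs. pages.
gib_as_disk_blocks=$((limit * DEV_BSIZE / GIB))
gib_as_pages=$((limit * PAGE_SIZE / GIB))

# A 64 GiB swap device, expressed in disk blocks as passed to swaponsomething().
nblks=$((64 * GIB / DEV_BSIZE))
blocks_per_page=$((PAGE_SIZE / DEV_BSIZE))

# Current order: clamp the disk-block count, then convert to pages.
old=$nblks
if [ "$old" -gt "$limit" ]; then old=$limit; fi
old_pages=$((old / blocks_per_page))

# Proposed order: convert to pages first, then clamp.
new_pages=$((nblks / blocks_per_page))
if [ "$new_pages" -gt "$limit" ]; then new_pages=$limit; fi

echo "limit: $limit objects"
echo "limit as disk blocks: $gib_as_disk_blocks GiB, as pages: $gib_as_pages GiB"
echo "64 GiB device: $((old_pages * PAGE_SIZE / GIB)) GiB kept now," \
     "$((new_pages * PAGE_SIZE / GIB)) GiB with the check moved"
```

Under these assumptions the limit equals the 67108864 in the warning message, which is 32 GiB when counted in disk blocks and 256 GiB when counted in pages.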
> > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5QABQACgkQwcJ4iSZ1q2kdXwCfWPN48wauijoGOQCUaalYnFCR BIgAnRLCuDmPwySp1gd0xf+UPG5nC7KJ =sP6M -----END PGP SIGNATURE----- From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 19:17:31 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7EF61065672 for ; Sat, 20 Aug 2011 19:17:31 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta13.emeryville.ca.mail.comcast.net (qmta13.emeryville.ca.mail.comcast.net [76.96.27.243]) by mx1.freebsd.org (Postfix) with ESMTP id BC2748FC15 for ; Sat, 20 Aug 2011 19:17:31 +0000 (UTC) Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74]) by qmta13.emeryville.ca.mail.comcast.net with comcast id NjED1h0011bwxycADjHTjs; Sat, 20 Aug 2011 19:17:27 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta18.emeryville.ca.mail.comcast.net with comcast id NjGv1h0061t3BNj8ejGxAZ; Sat, 20 Aug 2011 19:16:57 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B2DDA102C1A; Sat, 20 Aug 2011 12:17:26 -0700 (PDT) Date: Sat, 20 Aug 2011 12:17:26 -0700 From: Jeremy Chadwick To: Alex Samorukov Message-ID: <20110820191726.GA39027@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <4E4FF4D6.1090305@os2.kiev.ua> <20110820183456.GA38317@icarus.home.lan> <4E50003D.30803@os2.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E50003D.30803@os2.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org, Dan Langille Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 
2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 19:17:31 -0000 On Sat, Aug 20, 2011 at 08:43:09PM +0200, Alex Samorukov wrote: > > >"The SMART tests you did didn't really amount to anything; no surprise. > >short and long tests usually do not test the surface of the disk. There > >are some drives which do it on a long test, but as I said before, > >everything varies from drive to drive." > > > It is not correct statement, sorry. Long test trying to read all the > data from surface (and doing some other things). > > // one of the smartmontools developers and sysutils/smartmontools > maintainer. That's great, but too bad it's generally not true in practise. Dan's long scan on his site proves it, and I've dealt with this situation myself many times over. SMART long tests *may* do a surface scan, but in most cases they just seem to do something that's similar to "short" but over a longer period of time. Furthermore, some which *do* do a surface scan on a "long" test don't always report LBA failures in the self-test log. I've personally seen this happen on Western Digital disks (model strings are unknown, I'm certain I've rid myself of those drives). Firmware bug/quirk? Possibly, but at the end of the day it doesn't matter -- it means the end-user has wasted 2-3 hours for something that tests OK yet we know for a fact isn't OK. I *have* seen a drive do a surface scan on a "long" test and report LBAs it couldn't read, but as I said, it's rare and varies from vendor to vendor, drive to drive, and firmware to firmware. When it happened I was very, very surprised (and delighted). The only thing I can trust 100% of the time when it comes to surface scans is SMART selective scans (if available, which again the OP's drive does not offer this), or using dd or a read-per-LBA on the OS level (which works everywhere). 
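
A minimal sketch of the "read-per-LBA on the OS level" approach mentioned above, assuming 512-byte LBAs. The function name scan_lbas is hypothetical and is not a tool referenced in this thread. It uses the POSIX skip= operand, which should be equivalent to the iseek= spelling that FreeBSD's dd also accepts. It only reads, never writes.

```sh
#!/bin/sh
# scan_lbas DEVICE FIRST_LBA LAST_LBA
# Hypothetical helper: reads each LBA on its own and prints the ones that fail.
scan_lbas() {
    dev=$1
    lba=$2
    last=$3
    while [ "$lba" -le "$last" ]; do
        # One 512-byte read per dd call, so a failure maps to exactly one LBA.
        if ! dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null; then
            echo "read error at LBA $lba"
        fi
        lba=$((lba + 1))
    done
}

# Example (not run here): scan_lbas /dev/ad2 5566400 5566500
```

Starting one dd process per LBA is slow, so in practice a scan like this is better suited to narrowing down a small range than to covering a whole disk.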
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 19:17:39 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8383A1065700; Sat, 20 Aug 2011 19:17:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id E54DF8FC1B; Sat, 20 Aug 2011 19:17:38 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7KJHQON072993 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Aug 2011 22:17:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p7KJHQxE011642; Sat, 20 Aug 2011 22:17:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7KJHQoC011641; Sat, 20 Aug 2011 22:17:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 20 Aug 2011 22:17:26 +0300 From: Kostik Belousov To: "Alexander V. 
Chernikov" Message-ID: <20110820191726.GY17489@deviant.kiev.zoral.com.ua> References: <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> <4E4CCA6C.8020408@ipfw.ru> <20110820174147.GW17489@deviant.kiev.zoral.com.ua> <4E4FFAD3.4090706@rice.edu> <4E500014.6030800@ipfw.ru> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="fXc9gqH37d6mfFz8" Content-Disposition: inline In-Reply-To: <4E500014.6030800@ipfw.ru> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: alc@freebsd.org, freebsd-stable@freebsd.org, daniel@digsys.bg, perryh@pluto.rain.com, Alan Cox Subject: Re: 32GB limit per swap device? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 19:17:39 -0000 --fXc9gqH37d6mfFz8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Aug 20, 2011 at 10:42:28PM +0400, Alexander V. Chernikov wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 >=20 > Alan Cox wrote: > > On 08/20/2011 12:41, Kostik Belousov wrote: > >> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote: > >>> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. 
> >>> Chernikov wrote:
> >>>
> >>>> On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
> >>>>
> >>>>> Chuck Swiger wrote:
> >>>>>
> >>>>> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
> >>>>>>> I am trying to set up 64GB partitions for swap for a system that
> >>>>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
> >>>>>>> 8-stable as of today I get:
> >>>>>>>
> >>>>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
> >>>>>>>
> >>>>>>> Is there workaround for this limitation?
> >>>>>>>
> >>>> Another interesting question:
> >>>>
> >>>> swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
> >>>>
> >>>> Block device size in passed to swaponsomething() in number of _disk_
> >>>> blocks
> >>>> (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of
> >>>> which swap
> >>>> pager is build) maximum objects check is enforced.
> >>>>
> >>>> The (possible) problem is that real object count we will operate on
> >>>> is not
> >>>> the value passed to swaponsomething() since it is calculated in
> >>>> wrong units.
> >>>>
> >>>> we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value
> >>>> which
> >>>> is rough (X / 8) so we should be able to address 32*8=256G.
> >>>>
> >>>> The code should look like this:
> >>>>
> >>>> Index: vm/swap_pager.c
> >>>> ==============================**==============================**=======
> >>>> --- vm/swap_pager.c (revision 223877)
> >>>> +++ vm/swap_pager.c (working copy)
> >>>> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id,
> >>>> u_long
> >>>> u_long mblocks;
> >>>>
> >>>> /*
> >>>> + * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
> >>>> chunks.
> >>>> + * First chop nblks off to page-align it, then convert.
> >>>> + *
> >>>> + * sw->sw_nblks is in page-sized chunks now too.
> >>>> + */
> >>>> + nblks&= ~(ctodb(1) - 1);
> >>>> + nblks = dbtoc(nblks);
> >>>> +
> >>>> + /*
> >>>>
> >>>> * If we go beyond this, we get overflows in the radix
> >>>> * tree bitmap code.
> >>>> */
> >>>> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id,
> >>>> u_long
> >>>> mblocks);
> >>>> nblks = mblocks;
> >>>> }
> >>>> - /*
> >>>> - * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
> >>>> chunks.
> >>>> - * First chop nblks off to page-align it, then convert.
> >>>> - *
> >>>> - * sw->sw_nblks is in page-sized chunks now too.
> >>>> - */
> >>>> - nblks&= ~(ctodb(1) - 1);
> >>>> - nblks = dbtoc(nblks);
> >>>>
> >>>> sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
> >>>> sp->sw_vp = vp;
> >>>>
> >>>>
> >>>> (move pages recalculation before b-list check)
> >>>>
> >>>>
> >>>> Can someone comment on this?
> >>>>
> >>>>
> >>> I believe that you are correct. Have you tried testing this change on a
> >>> large swap device?
> I will try tomorrow.
>
> >> I probably agree too, but I am in the process of re-reading the swap
> >> code,
> >> and I do not quite believe in the limit.
> >>
> >
> > I'm uncertain whether the current limit, "0x40000000 /
> > BLIST_META_RADIX", is exact or not, but I doubt that it is too large.
>
> It is not exact. It is rough estimation of
> sizeof(blmeta_t) * X < 4G (blist_create() assumes malloc() not being
> able to allocate more than 4G. I'm not sure if it is true these days)
> X is number of blocks we need to store. Actual number, however, it is X
> / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...) but it differs
> from X not very much.
>
> blist can be seen as tree of radix trees, with metainformation for all
> those radix trees allocated by single allocation which imposes this
> limit. Metainformation is used to find free blocks more quickly
>
> Single linear allocation is required to advance to next radix tree on
> the same level very fast:
>
>
> * * * * *
> ** ** ** ** **
> ********************
> ^^^
> Some kind of schema with 3 level in tree and BLIST_META_RADIX=2 (instead
> of 16).
>
>
>
> >
> >> When the initial code was committed, our daddr_t was 32bit, I checked
> >> the RELENG_4 sources. Current code uses int64_t for daddr_t. My
> >> impression
> >> right now is that we only utilize the low 32bits of daddr_t.
> >>
> >> Esp. interesting looks the following typedef:
> >> typedef uint32_t u_daddr_t; /* unsigned disk address */
> >> which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
> >>
> >> I wonder whether we could just use full 64bit and de-facto remove the
> >> limitation on the swap partition size.
>
> This will increase struct blmeta_t twice and cause 2*X memory usage for
> every swap configuration.

No, daddr_t is already 64bit. Nothing will increase. My point is that the
current limitation is artificial. I think Alan's note referred to the number
of radix tree nodes required to cover a large swap partition. But it
could be a good temporary measure. I expect to be able to provide some
numeric evidence later.

>
> >
> > I would rather argue first that the subr_list code should not be using
> > daddr_t at all. The code is abusing daddr_t and defining u_daddr_t to
> > represent things that are not disk addresses. Instead, it should either
> > define its own type or directly use (u)int*_t. Then, as for choosing
> > between 32 and 64 bits, I'm skeptical of using this structure for
> > managing more than 32 bits worth of blocks, given the amount of RAM it
> > will use.
> >=20 > >=20 > >=20 >=20 > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.14 (FreeBSD) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ >=20 > iEYEARECAAYFAk5QABQACgkQwcJ4iSZ1q2kdXwCfWPN48wauijoGOQCUaalYnFCR > BIgAnRLCuDmPwySp1gd0xf+UPG5nC7KJ > =3DsP6M > -----END PGP SIGNATURE----- --fXc9gqH37d6mfFz8 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk5QCEYACgkQC3+MBN1Mb4g3VQCfYlGrzdJOUw3Z2pL0mAWpb9fK 6hsAoLHoHVBteVjYBCRBEfRGCbACp6HU =BGLI -----END PGP SIGNATURE----- --fXc9gqH37d6mfFz8-- From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 19:44:46 2011 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E922D106566B for ; Sat, 20 Aug 2011 19:44:46 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id 839148FC0C for ; Sat, 20 Aug 2011 19:44:46 +0000 (UTC) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id F29FC153434 for ; Sat, 20 Aug 2011 21:44:44 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id R7qEMSDvelXP for ; Sat, 20 Aug 2011 21:44:43 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:c02b:ce62:71ff:9cbc] (unknown [IPv6:2001:4cb8:3:1:c02b:ce62:71ff:9cbc]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 1BD16153433 for ; Sat, 20 Aug 2011 21:44:43 +0200 (CEST) Message-ID: <4E500EAE.10005@digiware.nl> Date: Sat, 20 Aug 2011 21:44:46 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20110812 
Thunderbird/6.0 MIME-Version: 1.0 To: "stable@freebsd.org" References: <4E4F973D.9070706@digiware.nl> <4E4F99E4.8060009@digiware.nl> In-Reply-To: <4E4F99E4.8060009@digiware.nl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: Remote installing X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 19:44:47 -0000 On 20-8-2011 13:26, Willem Jan Withagen wrote: > On 2011-08-20 13:15, Willem Jan Withagen wrote: >> Hi, >> >> Today I liked to live dangerously, and want to upgrade a backups server >> from i386 to amd64. Just to see if we could. >> And otherwise I'd scap it and install from usb-stick. >> >> So I have my server running amd64 build GENERIC. >> export /, /var, /usr on the server to be upgraded. >> >> But upgrading world dus have a snag already early on: >> >> ---- >> empty changed >> flags expected "schg" found "none" not modified: Operation not supported >> ---- >> >> This is probably where some program wants to set immutable flag on >> /var/tmp/empy... >> >> But looks like NFS does not grok that. >> >> Now I seen plenty of sugestions to do it this way, but never saw anybody >> come back with this complaint.... >> >> So I must be ommiting something ?? > > I looked at the work errors. > ----------- > cd /mnt/; rm -f /mnt/sys; ln -s usr/src/sys sys > cd /mnt/usr/share/man/en.ISO8859-1; ln -sf ../man* . 
> ln: ./man1: Permission denied
> ln: ./man1aout: Permission denied
> ln: ./man2: Permission denied
> ln: ./man3: Permission denied
> ln: ./man4: Permission denied
> ln: ./man5: Permission denied
> ln: ./man6: Permission denied
> ln: ./man7: Permission denied
> ln: ./man8: Permission denied
> ln: ./man9: Permission denied
> ---------
>
> Which comes from the target distrib-dirs in etc
>
> Why would an ln -sf like that fail....
> the filesystems are exported with -maproot=0

Well, it turned out that the easiest fix was to run
chflags -R noschg /
on the client, because certain files are immutable, and once you run into
those it is hard to fix things after the fact.

Next would be to move /lib and /usr/lib out of the way, so that they don't
cause a conflict in the near future, which would cause new programs to
start to fail.

So better make sure that everything is set before you start upgrading over
NFS.

But I did manage to get it "upgraded" from i386 to amd64.

--WjW

From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 19:57:05 2011
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 01657106564A for ; Sat, 20 Aug 2011 19:57:05 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.emeryville.ca.mail.comcast.net (qmta07.emeryville.ca.mail.comcast.net [76.96.30.64]) by mx1.freebsd.org (Postfix) with ESMTP id DA1778FC16 for ; Sat, 20 Aug 2011 19:57:04 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74]) by qmta07.emeryville.ca.mail.comcast.net with comcast id Njss1h0011bwxycA7jx06k; Sat, 20 Aug 2011 19:57:00 +0000
Received: from koitsu.dyndns.org ([67.180.84.87]) by omta18.emeryville.ca.mail.comcast.net with comcast id NjwX1h0051t3BNj8ejwXUX; Sat, 20 Aug 2011 19:56:31 +0000
Received: by icarus.home.lan (Postfix, from userid 1000) id A4ACA102C1A; Sat, 20 Aug 2011 12:57:02 -0700 (PDT)
Date: Sat, 20 Aug 2011 12:57:02 -0700
From:
Jeremy Chadwick To: Dan Langille Message-ID: <20110820195702.GA39109@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 19:57:05 -0000 Dan, sorry for the previous mail. Seems my schedule today has just unexpected changed; I had social events to deal with but as I found out a few minutes ago those events are cancelled, which means I have time today to look at your mail. On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote: > On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote: > > The SMART error log also indicates an LBA failure at the 26000 hour mark > > (which is 16 hours prior to when you did smartctl -a /dev/ad2). Whether > > that LBA is the remapped one or the suspect one is unknown. The LBA was > > 5566440. > > > > The SMART tests you did didn't really amount to anything; no surprise. > > short and long tests usually do not test the surface of the disk. There > > are some drives which do it on a long test, but as I said before, > > everything varies from drive to drive. > > > > Furthermore, on this model of drive, you cannot do a surface scans via > > SMART. Bummer. That's indicated in the "Offline data collection > > capabilities" section at the top, where it reads: > > > > No Selective Self-test supported. > > > > So you'll have to use the dd method. 
This takes longer than if surface > > scanning was supported by the drive, but is acceptable. I'll get to how > > to go about that in a moment. > > FWIW, I've done a dd read of the entire suspect disk already. Just two errors. Actually one error -- keep reading. > From the URL mentioned above: > > [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror > dd: /dev/ad2: Input/output error > 2717+0 records in > 2717+0 records out > 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec) > dd: /dev/ad2: Input/output error > 38170+1 records in > 38170+1 records out > 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec) > [root@bast:~] # > > That seems to indicate two problems. Are those the values I should be using > with dd? The "values" you refer to are byte offsets, not LBAs. Furthermore, you used a block size of 1 megabyte (not sure why people keep doing this). LBA size on your drive is 512 bytes; asking for 1 megabyte in dd is going to make the drive try to read() 1MByte, and an I/O error could happen anywhere within that 1MByte range. (1024*1024) / 512 == 2048 LBAs make up 1MByte. Next, remember that the "noerror" attribute has some quirks associated with it that need to be kept in mind. The man page discusses these. Finally, I believe the last I/O error you see (at byte 40025063424) is normal given what you told dd to do. It was trying to use bs=1m, and your drive has a capacity limit of 40027029504 bytes. I'm left to believe you had a "short read" (less than 1MByte), so this is normal. 40027029504 / (1024*1024) == 38172.75, which is not a round number, hence the error. 
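
The arithmetic above can be reproduced with shell arithmetic. This is only a sketch: it assumes 512-byte LBAs and bs=1m as in the quoted dd run, it takes dd's "2717+0 records in" at face value (so the failing read is the 1 MByte block that starts right after 2848980992 bytes), and the variable names are made up for illustration.

```sh
#!/bin/sh
# Sketch only: numbers are taken from the dd and SMART output quoted in this thread.
SECTOR=512
BS=$((1024 * 1024))

# How many LBAs a single 1 MByte dd block covers.
lbas_per_block=$((BS / SECTOR))

# 2717 full blocks were read before the first error, so the failing
# 1 MByte block starts at this LBA and spans lbas_per_block LBAs.
first_lba=$((2848980992 / SECTOR))
last_lba=$((first_lba + lbas_per_block - 1))

# The LBA from the SMART error log, for comparison with that window.
smart_lba=5566440
if [ "$smart_lba" -ge "$first_lba" ] && [ "$smart_lba" -le "$last_lba" ]; then
    in_window=yes
else
    in_window=no
fi

# Drive capacity is not a multiple of 1 MByte: full blocks plus a short tail.
capacity=40027029504
full_blocks=$((capacity / BS))
tail_bytes=$((capacity % BS))

echo "LBAs per 1 MByte block: $lbas_per_block"
echo "first error is somewhere in LBA $first_lba..$last_lba"
echo "SMART LBA $smart_lba inside that window: $in_window"
echo "capacity: $full_blocks full blocks plus a final short read of $tail_bytes bytes"
```

With bs=1m the first error can only be narrowed down to a window of 2048 LBAs, and the LBA reported in the SMART error log falls inside that window.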
> I did some more precise testing: > > # time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440 > dd: /dev/ad2: Input/output error > 9+0 records in > 9+0 records out > 4608 bytes transferred in 5.368668 secs (858 bytes/sec) > > real 0m5.429s > user 0m0.000s > sys 0m0.010s > > NOTE: that's 9 blocks later than mentioned in smarctl > > The above generated this in /var/log/messages: > > Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA status=51 error=40 LBA=5566449 Your dd command above is saying "use a block size of 512 bytes, and read indefinitely from /dev/ad2, starting with an lseek() on /dev/ad2 of 5566440". You then get an I/O error "somewhere" from where you start to when the device ends. You're assuming that the "number of bytes transferred" indicates where the actual error happened, which in my experience is not always true. What really needs to happen here is use of count=1, and you adjusting iseek manually per each LBA. Or you could use the script I wrote and let the computer do it for you. :-) I understand what you're getting at, re: "that's 9 blocks later". But the OS does some caching of I/O and so on sometimes, or aggregates block reads larger than physical LBA size, so that may be what's going on here. However, if you keep reading, you might find your answer is that you may (still unsure) have other LBAs which are now marked suspect. > > That said: > > > > http://jdc.parodius.com/freebsd/bad_block_scan > > > > If you run this on your ad2 drive, I'm hoping what you'll find are two > > LBAs which can't be read -- one will be the remapped LBA and one will be > > the "suspect" LBA. If you only get one LBA error then that's fine too, > > and will be the "suspect" LBA. > > > Once you have the LBA(s), you can submit writes to them to get the drive > > to re-analyse them (assuming they're "suspect"): > > > > dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=NNNNN > > > > Where XXX is the device and NNNNN is the LBA number. 
> > > > If this works properly, the dd command should sit there for a little bit > > (as the drive does its re-analysis magic) and then should complete. > > ad2 is part of a gmirror with ad0. Does this change things? > > I haven't tried the dd yet. It does not change things, but I don't know what's going to happen if you do write commands to the device directly while the drive is still attached in gmirror. When I encounter a disk that's behaving like this, I immediately remove it from the pool/mirror so I can work on it. I do not trust the OS to do things like not panic/crash/behave weirdly when doing these things. > > You'll want to check SMART stats after that; you should see > > Current_Pending_Sector drop to 0. If Offline_Uncorrectable incremented > > then the LBA could not be re-read/remapped. > > It did increment: > > 197 Current_Pending_Sector 0x0032 100 100 020 Old_age Always - 2 > > [was 1] What this means is that you have *another* LBA the drive found and marked suspect. This could have happened any time; possibly during the above dd you did, possibly during normal read operation (assuming the drive is still handling I/O as part of your mirror). > > If Reallocated_Sector_Ct > > incremented then you now have a total of 2 LBAs which are remapped. > > It did increment: > > $ diff smarctl.1 smarctl.3 | grep Reallocated_Sector_Ct > < 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 1 > > 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 2 > > Full output of smartctl has been appended to http://beta.freebsddiary.org/smart-fixing-bad-sector.php But you didn't issue any writes to the drive (quote: "I haven't tried the dd yet"), so I cannot explain why this attribute would increment. Unless you *did* try the dd? I don't know; there's not enough information here for me to ascertain what may have happened between this paragraph and a couple paragraphs up. 
To me, this looks like a write to the drive was issued, either manually
(with the dd) or by gmirror (if the drive is still in use for I/O), and
happened to hit an LBA which was previously marked suspect -- and induced
a remap.

Alternately -- and this is just as plausible as what I just described --
the drive may have a firmware quirk/bug/behavioural difference from what
I'm used to, where Current_Pending_Sector acts as a counter (i.e. it will
never reset to zero). Maxtor "should" be using Reallocated_Event_Count
for this (since that's what it's for; it indicates failures OR
successes), but as I've said time and time again, the behaviour varies
from drive to drive, model to model, and firmware to firmware.

Alternatively, there's the whole "smartctl -t offline" ordeal which
might update the attribute data, but it's labelled Old_age not Offline,
so I don't think this would be the case (unless there's a bug in the
firmware or mislabeling of the attribute in the firmware for this drive).

The thing about bad LBAs is that they often come in groups/bunches; dust
on the drive, some region loses its magnetic integrity, etc. Your drive
is "old" (27416 hours = 1142 days = 3.1 years) so it's understandable
IMO.

The only way to know for sure would be to do a surface scan on the drive
and see if any more I/O errors show up. If they do, I would recommend
just writing zeros from LBA 0 all the way to the end of the drive, then
afterward see what the SMART attributes look like. "dd if=/dev/zero
of=/dev/ad2 bs=64k" would do the trick (in this case 'bs' doesn't matter
since all you're trying to do is zero the drive; it doesn't matter if
writes get aggregated or not).

> > In
> > the case of remapping, you get to deal with the UFS/FFS thing above.
> > To get the stats to update in this situation you *might* (but probably
> > not) have to run "smartctl -t offline /dev/XXX".
>
> I didn't try that...
> > > You might also be wondering "that dd command writes 512 bytes of zero to > > that LBA; what about the old data that was there, in the case that the > > drive remaps the LBA?" This is a great question, and one I've never > > actually taken the time to answer because at this present time I have > > absolutely *no* bad disks in my possession. I'm under the impression > > that the write does in fact write zeros if the LBA is remapped, but that > > might not be true at all. I've been waiting to test this for quite some > > time and document it/write about it. > > > > I still suggest you replace the drive, although given its age I doubt > > you'll be able to find a suitable replacement. I tend to keep disks > > like this around for testing/experimental purposes and not for actual > > use. > > I have several unused 80GB HDD I can place into this system. I think that's > what I'll wind up doing. But I'd like to follow this process through and get it documented > for future reference. Yes, given the behaviour of the drive I would recommend you simply replace it at this point in time. What concerns me the most is Current_Pending_Sector incrementing, but it's impossible for me to determine if that incrementing means there are other LBAs which are bad, or if the drive is behaving how its firmware is designed. Keep the drive around for further experiments/tinkering if you're interested. Stuff like this is always interesting/fun as long as your data isn't at risk, so doing the replacement first would be best (especially if both drives in your mirror were bought at the same time from the same place and have similar manufacturing plants/dates on them). -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 20:00:19 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B79E2106566B; Sat, 20 Aug 2011 20:00:17 +0000 (UTC) (envelope-from prvs=12137168ef=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C2F038FC0C; Sat, 20 Aug 2011 20:00:16 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 20:59:42 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 20:59:41 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014676027.msg; Sat, 20 Aug 2011 20:59:41 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Andriy Gapon" References: 
<47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@Fre eBSD.org> <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> <4E4FE55A.9000101@ FreeBSD.org> Date: Sat, 20 Aug 2011 21:01:00 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 20:00:20 -0000 ----- Original Message ----- From: "Andriy Gapon" > thanks for doing this! I'll reiterate my suspicion just in case - I think that > you should look for the cases where you stop a jail, but then re-attach and > resurrect the jail before it's completely dead. 
Yeah, that's where I think it's happening too, but I also suspect it's not just a dying jail that's needed; I think it's a dying jail in the final stages of cleanup.

Looking through the code I believe I may have noticed a scenario which could trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
        struct prison *ppr, *tpr;
        int vfslocked;

        if (!(flags & PD_LOCKED))
                mtx_lock(&pr->pr_mtx);
        /* Decrement the user references in a separate loop. */
        if (flags & PD_DEUREF) {
                for (tpr = pr;; tpr = tpr->pr_parent) {
                        if (tpr != pr)
                                mtx_lock(&tpr->pr_mtx);
                        if (--tpr->pr_uref > 0)
                                break;
                        KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
                        mtx_unlock(&tpr->pr_mtx);
                }
                /* Done if there were only user references to remove. */
                if (!(flags & PD_DEREF)) {
                        mtx_unlock(&tpr->pr_mtx);
                        if (flags & PD_LIST_SLOCKED)
                                sx_sunlock(&allprison_lock);
                        else if (flags & PD_LIST_XLOCKED)
                                sx_xunlock(&allprison_lock);
                        return;
                }
                if (tpr != pr) {
                        mtx_unlock(&tpr->pr_mtx);
                        mtx_lock(&pr->pr_mtx);
                }
        }

Take the scenario of a simple one-level prison setup running a single process, where the prison has just been stopped.

In the above code pr_uref of the process's prison is decremented. As this is the last process, pr_uref will hit 0 and the loop continues instead of breaking early.

Now at the end of the loop iteration the mtx is unlocked, so other processes can now manipulate the jail; this is where I think the problem may be.

If we now have another process come in and attach to the jail but then instantly exit, this process may allow another kernel thread to hit this same bit of code, and so two processes for the same prison get into the section which decrements prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1 and 2# = process2

1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice on behalf of prison1)

It seems like the action on the parent prison to decrement the pr_uref is happening too early, while the jail can still be used and without the lock on the child jail's mtx, so causing a race condition.

I think the fix is to move the decrement of the parent prison's pr_uref down so it only takes place if the jail is "really" being removed. Either that or to change the locking semantics so that once the lock is acquired in this prison_deref it's not unlocked until the function completes.

What do people think?

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
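For anyone who wants to replay the flow above without a kernel, here is a minimal user-space sketch in Python. It is not the kernel code: the names Prison, deuref, attach and replay are hypothetical, locking is left out entirely (the "race" is just an explicit ordering of calls), the starting value of prison0's count is made up, and attach is assumed to bump only the jail's own pr_uref, as in the flow described in the message.

```python
"""Toy model of the pr_uref walk in prison_deref(); NOT the kernel code."""


class Prison:
    def __init__(self, name, parent=None, uref=0):
        self.name = name
        self.parent = parent
        self.uref = uref


def deuref(pr):
    """Mimic the PD_DEUREF loop: drop one user reference and keep
    walking up to the parent for as long as the count reaches zero."""
    tpr = pr
    while True:
        tpr.uref -= 1
        if tpr.uref > 0:
            break
        # Stand-in for KASSERT(tpr != &prison0, ("prison0 pr_uref=0")).
        if tpr.parent is None:
            raise AssertionError("prison0 pr_uref=0")
        tpr = tpr.parent


def attach(pr):
    """Assumption: attaching to a jail whose count already hit zero
    only re-increments that jail's own count, not its parent's."""
    pr.uref += 1


def replay():
    # Assumption: prison0 holds one reference for the host and one for prison1.
    prison0 = Prison("prison0", uref=2)
    prison1 = Prison("prison1", parent=prison0, uref=1)

    deuref(prison1)   # process1 exits: prison1 1 -> 0, prison0 2 -> 1
    attach(prison1)   # process2 attaches before cleanup: prison1 0 -> 1
    try:
        deuref(prison1)   # process2 exits: prison1 1 -> 0, prison0 1 -> 0
    except AssertionError as err:
        return str(err)
    return "no assertion"


if __name__ == "__main__":
    print(replay())
```

With these assumptions the second deuref() takes prison0's count to zero and trips the stand-in for the KASSERT, which is the double decrement suspected above. Whether the real attach path behaves this way is exactly what would need confirming in kern_jail.c.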
From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 20:07:47 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DDABF1065674 for ; Sat, 20 Aug 2011 20:07:47 +0000 (UTC) (envelope-from dan@langille.org) Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42]) by mx1.freebsd.org (Postfix) with ESMTP id AC1428FC15 for ; Sat, 20 Aug 2011 20:07:47 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by nyi.unixathome.org (Postfix) with ESMTP id E34C950A09; Sat, 20 Aug 2011 20:07:46 +0000 (UTC) X-Virus-Scanned: amavisd-new at unixathome.org Received: from nyi.unixathome.org ([127.0.0.1]) by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1Xf1KD8QjThm; Sat, 20 Aug 2011 21:07:46 +0100 (BST) Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7]) (Authenticated sender: hidden) by nyi.unixathome.org (Postfix) with ESMTPSA id 781F1509F3 ; Sat, 20 Aug 2011 20:07:46 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Dan Langille In-Reply-To: <20110820195702.GA39109@icarus.home.lan> Date: Sat, 20 Aug 2011 16:07:44 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <09FB1664-127D-4835-88C4-BF5CD3A320C1@langille.org> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <20110820195702.GA39109@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1084) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 
20:07:47 -0000 On Aug 20, 2011, at 3:57 PM, Jeremy Chadwick wrote: >>> I still suggest you replace the drive, although given its age I = doubt >>> you'll be able to find a suitable replacement. I tend to keep disks >>> like this around for testing/experimental purposes and not for = actual >>> use. >>=20 >> I have several unused 80GB HDD I can place into this system. I think = that's >> what I'll wind up doing. But I'd like to follow this process through = and get it documented >> for future reference. >=20 > Yes, given the behaviour of the drive I would recommend you simply > replace it at this point in time. What concerns me the most is > Current_Pending_Sector incrementing, but it's impossible for me to > determine if that incrementing means there are other LBAs which are = bad, > or if the drive is behaving how its firmware is designed. >=20 > Keep the drive around for further experiments/tinkering if you're > interested. Stuff like this is always interesting/fun as long as your > data isn't at risk, so doing the replacement first would be best > (especially if both drives in your mirror were bought at the same time > from the same place and have similar manufacturing plants/dates on > them). I'm happy to send you this drive for your experimentation pleasure. If so, please email me an address offline. You don't have a disk with=20= errors, and it seems you should have one. After I wipe it. I'm sure I have a destroyer CD here somewhere.... 
--=20 Dan Langille - http://langille.org From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 20:19:19 2011 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1D2E106566B for ; Sat, 20 Aug 2011 20:19:19 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.westchester.pa.mail.comcast.net (qmta03.westchester.pa.mail.comcast.net [76.96.62.32]) by mx1.freebsd.org (Postfix) with ESMTP id 8D1578FC13 for ; Sat, 20 Aug 2011 20:19:19 +0000 (UTC) Received: from omta23.westchester.pa.mail.comcast.net ([76.96.62.74]) by qmta03.westchester.pa.mail.comcast.net with comcast id NkCF1h0061c6gX853kKKcB; Sat, 20 Aug 2011 20:19:19 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta23.westchester.pa.mail.comcast.net with comcast id NkKE1h0101t3BNj3jkKGCM; Sat, 20 Aug 2011 20:19:17 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 829EA102C1A; Sat, 20 Aug 2011 13:19:13 -0700 (PDT) Date: Sat, 20 Aug 2011 13:19:13 -0700 From: Jeremy Chadwick To: Dan Langille Message-ID: <20110820201913.GA39827@icarus.home.lan> References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org> <20110819232125.GA4965@icarus.home.lan> <20110820032438.GA21925@icarus.home.lan> <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org> <20110820195702.GA39109@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110820195702.GA39109@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org Subject: Re: bad sector in gmirror HDD X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 20:19:20 -0000 A follow-up given that I just viewed the SMART attribute data at the very bottom of 
this page as of this writing (Sat Aug 20 13:00:09 PDT 2011):

http://beta.freebsddiary.org/smart-fixing-bad-sector.php

And I see this:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0012   059   059   001    Old_age   Always       -       27440
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline      -       1
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0

These attributes USUALLY mean:

1) Reallocated_Sector_Ct == There are 2 remapped LBAs.

2) Reallocated_Event_Count == There is 1 remapping event which has been noticed (either failure or success).

3) Current_Pending_Sector == There are 2 LBAs which are suspect.

Now, given my previous statement about this particular model of drive, Maxtor may have a firmware quirk or other oddities that don't cause Current_Pending_Sector to drop to 0 or Reallocated_Event_Count to match reality. I simply don't know. But keep reading.
And remember, this is what we started with:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0012   059   059   001    Old_age   Always       -       27416
196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0

Anyway, in the SMART error log, I see 3 entries (2 new ones since the last time I saw the web page):

* Error 3 occurred at disk power-on lifetime: 27422 hours (1142 days + 14 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440

* Error 2 occurred at disk power-on lifetime: 27421 hours (1142 days + 13 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440

* Error 1 occurred at disk power-on lifetime: 27400 hours (1141 days + 16 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440

These are all for the same LBA -- 5566440. "Error 1" was something we already saw on the page the first time. So where did the other two come from? Earlier on the web page I saw these commands being executed:

sh ./bad_block_scan /dev/ad2 5566400 5566500   <-- will hit bad LBA
sh ./bad_block_scan /dev/ad2 5566000 5566500   <-- will hit bad LBA
sh ./bad_block_scan /dev/ad2 5560000 5566000   <-- will not hit bad LBA
sh ./bad_block_scan /dev/ad2 5560000 5566000   <-- will not hit bad LBA

So there's the explanation for the two newly-added entries in the SMART error log.

I'm very surprised if bad_block_scan did not echo that it had encountered read errors on LBA 5566440. It should have, unless I left the script in some weird state.
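As a cross-check on the error-log entries above, the LBA can be rebuilt from the seven register bytes by hand. The Python sketch below assumes the bytes are listed in the order ER ST SC SN CL CH DH (28-bit LBA addressing), which is consistent with the "24 sectors" count (0x18) and the LBA printed next to them; the function name is made up for illustration.

```python
def lba28_from_registers(reg_bytes):
    """Rebuild a 28-bit LBA from an 'ER ST SC SN CL CH DH' hex string.

    Assumed register order: error, status, sector count, then LBA bits
    0-7 (SN), 8-15 (CL), 16-23 (CH), and bits 24-27 in the low nibble
    of the device/head register (DH).
    """
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in reg_bytes.split())
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    lba = lba28_from_registers("40 59 18 e8 ef 54 e0")
    print(hex(lba), lba)
```

For the bytes in the log this works out to 0x54 * 65536 + 0xef * 256 + 0xe8 = 5505024 + 61184 + 232 = 5566440, matching the value smartctl reports.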
The commands to use to verify would be:

dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566439
dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566440
dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566441

(I tend to check "around" that LBA area as well, just to make sure, that's why there's 3 commands with -1 and +1 LBAs). One of these should return an I/O error, unless the LBA has been remapped already, in which case it shouldn't.

Finally, there's this very interesting piece of information in the SMART self-test log (not selective scan log, but the self-test log; meaning this was the result of "smartctl -t long /dev/ad2" at some point):

Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%        27416         786767

So it seems this is one of those drives which does do a surface scan on a long test. But that's interesting -- LBA 786767. If that's true, then issuing the same dd commands as above (but with "skip" changed appropriately) should return an I/O error as well. Naturally check the SMART error log for verification.

So, it's possible that there are actually two bad LBAs on this drive -- LBA 5566440 and LBA 786767. I simply don't know about the latter, but the former is confirmed in the SMART error log.

If either of these LBAs are the ones which Current_Pending_Sector is referring to, then writes to them should be sufficient to induce re-analysis. E.g.:

dd if=/dev/zero of=/dev/ad2 bs=512 count=1 seek=5566440
dd if=/dev/zero of=/dev/ad2 bs=512 count=1 seek=786767

The offsets for seek (not skip!!!) should probably be based on what the dd reads done earlier would show.

Unless of course what we're seeing is just a batch of LBAs in a small region that are getting worse the more they're read from (possible). No idea if LBA 5566440 and LBA 786767 are anywhere near one another on the physical media. I don't have a way to determine that (way too complex).

That's about all the light I can shed on this for now.
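To avoid typos when retyping those offsets (mixing up skip and seek on a write is the dangerous one), the command lines can be generated rather than typed. This is only a sketch in Python with made-up helper names; it builds the strings and does not run dd or touch any device.

```python
def dd_read_checks(device, lba, spread=1, sector_size=512):
    """Read-only checks for the LBA and its neighbours (uses skip=,
    which offsets the *input* side of dd)."""
    return [
        f"dd if={device} of=/dev/null bs={sector_size} count=1 skip={n}"
        for n in range(lba - spread, lba + spread + 1)
    ]


def dd_zero_write(device, lba, sector_size=512):
    """Single-sector zero write (uses seek=, which offsets the *output*
    side of dd). Destructive if run: review before pasting."""
    return f"dd if=/dev/zero of={device} bs={sector_size} count=1 seek={lba}"


if __name__ == "__main__":
    for cmd in dd_read_checks("/dev/ad2", 786767):
        print(cmd)
    print(dd_zero_write("/dev/ad2", 786767))
```

Printing the commands first and only pasting the write once the matching read has actually returned an I/O error keeps the destructive step deliberate.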
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 20:23:41 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC564106567A; Sat, 20 Aug 2011 20:23:41 +0000 (UTC) (envelope-from prvs=12137168ef=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 076718FC2A; Sat, 20 Aug 2011 20:23:39 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:23:06 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:23:06 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014676211.msg; Sat, 20 Aug 2011 21:23:05 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Andriy Gapon" References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@ FreeBSD.org> Date: Sat, 20 Aug 2011 21:24:30 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: 
freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 20:23:42 -0000 ----- Original Message ----- From: "Steven Hartland" > Looking through the code I believe I may have noticed a scenario which could > trigger the problem. > > Given the following code:- > > static void > prison_deref(struct prison *pr, int flags) > { > struct prison *ppr, *tpr; > int vfslocked; > > if (!(flags & PD_LOCKED)) > mtx_lock(&pr->pr_mtx); > /* Decrement the user references in a separate loop. */ > if (flags & PD_DEUREF) { > for (tpr = pr;; tpr = tpr->pr_parent) { > if (tpr != pr) > mtx_lock(&tpr->pr_mtx); > if (--tpr->pr_uref > 0) > break; > KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); > mtx_unlock(&tpr->pr_mtx); > } > /* Done if there were only user references to remove. */ > if (!(flags & PD_DEREF)) { > mtx_unlock(&tpr->pr_mtx); > if (flags & PD_LIST_SLOCKED) > sx_sunlock(&allprison_lock); > else if (flags & PD_LIST_XLOCKED) > sx_xunlock(&allprison_lock); > return; > } > if (tpr != pr) { > mtx_unlock(&tpr->pr_mtx); > mtx_lock(&pr->pr_mtx); > } > } > > If you take a scenario of a simple one level prison setup running a single process > where a prison has just been stopped. > > In the above code pr_uref of the processes prison is decremented. As this is the > last process then pr_uref will hit 0 and the loop continues instead of breaking > early. > > Now at the end of the loop iteration the mtx is unlocked so other process can > now manipulate the jail, this is where I think the problem may be. > > If we now have another process come in and attach to the jail but then instantly > exit, this process may allow another kernel thread to hit this same bit of code > and so two process for the same prison get into the section which decrements > prison0's pr_uref, instead of only one. 
>
> In essence I think we can get the following flow where 1# = process1
> and 2# = process2
> 1#1. prison1.pr_uref = 1 (single process jail)
> 1#2. prison_deref( prison1,...
> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
> 1#3. prison0.pr_uref--
> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
> 2#2. process1.exit
> 2#3. prison_deref( prison1,...
> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1)
>
> It seems like the action on the parent prison to decrement the pr_uref is
> happening too early, while the jail can still be used and without the lock on
> the child jails mtx, so causing a race condition.
>
> I think the fix is to the move the decrement of parent prison pr_uref's down
> so it only takes place if the jail is "really" being removed. Either that or
> to change the locking semantics so that once the lock is aquired in this
> prison_deref its not unlocked until the function completes.
>
> What do people think?

After reviewing the changes to prison_deref in the commit which added hierarchical jails, the removal of the lock by the initial loop on the passed-in prison may be unintentional.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h

If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
                        if (--tpr->pr_uref > 0)
                                break;
                        KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-                       mtx_unlock(&tpr->pr_mtx);
+                       if (tpr != pr)
+                               mtx_unlock(&tpr->pr_mtx);
                }
                /* Done if there were only user references to remove. */
                if (!(flags & PD_DEREF)) {

Regards
Steve

From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 20:34:56 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 42F871065672; Sat, 20 Aug 2011 20:34:56 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 6FC888FC0C; Sat, 20 Aug 2011 20:34:55 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA14332; Sat, 20 Aug 2011 23:34:52 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QusFo-000O9N-7B; Sat, 20 Aug 2011 23:34:52 +0300 Message-ID: <4E501A6A.3030801@FreeBSD.org> Date: Sat, 20 Aug 2011 23:34:50 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Steven Hartland References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@ FreeBSD.org> In-Reply-To: X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 
2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 20:34:56 -0000 on 20/08/2011 23:24 Steven Hartland said the following: > ----- Original Message ----- From: "Steven Hartland" >> Looking through the code I believe I may have noticed a scenario which could >> trigger the problem. >> >> Given the following code:- >> >> static void >> prison_deref(struct prison *pr, int flags) >> { >> struct prison *ppr, *tpr; >> int vfslocked; >> >> if (!(flags & PD_LOCKED)) >> mtx_lock(&pr->pr_mtx); >> /* Decrement the user references in a separate loop. */ >> if (flags & PD_DEUREF) { >> for (tpr = pr;; tpr = tpr->pr_parent) { >> if (tpr != pr) >> mtx_lock(&tpr->pr_mtx); >> if (--tpr->pr_uref > 0) >> break; >> KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); >> mtx_unlock(&tpr->pr_mtx); >> } >> /* Done if there were only user references to remove. */ >> if (!(flags & PD_DEREF)) { >> mtx_unlock(&tpr->pr_mtx); >> if (flags & PD_LIST_SLOCKED) >> sx_sunlock(&allprison_lock); >> else if (flags & PD_LIST_XLOCKED) >> sx_xunlock(&allprison_lock); >> return; >> } >> if (tpr != pr) { >> mtx_unlock(&tpr->pr_mtx); >> mtx_lock(&pr->pr_mtx); >> } >> } >> >> If you take a scenario of a simple one level prison setup running a single >> process >> where a prison has just been stopped. >> >> In the above code pr_uref of the processes prison is decremented. As this is the >> last process then pr_uref will hit 0 and the loop continues instead of breaking >> early. >> >> Now at the end of the loop iteration the mtx is unlocked so other process can >> now manipulate the jail, this is where I think the problem may be. 
>> >> If we now have another process come in and attach to the jail but then instantly >> exit, this process may allow another kernel thread to hit this same bit of code >> and so two process for the same prison get into the section which decrements >> prison0's pr_uref, instead of only one. >> >> In essence I think we can get the following flow where 1# = process1 >> and 2# = process2 >> 1#1. prison1.pr_uref = 1 (single process jail) >> 1#2. prison_deref( prison1,... >> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0) >> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref >> 1#3. prison0.pr_uref-- >> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1) >> 2#2. process1.exit >> 2#3. prison_deref( prison1,... >> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0) >> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref >> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1) >> >> It seems like the action on the parent prison to decrement the pr_uref is >> happening too early, while the jail can still be used and without the lock on >> the child jails mtx, so causing a race condition. >> >> I think the fix is to the move the decrement of parent prison pr_uref's down >> so it only takes place if the jail is "really" being removed. Either that or >> to change the locking semantics so that once the lock is aquired in this >> prison_deref its not unlocked until the function completes. >> >> What do people think? > > After reviewing the changes to prison_deref in commit which added hierarchical > jails, the removal of the lock by the inital loop on the passed in prison may > be unintentional. 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h > > > If so the following may be all that's needed to fix this issue:- > > diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c > --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100 > +++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100 > @@ -2455,7 +2455,8 @@ > if (--tpr->pr_uref > 0) > break; > KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); > - mtx_unlock(&tpr->pr_mtx); > + if (tpr != pr) > + mtx_unlock(&tpr->pr_mtx); > } > /* Done if there were only user references to remove. */ > if (!(flags & PD_DEREF)) { Not sure if this would fly as is - please double check the later block where pr->pr_mtx is re-locked. -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 20:49:58 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B68F106566C; Sat, 20 Aug 2011 20:49:58 +0000 (UTC) (envelope-from prvs=12137168ef=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 9306A8FC15; Sat, 20 Aug 2011 20:49:57 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:49:23 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:49:23 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014676446.msg; Sat, 20 Aug 2011 21:49:23 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <7585E1DAE11E47488CD5A7F038957F4D@multiplay.co.uk> 
From: "Steven Hartland" To: "Andriy Gapon" References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org> <4E501A6A.3030801@FreeBSD.org> Date: Sat, 20 Aug 2011 21:50:51 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 20:49:58 -0000

----- Original Message ----- From: "Andriy Gapon"
>> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
>> --- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
>> +++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
>> @@ -2455,7 +2455,8 @@
>>                         if (--tpr->pr_uref > 0)
>>                                 break;
>>                         KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>> -                       mtx_unlock(&tpr->pr_mtx);
>> +                       if (tpr != pr)
>> +                               mtx_unlock(&tpr->pr_mtx);
>>                 }
>>                 /* Done if there were only user references to remove. */
>>                 if (!(flags & PD_DEREF)) {
>
> Not sure if this would fly as is - please double check the later block where
> pr->pr_mtx is re-locked.

Will do. I'm now 99.9% sure this is the problem, and even better, I now have a reproducible scenario :)

Something else you may be more interested in, Andriy:-

I added the debugging options DDB & INVARIANTS to see if I can get more useful info, and the panic results in a looping panic constantly scrolling up the console. Not sure if this is a side effect of the patches we've been trying.

Going to see if I can confirm that; lmk if there's something you want me to try?
    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the
person or entity to whom it is addressed. In the event of misdirection, the
recipient is prohibited from using, copying, printing or otherwise
disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please
telephone +44 845 868 1337 or return the E.mail to
postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 21:38:56 2011
Return-Path:
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F0C5A106564A;
	Sat, 20 Aug 2011 21:38:56 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A5868FC0A;
	Sat, 20 Aug 2011 21:38:55 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 22:38:21 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 22:38:21 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level:
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014676893.msg;
	Sat, 20 Aug 2011 22:38:20 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <75D250E28B9A424EAF387E07CA223213@multiplay.co.uk>
From: "Steven Hartland"
To: "Steven Hartland" , "Andriy Gapon"
References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org><4E501A6A.3030801@FreeBSD.org> <7585E1DAE11E47488CD5A7F038957F4D@multiplay.co.uk>
Date: Sat, 20 Aug 2011 22:38:49 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=response
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Aug 2011 21:38:57 -0000

----- Original Message -----
From: "Steven Hartland"

> Something else you may be more interested in Andriy:-
> I added in debugging options DDB & INVARIANTS to see if I can get more
> useful info and the panic results in a looping panic constantly scrolling up
> the console. Not sure if this is a side effect of the patches we've been
> trying.
>
> Going to see if I can confirm that, lmk if there's something you want me
> to try?

It seems the stop_scheduler_on_panic.8.x.patch is the cause of this. Removing
it allows me to drop to ddb when the panic triggered by the KASSERT happens.

    Regards
    Steve