From owner-freebsd-stable@FreeBSD.ORG  Sun Aug 14 14:53:54 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 458D7106566C;
	Sun, 14 Aug 2011 14:53:54 +0000 (UTC)
	(envelope-from prvs=120731b379=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 962ED8FC13;
	Sun, 14 Aug 2011 14:53:53 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sun, 14 Aug 2011 15:42:54 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sun, 14 Aug 2011 15:42:54 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014595600.msg;
	Sun, 14 Aug 2011 15:42:53 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=120731b379=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
Date: Sun, 14 Aug 2011 15:43:26 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Aug 2011 14:53:54 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
> 
> Maybe test it on a couple of machines first just in case I overlooked something
> essential, although I have a report from another user that the patch didn't break
> anything for him (it was tested for an unrelated issue).

We've got this running on ~40 machines and have just had the first panic
since the update. Unfortunately it doesn't seem to have changed anything :(

We have 352 thread entries starting with:-
#0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.
23 with:-
cpustop_handler () at atomic.h:285
and 16 with:-
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
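
For anyone wanting to reproduce this kind of tally from a saved "thread apply all bt" log, a minimal sketch in Python follows. The sample text, the regular expression and the helper name tally_frame0 are illustrative assumptions, not part of kgdb; the sample only reuses frame lines quoted in this message.

```python
import re
from collections import Counter

# Tiny hand-made sample that reuses frame lines quoted in this message;
# the real "thread apply all bt" output is far longer.
SAMPLE = '''\
#0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.)
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
#0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.)
'''

# Frame #0 lines look like "#0  func (" or "#0  0xADDR in func (".
FRAME0 = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_]\w*) \(", re.MULTILINE)


def tally_frame0(bt_text):
    """Count how many threads have each function as their innermost frame."""
    return Counter(FRAME0.findall(bt_text))


counts = tally_frame0(SAMPLE)
for func, n in counts.most_common():
    print(n, func)
```

Run against the full backtrace file, the same tally should show at a glance which threads are not simply parked in sched_switch.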

The main message being:-
panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

Fatal double fault
rip = 0xffffffff8053b691
rsp = 0xffffff8d8f356fb0
rbp = 0xffffff8d8f357210
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff803bb75e at kdb_backtrace+0x5e
#1 0xffffffff8038956e at panic+0x2ae
#2 0xffffffff805802b6 at dblfault_handler+0x96
#3 0xffffffff8056900d at Xdblfault+0xad
stack: 0xffffff8d8f357000, 4
rsp = 0xffffff800009ae10
Uptime: 2d21h6m18s
Physical memory: 49132 MB
Dumping 17080 MB: 17065...
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
#0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.)
    at /usr/src/sys/kern/sched_ule.c:1858
1858            cpuid = PCPU_GET(cpuid);
(kgdb) #0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.)
    at /usr/src/sys/kern/sched_ule.c:1858
#1  0xffffffff80391a99 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:451
#2  0xffffffff803c5112 in sleepq_timedwait (wchan=0xffffffff8083e080, pri=68)
    at /usr/src/sys/kern/subr_sleepqueue.c:644
#3  0xffffffff80391efb in _sleep (ident=0xffffffff8083e080, lock=0x0,
    priority=Variable "priority" is not available.) at /usr/src/sys/kern/kern_synch.c:230
#4  0xffffffff8053ebc9 in scheduler (dummy=Variable "dummy" is not available.)
    at /usr/src/sys/vm/vm_glue.c:807
#5  0xffffffff80341767 in mi_startup () at /usr/src/sys/kern/init_main.c:254
#6  0xffffffff8016efdc in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7  0xffffffff80863dc8 in sleepq_chains ()
#8  0xffffffff80848ae0 in cpu_top ()
#9  0x0000000000000000 in ?? ()
#10 0xffffffff8083e4e0 in proc0 ()
#11 0xffffffff80bb3b90 in ?? ()
#12 0xffffffff80bb3b38 in ?? ()
#13 0xffffff0012d838c0 in ?? ()
#14 0xffffffff803aeb19 in sched_switch (td=0x0, newtd=0x0, flags=Variable "flags" is not available.)
    at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)

There are some indications that stopping jails could be the
cause of the panics, so on one test box I've enabled INVARIANTS
to see whether anything shows up from that.

    Regards
    Steve


================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.


From owner-freebsd-stable@FreeBSD.ORG  Sun Aug 14 23:45:02 2011
Message-ID: <A14EDFBC1A374752957176383D0BE6A0@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Attilio Rao" <attilio@freebsd.org>,
	"Jeremy Chadwick" <freebsd@jdc.parodius.com>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><CAJ-FndAq2ASHzg_+9S__x=vTAgzHowMrv1DFSbXwroX27PF36A@mail.gmail.com><44DD20E1CFA949E8A1B15B3847769DCB@multiplay.co.uk><20110811092858.GA94514@icarus.home.lan>
	<CAJ-FndBfiHMemNfmXtWkzzZTkZ-Cw9oYd8D+CQtjSAOMf=0a8w@mail.gmail.com>
Date: Mon, 15 Aug 2011 00:44:34 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


----- Original Message ----- 
From: "Attilio Rao" <attilio@freebsd.org>
> Anyway, we really would need much more information in order to take a
> proactive action.

> Would it be possible to get access to one of the panicking machines? Is it
> always the same panic, or does it vary (e.g. once a page fault, once a
> fatal double fault, once a fatal trap, etc.)?

They are always double faults, 99% of the time with no additional info.
We've seen one mention of java on one of the machines, but the vmcore
didn't seem to show anything related to that after the dump.

My colleague informs me that when he did the upgrade to add in the scheduler
stop patch, pretty much every machine panicked when shutting the
java servers down, which is essentially a jail stop.

I've also had two panics when rebooting my test machine to change
kernel settings, although this could be a side effect of the scheduler
patch?

This single test machine is now running with the following non-standard
settings:-
options     INVARIANTS
options     INVARIANT_SUPPORT
options     DDB
options     KSTACK_PAGES=12

I've got several vmcores from a number of different machines but none
seem to be of any use, as they don't seem to list any thread that caused
the panic, i.e. no mention of dump or fault.

Is there something else in particular I should be looking for?

Circumstantial evidence seems to indicate uptime may be a factor;
machines with under 2 days of uptime seem much less likely to panic.

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 08:31:44 2011
Message-ID: <4E48D967.9060804@FreeBSD.org>
Date: Mon, 15 Aug 2011 11:31:35 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110706 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
In-Reply-To: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

on 14/08/2011 17:43 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>>
>> Maybe test it on a couple of machines first just in case I overlooked something
>> essential, although I have a report from another user that the patch didn't break
>> anything for him (it was tested for an unrelated issue).
> 
> We've got this running on ~40 machines and have just had the first panic
> since the update. Unfortunately it doesn't seem to have changed anything :(
> 
> We have 352 thread entries starting with:-
> #0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
> flags=Variable "flags" is not available.
> 23 with:-
> cpustop_handler () at atomic.h:285
> and 16 with:-
> #0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562

I would like to get a full output of thread apply all bt.

> The main message being:-
> panic: double fault
> 
> Unread portion of the kernel message buffer:
> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

So this line, does it indicate a shutdown of a jail or of the whole system?

> Fatal double fault
> rip = 0xffffffff8053b691

Can you please provide output of 'list *0xffffffff8053b691' in kgdb?

> rsp = 0xffffff8d8f356fb0
> rbp = 0xffffff8d8f357210
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
> #1 0xffffffff8038956e at panic+0x2ae
> #2 0xffffffff805802b6 at dblfault_handler+0x96
> #3 0xffffffff8056900d at Xdblfault+0xad

I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
smarter than the trivial stack(9) printer.

> stack: 0xffffff8d8f357000, 4

One thing I can say is that this looks like a double fault caused by stack
exhaustion (the most typical cause): the rsp value is below td_kstack.

Can you please also provide the following information:
p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is a value of KSTACK_PAGES option (amd64 default is 4) and
PAGE_SIZE is 4096.
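
To spell out the arithmetic behind the stack-exhaustion reading, here is a small sketch in Python. It assumes that the "stack: 0xffffff8d8f357000, 4" line in the panic output gives td_kstack and the number of kernel stack pages; the variable names are only illustrative.

```python
PAGE_SIZE = 4096
KSTACK_PAGES = 4  # amd64 default; assumed to match the "4" on the "stack:" line

kstack_base = 0xffffff8d8f357000  # "stack:" value from the panic message
fault_rsp = 0xffffff8d8f356fb0    # rsp reported with the double fault

# The kernel stack grows down from kstack_top towards kstack_base.
kstack_top = kstack_base + KSTACK_PAGES * PAGE_SIZE

# A positive value means rsp has run off the low end of the stack.
overrun = kstack_base - fault_rsp

print("stack range: %#x - %#x" % (kstack_base, kstack_top))
print("rsp is %d bytes below the stack base" % overrun)
```

The pcb requested above is read from just under kstack_top, which is why the gdb expression adds KSTACK_PAGES * PAGE_SIZE to the stack base before subtracting one struct pcb.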

> rsp = 0xffffff800009ae10

[snip]

> There are some indications that stopping jails could be the
> cause of the panics, so on one test box I've enabled INVARIANTS
> to see whether anything shows up from that.

OK.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 10:45:16 2011
Message-ID: <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
Date: Mon, 15 Aug 2011 11:34:02 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
>> We have 352 thread entries starting with:-
>> #0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
>> flags=Variable "flags" is not available.
>> 23 with:-
>> cpustop_handler () at atomic.h:285
>> and 16 with:-
>> #0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
>
> I would like to get a full output of thread apply all bt.

http://blog.multplay.co.uk/dropzone/freebsd/panic-2011-08-14-1524.txt

>> The main message being:-
>> panic: double fault
>>
>> Unread portion of the kernel message buffer:
>> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15
>
> So this line, does it indicate a shutdown of a jail or of the whole system?

This specific panic was caused by me running "reboot" after all jails (~40)
were shut down, which is slightly different from what my colleague was seeing
last Friday, where the machines were panicking when the jails themselves
were stopped.

I may have a crash from one of these if needed.

>> Fatal double fault
>> rip = 0xffffffff8053b691
>
> Can you please provide output of 'list *0xffffffff8053b691' in kgdb?

(kgdb) list *0xffffffff8053b691
0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234             /*
235              * Find the backing store object and offset into it to begin the
236              * search.
237              */
238             fs.map = map;
239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
241             if (result != KERN_SUCCESS) {
242                     if (result != KERN_PROTECTION_FAILURE ||
243                         (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {

>
>> rsp = 0xffffff8d8f356fb0
>> rbp = 0xffffff8d8f357210
>> cpuid = 2; apic id = 02
>> panic: double fault
>> cpuid = 2
>> KDB: stack backtrace:
>> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
>> #1 0xffffffff8038956e at panic+0x2ae
>> #2 0xffffffff805802b6 at dblfault_handler+0x96
>> #3 0xffffffff8056900d at Xdblfault+0xad
>
> I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
> here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
> smarter than the trivial stack(9) printer.

I've added this into the kernel on my test machine and will try
to get it to panic over the next few days. It seems to need a few days of
uptime before the panics start happening. I've also increased
KSTACK_PAGES to 12; if you believe this may be stack exhaustion, do
you want me to remove this increase?


>> stack: 0xffffff8d8f357000, 4
>
> One thing I can say is that this looks like a double fault caused by stack
> exhaustion (the most typical cause): the rsp value is below td_kstack.
>
> Can you please also provide the following information:
> p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
> where KSTACK_PAGES is a value of KSTACK_PAGES option (amd64 default is 4) and
> PAGE_SIZE is 4096.

(kgdb) p *((struct pcb *)((char *)0xffffff8d8f357000 + 4 * 4096) - 1)
$1 = {pcb_r15 = -2138686968, pcb_r14 = -1070655224792, pcb_r13 = 0,
  pcb_r12 = -1070655225856, pcb_rbp = -491518580864, pcb_rsp = -491518580952,
  pcb_rbx = -1099195460512, pcb_rip = -2143622375, pcb_fsbase = 34365428376,
  pcb_gsbase = 0, pcb_kgsbase = 0, pcb_cr0 = 0, pcb_cr2 = 0, pcb_cr3 = 12406784,
  pcb_cr4 = 0, pcb_dr0 = 0, pcb_dr1 = 0, pcb_dr2 = 0, pcb_dr3 = 0, pcb_dr6 = 0,
  pcb_dr7 = 0, pcb_flags = 0, pcb_initial_fpucw = 895, pcb_onfault = 0x0,
  pcb_gs32sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0,
    sd_hilimit = 0, sd_xx = 0, sd_long = 0, sd_def32 = 0, sd_gran = 0,
    sd_hibase = 0}, pcb_tssp = 0x0, pcb_save = 0xffffff8d8f35ae00,
  pcb_full_iret = 0 '\0', pcb_gdt = {rd_limit = 0, rd_base = 0},
  pcb_idt = {rd_limit = 0, rd_base = 0}, pcb_ldt = {rd_limit = 0, rd_base = 0},
  pcb_tr = 0, pcb_user_save = {sv_env = {en_cw = 895, en_sw = 0, en_tw = 0 '\0',
      en_zero = 0 '\0', en_opcode = 0, en_rip = 0, en_rdp = 0, en_mxcsr = 8096,
      en_mxcsr_mask = 65535}, sv_fp = {
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"},
      {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
        fp_pad = "\000\000\000\000\000"}},
    sv_xmm = {{xmm_bytes = "\000\000\000\b\030\212rA\000\000\000\000\000\000\000"},
      {xmm_bytes = '\0' <repeats 15 times>} <repeats 15 times>},
    sv_pad = '\0' <repeats 95 times>}}
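
gdb prints the pcb register fields as signed decimals here, which makes them awkward to compare with the hex addresses in the panic message. A rough sketch of the conversion in Python follows; the helper name to_hex64 is made up for illustration and the three values are copied from the dump above.

```python
MASK64 = (1 << 64) - 1


def to_hex64(value):
    """Reinterpret a signed 64-bit integer as an unsigned hex address."""
    return "0x%016x" % (value & MASK64)


# Values copied from the pcb dump above.
pcb = {
    "pcb_rsp": -491518580952,
    "pcb_rbp": -491518580864,
    "pcb_rip": -2143622375,
}

decoded = {name: to_hex64(value) for name, value in pcb.items()}
for name, addr in decoded.items():
    print(name, addr)
```

pcb_rip comes out as 0xffffffff803aeb19, the same address kgdb shows for sched_switch in frame #14 of the backtrace from the first message.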

Thanks for your help on this, as it's way over my head ;-)

    Regards
    Steve 




From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 12:00:06 2011
Message-ID: <4E490A3F.1000205@FreeBSD.org>
Date: Mon, 15 Aug 2011 14:59:59 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
In-Reply-To: <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

on 15/08/2011 13:34 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>> I think (not 100% sure) that with DDB in the kernel we could get a better backtrace
>> here, possibly with pre-dblfault stack frames, because the DDB backend is a bit
>> smarter than the trivial stack(9) printer.
> 
> I've added this into the kernel on my test machine and will try
> to get it to panic over the next few days. It seems to need a few days of
> uptime before the panics start happening. I've also increased
> KSTACK_PAGES to 12; if you believe this may be stack exhaustion, do
> you want me to remove this increase?

Yes, I think it would make sense to change KSTACK_PAGES to the default value.
But, OTOH, if you can afford to have DDB in a few more machines, then it would be
interesting to compare behavior with different stack sizes.

BTW, if you don't want your machines to sit at the ddb prompt after a panic, then
you'd also need either the KDB_UNATTENDED option or to set debug.debugger_on_panic=0.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 12:14:43 2011
Message-ID: <4E490DAF.1080009@FreeBSD.org>
Date: Mon, 15 Aug 2011 15:14:39 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
In-Reply-To: <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 12:14:43 -0000

on 15/08/2011 13:34 Steven Hartland said the following:
> (kgdb) list *0xffffffff8053b691
> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
> 234             /*
> 235              * Find the backing store object and offset into it to begin the
> 236              * search.
> 237              */
> 238             fs.map = map;
> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
> 241             if (result != KERN_SUCCESS) {
> 242                     if (result != KERN_PROTECTION_FAILURE ||
> 243                         (fault_flags & VM_FAULT_WIRE_MASK) !=
> VM_FAULT_USER_WIRE) {
> 

Interesting... thanks!
Can you please also provide the (lengthy) output of
x/512a 0xffffff8d8f356fb0 ?

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 12:52:21 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DCE181065672;
	Mon, 15 Aug 2011 12:52:21 +0000 (UTC)
	(envelope-from prvs=1208040d95=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 378718FC18;
	Mon, 15 Aug 2011 12:52:20 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 13:51:07 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 13:51:07 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014605358.msg;
	Mon, 15 Aug 2011 13:51:06 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
Date: Mon, 15 Aug 2011 13:51:44 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 12:52:21 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>


> on 15/08/2011 13:34 Steven Hartland said the following:
>> (kgdb) list *0xffffffff8053b691
>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>> 234             /*
>> 235              * Find the backing store object and offset into it to begin the
>> 236              * search.
>> 237              */
>> 238             fs.map = map;
>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>> 241             if (result != KERN_SUCCESS) {
>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>> 243                         (fault_flags & VM_FAULT_WIRE_MASK) !=
>> VM_FAULT_USER_WIRE) {
>> 
> 
> Interesting... thanks!
> Can you please also provide the (lengthy) output of
> x/512a 0xffffff8d8f356fb0 ?

Sorry, I'm not sure I follow you there?

Do you mean any of the following:-
(kgdb) x/512a
0xffffff8d8f35b000:     Cannot access memory at address 0xffffff8d8f35b000

(kgdb) list *0xffffff8d8f356fb0
No source file for address 0xffffff8d8f356fb0.

or:
(kgdb) x/512a 0xffffff8d8f356fb0
0xffffff8d8f356fb0:     Cannot access memory at address 0xffffff8d8f356fb0

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.


From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 13:20:05 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EDF4B1065673
	for <freebsd-stable@FreeBSD.org>; Mon, 15 Aug 2011 13:20:05 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 3C8298FC17
	for <freebsd-stable@FreeBSD.org>; Mon, 15 Aug 2011 13:20:04 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA13741;
	Mon, 15 Aug 2011 16:20:02 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E491D01.1090902@FreeBSD.org>
Date: Mon, 15 Aug 2011 16:20:01 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
In-Reply-To: <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 13:20:06 -0000

on 15/08/2011 15:51 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
> 
> 
>> on 15/08/2011 13:34 Steven Hartland said the following:
>>> (kgdb) list *0xffffffff8053b691
>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>>> 234             /*
>>> 235              * Find the backing store object and offset into it to begin the
>>> 236              * search.
>>> 237              */
>>> 238             fs.map = map;
>>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>>> 241             if (result != KERN_SUCCESS) {
>>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>>> 243                         (fault_flags & VM_FAULT_WIRE_MASK) !=
>>> VM_FAULT_USER_WIRE) {
>>>
>>
>> Interesting... thanks!
>> Can you please also provide the (lengthy) output of
>> x/512a 0xffffff8d8f356fb0 ?
> 
> Sorry, I'm not sure I follow you there?

It seems that you got me correctly :)

> Do you mean any of the following:-
> (kgdb) x/512a
> 0xffffff8d8f35b000:     Cannot access memory at address 0xffffff8d8f35b000
>
> (kgdb) list *0xffffff8d8f356fb0
> No source file for address 0xffffff8d8f356fb0.
> 
> or:
> (kgdb) x/512a 0xffffff8d8f356fb0
> 0xffffff8d8f356fb0:     Cannot access memory at address 0xffffff8d8f356fb0

Can you please try this (the last command) with 0xffffff8d8f357210 instead?

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 14:57:26 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C0CFD1065675;
	Mon, 15 Aug 2011 14:57:26 +0000 (UTC)
	(envelope-from prvs=1208040d95=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id C99618FC1E;
	Mon, 15 Aug 2011 14:57:25 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 15:55:51 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 15:55:51 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014606518.msg;
	Mon, 15 Aug 2011 15:55:50 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
Date: Mon, 15 Aug 2011 15:56:27 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 14:57:26 -0000


----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: <freebsd-stable@FreeBSD.org>
Sent: Monday, August 15, 2011 2:20 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


> on 15/08/2011 15:51 Steven Hartland said the following:
>> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>> 
>> 
>>> on 15/08/2011 13:34 Steven Hartland said the following:
>>>> (kgdb) list *0xffffffff8053b691
>>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>>>> 234             /*
>>>> 235              * Find the backing store object and offset into it to begin the
>>>> 236              * search.
>>>> 237              */
>>>> 238             fs.map = map;
>>>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>>>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>>>> 241             if (result != KERN_SUCCESS) {
>>>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>>>> 243                         (fault_flags & VM_FAULT_WIRE_MASK) !=
>>>> VM_FAULT_USER_WIRE) {
>>>>
>>>
>>> Interesting... thanks!
>>> Can you please also provide the (lengthy) output of
>>> x/512a 0xffffff8d8f356fb0 ?
>> 
>> Sorry, I'm not sure I follow you there?
> 
> It seems that you got me correctly :)
> 
>> Do you mean any of the following:-
>> (kgdb) x/512a
>> 0xffffff8d8f35b000:     Cannot access memory at address 0xffffff8d8f35b000
>>
>> (kgdb) list *0xffffff8d8f356fb0
>> No source file for address 0xffffff8d8f356fb0.
>> 
>> or:
>> (kgdb) x/512a 0xffffff8d8f356fb0
>> 0xffffff8d8f356fb0:     Cannot access memory at address 0xffffff8d8f356fb0
> 
> Can you please try this (the last command) with 0xffffff8d8f357210 instead?

(kgdb) x/512a 0xffffff8d8f357210
0xffffff8d8f357210:     0xffffff8d8f357280      0xffffffff805807d3 <trap_pfault+307>
0xffffff8d8f357220:     0x0     0xffffff8d8f357370
0xffffff8d8f357230:     0xffffff06b7f9c000      0x30
0xffffff8d8f357240:     0x100000000     0x0
0xffffff8d8f357250:     0x0     0x9
0xffffff8d8f357260:     0xc     0xffffff8d8f357370
0xffffff8d8f357270:     0xffffff06b7f9c000      0x0
0xffffff8d8f357280:     0xffffff8d8f357360      0xffffffff80580e0f <trap+991>
0xffffff8d8f357290:     0x0     0x0
0xffffff8d8f3572a0:     0x80074e49e     0x2
0xffffff8d8f3572b0:     0x80071cba0     0x80071cdc0
0xffffff8d8f3572c0:     0x80071c9a0     0x0
0xffffff8d8f3572d0:     0x0     0x0
0xffffff8d8f3572e0:     0x0     0x0
0xffffff8d8f3572f0:     0x0     0x0
0xffffff8d8f357300:     0x80074e49e     0x1
0xffffff8d8f357310:     0x80071cba0     0x80071cdc0
0xffffff8d8f357320:     0x80071c9a0     0x0
0xffffff8d8f357330:     0x0     0x4
0xffffff8d8f357340:     0xffffff070b5a48c0      0xffffff06b7f9c000
0xffffff8d8f357350:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357360:     0xffffff8d8f357430      0xffffffff80568f04 <calltrap+8>
0xffffff8d8f357370:     0xffffff070b5a48c0      0x3
0xffffff8d8f357380:     0xffffff8d8f357440      0x0
0xffffff8d8f357390:     0xffffff8d8f357440      0x30
0xffffff8d8f3573a0:     0xffffff06b7f9c000      0x4
0xffffff8d8f3573b0:     0xffffff8d8f357430      0xffffffff8083e920 <vmspace0>
0xffffff8d8f3573c0:     0xffffff06b7f9c000      0xffffff070b5a48c0
0xffffff8d8f3573d0:     0xffffff06b7f9c000      0x0
0xffffff8d8f3573e0:     0xffffffff8083e920 <vmspace0>   0x1b00130000000c
0xffffff8d8f3573f0:     0x30    0x3b003b00000001
0xffffff8d8f357400:     0x0     0xffffffff80384632 <lim_rlimit+18>
0xffffff8d8f357410:     0x20    0x10206
0xffffff8d8f357420:     0xffffff8d8f357430      0x28
0xffffff8d8f357430:     0xffffff8d8f357450      0xffffffff80384681 <lim_cur+17>
0xffffff8d8f357440:     0x4     0xffffff070b5a48c0
0xffffff8d8f357450:     0xffffff8d8f357500      0xffffffff80543ffd <vm_map_growstack+93>
0xffffff8d8f357460:     0xffffff8d8f357470      0xffffff8d8f3576d8
0xffffff8d8f357470:     0xffffff8d8f357500      0xffffffff80544ef8 <vm_map_lookup+808>
0xffffff8d8f357480:     0xffffff070b5a49b8      0x0
0xffffff8d8f357490:     0x8     0xffffff06b7f9c000
0xffffff8d8f3574a0:     0xffffff06b7f9c000      0xffffff8d8f3576d8
0xffffff8d8f3574b0:     0xffffff8d8f3576d0      0xffffff8d8f3576e8
0xffffff8d8f3574c0:     0x0     0xffffff8d8f3576e0
0xffffff8d8f3574d0:     0x100000001     0x1
0xffffff8d8f3574e0:     0xffffff06b7f9c000      0x1
0xffffff8d8f3574f0:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357500:     0xffffff8d8f357770      0xffffffff8053c723 <vm_fault+4355>
0xffffff8d8f357510:     0xffffff8d8f35773f      0xffffff8d8f357738
0xffffff8d8f357520:     0x80085e4f9     0x80085e4f8
0xffffff8d8f357530:     0xffffff06b7f9c000      0xffffff8d8f3576e0
0xffffff8d8f357540:     0xffffff8d8f3576e8      0xffffff8d8f3576d0
0xffffff8d8f357550:     0xffffff8d8f3576d8      0x80085e4f9
0xffffff8d8f357560:     0x80085e4f9     0x80085e4f9
0xffffff8d8f357570:     0x80085e4f9     0x80085e4f9
0xffffff8d8f357580:     0x80085e4f9     0x80085e4f9
0xffffff8d8f357590:     0x80085e4f9     0x80085e4f9
0xffffff8d8f3575a0:     0x80085e4f9     0x734210
0xffffff8d8f3575b0:     0x10000000001   0x80073ada0
0xffffff8d8f3575c0:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f3575d0:     0x80073aec0     0x1
0xffffff8d8f3575e0:     0x6967614d00000000      0x454e4f4e
0xffffff8d8f3575f0:     0x0     0x0
0xffffff8d8f357600:     0x0     0x0
0xffffff8d8f357610:     0x0     0xfffd
0xffffff8d8f357620:     0x200   0x200
0xffffff8d8f357630:     0x200   0x200
0xffffff8d8f357640:     0x200   0x200
0xffffff8d8f357650:     0x200   0x200
0xffffff8d8f357660:     0x200   0x24200
0xffffff8d8f357670:     0x4200  0x4200
0xffffff8d8f357680:     0x4200  0x4200
0xffffff8d8f357690:     0x200   0x200
0xffffff8d8f3576a0:     0x200   0x200
0xffffff8d8f3576b0:     0x200   0x200
0xffffff8d8f3576c0:     0x200   0x200
0xffffff8d8f3576d0:     0x200   0x200
0xffffff8d8f3576e0:     0xffffffff8083e920 <vmspace0>   0xffffffff8083e920 <vmspace0>
0xffffff8d8f3576f0:     0x200   0x0
0xffffff8d8f357700:     0x0     0x200
0xffffff8d8f357710:     0x200   0x200
0xffffff8d8f357720:     0x64000 0x42800
0xffffff8d8f357730:     0x42800 0x42800
0xffffff8d8f357740:     0x42800 0xffffff070b5a48c0
0xffffff8d8f357750:     0xffffff06b7f9c000      0x4
0xffffff8d8f357760:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357770:     0xffffff8d8f3577e0      0xffffffff805807d3 <trap_pfault+307>
0xffffff8d8f357780:     0x42800 0xffffff8d8f3578d0
0xffffff8d8f357790:     0xffffff06b7f9c000      0x30
0xffffff8d8f3577a0:     0x100050c00     0x50c01
0xffffff8d8f3577b0:     0x50c02 0x9
0xffffff8d8f3577c0:     0xc     0xffffff8d8f3578d0
0xffffff8d8f3577d0:     0xffffff06b7f9c000      0x0
0xffffff8d8f3577e0:     0xffffff8d8f3578c0      0xffffffff80580e0f <trap+991>
0xffffff8d8f3577f0:     0x42800 0x42800
0xffffff8d8f357800:     0x42800 0x42800
0xffffff8d8f357810:     0x42800 0x42800
0xffffff8d8f357820:     0x42800 0x5890a
0xffffff8d8f357830:     0x5890b 0x5890c
0xffffff8d8f357840:     0x5890d 0x5890e
0xffffff8d8f357850:     0x5890f 0x48900
0xffffff8d8f357860:     0x48900 0x48900
0xffffff8d8f357870:     0x48900 0x48900
0xffffff8d8f357880:     0x48900 0x48900
0xffffff8d8f357890:     0x48900 0x4
0xffffff8d8f3578a0:     0xffffff070b5a48c0      0xffffff06b7f9c000
0xffffff8d8f3578b0:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f3578c0:     0xffffff8d8f357990      0xffffffff80568f04 <calltrap+8>
0xffffff8d8f3578d0:     0xffffff070b5a48c0      0x3
0xffffff8d8f3578e0:     0xffffff8d8f3579a0      0x0
0xffffff8d8f3578f0:     0xffffff8d8f3579a0      0x30
0xffffff8d8f357900:     0xffffff06b7f9c000      0x4
0xffffff8d8f357910:     0xffffff8d8f357990      0xffffffff8083e920 <vmspace0>
0xffffff8d8f357920:     0xffffff06b7f9c000      0xffffff070b5a48c0
0xffffff8d8f357930:     0xffffff06b7f9c000      0x0
0xffffff8d8f357940:     0xffffffff8083e920 <vmspace0>   0x1b00130000000c
0xffffff8d8f357950:     0x30    0x3b003b00000001
0xffffff8d8f357960:     0x0     0xffffffff80384632 <lim_rlimit+18>
0xffffff8d8f357970:     0x20    0x10206
0xffffff8d8f357980:     0xffffff8d8f357990      0x28
0xffffff8d8f357990:     0xffffff8d8f3579b0      0xffffffff80384681 <lim_cur+17>
0xffffff8d8f3579a0:     0x4     0xffffff070b5a48c0
0xffffff8d8f3579b0:     0xffffff8d8f357a60      0xffffffff80543ffd <vm_map_growstack+93>
0xffffff8d8f3579c0:     0xffffff8d8f3579d0      0xffffff8d8f357c38
0xffffff8d8f3579d0:     0xffffff8d8f357a60      0xffffffff80544ef8 <vm_map_lookup+808>
0xffffff8d8f3579e0:     0xffffff070b5a49b8      0x0
0xffffff8d8f3579f0:     0x41900 0xffffff06b7f9c000
0xffffff8d8f357a00:     0xffffff06b7f9c000      0xffffff8d8f357c38
0xffffff8d8f357a10:     0xffffff8d8f357c30      0xffffff8d8f357c48
0xffffff8d8f357a20:     0x0     0xffffff8d8f357c40
0xffffff8d8f357a30:     0x0     0x1
0xffffff8d8f357a40:     0xffffff06b7f9c000      0x1
0xffffff8d8f357a50:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357a60:     0xffffff8d8f357cd0      0xffffffff8053c723 <vm_fault+4355>
0xffffff8d8f357a70:     0xffffff8d8f357c9f      0xffffff8d8f357c98
0xffffff8d8f357a80:     0x0     0x0
0xffffff8d8f357a90:     0xffffff06b7f9c000      0xffffff8d8f357c40
0xffffff8d8f357aa0:     0xffffff8d8f357c48      0xffffff8d8f357c30
0xffffff8d8f357ab0:     0xffffff8d8f357c38      0x0
0xffffff8d8f357ac0:     0x0     0x0
0xffffff8d8f357ad0:     0x0     0x0
0xffffff8d8f357ae0:     0x0     0x0
0xffffff8d8f357af0:     0x0     0x0
0xffffff8d8f357b00:     0x0     0x0
0xffffff8d8f357b10:     0x1     0x0
0xffffff8d8f357b20:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357b30:     0x0     0x1
0xffffff8d8f357b40:     0x0     0x0
0xffffff8d8f357b50:     0x0     0x0
0xffffff8d8f357b60:     0x0     0x0
0xffffff8d8f357b70:     0x0     0x0
0xffffff8d8f357b80:     0x0     0x0
0xffffff8d8f357b90:     0x0     0x0
0xffffff8d8f357ba0:     0x0     0x0
0xffffff8d8f357bb0:     0x0     0x0
0xffffff8d8f357bc0:     0x0     0x0
0xffffff8d8f357bd0:     0x0     0x0
0xffffff8d8f357be0:     0x0     0x0
0xffffff8d8f357bf0:     0x0     0x0
0xffffff8d8f357c00:     0x0     0x0
0xffffff8d8f357c10:     0x0     0x0
0xffffff8d8f357c20:     0x0     0x0
0xffffff8d8f357c30:     0x0     0x0
0xffffff8d8f357c40:     0xffffffff8083e920 <vmspace0>   0xffffffff8083e920 <vmspace0>
0xffffff8d8f357c50:     0x0     0x0
0xffffff8d8f357c60:     0x0     0x0
0xffffff8d8f357c70:     0x0     0x0
0xffffff8d8f357c80:     0x0     0x0
0xffffff8d8f357c90:     0x0     0x0
0xffffff8d8f357ca0:     0x0     0xffffff070b5a48c0
0xffffff8d8f357cb0:     0xffffff06b7f9c000      0x4
0xffffff8d8f357cc0:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357cd0:     0xffffff8d8f357d40      0xffffffff805807d3 <trap_pfault+307>
0xffffff8d8f357ce0:     0x0     0xffffff8d8f357e30
0xffffff8d8f357cf0:     0xffffff06b7f9c000      0x30
0xffffff8d8f357d00:     0x100000000     0x0
0xffffff8d8f357d10:     0x0     0x9
0xffffff8d8f357d20:     0xc     0xffffff8d8f357e30
0xffffff8d8f357d30:     0xffffff06b7f9c000      0x0
0xffffff8d8f357d40:     0xffffff8d8f357e20      0xffffffff80580e0f <trap+991>
0xffffff8d8f357d50:     0x0     0x0
0xffffff8d8f357d60:     0x0     0x0
0xffffff8d8f357d70:     0x0     0x0
0xffffff8d8f357d80:     0x0     0x0
0xffffff8d8f357d90:     0x0     0x0
0xffffff8d8f357da0:     0x0     0x0
0xffffff8d8f357db0:     0x0     0x0
0xffffff8d8f357dc0:     0x0     0x0
0xffffff8d8f357dd0:     0x0     0x0
0xffffff8d8f357de0:     0x0     0x0
0xffffff8d8f357df0:     0x0     0x4
0xffffff8d8f357e00:     0xffffff070b5a48c0      0xffffff06b7f9c000
0xffffff8d8f357e10:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357e20:     0xffffff8d8f357ef0      0xffffffff80568f04 <calltrap+8>
0xffffff8d8f357e30:     0xffffff070b5a48c0      0x3
0xffffff8d8f357e40:     0xffffff8d8f357f00      0x0
0xffffff8d8f357e50:     0xffffff8d8f357f00      0x30
0xffffff8d8f357e60:     0xffffff06b7f9c000      0x4
0xffffff8d8f357e70:     0xffffff8d8f357ef0      0xffffffff8083e920 <vmspace0>
0xffffff8d8f357e80:     0xffffff06b7f9c000      0xffffff070b5a48c0
0xffffff8d8f357e90:     0xffffff06b7f9c000      0x0
0xffffff8d8f357ea0:     0xffffffff8083e920 <vmspace0>   0x1b00130000000c
0xffffff8d8f357eb0:     0x30    0x3b003b00000001
0xffffff8d8f357ec0:     0x0     0xffffffff80384632 <lim_rlimit+18>
0xffffff8d8f357ed0:     0x20    0x10206
0xffffff8d8f357ee0:     0xffffff8d8f357ef0      0x28
0xffffff8d8f357ef0:     0xffffff8d8f357f10      0xffffffff80384681 <lim_cur+17>
0xffffff8d8f357f00:     0x4     0xffffff070b5a48c0
0xffffff8d8f357f10:     0xffffff8d8f357fc0      0xffffffff80543ffd <vm_map_growstack+93>
0xffffff8d8f357f20:     0xffffff8d8f357f30      0xffffff8d8f358198
0xffffff8d8f357f30:     0xffffff8d8f357fc0      0xffffffff80544ef8 <vm_map_lookup+808>
0xffffff8d8f357f40:     0xffffff070b5a49b8      0x0
0xffffff8d8f357f50:     0x6c    0xffffff06b7f9c000
0xffffff8d8f357f60:     0xffffff06b7f9c000      0xffffff8d8f358198
0xffffff8d8f357f70:     0xffffff8d8f358190      0xffffff8d8f3581a8
0xffffff8d8f357f80:     0x0     0xffffff8d8f3581a0
0xffffff8d8f357f90:     0x5d0000005c    0x1
0xffffff8d8f357fa0:     0xffffff06b7f9c000      0x1
0xffffff8d8f357fb0:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f357fc0:     0xffffff8d8f358230      0xffffffff8053c723 <vm_fault+4355>
0xffffff8d8f357fd0:     0xffffff8d8f3581ff      0xffffff8d8f3581f8
0xffffff8d8f357fe0:     0x7100000070    0x7300000072
0xffffff8d8f357ff0:     0xffffff06b7f9c000      0xffffff8d8f3581a0
0xffffff8d8f358000:     0xffffff8d8f3581a8      0xffffff8d8f358190
0xffffff8d8f358010:     0xffffff8d8f358198      0x7f0000007e
0xffffff8d8f358020:     0x8100000080    0x8300000082
0xffffff8d8f358030:     0x8500000084    0x8700000086
0xffffff8d8f358040:     0x8900000088    0x8b0000008a
0xffffff8d8f358050:     0x8d0000008c    0x8f0000008e
0xffffff8d8f358060:     0x9100000090    0x92
0xffffff8d8f358070:     0x9500000001    0x9700000096
0xffffff8d8f358080:     0x0     0xffffffff8083e920 <vmspace0>
0xffffff8d8f358090:     0x9d0000009c    0x1
0xffffff8d8f3580a0:     0xa100000000    0xa3000000a2
0xffffff8d8f3580b0:     0xa5000000a4    0xa7000000a6
0xffffff8d8f3580c0:     0xa9000000a8    0xab000000aa
0xffffff8d8f3580d0:     0xad000000ac    0xaf000000ae
0xffffff8d8f3580e0:     0xb1000000b0    0xb2
0xffffff8d8f3580f0:     0xb5000000b4    0xb7000000b6
0xffffff8d8f358100:     0xb9000000b8    0xbb000000ba
0xffffff8d8f358110:     0xbd000000bc    0xbf000000be
0xffffff8d8f358120:     0xc1000000c0    0xc3000000c2
0xffffff8d8f358130:     0xc5000000c4    0xc7000000c6
0xffffff8d8f358140:     0xc9000000c8    0xcb000000ca
0xffffff8d8f358150:     0xcd000000cc    0xcf000000ce
0xffffff8d8f358160:     0xd1000000d0    0xd3000000d2
0xffffff8d8f358170:     0xd5000000d4    0xd7000000d6
0xffffff8d8f358180:     0xd9000000d8    0xdb000000da
0xffffff8d8f358190:     0xdd000000dc    0xdf000000de
0xffffff8d8f3581a0:     0xffffffff8083e920 <vmspace0>   0xffffffff8083e920 <vmspace0>
0xffffff8d8f3581b0:     0xe5000000e4    0x0
0xffffff8d8f3581c0:     0xe900000000    0xeb000000ea
0xffffff8d8f3581d0:     0xed000000ec    0xef000000ee
0xffffff8d8f3581e0:     0xf1000000f0    0xf3000000f2
0xffffff8d8f3581f0:     0xf5000000f4    0xf7000000f6
0xffffff8d8f358200:     0xf9000000f8    0xffffff070b5a48c0
(kgdb) 

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 15:36:35 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E0624106566B;
	Mon, 15 Aug 2011 15:36:35 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 019128FC0A;
	Mon, 15 Aug 2011 15:36:34 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA15475;
	Mon, 15 Aug 2011 18:36:31 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E493CFE.6010207@FreeBSD.org>
Date: Mon, 15 Aug 2011 18:36:30 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
In-Reply-To: <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 15:36:36 -0000

on 15/08/2011 17:56 Steven Hartland said the following:
> 
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
> To: "Steven Hartland" <killing@multiplay.co.uk>
> Cc: <freebsd-stable@FreeBSD.org>
> Sent: Monday, August 15, 2011 2:20 PM
> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
> 
> 
>> on 15/08/2011 15:51 Steven Hartland said the following:
>>> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>>>
>>>
>>>> on 15/08/2011 13:34 Steven Hartland said the following:
>>>>> (kgdb) list *0xffffffff8053b691
>>>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>>>>> 234             /*
>>>>> 235              * Find the backing store object and offset into it to begin the
>>>>> 236              * search.
>>>>> 237              */
>>>>> 238             fs.map = map;
>>>>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>>>>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>>>>> 241             if (result != KERN_SUCCESS) {
>>>>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>>>>> 243                         (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {
>>>>>
>>>>
>>>> Interesting... thanks!
[snip]
> (kgdb) x/512a 0xffffff8d8f357210

This is not conclusive, but that stack looks like the following recursive chain:
vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault
So I suspect that increasing kernel stack size won't help here much.
Where does this chain come from?  I have no answer at the moment, maybe other
developers could help here.  I suspect that we shouldn't be getting that trap in
vm_map_growstack or should handle it in a different way.

> 0xffffff8d8f357210:     0xffffff8d8f357280      0xffffffff805807d3 <trap_pfault+307>
> 0xffffff8d8f357220:     0x0     0xffffff8d8f357370
> 0xffffff8d8f357230:     0xffffff06b7f9c000      0x30
> 0xffffff8d8f357240:     0x100000000     0x0
> 0xffffff8d8f357250:     0x0     0x9
> 0xffffff8d8f357260:     0xc     0xffffff8d8f357370
> 0xffffff8d8f357270:     0xffffff06b7f9c000      0x0
> 0xffffff8d8f357280:     0xffffff8d8f357360      0xffffffff80580e0f <trap+991>
> 0xffffff8d8f357290:     0x0     0x0
> 0xffffff8d8f3572a0:     0x80074e49e     0x2
> 0xffffff8d8f3572b0:     0x80071cba0     0x80071cdc0
> 0xffffff8d8f3572c0:     0x80071c9a0     0x0
> 0xffffff8d8f3572d0:     0x0     0x0
> 0xffffff8d8f3572e0:     0x0     0x0
> 0xffffff8d8f3572f0:     0x0     0x0
> 0xffffff8d8f357300:     0x80074e49e     0x1
> 0xffffff8d8f357310:     0x80071cba0     0x80071cdc0
> 0xffffff8d8f357320:     0x80071c9a0     0x0
> 0xffffff8d8f357330:     0x0     0x4
> 0xffffff8d8f357340:     0xffffff070b5a48c0      0xffffff06b7f9c000
> 0xffffff8d8f357350:     0x0     0xffffffff8083e920 <vmspace0>
> 0xffffff8d8f357360:     0xffffff8d8f357430      0xffffffff80568f04 <calltrap+8>
> 0xffffff8d8f357370:     0xffffff070b5a48c0      0x3
> 0xffffff8d8f357380:     0xffffff8d8f357440      0x0
> 0xffffff8d8f357390:     0xffffff8d8f357440      0x30
> 0xffffff8d8f3573a0:     0xffffff06b7f9c000      0x4
> 0xffffff8d8f3573b0:     0xffffff8d8f357430      0xffffffff8083e920 <vmspace0>
> 0xffffff8d8f3573c0:     0xffffff06b7f9c000      0xffffff070b5a48c0
> 0xffffff8d8f3573d0:     0xffffff06b7f9c000      0x0
> 0xffffff8d8f3573e0:     0xffffffff8083e920 <vmspace0>   0x1b00130000000c
> 0xffffff8d8f3573f0:     0x30    0x3b003b00000001
> 0xffffff8d8f357400:     0x0     0xffffffff80384632 <lim_rlimit+18>
> 0xffffff8d8f357410:     0x20    0x10206
> 0xffffff8d8f357420:     0xffffff8d8f357430      0x28
> 0xffffff8d8f357430:     0xffffff8d8f357450      0xffffffff80384681 <lim_cur+17>
> 0xffffff8d8f357440:     0x4     0xffffff070b5a48c0
> 0xffffff8d8f357450:     0xffffff8d8f357500      0xffffffff80543ffd <vm_map_growstack+93>
> 0xffffff8d8f357460:     0xffffff8d8f357470      0xffffff8d8f3576d8
> 0xffffff8d8f357470:     0xffffff8d8f357500      0xffffffff80544ef8 <vm_map_lookup+808>
[trim]
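
For anyone who wants to pull the call chain out of a raw dump like the one
above, here is a minimal sketch. It assumes only that kgdb's x/a format
annotates words that resolve to kernel text as <symbol+offset>; the helper
name return_symbols and the shortened SAMPLE are made up for illustration.

```python
import re

# Matches the "<symbol+offset>" annotations that x/a prints next to words
# resolving into kernel text. Plain data symbols such as <vmspace0> carry no
# "+offset" part and are deliberately skipped.
SYMBOL_RE = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)\+(\d+)>")


def return_symbols(dump):
    """Return the annotated symbol names in the order they appear in the dump."""
    return [match.group(1) for match in SYMBOL_RE.finditer(dump)]


# A shortened excerpt of the dump quoted above (illustrative subset only).
SAMPLE = """\
0xffffff8d8f357210:     0xffffff8d8f357280      0xffffffff805807d3 <trap_pfault+307>
0xffffff8d8f357280:     0xffffff8d8f357360      0xffffffff80580e0f <trap+991>
0xffffff8d8f357360:     0xffffff8d8f357430      0xffffffff80568f04 <calltrap+8>
0xffffff8d8f357450:     0xffffff8d8f357500      0xffffffff80543ffd <vm_map_growstack+93>
0xffffff8d8f357470:     0xffffff8d8f357500      0xffffffff80544ef8 <vm_map_lookup+808>
"""

if __name__ == "__main__":
    # The stack grows downwards, so lower addresses hold the most recent
    # frames and the dump reads callee first.
    print(" <- ".join(return_symbols(SAMPLE)))
```

Because the regular expression scans the whole text, it also copes with
mail-wrapped dumps where the <symbol+offset> part has slipped onto the
next line.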

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 16:03:08 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35])
	by hub.freebsd.org (Postfix) with ESMTP id 931DA1065670
	for <freebsd-stable@freebsd.org>; Mon, 15 Aug 2011 16:03:08 +0000 (UTC)
	(envelope-from ae@FreeBSD.org)
Received: from [127.0.0.1] (hub.freebsd.org [IPv6:2001:4f8:fff6::36])
	by mx2.freebsd.org (Postfix) with ESMTP id 9E1CE1508EA;
	Mon, 15 Aug 2011 16:03:05 +0000 (UTC)
Message-ID: <4E49430D.10609@FreeBSD.org>
Date: Mon, 15 Aug 2011 20:02:21 +0400
From: "Andrey V. Elsukov" <ae@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.17) Gecko/20110429 Thunderbird/3.1.10
MIME-Version: 1.0
To: Kevin Oberman <kob6558@gmail.com>
References: <CAN6yY1vukPLPV+bM+BDV_dDaFmiCBGG8BhhiiRrWHmXCa_NFGQ@mail.gmail.com>
In-Reply-To: <CAN6yY1vukPLPV+bM+BDV_dDaFmiCBGG8BhhiiRrWHmXCa_NFGQ@mail.gmail.com>
X-Enigmail-Version: 1.1.2
OpenPGP: id=10C8A17A
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig05A5EE2C5BDC0AE357693AFC"
Cc: "freebsd-stable@freebsd.org Stable" <freebsd-stable@freebsd.org>
Subject: Re: GPT boot blocks, booting and booteasy
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 16:03:08 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig05A5EE2C5BDC0AE357693AFC
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable

On 10.08.2011 07:12, Kevin Oberman wrote:
> I have /boot/pmbr loaded into the PMBR and gptboot into the
> freebsd-boot partition. I'll
> admit that I did this by rote and don't understand how these two files
> interact with the
> UEFI BIOS to get the loader started. I'm not really certain that I
> even need both.
>=20
> Is it possible to build a "custom" booteasy boot system with boot0cfg
> or some other tool
> so I can select a different bootable partition or my other disk which
> is sliced in the traditional
> fashion? Can anyone point me to any information on how the boot
> process works with GPT?

PMBR is a simple variant of MBR which does know enough to parse GPT
partition table and how to load bootcode from the "freebsd-boot"
partition. Then gptboot does search bootable UFS partition.
At this time we do not have any  bootcodes like booteasy for GPT.
But you can try to use bootme and bootonce GPT attributes (see
gpart(8)). Also you can use grub boot loader.

--=20
WBR, Andrey V. Elsukov


--------------enig05A5EE2C5BDC0AE357693AFC
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)

iQEcBAEBAgAGBQJOSUMTAAoJEAHF6gQQyKF6B20IAK4akiXcjdsauGiFw9zCulFx
fIQBUqN6T7Zjq3HX5FBx2695S9ScsSI/nzKi1I+sXCZcMXf75bIF07WPXJRqdD8W
Aw/CAIvBqglvHEA0Edt5Ov1J3z2qoIWERG4bCPgryKK1GxSQ58yLWv4I734HHyiI
oZUOORwr3tLnkDQf0ZxZCMXtJhNDy5fi9/Vy7ZI0cOf5BjPzHXDYzHBBSN9VodfT
jLhVCI0dP5tCjZYo2SdxzSBg/GTh3LO9xlDxZDVhVG1JipELJuPUw1EbUPM3I3me
rmaZ9CC4VG/8y0ea6U/1TP4XKNYyZDoQ37x26poxOLOtpMv4J11+Isv1zQR6aYU=
=raVf
-----END PGP SIGNATURE-----

--------------enig05A5EE2C5BDC0AE357693AFC--

From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 16:14:29 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1490C106564A;
	Mon, 15 Aug 2011 16:14:29 +0000 (UTC)
	(envelope-from prvs=1208040d95=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 564A48FC19;
	Mon, 15 Aug 2011 16:14:27 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 17:13:11 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Mon, 15 Aug 2011 17:13:10 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014607352.msg;
	Mon, 15 Aug 2011 17:13:08 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1208040d95=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <94438CD02F1447EAB4889D16BFC610B5@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E493CFE.6010207@FreeBSD.org>
Date: Mon, 15 Aug 2011 17:13:43 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 16:14:29 -0000


----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: <freebsd-stable@FreeBSD.org>
Sent: Monday, August 15, 2011 4:36 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


> on 15/08/2011 17:56 Steven Hartland said the following:
>> 
>> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>> To: "Steven Hartland" <killing@multiplay.co.uk>
>> Cc: <freebsd-stable@FreeBSD.org>
>> Sent: Monday, August 15, 2011 2:20 PM
>> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
>> 
>> 
>>> on 15/08/2011 15:51 Steven Hartland said the following:
>>>> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>>>>
>>>>
>>>>> on 15/08/2011 13:34 Steven Hartland said the following:
>>>>>> (kgdb) list *0xffffffff8053b691
>>>>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>>>>>> 234             /*
>>>>>> 235              * Find the backing store object and offset into it to begin the
>>>>>> 236              * search.
>>>>>> 237              */
>>>>>> 238             fs.map = map;
>>>>>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>>>>>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>>>>>> 241             if (result != KERN_SUCCESS) {
>>>>>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>>>>>> 243                         (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {
>>>>>>
>>>>>
>>>>> Interesting... thanks!
> [snip]
>> (kgdb) x/512a 0xffffff8d8f357210
> 
> This is not conclusive, but that stack looks like the following recursive chain:
> vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault
> So I suspect that increasing kernel stack size won't help here much.
> Where does this chain come from?  I have no answer at the moment, maybe other
> developers could help here.  I suspect that we shouldn't be getting that trap in
> vm_map_growstack or should handle it in a different way.
> 

Just in case it's relevant, I've checked other crashes and all rip entries
point to: vm_fault (/usr/src/sys/vm/vm_fault.c:239).

A more typical layout, from a selection of machines, is:-

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86ccf8ffb0
rbp = 0xffffff86ccf90210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 2d21h25m4s
Physical memory: 24555 MB
Dumping 4184 MB:...
----

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cc742fb0
rbp = 0xffffff86cc743210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 2d4h30m58s
Physical memory: 24555 MB
Dumping 5088 MB:...
----

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86caeabfb0
rbp = 0xffffff86caeac210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 3d1h56m45s
Physical memory: 24555 MB
Dumping 4690 MB:...
----

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cb1c7fb0
rbp = 0xffffff86cb1c8210
cpuid = 4; apic id = 04
panic: double fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 1d13h41m19s
Physical memory: 24555 MB
Dumping 3626 MB:...
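
As a quick cross-check of the register values in the four panics above,
the following sketch computes rbp - rsp and where rsp sits relative to a
4 KB page boundary. The helper name describe and the 4096-byte page size
are assumptions for illustration; the addresses are copied from the
messages above.

```python
PAGE_SIZE = 4096  # assumed amd64 base page size

# (rsp, rbp) pairs copied from the four double fault messages.
FAULTS = [
    (0xFFFFFF86CCF8FFB0, 0xFFFFFF86CCF90210),
    (0xFFFFFF86CC742FB0, 0xFFFFFF86CC743210),
    (0xFFFFFF86CAEABFB0, 0xFFFFFF86CAEAC210),
    (0xFFFFFF86CB1C7FB0, 0xFFFFFF86CB1C8210),
]


def describe(rsp, rbp):
    """Return (frame size, bytes from rsp up to the next page boundary,
    whether rsp and rbp lie in different pages)."""
    frame = rbp - rsp
    to_boundary = PAGE_SIZE - (rsp % PAGE_SIZE)
    crosses_page = (rsp // PAGE_SIZE) != (rbp // PAGE_SIZE)
    return frame, to_boundary, crosses_page


if __name__ == "__main__":
    for rsp, rbp in FAULTS:
        frame, to_boundary, crosses_page = describe(rsp, rbp)
        print(f"rsp={rsp:#x} frame={frame:#x} "
              f"to_boundary={to_boundary:#x} crosses_page={crosses_page}")
```

All four give the same picture: rbp - rsp is 0x260 and rsp sits 0x50
bytes below a page boundary, in the page under the one holding rbp. That
would be consistent with the thread running off the end of its kernel
stack, though the panic output alone does not prove it.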

And in case any of the changes to loader.conf or sysctl.conf are
relevant here they are:-
[loader.conf]
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
# fix swap zone exhausted, increase kern.maxswzone
kern.maxswzone=67108864
# Reduce the minimum ARC size; we want our apps to have the memory
vfs.zfs.arc_min="512M"
[/loader.conf]

[sysctl.conf]
vfs.read_max=32
net.inet.tcp.inflight.enable=0
net.inet.tcp.sendspace=65536
kern.ipc.maxsockbuf=524288
kern.maxfiles=50000
kern.ipc.nmbclusters=51200
[/sysctl.conf]

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 15 20:34:08 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BE087106566B
	for <freebsd-stable@freebsd.org>; Mon, 15 Aug 2011 20:34:08 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 951C28FC0C
	for <freebsd-stable@freebsd.org>; Mon, 15 Aug 2011 20:34:08 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 49C1E46B06;
	Mon, 15 Aug 2011 16:34:08 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id D8C718A02E;
	Mon, 15 Aug 2011 16:34:07 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Date: Mon, 15 Aug 2011 16:16:44 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <4846F699-215D-4408-BD3C-4860305BF6B8@transactionware.com>
In-Reply-To: <4846F699-215D-4408-BD3C-4860305BF6B8@transactionware.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201108151616.44880.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Mon, 15 Aug 2011 16:34:07 -0400 (EDT)
Cc: Jan Mikkelsen <janm@transactionware.com>
Subject: Re: Patch to puc(4) to support Moxa CP-112UL board
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Aug 2011 20:34:08 -0000

On Wednesday, August 10, 2011 7:55:18 pm Jan Mikkelsen wrote:
> Hi,
>=20
> I have added these device IDs to pucdata.c to support the Moxa CP-112UL
> board family.
>=20
> Should I submit a problem report, or is there an easier way to get the
> patch merged?
>=20
> (I care about 8-STABLE at the moment ...)
>=20
> Thanks,
>=20
> Jan Mikkelsen

Committed to HEAD, will MFC in a week or so, thanks!

=2D-=20
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 00:36:47 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CDE1E106564A
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 00:36:47 +0000 (UTC)
	(envelope-from kob6558@gmail.com)
Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com
	[209.85.218.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 8E5628FC12
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 00:36:47 +0000 (UTC)
Received: by yib19 with SMTP id 19so4017104yib.13
	for <multiple recipients>; Mon, 15 Aug 2011 17:36:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=x2ncZz/CexHzeXtiwuWIGJ7e0cJIcN82nV1g7Ii+7JI=;
	b=xLChC7QFPpCjLZnnjGoqaWX7/pbtZD+AqdUOyUGNUhLZDTFYvyT89hLuhd19MAOY2B
	ycAuBXwd2zPnZJztn+S06sGLBvapq8JhGXF1ax8IfuzvUQS7Zpjn3EPtQSsinh3CGs8b
	sOnNOc06eFJnFmnWSJoSDuBWjoF2aO6K5m3+g=
MIME-Version: 1.0
Received: by 10.150.215.2 with SMTP id n2mr5388754ybg.152.1313455006792; Mon,
	15 Aug 2011 17:36:46 -0700 (PDT)
Received: by 10.151.98.3 with HTTP; Mon, 15 Aug 2011 17:36:46 -0700 (PDT)
In-Reply-To: <4E49430D.10609@FreeBSD.org>
References: <CAN6yY1vukPLPV+bM+BDV_dDaFmiCBGG8BhhiiRrWHmXCa_NFGQ@mail.gmail.com>
	<4E49430D.10609@FreeBSD.org>
Date: Mon, 15 Aug 2011 17:36:46 -0700
Message-ID: <CAN6yY1vYyR+uhUy-8oNvuVBVcPW5T5yMkUTXADfwhyWJ1rX4=A@mail.gmail.com>
From: Kevin Oberman <kob6558@gmail.com>
To: "Andrey V. Elsukov" <ae@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-stable@freebsd.org Stable" <freebsd-stable@freebsd.org>
Subject: Re: GPT boot blocks, booting and booteasy
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 00:36:47 -0000

2011/8/15 Andrey V. Elsukov <ae@freebsd.org>:
> On 10.08.2011 07:12, Kevin Oberman wrote:
>> I have /boot/pmbr loaded into the PMBR and gptboot into the
>> freebsd-boot partition. I'll
>> admit that I did this by rote and don't understand how these two files
>> interact with the
>> UEFI BIOS to get the loader started. I'm not really certain that I
>> even need both.
>>
>> Is it possible to build a "custom" booteasy boot system with boot0cfg
>> or some other tool
>> so I can select a different bootable partition or my other disk which
>> is sliced in the traditional
>> fashion? Can anyone point me to any information on how the boot
>> process works with GPT?
>
> PMBR is a simple variant of MBR which does know enough to parse GPT
> partition table and how to load bootcode from the "freebsd-boot"
> partition. Then gptboot does search bootable UFS partition.
> At this time we do not have any bootcodes like booteasy for GPT.
> But you can try to use bootme and bootonce GPT attributes (see
> gpart(8)). Also you can use grub boot loader.

Andrey,

Thanks for the response. The 'bootme' and 'bootonce' attributes look to
solve some issues. Looks like I might need to have a bios-boot partition
to use grub, but I may give it a shot. On the whole, the advantages of GPT
are such that I would love to see FreeBSD move to make it the standard
partitioning scheme, though I understand this will not be easy
until/unless Windows develops full GPT support.

Just having more than 4 partitions as opposed to having to sub-partition
a real partition (slice) is very nice.
--=20
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6558@gmail.com

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 06:44:38 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C3A87106566B
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 06:44:38 +0000 (UTC)
	(envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id 980028FC12
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 06:44:38 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7G6LsCJ033597
	for <freebsd-stable@freebsd.org>; Mon, 15 Aug 2011 23:21:54 -0700 (PDT)
	(envelope-from yuri@rawbw.com)
Message-ID: <4E4A0C81.7020501@rawbw.com>
Date: Mon, 15 Aug 2011 23:21:53 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 06:44:38 -0000

I have a dual COM port PCI card:
none7@pci0:8:1:0:       class=0x070002 card=0x32534348 chip=0x32534348 rev=0x10 hdr=0x00
     class      = simple comms
     subclass   = UART
     bar   [10] = type I/O Port, range 32, base 0xe880, size  8, enabled
     bar   [14] = type I/O Port, range 32, base 0xe800, size  8, enabled

Manufacturer 0x4348 isn't recognized by http://www.pcidatabase.com. It 
was purchased from China through ebay.
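
For reference, the chip= value that pciconf prints packs the device ID
into the upper 16 bits and the vendor ID into the lower 16 bits. A
minimal sketch of the split (the helper name split_pci_id is made up for
illustration):

```python
def split_pci_id(value):
    """Split a 32-bit pciconf chip= value into (vendor, device).

    The vendor ID lives in the low 16 bits, the device ID in the high 16.
    """
    vendor = value & 0xFFFF
    device = (value >> 16) & 0xFFFF
    return vendor, device


vendor, device = split_pci_id(0x32534348)
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
# prints: vendor=0x4348 device=0x3253
```

The card= value is laid out the same way, with the subsystem vendor in
the low half and the subsystem device in the high half.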

How can I make it work on 8.2-STABLE?

Yuri

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 07:48:15 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F89E1065670
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 07:48:15 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from anubis.delphij.net (anubis.delphij.net
	[IPv6:2001:470:1:117::25])
	by mx1.freebsd.org (Postfix) with ESMTP id 3467A8FC15
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 07:48:15 +0000 (UTC)
Received: from delta.delphij.net (c-76-102-50-245.hsd1.ca.comcast.net
	[76.102.50.245])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by anubis.delphij.net (Postfix) with ESMTPSA id E7048139CA;
	Tue, 16 Aug 2011 00:48:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis;
	t=1313480895; bh=pOgxblFWQSI2Vjlkv5hOrnrwYqBIdVy+9Yk02rdPmCk=;
	h=Message-ID:Date:From:Reply-To:MIME-Version:To:Subject:References:
	In-Reply-To:Content-Type;
	b=vKoVRUQ4SwuNuERXxzZ+mI9PbgCrPe+mW7oInWzvxqR8olnTdR8BREa6XWsYgBpsW
	X6AdkYbIXaov22RUfPIT6E26LT8Dv41TL1dwUy1S7gN5GHbVzkLjN6WBE2ony6G8Wl
	JGoMsEPsemMpwwDepX6HaNPaK3KF0m+RENqnm/Uw=
Message-ID: <4E4A20BE.3060603@delphij.net>
Date: Tue, 16 Aug 2011 00:48:14 -0700
From: Xin LI <delphij@delphij.net>
Organization: The FreeBSD Project
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <4E4A0C81.7020501@rawbw.com>
In-Reply-To: <4E4A0C81.7020501@rawbw.com>
OpenPGP: id=3FCA37C1;
	url=http://www.delphij.net/delphij.asc
Content-Type: multipart/mixed; boundary="------------070402070505040801070209"
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: d@delphij.net
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 07:48:15 -0000

This is a multi-part message in MIME format.
--------------070402070505040801070209
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 08/15/11 23:21, Yuri wrote:
> I have dual COM port pci card:
> none7@pci0:8:1:0:       class=0x070002 card=0x32534348 chip=0x32534348 rev=0x10 hdr=0x00
>      class      = simple comms
>      subclass   = UART
>      bar   [10] = type I/O Port, range 32, base 0xe880, size  8, enabled
>      bar   [14] = type I/O Port, range 32, base 0xe800, size  8, enabled
> 
> Manufacturer 0x4348 isn't recognized by http://www.pcidatabase.com.
> It was purchased from China through ebay.
> 
> How to make it to work in 8.2-STABLE?

A wild guess...  (You need to provide more details than just the PCI
IDs.)

My guess is that it's using these chips:

http://www.winchiphead.com/product/ch365detail.htm
http://www.winchiphead.com/product/ch353detail.htm

The pages above don't describe the possible card configurations, so I used
BAR0 (0x10); the right one could instead be 0x14, 0x18, etc.

Cheers,
- -- 
Xin LI <delphij@delphij.net>	https://www.delphij.net/
FreeBSD - The Power to Serve!		Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iQEcBAEBCAAGBQJOSiC9AAoJEATO+BI/yjfB5oAH/R0yt8Zx3HDVOXA5jUOXzlWl
A+XCmbaau4MNhOtiyVJ8sWERE1CukgQeIE7DWze1rJ6YU7bTXKAgoRbqVJsfiAbH
CEhLx+Y2T7HLow9ZojCGrqk6ydrGxheWIyf2AM7nTORZQdEUceEWGLE4GMXJghTp
Y4udsGfSRqa+1O7tTOpechDi5jtG/cW+dDFeyZqVo0AjfS78D10wEqoiudloIkBd
IAEyy7JGCU/R6AM+DhHHm0dIT68MkHxULOpTLy0GxxzJecWruknqd+h+V36Q3X+h
brg2isOawCGLhWgzCDXVZXwJWIXA28RaRmDPeZRNv5TKUESmZEenR8lEpH7ji+s=
=KUoE
-----END PGP SIGNATURE-----

--------------070402070505040801070209
Content-Type: text/plain;
 name="uart.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="uart.diff"

Index: sys/dev/uart/uart_bus_pci.c
===================================================================
--- sys/dev/uart/uart_bus_pci.c	(revision 224900)
+++ sys/dev/uart/uart_bus_pci.c	(working copy)
@@ -111,6 +111,7 @@
 { 0x1415, 0x950b, 0xffff, 0, "Oxford Semiconductor OXCB950 Cardbus 16950 UART",
 	0x10, 16384000 },
 { 0x151f, 0x0000, 0xffff, 0, "TOPIC Semiconductor TP560 56k modem", 0x10 },
+{ 0x4348, 0x3253, 0xffff, 0, "WinChipHead Dual Port RS-232", 0x10 },
 { 0x9710, 0x9820, 0x1000, 1, "NetMos NM9820 Serial Port", 0x10 },
 { 0x9710, 0x9835, 0x1000, 1, "NetMos NM9835 Serial Port", 0x10 },
 { 0x9710, 0x9865, 0xa000, 0x1000, "NetMos NM9865 Serial Port", 0x10 },

--------------070402070505040801070209--

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 09:01:44 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0242B106566C
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 09:01:44 +0000 (UTC)
	(envelope-from damian.jagosz@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 86C0A8FC17
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 09:01:43 +0000 (UTC)
Received: by fxe4 with SMTP id 4so5287418fxe.13
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 02:01:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=from:content-type:content-transfer-encoding:subject:date:message-id
	:to:mime-version:x-mailer;
	bh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=;
	b=k01WzM81K6rTnaCaf+3+/hjyujXWYvpvAQRCrr54pOo2Wn3OLg+ES4pb72SL/m5aey
	X6/pcUKS//+sSsl3mI120lzjvC64ygTq7nRcweUpBKC/yXkHmQDMlHKyAnonB5PVyb3A
	ayVlExkM+9Jp9XFcQnULOIp75G8cack71irUE=
Received: by 10.223.61.79 with SMTP id s15mr5761025fah.117.1313483980590;
	Tue, 16 Aug 2011 01:39:40 -0700 (PDT)
Received: from [192.168.10.197] ([31.134.59.96])
	by mx.google.com with ESMTPS id f12sm1019690fai.25.2011.08.16.01.39.37
	(version=TLSv1/SSLv3 cipher=OTHER);
	Tue, 16 Aug 2011 01:39:39 -0700 (PDT)
From: Damian Jagosz <damian.jagosz@gmail.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: Tue, 16 Aug 2011 10:39:33 +0200
Message-Id: <101CCD1C-AAFB-4638-91B0-46D085C71B11@gmail.com>
To: freebsd-stable@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1244.3)
X-Mailer: Apple Mail (2.1244.3)
Subject: off
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 09:01:44 -0000


From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 09:25:41 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E5C7D106566C
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 09:25:41 +0000 (UTC)
	(envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id CF8A88FC0C
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 09:25:41 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7G9PSsK061099;
	Tue, 16 Aug 2011 02:25:29 -0700 (PDT) (envelope-from yuri@rawbw.com)
Message-ID: <4E4A3788.3030605@rawbw.com>
Date: Tue, 16 Aug 2011 02:25:28 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: d@delphij.net
References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net>
In-Reply-To: <4E4A20BE.3060603@delphij.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org, Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 09:25:42 -0000

On 08/16/2011 00:48, Xin LI wrote:
> A wild guess...  (You need to provide more details than just the PCI
> IDs).
>
> My guess is that it's using these chips:
>
> http://www.winchiphead.com/product/ch365detail.htm
> http://www.winchiphead.com/product/ch353detail.htm
>
> It didn't describe the possible card configurations, so I used BAR0;
> the right one could instead be 0x14, 0x18, etc.

Actually, the main chip there is a CH352L, plus two ST75185C chips, 
one per COM port.

Your patch made this PCI device attach to the uart driver: uart2@pci0:8:1:0.

uart2: <16550 or compatible> port 0xe880-0xe887,0xe800-0xe807 irq 17 at 
device 1.0 on pci8
uart2: [FILTER]

Also new devices showed up:
/dev/cuau2
/dev/cuau2.init
/dev/cuau2.lock
/dev/ttyu2
/dev/ttyu2.init
/dev/ttyu2.lock

cuau2 probably corresponds to one of the COM ports. I don't have an 
easy way to check right now.
I believe adding another entry with 0x14 would add the second COM port.
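
To make the 0x14 guess concrete: PCI base address registers sit at
fixed offsets in configuration space, starting at 0x10 and spaced four
bytes apart, so 0x14 is BAR1. A minimal C sketch of that arithmetic
follows. bar_offset is a hypothetical helper name, not something in the
FreeBSD tree, and whether this card's second port really lives behind
BAR1 is still unconfirmed.

```c
#include <assert.h>

/* Offset of the first Base Address Register in PCI config space. */
#define BAR0_OFFSET	0x10
/* Each 32-bit BAR occupies four bytes. */
#define BAR_STRIDE	4

/*
 * Hypothetical helper: config-space offset of BAR number n.
 * A type-0 PCI header has six BARs, numbered 0 through 5.
 */
static int
bar_offset(int n)
{
	assert(n >= 0 && n <= 5);
	return (BAR0_OFFSET + n * BAR_STRIDE);
}
```

With this, bar_offset(1) is 0x14 and bar_offset(2) is 0x18, the
candidates mentioned earlier in the thread. The two I/O ranges in the
dmesg line above would be consistent with two BARs, one per port.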

Thank you!
Yuri

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 15:57:22 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5F046106566C
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 15:57:22 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 32B7F8FC1F
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 15:57:22 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id CDB0246B06;
	Tue, 16 Aug 2011 11:57:21 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6AC688A02E;
	Tue, 16 Aug 2011 11:57:21 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Date: Tue, 16 Aug 2011 11:57:20 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net>
	<4E4A3788.3030605@rawbw.com>
In-Reply-To: <4E4A3788.3030605@rawbw.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108161157.20890.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 16 Aug 2011 11:57:21 -0400 (EDT)
Cc: Yuri <yuri@rawbw.com>, d@delphij.net, Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 15:57:22 -0000

On Tuesday, August 16, 2011 5:25:28 am Yuri wrote:
> On 08/16/2011 00:48, Xin LI wrote:
> > A wild guess...  (You need to provide more details than just the PCI
> > IDs).
> >
> > My guess is that it's using these chips:
> >
> > http://www.winchiphead.com/product/ch365detail.htm
> > http://www.winchiphead.com/product/ch353detail.htm
> >
> > It didn't describe the possible card configurations, so I used BAR0;
> > the right one could instead be 0x14, 0x18, etc.
> 
> Actually, the main chip there is a CH352L, plus two ST75185C chips, 
> one per COM port.
> 
> Your patch made this PCI device attach to the uart driver: uart2@pci0:8:1:0.
> 
> uart2: <16550 or compatible> port 0xe880-0xe887,0xe800-0xe807 irq 17 at 
> device 1.0 on pci8
> uart2: [FILTER]
> 
> Also new devices showed up:
> /dev/cuau2
> /dev/cuau2.init
> /dev/cuau2.lock
> /dev/ttyu2
> /dev/ttyu2.init
> /dev/ttyu2.lock
> 
> cuau2 probably corresponds to one of the COM ports. I don't have an 
> easy way to check right now.
> I believe adding another entry with 0x14 would add the second COM port.

For multiport devices you will want to add an entry to sys/dev/puc/pucdata.c 
and use the puc driver instead of patching uart directly.  Perhaps this:

Index: pucdata.c
===================================================================
--- pucdata.c	(revision 224898)
+++ pucdata.c	(working copy)
@@ -862,6 +862,13 @@ const struct puc_cfg puc_pci_devices[] = {
 	    .config_function = puc_config_syba
 	},
 
+	{
+	    0x4348, 0x3253, 0xffff, 0,
+	    "WinChipHead Dual Port RS-232",
+	    DEFAULT_RCLK,
+	    PUC_PORT_2S, 0x10, 4, 0,
+	},
+
 	{   0x6666, 0x0001, 0xffff, 0,
 	    "Decision Computer Inc, PCCOM 4-port serial",
 	    DEFAULT_RCLK,


-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 19:36:31 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 85AA2106566B
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 19:36:31 +0000 (UTC)
	(envelope-from petros.fraser@gmail.com)
Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com
	[209.85.216.175])
	by mx1.freebsd.org (Postfix) with ESMTP id 4539E8FC08
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 19:36:31 +0000 (UTC)
Received: by mail-qy0-f175.google.com with SMTP id 4so1879692qyk.13
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 12:36:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	bh=u5/OWYaX+GSU6cKF2Qw3nWe14s3kfC04axhGR4Y3kqU=;
	b=fH+EsLTmJWE3urrGfycIPprHzSKGeQ0DfN2WdsqfygZB7spbF2sWtH28dCSnI6LsB0
	R0FJUfFtqMlxMJ4lrwvWb9sJs909Pwr0gxV51/1ZhHfibaP1fH2wxwJ/crDKcBYS3Wti
	opyRWP1j0VBlChQRfMY/tD6KlgrzCvMZUfPE8=
MIME-Version: 1.0
Received: by 10.52.93.98 with SMTP id ct2mr94179vdb.314.1313521847351; Tue, 16
	Aug 2011 12:10:47 -0700 (PDT)
Received: by 10.52.184.225 with HTTP; Tue, 16 Aug 2011 12:10:47 -0700 (PDT)
Date: Tue, 16 Aug 2011 14:10:47 -0500
Message-ID: <CAALr=TtU2bg7BjQ-9BdBofxVKhCVuHgur=348cQihC9pAoVtPg@mail.gmail.com>
From: Peter Fraser <petros.fraser@gmail.com>
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Subject: Upgrade to 7.4
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 19:36:31 -0000

Hi All
I just ran freebsd-update to upgrade from 7.0 to 7.4, and I thought
everything went OK. This is what I did:

1. freebsd-update upgrade -r 7.4-RELEASE

2. freebsd-update install

3. shutdown -r now

4. freebsd-update install

5. shutdown -r now

The system came back up OK, but now if I run another freebsd-update
fetch, I get the error below:

config_IDSIgnorePaths: not found
Error processing configuration file, line 26:
==> IDSIgnorePaths /usr/share/man/cat

Is this an error I need to worry about?

How can I correct this if so?

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 19:53:22 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8C58106566B
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 19:53:22 +0000 (UTC)
	(envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id 70FA48FC14
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 19:53:22 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7GJrHNH031330;
	Tue, 16 Aug 2011 12:53:17 -0700 (PDT) (envelope-from yuri@rawbw.com)
Message-ID: <4E4ACAAD.3030506@rawbw.com>
Date: Tue, 16 Aug 2011 12:53:17 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net>
	<4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org>
In-Reply-To: <201108161157.20890.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: d@delphij.net, freebsd-stable@freebsd.org, Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 19:53:22 -0000

On 08/16/2011 08:57, John Baldwin wrote:
> For multiport devices you will want to add an entry to sys/dev/puc/pucdata.c
> and use the puc driver instead of patching uart directly.  Perhaps this:

John,

I did what you suggested:
puc0: <WinChipHead Dual Port RS-232> port 0xe880-0xe887,0xe800-0xe807 
irq 17 at device 1.0 on pci8

But it doesn't show up as a serial device or tty.

Yuri

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 20:30:25 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C0308106567E
	for <freebsd-stable@FreeBSD.org>; Tue, 16 Aug 2011 20:30:25 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 117778FC1A
	for <freebsd-stable@FreeBSD.org>; Tue, 16 Aug 2011 20:30:24 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA10423;
	Tue, 16 Aug 2011 23:30:22 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QtQHF-000Afe-SH; Tue, 16 Aug 2011 23:30:21 +0300
Message-ID: <4E4AD35C.7020504@FreeBSD.org>
Date: Tue, 16 Aug 2011 23:30:20 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110706 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
In-Reply-To: <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 20:30:25 -0000

on 15/08/2011 17:56 Steven Hartland said the following:
> (kgdb) x/512a 0xffffff8d8f357210
[snip]

Can you please also provide the following for this core?
list *vm_map_growstack+93
list *lim_cur+17
list *lim_rlimit+18

Also, it would be interesting to get the panic output with the DDB option.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 20:37:35 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C0C7C1065680;
	Tue, 16 Aug 2011 20:37:35 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from anubis.delphij.net (anubis.delphij.net
	[IPv6:2001:470:1:117::25])
	by mx1.freebsd.org (Postfix) with ESMTP id A25818FC16;
	Tue, 16 Aug 2011 20:37:35 +0000 (UTC)
Received: from delta.delphij.net (drawbridge.ixsystems.com [206.40.55.65])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by anubis.delphij.net (Postfix) with ESMTPSA id 69C4813EB2;
	Tue, 16 Aug 2011 13:37:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=delphij.net; s=anubis;
	t=1313527055; bh=ZlDdj2pqbMzUDJklO0LSOYV6MB7rQU57e/5RmMdoLM8=;
	h=Message-ID:Date:From:Reply-To:MIME-Version:To:CC:Subject:
	References:In-Reply-To:Content-Type;
	b=gY2aTO7Z65z8qWGrM3R7IAJ/mRb5HiXGdhXZevg8SvacTVMiVG8BCZKCx14m4bJLI
	yQQpwYIV7YjeUroaZ7aiQtzkV5xLeatB2OqS65smeFaqLEAUprhtLJwPh2ntfiLS5y
	dt8E2fVrWqqgIUCkjekV8gJT0fYDx9dbNQdUCN4g=
Message-ID: <4E4AD50E.6050906@delphij.net>
Date: Tue, 16 Aug 2011 13:37:34 -0700
From: Xin LI <delphij@delphij.net>
Organization: The FreeBSD Project
MIME-Version: 1.0
To: Yuri <yuri@rawbw.com>
References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net>
	<4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org>
	<4E4ACAAD.3030506@rawbw.com>
In-Reply-To: <4E4ACAAD.3030506@rawbw.com>
OpenPGP: id=3FCA37C1;
	url=http://www.delphij.net/delphij.asc
Content-Type: multipart/mixed; boundary="------------060306060307040906060005"
Cc: d@delphij.net, freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: d@delphij.net
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 20:37:35 -0000

This is a multi-part message in MIME format.
--------------060306060307040906060005
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 08/16/11 12:53, Yuri wrote:
> On 08/16/2011 08:57, John Baldwin wrote:
>> For multiport devices you will want to add an entry to 
>> sys/dev/puc/pucdata.c and use the puc driver instead of patching
>> uart directly.  Perhaps this:
> 
> John,
> 
> I did what you suggested: puc0: <WinChipHead Dual Port RS-232> port
> 0xe880-0xe887,0xe800-0xe807 irq 17 at device 1.0 on pci8
> 
> But it doesn't show up as a serial device and tty.

I found a datasheet:

	http://wch-ic.com/download/down.asp?id=116 (English)

	and

	http://winchiphead.com/download/CH352/CH352DS1.PDF (Chinese)

And I think John's patch is right, I've added a new PCI ID for it
though, found from the datasheet.  Did you have uart(4) in your kernel
(remove my old patch)?

Cheers,
- -- 
Xin LI <delphij@delphij.net>	https://www.delphij.net/
FreeBSD - The Power to Serve!		Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iQEcBAEBCAAGBQJOStUOAAoJEATO+BI/yjfBSw0IANPaoND+0Xa2QtueAxI8Qa42
V86MiUnaZopRb0coiWf8dQNk+nIlayVuFstC9+77zC9NEEu1O7Mp8T4n2Bx2N7WP
jtsevUnLJq6lIyo0jYRTf4x84eYd1VDBduHqsWbI0B7aMArgfNtHvPV0qUD9Emrn
4yR6I3/tmO3sX3+cWcggYC4s3DIm7XidiyT/6lcWilsmy2QkQlw00HoAkoKl0V4m
DBkKHkmOB2oTUYadpBOKCt6HvdI29xWYF+1zN/sE0B3XwTy+Q1pp4Uq5KiBUyJi3
tNF533Z7COh/mog/Z9cpGpLSRJpWQgI2uCY7gAHZRAMT2+7k1AqkdNPWTJPXoCk=
=CcI6
-----END PGP SIGNATURE-----

--------------060306060307040906060005
Content-Type: text/plain;
 name="puc.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="puc.diff"

Index: sys/dev/puc/pucdata.c
===================================================================
--- sys/dev/puc/pucdata.c	(revision 224912)
+++ sys/dev/puc/pucdata.c	(working copy)
@@ -862,6 +862,20 @@ const struct puc_cfg puc_pci_devices[] = {
 	    .config_function = puc_config_syba
 	},
 
+	{
+	    0x4348, 0x3253, 0x4348, 0x3253,
+	    "WinChipHead Dual Port RS-232",
+	    DEFAULT_RCLK,
+	    PUC_PORT_2S, 0x10, 4, 0,
+	},
+
+	{
+	    0x4348, 0x5053, 0x4348, 0x5053,
+	    "WinChipHead RS-232 and Printer port",
+	    DEFAULT_RCLK,
+	    PUC_PORT_1S1P, 0x10, 4, 0,
+	},
+
 	{   0x6666, 0x0001, 0xffff, 0,
 	    "Decision Computer Inc, PCCOM 4-port serial",
 	    DEFAULT_RCLK,

--------------060306060307040906060005--

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 20:54:33 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BDC5B106566C;
	Tue, 16 Aug 2011 20:54:33 +0000 (UTC)
	(envelope-from prvs=1209a97202=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 0B4998FC13;
	Tue, 16 Aug 2011 20:54:32 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Tue, 16 Aug 2011 21:42:41 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Tue, 16 Aug 2011 21:42:40 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014625368.msg;
	Tue, 16 Aug 2011 21:42:40 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1209a97202=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <6A7238AED44542A880B082A40304D940@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
Date: Tue, 16 Aug 2011 21:43:21 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 20:54:33 -0000


----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: <freebsd-stable@FreeBSD.org>
Sent: Tuesday, August 16, 2011 9:30 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


> on 15/08/2011 17:56 Steven Hartland said the following:
>> (kgdb) x/512a 0xffffff8d8f357210
> [snip]
> 
> Can you please also provide the following for this core?
> list *vm_map_growstack+93
> list *lim_cur+17
> list *lim_rlimit+18
> 
> Also, it would be interesting to get the panic output with the DDB option.

Here's the info:

(kgdb) list *vm_map_growstack+93
0xffffffff80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
3300            struct uidinfo *uip;
3301
3302    Retry:
3303            PROC_LOCK(p);
3304            stacklim = lim_cur(p, RLIMIT_STACK);
3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
3306            PROC_UNLOCK(p);
3307
3308            vm_map_lock_read(map);
3309
(kgdb) list *lim_cur+17
0xffffffff80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
1145    rlim_t
1146    lim_cur(struct proc *p, int which)
1147    {
1148            struct rlimit rl;
1149
1150            lim_rlimit(p, which, &rl);
1151            return (rl.rlim_cur);
1152    }
1153
1154    /*
(kgdb) list *lim_rlimit+18
0xffffffff80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
1160    {
1161
1162            PROC_LOCK_ASSERT(p, MA_OWNED);
1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
1164                ("request for invalid resource limit"));
1165            *rlp = p->p_limit->pl_rlimit[which];
1166            if (p->p_sysent->sv_fixlimit != NULL)
1167                    p->p_sysent->sv_fixlimit(rlp, which);
1168    }
1169

The machine with DDB and the expanded stack has yet to panic.

I plan to leave it for another day or so, then try a reboot to see if
that triggers it. If not, I'll drop the stack back down to 4 and see if
that lets us get another panic.

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 20:57:05 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 185D1106564A
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 20:57:05 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id E30978FC08
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 20:57:04 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 90ED446B09;
	Tue, 16 Aug 2011 16:57:04 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 0F98A8A02F;
	Tue, 16 Aug 2011 16:57:04 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Date: Tue, 16 Aug 2011 16:57:03 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <4E4A0C81.7020501@rawbw.com> <201108161157.20890.jhb@freebsd.org>
	<4E4ACAAD.3030506@rawbw.com>
In-Reply-To: <4E4ACAAD.3030506@rawbw.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108161657.03574.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 16 Aug 2011 16:57:04 -0400 (EDT)
Cc: Yuri <yuri@rawbw.com>, d@delphij.net, Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 20:57:05 -0000

On Tuesday, August 16, 2011 3:53:17 pm Yuri wrote:
> On 08/16/2011 08:57, John Baldwin wrote:
> > For multiport devices you will want to add an entry to sys/dev/puc/pucdata.c
> > and use the puc driver instead of patching uart directly.  Perhaps this:
> 
> John,
> 
> I did what you suggested:
> puc0: <WinChipHead Dual Port RS-232> port 0xe880-0xe887,0xe800-0xe807 
> irq 17 at device 1.0 on pci8
> 
> But it doesn't show up as a serial device or tty.

Hmmm, can you get the devinfo -v output?  Specifically, there should be two 
children of puc0, and each should carry extra data specifying what type of 
port that child device is.

-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 20:57:30 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B5C731065701;
	Tue, 16 Aug 2011 20:57:30 +0000 (UTC) (envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id 9D2DA8FC14;
	Tue, 16 Aug 2011 20:57:30 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7GKvQAb081091;
	Tue, 16 Aug 2011 13:57:26 -0700 (PDT) (envelope-from yuri@rawbw.com)
Message-ID: <4E4AD9B6.2030001@rawbw.com>
Date: Tue, 16 Aug 2011 13:57:26 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: d@delphij.net
References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net>
	<4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org>
	<4E4ACAAD.3030506@rawbw.com> <4E4AD50E.6050906@delphij.net>
In-Reply-To: <4E4AD50E.6050906@delphij.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org, Xin LI <delphij@delphij.net>,
	John Baldwin <jhb@freebsd.org>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 20:57:30 -0000

On 08/16/2011 13:37, Xin LI wrote:
> And I think John's patch is right, I've added a new PCI ID for it
> though, found from the datasheet.  Did you have uart(4) in your kernel
> (remove my old patch)?

Yes, uart(4) is in the kernel and puc(4) is loaded as a module. I think the
problem might be that puc(4) is a module loaded later, and that's why the
serial device isn't registered. I found a reference to a similar
situation with another card that was cured when puc(4) was compiled
into the kernel.
(http://www.adras.com/Quadtech-DSC-100-PCI-dual-serial-port-on-8-0R-i386.t6999-79.html)

I have yet to try building puc(4) into the kernel, but the way I have it
now is the default in GENERIC. Should uart(4) instead be removed from the
kernel and made loadable too, to prevent such an initialization order issue?
Or what would be the right fix? Having too much stuff in the kernel isn't
right either. uart probably isn't used by 99% of users.


Yuri

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 20:59:46 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63D4F1065672;
	Tue, 16 Aug 2011 20:59:46 +0000 (UTC) (envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E8578FC08;
	Tue, 16 Aug 2011 20:59:46 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7GKxhck081442;
	Tue, 16 Aug 2011 13:59:43 -0700 (PDT) (envelope-from yuri@rawbw.com)
Message-ID: <4E4ADA3E.1070309@rawbw.com>
Date: Tue, 16 Aug 2011 13:59:42 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <4E4A0C81.7020501@rawbw.com> <201108161157.20890.jhb@freebsd.org>
	<4E4ACAAD.3030506@rawbw.com> <201108161657.03574.jhb@freebsd.org>
In-Reply-To: <201108161657.03574.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: d@delphij.net, freebsd-stable@freebsd.org, Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 20:59:46 -0000

On 08/16/2011 13:57, John Baldwin wrote:
> Hmmm, can you get devinfo -v output?  Specifically there should be two
> children of puc0 and they should have extra data specifying what type of port
> each child device is.
>

Here is the only reference to puc0 in devinfo -v output:
<skip>
         pcib8 pnpinfo vendor=0x8086 device=0x244e subvendor=0x1043 subdevice=0x82d4 class=0x060401 at slot=30 function=0 handle=\_SB_.PCI0.P0P1
           pci8
             pcm0 pnpinfo vendor=0x1274 device=0x5000 subvendor=0x4942 subdevice=0x4c4c class=0x040100 at slot=0 function=0
             puc0 pnpinfo vendor=0x4348 device=0x3253 subvendor=0x4348 subdevice=0x3253 class=0x070002 at slot=1 function=0
         isab0 pnpinfo vendor=0x8086 device=0x3a16 subvendor=0x1043 subdevice=0x82d4 class=0x060100 at slot=31 function=0 handle=\_SB_.PCI0.SBRG
           isa0
             orm0
<skip>


Yuri

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 23:08:17 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 233281065670
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 23:08:17 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id EB0088FC12
	for <freebsd-stable@freebsd.org>; Tue, 16 Aug 2011 23:08:16 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 911A046B3B;
	Tue, 16 Aug 2011 19:08:16 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2B3688A02E;
	Tue, 16 Aug 2011 19:08:16 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Yuri <yuri@rawbw.com>
Date: Tue, 16 Aug 2011 19:03:43 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <4E4A0C81.7020501@rawbw.com> <201108161657.03574.jhb@freebsd.org>
	<4E4ADA3E.1070309@rawbw.com>
In-Reply-To: <4E4ADA3E.1070309@rawbw.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108161903.43881.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 16 Aug 2011 19:08:16 -0400 (EDT)
Cc: d@delphij.net, freebsd-stable@freebsd.org, Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 23:08:17 -0000

On Tuesday, August 16, 2011 4:59:42 pm Yuri wrote:
> On 08/16/2011 13:57, John Baldwin wrote:
> > Hmmm, can you get devinfo -v output?  Specifically there should be two
> > children of puc0 and they should have extra data specifying what type of
> > port each child device is.
> >
> 
> Here is the only reference to puc0 in devinfo -v output:
> <skip>
>          pcib8 pnpinfo vendor=0x8086 device=0x244e subvendor=0x1043 
> subdevice=0x82d4 class=0x060401 at slot=30 function=0 handle=\_SB_.PCI0.P0P1
>            pci8
>              pcm0 pnpinfo vendor=0x1274 device=0x5000 subvendor=0x4942 
> subdevice=0x4c4c class=0x040100 at slot=0 function=0
>              puc0 pnpinfo vendor=0x4348 device=0x3253 subvendor=0x4348 
> subdevice=0x3253 class=0x070002 at slot=1 function=0
>          isab0 pnpinfo vendor=0x8086 device=0x3a16 subvendor=0x1043 
> subdevice=0x82d4 class=0x060100 at slot=31 function=0 handle=\_SB_.PCI0.SBRG
>            isa0
>              orm0
> <skip>

Ugh, the dumb driver deletes ports if they don't probe, which is a ridiculous
thing for it to do.

-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG  Tue Aug 16 23:08:17 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 58A4B1065672;
	Tue, 16 Aug 2011 23:08:17 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 2EA2B8FC13;
	Tue, 16 Aug 2011 23:08:17 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id D087D46B43;
	Tue, 16 Aug 2011 19:08:16 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6AFF28A02F;
	Tue, 16 Aug 2011 19:08:16 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Yuri <yuri@rawbw.com>
Date: Tue, 16 Aug 2011 19:08:15 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <4E4A0C81.7020501@rawbw.com> <4E4AD50E.6050906@delphij.net>
	<4E4AD9B6.2030001@rawbw.com>
In-Reply-To: <4E4AD9B6.2030001@rawbw.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108161908.15840.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Tue, 16 Aug 2011 19:08:16 -0400 (EDT)
Cc: Xin LI <delphij@delphij.net>, Marcel Moolenaar <marcel@freebsd.org>,
	d@delphij.net, freebsd-stable@freebsd.org
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Aug 2011 23:08:17 -0000

On Tuesday, August 16, 2011 4:57:26 pm Yuri wrote:
> On 08/16/2011 13:37, Xin LI wrote:
> > And I think John's patch is right, I've added a new PCI ID for it
> > though, found from the datasheet.  Did you have uart(4) in your kernel
> > (remove my old patch)?
> 
> Yes, uart(4) is in the kernel and puc(4) is loaded as a module. I think the
> problem might be that puc(4) is a module loaded later, and that's why the
> serial device isn't registered. I found a reference to a similar
> situation with another card that was cured when puc(4) was compiled
> into the kernel.
> (http://www.adras.com/Quadtech-DSC-100-PCI-dual-serial-port-on-8-0R-i386.t6999-79.html)
> 
> I have yet to try building puc(4) into the kernel, but the way I have it
> now is the default in GENERIC. Should uart(4) instead be removed from the
> kernel and made loadable too, to prevent such an initialization order issue?
> Or what would be the right fix? Having too much stuff in the kernel isn't
> right either. uart probably isn't used by 99% of users.

Err, uart is in _lots_ of machines (just about every rack-mounted x86
server I've ever used).

The real bug here is the uart driver and the way it is compiled into
the kernel.  It should just always include the 'puc' attachment I
believe, or do so if any of the busses supported by 'puc' are compiled
in.  The puc attachment for uart is really tiny, and KOBJ is used in
new-bus specifically so that attachments don't require the full bus
driver to be present.  Something like this:

Index: files
===================================================================
--- files	(revision 224879)
+++ files	(working copy)
@@ -1842,7 +1842,7 @@ dev/uart/uart_bus_fdt.c		optional uart fdt
 dev/uart/uart_bus_isa.c		optional uart isa
 dev/uart/uart_bus_pccard.c	optional uart pccard
 dev/uart/uart_bus_pci.c		optional uart pci
-dev/uart/uart_bus_puc.c		optional uart puc
+dev/uart/uart_bus_puc.c		optional uart puc | uart pccard | uart pci
 dev/uart/uart_bus_scc.c		optional uart scc
 dev/uart/uart_core.c		optional uart
 dev/uart/uart_dbg.c		optional uart gdb


-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 01:24:11 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C9965106566C
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 01:24:11 +0000 (UTC)
	(envelope-from janm@transactionware.com)
Received: from midgard.transactionware.com (mail2.transactionware.com
	[203.14.245.36]) by mx1.freebsd.org (Postfix) with SMTP id 3173F8FC13
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 01:24:10 +0000 (UTC)
Received: (qmail 65962 invoked by uid 907); 17 Aug 2011 01:24:09 -0000
Received: from jmmacpro.transactionware.com (HELO
	jmmacpro.transactionware.com) (192.168.1.33)
	by midgard.transactionware.com (qpsmtpd/0.82) with ESMTP;
	Wed, 17 Aug 2011 11:24:09 +1000
Mime-Version: 1.0 (Apple Message framework v1244.3)
Content-Type: text/plain; charset=iso-8859-1
From: Jan Mikkelsen <janm@transactionware.com>
In-Reply-To: <4E4AD9B6.2030001@rawbw.com>
Date: Wed, 17 Aug 2011 11:24:09 +1000
Content-Transfer-Encoding: quoted-printable
Message-Id: <16D60EA7-85C7-486D-A722-50299407DC69@transactionware.com>
References: <4E4A0C81.7020501@rawbw.com> <4E4A20BE.3060603@delphij.net>
	<4E4A3788.3030605@rawbw.com> <201108161157.20890.jhb@freebsd.org>
	<4E4ACAAD.3030506@rawbw.com> <4E4AD50E.6050906@delphij.net>
	<4E4AD9B6.2030001@rawbw.com>
To: Yuri <yuri@rawbw.com>
X-Mailer: Apple Mail (2.1244.3)
Cc: freebsd-stable@freebsd.org, d@delphij.net, John Baldwin <jhb@freebsd.org>,
	Xin LI <delphij@delphij.net>
Subject: Re: How to use unrecognized COM port card?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 01:24:11 -0000

On 17/08/2011, at 6:57 AM, Yuri wrote:
> On 08/16/2011 13:37, Xin LI wrote:
>> And I think John's patch is right, I've added a new PCI ID for it
>> though, found from the datasheet.  Did you have uart(4) in your kernel
>> (remove my old patch)?
> 
> Yes, uart(4) is in the kernel and puc(4) is loaded as a module. I think the
> problem might be that puc(4) is a module loaded later, and that's why the
> serial device isn't registered. I found a reference to a similar
> situation with another card that was cured when puc(4) was compiled
> into the kernel.
> (http://www.adras.com/Quadtech-DSC-100-PCI-dual-serial-port-on-8-0R-i386.t6999-79.html)
> 
> I have yet to try building puc(4) into the kernel, but the way I have it
> now is the default in GENERIC. Should uart(4) instead be removed from the
> kernel and made loadable too, to prevent such an initialization order issue?
> Or what would be the right fix? Having too much stuff in the kernel isn't
> right either. uart probably isn't used by 99% of users.

For my recent Moxa 2-port serial card addition, I had to include puc in
the kernel config; it didn't work as a module.

Jan.


From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 11:12:35 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7FCE71065678
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 11:12:35 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id ABCBF8FC0C
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 11:12:34 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA22120;
	Wed, 17 Aug 2011 14:12:31 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4BA21F.6010805@FreeBSD.org>
Date: Wed, 17 Aug 2011 14:12:31 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
In-Reply-To: <6A7238AED44542A880B082A40304D940@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 11:12:36 -0000

on 16/08/2011 23:43 Steven Hartland said the following:
> 
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
> To: "Steven Hartland" <killing@multiplay.co.uk>
> Cc: <freebsd-stable@FreeBSD.org>
> Sent: Tuesday, August 16, 2011 9:30 PM
> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
> 
> 
>> on 15/08/2011 17:56 Steven Hartland said the following:
>>> (kgdb) x/512a 0xffffff8d8f357210
>> [snip]
>>
>> Can you please also provide the following for this core?
>> list *vm_map_growstack+93
>> list *lim_cur+17
>> list *lim_rlimit+18
>>
>> Also, it would be interesting to get panic output with DDB option.
> 
> Here's the info:-
> 
> (kgdb) list *vm_map_growstack+93
> 0xffffffff80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
> 3300            struct uidinfo *uip;
> 3301
> 3302    Retry:
> 3303            PROC_LOCK(p);
> 3304            stacklim = lim_cur(p, RLIMIT_STACK);
> 3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
> 3306            PROC_UNLOCK(p);
> 3307
> 3308            vm_map_lock_read(map);
> 3309
> (kgdb) list *lim_cur+17
> 0xffffffff80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
> 1145    rlim_t
> 1146    lim_cur(struct proc *p, int which)
> 1147    {
> 1148            struct rlimit rl;
> 1149
> 1150            lim_rlimit(p, which, &rl);
> 1151            return (rl.rlim_cur);
> 1152    }
> 1153
> 1154    /*
> (kgdb) list *lim_rlimit+18
> 0xffffffff80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
> 1160    {
> 1161
> 1162            PROC_LOCK_ASSERT(p, MA_OWNED);
> 1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
> 1164                ("request for invalid resource limit"));
> 1165            *rlp = p->p_limit->pl_rlimit[which];
> 1166            if (p->p_sysent->sv_fixlimit != NULL)
> 1167                    p->p_sysent->sv_fixlimit(rlp, which);
> 1168    }
> 1169
> 
> I've yet to have the machine with DDB + expanded stack panic.
> 
> I plan to leave it a day or so more then try a reboot to see if that
> triggers it. If not I'll drop the stack back down to 4 and see if that
> enables us to get another panic.

OK, thank you for continuing to debug this!
Another request: could you please execute the following commands in kgdb on the
above core file?

define allpcpu
set $i = 0
while ($i <= mp_maxid)
p *cpuid_to_pcpu[$i]
set $i = $i + 1
end
end
allpcpu
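
For anyone not used to gdb scripting: the macro above walks the per-CPU
table from slot 0 through mp_maxid inclusive and prints each entry. A rough
userland C equivalent of that loop is sketched below. struct pcpu_sketch and
count_pcpu() are made-up names for illustration only, and the NULL check is
an assumption on my part (the kgdb macro itself dereferences every slot).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct pcpu. */
struct pcpu_sketch {
	unsigned int pc_cpuid;
};

/*
 * Visit slots 0..maxid inclusive, like the "while ($i <= mp_maxid)" loop,
 * and count the populated ones.  Skipping NULL slots is an assumption:
 * it guards against CPU IDs that are not densely populated.
 */
static unsigned int
count_pcpu(struct pcpu_sketch *const table[], unsigned int maxid)
{
	unsigned int i, n = 0;

	assert(table != NULL);
	for (i = 0; i <= maxid; i++)
		if (table[i] != NULL)
			n++;
	return (n);
}
```

Note the inclusive upper bound: mp_maxid is the highest CPU ID, not a count.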


A little bit later I will send you another patch that, I hope, will produce better
diagnostics for this crash (without DDB in kernel).

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 11:26:55 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 32EB1106564A
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 11:26:55 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 75DBD8FC0A
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 11:26:54 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA22369;
	Wed, 17 Aug 2011 14:26:51 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4BA57B.6050407@FreeBSD.org>
Date: Wed, 17 Aug 2011 14:26:51 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
In-Reply-To: <4E4BA21F.6010805@FreeBSD.org>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 11:26:55 -0000

on 17/08/2011 14:12 Andriy Gapon said the following:
> A little bit later I will send you another patch that, I hope, will produce better
> diagnostics for this crash (without DDB in kernel).

The patch:
Index: sys/amd64/amd64/trap.c
===================================================================
--- sys/amd64/amd64/trap.c	(revision 224782)
+++ sys/amd64/amd64/trap.c	(working copy)
@@ -198,6 +198,10 @@
 	PCPU_INC(cnt.v_trap);
 	type = frame->tf_trapno;

+	if ((uintptr_t)frame->tf_rip >= (uintptr_t)&lim_rlimit
+	    && (uintptr_t)frame->tf_rip < (uintptr_t)&lim_rlimit + 40)
+		panic("trap in lim_rlimit");
+
 #ifdef SMP
 	/* Handler for NMI IPIs used for stopping CPUs. */
 	if (type == T_NMI) {
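
The check added by the patch is a half-open address range test: it fires
when the faulting instruction pointer lies in [&lim_rlimit, &lim_rlimit + 40).
Below is a small userland sketch of the same test, for illustration only.
pc_in_range() is a hypothetical helper name, not something in the tree, and
the wraparound assert is an extra that the patch does not have.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if pc lies in the half-open range [start, start + len),
 * 0 otherwise.  This mirrors the two comparisons in the patch.
 */
static int
pc_in_range(uintptr_t pc, uintptr_t start, uintptr_t len)
{
	/* Added guard (not in the patch): the range must not wrap around. */
	assert(start <= UINTPTR_MAX - len);
	return (pc >= start && pc < start + len);
}
```

In terms of the patch, pc is frame->tf_rip, start is (uintptr_t)&lim_rlimit
and len is 40; the offset seen in the earlier kgdb listing (lim_rlimit+18)
falls inside that window.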

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 12:26:10 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 752571065678;
	Wed, 17 Aug 2011 12:26:10 +0000 (UTC)
	(envelope-from prvs=1210f20b9f=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 8D7208FC2A;
	Wed, 17 Aug 2011 12:26:08 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 13:14:29 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 13:14:29 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014633673.msg;
	Wed, 17 Aug 2011 13:14:27 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
Date: Wed, 17 Aug 2011 13:15:04 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 12:26:10 -0000


----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: <freebsd-stable@FreeBSD.org>
Sent: Wednesday, August 17, 2011 12:12 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


> on 16/08/2011 23:43 Steven Hartland said the following:
>>
>> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>> To: "Steven Hartland" <killing@multiplay.co.uk>
>> Cc: <freebsd-stable@FreeBSD.org>
>> Sent: Tuesday, August 16, 2011 9:30 PM
>> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
>>
>>
>>> on 15/08/2011 17:56 Steven Hartland said the following:
>>>> (kgdb) x/512a 0xffffff8d8f357210
>>> [snip]
>>>
>>> Can you please also provide the following for this core?
>>> list *vm_map_growstack+93
>>> list *lim_cur+17
>>> list *lim_rlimit+18
>>>
>>> Also, it would be interesting to get panic output with DDB option.
>>
>> Here's the info:-
>>
>> (kgdb) list *vm_map_growstack+93
>> 0xffffffff80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
>> 3300            struct uidinfo *uip;
>> 3301
>> 3302    Retry:
>> 3303            PROC_LOCK(p);
>> 3304            stacklim = lim_cur(p, RLIMIT_STACK);
>> 3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
>> 3306            PROC_UNLOCK(p);
>> 3307
>> 3308            vm_map_lock_read(map);
>> 3309
>> (kgdb) list *lim_cur+17
>> 0xffffffff80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
>> 1145    rlim_t
>> 1146    lim_cur(struct proc *p, int which)
>> 1147    {
>> 1148            struct rlimit rl;
>> 1149
>> 1150            lim_rlimit(p, which, &rl);
>> 1151            return (rl.rlim_cur);
>> 1152    }
>> 1153
>> 1154    /*
>> (kgdb) list *lim_rlimit+18
>> 0xffffffff80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
>> 1160    {
>> 1161
>> 1162            PROC_LOCK_ASSERT(p, MA_OWNED);
>> 1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
>> 1164                ("request for invalid resource limit"));
>> 1165            *rlp = p->p_limit->pl_rlimit[which];
>> 1166            if (p->p_sysent->sv_fixlimit != NULL)
>> 1167                    p->p_sysent->sv_fixlimit(rlp, which);
>> 1168    }
>> 1169
>>
>> I've yet to have the machine with DDB + expanded stack panic.
>>
>> I plan to leave it a day or so more then try a reboot to see if that
>> triggers it. If not I'll drop the stack back down to 4 and see if that
>> enables us to get another panic.
>
> OK, thank you for continuing to debug this!

No, thank you for the help :)

> Another request: could you please execute the following commands in kgdb on the
> above core file?
>
> define allpcpu
> set $i = 0
> while ($i <= mp_maxid)
> p *cpuid_to_pcpu[$i]
> set $i = $i + 1
> end
> end
> allpcpu

Here's the output.

$1 = {pc_curthread = 0xffffff0012d708c0, pc_idlethread = 0xffffff0012d838c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000149d00, pc_switchtime = 564139965450231, pc_switchticks = 247796551, pc_cpuid = 0,
  pc_cpumask = 1, pc_other_cpus = 16777214, pc_allcpu = {sle_next = 0x0}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1246344506, 
v_trap = 121031682, v_syscall = 2590785278, v_intr = 866415, v_soft = 174249227,
    v_vm_faults = 24640099, v_cow_faults = 2606934, v_cow_optim = 678, v_zfod = 19177479, v_ozfod = 0, v_swapin = 0, v_swapout = 
0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 24007, v_vnodeout = 41, v_vnodepgsin = 24007,
    v_vnodepgsout = 322, v_intrans = 7300, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 25056637, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 35906, v_vforks = 21218, v_rforks = 0, v_kthreads = 20, v_forkpages = 
9357854, v_vforkpages = 4445028, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {9035196,
    1438, 426481, 1091491, 22402335}, pc_device = 0xffffff0012da2700, pc_netisr = 0xffffff0012cfe500, pc_rm_queue = {rmq_next = 
0xffffffff808af550, rmq_prev = 0xffffffff808af550}, pc_dynamic = 3737856,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808af400, pc_curpmap = 0xffffff0012d74ef8, pc_tssp = 
0xffffffff808ae700, pc_commontssp = 0xffffffff808ae700, pc_rsp0 = -549754462976,
  pc_scratch_rsp = 140737488348968, pc_apic_id = 0, pc_acpi_id = 1, pc_fs32p = 0xffffffff808ad530, pc_gs32p = 0xffffffff808ad538, 
pc_ldt = 0xffffffff808ad578, pc_tss = 0xffffffff808ad568, pc_cmci_mask = 364}
$2 = {pc_curthread = 0xffffff0012d85000, pc_idlethread = 0xffffff0012d85000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff80001bcd00, pc_switchtime = 564139964769035, pc_switchticks = 247796551, pc_cpuid = 1,
  pc_cpumask = 2, pc_other_cpus = 16777213, pc_allcpu = {sle_next = 0xffffffff808af400}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
457697994, v_trap = 61700571, v_syscall = 670428238, v_intr = 298981, v_soft = 58852682,
    v_vm_faults = 7228810, v_cow_faults = 442573, v_cow_optim = 116, v_zfod = 6082240, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5151, v_vnodeout = 50, v_vnodepgsin = 5151,
    v_vnodepgsout = 397, v_intrans = 5575, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 8282005, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 10015, v_vforks = 11459, v_rforks = 0, v_kthreads = 0, v_forkpages = 
2626771, v_vforkpages = 2444076, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8641747,
    395, 532547, 157762, 23624411}, pc_device = 0xffffff0012da2600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808af7d0, 
rmq_prev = 0xffffffff808af7d0}, pc_dynamic = 18446743526093297920,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808af680, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808ae768, pc_commontssp = 0xffffffff808ae768, pc_rsp0 = -549753991936,
  pc_scratch_rsp = 140737488339432, pc_apic_id = 1, pc_acpi_id = 13, pc_fs32p = 0xffffffff808ad598, pc_gs32p = 0xffffffff808ad5a0, 
pc_ldt = 0xffffffff808ad5e0, pc_tss = 0xffffffff808ad5d0, pc_cmci_mask = 0}
$3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8d8f35ad00, pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2,
  pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next = 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
1005391948, v_trap = 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308,
    v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod = 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 
0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238,
    v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 15435380, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857, v_rforks = 0, v_kthreads = 0, v_forkpages = 
6281292, v_vforkpages = 3606842, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094,
    693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808afa50, 
rmq_prev = 0xffffffff808afa50}, pc_dynamic = 18446743526093326592,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808af900, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808ae7d0, pc_commontssp = 0xffffffff808ae7d0, pc_rsp0 = -491518579456,
  pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p = 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, 
pc_ldt = 0xffffffff808ad648, pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8}
$4 = {pc_curthread = 0xffffff0012d858c0, pc_idlethread = 0xffffff0012d858c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff80001b2d00, pc_switchtime = 564139960864579, pc_switchticks = 247796549, pc_cpuid = 3,
  pc_cpumask = 8, pc_other_cpus = 16777207, pc_allcpu = {sle_next = 0xffffffff808af900}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
375825838, v_trap = 57311463, v_syscall = 571437816, v_intr = 126334, v_soft = 46300913,
    v_vm_faults = 6398769, v_cow_faults = 365115, v_cow_optim = 101, v_zfod = 5434860, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 6044, v_vnodeout = 16, v_vnodepgsin = 6044,
    v_vnodepgsout = 128, v_intrans = 5456, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 7824928, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 8975, v_vforks = 11796, v_rforks = 0, v_kthreads = 0, v_forkpages = 
2359166, v_vforkpages = 2604538, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8374580, 378,
    189751, 113208, 24278945}, pc_device = 0xffffff0012eee600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808afcd0, 
rmq_prev = 0xffffffff808afcd0}, pc_dynamic = 18446743526093355264,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808afb80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808ae838, pc_commontssp = 0xffffffff808ae838, pc_rsp0 = -549754032896,
  pc_scratch_rsp = 140737488341768, pc_apic_id = 3, pc_acpi_id = 14, pc_fs32p = 0xffffffff808ad668, pc_gs32p = 0xffffffff808ad670, 
pc_ldt = 0xffffffff808ad6b0, pc_tss = 0xffffffff808ad6a0, pc_cmci_mask = 36}
$5 = {pc_curthread = 0xffffff0016ef7460, pc_idlethread = 0xffffff0012d7e000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8d8d249d00, pc_switchtime = 564139958831726, pc_switchticks = 247796548, pc_cpuid = 4,
  pc_cpumask = 16, pc_other_cpus = 16777199, pc_allcpu = {sle_next = 0xffffffff808afb80}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
806444301, v_trap = 81626382, v_syscall = 1826349511, v_intr = 123653, v_soft = 144961951,
    v_vm_faults = 9705936, v_cow_faults = 966760, v_cow_optim = 329, v_zfod = 7605338, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 7070, v_vnodeout = 38, v_vnodepgsin = 7070,
    v_vnodepgsout = 298, v_intrans = 6176, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 10505534, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 16806, v_vforks = 12551, v_rforks = 0, v_kthreads = 0, v_forkpages = 
4380008, v_vforkpages = 2702450, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8548739,
    486, 66166, 155291, 24186180}, pc_device = 0xffffff0012eee500, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808aff50, 
rmq_prev = 0xffffffff808aff50}, pc_dynamic = 18446743526093383936,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808afe00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808ae8a0, pc_commontssp = 0xffffffff808ae8a0, pc_rsp0 = -491553252096,
  pc_scratch_rsp = 140737488347240, pc_apic_id = 4, pc_acpi_id = 3, pc_fs32p = 0xffffffff808ad6d0, pc_gs32p = 0xffffffff808ad6d8, 
pc_ldt = 0xffffffff808ad718, pc_tss = 0xffffffff808ad708, pc_cmci_mask = 44}
$6 = {pc_curthread = 0xffffff0016d40460, pc_idlethread = 0xffffff0012d7e460, pc_fpcurthread = 0xffffff0016d40460, pc_deadthread = 
0x0, pc_curpcb = 0xffffff8d8d47ed00, pc_switchtime = 564139958865046,
  pc_switchticks = 247796548, pc_cpuid = 5, pc_cpumask = 32, pc_other_cpus = 16777183, pc_allcpu = {sle_next = 
0xffffffff808afe00}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 328647871, v_trap = 51585678, v_syscall = 541729242,
    v_intr = 195389, v_soft = 45565082, v_vm_faults = 5629366, v_cow_faults = 317486, v_cow_optim = 82, v_zfod = 4813949, v_ozfod 
= 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4750,
    v_vnodeout = 16, v_vnodepgsin = 4750, v_vnodepgsout = 125, v_intrans = 4461, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 
0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 6982766, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7103, v_vforks = 10517, v_rforks = 0, 
v_kthreads = 0, v_forkpages = 1858863, v_vforkpages = 2375468, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {8156742, 242, 143793, 754, 24655331}, pc_device = 0xffffff0012eee400, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b01d0, rmq_prev = 0xffffffff808b01d0}, pc_dynamic = 18446743526093412608,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0080, pc_curpmap = 0xffffff00285acd70, pc_tssp = 
0xffffffff808ae908, pc_commontssp = 0xffffffff808ae908, pc_rsp0 = -491550937856,
  pc_scratch_rsp = 140737488338216, pc_apic_id = 5, pc_acpi_id = 15, pc_fs32p = 0xffffffff808ad738, pc_gs32p = 0xffffffff808ad740, 
pc_ldt = 0xffffffff808ad780, pc_tss = 0xffffffff808ad770, pc_cmci_mask = 44}
$7 = {pc_curthread = 0xffffff0012d7e8c0, pc_idlethread = 0xffffff0012d7e8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff80001a3d00, pc_switchtime = 564139963916274, pc_switchticks = 247796550, pc_cpuid = 6,
  pc_cpumask = 64, pc_other_cpus = 16777151, pc_allcpu = {sle_next = 0xffffffff808b0080}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
571134015, v_trap = 71997786, v_syscall = 1463142320, v_intr = 279742, v_soft = 132911942,
    v_vm_faults = 7791389, v_cow_faults = 708630, v_cow_optim = 253, v_zfod = 6277796, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 6392, v_vnodeout = 39, v_vnodepgsin = 6392,
    v_vnodepgsout = 312, v_intrans = 5737, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 9292572, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 11420, v_vforks = 9776, v_rforks = 0, v_kthreads = 0, v_forkpages = 
2973042, v_vforkpages = 2103188, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8387371, 350,
    74084, 53153, 24441904}, pc_device = 0xffffff0012eee300, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b0450, 
rmq_prev = 0xffffffff808b0450}, pc_dynamic = 18446743526093441280,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0300, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808ae970, pc_commontssp = 0xffffffff808ae970, pc_rsp0 = -549754094336,
  pc_scratch_rsp = 140737488347240, pc_apic_id = 16, pc_acpi_id = 4, pc_fs32p = 0xffffffff808ad7a0, pc_gs32p = 0xffffffff808ad7a8, 
pc_ldt = 0xffffffff808ad7e8, pc_tss = 0xffffffff808ad7d8, pc_cmci_mask = 44}
$8 = {pc_curthread = 0xffffff0012d7f000, pc_idlethread = 0xffffff0012d7f000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff800019ed00, pc_switchtime = 564139961406818, pc_switchticks = 247796549, pc_cpuid = 7,
  pc_cpumask = 128, pc_other_cpus = 16777087, pc_allcpu = {sle_next = 0xffffffff808b0300}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
249485946, v_trap = 42612704, v_syscall = 513323841, v_intr = 158985, v_soft = 49793772,
    v_vm_faults = 4953550, v_cow_faults = 288574, v_cow_optim = 66, v_zfod = 4279446, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5127, v_vnodeout = 18, v_vnodepgsin = 5127,
    v_vnodepgsout = 144, v_intrans = 4430, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 6781637, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7191, v_vforks = 8106, v_rforks = 0, v_kthreads = 0, v_forkpages = 
1911571, v_vforkpages = 1791118, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7834102, 189,
    67132, 7190, 25048249}, pc_device = 0xffffff0012eee200, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b06d0, 
rmq_prev = 0xffffffff808b06d0}, pc_dynamic = 18446743526093469952,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0580, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808ae9d8, pc_commontssp = 0xffffffff808ae9d8, pc_rsp0 = -549754114816,
  pc_scratch_rsp = 140737488341768, pc_apic_id = 17, pc_acpi_id = 16, pc_fs32p = 0xffffffff808ad808, pc_gs32p = 
0xffffffff808ad810, pc_ldt = 0xffffffff808ad850, pc_tss = 0xffffffff808ad840, pc_cmci_mask = 44}
$9 = {pc_curthread = 0xffffff0012d7f460, pc_idlethread = 0xffffff0012d7f460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000199d00, pc_switchtime = 564139961215887, pc_switchticks = 247796549, pc_cpuid = 8,
  pc_cpumask = 256, pc_other_cpus = 16776959, pc_allcpu = {sle_next = 0xffffffff808b0580}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
464956409, v_trap = 64334946, v_syscall = 1027059020, v_intr = 0, v_soft = 93052690,
    v_vm_faults = 6917455, v_cow_faults = 567595, v_cow_optim = 160, v_zfod = 5697686, v_ozfod = 0, v_swapin = 0, v_swapout = 0, 
v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4817, v_vnodeout = 36, v_vnodepgsin = 4817,
    v_vnodepgsout = 285, v_intrans = 5954, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 7425675, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9992, v_vforks = 8433, v_rforks = 0, v_kthreads = 0, v_forkpages = 
2600081, v_vforkpages = 1793522, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {7987187, 281,
    218255, 10560, 24740579}, pc_device = 0xffffff0012eee100, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 0xffffffff808b0950, 
rmq_prev = 0xffffffff808b0950}, pc_dynamic = 18446743526093498624,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0800, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aea40, pc_commontssp = 0xffffffff808aea40, pc_rsp0 = -549754135296,
  pc_scratch_rsp = 140737488349480, pc_apic_id = 18, pc_acpi_id = 5, pc_fs32p = 0xffffffff808ad870, pc_gs32p = 0xffffffff808ad878, 
pc_ldt = 0xffffffff808ad8b8, pc_tss = 0xffffffff808ad8a8, pc_cmci_mask = 44}
$10 = {pc_curthread = 0xffffff0012d7f8c0, pc_idlethread = 0xffffff0012d7f8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000194d00, pc_switchtime = 564139962352563, pc_switchticks = 247796549,
  pc_cpuid = 9, pc_cpumask = 512, pc_other_cpus = 16776703, pc_allcpu = {sle_next = 0xffffffff808b0800}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 274982887, v_trap = 31399978, v_syscall = 444601703, v_intr = 3206219,
    v_soft = 40294508, v_vm_faults = 4841563, v_cow_faults = 271228, v_cow_optim = 62, v_zfod = 4257729, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3551, v_vnodeout = 18,
    v_vnodepgsin = 3551, v_vnodepgsout = 144, v_intrans = 4426, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 5832072, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 6130, v_vforks = 6921, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 1593321, v_vforkpages = 1503680, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7446553, 129, 81200, 80535, 25348445}, pc_device = 0xffffff0012eee000, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b0bd0, rmq_prev = 0xffffffff808b0bd0}, pc_dynamic = 18446743526093527296,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0a80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aeaa8, pc_commontssp = 0xffffffff808aeaa8, pc_rsp0 = -549754155776,
  pc_scratch_rsp = 140737488350536, pc_apic_id = 19, pc_acpi_id = 17, pc_fs32p = 0xffffffff808ad8d8, pc_gs32p = 
0xffffffff808ad8e0, pc_ldt = 0xffffffff808ad920, pc_tss = 0xffffffff808ad910, pc_cmci_mask = 44}
$11 = {pc_curthread = 0xffffff0012d81000, pc_idlethread = 0xffffff0012d81000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff800018fd00, pc_switchtime = 564139960703995, pc_switchticks = 247796549,
  pc_cpuid = 10, pc_cpumask = 1024, pc_other_cpus = 16776191, pc_allcpu = {sle_next = 0xffffffff808b0a80}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 409239430, v_trap = 63190665, v_syscall = 916205775, v_intr = 0,
    v_soft = 84751808, v_vm_faults = 5931079, v_cow_faults = 475456, v_cow_optim = 101, v_zfod = 4924752, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3365, v_vnodeout = 31,
    v_vnodepgsin = 3365, v_vnodepgsout = 248, v_intrans = 5616, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 6799603, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7789, v_vforks = 7721, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 2032846, v_vforkpages = 1672147, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7664964, 263, 457601, 10909, 24823125}, pc_device = 0xffffff0012e73e00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b0e50, rmq_prev = 0xffffffff808b0e50}, pc_dynamic = 18446743526093555968,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0d00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aeb10, pc_commontssp = 0xffffffff808aeb10, pc_rsp0 = -549754176256,
  pc_scratch_rsp = 140737488349208, pc_apic_id = 20, pc_acpi_id = 6, pc_fs32p = 0xffffffff808ad940, pc_gs32p = 0xffffffff808ad948, 
pc_ldt = 0xffffffff808ad988, pc_tss = 0xffffffff808ad978, pc_cmci_mask = 12}
$12 = {pc_curthread = 0xffffff0012d7c000, pc_idlethread = 0xffffff0012d7c000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff800018ad00, pc_switchtime = 564139964659179, pc_switchticks = 247796550,
  pc_cpuid = 11, pc_cpumask = 2048, pc_other_cpus = 16775167, pc_allcpu = {sle_next = 0xffffffff808b0d00}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 286462305, v_trap = 43327710, v_syscall = 585895149, v_intr = 0,
    v_soft = 58132961, v_vm_faults = 4529044, v_cow_faults = 253158, v_cow_optim = 51, v_zfod = 3997600, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2889, v_vnodeout = 14,
    v_vnodepgsin = 2889, v_vnodepgsout = 112, v_intrans = 4397, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 5501089, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 5059, v_vforks = 7478, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 1317908, v_vforkpages = 1695236, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {6940693, 83, 34106, 26044, 25955936}, pc_device = 0xffffff0012e73d00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b10d0, rmq_prev = 0xffffffff808b10d0}, pc_dynamic = 18446743526093584640,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b0f80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aeb78, pc_commontssp = 0xffffffff808aeb78, pc_rsp0 = -549754196736,
  pc_scratch_rsp = 140737488341352, pc_apic_id = 21, pc_acpi_id = 18, pc_fs32p = 0xffffffff808ad9a8, pc_gs32p = 
0xffffffff808ad9b0, pc_ldt = 0xffffffff808ad9f0, pc_tss = 0xffffffff808ad9e0, pc_cmci_mask = 32}
$13 = {pc_curthread = 0xffffff0012d7c460, pc_idlethread = 0xffffff0012d7c460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000185d00, pc_switchtime = 564139964135679, pc_switchticks = 247796550,
  pc_cpuid = 12, pc_cpumask = 4096, pc_other_cpus = 16773119, pc_allcpu = {sle_next = 0xffffffff808b0f80}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 1452228911, v_trap = 138992214, v_syscall = 3481775635, v_intr = 0,
    v_soft = 170473572, v_vm_faults = 25297505, v_cow_faults = 2415176, v_cow_optim = 725, v_zfod = 20124416, v_ozfod = 0, 
v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 20013, v_vnodeout = 39,
    v_vnodepgsin = 20013, v_vnodepgsout = 303, v_intrans = 8910, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 26563945, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 32999, v_vforks = 20923, v_rforks = 0, 
v_kthreads = 0, v_forkpages = 8640201, v_vforkpages = 4383325, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {9099202, 1744, 94283, 697, 23760936}, pc_device = 0xffffff0012e73c00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b1350, rmq_prev = 0xffffffff808b1350}, pc_dynamic = 18446743526093613312,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b1200, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aebe0, pc_commontssp = 0xffffffff808aebe0, pc_rsp0 = -549754217216,
  pc_scratch_rsp = 140737488348728, pc_apic_id = 32, pc_acpi_id = 7, pc_fs32p = 0xffffffff808ada10, pc_gs32p = 0xffffffff808ada18, 
pc_ldt = 0xffffffff808ada58, pc_tss = 0xffffffff808ada48, pc_cmci_mask = 40}
$14 = {pc_curthread = 0xffffff081149b460, pc_idlethread = 0xffffff0012d7c8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8d8e5a9d00, pc_switchtime = 564139964136603, pc_switchticks = 247796550,
  pc_cpuid = 13, pc_cpumask = 8192, pc_other_cpus = 16769023, pc_allcpu = {sle_next = 0xffffffff808b1200}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 476914812, v_trap = 74744115, v_syscall = 936915699, v_intr = 0,
    v_soft = 80021803, v_vm_faults = 7523961, v_cow_faults = 390299, v_cow_optim = 133, v_zfod = 6338882, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3069, v_vnodeout = 31,
    v_vnodepgsin = 3069, v_vnodepgsout = 242, v_intrans = 7411, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 8400243, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9846, v_vforks = 12050, v_rforks = 0, 
v_kthreads = 0, v_forkpages = 2609276, v_vforkpages = 2565142, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {8441142, 622, 122914, 718, 24391466}, pc_device = 0xffffff0012e73b00, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b15d0, rmq_prev = 0xffffffff808b15d0}, pc_dynamic = 18446743526093641984,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b1480, pc_curpmap = 0xffffff017af0a440, pc_tssp = 
0xffffffff808aec48, pc_commontssp = 0xffffffff808aec48, pc_rsp0 = -491532935936,
  pc_scratch_rsp = 140737488347000, pc_apic_id = 33, pc_acpi_id = 19, pc_fs32p = 0xffffffff808ada78, pc_gs32p = 
0xffffffff808ada80, pc_ldt = 0xffffffff808adac0, pc_tss = 0xffffffff808adab0, pc_cmci_mask = 44}
$15 = {pc_curthread = 0xffffff0012d7d000, pc_idlethread = 0xffffff0012d7d000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff800017bd00, pc_switchtime = 564139961068041, pc_switchticks = 247796549,
  pc_cpuid = 14, pc_cpumask = 16384, pc_other_cpus = 16760831, pc_allcpu = {sle_next = 0xffffffff808b1480}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 1106841263, v_trap = 97803361, v_syscall = 2661991003, v_intr = 0,
    v_soft = 167213229, v_vm_faults = 11378004, v_cow_faults = 1160991, v_cow_optim = 428, v_zfod = 8894351, v_ozfod = 0, v_swapin 
= 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 12399, v_vnodeout = 42,
    v_vnodepgsin = 12399, v_vnodepgsout = 333, v_intrans = 8773, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 11845093, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 18853, v_vforks = 12152, v_rforks = 0, 
v_kthreads = 0, v_forkpages = 4923904, v_vforkpages = 2572782, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {8264202, 687, 502611, 825769, 23363593}, pc_device = 0xffffff0012e73a00, pc_netisr = 0x0, pc_rm_queue = {rmq_next 
= 0xffffffff808b1850, rmq_prev = 0xffffffff808b1850}, pc_dynamic = 18446743526093670656,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b1700, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aecb0, pc_commontssp = 0xffffffff808aecb0, pc_rsp0 = -549754258176,
  pc_scratch_rsp = 140737488332648, pc_apic_id = 34, pc_acpi_id = 8, pc_fs32p = 0xffffffff808adae0, pc_gs32p = 0xffffffff808adae8, 
pc_ldt = 0xffffffff808adb28, pc_tss = 0xffffffff808adb18, pc_cmci_mask = 296}
$16 = {pc_curthread = 0xffffff0012d7d460, pc_idlethread = 0xffffff0012d7d460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000176d00, pc_switchtime = 564139964473317, pc_switchticks = 247796550,
  pc_cpuid = 15, pc_cpumask = 32768, pc_other_cpus = 16744447, pc_allcpu = {sle_next = 0xffffffff808b1700}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 403303982, v_trap = 62431250, v_syscall = 772577344, v_intr = 0,
    v_soft = 66350085, v_vm_faults = 6382252, v_cow_faults = 350483, v_cow_optim = 113, v_zfod = 5469113, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4512, v_vnodeout = 25,
    v_vnodepgsin = 4512, v_vnodepgsout = 190, v_intrans = 7276, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 7254252, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9383, v_vforks = 9353, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 2458231, v_vforkpages = 2018955, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {8065725, 424, 321250, 606, 24568857}, pc_device = 0xffffff0012e73900, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b1ad0, rmq_prev = 0xffffffff808b1ad0}, pc_dynamic = 18446743526093699328,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b1980, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aed18, pc_commontssp = 0xffffffff808aed18, pc_rsp0 = -549754278656,
  pc_scratch_rsp = 140737488341304, pc_apic_id = 35, pc_acpi_id = 20, pc_fs32p = 0xffffffff808adb48, pc_gs32p = 
0xffffffff808adb50, pc_ldt = 0xffffffff808adb90, pc_tss = 0xffffffff808adb80, pc_cmci_mask = 68}
$17 = {pc_curthread = 0xffffff0012d7d8c0, pc_idlethread = 0xffffff0012d7d8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000171d00, pc_switchtime = 564139964351635, pc_switchticks = 247796550,
  pc_cpuid = 16, pc_cpumask = 65536, pc_other_cpus = 16711679, pc_allcpu = {sle_next = 0xffffffff808b1980}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 877084980, v_trap = 88142770, v_syscall = 2088145984, v_intr = 0,
    v_soft = 177372820, v_vm_faults = 7362548, v_cow_faults = 753161, v_cow_optim = 299, v_zfod = 5798775, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3826, v_vnodeout = 21,
    v_vnodepgsin = 3826, v_vnodepgsout = 168, v_intrans = 8678, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 8584916, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 11478, v_vforks = 9188, v_rforks = 0, 
v_kthreads = 0, v_forkpages = 2993442, v_vforkpages = 1934018, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {9852924, 500, 2875093, 71501, 20156844}, pc_device = 0xffffff0012e73800, pc_netisr = 0x0, pc_rm_queue = {rmq_next 
= 0xffffffff808b1d50, rmq_prev = 0xffffffff808b1d50}, pc_dynamic = 18446743526093728000,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b1c00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aed80, pc_commontssp = 0xffffffff808aed80, pc_rsp0 = -549754299136,
  pc_scratch_rsp = 140737488347240, pc_apic_id = 36, pc_acpi_id = 9, pc_fs32p = 0xffffffff808adbb0, pc_gs32p = 0xffffffff808adbb8, 
pc_ldt = 0xffffffff808adbf8, pc_tss = 0xffffffff808adbe8, pc_cmci_mask = 8}
$18 = {pc_curthread = 0xffffff0016b94000, pc_idlethread = 0xffffff0012d71460, pc_fpcurthread = 0xffffff0016b94000, pc_deadthread = 
0x0, pc_curpcb = 0xffffff8d8d389d00, pc_switchtime = 564139958856111,
  pc_switchticks = 247796548, pc_cpuid = 17, pc_cpumask = 131072, pc_other_cpus = 16646143, pc_allcpu = {sle_next = 
0xffffffff808b1c00}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 381936054, v_trap = 50211651,
    v_syscall = 515466461, v_intr = 0, v_soft = 45237711, v_vm_faults = 6414094, v_cow_faults = 389134, v_cow_optim = 137, v_zfod 
= 5507950, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0,
    v_vnodein = 5530, v_vnodeout = 53, v_vnodepgsin = 5530, v_vnodepgsout = 424, v_intrans = 6725, v_reactivated = 0, v_pdwakeups 
= 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 7461119, v_page_size = 0,
    v_page_count = 0, v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 
0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0,
    v_cache_max = 0, v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 10518, v_vforks = 8957, 
v_rforks = 0, v_kthreads = 0, v_forkpages = 2736040, v_vforkpages = 1935285, v_rforkpages = 0,
    v_kthreadpages = 0}, pc_cp_time = {7863823, 191, 438619, 69173, 24585056}, pc_device = 0xffffff0012e73700, pc_netisr = 0x0, 
pc_rm_queue = {rmq_next = 0xffffffff808b1fd0, rmq_prev = 0xffffffff808b1fd0},
  pc_dynamic = 18446743526093756672, pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b1e80, pc_curpmap = 
0xffffffff8083ea50, pc_tssp = 0xffffffff808aede8, pc_commontssp = 0xffffffff808aede8,
  pc_rsp0 = -491551941376, pc_scratch_rsp = 140737488341208, pc_apic_id = 37, pc_acpi_id = 21, pc_fs32p = 0xffffffff808adc18, 
pc_gs32p = 0xffffffff808adc20, pc_ldt = 0xffffffff808adc60, pc_tss = 0xffffffff808adc50,
  pc_cmci_mask = 36}
$19 = {pc_curthread = 0xffffff0012d718c0, pc_idlethread = 0xffffff0012d718c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000167d00, pc_switchtime = 564139961112754, pc_switchticks = 247796549,
  pc_cpuid = 18, pc_cpumask = 262144, pc_other_cpus = 16515071, pc_allcpu = {sle_next = 0xffffffff808b1e80}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 766728401, v_trap = 80727218, v_syscall = 2226067386, v_intr = 0,
    v_soft = 164836456, v_vm_faults = 7049688, v_cow_faults = 590626, v_cow_optim = 179, v_zfod = 5751647, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 5189, v_vnodeout = 18,
    v_vnodepgsin = 5189, v_vnodepgsout = 141, v_intrans = 8491, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 8391955, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 9985, v_vforks = 8855, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 2607970, v_vforkpages = 1888180, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {8073713, 395, 119748, 2812, 24760194}, pc_device = 0xffffff0012e73600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b2250, rmq_prev = 0xffffffff808b2250}, pc_dynamic = 18446743526093785344,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b2100, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aee50, pc_commontssp = 0xffffffff808aee50, pc_rsp0 = -549754340096,
  pc_scratch_rsp = 140737488340200, pc_apic_id = 48, pc_acpi_id = 10, pc_fs32p = 0xffffffff808adc80, pc_gs32p = 
0xffffffff808adc88, pc_ldt = 0xffffffff808adcc8, pc_tss = 0xffffffff808adcb8, pc_cmci_mask = 44}
$20 = {pc_curthread = 0xffffff0016b95000, pc_idlethread = 0xffffff0012d7b000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8d8d37ad00, pc_switchtime = 564139964352202, pc_switchticks = 247796550,
  pc_cpuid = 19, pc_cpumask = 524288, pc_other_cpus = 16252927, pc_allcpu = {sle_next = 0xffffffff808b2100}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 273765031, v_trap = 40409221, v_syscall = 606284542, v_intr = 0,
    v_soft = 60032824, v_vm_faults = 4751488, v_cow_faults = 263767, v_cow_optim = 85, v_zfod = 4135690, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 4928, v_vnodeout = 17,
    v_vnodepgsin = 4928, v_vnodepgsout = 130, v_intrans = 6643, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 6264634, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7126, v_vforks = 6926, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 1865103, v_vforkpages = 1510438, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7556063, 190, 97828, 19877, 25282904}, pc_device = 0xffffff0012eef700, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b24d0, rmq_prev = 0xffffffff808b24d0}, pc_dynamic = 18446743526093814016,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b2380, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aeeb8, pc_commontssp = 0xffffffff808aeeb8, pc_rsp0 = -491552002816,
  pc_scratch_rsp = 140737488347448, pc_apic_id = 49, pc_acpi_id = 22, pc_fs32p = 0xffffffff808adce8, pc_gs32p = 
0xffffffff808adcf0, pc_ldt = 0xffffffff808add30, pc_tss = 0xffffffff808add20, pc_cmci_mask = 44}
$21 = {pc_curthread = 0xffffff0012d7b460, pc_idlethread = 0xffffff0012d7b460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff800015dd00, pc_switchtime = 564139966988296, pc_switchticks = 247796551,
  pc_cpuid = 20, pc_cpumask = 1048576, pc_other_cpus = 15728639, pc_allcpu = {sle_next = 0xffffffff808b2380}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 650673237, v_trap = 75330345, v_syscall = 1869508277, v_intr = 0,
    v_soft = 159703865, v_vm_faults = 6100783, v_cow_faults = 476018, v_cow_optim = 144, v_zfod = 5037626, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2676, v_vnodeout = 30,
    v_vnodepgsin = 2676, v_vnodepgsout = 240, v_intrans = 8171, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 7518937, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 7717, v_vforks = 7894, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 2012738, v_vforkpages = 1684979, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7882266, 330, 49708, 171, 25024387}, pc_device = 0xffffff0012eef600, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b2750, rmq_prev = 0xffffffff808b2750}, pc_dynamic = 18446743526093842688,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b2600, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aef20, pc_commontssp = 0xffffffff808aef20, pc_rsp0 = -549754381056,
  pc_scratch_rsp = 140737488347000, pc_apic_id = 50, pc_acpi_id = 11, pc_fs32p = 0xffffffff808add50, pc_gs32p = 
0xffffffff808add58, pc_ldt = 0xffffffff808add98, pc_tss = 0xffffffff808add88, pc_cmci_mask = 44}
$22 = {pc_curthread = 0xffffff0012d7b8c0, pc_idlethread = 0xffffff0012d7b8c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000158d00, pc_switchtime = 564139962266028, pc_switchticks = 247796549,
  pc_cpuid = 21, pc_cpumask = 2097152, pc_other_cpus = 14680063, pc_allcpu = {sle_next = 0xffffffff808b2600}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 241921442, v_trap = 27867529, v_syscall = 396426755, v_intr = 0,
    v_soft = 36203414, v_vm_faults = 4579069, v_cow_faults = 243971, v_cow_optim = 80, v_zfod = 4045607, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2750, v_vnodeout = 13,
    v_vnodepgsin = 2750, v_vnodepgsout = 104, v_intrans = 6546, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 6134243, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 5441, v_vforks = 7475, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 1420342, v_vforkpages = 1634961, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7064507, 140, 130489, 35805, 25725921}, pc_device = 0xffffff0012eef500, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b29d0, rmq_prev = 0xffffffff808b29d0}, pc_dynamic = 18446743526093871360,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b2880, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aef88, pc_commontssp = 0xffffffff808aef88, pc_rsp0 = -549754401536,
  pc_scratch_rsp = 140737488349080, pc_apic_id = 51, pc_acpi_id = 23, pc_fs32p = 0xffffffff808addb8, pc_gs32p = 
0xffffffff808addc0, pc_ldt = 0xffffffff808ade00, pc_tss = 0xffffffff808addf0, pc_cmci_mask = 44}
$23 = {pc_curthread = 0xffffff0012d70000, pc_idlethread = 0xffffff0012d70000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff8000153d00, pc_switchtime = 564139963254342, pc_switchticks = 247796550,
  pc_cpuid = 22, pc_cpumask = 4194304, pc_other_cpus = 12582911, pc_allcpu = {sle_next = 0xffffffff808b2880}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 561077805, v_trap = 70503560, v_syscall = 1547897389, v_intr = 0,
    v_soft = 145263466, v_vm_faults = 5402089, v_cow_faults = 373167, v_cow_optim = 88, v_zfod = 4593818, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 3032, v_vnodeout = 16,
    v_vnodepgsin = 3032, v_vnodepgsout = 128, v_intrans = 7615, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 6367193, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 5531, v_vforks = 6954, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 1440742, v_vforkpages = 1477372, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7664569, 311, 54855, 173, 25236954}, pc_device = 0xffffff0012eef400, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b2c50, rmq_prev = 0xffffffff808b2c50}, pc_dynamic = 18446743526093900032,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b2b00, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808aeff0, pc_commontssp = 0xffffffff808aeff0, pc_rsp0 = -549754422016,
  pc_scratch_rsp = 140737488347256, pc_apic_id = 52, pc_acpi_id = 12, pc_fs32p = 0xffffffff808ade20, pc_gs32p = 
0xffffffff808ade28, pc_ldt = 0xffffffff808ade68, pc_tss = 0xffffffff808ade58, pc_cmci_mask = 44}
$24 = {pc_curthread = 0xffffff0012d70460, pc_idlethread = 0xffffff0012d70460, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xffffff800014ed00, pc_switchtime = 564139962421094, pc_switchticks = 247796549,
  pc_cpuid = 23, pc_cpumask = 8388608, pc_other_cpus = 8388607, pc_allcpu = {sle_next = 0xffffffff808b2b00}, pc_spinlocks = 0x0, 
pc_cnt = {v_swtch = 206202024, v_trap = 21901993, v_syscall = 456089216, v_intr = 0,
    v_soft = 44500078, v_vm_faults = 4394323, v_cow_faults = 229085, v_cow_optim = 71, v_zfod = 3915765, v_ozfod = 0, v_swapin = 
0, v_swapout = 0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 2733, v_vnodeout = 5,
    v_vnodepgsin = 2733, v_vnodepgsout = 40, v_intrans = 6360, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, 
v_dfree = 0, v_pfree = 0, v_tfree = 5866068, v_page_size = 0, v_page_count = 0,
    v_free_reserved = 0, v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, 
v_inactive_target = 0, v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0,
    v_pageout_free_min = 0, v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 4299, v_vforks = 5814, v_rforks = 0, v_kthreads 
= 0, v_forkpages = 1116482, v_vforkpages = 1250752, v_rforkpages = 0, v_kthreadpages = 0},
  pc_cp_time = {7132190, 176, 32115, 131, 25792250}, pc_device = 0xffffff0012eef300, pc_netisr = 0x0, pc_rm_queue = {rmq_next = 
0xffffffff808b2ed0, rmq_prev = 0xffffffff808b2ed0}, pc_dynamic = 18446743526093928704,
  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808b2d80, pc_curpmap = 0xffffffff8083ea50, pc_tssp = 
0xffffffff808af058, pc_commontssp = 0xffffffff808af058, pc_rsp0 = -549754442496,
  pc_scratch_rsp = 140737488347256, pc_apic_id = 53, pc_acpi_id = 24, pc_fs32p = 0xffffffff808ade88, pc_gs32p = 
0xffffffff808ade90, pc_ldt = 0xffffffff808aded0, pc_tss = 0xffffffff808adec0, pc_cmci_mask = 44}
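
For anyone cross-checking the records above: the pc_cpumask and
pc_other_cpus values are consistent with pc_cpumask being a single bit
at position pc_cpuid and pc_other_cpus being every other CPU's bit.
A rough sketch of that arithmetic follows. The count of 24 CPUs is an
assumption inferred from the last record (pc_cpuid = 23), and the
helper names are made up for illustration.

```sh
#!/bin/sh
# Assumption: 24 CPUs (ids 0..23), inferred from the last pcpu record.
NCPU=24
ALL=$(( (1 << NCPU) - 1 ))          # bitmask with one bit per CPU

# Hypothetical helpers: expected mask values for a given CPU id.
cpumask()    { echo $(( 1 << $1 )); }
other_cpus() { echo $(( ALL & ~(1 << $1) )); }

# Print the expected values for a few of the CPUs dumped above.
for id in 16 17 23; do
    echo "cpu $id: pc_cpumask=$(cpumask "$id") pc_other_cpus=$(other_cpus "$id")"
done
```

The printed values can be compared by eye against the pc_cpumask and
pc_other_cpus fields of the matching records.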

> A little bit later I will send you another patch that, I hope, will produce better
> diagnostics for this crash (without DDB in kernel).

The kernel with the patch is now installed on the test machine.

I've taken DDB, INVARIANTS and STACK changes out for now.

    Regards
    Steve




From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 12:56:35 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B8075106566B
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 12:56:35 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id DF6458FC18
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 12:56:34 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA24369;
	Wed, 17 Aug 2011 15:56:32 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4BBA7F.30907@FreeBSD.org>
Date: Wed, 17 Aug 2011 15:56:31 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
	<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
In-Reply-To: <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 12:56:35 -0000

on 17/08/2011 15:15 Steven Hartland said the following:
>> define allpcpu
>> set $i = 0
>> while ($i <= mp_maxid)
>> p *cpuid_to_pcpu[$i]
>> set $i = $i + 1
>> end
>> end
>> allpcpu
> 
> Here's the output.
[snip]
> $3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460,
> pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8f35ad00,
> pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2,
>  pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next =
> 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap =
> 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308,
>    v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod =
> 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout
> = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238,
>    v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0,
> v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380,
> v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
>    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0,
> v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count =
> 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
>    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857,
> v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842,
> v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094,
>    693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0,
> pc_rm_queue = {rmq_next = 0xffffffff808afa50, rmq_prev = 0xffffffff808afa50},
> pc_dynamic = 18446743526093326592,
>  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808af900,
> pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae7d0, pc_commontssp =
> 0xffffffff808ae7d0, pc_rsp0 = -491518579456,
>  pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p =
> 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, pc_ldt = 0xffffffff808ad648,
> pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8}
[snip]

Thank you.
A few more questions:
1. more kgdb info for the core:
p *(cpuid_to_pcpu[2]->pc_curthread)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit)

2. do you have any additional patches in your source tree besides those debugging
patches that I provided to you?

3. do you have any third-party/out-of-tree kernel modules?

4. could you please send me your kernel config?

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 13:54:42 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ECED51065670
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 13:54:42 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id A92CE8FC0A
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 13:54:42 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id DEEFA28429
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 15:35:16 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id 0DFED28424
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 15:35:10 +0200 (CEST)
Message-ID: <4E4BC38D.1050808@quip.cz>
Date: Wed, 17 Aug 2011 15:35:09 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-2; format=flowed
Content-Transfer-Encoding: 7bit
Subject: can not boot from RAIDZ with 8-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 13:54:43 -0000

I tried an mfsBSD installation on a Dell T110 with a PERC H200A and 4x 500GB 
SATA disks. If I create the zpool as RAIDZ, the boot immediately hangs 
with the following error:

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS
ZFS: unexpected object set type 0
ZFS: unexpected object set type 0

FreeBSD/x86 boot
Default: tank0:/boot/kernel/kernel
boot:
ZFS: unexpected object set type 0

FreeBSD/x86 boot
Default: tank0:/boot/kernel/kernel
boot:


The system is FreeBSD 8.2-STABLE #0: Sat Aug 13 20:33:31 CEST 2011 
GENERIC  amd64

Built from sources from Aug 13 2011.

An identical system boots fine from an external (USB) drive, and I can use 
the data on the RAIDZ pool tank0 without any problems.

So the pool and disks are fine; only booting from the pool fails.

Disks (da0 - da3) are using GPT:

=>       34  976773101  da0  GPT  (465G)
          34        128    1  freebsd-boot  (64k)
         162    8388608    2  freebsd-swap  (4.0G)
     8388770  964689920    3  freebsd-zfs  (460G)
   973078690    3694445       - free -  (1.8G)
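
The offsets and sizes in this listing can be sanity-checked with a bit
of shell arithmetic. This is only a sketch and assumes 512-byte
sectors, which the listing itself does not state; the variable and
function names are made up for illustration.

```sh
#!/bin/sh
# Assumption: 512-byte sectors (not stated in the gpart listing).
SECTOR=512

# Size in bytes of a partition, given its length in sectors.
bytes() { echo $(( $1 * SECTOR )); }

boot_end=$(( 34 + 128 ))              # first sector after freebsd-boot
swap_end=$(( boot_end + 8388608 ))    # first sector after freebsd-swap
zfs_end=$(( swap_end + 964689920 ))   # first sector after freebsd-zfs

echo "swap starts at $boot_end, zfs at $swap_end, free space at $zfs_end"
echo "boot: $(( $(bytes 128) / 1024 )) KiB"
echo "swap: $(( $(bytes 8388608) / 1073741824 )) GiB"
echo "zfs:  $(( $(bytes 964689920) / 1073741824 )) GiB"
```

The computed start sectors and sizes can then be compared against the
start and size columns of the listing.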

I also tried creating the pool manually instead of using the mfsBSD script, 
but the result is the same.


This was my manual method:

gpart create -s GPT da0
gpart add -b 34 -s 128 -t freebsd-boot da0
gpart add -s 4g -t freebsd-swap -l swap0 da0
gpart add -s 460g -t freebsd-zfs -l tank0 da0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

gpart create -s GPT da1
gpart add -b 34 -s 128 -t freebsd-boot da1
gpart add -s 4g -t freebsd-swap -l swap1 da1
gpart add -s 460g -t freebsd-zfs -l tank1 da1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1

gpart create -s GPT da2
gpart add -b 34 -s 128 -t freebsd-boot da2
gpart add -s 4g -t freebsd-swap -l swap2 da2
gpart add -s 460g -t freebsd-zfs -l tank2 da2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2

gpart create -s GPT da3
gpart add -b 34 -s 128 -t freebsd-boot da3
gpart add -s 4g -t freebsd-swap -l swap3 da3
gpart add -s 460g -t freebsd-zfs -l tank3 da3
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3
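
The four per-disk blocks above differ only in the disk name and the
label suffix, so they could also be written as a loop. The following
is a sketch, not what was actually run. RUN and partition_disk are
hypothetical names; by default the script only prints the commands
(dry run), and RUN has to be set to an empty string to execute them.

```sh
#!/bin/sh
# Dry run by default: RUN=echo prints each command instead of running it.
# Setting RUN to an empty string in the environment would execute them.
RUN=${RUN-echo}

# Hypothetical helper: partition one disk exactly as in the manual steps.
partition_disk() {
    disk=$1   # device name, e.g. da0
    n=$2      # label suffix, e.g. 0
    $RUN gpart create -s GPT "$disk"
    $RUN gpart add -b 34 -s 128 -t freebsd-boot "$disk"
    $RUN gpart add -s 4g -t freebsd-swap -l "swap$n" "$disk"
    $RUN gpart add -s 460g -t freebsd-zfs -l "tank$n" "$disk"
    $RUN gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$disk"
}

for n in 0 1 2 3; do
    partition_disk "da$n" "$n"
done
```

Run as-is, this prints the same 20 gpart commands listed above, which
makes it easy to diff against what was typed by hand.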


gmirror label -F -h -b load gmswap0 /dev/gpt/{swap0,swap1,swap2,swap3}

zpool create -O mountpoint=/mnt -O atime=off -O setuid=off -O 
canmount=off tank0 raidz /dev/gpt/tank0 /dev/gpt/tank1 /dev/gpt/tank2 
/dev/gpt/tank3

zfs create -o mountpoint=legacy -o setuid=on tank0/root

zpool set bootfs=tank0/root tank0

(...then zfs create for about 10 filesystems according to 
http://blogs.freebsdish.org/pjd/2010/08/06/from-sysinstall-to-zfs-only-configuration/ 
)

zfs set mountpoint=/ system

(...then rsync data from external USB disk with working system...)

And after reboot, the same error as above.


Does anybody have any suggestions?

Miroslav Lachman

PS: I can't try 8.2-RELEASE, because it has no support for the PERC H200A; 
support was committed after the release.


From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 13:56:12 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1842A1065676;
	Wed, 17 Aug 2011 13:56:12 +0000 (UTC)
	(envelope-from prvs=1210f20b9f=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 51A868FC20;
	Wed, 17 Aug 2011 13:56:10 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 14:55:32 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 14:55:32 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014634579.msg;
	Wed, 17 Aug 2011 14:55:31 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
	<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
	<4E4BBA7F.30907@FreeBSD.org>
Date: Wed, 17 Aug 2011 14:56:10 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 13:56:12 -0000


----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: <freebsd-stable@FreeBSD.org>
Sent: Wednesday, August 17, 2011 1:56 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


> on 17/08/2011 15:15 Steven Hartland said the following:
>>> define allpcpu
>>> set $i = 0
>>> while ($i <= mp_maxid)
>>> p *cpuid_to_pcpu[$i]
>>> set $i = $i + 1
>>> end
>>> end
>>> allpcpu
>>
>> Here's the output.
> [snip]
>> $3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460,
>> pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8f35ad00,
>> pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2,
>>  pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next =
>> 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap =
>> 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308,
>>    v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod =
>> 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout
>> = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238,
>>    v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0,
>> v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380,
>> v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
>>    v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0,
>> v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count =
>> 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
>>    v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857,
>> v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842,
>> v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094,
>>    693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0,
>> pc_rm_queue = {rmq_next = 0xffffffff808afa50, rmq_prev = 0xffffffff808afa50},
>> pc_dynamic = 18446743526093326592,
>>  pc_monitorbuf = '\0' <repeats 127 times>, pc_prvspace = 0xffffffff808af900,
>> pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae7d0, pc_commontssp =
>> 0xffffffff808ae7d0, pc_rsp0 = -491518579456,
>>  pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p =
>> 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, pc_ldt = 0xffffffff808ad648,
>> pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8}
> [snip]
>
> Thank you.
> A few more questions:
> 1. more kgdb info for the core:
> p *(cpuid_to_pcpu[2]->pc_curthread)
> p *(cpuid_to_pcpu[2]->pc_curthread->td_proc)
> p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit)
>

(kgdb) p *(cpuid_to_pcpu[2]->pc_curthread)
$1 = {td_lock = 0xffffffff8084a440, td_proc = 0xffffff070b5a48c0, td_plist = {tqe_next = 0x0, tqe_prev = 0xffffff070b5a48d0}, 
td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff8084a688}, td_slpq = {tqe_next = 0x0,
    tqe_prev = 0xffffff0296460900}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xffffff8d8fb5c8b0}, td_cpuset = 0xffffff0012d65dc8, 
td_sel = 0xffffff0a1b76c700, td_sleepqueue = 0xffffff0296460900,
  td_turnstile = 0xffffff05f31d8000, td_umtxq = 0xffffff05513d9780, td_tid = 102057, td_sigqueue = {sq_signals = {__bits = {0, 0, 
0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0,
      tqh_last = 0xffffff06b7f9c0a0}, sq_proc = 0xffffff070b5a48c0, sq_flags = 1}, td_flags = 6, td_inhibitors = 0, td_pflags = 0, 
td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 2 '\002',
  td_oncpu = 2 '\002', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 998, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked 
= 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
  td_intr_nesting_level = 0, td_pinned = 1, td_ucred = 0xffffff0551cf9900, td_estcpu = 0, td_slptick = 0, td_blktick = 0, td_ru = 
{ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 2068,
    ru_ixrss = 5280, ru_idrss = 19296, ru_isrss = 6144, ru_minflt = 5015, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock 
= 0, ru_msgsnd = 241, ru_msgrcv = 2076, ru_nsignals = 1, ru_nvcsw = 2264, ru_nivcsw = 159},
  td_incruntime = 4257692, td_runtime = 487523210, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 4, 
td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {16384, 0, 0, 0}},
  td_generation = 2423, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 
0, td_name = "httpd", '\0' <repeats 14 times>, td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {
    ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, 
si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0,
        sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = 
{_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, ksi_flags = 0,
    ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, 
td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', td_pri_class = 3 '\003',
  td_user_pri = 128 '\200', td_base_user_pri = 128 '\200', td_pcb = 0xffffff8d8f35ad00, td_state = TDS_RUNNING, td_retval = {0, 
8}, td_slpcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
        tqe_prev = 0xffffff800088ce00}}, c_time = 247622368, c_arg = 0xffffff06b7f9c000, c_func = 0xffffffff803c4bd0 
<sleepq_timeout>, c_lock = 0x0, c_flags = 16, c_cpu = 13}, td_frame = 0xffffff8d8f35ac40,
  td_kstack_obj = 0xffffff0a51ee5e58, td_kstack = 18446743582190956544, td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, 
td_unused3 = 0, td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70},
  td_sched = 0xffffff06b7f9c428, td_ar = 0x0, td_syscalls = 129862, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 
0x0, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = {rux_runtime = 483265518,
    rux_uticks = 7, rux_sticks = 17, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, td_map_def_user = 0x0}
(kgdb) p *(cpuid_to_pcpu[2]->pc_curthread->td_proc)
$2 = {p_list = {le_next = 0xffffff0653ff78c0, le_prev = 0xffffffff80841b48}, p_threads = {tqh_first = 0xffffff06b7f9c000, tqh_last 
= 0xffffff06b7f9c010}, p_slock = {lock_object = {lo_name = 0xffffffff806323c0 "process slock",
      lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xffffff0551cf9900, p_fd = 0x0, p_fdtol = 0x0, 
p_stats = 0xffffff04ea565600, p_limit = 0x0, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {
        tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xffffff070b5a49b8, c_flags = 0, c_cpu = 
0}, p_sigacts = 0xffffff0a663a1000, p_flag = 268443904, p_state = PRS_NORMAL, p_pid = 78097,
  p_hash = {le_next = 0x0, le_prev = 0xffffff800021c888}, p_pglist = {le_next = 0xffffff00285c5460, le_prev = 0xffffff0afa9b8988}, 
p_pptr = 0xffffff0afa9b88c0, p_sibling = {le_next = 0xffffff00285c5460,
    le_prev = 0xffffff0afa9b89b0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xffffffff806323b3 "process 
lock", lo_flags = 21168128, lo_data = 10, lo_witness = 0x0}, mtx_lock = 18446743003054325761},
  p_ksi = 0xffffff0016738bd0, p_sigqueue = {sq_signals = {__bits = {16384, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = 
{tqh_first = 0xffffff033829d070, tqh_last = 0xffffff033829d070}, sq_proc = 0xffffff070b5a48c0,
    sq_flags = 1}, p_oppid = 0, p_vmspace = 0xffffffff8083e920, p_swtick = 89392056, p_realtimer = {it_interval = {tv_sec = 0, 
tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0},
    ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, 
ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0,
    ru_nvcsw = 0, ru_nivcsw = 0}, p_rux = {rux_runtime = 483265518, rux_uticks = 7, rux_sticks = 17, rux_iticks = 0, rux_uu = 
61934, rux_su = 150412, rux_tu = 212347}, p_crux = {rux_runtime = 80058539464, rux_uticks = 2914,
    rux_sticks = 1778, rux_iticks = 0, rux_uu = 21847439, rux_su = 13330387, rux_tu = 35177827}, p_profthreads = 0, p_exitthreads 
= 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0x0, p_lock = 11,
  p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 
0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0,
  p_xthread = 0xffffff06b7f9c000, p_boundary_count = 0, p_pendingcnt = 1, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 802000, 
p_comm = "httpd", '\0' <repeats 14 times>, p_pgrp = 0xffffff05f3928080,
  p_sysent = 0xffffffff807fe180, p_args = 0xffffff0a8ad5e600, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, 
p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0},
    kl_lock = 0xffffffff803586e0 <knlist_mtx_lock>, kl_unlock = 0xffffffff803586b0 <knlist_mtx_unlock>, kl_assert_locked = 
0xffffffff80355380 <knlist_mtx_assert_locked>,
    kl_assert_unlocked = 0xffffffff80355390 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xffffff070b5a49b8}, p_numthreads = 1, p_md 
= {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0,
      sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle 
= {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0,
    c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 1, p_peers = 0x0, p_leader = 0xffffff070b5a48c0, p_emuldata = 
0x0, p_label = 0x0, p_sched = 0xffffff070b5a4d20, p_ktr = {stqh_first = 0x0,
    stqh_last = 0xffffff070b5a4cf0}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0, p_pwait = {cv_description = 
0xffffffff80632b87 "ppwait", cv_waiters = 0}}
(kgdb) p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit)
Cannot access memory at address 0x0
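
A note for reading the mtx_lock fields in $2 above: kgdb prints them as plain decimals, but the word holds the owning thread pointer with a few flag bits in the low bits. Below is a minimal Python sketch for splitting the two. The flag values (0x1 recursed, 0x2 contested, 0x4 unowned) are an assumption taken from sys/sys/mutex.h on 8.x and should be checked against the tree in use; decode_mtx_lock is a made-up helper name, not something kgdb provides.

```python
# Flag bits kept in the low bits of mtx_lock. The values are ASSUMED from
# sys/sys/mutex.h on FreeBSD 8.x; verify them against your own source tree.
MTX_RECURSED = 0x1
MTX_CONTESTED = 0x2
MTX_UNOWNED = 0x4
MTX_FLAGMASK = MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED


def decode_mtx_lock(value):
    """Split a raw mtx_lock word into (owner thread pointer, flag names)."""
    value &= (1 << 64) - 1  # normalise in case the value was printed signed
    flags = [name for name, bit in (("RECURSED", MTX_RECURSED),
                                    ("CONTESTED", MTX_CONTESTED),
                                    ("UNOWNED", MTX_UNOWNED))
             if value & bit]
    owner = value & ~MTX_FLAGMASK  # clear the flag bits to get the pointer
    return owner, flags


# p_mtx.mtx_lock and p_slock.mtx_lock exactly as printed in $2
owner, flags = decode_mtx_lock(18446743003054325761)
print(hex(owner), flags)     # 0xffffff06b7f9c000 ['RECURSED']
print(decode_mtx_lock(4))    # (0, ['UNOWNED'])
```

If those flag values hold, the process lock word points at 0xffffff06b7f9c000, which is the same address as p_threads.tqh_first and p_xthread in $2.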

> 2. do you have any additional patches in your source tree besides those debugging
> patches that I provided to you?

Yes, in this build we have:-
1. tcp_reass.c-logdebug+missingsegment-20110811-lstewart.patch (fixes tcp stalling)
http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass.c-logdebug%2bmissingsegment-20110811-lstewart.diff
2. libz.patch (disables assembly optimisations in libz as it causes application crashes)
3. udp6_usrreq.c.patch (fixes ipv4 on ipv6 sockets)
http://svnweb.freebsd.org/base/head/sys/netinet6/udp6_usrreq.c?r1=220463&r2=220462&pathrev=220463
4. cam-timeout-fix.patch (fixes overflow in cam timeouts)
http://codelabs.ru/fbsd/patches/cam/CAM-properly-convert-timeout-to-ticks.diff
5. ixgbe.c.patch & ixgbe.h.patch (fixes ifconfig disconnecting link)
6. stop_scheduler_on_panic.8.x.patch (your first patch)
7. panic-info.patch (your second patch)

The only patches of these present when we initially noticed the problem
were #2, #3 & #5 (although these machines do not use the ixgbe driver).

> 3. do you have any thirdparty/out-of-tree kernel modules?
Nope, our kernel is compiled with a load of drivers disabled and then the following:-

device ahci

makeoptions MODULES_OVERRIDE="linux linprocfs acpi nullfs unionfs accf_http if_lagg opensolaris zfs ipmi i2c"
options     COMPAT_LINUX32
options     DEVICE_POLLING

N.B. although device polling is compiled in, it's not used on any of these machines.

> 4. could you please send me your kernel config?
See direct email, as I'm not sure it would make it to the list.

    Regards
    Steve 




From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 14:14:39 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8108B106564A
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 14:14:39 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id EA6C58FC17
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 14:14:38 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7HEERj2095723
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 17:14:33 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4E4BCCC3.60601@digsys.bg>
Date: Wed, 17 Aug 2011 17:14:27 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110720 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <4E4BC38D.1050808@quip.cz>
In-Reply-To: <4E4BC38D.1050808@quip.cz>
Content-Type: text/plain; charset=ISO-8859-2; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: can not boot from RAIDZ with 8-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 14:14:39 -0000


On 17.08.11 16:35, Miroslav Lachman wrote:
> I tried mfsBSD installation on Dell T110 with PERC H200A and 4x 500GB 
> SATA disks. If I create zpool with RAIDZ, the boot immediately hangs 
> with following error:
>
Could it be that the BIOS does not see all drives at boot?

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 17:39:02 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6A7421065672;
	Wed, 17 Aug 2011 17:39:02 +0000 (UTC) (envelope-from hrs@FreeBSD.org)
Received: from mail.allbsd.org (gatekeeper-int.allbsd.org
	[IPv6:2001:2f0:104:e002::2])
	by mx1.freebsd.org (Postfix) with ESMTP id E10CB8FC0A;
	Wed, 17 Aug 2011 17:39:01 +0000 (UTC)
Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp
	[125.175.94.28]) (authenticated bits=128)
	by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HHcdTb013376;
	Thu, 18 Aug 2011 02:38:49 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0)
	by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HHcaNH039802;
	Thu, 18 Aug 2011 02:38:38 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Date: Thu, 18 Aug 2011 02:38:32 +0900 (JST)
Message-Id: <20110818.023832.373949045518579359.hrs@allbsd.org>
To: mike@sentex.net
From: Hiroki Sato <hrs@FreeBSD.org>
In-Reply-To: <4E15A08C.6090407@sentex.net>
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
	<4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net>
X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530  FFD7 4F2C D3D8 2793 CF2D
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Multipart/Signed; protocol="application/pgp-signature";
	micalg=pgp-sha1;
	boundary="--Security_Multipart(Thu_Aug_18_02_38_32_2011_300)--"
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org
X-Virus-Status: Clean
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3
	(mail.allbsd.org [133.31.130.32]);
	Thu, 18 Aug 2011 02:38:54 +0900 (JST)
X-Spam-Status: No, score=-102.6 required=13.0 tests=BAYES_00,
	CONTENT_TYPE_PRESENT,DIRECTOCNDYN,RCVD_IN_RP_RNBL,SPF_SOFTFAIL,
	USER_IN_WHITELIST autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	gatekeeper.allbsd.org
Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 17:39:02 -0000

----Security_Multipart(Thu_Aug_18_02_38_32_2011_300)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

Mike Tancsa <mike@sentex.net> wrote
  in <4E15A08C.6090407@sentex.net>:

mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
mi> >>
mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
mi> >> for the owner thread was not available.
mi> >>
mi> >> I was unable to make any conclusion from the data that was present.
mi> >> If the situation is reproducible, you could try to revert r221937. This
mi> >> is pure speculation, though.
mi> >
mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
mi> > unless there is any extra debugging you want me to add to the kernel
mi> > instead  ?

 I am also suffering from a reproducible panic on an 8-STABLE box, an
 NFS server with heavy I/O load.  I could not get a kernel dump
 because this panic locked up the machine just after it occurred, but
 according to the stack trace it was the same as the one posted.
 Switching to an 8.2R kernel can prevent this panic.

 Any progress on the investigation?

--
spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
panic: spin lock held too long
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
_mtx_lock_spin() at _mtx_lock_spin+0x9e
sched_add() at sched_add+0x117
setrunnable() at setrunnable+0x78
sleepq_signal() at sleepq_signal+0x7a
cv_signal() at cv_signal+0x3b
xprt_active() at xprt_active+0xe3
svc_vc_soupcall() at svc_vc_soupcall+0xc
sowakeup() at sowakeup+0x69
tcp_do_segment() at tcp_do_segment+0x25e7
tcp_input() at tcp_input+0xcdd
ip_input() at ip_input+0xac
netisr_dispatch_src() at netisr_dispatch_src+0x7e
ether_demux() at ether_demux+0x14d
ether_input() at ether_input+0x17d
em_rxeof() at em_rxeof+0x1ca
em_handle_que() at em_handle_que+0x5b
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--

-- Hiroki

----Security_Multipart(Thu_Aug_18_02_38_32_2011_300)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEABECAAYFAk5L/JgACgkQTyzT2CeTzy3bGgCgtnOCsXiCHd7Ghg5RReen9Q4/
FU4AoKIlZkp/sSlduoEme4rspSG7ZQWR
=8Yer
-----END PGP SIGNATURE-----

----Security_Multipart(Thu_Aug_18_02_38_32_2011_300)----

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 17:52:07 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 50E451065670
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 17:52:07 +0000 (UTC)
	(envelope-from sterling@camdensoftware.com)
Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com
	[174.123.46.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 1833E8FC1F
	for <freebsd-stable@FreeBSD.org>; Wed, 17 Aug 2011 17:52:06 +0000 (UTC)
Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203]
	helo=_HOSTNAME_)
	by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.69) (envelope-from <sterling@camdensoftware.com>)
	id 1QtkHb-0004li-Lu
	for freebsd-stable@FreeBSD.org; Wed, 17 Aug 2011 10:51:40 -0700
Received: by _HOSTNAME_ (sSMTP sendmail emulation);
	Wed, 17 Aug 2011 10:52:01 -0700
Date: Wed, 17 Aug 2011 10:52:01 -0700
From: Chip Camden <sterling@camdensoftware.com>
To: freebsd-stable@FreeBSD.org
Message-ID: <20110817175201.GB1973@libertas.local.camdensoftware.com>
Mail-Followup-To: freebsd-stable@FreeBSD.org
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
	<4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="ftEhullJWpWg/VHq"
Content-Disposition: inline
In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org>
User-Agent: Mutt/1.4.2.3i
Company: Camden Software Consulting
URL: http://camdensoftware.com
X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - camdensoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 17:52:07 -0000


--ftEhullJWpWg/VHq
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Quoth Hiroki Sato on Thursday, 18 August 2011:
> Hi,
>
> Mike Tancsa <mike@sentex.net> wrote
>   in <4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> mi> >> for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937. This
> mi> >> is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead  ?
>
>  I am also suffering from a reproducible panic on an 8-STABLE box, an
>  NFS server with heavy I/O load.  I could not get a kernel dump
>  because this panic locked up the machine just after it occurred, but
>  according to the stack trace it was the same as the one posted.
>  Switching to an 8.2R kernel can prevent this panic.
>
>  Any progress on the investigation?
>
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --
>
> -- Hiroki


I'm also getting similar panics on 8.2-STABLE.  Everything locks up and I
have to power off.  Once, I happened to be looking at the console when it
happened and copied down the following:

Sleeping thread (tid 100037, pid 0) owns a non-sleepable lock
panic: sleeping thread
cpuid=1

Another time I got:

lock order reversal:
1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:296
2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587

I didn't copy down the traceback.

These panics seem to hit when I'm doing heavy WAN I/O.  I can go for
about a day without one as long as I stay away from the web or even chat.
Last night this system copied a backup of 35GB over the local network
without failing, but as soon as I hopped onto Firefox this morning, down
she went.  I don't know if that's coincidence or useful data.

I didn't get to say "Thanks" to Eitan Adler for attempting to help me
with this on Monday night.  Thanks, Eitan!

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com

--ftEhullJWpWg/VHq
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJOS//BAAoJEIpckszW26+Rb80H/3/7eQlINeIaoLUz6iE2dSG8
/7Eoyt87VSs1H8XUYVPD+tiYXFgvpz6zu49zkXTcNwS/kwgJjzMHngEeY3eKom8v
6iaWilwe12nrkDOdkJZXB4kml6WTa71VkAlpC0hUJHuPD+trriZfSdJKDBwOXaA/
rJzp25k0TZU+BlJQJr3eXGPP1L/KjxSPLbIeowGWpV7ZPcRQRm3JerAGcn3f38ud
PR4cBwVKHcPYzLm8ZAQLL99QJy5ZqyTWjLVE16Erc2AUyD1coURH2X6w3JtJ4mQ2
YBQhdREV1tchj/mvM30b/xnozcjTZuHDOoXpZgGPxKAQqDRG3Y7FG5jc33yELjg=
=OoPD
-----END PGP SIGNATURE-----

--ftEhullJWpWg/VHq--

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 18:26:15 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6E09B106566C;
	Wed, 17 Aug 2011 18:26:15 +0000 (UTC) (envelope-from mike@sentex.net)
Received: from smarthost1.sentex.ca (smarthost1-6.sentex.ca
	[IPv6:2607:f3e0:0:1::12])
	by mx1.freebsd.org (Postfix) with ESMTP id 17CA88FC13;
	Wed, 17 Aug 2011 18:26:15 +0000 (UTC)
Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca
	[IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a])
	by smarthost1.sentex.ca (8.14.4/8.14.4) with ESMTP id p7HIQBwZ025744;
	Wed, 17 Aug 2011 14:26:11 -0400 (EDT) (envelope-from mike@sentex.net)
Message-ID: <4E4C07C8.9090909@sentex.net>
Date: Wed, 17 Aug 2011 14:26:16 -0400
From: Mike Tancsa <mike@sentex.net>
Organization: Sentex Communications
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
MIME-Version: 1.0
To: Hiroki Sato <hrs@FreeBSD.org>
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>	<4E159959.2070401@sentex.net>	<4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.71 on IPv6:2607:f3e0:0:1::12
Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 18:26:15 -0000

On 8/17/2011 1:38 PM, Hiroki Sato wrote:
>  Any progress on the investigation?

Unfortunately, I cannot reproduce it yet with a debugging kernel :(


	---Mike

> 
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --
> 
> -- Hiroki


-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 18:37:02 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 71F11106566B;
	Wed, 17 Aug 2011 18:37:02 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 0C67F8FC18;
	Wed, 17 Aug 2011 18:37:01 +0000 (UTC)
Received: by yxn22 with SMTP id 22so46128yxn.13
	for <multiple recipients>; Wed, 17 Aug 2011 11:37:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=trAAOYcmYpZlnYgok46AMUtjvZHFC2k++U47mGVVj2U=;
	b=sqUGJAaE4KDjM/3jS82nUBzMIjgmSNj8I0ZNcNjFyakDCmUxDTnN3Kf5tjoJuOYfoE
	7ZWafvuExXnIBNA+PqhF8NQg5qyTSTF+/ta2Bg0nLy37QYvJswGp7npoS+BAJgNE2+O9
	dLCECfFvvihEYnE93cW0AuajGmL6EESACfJU8=
MIME-Version: 1.0
Received: by 10.236.75.228 with SMTP id z64mr4420989yhd.68.1313606221377; Wed,
	17 Aug 2011 11:37:01 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.236.108.33 with HTTP; Wed, 17 Aug 2011 11:37:01 -0700 (PDT)
In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org>
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
	<4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
Date: Wed, 17 Aug 2011 20:37:01 +0200
X-Google-Sender-Auth: teH8Tr77CO5VlnvGwZYAJJyguaI
Message-ID: <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Hiroki Sato <hrs@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: kostikbel@gmail.com, freebsd-stable@freebsd.org, avg@freebsd.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 18:37:02 -0000

2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> Hi,
>
> Mike Tancsa <mike@sentex.net> wrote
>  in <4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> mi> >> for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937. This
> mi> >> is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead  ?
>
>  I am also suffering from a reproducible panic on an 8-STABLE box, an
>  NFS server with heavy I/O load.  I could not get a kernel dump
>  because this panic locked up the machine just after it occurred, but
>  according to the stack trace it was the same as the one posted.
>  Switching to an 8.2R kernel can prevent this panic.
>
>  Any progress on the investigation?

Hiroki,
how easily can you reproduce it?

It would be important to have a DDB textdump with the following information:
- bt
- ps
- show allpcpu
- alltrace

Alternatively, a coredump from a kernel with the stop cpu patch, which Andriy
can provide.
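
One way to collect the DDB output listed above without typing it at the console is a panic-time script in /etc/ddb.conf. This is only a sketch, assuming a kernel built with "options KDB" and "options DDB", a working dumpdev, and ddb_enable="YES" in /etc/rc.conf so the file is loaded at boot; the command order is a suggestion, not a requirement.

```
# /etc/ddb.conf: run on panic, save the captured output as a textdump, reboot
script kdb.enter.panic=textdump set; capture on; bt; ps; show allpcpu; alltrace; capture off; call doadump; reset
```

alltrace can produce a lot of output on a busy box, so the capture buffer (debug.ddb.capture.bufsize) may need raising beforehand; see ddb(4) and textdump(4).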

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 18:57:30 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DF24C1065670
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 18:57:30 +0000 (UTC)
	(envelope-from artemb@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 7A3138FC17
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 18:57:30 +0000 (UTC)
Received: by wyh15 with SMTP id 15so1128899wyh.13
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 11:57:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=tSOqYllt8iTeFJUjPd66NgJ7HSw32N+s/TlwTImshDQ=;
	b=kH6UOyC6Q+S8sGdk4gxQHLTPwDxemMuUsMK48DmIUyWdCKY0KI0QVu8fLwEtNCjqkO
	k/kk8pSBT/0DMTN+vlkiiMg0UgO4uV0xJp187T6CKX6al9B3IeAKzLtFb13FKzGFhAMi
	IrtU+AX4QXAxKKCgez7NhBYGZlZ1XlniCr8Bo=
MIME-Version: 1.0
Received: by 10.217.6.81 with SMTP id x59mr1160358wes.50.1313607449443; Wed,
	17 Aug 2011 11:57:29 -0700 (PDT)
Sender: artemb@gmail.com
Received: by 10.216.181.210 with HTTP; Wed, 17 Aug 2011 11:57:29 -0700 (PDT)
In-Reply-To: <4E4BCCC3.60601@digsys.bg>
References: <4E4BC38D.1050808@quip.cz>
	<4E4BCCC3.60601@digsys.bg>
Date: Wed, 17 Aug 2011 11:57:29 -0700
X-Google-Sender-Auth: wjA8PTrtbTVbXuJtpIlzsr1lheA
Message-ID: <CAFqOu6hQzzwrTpuyddqrODr8WP4Ke0pi7MoYhYL9ivfsNHxNhA@mail.gmail.com>
From: Artem Belevich <art@freebsd.org>
To: Miroslav Lachman <000.fbsd@quip.cz>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-stable@freebsd.org, Daniel Kalchev <daniel@digsys.bg>
Subject: Re: can not boot from RAIDZ with 8-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 18:57:30 -0000

2011/8/17 Daniel Kalchev <daniel@digsys.bg>:
> On 17.08.11 16:35, Miroslav Lachman wrote:
>>
>> I tried mfsBSD installation on Dell T110 with PERC H200A and 4x 500GB SATA
>> disks. If I create zpool with RAIDZ, the boot immediately hangs with
>> following error:
>>
> May be it that the BIOS does not see all drives at boot?

Indeed. On one of my systems the BIOS only allows access to the first four
HDDs in its boot priority list. What's especially annoying is that the
BIOS keeps rearranging the boot list every time a new device is added or
removed, or if the SATA controller card is moved to another slot. Every
time it happens I have to go back and rearrange the drives so that my
RAIDZ drives are at the top of the list.

If you can boot off CD or USB, how many drives does the bootloader report
just before it gets to the menu?

--Artem

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 19:40:57 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E7A4F1065670;
	Wed, 17 Aug 2011 19:40:56 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 71DD28FC08;
	Wed, 17 Aug 2011 19:40:56 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id 3C9B828427;
	Wed, 17 Aug 2011 21:40:55 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id 64D5A28424;
	Wed, 17 Aug 2011 21:40:54 +0200 (CEST)
Message-ID: <4E4C1945.5030504@quip.cz>
Date: Wed, 17 Aug 2011 21:40:53 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: Artem Belevich <art@freebsd.org>
References: <4E4BC38D.1050808@quip.cz>	<4E4BCCC3.60601@digsys.bg>
	<CAFqOu6hQzzwrTpuyddqrODr8WP4Ke0pi7MoYhYL9ivfsNHxNhA@mail.gmail.com>
In-Reply-To: <CAFqOu6hQzzwrTpuyddqrODr8WP4Ke0pi7MoYhYL9ivfsNHxNhA@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org, Daniel Kalchev <daniel@digsys.bg>
Subject: Re: can not boot from RAIDZ with 8-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 19:40:57 -0000

Artem Belevich wrote:
> 2011/8/17 Daniel Kalchev<daniel@digsys.bg>:
>> On 17.08.11 16:35, Miroslav Lachman wrote:
>>>
>>> I tried mfsBSD installation on Dell T110 with PERC H200A and 4x 500GB SATA
>>> disks. If I create zpool with RAIDZ, the boot immediately hangs with
>>> following error:
>>>
>> May be it that the BIOS does not see all drives at boot?
>
> Indeed. On one of my systems BIOS only allows access to the first four
> HDDs in the BIOS' boot priority list. What's especially annoying is
> that BIOS keep rearranging boot list every time new device is added or
> removed or if SATA controller card is moved to another slot. Every
> time it happens I have to go back and rearrange the drives so that my
> RAIDZ drives are on top of the list.
>
> If you can boot off CD or USB how many drives does bootloader report
> just before it gets to the menu?

Thank you guys, you are right. The BIOS provides only 1 disk to the
loader! I checked it from the loader prompt with lsdev (booted from an
external USB HDD).

So I will try to make a small zpool mirror for root and boot (if a ZFS
mirror can be made of 4 providers instead of two) and the rest will be
in RAIDZ.

If that fails, I will go my old way with an internal USB flash disk with
UFS for booting and a RAIDZ of 4 disks for storage, as I did a few years
ago with 7.0 or 7.1.

Thank you again!

Miroslav Lachman

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 19:44:37 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 31279106564A;
	Wed, 17 Aug 2011 19:44:37 +0000 (UTC) (envelope-from hrs@FreeBSD.org)
Received: from mail.allbsd.org (gatekeeper-int.allbsd.org
	[IPv6:2001:2f0:104:e002::2])
	by mx1.freebsd.org (Postfix) with ESMTP id A08688FC17;
	Wed, 17 Aug 2011 19:44:36 +0000 (UTC)
Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp
	[125.175.94.28]) (authenticated bits=128)
	by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HJiEp5092450;
	Thu, 18 Aug 2011 04:44:24 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0)
	by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7HJiAHT041636;
	Thu, 18 Aug 2011 04:44:12 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Date: Thu, 18 Aug 2011 04:33:32 +0900 (JST)
Message-Id: <20110818.043332.27079545013461535.hrs@allbsd.org>
To: attilio@FreeBSD.org
From: Hiroki Sato <hrs@FreeBSD.org>
In-Reply-To: <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
References: <4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
	<CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530  FFD7 4F2C D3D8 2793 CF2D
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Multipart/Signed; protocol="application/pgp-signature";
	micalg=pgp-sha1;
	boundary="--Security_Multipart(Thu_Aug_18_04_33_32_2011_840)--"
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org
X-Virus-Status: Clean
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3
	(mail.allbsd.org [133.31.130.32]);
	Thu, 18 Aug 2011 04:44:29 +0900 (JST)
X-Spam-Status: No, score=-102.2 required=13.0 tests=BAYES_00,
	CONTENT_TYPE_PRESENT,DIRECTOCNDYN,MIMEQENC,QENCPTR2,RCVD_IN_RP_RNBL,
	SPF_SOFTFAIL,USER_IN_WHITELIST autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	gatekeeper.allbsd.org
Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 19:44:37 -0000

----Security_Multipart(Thu_Aug_18_04_33_32_2011_840)--
Content-Type: Text/Plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Attilio Rao <attilio@freebsd.org> wrote
  in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:

at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
at> > Hi,
at> >
at> > Mike Tancsa <mike@sentex.net> wrote
at> >  in <4E15A08C.6090407@sentex.net>:
at> >
at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
at> > mi> >>
at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
at> > mi> >> for the owner thread was not available.
at> > mi> >>
at> > mi> >> I was unable to make any conclusion from the data that was present.
at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
at> > mi> >> is pure speculation, though.
at> > mi> >
at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
at> > mi> > unless there is any extra debugging you want me to add to the kernel
at> > mi> > instead  ?
at> >
at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
at> >  NFS server with heavy I/O load.  I could not get a kernel dump
at> >  because this panic locked up the machine just after it occurred, but
at> >  according to the stack trace it was the same as posted one.
at> >  Switching to an 8.2R kernel can prevent this panic.
at> >
at> >  Any progress on the investigation?
at>
at> Hiroki,
at> how easilly can you reproduce it?

 It takes 5-10 hours.  I installed another kernel for debugging just
 now, so I think I will be able to collect more detailed information in
 a couple of days.

at> It would be important to have a DDB textdump with these informations:
at> - bt
at> - ps
at> - show allpcpu
at> - alltrace
at>
at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.

 Okay, I will post them once I can get another panic.  Thanks!

-- Hiroki

----Security_Multipart(Thu_Aug_18_04_33_32_2011_840)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEABECAAYFAk5MF4wACgkQTyzT2CeTzy0Z6gCgluxIPrG308LTbGGysww6wQ4R
4TsAnj2fiZoQOXYk0jycI9e3TPKTFcpy
=lTzB
-----END PGP SIGNATURE-----

----Security_Multipart(Thu_Aug_18_04_33_32_2011_840)----

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 20:21:48 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DA4E8106567A;
	Wed, 17 Aug 2011 20:21:47 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 339038FC20;
	Wed, 17 Aug 2011 20:21:45 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA00935;
	Wed, 17 Aug 2011 23:21:43 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QtmcR-000Du9-7E; Wed, 17 Aug 2011 23:21:43 +0300
Message-ID: <4E4C22D6.6070407@FreeBSD.org>
Date: Wed, 17 Aug 2011 23:21:42 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110817 Thunderbird/6.0
MIME-Version: 1.0
To: freebsd-jail@FreeBSD.org, freebsd-hackers@FreeBSD.org
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
	<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
	<4E4BBA7F.30907@FreeBSD.org>
	<88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
In-Reply-To: <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
X-Enigmail-Version: 1.2.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org, Steven Hartland <killing@multiplay.co.uk>
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 20:21:48 -0000


Thanks to the debugging data that Steven provided and the help that I received
from Kostik, I think that I now understand the basic mechanics of this panic,
but, unfortunately, not the details of its root cause.

It seems that everything starts with some kind of race between the termination
of processes in a jail and the termination of the jail itself.  This is where the
details are very thin so far.  What we see is that a process (http) is in the
exit(2) syscall, actually in the exit1() function, past the place where the
P_WEXIT flag is set and even past the place where p_limit is freed and reset to
NULL.  At that point the thread calls prison_proc_free(), which calls
prison_deref().  Then, in prison_deref(), the thread gets a page fault because of
what seems to be a NULL pointer dereference.  That is just the start of the
problem, and its root cause.

Then trap_pfault() gets invoked and, because addresses close to NULL look like
userspace addresses, vm_fault/vm_fault_hold gets called, which in turn goes on
to call vm_map_growstack.  The first thing that vm_map_growstack does is call
lim_cur(), but because p_limit is already NULL, that call results in another
NULL pointer dereference and another page fault.  Go to the beginning of this
paragraph.

So we get a recursion of sorts, which ends only when the stack is exhausted and
the CPU generates a double fault.
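
As a rough illustration of the loop described above, here is a toy user-space
model in C.  It is not kernel code: struct proc and struct plimit are cut-down
stand-ins, model_trap_pfault() and STACK_FRAMES are hypothetical names invented
for this sketch, and the nested fault that the CPU would raise is modelled as an
explicit recursive call.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-ins for the kernel structures named above. */
struct plimit { long stack_cur; };
struct proc   { struct plimit *p_limit; };

/* Assumed toy stack depth for the model; not a real kernel value. */
enum { STACK_FRAMES = 64 };

/*
 * Model one page fault on an address that looks like a userland address.
 * In the kernel the nested fault is raised by the CPU when lim_cur()
 * dereferences the NULL p_limit; here it is an explicit recursive call.
 * Returns the nesting depth at which the stack runs out (the point of the
 * double fault), or 0 if the fault could be handled normally.
 */
static int
model_trap_pfault(const struct proc *p, int depth)
{
	assert(depth >= 1);
	if (depth >= STACK_FRAMES)
		return (depth);		/* stack exhausted: double fault */
	if (p->p_limit == NULL)		/* lim_cur() would fault here */
		return (model_trap_pfault(p, depth + 1));
	return (0);			/* limits present: handled normally */
}
```

Calling model_trap_pfault() with depth 1 for a process whose p_limit is NULL
walks all the way to STACK_FRAMES, while a process with valid limits returns 0
at once.  The model only shows why the original NULL dereference ends up buried
under identical nested frames.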

So, of course, Steven is interested in finding and fixing the root cause.  I
hope we will get to that with some help from the "prison guards" :-)

But I would also like to use this opportunity to discuss how we can make it
easier to debug issues such as this one.  I think that this problem demonstrates
that when we treat certain junk in a kernel address value as a userland address
value, we throw additional heaps of irrelevant stuff on top of the actual
problem.  One solution could be to use a special flag that would mark all actual
attempts to access a userland address (e.g. setting the flag on entrance to
copyin and clearing it upon return), so that in the page fault handler we could
distinguish actual faults on userland addresses from faults on garbage kernel
addresses.  I am sure that there could be other clever techniques to catch such
garbage addresses early.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 21:04:49 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3B180106566C
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 21:04:49 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.emeryville.ca.mail.comcast.net
	(qmta07.emeryville.ca.mail.comcast.net [76.96.30.64])
	by mx1.freebsd.org (Postfix) with ESMTP id 229B48FC0C
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 21:04:48 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
	by qmta07.emeryville.ca.mail.comcast.net with comcast
	id MZ471h00C1bwxycA7Z4kZc; Wed, 17 Aug 2011 21:04:44 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta18.emeryville.ca.mail.comcast.net with comcast
	id MZ4M1h00u1t3BNj8eZ4NGT; Wed, 17 Aug 2011 21:04:22 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 04CF4102C1A; Wed, 17 Aug 2011 14:04:47 -0700 (PDT)
Date: Wed, 17 Aug 2011 14:04:47 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-stable@FreeBSD.org
Message-ID: <20110817210446.GA49737@icarus.home.lan>
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
	<4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
	<20110817175201.GB1973@libertas.local.camdensoftware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110817175201.GB1973@libertas.local.camdensoftware.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 21:04:49 -0000

On Wed, Aug 17, 2011 at 10:52:01AM -0700, Chip Camden wrote:
> Quoth Hiroki Sato on Thursday, 18 August 2011:
> > Hi,
> > 
> > Mike Tancsa <mike@sentex.net> wrote
> >   in <4E15A08C.6090407@sentex.net>:
> > 
> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> > mi> >>
> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> > mi> >> for the owner thread was not available.
> > mi> >>
> > mi> >> I was unable to make any conclusion from the data that was present.
> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
> > mi> >> is pure speculation, though.
> > mi> >
> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> > mi> > unless there is any extra debugging you want me to add to the kernel
> > mi> > instead  ?
> > 
> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> >  NFS server with heavy I/O load.  I could not get a kernel dump
> >  because this panic locked up the machine just after it occurred, but
> >  according to the stack trace it was the same as posted one.
> >  Switching to an 8.2R kernel can prevent this panic.
> > 
> >  Any progress on the investigation?
> > 
> > --
> > spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> > panic: spin lock held too long
> > cpuid = 1
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > kdb_backtrace() at kdb_backtrace+0x37
> > panic() at panic+0x187
> > _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> > _mtx_lock_spin() at _mtx_lock_spin+0x9e
> > sched_add() at sched_add+0x117
> > setrunnable() at setrunnable+0x78
> > sleepq_signal() at sleepq_signal+0x7a
> > cv_signal() at cv_signal+0x3b
> > xprt_active() at xprt_active+0xe3
> > svc_vc_soupcall() at svc_vc_soupcall+0xc
> > sowakeup() at sowakeup+0x69
> > tcp_do_segment() at tcp_do_segment+0x25e7
> > tcp_input() at tcp_input+0xcdd
> > ip_input() at ip_input+0xac
> > netisr_dispatch_src() at netisr_dispatch_src+0x7e
> > ether_demux() at ether_demux+0x14d
> > ether_input() at ether_input+0x17d
> > em_rxeof() at em_rxeof+0x1ca
> > em_handle_que() at em_handle_que+0x5b
> > taskqueue_run_locked() at taskqueue_run_locked+0x85
> > taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> > fork_exit() at fork_exit+0x11f
> > fork_trampoline() at fork_trampoline+0xe
> > --
> > 
> > -- Hiroki
> 
> 
> I'm also getting similar panics on 8.2-STABLE.  Locks up everything and I
> have to power off.  Once, I happened to be looking at the console when it
> happened and copied dow the following:
> 
> Sleeping thread (tif 100037, pid 0) owns a non-sleepable lock
> panic: sleeping thread
> cpuid=1

No idea, might be relevant to the thread.

> Another time I got:
> 
> lock order reversal:
> 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfr_vnops.c:296
> 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587
> 
> I didn't copy down the traceback.

"snaplk" refers to UFS snapshots.  The above must have been typed in
manually too, judging by the typos in the filenames.

Either this is a different problem, or if everyone in this thread is
doing UFS snapshots (dump -L, mksnap_ffs, etc.) and having this problem
happen then I recommend people stop using UFS snapshots.  I've ranted
about their unreliability in the past (years upon years ago -- still
seems valid) and just how badly they can "wedge" a system.  This is one
of the many (MANY!) reasons why we use rsnapshot/rsync instead.  The
atime clobbering issue is the only downside.

I don't see what this has to do with "heavy WAN I/O" unless you're doing
something like dump-over-ssh, in which case see the above paragraph.

> These panics seem to hit when I'm doing heavy WAN I/O.  I can go for
> about a day without one as long as I stay away from the web or even chat.
> Last night this system copied a backup of 35GB over the local network
> without failing, but as soon as I hopped onto Firefox this morning, down
> she went.  I don't know if that's coincidence or useful data.
> 
> I didn't get to say "Thanks" to Eitan Adler for attempting to help me
> with this on Monday night.  Thanks, Eitan!

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 21:10:54 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8105106566C;
	Wed, 17 Aug 2011 21:10:54 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 454A08FC16;
	Wed, 17 Aug 2011 21:10:53 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7HLAnic075382
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 18 Aug 2011 00:10:49 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	p7HLAm2E007269; Thu, 18 Aug 2011 00:10:48 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7HLAmDw007268; 
	Thu, 18 Aug 2011 00:10:48 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Thu, 18 Aug 2011 00:10:48 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Message-ID: <20110817211048.GZ17489@deviant.kiev.zoral.com.ua>
References: <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
	<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
	<4E4BBA7F.30907@FreeBSD.org>
	<88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="MPHowW9WJBu+8Ajw"
Content-Disposition: inline
In-Reply-To: <4E4C22D6.6070407@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-hackers@freebsd.org, freebsd-jail@freebsd.org,
	Steven Hartland <killing@multiplay.co.uk>, freebsd-stable@freebsd.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 21:10:54 -0000


--MPHowW9WJBu+8Ajw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote:
[skip]

> But I also would like to use this opportunity to discuss how we can
> make it easier to debug such issue as this. I think that this problem
> demonstrates that when we treat certain junk in kernel address value
> as a userland address value, we throw additional heaps of irrelevant
> stuff on top of an actual problem. One solution could be to use a
> special flag that would mark all actual attempts to access userland
> address (e.g. setting the flag on entrance to copyin and clearing it
> upon return), so that in the page fault handler we could distinguish
> actual faults on userland addresses from faults on garbage kernel
> addresses. I am sure that there could be other clever techniques to
> catch such garbage addresses early.

We already have such a mechanism: kernel code that is aware of the usermode
page access sets pcb_onfault. See the end of the trap_pfault() handler.
In fact, we can catch it earlier, before even calling vm_fault().

BTW, I think this is especially useful in combination with the SMEP
support in recent Intel CPUs.
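
The check described above can be modelled in user space as a small decision
function.  This is only a sketch: enum fault_action and
classify_user_range_fault() are hypothetical names, and the three parameters
stand in for the usermode argument of trap_pfault(), td->td_intr_nesting_level
and PCPU_GET(curpcb)->pcb_onfault respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels for this sketch. */
enum fault_action { FAULT_FATAL, FAULT_TRY_VM };

/*
 * Decide what to do with a page fault on an address in the user range.
 *   usermode     - the fault was taken while running user code
 *   intr_nesting - models td->td_intr_nesting_level
 *   onfault      - models pcb_onfault (NULL means no handler is armed)
 * A kernel-mode fault with no armed handler, or one taken inside an
 * interrupt, is treated as a bug and reported immediately instead of
 * being passed on to the VM system.
 */
static enum fault_action
classify_user_range_fault(bool usermode, int intr_nesting, const void *onfault)
{
	assert(intr_nesting >= 0);
	if (!usermode && (intr_nesting != 0 || onfault == NULL))
		return (FAULT_FATAL);
	return (FAULT_TRY_VM);
}
```

With this rule a kernel-mode fault with no handler armed is classified as fatal
on the first fault, so the nested faults from vm_fault() never get a chance to
pile up on top of it.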

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov <kostik@pooma.home>
Date:   Thu Aug 18 00:08:50 2011 +0300

    Assert that the exiting process does not return to usermode.
    On x86, do not call vm_fault() when the kernel is not prepared
    to handle unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
 			goto nogo;
 
 		map = &vm->vm_map;
+
+		/*
+		 * When accessing a usermode address, kernel must be
+		 * ready to accept the page fault, and provide a
+		 * handling routine.  Since accessing the address
+		 * without the handler is a bug, do not try to handle
+		 * it normally, and panic immediately.
+		 */
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
 			goto nogo;
 
 		map = &vm->vm_map;
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 
 	CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
             td->td_name);
+	KASSERT((p->p_flag & P_WEXIT) == 0,
+	    ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
 	/* Check that we called signotify() enough. */

--MPHowW9WJBu+8Ajw
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk5MLlgACgkQC3+MBN1Mb4hyewCgpKYy+yhG+S3bXm5A324n/C8+
6lIAoPRTszmVWdyBQqw5vhJUnpNbhluY
=i6E1
-----END PGP SIGNATURE-----

--MPHowW9WJBu+8Ajw--

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 21:46:37 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2EBCB106566B
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 21:46:37 +0000 (UTC)
	(envelope-from artemb@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id BB7528FC19
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 21:46:36 +0000 (UTC)
Received: by wyh15 with SMTP id 15so1247555wyh.13
	for <freebsd-stable@freebsd.org>; Wed, 17 Aug 2011 14:46:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=SxOl6JX6C4cq52TyTWQht8ogs0dbX150pNhutlfhSRI=;
	b=eJuWHmH06uzORVpuR+HXzyvLGFNUkLS5M9faoZ+SMhE6h699WVL0aZD+u/AztH1QLO
	2Bx132Xr5ke8wXqhwCfkGSSI32ZCUqlho+VUxpNKN1fp6Xruka3pr135VWJfb0UoX2Wt
	mKzQfln3O0YozCLeLons0SLs3uB0c4U3wTR9A=
MIME-Version: 1.0
Received: by 10.216.90.19 with SMTP id d19mr4769374wef.35.1313617595556; Wed,
	17 Aug 2011 14:46:35 -0700 (PDT)
Sender: artemb@gmail.com
Received: by 10.216.181.210 with HTTP; Wed, 17 Aug 2011 14:46:35 -0700 (PDT)
In-Reply-To: <4E4C1945.5030504@quip.cz>
References: <4E4BC38D.1050808@quip.cz> <4E4BCCC3.60601@digsys.bg>
	<CAFqOu6hQzzwrTpuyddqrODr8WP4Ke0pi7MoYhYL9ivfsNHxNhA@mail.gmail.com>
	<4E4C1945.5030504@quip.cz>
Date: Wed, 17 Aug 2011 14:46:35 -0700
X-Google-Sender-Auth: YKhN0SAOFEDL08GS1o1N-lzjkUk
Message-ID: <CAFqOu6gVAjhbGd-92kiLGdztVTAbAxe1MwEjVECthnKVV=QEMg@mail.gmail.com>
From: Artem Belevich <art@freebsd.org>
To: Miroslav Lachman <000.fbsd@quip.cz>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-stable@freebsd.org, Daniel Kalchev <daniel@digsys.bg>
Subject: Re: can not boot from RAIDZ with 8-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 21:46:37 -0000

On Wed, Aug 17, 2011 at 12:40 PM, Miroslav Lachman <000.fbsd@quip.cz> wrote:
> Thank you guys, you are right. The BIOS provides only 1 disk to the loader!
> I checked it from loader prompt by lsdev (booted from USB external HDD).
>
> So I will try to make a small zpool mirror for root and boot (if ZFS mirror
> can be made of 4 providers instead of two) and the rest will be in RAIDZ.
>
> If that fails, I will go my old way with internal USB flash disk with UFS
> for booting and RAIDZ of 4 disks for storage as I did it few years ago with
> 7.0 or 7.1.

You seem to be booting from disks attached to some sort of add-on
card. Sometimes those have per-disk 'bootable' option in their own
extension ROM. You may investigate yours. Perhaps all you need to do
is just tweak controller settings.

--Artem

From owner-freebsd-stable@FreeBSD.ORG  Wed Aug 17 23:15:54 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 016FF106564A;
	Wed, 17 Aug 2011 23:15:54 +0000 (UTC)
	(envelope-from prvs=1210f20b9f=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id E01B98FC1B;
	Wed, 17 Aug 2011 23:15:52 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 00:15:17 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 00:15:17 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014640704.msg;
	Thu, 18 Aug 2011 00:15:16 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>, <freebsd-jail@FreeBSD.org>,
	<freebsd-hackers@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
Date: Thu, 18 Aug 2011 00:15:56 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2011 23:15:54 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> Thanks to the debug that Steven provided and to the help that I received from
> Kostik, I think that now I understand the basic mechanics of this panic, but,
> unfortunately, not the details of its root cause.
> 
> It seems like everything starts with some kind of a race between terminating
> processes in a jail and termination of the jail itself.  This is where the
> details are very thin so far.  What we see is that a process (http) is in
> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
> flag is set and even past the place where p_limit is freed and reset to NULL.
> At that place the thread calls prison_proc_free(), which calls prison_deref().
> Then, we see that in prison_deref() the thread gets a page fault because of what
> seems like a NULL pointer dereference.  That's just the start of the problem and
> its root cause.

That's interesting. Are you using http as an example, or is that something
that's been gleaned from debugging our output? I ask because there's only one
process running in each of our jails, and that's a single java process.

Now given your description there may be something I can add that may help
clarify what the cause could be.

In a nutshell, the jail manager we're using will attempt to resurrect the jail
from a dying state in a few specific scenarios.

Here's an example:
1. A jail restart is requested.
2. The jail is stopped, so the java process is killed off, but active tcp
sessions may prevent the timely full shutdown of the jail.
3. If an existing jail is detected, i.e. a dying jail from #2, instead of
starting a new jail we attach to the old one and exec the new java process.
4. If an existing jail isn't detected, i.e. where there were no hanging tcp
sessions and #2 cleanly shut down the jail, a new jail is created, attached
to, and the java process exec'ed.
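
The choice between #3 and #4 can be sketched roughly as below. This is only
an illustrative userland model, not our jail manager's code: the enums and
the function name are made up for the example, and the real lookup and
attach would go through the jail syscalls rather than a plain enum.

```c
/* Hypothetical view of what a lookup of the static jail ID reports. */
enum jail_lookup { JS_ABSENT, JS_DYING, JS_ALIVE };

/* Hypothetical actions the manager can take on a restart request. */
enum restart_action { CREATE_AND_ATTACH, REATTACH_TO_DYING, STOP_FIRST };

/*
 * Pick the action for a restart request, based on the state of the
 * jail that owns the static jail ID for this service.
 */
static enum restart_action
choose_restart_action(enum jail_lookup state)
{
	switch (state) {
	case JS_DYING:
		/* Case #3: old jail is still draining tcp sessions; reuse it. */
		return (REATTACH_TO_DYING);
	case JS_ABSENT:
		/* Case #4: the jail shut down cleanly; build a fresh one. */
		return (CREATE_AND_ATTACH);
	case JS_ALIVE:
	default:
		/* Step #2 has not happened yet; stop the service first. */
		return (STOP_FIRST);
	}
}
```

The REATTACH_TO_DYING branch is the one that overlaps with the jail's own
teardown, so that is where a race would have to come from.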

The system uses static jail IDs, so it's possible to determine whether an
existing jail for this "service" exists or not. This prevents duplicate
services as well as making services easy to identify by their jail ID.

So what we could be seeing is a race between the jail shutdown and the attach
of the new process?

Now man 2 jail seems to indicate this is a valid use case for jail_set, as
it documents JAIL_DYING as a valid option for flags, but I suspect it's
something quite out of the ordinary to actually do, which may be why this
panic hasn't been seen before now.

As some background, the reason we use static jail IDs is to ensure only one
instance of the jailed service is running, and the reason we re-attach to
the dying jail is so that jails can be restarted in a timely manner. Without
the re-attach we would need to wait for all the aborted tcp sessions to
time out.

> So, of course, Steven is interested in finding and fixing the root cause.  I
> hope we will get to that with some help from the "prison guards" :-)

Does the above potentially explain how we're getting to the situation
which generates the panic?

If so, we can certainly look at using alternatives to the current design to
work around this issue. Flagging the jail as permanent and using manual
process management plus additional external locking to prevent duplicates is
what instantly springs to mind.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.


From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 00:01:11 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A6C13106566B
	for <freebsd-stable@FreeBSD.org>; Thu, 18 Aug 2011 00:01:11 +0000 (UTC)
	(envelope-from sterling@camdensoftware.com)
Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com
	[174.123.46.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 6D7EB8FC1C
	for <freebsd-stable@FreeBSD.org>; Thu, 18 Aug 2011 00:01:11 +0000 (UTC)
Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203]
	helo=_HOSTNAME_)
	by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.69) (envelope-from <sterling@camdensoftware.com>)
	id 1Qtq2l-0007YQ-C2
	for freebsd-stable@FreeBSD.org; Wed, 17 Aug 2011 17:00:44 -0700
Received: by _HOSTNAME_ (sSMTP sendmail emulation);
	Wed, 17 Aug 2011 17:01:05 -0700
Date: Wed, 17 Aug 2011 17:01:05 -0700
From: Chip Camden <sterling@camdensoftware.com>
To: freebsd-stable@FreeBSD.org
Message-ID: <20110818000105.GC2489@libertas.local.camdensoftware.com>
Mail-Followup-To: freebsd-stable@FreeBSD.org
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
	<4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
	<20110817175201.GB1973@libertas.local.camdensoftware.com>
	<20110817210446.GA49737@icarus.home.lan>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="VywGB/WGlW4DM4P8"
Content-Disposition: inline
In-Reply-To: <20110817210446.GA49737@icarus.home.lan>
User-Agent: Mutt/1.4.2.3i
Company: Camden Software Consulting
URL: http://camdensoftware.com
X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - camdensoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 00:01:11 -0000


--VywGB/WGlW4DM4P8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Quoth Jeremy Chadwick on Wednesday, 17 August 2011:
> >
> > I'm also getting similar panics on 8.2-STABLE.  Locks up everything and I
> > have to power off.  Once, I happened to be looking at the console when it
> > happened and copied dow the following:
> >
> > Sleeping thread (tif 100037, pid 0) owns a non-sleepable lock
> > panic: sleeping thread
> > cpuid=1
>
> No idea, might be relevant to the thread.
>
> > Another time I got:
> >
> > lock order reversal:
> > 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfr_vnops.c:296
> > 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587
> >
> > I didn't copy down the traceback.
>
> "snaplk" refers to UFS snapshots.  The above must have been typed in
> manually as well, due to some typos in filenames as well.
>
> Either this is a different problem, or if everyone in this thread is
> doing UFS snapshots (dump -L, mksnap_ffs, etc.) and having this problem
> happen then I recommend people stop using UFS snapshots.  I've ranted
> about their unreliability in the past (years upon years ago -- still
> seems valid) and just how badly they can "wedge" a system.  This is one
> of the many (MANY!) reasons why we use rsnapshot/rsync instead.  The
> atime clobbering issue is the only downside.
>

If I'm doing UFS snapshots, I didn't know it.  Yes, everything was copied
manually because it only displays on the console and the keyboard does
not respond after that point.  So I copied first to paper, then had to
decode my lousy handwriting to put it in an email.  Sorry for the scribal
errors.

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com

--VywGB/WGlW4DM4P8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJOTFZBAAoJEIpckszW26+RRtsH/jPEUungeBO9a3idYOTECrqg
BsEo0zyyyz76sd3bkyVVx5QNRlfAygoxhReUsD1r6GC9QhapR0m91qUD1bYNK3yv
wCxKp3bCOCbh4HOG5efwDFBKisfKLRKjyQp2SQ7d2R+RHO6fsk9VHvrPS6LQ3skH
AJ0fvCUd+0GCpvKsLHzV+MqrGJpiMdz2dwPpo+Jwv+EzGZ8H2gJwrzZD4OUAkGC4
gXBqT+YTiJLNQIOr0dteYO037yymUxYRqB9q8lbNcl6RKp3s1NHQWUU3IhDJjeSL
5qTCr9j9wSOomxBCskWXsy6XzEdmc3dzMPBS95D5zbZWDYxl5JXFAE8hLKanWkw=
=cyZM
-----END PGP SIGNATURE-----

--VywGB/WGlW4DM4P8--

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 00:16:41 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7D2A9106564A;
	Thu, 18 Aug 2011 00:16:41 +0000 (UTC) (envelope-from hrs@FreeBSD.org)
Received: from mail.allbsd.org (gatekeeper-int.allbsd.org
	[IPv6:2001:2f0:104:e002::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 716368FC08;
	Thu, 18 Aug 2011 00:16:40 +0000 (UTC)
Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp
	[125.175.94.28]) (authenticated bits=128)
	by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7I0GI0T059114;
	Thu, 18 Aug 2011 09:16:28 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0)
	by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7I0GDqV044396;
	Thu, 18 Aug 2011 09:16:15 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Date: Thu, 18 Aug 2011 09:16:00 +0900 (JST)
Message-Id: <20110818.091600.831954331552558249.hrs@allbsd.org>
To: attilio@FreeBSD.org
From: Hiroki Sato <hrs@FreeBSD.org>
In-Reply-To: <20110818.043332.27079545013461535.hrs@allbsd.org>
References: <20110818.023832.373949045518579359.hrs@allbsd.org>
	<CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
	<20110818.043332.27079545013461535.hrs@allbsd.org>
X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530  FFD7 4F2C D3D8 2793 CF2D
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org
X-Virus-Status: Clean
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3
	(mail.allbsd.org [133.31.130.32]);
	Thu, 18 Aug 2011 09:16:33 +0900 (JST)
X-Spam-Status: No, score=-102.2 required=13.0 tests=BAYES_00,
	CONTENT_TYPE_PRESENT,DIRECTOCNDYN,MIMEQENC,QENCPTR2,RCVD_IN_RP_RNBL,
	SPF_SOFTFAIL,USER_IN_WHITELIST autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	gatekeeper.allbsd.org
Cc: kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 00:16:41 -0000

Hiroki Sato <hrs@freebsd.org> wrote
  in <20110818.043332.27079545013461535.hrs@allbsd.org>:

hr> Attilio Rao <attilio@freebsd.org> wrote
hr>   in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:
hr>
hr> at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
hr> at> > Hi,
hr> at> >
hr> at> > Mike Tancsa <mike@sentex.net> wrote
hr> at> >  in <4E15A08C.6090407@sentex.net>:
hr> at> >
hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
hr> at> > mi> >>
hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
hr> at> > mi> >> for the owner thread was not available.
hr> at> > mi> >>
hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
hr> at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
hr> at> > mi> >> is pure speculation, though.
hr> at> > mi> >
hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
hr> at> > mi> > instead  ?
hr> at> >
hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
hr> at> >  because this panic locked up the machine just after it occurred, but
hr> at> >  according to the stack trace it was the same as posted one.
hr> at> >  Switching to an 8.2R kernel can prevent this panic.
hr> at> >
hr> at> >  Any progress on the investigation?
hr> at>
hr> at> Hiroki,
hr> at> how easilly can you reproduce it?
hr>
hr>  It takes 5-10 hours.  I installed another kernel for debugging just
hr>  now, so I think I will be able to collect more detail information in
hr>  a couple of days.
hr>
hr> at> It would be important to have a DDB textdump with these informations:
hr> at> - bt
hr> at> - ps
hr> at> - show allpcpu
hr> at> - alltrace
hr> at>
hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
hr>
hr>  Okay, I will post them once I can get another panic.  Thanks!

 I got the panic with a crash dump this time.  The result of bt, ps,
 allpcpu, and traces can be found at the following URL:

  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

-- Hiroki

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 00:35:38 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C73D9106566B;
	Thu, 18 Aug 2011 00:35:38 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 5D32F8FC0C;
	Thu, 18 Aug 2011 00:35:38 +0000 (UTC)
Received: by yxn22 with SMTP id 22so254950yxn.13
	for <multiple recipients>; Wed, 17 Aug 2011 17:35:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=nwJCiqzwXhEcptHUzCuSNX6Tqtwr9zjUxyoSseNFsi8=;
	b=qQrWvcZ9eyLkxMPJGE9od7N9AFoXHEpM+5hs+lMTqK9Cv5sighZh+dw5fYXAD3QIsl
	WDV1H/80GNMSiWqjkLLrzsgPnV6VlCsYE92Ahuxnx4jUa2M0DGs5lG3Qp08WeBYghNcY
	d983McfPYNhLo8AeJ5jQUU+JdU2lwEaVrzl/A=
MIME-Version: 1.0
Received: by 10.236.170.9 with SMTP id o9mr11588yhl.43.1313627737497; Wed, 17
	Aug 2011 17:35:37 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.236.108.33 with HTTP; Wed, 17 Aug 2011 17:35:37 -0700 (PDT)
In-Reply-To: <20110818.091600.831954331552558249.hrs@allbsd.org>
References: <20110818.023832.373949045518579359.hrs@allbsd.org>
	<CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
	<20110818.043332.27079545013461535.hrs@allbsd.org>
	<20110818.091600.831954331552558249.hrs@allbsd.org>
Date: Thu, 18 Aug 2011 02:35:37 +0200
X-Google-Sender-Auth: MCw4hh_Hde0OfacevQCtfvzP3CU
Message-ID: <CAJ-FndDtxKAKr_xZrCYDY0K=UAm=sUuUtHkZRZjBgrZYsus1pQ@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Hiroki Sato <hrs@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: kostikbel@gmail.com, freebsd-stable@freebsd.org, avg@freebsd.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 00:35:39 -0000

2011/8/18 Hiroki Sato <hrs@freebsd.org>:
> Hiroki Sato <hrs@freebsd.org> wrote
>  in <20110818.043332.27079545013461535.hrs@allbsd.org>:
>
> hr> Attilio Rao <attilio@freebsd.org> wrote
> hr>   in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:
> hr>
> hr> at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> hr> at> > Hi,
> hr> at> >
> hr> at> > Mike Tancsa <mike@sentex.net> wrote
> hr> at> >  in <4E15A08C.6090407@sentex.net>:
> hr> at> >
> hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> hr> at> > mi> >>
> hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> hr> at> > mi> >> for the owner thread was not available.
> hr> at> > mi> >>
> hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> hr> at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
> hr> at> > mi> >> is pure speculation, though.
> hr> at> > mi> >
> hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> hr> at> > mi> > instead  ?
> hr> at> >
> hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
> hr> at> >  because this panic locked up the machine just after it occurred, but
> hr> at> >  according to the stack trace it was the same as posted one.
> hr> at> >  Switching to an 8.2R kernel can prevent this panic.
> hr> at> >
> hr> at> >  Any progress on the investigation?
> hr> at>
> hr> at> Hiroki,
> hr> at> how easilly can you reproduce it?
> hr>
> hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> hr>  now, so I think I will be able to collect more detail information in
> hr>  a couple of days.
> hr>
> hr> at> It would be important to have a DDB textdump with these informations:
> hr> at> - bt
> hr> at> - ps
> hr> at> - show allpcpu
> hr> at> - alltrace
> hr> at>
> hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
> hr>
> hr>  Okay, I will post them once I can get another panic.  Thanks!
>
>  I got the panic with a crash dump this time.  The result of bt, ps,
>  allpcpu, and traces can be found at the following URL:
>
>  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

I'm not sure I understand: is a corefile also available?
If so, where can I get it (along with the relevant sources and kernel.debug)?

Thanks,
Attilio


--=20
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 01:04:35 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 656BA106564A;
	Thu, 18 Aug 2011 01:04:35 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id DFF838FC08;
	Thu, 18 Aug 2011 01:04:34 +0000 (UTC)
Received: by yxn22 with SMTP id 22so265810yxn.13
	for <multiple recipients>; Wed, 17 Aug 2011 18:04:34 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=n4cvQcucnODsJVGi9I5dVUPScuIcb3TwlppDmN49vl8=;
	b=epFRImjgWwckQCIxi9uEJIJvXJ44SaCcTcHEh8BBxmkJN6WEDFePl0iA/XdZul7toT
	HxeW81vbc4wkGqAXfe2HPix3sCrRMbh1aqsLRrRdaC2LFEbvTKIhY8e5Pn78xXeBalqU
	yegfacVxvQFlNq2MGb1j/XF7vvcHr3oXgSVjA=
MIME-Version: 1.0
Received: by 10.236.182.66 with SMTP id n42mr113076yhm.128.1313629474018; Wed,
	17 Aug 2011 18:04:34 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.236.108.33 with HTTP; Wed, 17 Aug 2011 18:04:32 -0700 (PDT)
In-Reply-To: <20110818.091600.831954331552558249.hrs@allbsd.org>
References: <20110818.023832.373949045518579359.hrs@allbsd.org>
	<CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
	<20110818.043332.27079545013461535.hrs@allbsd.org>
	<20110818.091600.831954331552558249.hrs@allbsd.org>
Date: Thu, 18 Aug 2011 03:04:32 +0200
X-Google-Sender-Auth: k_UDqQniEWum2a7YNdHBkktTkYU
Message-ID: <CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Hiroki Sato <hrs@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-stable@freebsd.org, sterling@camdensoftware.com, avg@freebsd.org,
	Nick Esborn <nick@desert.net>, kostikbel@gmail.com, mdtansca@freebsd.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 01:04:35 -0000

2011/8/18 Hiroki Sato <hrs@freebsd.org>:
> Hiroki Sato <hrs@freebsd.org> wrote
>  in <20110818.043332.27079545013461535.hrs@allbsd.org>:
>
> hr> Attilio Rao <attilio@freebsd.org> wrote
> hr>   in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:
> hr>
> hr> at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> hr> at> > Hi,
> hr> at> >
> hr> at> > Mike Tancsa <mike@sentex.net> wrote
> hr> at> >  in <4E15A08C.6090407@sentex.net>:
> hr> at> >
> hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> hr> at> > mi> >>
> hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> hr> at> > mi> >> for the owner thread was not available.
> hr> at> > mi> >>
> hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> hr> at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
> hr> at> > mi> >> is pure speculation, though.
> hr> at> > mi> >
> hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> hr> at> > mi> > instead  ?
> hr> at> >
> hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
> hr> at> >  because this panic locked up the machine just after it occurred, but
> hr> at> >  according to the stack trace it was the same as posted one.
> hr> at> >  Switching to an 8.2R kernel can prevent this panic.
> hr> at> >
> hr> at> >  Any progress on the investigation?
> hr> at>
> hr> at> Hiroki,
> hr> at> how easilly can you reproduce it?
> hr>
> hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> hr>  now, so I think I will be able to collect more detail information in
> hr>  a couple of days.
> hr>
> hr> at> It would be important to have a DDB textdump with these informations:
> hr> at> - bt
> hr> at> - ps
> hr> at> - show allpcpu
> hr> at> - alltrace
> hr> at>
> hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
> hr>
> hr>  Okay, I will post them once I can get another panic.  Thanks!
>
>  I got the panic with a crash dump this time.  The result of bt, ps,
>  allpcpu, and traces can be found at the following URL:
>
>  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

Actually, I think I see the bug here.

In callout_cpu_switch(), if a low-priority thread is migrating the
callout and gets preempted after it has dropped the outgoing CPU's
queue lock (and is rescheduled much later), we get this problem.

A critical section might be enough to fix this bug, but I think this
should really be interrupt safe, so I'd wrap the unlock/lock pair in
spinlock_enter()/spinlock_exit(). Fortunately,
callout_cpu_switch() should be called rarely, and we already do
expensive locking operations in callout, so this should not be a
problem performance-wise.
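
To make the window concrete, here is a minimal userspace model of the
idea. It is a sketch only: it is not the patch linked below and not
kernel code, and every model_* name is a hypothetical stand-in invented
for illustration. The "scheduler" tries to preempt the migrating thread
in the gap between dropping the old queue lock and taking the new one;
the attempt only succeeds when the model's spinlock depth is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_cpu_queue {
	bool locked;		/* stands in for a per-CPU callout queue lock */
};

int model_spin_depth;		/* > 0: preemption and interrupts are held off */
int model_preempt_taken;	/* preemptions that actually happened */

void
model_spinlock_enter(void)
{
	model_spin_depth++;
}

void
model_spinlock_exit(void)
{
	assert(model_spin_depth > 0);
	model_spin_depth--;
}

/* The "scheduler" wants to preempt; it can only do so at depth 0. */
void
model_try_preempt(void)
{
	if (model_spin_depth == 0)
		model_preempt_taken++;
}

void
model_lock(struct model_cpu_queue *q)
{
	assert(!q->locked);
	q->locked = true;
}

void
model_unlock(struct model_cpu_queue *q)
{
	assert(q->locked);
	q->locked = false;
}

/*
 * Hand over from the old queue (which must be locked on entry) to the
 * new one.  Returns how many preemptions landed inside the window in
 * which neither queue lock is held.
 */
int
model_callout_cpu_switch(struct model_cpu_queue *from,
    struct model_cpu_queue *to, bool protect)
{
	int before = model_preempt_taken;

	if (protect)
		model_spinlock_enter();
	model_unlock(from);
	model_try_preempt();	/* the window: no queue lock held */
	model_lock(to);
	if (protect)
		model_spinlock_exit();
	return (model_preempt_taken - before);
}
```

With protect set to false the model reports one preemption inside the
window; with protect set to true it reports none, which is the property
the spinlock_enter()/spinlock_exit() wrapping is meant to provide.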

Can the people I CC'ed here try the following patch, with all the
initial kernel options that were leading to the deadlock? (That is,
revert any debugging patches or options you have added in the meantime.)
http://www.freebsd.org/~attilio/callout-fixup.diff

Please note that this patch is for STABLE_8. If you can confirm a
good result, I'll commit it to -CURRENT and then backmerge as soon as
possible.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 01:29:52 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 075391065672
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 01:29:52 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.westchester.pa.mail.comcast.net
	(qmta07.westchester.pa.mail.comcast.net [76.96.62.64])
	by mx1.freebsd.org (Postfix) with ESMTP id A67CA8FC0C
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 01:29:51 +0000 (UTC)
Received: from omta24.westchester.pa.mail.comcast.net ([76.96.62.76])
	by qmta07.westchester.pa.mail.comcast.net with comcast
	id Mcft1h0061ei1Bg57dVre2; Thu, 18 Aug 2011 01:29:51 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta24.westchester.pa.mail.comcast.net with comcast
	id MdVo1h01U1t3BNj3kdVqvu; Thu, 18 Aug 2011 01:29:51 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 842E0102C1A; Wed, 17 Aug 2011 18:29:47 -0700 (PDT)
Date: Wed, 17 Aug 2011 18:29:47 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-stable@FreeBSD.org
Message-ID: <20110818012947.GA53983@icarus.home.lan>
References: <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
	<4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net>
	<20110818.023832.373949045518579359.hrs@allbsd.org>
	<20110817175201.GB1973@libertas.local.camdensoftware.com>
	<20110817210446.GA49737@icarus.home.lan>
	<20110818000105.GC2489@libertas.local.camdensoftware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110818000105.GC2489@libertas.local.camdensoftware.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 01:29:52 -0000

On Wed, Aug 17, 2011 at 05:01:05PM -0700, Chip Camden wrote:
> Quoth Jeremy Chadwick on Wednesday, 17 August 2011:
> > > 
> > > I'm also getting similar panics on 8.2-STABLE.  Locks up everything and I
> > > have to power off.  Once, I happened to be looking at the console when it
> > > happened and copied dow the following:
> > > 
> > > Sleeping thread (tif 100037, pid 0) owns a non-sleepable lock
> > > panic: sleeping thread
> > > cpuid=1
> > 
> > No idea, might be relevant to the thread.
> > 
> > > Another time I got:
> > > 
> > > lock order reversal:
> > > 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfr_vnops.c:296
> > > 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587
> > > 
> > > I didn't copy down the traceback.
> > 
> > "snaplk" refers to UFS snapshots.  The above must have been typed in
> > manually as well, due to some typos in filenames as well.
> > 
> > Either this is a different problem, or if everyone in this thread is
> > doing UFS snapshots (dump -L, mksnap_ffs, etc.) and having this problem
> > happen then I recommend people stop using UFS snapshots.  I've ranted
> > about their unreliability in the past (years upon years ago -- still
> > seems valid) and just how badly they can "wedge" a system.  This is one
> > of the many (MANY!) reasons why we use rsnapshot/rsync instead.  The
> > atime clobbering issue is the only downside.
> > 
> 
> If I'm doing UFS snapshots, I didn't know it.

The backtrace indicates that a UFS snapshot is being made -- which
causes the state to be set to string "snaplk", which is then honoured in
vfs_vnops.c.

You can see for yourself: grep -r snaplk /usr/src/sys.

So yes, I'm inclined to believe something on your system is doing UFS
snapshot generation.  Whether or not other people are doing it as well
is a different story.

> Yes, everything was copied manually because it only displays on the
> console and the keyboard does not respond after that point.  So I
> copied first to paper, then had to decode my lousy handwriting to put
> it in an email.  Sorry for the scribal errors.

That sounds more or less like what I saw with UFS snapshots: the system
would go catatonic in one way or another.  It wouldn't "hard lock" (as
if it had been powered off); it would "live lock" (the kernel was
wedged, held up, or spinning on something).

I never saw a panic as a result of UFS snapshots, only what I described
here.

TL;DR -- Your system appears to be making UFS snapshots, and that
situation is possibly (likely?) unrelated to the sleeping thread issue
you see that causes a panic.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 02:55:55 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 72DDB106564A
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 02:55:55 +0000 (UTC)
	(envelope-from sterling@camdensoftware.com)
Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com
	[174.123.46.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 386458FC14
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 02:55:55 +0000 (UTC)
Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203]
	helo=_HOSTNAME_)
	by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.69) (envelope-from <sterling@camdensoftware.com>)
	id 1Qtslr-0001Mh-3w
	for freebsd-stable@freebsd.org; Wed, 17 Aug 2011 19:55:28 -0700
Received: by _HOSTNAME_ (sSMTP sendmail emulation);
	Wed, 17 Aug 2011 19:55:50 -0700
Date: Wed, 17 Aug 2011 19:55:50 -0700
From: Chip Camden <sterling@camdensoftware.com>
To: freebsd-stable@freebsd.org
Message-ID: <20110818025550.GA1971@libertas.local.camdensoftware.com>
Mail-Followup-To: freebsd-stable@freebsd.org
References: <20110818.023832.373949045518579359.hrs@allbsd.org>
	<CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>
	<20110818.043332.27079545013461535.hrs@allbsd.org>
	<20110818.091600.831954331552558249.hrs@allbsd.org>
	<CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="LQksG6bCIzRHxTLp"
Content-Disposition: inline
In-Reply-To: <CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
Company: Camden Software Consulting
URL: http://camdensoftware.com
X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - camdensoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 02:55:55 -0000


--LQksG6bCIzRHxTLp
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Quoth Attilio Rao on Thursday, 18 August 2011:
> 2011/8/18 Hiroki Sato <hrs@freebsd.org>:
> > Hiroki Sato <hrs@freebsd.org> wrote
> >  in <20110818.043332.27079545013461535.hrs@allbsd.org>:
> >
> > hr> Attilio Rao <attilio@freebsd.org> wrote
> > hr>   in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:
> > hr>
> > hr> at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> > hr> at> > Hi,
> > hr> at> >
> > hr> at> > Mike Tancsa <mike@sentex.net> wrote
> > hr> at> >  in <4E15A08C.6090407@sentex.net>:
> > hr> at> >
> > hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> > hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> > hr> at> > mi> >>
> > hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> > hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> > hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> > hr> at> > mi> >> for the owner thread was not available.
> > hr> at> > mi> >>
> > hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> > hr> at> > mi> >> If the situation is reproducable, you coulld try to revert r221937. This
> > hr> at> > mi> >> is pure speculation, though.
> > hr> at> > mi> >
> > hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> > hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> > hr> at> > mi> > instead  ?
> > hr> at> >
> > hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> > hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
> > hr> at> >  because this panic locked up the machine just after it occurred, but
> > hr> at> >  according to the stack trace it was the same as posted one.
> > hr> at> >  Switching to an 8.2R kernel can prevent this panic.
> > hr> at> >
> > hr> at> >  Any progress on the investigation?
> > hr> at>
> > hr> at> Hiroki,
> > hr> at> how easilly can you reproduce it?
> > hr>
> > hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> > hr>  now, so I think I will be able to collect more detail information in
> > hr>  a couple of days.
> > hr>
> > hr> at> It would be important to have a DDB textdump with these informations:
> > hr> at> - bt
> > hr> at> - ps
> > hr> at> - show allpcpu
> > hr> at> - alltrace
> > hr> at>
> > hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
> > hr>
> > hr>  Okay, I will post them once I can get another panic.  Thanks!
> >
> >  I got the panic with a crash dump this time.  The result of bt, ps,
> >  allpcpu, and traces can be found at the following URL:
> >
> >  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt
>
> Actually, I think I see the bug here.
>
> In callout_cpu_switch() if a low priority thread is migrating the
> callout and gets preempted after the outcoming cpu queue lock is left
> (and scheduled much later) we get this problem.
>
> In order to fix this bug it could be enough to use a critical section,
> but I think this should be really interrupt safe, thus I'd wrap them
> up with spinlock_enter()/spinlock_exit(). Fortunately
> callout_cpu_switch() should be called rarely and also we already do
> expensive locking operations in callout, thus we should not have
> problem performance-wise.
>
> Can the guys I also CC'ed here try the following patch, with all the
> initial kernel options that were leading you to the deadlock? (thus
> revert any debugging patch/option you added for the moment):
> http://www.freebsd.org/~attilio/callout-fixup.diff
>
> Please note that this patch is for STABLE_8, if you can confirm the
> good result I'll commit to -CURRENT and then backmarge as soon as
> possible.
>
> Thanks,
> Attilio
>

Thanks, Attilio.  I've applied the patch and removed the extra debug
options I had added (though keeping debug symbols).  I'll let you know if
I experience any more panics.

Regards,

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com

--LQksG6bCIzRHxTLp
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJOTH82AAoJEIpckszW26+Rm0oH/3Ikeau8F1c55yqTjMh6X78B
/3yTy68BsfBwD/VeA00Q/cpxlCafovUeP8WwXPE9mNkdR9Rhf1VuU7K1iLOtbGHe
F+UJ/rB8rNPUNxezCqo2kzoMhx2o9NbCiZPW9toyL1lW/pa/B5/lToma8BnbxzOH
2LBSU/8+HU8YphqXr4hPEPFxWUx74tSvieHOEBI1/GVZea2vpUrInO7cfqQ3DzLE
/6vnvb0KVfhQjTeeApdFen46eS2mbPl+PtMKGv3C7Ctle+Bv2hm3QhoIc8DCOTTE
9lBdByd2lozIUK+bsc2DMg/+keoW9h1MRVcaNRASOhdx1L6QId6ULdg9Z5QO2G8=
=jONj
-----END PGP SIGNATURE-----

--LQksG6bCIzRHxTLp--

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 08:16:53 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6D4F4106566C
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 08:16:53 +0000 (UTC)
	(envelope-from melifaro@ipfw.ru)
Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 026B38FC08
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 08:16:53 +0000 (UTC)
Received: from dhcp170-36-red.yandex.net ([95.108.170.36])
	by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.76 (FreeBSD)) (envelope-from <melifaro@ipfw.ru>)
	id 1QtxmT-000Nxd-DC; Thu, 18 Aug 2011 12:16:49 +0400
Message-ID: <4E4CCA6C.8020408@ipfw.ru>
Date: Thu, 18 Aug 2011 12:16:44 +0400
From: "Alexander V. Chernikov" <melifaro@ipfw.ru>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.1.16) Gecko/20110120 Thunderbird/3.0.11
MIME-Version: 1.0
To: perryh@pluto.rain.com
References: <4E4143A6.6030307@digsys.bg>	<935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com>
	<4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
In-Reply-To: <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org, daniel@digsys.bg
Subject: Re: 32GB limit per swap device?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 08:16:53 -0000

On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
> Chuck Swiger<cswiger@mac.com>  wrote:
>
>> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
>>> I am trying to set up 64GB partitions for swap for a system that
>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
>>> 8-stable as of today I get:
>>>
>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
>>>
>>> Is there workaround for this limitation?

Another interesting question:

The swap pager operates in page-sized blocks (PAGE_SIZE=4k on common
architectures).

The block device size is passed to swaponsomething() as a number of
_disk_ blocks (i.e. in units of DEV_BSIZE=512). After that, the
maximum-object check of the kernel b-lists (on top of which the swap
pager is built) is enforced.

The (possible) problem is that the real object count we will operate on
is not the value passed to swaponsomething(), so the check is done in
the wrong units.

We should check the b-list limit against the value
(X * DEV_BSIZE / PAGE_SIZE), which is roughly (X / 8), so we should be
able to address 32*8=256G.

The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c     (revision 223877)
+++ vm/swap_pager.c     (working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
         u_long mblocks;

         /*
+        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+        * First chop nblks off to page-align it, then convert.
+        *
+        * sw->sw_nblks is in page-sized chunks now too.
+        */
+       nblks &= ~(ctodb(1) - 1);
+       nblks = dbtoc(nblks);
+
+       /*
          * If we go beyond this, we get overflows in the radix
          * tree bitmap code.
          */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
                         mblocks);
                 nblks = mblocks;
         }
-       /*
-        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-        * First chop nblks off to page-align it, then convert.
-        *
-        * sw->sw_nblks is in page-sized chunks now too.
-        */
-       nblks &= ~(ctodb(1) - 1);
-       nblks = dbtoc(nblks);

         sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
         sp->sw_vp = vp;


(This moves the page-count recalculation before the b-list check.)
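
The effect of the reordering can be sketched in plain userspace C. This
is only an illustration of the arithmetic, under stated assumptions: the
limit of 67108864 is taken from the warning quoted in this thread,
PAGE_SIZE is assumed to be 4096, and the model_* names are hypothetical
and do not exist in swap_pager.c. Whether the b-list limit really is
meant to apply to page-sized units is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the unit conversion; not kernel code. */
#define MODEL_DEV_BSIZE		512u		/* disk block size */
#define MODEL_PAGE_SIZE		4096u		/* assumed page size */
#define MODEL_MAX_UNITS		67108864ull	/* limit from the warning */
#define MODEL_BLKS_PER_PAGE	(MODEL_PAGE_SIZE / MODEL_DEV_BSIZE)

/* Current order: clamp the DEV_BSIZE count, then convert to pages. */
uint64_t
model_pages_clamp_then_convert(uint64_t nblks)
{
	if (nblks > MODEL_MAX_UNITS)
		nblks = MODEL_MAX_UNITS;
	nblks &= ~(uint64_t)(MODEL_BLKS_PER_PAGE - 1);	/* page-align */
	return (nblks / MODEL_BLKS_PER_PAGE);
}

/* Proposed order: convert to pages first, then clamp the page count. */
uint64_t
model_pages_convert_then_clamp(uint64_t nblks)
{
	uint64_t pages;

	nblks &= ~(uint64_t)(MODEL_BLKS_PER_PAGE - 1);	/* page-align */
	pages = nblks / MODEL_BLKS_PER_PAGE;
	if (pages > MODEL_MAX_UNITS)
		pages = MODEL_MAX_UNITS;
	return (pages);
}

/* Usable swap size in GiB for a given page count. */
uint64_t
model_pages_to_gib(uint64_t pages)
{
	return (pages * MODEL_PAGE_SIZE / (1024ull * 1024ull * 1024ull));
}
```

For a 64 GiB device (134217728 blocks of 512 bytes) the first function
yields 32 GiB of usable swap and the second yields the full 64 GiB; the
ceiling of the second is 256 GiB, matching the 32*8 estimate.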


Can someone comment on this?


>>
>> Apparently, the 32GB swapspace limit is per swap area; you can add
>> up to 4 swap areas so create two or three 32GB swap partitions.
>
> Will that enable a 64GB dump?  In 8.1, dumpon(8) says:
The kernel swap pager and the dump facility are completely unrelated to
each other. The only possible relation is that the dumpon rc script
looks for the first swap device in fstab and tells the kernel to dump
to that device.
>
>       The dumpon utility is used to specify a device where the kernel
>       can save a crash dump in the case of a panic.
>       ...
>       For most systems the size of the specified dump device must be
>       at least the size of physical memory.
>       ...
>       The dumpon utility will refuse to enable a dump device which is
>       smaller than the total amount of physical memory as reported by
>       the hw.physmem sysctl(8) variable.
>
> Note the use of the singluar:  "a device" and "the specified device".


From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 08:47:28 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4B92F106564A
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 08:47:28 +0000 (UTC)
	(envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id 218848FC0A
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 08:47:27 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7I8lQAc037584
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 01:47:27 -0700 (PDT)
	(envelope-from yuri@rawbw.com)
Message-ID: <4E4CD19E.5070108@rawbw.com>
Date: Thu, 18 Aug 2011 01:47:26 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 08:47:28 -0000

WD's latest hard drives use 4kB sectors, which differs from the
traditional 512B.
http://www.wdc.com/advformat
http://wdc.custhelp.com/app/answers/detail/a_id/5655

These articles assert that something special should be done in the OS
to get high performance from such drives. For example, WD recommends
installing a particular recent driver version.
But what about FreeBSD? Should it also be configured in some special
way for these drives to perform well?
Is it aware of the 4kB sector size?

Yuri

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 09:17:28 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EE7A8106566C
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 09:17:28 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta08.emeryville.ca.mail.comcast.net
	(qmta08.emeryville.ca.mail.comcast.net [76.96.30.80])
	by mx1.freebsd.org (Postfix) with ESMTP id D599B8FC16
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 09:17:28 +0000 (UTC)
Received: from omta14.emeryville.ca.mail.comcast.net ([76.96.30.60])
	by qmta08.emeryville.ca.mail.comcast.net with comcast
	id MlHQ1h0021HpZEsA8lHQdR; Thu, 18 Aug 2011 09:17:24 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta14.emeryville.ca.mail.comcast.net with comcast
	id MlHT1h0011t3BNj8alHTup; Thu, 18 Aug 2011 09:17:27 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 409C9102C1A; Thu, 18 Aug 2011 02:17:27 -0700 (PDT)
Date: Thu, 18 Aug 2011 02:17:27 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Yuri <yuri@rawbw.com>
Message-ID: <20110818091727.GA61715@icarus.home.lan>
References: <4E4CD19E.5070108@rawbw.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E4CD19E.5070108@rawbw.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 09:17:29 -0000

On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote:
> WD has sectors of the size 4kB in their latest hard drives, which is
> different from the traditional 512B.
> http://www.wdc.com/advformat
> http://wdc.custhelp.com/app/answers/detail/a_id/5655
> 
> These articles assert that something special should be done in OS to
> enable high performance of such drives. For ex. WD recommends to
> install some latest drivers of particular version.
> But what about FreeBSD? Should it be configured in some special way
> too for these drive to perform well?
> Is it aware of 4kB sector size?

The below advice still applies.  Do not skim the page, read it.

http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html

You will therefore have to go through some manual rigmarole (preferably
with gpart(8)) to ensure performance.  If you plan on using the disks in
ZFS, you get to go through some extra rigmarole.

Also be aware that mixed LBA sizes on things like RAID (and possibly
ZFS?) may result in abysmal performance.  I just got done assisting a
user on a forum who had horrible performance on his 2-disk RAID-1 array
driven by an Intel ICH9R using Intel's native RST driver under 64-bit
Windows.  How/why?

He bought two drives, both WD10EADS (not a typo).  However, one drive
was WD10EADS-65M2BX (firmware 01.00A01, 512 byte physical, 512 byte
logical) while the other was WD10EADS-11M2B1 (firmware 80.00A80, 4096
byte physical, 512 byte logical).

He replaced the WD10EADS-65M2BX drive with another 4KB physical drive
and his performance problem disappeared.

I only point this out because this could happen to any user.  "Oh I need
to get a replacement WD10EADS drive for my system... what the heck?!?"
This is going to confuse a lot of people, and caught me by surprise when
I saw it.  Shame on Western Digital for not adjusting the model string!

By comparison, the WD "EARS"-model drives have always been
4KByte physical / 512 byte logical.  The logical size is set to 512 to
ensure full compatibility with existing and legacy OSes.
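
The 4096 physical / 512 logical split is why partition start offsets
matter: a partition whose first logical sector does not fall on a 4 KB
boundary makes 4 KB filesystem blocks straddle two physical sectors.
A minimal sketch of the check follows, assuming those two sector sizes;
the model_* names are hypothetical helpers for illustration, not part
of gpart(8) or any FreeBSD API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; sector sizes are the ones discussed above. */
#define MODEL_LOGICAL_SECTOR	512u
#define MODEL_PHYSICAL_SECTOR	4096u

/* True if a partition starting at this logical LBA is 4 KB aligned. */
bool
model_start_is_aligned(uint64_t start_lba)
{
	return ((start_lba * MODEL_LOGICAL_SECTOR) %
	    MODEL_PHYSICAL_SECTOR == 0);
}

/* Round a start LBA up to the next physical-sector boundary. */
uint64_t
model_align_up(uint64_t start_lba)
{
	uint64_t per_phys = MODEL_PHYSICAL_SECTOR / MODEL_LOGICAL_SECTOR;

	return ((start_lba + per_phys - 1) / per_phys * per_phys);
}
```

For example, the traditional MBR start at LBA 63 is not aligned and
rounds up to 64, while the first usable GPT sector, LBA 34, rounds up
to 40.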

I'm dreading the day the WD Caviar Black models succumb to all this
nonsense.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 09:21:21 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CABC9106566C;
	Thu, 18 Aug 2011 09:21:21 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id AB12E8FC22;
	Thu, 18 Aug 2011 09:21:20 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA10730;
	Thu, 18 Aug 2011 12:21:17 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4CD98C.1000301@FreeBSD.org>
Date: Thu, 18 Aug 2011 12:21:16 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
	<4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
In-Reply-To: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org,
	freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 09:21:21 -0000

on 18/08/2011 02:15 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
> 
>> Thanks to the debug that Steven provided and to the help that I received from
>> Kostik, I think that now I understand the basic mechanics of this panic, but,
>> unfortunately, not the details of its root cause.
>>
>> It seems like everything starts with some kind of a race between terminating
>> processes in a jail and termination of the jail itself.  This is where the
>> details are very thin so far.  What we see is that a process (http) is in
>> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
>> flag is set and even past the place where p_limit is freed and reset to NULL.
>> At that place the thread calls prison_proc_free(), which calls prison_deref().
>> Then, we see that in prison_deref() the thread gets a page fault because of what
>> seems like a NULL pointer dereference.  That's just the start of the problem and
>> its root cause.
> 
> That's interesting; are you using http as an example, or is that something that's
> been gleaned from the debugging of our output? I ask as there's only one process
> running in each of our jails, and that's a single java process.


It's from the debug data: p_comm = "httpd"
I would also like to ask you to revert the last patch that I sent you (with tf_rip
comparisons) and try the patch from Kostik instead.
Given what we suspect about the problem, can you please also try to provoke it
by e.g. doing frequent jail restarts or something else that supposedly
should hit the bug?

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 09:28:11 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5CA9B1065677
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 09:28:11 +0000 (UTC)
	(envelope-from delphij@gmail.com)
Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 1FE598FC0A
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 09:28:10 +0000 (UTC)
Received: by gwb15 with SMTP id 15so895286gwb.13
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 02:28:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=LZd0CnVhQzZ+BCDJsG6KurDrxeOi40qJiirjkHrqR8M=;
	b=mwMnFnNP5kH9QDzBGawGPrPvM4OUEA3RfGB5kifEboisQGyyCUt45wYLp3uWbq5BDe
	IHvn21Kw8U0SpVlDyQkccQhgzsKpu2OiCoaH0CLmSr4PYXsdTIROJbPsrqP0wFqrf7DF
	2nX/j/RomDuyAYHQH+Zb4GRAgeHmprAHpmAmM=
MIME-Version: 1.0
Received: by 10.151.157.11 with SMTP id j11mr446965ybo.392.1313658029060; Thu,
	18 Aug 2011 02:00:29 -0700 (PDT)
Received: by 10.150.136.11 with HTTP; Thu, 18 Aug 2011 02:00:29 -0700 (PDT)
In-Reply-To: <4E4CD19E.5070108@rawbw.com>
References: <4E4CD19E.5070108@rawbw.com>
Date: Thu, 18 Aug 2011 02:00:29 -0700
Message-ID: <CAGMYy3vHApgP2bHN7eBC_++d4qfYW5NcPC-7aLpe99mchO7upQ@mail.gmail.com>
From: Xin LI <delphij@gmail.com>
To: Yuri <yuri@rawbw.com>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-stable@freebsd.org
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 09:28:11 -0000

Hi,

On Thu, Aug 18, 2011 at 1:47 AM, Yuri <yuri@rawbw.com> wrote:
> WD has sectors of the size 4kB in their latest hard drives, which is
> different from the traditional 512B.
> http://www.wdc.com/advformat
> http://wdc.custhelp.com/app/answers/detail/a_id/5655
>
> These articles assert that something special should be done in OS to enable
> high performance of such drives. For ex. WD recommends to install some
> latest drivers of particular version.
> But what about FreeBSD? Should it be configured in some special way too for
> these drive to perform well?
> Is it aware of 4kB sector size?

The FreeBSD driver detects 4k drives.

At this time, as far as I know, all AF drives on the market advertise
512-byte sectors rather than 4k (mostly for compatibility with BIOS,
etc).  If they advertised 4k sectors natively, you wouldn't have to do
anything special, but currently you need to make sure that:

 - FS partitions start at a 4k boundary;
 - the FS is aware of the 4k sector size, e.g. through gnop create -S 4096
for ZFS, which will remember this so you don't have to do it again
later.  For UFS you may want to specify a larger fragment size and
block size (4k/32k for example).
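
To make the first point concrete, here is a minimal C sketch of the
alignment check.  It is not taken from any FreeBSD tool; the function
name and its parameters are made up for illustration, and it assumes
the start is given in logical (512-byte) sectors as the drive reports
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a partition that starts at
 * start_lba (counted in logical sectors of logical_size bytes) begins
 * on a physical sector boundary of physical_size bytes.
 */
bool
starts_on_physical_boundary(uint64_t start_lba, uint32_t logical_size,
    uint32_t physical_size)
{
	uint64_t byte_offset;

	assert(physical_size != 0);

	/* Byte offset of the partition start from the beginning of the disk. */
	byte_offset = start_lba * (uint64_t)logical_size;

	return (byte_offset % physical_size == 0);
}
```

With 512-byte logical and 4096-byte physical sectors, a start at
sector 63 fails this check, while a start at sector 64 passes it.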

Some newer applications such as FreeNAS already detect this and make
the adjustment for you by default.  We need to check and make sure
that our base system tools, especially the installer, do the same,
though.

Cheers,
-- 
Xin LI <delphij@delphij.net> https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 09:55:39 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DBF6F106566C
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 09:55:38 +0000 (UTC)
	(envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id C88B08FC16
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 09:55:38 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7I9tbTB049134;
	Thu, 18 Aug 2011 02:55:38 -0700 (PDT) (envelope-from yuri@rawbw.com)
Message-ID: <4E4CE199.8030104@rawbw.com>
Date: Thu, 18 Aug 2011 02:55:37 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4E4CD19E.5070108@rawbw.com>
	<20110818091727.GA61715@icarus.home.lan>
In-Reply-To: <20110818091727.GA61715@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 09:55:39 -0000

On 08/18/2011 02:17, Jeremy Chadwick wrote:
> The below advice still applies.  Do not skim the page, read it.
>
> http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html
>
> You will therefore have to go through some manual rigmarole (preferably
> with gpart(8)) to ensure performance.  If you plan on using the disks in
> ZFS, you get to go through some extra rigmarole.

I didn't know that such extra steps were required and just created
a ZFS pool.
zdb -C <mypool> shows ashift as 9. I read that as meaning a sector size
of 512 bytes (wrong!).
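
For reference, ashift is the base-2 logarithm of the smallest block
size ZFS will use on the vdev, so 9 means 512 bytes and 12 means 4096
bytes.  A small C sketch of that conversion follows; the function
names are made up for illustration and are not ZFS APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Block size in bytes implied by an ashift value, i.e. 2^ashift. */
uint64_t
ashift_to_bytes(unsigned int ashift)
{
	assert(ashift < 64);
	return ((uint64_t)1 << ashift);
}

/*
 * Smallest ashift whose block size is at least sector_bytes.
 * Sector sizes are assumed to be small powers of two (512, 4096, ...).
 */
unsigned int
bytes_to_ashift(uint64_t sector_bytes)
{
	unsigned int ashift = 0;

	while (ashift < 63 && ((uint64_t)1 << ashift) < sector_bytes)
		ashift++;
	return (ashift);
}
```

So a pool that is aware of the 4k sectors would report ashift 12
rather than 9.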

But I tested the 25GB file writing/reading speed on the middle tracks 
and it seems reasonable:
WR 55MB/s
RD 107MB/s

So could I get even better speeds if it were aware of the 4k sectors?

Yuri

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 10:09:13 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5BB1F106564A
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 10:09:13 +0000 (UTC)
	(envelope-from marc@blackend.org)
Received: from smtp6-g21.free.fr (unknown [IPv6:2a01:e0c:1:1599::15])
	by mx1.freebsd.org (Postfix) with ESMTP id C86738FC17
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 10:09:11 +0000 (UTC)
Received: from emphyrio.blackend.org (unknown [88.179.1.53])
	by smtp6-g21.free.fr (Postfix) with ESMTP id 228F6822A5;
	Thu, 18 Aug 2011 12:09:03 +0200 (CEST)
Received: from emphyrio.blackend.org (localhost [127.0.0.1])
	by emphyrio.blackend.org (8.14.5/8.14.4) with ESMTP id p7IAAZkF002328; 
	Thu, 18 Aug 2011 12:10:35 +0200 (CEST)
	(envelope-from marc@emphyrio.blackend.org)
Received: (from marc@localhost)
	by emphyrio.blackend.org (8.14.5/8.14.4/Submit) id p7IAAYAg002327;
	Thu, 18 Aug 2011 12:10:34 +0200 (CEST) (envelope-from marc)
Date: Thu, 18 Aug 2011 12:10:34 +0200
From: Marc Fonvieille <blackend@freebsd.org>
To: Yuri <yuri@rawbw.com>
Message-ID: <20110818101034.GA1958@emphyrio.blackend.org>
References: <4E4CD19E.5070108@rawbw.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E4CD19E.5070108@rawbw.com>
X-Useless-Header: blackend.org
X-Operating-System: FreeBSD 8.2-STABLE
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 10:09:13 -0000

On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote:
> WD has sectors of the size 4kB in their latest hard drives, which is 
> different from the traditional 512B.
> http://www.wdc.com/advformat
> http://wdc.custhelp.com/app/answers/detail/a_id/5655
> 
> These articles assert that something special should be done in OS to 
> enable high performance of such drives. For ex. WD recommends to install 
> some latest drivers of particular version.
> But what about FreeBSD? Should it be configured in some special way too 
> for these drive to perform well?
> Is it aware of 4kB sector size?
>

I own one of those (I'm running 8-STABLE):

ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
ada0: <WDC WD10EARS-00Y5B1 80.00A80> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

which has 4kB sectors but says "512 byte sectors" :)

I use the whole disk for the FreeBSD slice, and I aligned all partitions
on a multiple of 8 sectors (512*8=4096).

By default fdisk(8) uses an offset of 63 sectors:

******* Working on device /dev/ada0 *******
parameters extracted from in-core disklabel are:
cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 1953525105 (953869 Meg), flag 80 (active)
	beg: cyl 0/ head 1/ sector 1;
	end: cyl 1023/ head 15/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>


Look at the "start 63" line.  Instead of changing the fdisk(8) offset, I
just adjusted my bsdlabel(8) table to compensate:

# /dev/ada0s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:    4194304         17    4.2BSD        0     0     0
  b:    8388608    4194321      swap                    
  c: 1953525105          0    unused        0     0     # "raw" part, don't edit
  d:   16777216   12582929    4.2BSD        0     0     0
  e: 1924163584   29360145    4.2BSD        0     0     0


The important part is the offset 17, which corrects the fdisk(8) offset
(16+1: the slice starts at sector 63, and 63+17=80 is a multiple of 8).
The remaining offsets are calculated from the sizes I gave for the
partitions (in MB, so each is divisible by 8).
Then I used newfs(8) with the option "-f 4096".
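
The arithmetic behind the offset 17 can be sketched in C as below.
This is only an illustration: the function name is made up, and the
minimum offset of 16 sectors is my reading of the "16+1" above (the
first 16 sectors of the slice being left alone), not something taken
from bsdlabel(8) itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest offset inside the slice, not below
 * min_offset, at which a partition lands on an absolute sector number
 * that is a multiple of align.  All values are in 512-byte sectors.
 */
uint64_t
first_aligned_label_offset(uint64_t slice_start, uint64_t min_offset,
    uint64_t align)
{
	uint64_t offset = min_offset;

	assert(align != 0);

	/* Absolute position on disk is slice start plus offset in the label. */
	while ((slice_start + offset) % align != 0)
		offset++;
	return (offset);
}
```

Called with a slice start of 63, a minimum offset of 16 and an
alignment of 8 sectors, it yields 17; the offset 4194321 used for the
b partition is already aligned, so it comes back unchanged.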


There's another painful issue with this disk: the automatic head-parking
after a few seconds.  I disabled it (with wdidle3) because after 2 months
of use, I was at more than 35000 head-parkings...

-- 
Marc

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 10:47:04 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0AC10106566C;
	Thu, 18 Aug 2011 10:47:04 +0000 (UTC)
	(envelope-from prvs=12111cb08a=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 312988FC18;
	Thu, 18 Aug 2011 10:47:02 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 11:35:21 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 11:35:21 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014645776.msg;
	Thu, 18 Aug 2011 11:35:20 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12111cb08a=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <F4663E06BEED4401916C0AEAA16DD40E@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: uk> <4E4CD98C.1000301@FreeBSD.org>
Date: Thu, 18 Aug 2011 11:35:58 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org,
	freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 10:47:04 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
>> That's interesting; are you using http as an example, or is that something that's
>> been gleaned from the debugging of our output? I ask as there's only one process
>> running in each of our jails, and that's a single java process.
> 
> 
> It's from the debug data: p_comm = "httpd"

Hmm, there's only one httpd that's ever run on the machine, and that's not in a
jail; it's on the raw machine.

> I also would like to ask you to revert the last patch that I sent you (with tf_rip
> comparisons) and try the patch from Kostik instead.

Sure.

> Given what we suspect about the problem, can you please also try to provoke it
> by e.g. doing frequent jail restarts or something else that supposedly
> should hit the bug?

I've tried doing this for quite some days on the test machine, but I've been
unable to provoke it, will continue to try.

    Regards
    Steve


================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.


From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 11:11:08 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 32E58106566C;
	Thu, 18 Aug 2011 11:11:08 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 21EA88FC1B;
	Thu, 18 Aug 2011 11:11:06 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA12584;
	Thu, 18 Aug 2011 14:11:04 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4CF347.6030908@FreeBSD.org>
Date: Thu, 18 Aug 2011 14:11:03 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: uk> <4E4CD98C.1000301@FreeBSD.org>
	<F4663E06BEED4401916C0AEAA16DD40E@multiplay.co.uk>
In-Reply-To: <F4663E06BEED4401916C0AEAA16DD40E@multiplay.co.uk>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org,
	freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 11:11:08 -0000

on 18/08/2011 13:35 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>>> That's interesting; are you using http as an example, or is that something that's
>>> been gleaned from the debugging of our output? I ask as there's only one process
>>> running in each of our jails, and that's a single java process.
>>
>>
>> It's from the debug data: p_comm = "httpd"
> 
> Hmm, there's only one httpd that's ever run on the machine, and that's not in a
> jail; it's on the raw machine.

Probably I have mistakenly assumed that the 'prison' in prison_deref() has
something to do with an actual jail, while it could have been just prison0, to
which all non-jailed processes belong.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 11:26:06 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 17692106566C;
	Thu, 18 Aug 2011 11:26:06 +0000 (UTC)
	(envelope-from prvs=12111cb08a=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 08B508FC16;
	Thu, 18 Aug 2011 11:26:04 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 12:24:30 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Aug 2011 12:24:30 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014646198.msg;
	Thu, 18 Aug 2011 12:24:29 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12111cb08a=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <E1639ED071F74A77AAB376445BF6880E@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: uk> <4E4CD98C.1000301@FreeBSD.org>
	<F4663E06BEED4401916C0AEAA16DD40E@multiplay.co.uk>
	<4E4CF347.6030908@FreeBSD.org>
Date: Thu, 18 Aug 2011 12:25:12 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-hackers@FreeBSD.org, freebsd-jail@FreeBSD.org,
	freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 11:26:06 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> Probably I have mistakenly assumed that the 'prison' in prison_deref() has
> something to do with an actual jail, while it could have been just prison0 where
> all non-jailed processes belong.

That makes sense as this particular panic was caused by a machine reboot,
which is slightly different from the more common jail panic we're seeing.

Doesn't help with our reproduction scenario though unfortunately. If we
don't have any joy reproducing on our single test machine I'll have this
kernel rolled out across a portion of the farm, which should mean we
see the panic results in a few days' time.

I understand there's a risk involved in this, but it's important for us
to determine the cause and get a confirmed fix, as well as being able
to prove that the panic fix works which will help everyone in the long
run.

    Regards
    Steve


From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 13:58:09 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 29C121065676;
	Thu, 18 Aug 2011 13:58:09 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 6DA6C8FC1C;
	Thu, 18 Aug 2011 13:58:08 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id 99C2828426;
	Thu, 18 Aug 2011 15:58:06 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id A345328424;
	Thu, 18 Aug 2011 15:58:05 +0200 (CEST)
Message-ID: <4E4D1A6C.7060604@quip.cz>
Date: Thu, 18 Aug 2011 15:58:04 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: Artem Belevich <art@freebsd.org>
References: <4E4BC38D.1050808@quip.cz>
	<4E4BCCC3.60601@digsys.bg>	<CAFqOu6hQzzwrTpuyddqrODr8WP4Ke0pi7MoYhYL9ivfsNHxNhA@mail.gmail.com>	<4E4C1945.5030504@quip.cz>
	<CAFqOu6gVAjhbGd-92kiLGdztVTAbAxe1MwEjVECthnKVV=QEMg@mail.gmail.com>
In-Reply-To: <CAFqOu6gVAjhbGd-92kiLGdztVTAbAxe1MwEjVECthnKVV=QEMg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org, Daniel Kalchev <daniel@digsys.bg>
Subject: Re: can not boot from RAIDZ with 8-STABLE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 13:58:09 -0000

Artem Belevich wrote:
> On Wed, Aug 17, 2011 at 12:40 PM, Miroslav Lachman<000.fbsd@quip.cz>  wrote:
>> Thank you guys, you are right. The BIOS provides only 1 disk to the loader!
>> I checked it from loader prompt by lsdev (booted from USB external HDD).
>>
>> So I will try to make a small zpool mirror for root and boot (if ZFS mirror
>> can be made of 4 providers instead of two) and the rest will be in RAIDZ.
>>
>> If that fails, I will go my old way with internal USB flash disk with UFS
>> for booting and RAIDZ of 4 disks for storage as I did it few years ago with
>> 7.0 or 7.1.
>
> You seem to be booting from disks attached to some sort of add-on
> card. Sometimes those have per-disk 'bootable' option in their own
> extension ROM. You may investigate yours. Perhaps all you need to do
> is just tweak controller settings.

The advanced controller settings allow me to choose which disk will be
bootable - but I can mark just one of them, not all.

So my working setup is made from 2 pools. The first is a 4-way ZFS mirror
for / (root), the second is RAIDZ for the rest
(plus swap on top of gmirrored partitions).

Each disk has the following partitions:

# gpart show da0
=>       34  976773101  da0  GPT  (465G)
          34        128    1  freebsd-boot  (64k)
         162    8388608    2  freebsd-swap  (4.0G)
     8388770   20971520    3  freebsd-zfs  (10G)
    29360290  943718400    4  freebsd-zfs  (450G)
   973078690    3694445       - free -  (1.8G)


# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
sys    9.94G   781M  9.17G     7%  1.00x  ONLINE  -
tank   1.75T  4.77G  1.75T     0%  1.00x  ONLINE  -


Filesystem                 Size    Mounted on
sys/root                   9.8G    /
devfs                      1.0k    /dev
tank/tmp                   1.3T    /tmp
tank/usr/home              1.3T    /usr/home
tank/usr/home/quip         1.3T    /usr/home/quip
tank/usr/local             1.3T    /usr/local
tank/usr/obj               1.3T    /usr/obj
tank/usr/ports             1.3T    /usr/ports
tank/usr/ports/distfiles   1.3T    /usr/ports/distfiles
tank/usr/ports/packages    1.3T    /usr/ports/packages
tank/usr/src               1.3T    /usr/src
tank/var/amavis            1.3T    /var/amavis
tank/var/audit             1.3T    /var/audit
tank/var/crash             1.3T    /var/crash
tank/var/db                1.3T    /var/db
tank/var/db/mysql          1.3T    /var/db/mysql
tank/var/log               1.3T    /var/log
tank/var/mail              1.3T    /var/mail
tank/var/tmp               1.3T    /var/tmp
tank/var/virusmails        1.3T    /var/virusmails
tank/vol0                  1.3T    /vol0


I hope this helps somebody with a similar problem.

Miroslav Lachman

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 14:31:28 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E2C4F106566B;
	Thu, 18 Aug 2011 14:31:28 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id B50648FC1F;
	Thu, 18 Aug 2011 14:31:27 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA15396;
	Thu, 18 Aug 2011 17:31:23 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4D222E.2090802@FreeBSD.org>
Date: Thu, 18 Aug 2011 17:31:10 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>, freebsd-jail@FreeBSD.org
References: uk> <4E4CD98C.1000301@FreeBSD.org>
	<F4663E06BEED4401916C0AEAA16DD40E@multiplay.co.uk>
	<4E4CF347.6030908@FreeBSD.org>
In-Reply-To: <4E4CF347.6030908@FreeBSD.org>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers <freebsd-hackers@FreeBSD.org>, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 14:31:29 -0000

on 18/08/2011 14:11 Andriy Gapon said the following:
> Probably I have mistakenly assumed that the 'prison' in prison_deref() has
> something to do with an actual jail, while it could have been just prison0 where
> all non-jailed processes belong.

So, indeed:
(kgdb) p $2->p_ucred->cr_prison
$10 = (struct prison *) 0xffffffff807d5080
(kgdb) p &prison0
$11 = (struct prison *) 0xffffffff807d5080
(kgdb) p *$2->p_ucred->cr_prison
$12 = {pr_list = {tqe_next = 0x0, tqe_prev = 0x0}, pr_id = 0, pr_ref = 398,
pr_uref = 0, pr_flags = 386, pr_children = {lh_first = 0x0}, pr_sibling = {le_next
= 0x0, le_prev = 0x0}, pr_parent = 0x0,
  pr_mtx = {lock_object = {lo_name = 0xffffffff8063007c "jail mutex", lo_flags =
16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, pr_task = {ta_link =
{stqe_next = 0x0}, ta_pending = 0,
    ta_priority = 0, ta_func = 0, ta_context = 0x0}, pr_osd = {osd_nslots = 0,
osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, pr_cpuset =
0xffffff0012d65dc8, pr_vnet = 0x0,
  pr_root = 0xffffff00166ebce8, pr_ip4s = 0, pr_ip6s = 0, pr_ip4 = 0x0, pr_ip6 =
0x0, pr_sparep = {0x0, 0x0, 0x0, 0x0}, pr_childcount = 0, pr_childmax = 999999,
pr_allow = 127, pr_securelevel = -1,
  pr_enforce_statfs = 0, pr_spare = {0, 0, 0, 0, 0}, pr_hostid = 3251597242,
pr_name = "0", '\0' <repeats 254 times>, pr_path = "/", '\0' <repeats 1022 times>,
  pr_hostname = "censored", '\0' <repeats 231 times>, pr_domainname = '\0'
<repeats 255 times>, pr_hostuuid = "54443842-0054-2500-902c-0025902c3cb0", '\0'
<repeats 27 times>}

Also, let's consider this code:
if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
                if (tpr != pr)
                        mtx_lock(&tpr->pr_mtx);
                if (--tpr->pr_uref > 0)
                        break;
                KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
                mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
                mtx_unlock(&tpr->pr_mtx);
                if (flags & PD_LIST_SLOCKED)
                        sx_sunlock(&allprison_lock);
                else if (flags & PD_LIST_XLOCKED)
                        sx_xunlock(&allprison_lock);
                return;
        }
        if (tpr != pr) {
                mtx_unlock(&tpr->pr_mtx);
                mtx_lock(&pr->pr_mtx);
        }
}

The most suspicious thing is that pr_uref is zero in the debug data.
With INVARIANTS we would hit the "prison0 pr_uref=0" KASSERT.

Then, because this is prison0 and because pr_uref reached zero, tpr gets assigned
to NULL.  And then because tpr != pr we try to execute mtx_unlock(&tpr->pr_mtx).
That's where the NULL pointer deref happens.
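
To make the walk easier to follow, here is a minimal user-space sketch of
only the control flow of the PD_DEUREF loop quoted above.  The struct and
function names (model_prison, model_drop_uref) are hypothetical stand-ins,
not kernel code, and unlike the kernel loop the sketch stops when tpr becomes
NULL instead of dereferencing it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct prison. */
struct model_prison {
        int pr_uref;                    /* user reference count */
        struct model_prison *pr_parent; /* NULL for the root (prison0) */
};

/*
 * Drop one user reference and climb to the parent for as long as the
 * count does not stay positive.  Returns the prison at which the walk
 * stopped, or NULL if it walked past the root.
 */
static struct model_prison *
model_drop_uref(struct model_prison *pr)
{
        struct model_prison *tpr;

        assert(pr != NULL);
        for (tpr = pr; tpr != NULL; tpr = tpr->pr_parent) {
                if (--tpr->pr_uref > 0)
                        break;
        }
        return (tpr);
}
```

With a root whose pr_uref is 1 and whose pr_parent is NULL, the function
returns NULL: the walk has left the tree, which is the state in which any
further use of tpr in the real code would touch a NULL-based address.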

So, now the big question is how/why we reached pr_uref == 0.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 17:04:26 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E011D106564A
	for <freebsd-stable@FreeBSD.org>; Thu, 18 Aug 2011 17:04:26 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 438BF8FC15
	for <freebsd-stable@FreeBSD.org>; Thu, 18 Aug 2011 17:04:25 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA17015;
	Thu, 18 Aug 2011 20:04:11 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4D460A.2080100@FreeBSD.org>
Date: Thu, 18 Aug 2011 20:04:10 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Andrew Boyer <aboyer@averesystems.com>
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
In-Reply-To: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org, Eugene Grosbein <egrosbein@rdtc.ru>,
	Vishal.Shah@netapp.com, Hans Petter Selasky <hselasky@c2i.net>,
	Jeremiah Lott <jlott@averesystems.com>,
	Steven Hartland <killing@multiplay.co.uk>
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 17:04:27 -0000

on 12/08/2011 22:59 Andrew Boyer said the following:
> Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
> 
> Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable)
> 
> Re: System hang in USB umass module while processing panic  (originally on
> freebsd-usb)
> 
> Hello Andriy and Hans,
> 
> Sorry for tying in so many discussions on this topic, but I think I have an
> explanation for the problems we have been reporting* with hanging coredumps on
> multicore systems on 8.2-RELEASE, and it has implications for Andriy's proposed
> scheduler patch** and for USB.
> 
> In today's 8.X and 9.X branches, nothing that I can find stops the other CPUs when
> the kernel panics, but many parts of the locking code get disabled (grep on
> 'panicstr').  The 'bufwrite: buffer is not busy???' panic is caused by the syncer
> encountering an error.  If that happens when it's on the dumping CPU everything
> hangs.  If it's running on a different CPU, it will be blocked and hidden by the
> panic_cpu spinlock in panic(), and the dump continues, polling every attached
> keyboard for a Ctl-C.
> 
> But, the new 8.X USB stack relies on multithreading.  (The new stack is the
> variable that broke coredumps for us in the 7.1->8.2 transition, I think.)  SVN
> 224223 fixes a hang that would happen when dumpsys() polls the USB keyboard (IPMI
> KVM, in our case).  That helps, but it only gets as far as usb_process(), where it
> hangs in a loop around a cv_wait() call.  This is easy to reproduce by adding code
> to the watchdog to break into the debugger if panicstr is set.
> 
> I am experimenting with Andriy's patch** to stop the scheduler and it seems to be
> most of the way there, stopping the CPUs and disabling the rest of locking.  There
> are a few places that still reference panicstr, but that's minor.  These are the
> changes I made to the patch:
>  * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is true, so
> that we don't hang up in USB.  ukbd_yield()  locks up in DROP_GIANT(), and if you
> skip ukbd_yield(), usbd_transfer_poll() locks up trying to drop mutexes.

Hmm, this is a little bit unexpected.  I thought that with the patch all the
mutex/lock operations would be skipped.
Can you please check which locks give you trouble and why?
I would like to improve the patch so that all lock operations are bypassed
(whether locking or unlocking).
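
As a rough illustration of what "bypassed" means here, the following
user-space sketch makes both the acquire and the release path a no-op once a
stop flag is set.  All names (model_lock, model_scheduler_stopped, and so on)
are hypothetical; this is not the patch itself, only a model of the idea:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock: just counts how many times it is held. */
struct model_lock {
        int held;
};

/* Hypothetical flag, set once the system has panicked. */
static bool model_scheduler_stopped;

static void
model_lock_acquire(struct model_lock *lk)
{
        assert(lk != NULL);
        if (model_scheduler_stopped)
                return;         /* post-panic: do not touch lock state */
        lk->held++;
}

static void
model_lock_release(struct model_lock *lk)
{
        assert(lk != NULL);
        if (model_scheduler_stopped)
                return;         /* releasing is skipped as well */
        lk->held--;
}
```

The point of skipping both directions is that a lock taken before the panic
stays in whatever state it was in, and code running afterwards neither blocks
on it nor trips over an unlock of something it never acquired.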

>  * Changed the call to spinlock_enter() back to critical_enter(), so that
> interrupts stay enabled and the hardclock still functions.

Not sure if I like this idea in general.

>  * Added code in the beginning of panic() to switch to CPU 0, so that we're able
> to service the hardclock interrupts and so that watchdog panics get through.

Also, I wouldn't like switching a panic thread to a different CPU, as that messes
with a lot of state and is not safe for an arbitrary context.
Also, can you please clarify what you meant by "watchdog panics get through"?
Are you talking about SW_WATCHDOG specifically?

> This has worked 100% for me so far, although anyone using a USB keyboard or dump
> device would still be out of luck.
> 
> Thoughts?  It seems like stopping all of the other CPUs is the right thing to do
> on a panic (what are they doing otherwise?).  Are the USB issues fixable?  If
> Andriy's patch gets committed it might just involve short-circuiting all of the
> locking in the polling path, but I haven't gotten that far yet.  I bet dumping to
> NFS will have the same problem.

I think that no subsystem should rely on working scheduling and interrupts in
the post-panic world.  In fact, all the code for skipping locking is just a giant
hack/workaround in my opinion.  Ideally, all the subsystems that can be expected
to be called after panic should be aware of that and should check for that.  So
they should not attempt any locking or switching threads or rebinding CPUs or
expect interrupts, etc.  The environment should mirror early boot where we have
only one CPU, only one thread, no interrupts, only polling.

If you can help Hans figure out what is wrong with the USB subsystem in this
respect, that would help us all.

Thank you for your testing and feedback!
-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 20:09:38 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 807AB1065672;
	Thu, 18 Aug 2011 20:09:38 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 786778FC0A;
	Thu, 18 Aug 2011 20:09:37 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA18855;
	Thu, 18 Aug 2011 23:09:36 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Qu8uF-000H9C-QY; Thu, 18 Aug 2011 23:09:35 +0300
Message-ID: <4E4D717F.3090802@FreeBSD.org>
Date: Thu, 18 Aug 2011 23:09:35 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110817 Thunderbird/6.0
MIME-Version: 1.0
To: freebsd-hackers@FreeBSD.org
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
	<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
	<4E4BBA7F.30907@FreeBSD.org>
	<88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
In-Reply-To: <4E4C22D6.6070407@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 20:09:38 -0000

on 17/08/2011 23:21 Andriy Gapon said the following:
> It seems like everything starts with some kind of a race between terminating
> processes in a jail and termination of the jail itself.  This is where the
> details are very thin so far.  What we see is that a process (http) is in
> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
> flag is set and even past the place where p_limit is freed and reset to NULL.
> At that place the thread calls prison_proc_free(), which calls prison_deref().
> Then, we see that in prison_deref() the thread gets a page fault because of what
> seems like a NULL pointer dereference.  That's just the start of the problem and
> its root cause.
>
> Then, trap_pfault() gets invoked and, because addresses close to NULL look like
> userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
> on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
> to lim_cur(), but because p_limit is already NULL, that call results in a NULL
> pointer dereference and a page fault.  Goto the beginning of this paragraph.
>
> So we get this recursion of sorts, which only ends when a stack is exhausted and
> a CPU generates a double-fault.

BTW, does anyone have an idea why the thread in question would "disappear" from
kgdb's point of view?

(kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
$3 = 102057
(kgdb) tid 102057
invalid tid

info threads also doesn't list the thread.

Is it because the panic happened while the thread was somewhere in exit1()?
Is there an easy way to examine its stack in this case?

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 20:11:46 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 79E591065674;
	Thu, 18 Aug 2011 20:11:46 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com
	[209.85.218.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A8BA8FC08;
	Thu, 18 Aug 2011 20:11:45 +0000 (UTC)
Received: by yib19 with SMTP id 19so2062868yib.13
	for <multiple recipients>; Thu, 18 Aug 2011 13:11:45 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=mJIC3dzSZFEfm+d2PPNPw14QLDbIUvhvYDXvpi+2xIs=;
	b=EEnO+Rzf3v4M5H2oK39bX1ICVx+TjHgVcpuYSLAx7JdLCTGDefHPWI0jZsA+SGiDeR
	MVd0NLPHbR9Mxtwh0kMu4xwL7Nc80S8M92vIzSpVhjYHmPsI8j/83BEQxriB/Qlw6Df4
	sAFhZWoqMqR6hQ2w1rPVGiCptnAIxMw0m1YEA=
MIME-Version: 1.0
Received: by 10.236.143.5 with SMTP id k5mr1139332yhj.9.1313698305236; Thu, 18
	Aug 2011 13:11:45 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.236.108.33 with HTTP; Thu, 18 Aug 2011 13:11:44 -0700 (PDT)
In-Reply-To: <4E4D717F.3090802@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>
	<A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk>
	<4E4380C0.7070908@FreeBSD.org>
	<EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
	<4E43E272.1060204@FreeBSD.org>
	<62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
	<4E440865.1040500@FreeBSD.org>
	<6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
	<4E441314.6060606@FreeBSD.org>
	<2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
	<4E48D967.9060804@FreeBSD.org>
	<9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk>
	<4E490DAF.1080009@FreeBSD.org>
	<796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk>
	<4E491D01.1090902@FreeBSD.org>
	<570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk>
	<4E4AD35C.7020504@FreeBSD.org>
	<6A7238AED44542A880B082A40304D940@multiplay.co.uk>
	<4E4BA21F.6010805@FreeBSD.org>
	<581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk>
	<4E4BBA7F.30907@FreeBSD.org>
	<88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org>
Date: Thu, 18 Aug 2011 22:11:44 +0200
X-Google-Sender-Auth: i75Ofelh7IObcWFwDsnjQQzRudA
Message-ID: <CAJ-FndCaTSoAU2Ycj=WEppzc1RmbQ6ugqiuuyCqUpYZuGXKt_g@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 20:11:46 -0000

2011/8/18 Andriy Gapon <avg@freebsd.org>:
> on 17/08/2011 23:21 Andriy Gapon said the following:
>>
>> It seems like everything starts with some kind of a race between terminating
>> processes in a jail and termination of the jail itself.  This is where the
>> details are very thin so far.  What we see is that a process (http) is in
>> exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
>> flag is set and even past the place where p_limit is freed and reset to NULL.
>> At that place the thread calls prison_proc_free(), which calls prison_deref().
>> Then, we see that in prison_deref() the thread gets a page fault because of what
>> seems like a NULL pointer dereference.  That's just the start of the problem and
>> its root cause.
>>
>> Then, trap_pfault() gets invoked and, because addresses close to NULL look like
>> userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
>> on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
>> to lim_cur(), but because p_limit is already NULL, that call results in a NULL
>> pointer dereference and a page fault.  Goto the beginning of this paragraph.
>>
>> So we get this recursion of sorts, which only ends when a stack is exhausted and
>> a CPU generates a double-fault.
>
> BTW, does anyone have an idea why the thread in question would "disappear" from
> kgdb's point of view?
>
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
>
> info threads also doesn't list the thread.
>
> Is it because the panic happened while the thread was somewhere in exit1()?
> Is there an easy way to examine its stack in this case?

Yes, that is likely it.

The 'tid' command should look the thread up through tid_to_thread() (or a
similarly named lookup), which returns NULL here.  That means the thread has
already passed the point where it is removed from the lookup table.
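
A toy model of that situation, in case it helps: the thread structure still
exists (so a per-CPU pointer to it remains valid), but a lookup by tid fails
once the thread has been unhooked from the table.  Every name below is
hypothetical; this is neither kgdb's nor the kernel's actual code:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TID_BUCKETS 8

/* Hypothetical, minimal stand-in for a thread structure. */
struct model_thread {
        int td_tid;
        struct model_thread *td_hash_next;
};

static struct model_thread *model_tid_table[MODEL_TID_BUCKETS];

static unsigned
model_tid_bucket(int tid)
{
        return ((unsigned)tid % MODEL_TID_BUCKETS);
}

/* Hook a thread into the lookup table. */
static void
model_tid_insert(struct model_thread *td)
{
        unsigned b;

        assert(td != NULL);
        b = model_tid_bucket(td->td_tid);
        td->td_hash_next = model_tid_table[b];
        model_tid_table[b] = td;
}

/* Unhook a thread, as would happen somewhere on the exit path. */
static void
model_tid_remove(struct model_thread *td)
{
        struct model_thread **pp;

        assert(td != NULL);
        for (pp = &model_tid_table[model_tid_bucket(td->td_tid)];
            *pp != NULL; pp = &(*pp)->td_hash_next) {
                if (*pp == td) {
                        *pp = td->td_hash_next;
                        td->td_hash_next = NULL;
                        return;
                }
        }
}

/* Look a thread up by tid; NULL means "not in the table". */
static struct model_thread *
model_tid_lookup(int tid)
{
        struct model_thread *td;

        for (td = model_tid_table[model_tid_bucket(tid)]; td != NULL;
            td = td->td_hash_next) {
                if (td->td_tid == tid)
                        return (td);
        }
        return (NULL);
}
```

After model_tid_remove() the structure and its td_tid are untouched, yet
model_tid_lookup() returns NULL, which would match a debugger reporting an
invalid tid for a thread that a CPU is still running.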

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 21:27:27 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E07E21065674;
	Thu, 18 Aug 2011 21:27:27 +0000 (UTC)
	(envelope-from hselasky@c2i.net)
Received: from swip.net (mailfe01.c2i.net [212.247.154.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 31E8B8FC16;
	Thu, 18 Aug 2011 21:27:26 +0000 (UTC)
X-Cloudmark-Score: 0.000000 []
X-Cloudmark-Analysis: v=1.1 cv=yfIOS+81wnQIz0UwZPDdWOvE/jQxEvyI9Z1xC25I9wc=
	c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10
	a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10
	a=i9M/sDlu2rpZ9XS819oYzg==:17 a=oOb7PSFi1HuzztIfN6YA:9
	a=XpotWNjDgNPmgIl742UA:7 a=wPNLvfGTeEIA:10
	a=i9M/sDlu2rpZ9XS819oYzg==:117
Received: from [188.126.198.129] (account mc467741@c2i.net HELO
	laptop002.hselasky.homeunix.org)
	by mailfe01.swip.net (CommuniGate Pro SMTP 5.2.19)
	with ESMTPA id 168437914; Thu, 18 Aug 2011 23:27:25 +0200
From: Hans Petter Selasky <hselasky@c2i.net>
To: Andriy Gapon <avg@freebsd.org>
Date: Thu, 18 Aug 2011 23:24:58 +0200
User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; )
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<4E4D460A.2080100@FreeBSD.org>
In-Reply-To: <4E4D460A.2080100@FreeBSD.org>
X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5
	%V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC(
	:AuzV9:.hESm-x4h240C`9=w
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108182324.58276.hselasky@c2i.net>
Cc: freebsd-stable@freebsd.org, Eugene Grosbein <egrosbein@rdtc.ru>,
	Andrew Boyer <aboyer@averesystems.com>, Vishal.Shah@netapp.com,
	Jeremiah Lott <jlott@averesystems.com>,
	Steven Hartland <killing@multiplay.co.uk>
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 21:27:28 -0000

On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
> If you can help Hans figure out what is wrong with the USB subsystem in
> this respect, that would help us all.

Hi,

usb_busdma.c:   /* we use "mtx_owned()" instead of this function */
usb_busdma.c:   owned = mtx_owned(uptag->mtx);
usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
usb_hub.c:      if (mtx_owned(&bus->bus_mtx)) {
usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
usb_transfer.c:         while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
usb_transfer.c:         while (mtx_owned(xroot->xfer_mtx)) {

One fix you will need, if mtx_owned() is not returning the correct value, is:

static void
usbd_callback_wrapper(struct usb_xfer_queue *pq)
{
        struct usb_xfer *xfer = pq->curr;
        struct usb_xfer_root *info = xfer->xroot;

        USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
        if (!mtx_owned(info->xfer_mtx)) {

The above "if" condition should be ANDed with "&& !panicked && !dumping" ... or
maybe the new scheduler-stopped variable is good for this purpose?

                /*
                 * Cases that end up here:
                 *

#if USB_HAVE_BUSDMA
        if (mtx_owned(xfer->xroot->xfer_mtx)) {
                struct usb_xfer_queue *pq;


This case is more like a BUS-DMA error case, and is not so important to 
execute.
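
A small user-space sketch of the condition suggested for
usbd_callback_wrapper(), only to show its truth table.  The struct and the
"panicked" and "dumping" fields are hypothetical placeholders for whatever
state the kernel exposes; nothing here is taken from usb_transfer.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the "if" would look at. */
struct model_ctx {
        bool xfer_mtx_owned;    /* what mtx_owned(info->xfer_mtx) reports */
        bool panicked;          /* placeholder: system has panicked */
        bool dumping;           /* placeholder: core dump in progress */
};

/*
 * The "mutex not owned" branch is taken only during normal operation;
 * after a panic or while dumping it is skipped.
 */
static bool
model_take_unowned_branch(const struct model_ctx *ctx)
{
        assert(ctx != NULL);
        return (!ctx->xfer_mtx_owned && !ctx->panicked && !ctx->dumping);
}
```

If a single scheduler-stopped check ends up covering both states, the two
placeholder flags would presumably collapse into one.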

--HPS

From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 18 23:11:09 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7B373106564A
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 23:11:09 +0000 (UTC)
	(envelope-from kob6558@gmail.com)
Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com
	[209.85.161.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 3E5E18FC0C
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 23:11:08 +0000 (UTC)
Received: by gxk28 with SMTP id 28so2174010gxk.13
	for <multiple recipients>; Thu, 18 Aug 2011 16:11:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=ngyvh7D8CEvIKeuadPquGSvXdSfj0bRwNPIWTyTDoxI=;
	b=bppoBXYMtDvN0/hUUEhuIVHa0r30vdXzJScqDmBB4H8BIzDgpGnZaGSilpCCW3x6xx
	Jz1LwMx+lkwaZgiuZvYCbM52y+ud1KWFeZJyg7A8OWQjdn5GQfdxqjxyjmF5LCNSD5oV
	UUWSh8cwrdm9RxWmWZZeOZCH5EgGLpKZVQJbE=
MIME-Version: 1.0
Received: by 10.150.74.10 with SMTP id w10mr1398144yba.224.1313709068244; Thu,
	18 Aug 2011 16:11:08 -0700 (PDT)
Received: by 10.151.98.3 with HTTP; Thu, 18 Aug 2011 16:11:08 -0700 (PDT)
In-Reply-To: <20110818101034.GA1958@emphyrio.blackend.org>
References: <4E4CD19E.5070108@rawbw.com>
	<20110818101034.GA1958@emphyrio.blackend.org>
Date: Thu, 18 Aug 2011 16:11:08 -0700
Message-ID: <CAN6yY1sAZvQ_p9bmco-kSSwKc2ka_b7+XeK+5ufhTh0fkQdHXQ@mail.gmail.com>
From: Kevin Oberman <kob6558@gmail.com>
To: Marc Fonvieille <blackend@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Yuri <yuri@rawbw.com>, freebsd-stable@freebsd.org
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2011 23:11:09 -0000

On Thu, Aug 18, 2011 at 3:10 AM, Marc Fonvieille <blackend@freebsd.org> wrote:
> On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote:
>> WD has sectors of the size 4kB in their latest hard drives, which is
>> different from the traditional 512B.
>> http://www.wdc.com/advformat
>> http://wdc.custhelp.com/app/answers/detail/a_id/5655
>>
>> These articles assert that something special should be done in OS to
>> enable high performance of such drives. For ex. WD recommends to install
>> some latest drivers of particular version.
>> But what about FreeBSD? Should it be configured in some special way too
>> for these drive to perform well?
>> Is it aware of 4kB sector size?
>>
>
> I own that (I'm running 8-STABLE):
>
> ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada0: <WDC WD10EARS-00Y5B1 80.00A80> ATA-8 SATA 2.x device
> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
>
> which has 4kB sectors but says "512 byte sectors" :)
>
> I use the whole disk for the FreeBSD slice, I aligned all partitions on
> a multiple of 8 sectors (512*8=4096).
>
> By default fdisk(8) uses a 63 sectors default offset:
>
> ******* Working on device /dev/ada0 *******
> parameters extracted from in-core disklabel are:
> cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)
>
> Figures below won't work with BIOS for partitions not in cyl 1
> parameters to be used for BIOS calculations are:
> cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)
>
> Media sector size is 512
> Warning: BIOS sector numbering starts with sector 1
> Information from DOS bootblock is:
> The data for partition 1 is:
> sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
>    start 63, size 1953525105 (953869 Meg), flag 80 (active)
>        beg: cyl 0/ head 1/ sector 1;
>        end: cyl 1023/ head 15/ sector 63
> The data for partition 2 is:
> <UNUSED>
> The data for partition 3 is:
> <UNUSED>
> The data for partition 4 is:
> <UNUSED>
>
>
> Look at the "start 63" statement.  Instead of fixing fdisk(8) behavior, I just
> correctly edited my bsdlabel(8) table:
>
> # /dev/ada0s1:
> 8 partitions:
> #          size     offset    fstype   [fsize bsize bps/cpg]
>   a:    4194304         17    4.2BSD        0     0     0
>   b:    8388608    4194321      swap
>   c: 1953525105          0    unused        0     0     # "raw" part, don't edit
>   d:   16777216   12582929    4.2BSD        0     0     0
>   e: 1924163584   29360145    4.2BSD        0     0     0
>
>
> The important part is the offset 17 to correct the fdisk(8) offset (16+1
> to align the previous 63).  The remaining offsets are calculated from the
> size I gave for the partitions (in MB, which can be divided by 8).
> Then I used newfs(8) with the option "-f 4096".
>
>
> There's another painful issue with this disk: the automatic head-parking
> after a few seconds.  I disabled it (with wdidle3) because after 2 months of
> use, I was at more than 35000 head-parkings...

I'd strongly suggest avoiding fdisk(8) and using gpart(8) on 8 and above. It
has an alignment option that makes this all just work and also allows the use
of GPT formatting. (Watch out for GPT on any system that needs to run 32-bit
Windows.)

gpart create -s gpt ada1
gpart bootcode -b /boot/pmbr ada1
gpart add -t freebsd-boot -a 4 -s 128 -b 40 ada1
gpart bootcode -p /boot/gptboot -i 1 ada1
gpart add -t freebsd-ufs -a 4 -s 2097152 ada1
gpart add -t freebsd-swap -a 4 -s 8388608 ada1
gpart add -t freebsd-ufs -a 4 -s 10485760 ada1
gpart add -t freebsd-ufs -a 4 -s 1048576 ada1
gpart add -t freebsd-ufs -a 4 ada1

This will give you a disk with a 1G root, 4G swap, 5G var, .5G tmp and
the remainder for usr. You can adjust these as you feel appropriate.
I would suggest a careful reading of the gpart(8) man page, as well,
just so you understand what is going on. You might find the Wikipedia
entry for "GUID Partition Table" interesting if you want to go the
GPT route.

You can also use gpart create -s mbr to create a traditional MBR
slice/partition setup. There are several online articles detailing
this operation.
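
For anyone who wants to double-check a layout by hand, the alignment
arithmetic quoted above (slice start 63 plus partition offset 17) can be
verified with a few lines of C.  This is only a sketch; the function and
macro names are made up, and it assumes 512-byte logical and 4096-byte
physical sectors as discussed in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LOGICAL_SECTOR    512u    /* what the drive reports */
#define MODEL_PHYSICAL_SECTOR   4096u   /* what the drive really uses */

/*
 * A partition is 4kB-aligned when its absolute start, counted in
 * 512-byte sectors from the beginning of the disk, is a multiple of 8.
 * part_offset is relative to slice_start, as in the bsdlabel table.
 */
static bool
model_is_4k_aligned(uint64_t slice_start, uint64_t part_offset)
{
        uint64_t per_phys;

        assert(MODEL_PHYSICAL_SECTOR % MODEL_LOGICAL_SECTOR == 0);
        per_phys = MODEL_PHYSICAL_SECTOR / MODEL_LOGICAL_SECTOR;
        return ((slice_start + part_offset) % per_phys == 0);
}
```

For example, model_is_4k_aligned(63, 17) is true because 63 + 17 = 80 is a
multiple of 8, while model_is_4k_aligned(63, 0) is false.  A GPT partition
starting at sector 40 passes as well.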
-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6558@gmail.com

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 00:29:01 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9AFFD106564A;
	Fri, 19 Aug 2011 00:29:01 +0000 (UTC) (envelope-from hrs@FreeBSD.org)
Received: from mail.allbsd.org (gatekeeper-int.allbsd.org
	[IPv6:2001:2f0:104:e002::2])
	by mx1.freebsd.org (Postfix) with ESMTP id C36848FC15;
	Fri, 19 Aug 2011 00:29:00 +0000 (UTC)
Received: from alph.allbsd.org (p3028-ipbf608funabasi.chiba.ocn.ne.jp
	[125.175.94.28]) (authenticated bits=128)
	by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7J0SOkw048646;
	Fri, 19 Aug 2011 09:28:34 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0)
	by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7J0SK8W078115;
	Fri, 19 Aug 2011 09:28:21 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Date: Fri, 19 Aug 2011 09:28:11 +0900 (JST)
Message-Id: <20110819.092811.1087267565626420460.hrs@allbsd.org>
To: attilio@FreeBSD.org
From: Hiroki Sato <hrs@FreeBSD.org>
In-Reply-To: <20110818025550.GA1971@libertas.local.camdensoftware.com>
References: <20110818.091600.831954331552558249.hrs@allbsd.org>
	<CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
	<20110818025550.GA1971@libertas.local.camdensoftware.com>
X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530  FFD7 4F2C D3D8 2793 CF2D
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Multipart/Signed; protocol="application/pgp-signature";
	micalg=pgp-sha1;
	boundary="--Security_Multipart(Fri_Aug_19_09_28_11_2011_956)--"
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org
X-Virus-Status: Clean
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3
	(mail.allbsd.org [133.31.130.32]);
	Fri, 19 Aug 2011 09:28:40 +0900 (JST)
X-Spam-Status: No, score=-102.6 required=13.0 tests=BAYES_00,
	CONTENT_TYPE_PRESENT,DIRECTOCNDYN,RCVD_IN_RP_RNBL,SPF_SOFTFAIL,
	USER_IN_WHITELIST autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	gatekeeper.allbsd.org
Cc: freebsd-stable@FreeBSD.org, sterling@camdensoftware.com, avg@FreeBSD.org,
	Nick Esborn <nick@desert.net>, kostikbel@gmail.com, mdtansca@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 00:29:01 -0000

----Security_Multipart(Fri_Aug_19_09_28_11_2011_956)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Chip Camden <sterling@camdensoftware.com> wrote
  in <20110818025550.GA1971@libertas.local.camdensoftware.com>:

st> Quoth Attilio Rao on Thursday, 18 August 2011:
st> > In callout_cpu_switch() if a low priority thread is migrating the
st> > callout and gets preempted after the outcoming cpu queue lock is left
st> > (and scheduled much later) we get this problem.
st> >
st> > In order to fix this bug it could be enough to use a critical section,
st> > but I think this should be really interrupt safe, thus I'd wrap them
st> > up with spinlock_enter()/spinlock_exit(). Fortunately
st> > callout_cpu_switch() should be called rarely and also we already do
st> > expensive locking operations in callout, thus we should not have
st> > problem performance-wise.
st> >
st> > Can the guys I also CC'ed here try the following patch, with all the
st> > initial kernel options that were leading you to the deadlock? (thus
st> > revert any debugging patch/option you added for the moment):
st> > http://www.freebsd.org/~attilio/callout-fixup.diff
st> >
st> > Please note that this patch is for STABLE_8, if you can confirm the
st> > good result I'll commit to -CURRENT and then backmarge as soon as
st> > possible.
st> >
st> > Thanks,
st> > Attilio
st> >
st>
st> Thanks, Attilio.  I've applied the patch and removed the extra debug
st> options I had added (though keeping debug symbols).  I'll let you know if
st> I experience any more panics.

 No panic for 20 hours at this moment, FYI.  For my NFS server, I
 think another 24 hours would be sufficient to confirm the stability.
 I will see how it works...

-- Hiroki

----Security_Multipart(Fri_Aug_19_09_28_11_2011_956)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEABECAAYFAk5NrhsACgkQTyzT2CeTzy1O/ACeJPyJpjyI8X68PscHDXRU7iXu
8M0An23TY3RL9ZPaL1R+FCLHmhe9Mqi7
=FHX7
-----END PGP SIGNATURE-----

----Security_Multipart(Fri_Aug_19_09_28_11_2011_956)----

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 00:38:07 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3DC74106566B
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 00:38:07 +0000 (UTC)
	(envelope-from sterling@camdensoftware.com)
Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com
	[174.123.46.202])
	by mx1.freebsd.org (Postfix) with ESMTP id F39218FC0C
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 00:38:06 +0000 (UTC)
Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203]
	helo=_HOSTNAME_)
	by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.69) (envelope-from <sterling@camdensoftware.com>)
	id 1QuD61-0004Fs-P0; Thu, 18 Aug 2011 17:37:38 -0700
Received: by _HOSTNAME_ (sSMTP sendmail emulation);
	Thu, 18 Aug 2011 17:38:00 -0700
Date: Thu, 18 Aug 2011 17:37:59 -0700
From: Chip Camden <sterling@camdensoftware.com>
To: Hiroki Sato <hrs@FreeBSD.org>
Message-ID: <20110819003759.GC54831@libertas.local.camdensoftware.com>
Mail-Followup-To: Hiroki Sato <hrs@FreeBSD.org>, attilio@FreeBSD.org,
	kostikbel@gmail.com, freebsd-stable@FreeBSD.org, avg@FreeBSD.org,
	mdtansca@FreeBSD.org, Nick Esborn <nick@desert.net>
References: <20110818.091600.831954331552558249.hrs@allbsd.org>
	<CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
	<20110818025550.GA1971@libertas.local.camdensoftware.com>
	<20110819.092811.1087267565626420460.hrs@allbsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="f+W+jCU1fRNres8c"
Content-Disposition: inline
In-Reply-To: <20110819.092811.1087267565626420460.hrs@allbsd.org>
User-Agent: Mutt/1.4.2.3i
Company: Camden Software Consulting
URL: http://camdensoftware.com
X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - camdensoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: freebsd-stable@FreeBSD.org, avg@FreeBSD.org, attilio@FreeBSD.org,
	Nick Esborn <nick@desert.net>, kostikbel@gmail.com, mdtansca@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 00:38:07 -0000


--f+W+jCU1fRNres8c
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Quoth Hiroki Sato on Friday, 19 August 2011:
> Chip Camden <sterling@camdensoftware.com> wrote
>   in <20110818025550.GA1971@libertas.local.camdensoftware.com>:
>=20
> st> Quoth Attilio Rao on Thursday, 18 August 2011:
> st> > In callout_cpu_switch() if a low priority thread is migrating the
> st> > callout and gets preempted after the outcoming cpu queue lock is le=
ft
> st> > (and scheduled much later) we get this problem.
> st> >
> st> > In order to fix this bug it could be enough to use a critical secti=
on,
> st> > but I think this should be really interrupt safe, thus I'd wrap them
> st> > up with spinlock_enter()/spinlock_exit(). Fortunately
> st> > callout_cpu_switch() should be called rarely and also we already do
> st> > expensive locking operations in callout, thus we should not have
> st> > problem performance-wise.
> st> >
> st> > Can the guys I also CC'ed here try the following patch, with all the
> st> > initial kernel options that were leading you to the deadlock? (thus
> st> > revert any debugging patch/option you added for the moment):
> st> > http://www.freebsd.org/~attilio/callout-fixup.diff
> st> >
> st> > Please note that this patch is for STABLE_8, if you can confirm the
> st> > good result I'll commit to -CURRENT and then backmarge as soon as
> st> > possible.
> st> >
> st> > Thanks,
> st> > Attilio
> st> >
> st>
> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
> st> options I had added (though keeping debug symbols).  I'll let you kno=
w if
> st> I experience any more panics.
>=20
>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
>  think another 24 hours would be sufficient to confirm the stability.
>  I will see how it works...
>=20
> -- Hiroki

Likewise:

$ uptime
 5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63

So far, so good (knocks on head).

--=20
=2EO. | Sterling (Chip) Camden      | http://camdensoftware.com
=2E.O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com

--f+W+jCU1fRNres8c
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJOTbBnAAoJEIpckszW26+RT+AIAIRMa07BhoVaRBq3lz1dVcsq
zh+G7945FXqbD+0hhv+/4T75mbtzSG4l72dhlwGWNUZg70hZKqEUfNzQs3meSquR
wmVCi3NH0cu5jIAZqvDWCvU8BigBn2GRjN/sXl5GCsGrZFi50kZXWKmgzTyDVrIM
iwva8366ceK36QfodupVgxSs7ifDt8Jl3tLSdXHdacf17BceW2mETwOVvmd13LXQ
BVOxFE7Qmk7xYXqrt3dj+E/gtO21R31EL3XJYx7prev534eNF99pn1GZCaj2By1Q
B1iG4SfXMgYtzHpqSGniENX8RAhaCJmpFZDrIebnawel2rPMPFHuzJLc5hKp6eE=
=lxLO
-----END PGP SIGNATURE-----

--f+W+jCU1fRNres8c--

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 01:28:05 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 788BB106564A
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 01:28:05 +0000 (UTC)
	(envelope-from yuri@rawbw.com)
Received: from shell0.rawbw.com (shell0.rawbw.com [198.144.192.45])
	by mx1.freebsd.org (Postfix) with ESMTP id 676968FC1C
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 01:28:05 +0000 (UTC)
Received: from eagle.yuri.org (stunnel@localhost [127.0.0.1])
	(authenticated bits=0)
	by shell0.rawbw.com (8.14.4/8.14.4) with ESMTP id p7J1S46E028644
	for <freebsd-stable@freebsd.org>; Thu, 18 Aug 2011 18:28:04 -0700 (PDT)
	(envelope-from yuri@rawbw.com)
Message-ID: <4E4DBC24.1070007@rawbw.com>
Date: Thu, 18 Aug 2011 18:28:04 -0700
From: Yuri <yuri@rawbw.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110716 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <4E4CD19E.5070108@rawbw.com>
In-Reply-To: <4E4CD19E.5070108@rawbw.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 01:28:05 -0000

Following instructions here 
(http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html) I 
destroyed my previous ZFS pool with 512 byte sectors and did this:
gnop create -S 4096 /dev/ad4
zpool create mypool /dev/ad4.nop
zfs create mypool/mydir
zpool export mypool
gnop destroy /dev/ad4.nop
zpool import mypool

Now this command 'zdb -C data | grep ashift' shows ashift=12 (4096 byte 
sectors).
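
For reference, ashift is the base-2 logarithm of the sector size, so
ashift=12 corresponds to 2^12 = 4096 bytes and ashift=9 to 512 bytes.
A minimal sh sketch of that arithmetic follows. The helper names
ashift_to_bytes and first_ashift are made up for illustration, and the
pattern assumes zdb prints either "ashift: N" or "ashift=N", which may
vary between ZFS versions.

```shell
#!/bin/sh
# ashift is log2 of the sector size ZFS assumes for a vdev.
# Both helper names below are hypothetical, not part of any ZFS tool.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Pull the first ashift value out of zdb-style text on stdin.
# Accepts "ashift: 12" and "ashift=12" (assumed output spellings).
first_ashift() {
    sed -n 's/.*ashift[=: ]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

ashift_to_bytes 9     # 512-byte sectors
ashift_to_bytes 12    # 4096-byte sectors
```

With a real pool this could be used as
'ashift_to_bytes "$(zdb -C mypool | first_ashift)"', assuming the pool
is named mypool as in the commands above.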

However, when I begin copying a lot of files into /mypool/mydir, my
online radio player is severely affected. The sound gets interrupted all
the time. The interruptions stop 1-2 secs after I stop copying.
This didn't happen with a sector size of 512 bytes.

What is wrong?

Yuri

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 02:37:39 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35])
	by hub.freebsd.org (Postfix) with ESMTP id 384341065670
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 02:37:39 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org
	[IPv6:2001:4f8:fff6::36])
	by mx2.freebsd.org (Postfix) with ESMTP id 7FE2815169F
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 02:36:51 +0000 (UTC)
Date: Thu, 18 Aug 2011 19:36:50 -0700 (PDT)
From: Doug Barton <dougb@FreeBSD.org>
To: freebsd-stable@FreeBSD.org
Message-ID: <alpine.BSF.2.00.1108181931070.77926@172-17-198-245.tybonyfhvgr.arg>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
X-message-flag: Outlook -- Not just for spreading viruses anymore!
OpenPGP: id=1A1ABC84
Organization: http://SupersetSolutions.com/
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Cc: 
Subject: crash on 8.2-RELEASE amd64, high-traffic squid server
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 02:37:39 -0000

Howdy,

I have some high-traffic squid servers, most of which are running a 
flavor of RELENG_7 very successfully, but one that I've been evaluating 
8.x on has had a lot of problems. Most recently we had the crash below 
twice in the last 2 weeks. Same exact backtrace. Any suggestions on 
where to look would be appreciated.


Thanks,

Doug

#0  doadump () at pcpu.h:224
224	pcpu.h: No such file or directory.
 	in pcpu.h
(kgdb) #0  doadump () at pcpu.h:224
#1  0xffffffff803ec4be in boot (howto=260)
     at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available.
)
     at /usr/src/sys/kern/kern_shutdown.c:592
#3  0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available.
)
     at /usr/src/sys/amd64/amd64/trap.c:783
#4  0xffffffff8069aab9 in trap (frame=0xffffff800012f650)
     at /usr/src/sys/amd64/amd64/trap.c:592
#5  0xffffffff80682e84 in calltrap ()
     at /usr/src/sys/amd64/amd64/exception.S:224
#6  0xffffffff80698896 in bcopy ()
     at /usr/src/sys/amd64/amd64/support.S:124
#7  0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0,
     m=0xffffff010b815300, n=0xffffff006baa3700)
     at /usr/src/sys/kern/uipc_sockbuf.c:779
#8  0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0,
     m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
#9  0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available.
)
     at /usr/src/sys/netinet/tcp_input.c:2588
#10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available.
)
     at /usr/src/sys/netinet/tcp_input.c:1029
#11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300)
     at /usr/src/sys/netinet/ip_input.c:787
#12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
)
     at /usr/src/sys/net/netisr.c:917
#13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000,
     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
#14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000,
     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
#15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98,
     done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
#16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available.
)
     at /usr/src/sys/dev/e1000/if_em.c:1482
#17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800)
     at /usr/src/sys/kern/subr_taskqueue.c:250
#18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
     at /usr/src/sys/kern/subr_taskqueue.c:387
#19 0xffffffff803c30f8 in fork_exit (
     callout=0xffffffff80429c00 <taskqueue_thread_loop>,
     arg=0xffffff80005a8748, frame=0xffffff800012fc40)
     at /usr/src/sys/kern/kern_fork.c:845
#20 0xffffffff8068334e in fork_trampoline ()
     at /usr/src/sys/amd64/amd64/exception.S:565
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0xffffffff8095ac00 in affinity ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0xffffff0002d2d8c0 in ?? ()
#49 0xffffff800012f320 in ?? ()
#50 0xffffff800012f2c8 in ?? ()
#51 0xffffff0002c59000 in ?? ()
#52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00,
     newtd=0xffffff80005a8748, flags=Variable "flags" is not available.
)
     at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
(kgdb)


-- 

 	Nothin' ever doesn't change, but nothin' changes much.
 			-- OK Go

 	Breadth of IT experience, and depth of knowledge in the DNS.
 	Yours for the right price.  :)  http://SupersetSolutions.com/


From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 03:04:08 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4241C106564A
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 03:04:08 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta13.emeryville.ca.mail.comcast.net
	(qmta13.emeryville.ca.mail.comcast.net [76.96.27.243])
	by mx1.freebsd.org (Postfix) with ESMTP id 289338FC0C
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 03:04:07 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
	by qmta13.emeryville.ca.mail.comcast.net with comcast
	id N2zm1h0021bwxycAD343qu; Fri, 19 Aug 2011 03:04:03 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta18.emeryville.ca.mail.comcast.net with comcast
	id N33d1h00m1t3BNj8e33d5S; Fri, 19 Aug 2011 03:03:38 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 2E624102C1A; Thu, 18 Aug 2011 20:04:05 -0700 (PDT)
Date: Thu, 18 Aug 2011 20:04:05 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Doug Barton <dougb@FreeBSD.org>
Message-ID: <20110819030405.GA83032@icarus.home.lan>
References: <alpine.BSF.2.00.1108181931070.77926@172-17-198-245.tybonyfhvgr.arg>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.BSF.2.00.1108181931070.77926@172-17-198-245.tybonyfhvgr.arg>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@FreeBSD.org, "Vogel, Jack" <jack.vogel@intel.com>
Subject: Re: crash on 8.2-RELEASE amd64, high-traffic squid server
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 03:04:08 -0000

On Thu, Aug 18, 2011 at 07:36:50PM -0700, Doug Barton wrote:
> Howdy,
> 
> I have some high-traffic squid servers, most of which are running a
> flavor of RELENG_7 very successfully, but one that I've been
> evaluating 8.x on has had a lot of problems. Most recently we had
> the crash below twice in the last 2 weeks. Same exact backtrace. Any
> suggestions on where to look would be appreciated.
> 
> 
> Thanks,
> 
> Doug
> 
> #0  doadump () at pcpu.h:224
> 224	pcpu.h: No such file or directory.
> 	in pcpu.h
> (kgdb) #0  doadump () at pcpu.h:224
> #1  0xffffffff803ec4be in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:592
> #3  0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:783
> #4  0xffffffff8069aab9 in trap (frame=0xffffff800012f650)
>     at /usr/src/sys/amd64/amd64/trap.c:592
> #5  0xffffffff80682e84 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:224
> #6  0xffffffff80698896 in bcopy ()
>     at /usr/src/sys/amd64/amd64/support.S:124
> #7  0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300, n=0xffffff006baa3700)
>     at /usr/src/sys/kern/uipc_sockbuf.c:779
> #8  0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
> #9  0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:2588
> #10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:1029
> #11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300)
>     at /usr/src/sys/netinet/ip_input.c:787
> #12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
> )
>     at /usr/src/sys/net/netisr.c:917
> #13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
> #14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
> #15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98,
>     done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
> #16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available.
> )
>     at /usr/src/sys/dev/e1000/if_em.c:1482
> #17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800)
>     at /usr/src/sys/kern/subr_taskqueue.c:250
> #18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available.
> )
>     at /usr/src/sys/kern/subr_taskqueue.c:387
> #19 0xffffffff803c30f8 in fork_exit (
>     callout=0xffffffff80429c00 <taskqueue_thread_loop>,
>     arg=0xffffff80005a8748, frame=0xffffff800012fc40)
>     at /usr/src/sys/kern/kern_fork.c:845
> #20 0xffffffff8068334e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:565
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000000 in ?? ()
> #44 0x0000000000000000 in ?? ()
> #45 0xffffffff8095ac00 in affinity ()
> #46 0x0000000000000000 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff0002d2d8c0 in ?? ()
> #49 0xffffff800012f320 in ?? ()
> #50 0xffffff800012f2c8 in ?? ()
> #51 0xffffff0002c59000 in ?? ()
> #52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00,
>     newtd=0xffffff80005a8748, flags=Variable "flags" is not available.
> )
>     at /usr/src/sys/kern/sched_ule.c:1852
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

CC'ing Jack Vogel here, since I see em(4) is involved.  Jack will
probably want this data from the system:

# uname -a       (hostname can be XXX'd out)
# dmesg          (particularly the emX entries and driver version)
# pciconf -lvbc  (specifically the emX entries and related data)
# ifconfig -a    (IPs and MACs can be X'd out; mainly interested in
                  options and other pieces)
# netstat -m     (if possible from a system which has been up a while
                  and is a likely crash candidate)
# vmstat -i      (same condition as netstat -m)
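
If it helps, the commands above could be gathered into a single file in
one pass.  This is only a rough sh sketch: the collect() helper, the
REPORT variable and the default path /tmp/em-report.txt are made up for
illustration, and the output still needs to be checked by hand before
posting so that hostnames, IPs and MACs can be X'd out.

```shell
#!/bin/sh
# Collect the diagnostics listed above into one report file.
# REPORT, the default path and collect() are illustrative assumptions.
out="${REPORT:-/tmp/em-report.txt}"

collect() {
    # Print a header, then run the command if it exists on this system;
    # otherwise leave a note so the gap is visible in the report.
    echo "===== $* ====="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

{
    collect uname -a
    collect dmesg
    collect pciconf -lvbc
    collect ifconfig -a
    collect netstat -m
    collect vmstat -i
} > "$out"

echo "wrote $out"
```

Run it as root on a box that has been up a while, since netstat -m and
vmstat -i are most useful from a likely crash candidate.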

There isn't enough data above for me to determine what's going on, but
from the stack trace it looks like sbcompress() may be given some data
which is null or inaccessible.  The source for that hasn't been touched
directly in a while.  The TCP stack/code, however, has been (since
8.2-RELEASE for sure).  I think em(4) has as well.  This may end up
being a case where running RELENG_8 is the fix, but I'd love to be able
to say that for certain.

"bt full" would be helpful but the above indicates the kernel might not
have debugging symbols included in it?  I've seen this kind of output
even on a system with "makeoptions DEBUG=-g" in its kernel config before
though.  Never was sure how to deal with that problem.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 05:16:12 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 081BA106566B;
	Fri, 19 Aug 2011 05:16:12 +0000 (UTC) (envelope-from ae@FreeBSD.org)
Received: from mail.kirov.so-ups.ru (ns.kirov.so-ups.ru [178.74.170.1])
	by mx1.freebsd.org (Postfix) with ESMTP id A50768FC12;
	Fri, 19 Aug 2011 05:16:10 +0000 (UTC)
Received: from kas30pipe.localhost (localhost.kirov.so-cdu.ru [127.0.0.1])
	by mail.kirov.so-ups.ru (Postfix) with SMTP id 0D7B0B801B;
	Fri, 19 Aug 2011 09:00:22 +0400 (MSD)
Received: from kirov.so-cdu.ru (unknown [172.21.81.1])
	by mail.kirov.so-ups.ru (Postfix) with ESMTP id 03517B8008;
	Fri, 19 Aug 2011 09:00:22 +0400 (MSD)
Received: by ns.kirov.so-cdu.ru (Postfix, from userid 1010)
	id D3800B8F0A; Fri, 19 Aug 2011 09:00:15 +0400 (MSD)
Received: from [10.118.3.52] (elsukov.kirov.oduur.so [10.118.3.52])
	by ns.kirov.so-cdu.ru (Postfix) with ESMTP id 9F605B8F04;
	Fri, 19 Aug 2011 09:00:15 +0400 (MSD)
Message-ID: <4E4DEDDB.6060201@FreeBSD.org>
Date: Fri, 19 Aug 2011 09:00:11 +0400
From: "Andrey V. Elsukov" <ae@FreeBSD.org>
User-Agent: Mozilla Thunderbird 1.5 (FreeBSD/20051231)
MIME-Version: 1.0
To: Kevin Oberman <kob6558@gmail.com>
References: <4E4CD19E.5070108@rawbw.com>
	<20110818101034.GA1958@emphyrio.blackend.org>
	<CAN6yY1sAZvQ_p9bmco-kSSwKc2ka_b7+XeK+5ufhTh0fkQdHXQ@mail.gmail.com>
In-Reply-To: <CAN6yY1sAZvQ_p9bmco-kSSwKc2ka_b7+XeK+5ufhTh0fkQdHXQ@mail.gmail.com>
X-Enigmail-Version: 1.3
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig74A1FA41A34B9FA8A9C4CB9D"
X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0284], KAS30/Release
X-SpamTest-Info: Not protected
Cc: Yuri <yuri@rawbw.com>, freebsd-stable@freebsd.org
Subject: Re: WD Advanced Format: do I need to do something special?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 05:16:12 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig74A1FA41A34B9FA8A9C4CB9D
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable

On 19.08.2011 3:11, Kevin Oberman wrote:
> I'd strongly suggest avoiding fdisk(8) and using gpart(8) on 8 and
> above. It has an
> alignment option that makes this all just work and also allows the use =
of GPT
> formatting. (Watch out for GPT on any system that needs to run 32-bit W=
indows.)
>=20
> gpart create -s gpt ada1
> gpart bootcode -b /boot/pmbr ada1
> gpart add -t freebsd-boot -a 4 -s 128 -b 40 ad0
> gpart bootcode -p /boot/gptboot -i 1 ad0
> gpart add -t freebsd-ufs -a 4 -s 2097152 ada1
> gpart add -t freebsd-swap -a 4 -s 8388608 ada1
> gpart add -t freebsd-ufs -a 4 -s 10485760 ada1
> gpart add -t freebsd-ufs -a 4 -s 1048576 ada1
> gpart add -t freebsd-ufs -a 4 ada1

If you use gpart with the -a option, you don't need to specify exact
numbers.  And if you want to align your partition to 4096 bytes, you
should use "-a 4k" or "-a 8".
E.g.

# gpart add -t freebsd-boot -a 4k -s 64k ad0

--=20
WBR, Andrey V. Elsukov


--------------enig74A1FA41A34B9FA8A9C4CB9D
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJOTe3fAAoJEAHF6gQQyKF6QPsIALO6JwNVmk8GnWYIsCgshZNB
KEAg/DwXlBpapfGONuIXv+F6Db4ydeKeWvZouINc6W9xx4qgrmUwOFs6oVi6tO0d
bQUg2wB6QFHufCdC5Ndfb9RMYZZLKjAfhCnOYEj/8G1SHPoaPOFUBJ+qd4JBwSvq
3M9nMEsNlOyyLsBvti1sPresvypwv3JQrzvGW7XEPUsbU0+VgUEeoIXLXGfMWDgR
z0V45ErMLzN2oc5Le3l9617m4SM5INUpWEZuOU5iHBAYXoTlglsaGscmQPKd9aTt
GK0cs3nlu5xeH2BvmJtbUcmCL8z4vPy700aAu0EUnMdFvHvqTreh0s/bmrWMSiY=
=cvvN
-----END PGP SIGNATURE-----

--------------enig74A1FA41A34B9FA8A9C4CB9D--

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 12:14:02 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EE108106566B;
	Fri, 19 Aug 2011 12:14:02 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id C011E8FC12;
	Fri, 19 Aug 2011 12:14:02 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 69FD446B35;
	Fri, 19 Aug 2011 08:14:02 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id E7B738A02F;
	Fri, 19 Aug 2011 08:14:01 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Date: Fri, 19 Aug 2011 08:14:00 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; )
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org>
In-Reply-To: <4E4D717F.3090802@FreeBSD.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108190814.00885.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Fri, 19 Aug 2011 08:14:02 -0400 (EDT)
Cc: freebsd-stable@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 12:14:03 -0000

On Thursday, August 18, 2011 4:09:35 pm Andriy Gapon wrote:
> on 17/08/2011 23:21 Andriy Gapon said the following:
> > It seems like everything starts with some kind of a race between terminating
> > processes in a jail and termination of the jail itself.  This is where the
> > details are very thin so far.  What we see is that a process (http) is in
> > exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
> > flag is set and even past the place where p_limit is freed and reset to NULL.
> > At that place the thread calls prison_proc_free(), which calls prison_deref().
> > Then, we see that in prison_deref() the thread gets a page fault because of what
> > seems like a NULL pointer dereference.  That's just the start of the problem and
> > its root cause.
> >
> > Then, trap_pfault() gets invoked and, because addresses close to NULL look like
> > userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
> > on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
> > to lim_cur(), but because p_limit is already NULL, that call results in a NULL
> > pointer dereference and a page fault.  Goto the beginning of this paragraph.
> >
> > So we get this recursion of sorts, which only ends when a stack is exhausted and
> > a CPU generates a double-fault.
> 
> BTW, does anyone have an idea why the thread in question would "disappear" from
> kgdb's point of view?
> 
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
> 
> info threads also doesn't list the thread.
> 
> Is it because the panic happened while the thread was somewhere in exit1()?

Yes, it is a bug in kgdb that it only walks allproc and not zombproc.  Try this:

Index: kthr.c
===================================================================
--- kthr.c	(revision 224879)
+++ kthr.c	(working copy)
@@ -73,11 +73,52 @@ kgdb_thr_first(void)
 	return (first);
 }
 
+static void
+kgdb_thr_add_procs(uintptr_t paddr)
+{
+	struct proc p;
+	struct thread td;
+	struct kthr *kt;
+	CORE_ADDR addr;
+
+	while (paddr != 0) {
+		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
+			warnx("kvm_read: %s", kvm_geterr(kvm));
+			break;
+		}
+		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
+		while (addr != 0) {
+			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
+			    sizeof(td)) {
+				warnx("kvm_read: %s", kvm_geterr(kvm));
+				break;
+			}
+			kt = malloc(sizeof(*kt));
+			kt->next = first;
+			kt->kaddr = addr;
+			if (td.td_tid == dumptid)
+				kt->pcb = dumppcb;
+			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
+			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
+				kt->pcb = (uintptr_t)stoppcbs +
+				    sizeof(struct pcb) * td.td_oncpu;
+			else
+				kt->pcb = (uintptr_t)td.td_pcb;
+			kt->kstack = td.td_kstack;
+			kt->tid = td.td_tid;
+			kt->pid = p.p_pid;
+			kt->paddr = paddr;
+			kt->cpu = td.td_oncpu;
+			first = kt;
+			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
+		}
+		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	}
+}
+
 struct kthr *
 kgdb_thr_init(void)
 {
-	struct proc p;
-	struct thread td;
 	long cpusetsize;
 	struct kthr *kt;
 	CORE_ADDR addr;
@@ -113,37 +154,11 @@ kgdb_thr_init(void)
 
 	stoppcbs = kgdb_lookup("stoppcbs");
 
-	while (paddr != 0) {
-		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
-			warnx("kvm_read: %s", kvm_geterr(kvm));
-			break;
-		}
-		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
-		while (addr != 0) {
-			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
-			    sizeof(td)) {
-				warnx("kvm_read: %s", kvm_geterr(kvm));
-				break;
-			}
-			kt = malloc(sizeof(*kt));
-			kt->next = first;
-			kt->kaddr = addr;
-			if (td.td_tid == dumptid)
-				kt->pcb = dumppcb;
-			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
-			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
-				kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu;
-			else
-				kt->pcb = (uintptr_t)td.td_pcb;
-			kt->kstack = td.td_kstack;
-			kt->tid = td.td_tid;
-			kt->pid = p.p_pid;
-			kt->paddr = paddr;
-			kt->cpu = td.td_oncpu;
-			first = kt;
-			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
-		}
-		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	kgdb_thr_add_procs(paddr);
+	addr = kgdb_lookup("zombproc");
+	if (addr != 0) {
+		kvm_read(kvm, addr, &paddr, sizeof(paddr));
+		kgdb_thr_add_procs(paddr);
 	}
 	curkthr = kgdb_thr_lookup_tid(dumptid);
 	if (curkthr == NULL)
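
To make the idea behind the patch concrete, here is a minimal userland
sketch of it. This is not the kgdb code: kgdb reads the kernel structures
through kvm_read(), while this toy version walks ordinary in-memory lists.
The names toy_thread, toy_proc, find_tid_in_list() and find_tid() are
made up for illustration.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel's struct thread and struct proc. */
struct toy_thread {
	int			 tid;
	struct toy_thread	*next;	/* next thread in the same process */
};

struct toy_proc {
	int			 pid;
	struct toy_thread	*threads;
	struct toy_proc		*next;	/* next process on the same list */
};

/* Search a single process list for a thread id. */
static struct toy_thread *
find_tid_in_list(struct toy_proc *head, int tid)
{
	struct toy_proc *p;
	struct toy_thread *td;

	for (p = head; p != NULL; p = p->next)
		for (td = p->threads; td != NULL; td = td->next)
			if (td->tid == tid)
				return (td);
	return (NULL);
}

/* Search the live list first, then fall back to the zombie list. */
static struct toy_thread *
find_tid(struct toy_proc *allproc, struct toy_proc *zombproc, int tid)
{
	struct toy_thread *td;

	td = find_tid_in_list(allproc, tid);
	if (td == NULL)
		td = find_tid_in_list(zombproc, tid);
	return (td);
}
```

A thread whose process has already been moved to the zombie list is
invisible to find_tid_in_list(allproc, ...) but is found by find_tid(),
which is the effect the patch is after.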

> is there an easy way to examine its stack in this case?

Hmm, you can use something like this from my kgdb macros.

For amd64:

# Do a backtrace given %rip and %rbp as args
define bt
    set $_rip = $arg0
    set $_rbp = $arg1
    set $i = 0
    while ($_rbp != 0 || $_rip != 0)
	printf "%2d: pc ", $i
	if ($_rip != 0)
		x/1i $_rip
	else
		printf "\n"
	end
	if ($_rbp == 0)
	    set $_rip = 0
	else
	    set $fr = (struct amd64_frame *)$_rbp
	    set $_rbp = $fr->f_frame
	    set $_rip = $fr->f_retaddr
	    set $i = $i + 1
	end
    end
end

document bt
Given values for %rip and %rbp, perform a manual backtrace.
end

define btf
    bt $arg0.tf_rip $arg0.tf_rbp
end

document btf
Do a manual backtrace from a specified trapframe.
end

For i386:

# Do a backtrace given %eip and %ebp as args
define bt
    set $_eip = $arg0
    set $_ebp = $arg1
    set $i = 0
    while ($_ebp != 0 || $_eip != 0)
	printf "%2d: pc ", $i
	if ($_eip != 0)
		x/1i $_eip
	else
		printf "\n"
	end
	if ($_ebp == 0)
	    set $_eip = 0
	else
	    set $fr = (struct i386_frame *)$_ebp
	    set $_ebp = $fr->f_frame
	    set $_eip = $fr->f_retaddr
	    set $i = $i + 1
	end
    end
end

document bt
Given values for %eip and %ebp, perform a manual backtrace.
end

define btf
    bt $arg0.tf_eip $arg0.tf_ebp
end

document btf
Do a manual backtrace from a specified trapframe.
end
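
For anyone unfamiliar with what the macros do, the same frame-pointer walk
can be sketched in plain C. This is only an illustration under assumptions:
toy_frame mirrors the two fields the macros use (f_frame and f_retaddr),
walk_frames() is a made-up name, and the `max` bound is an addition that
the macros do not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Saved frame pointer followed by the return address. */
struct toy_frame {
	struct toy_frame	*f_frame;
	uintptr_t		 f_retaddr;
};

/*
 * Starting from a program counter and a frame pointer, record each pc
 * in `out` and follow the saved frame pointers until both are zero.
 * Returns the number of entries stored (at most `max`).
 */
static size_t
walk_frames(uintptr_t pc, const struct toy_frame *fp, uintptr_t *out,
    size_t max)
{
	size_t n = 0;

	while ((fp != NULL || pc != 0) && n < max) {
		out[n++] = pc;
		if (fp == NULL) {
			pc = 0;
		} else {
			pc = fp->f_retaddr;
			fp = fp->f_frame;
		}
	}
	return (n);
}
```

In kgdb itself the macros are invoked as "bt <pc> <frame pointer>", or as
"btf <trapframe>" when the two values come from a trapframe.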

-- 
John Baldwin

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 12:55:30 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BD1C5106566B;
	Fri, 19 Aug 2011 12:55:30 +0000 (UTC) (envelope-from mike@sentex.net)
Received: from smarthost1.sentex.ca (smarthost1-6.sentex.ca
	[IPv6:2607:f3e0:0:1::12])
	by mx1.freebsd.org (Postfix) with ESMTP id 7BFBF8FC13;
	Fri, 19 Aug 2011 12:55:30 +0000 (UTC)
Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca
	[IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a])
	by smarthost1.sentex.ca (8.14.4/8.14.4) with ESMTP id p7JCtSbd054974;
	Fri, 19 Aug 2011 08:55:28 -0400 (EDT) (envelope-from mike@sentex.net)
Message-ID: <4E4E5D49.4040502@sentex.net>
Date: Fri, 19 Aug 2011 08:55:37 -0400
From: Mike Tancsa <mike@sentex.net>
Organization: Sentex Communications
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
MIME-Version: 1.0
To: Hiroki Sato <hrs@FreeBSD.org>, attilio@FreeBSD.org, kostikbel@gmail.com,
	freebsd-stable@FreeBSD.org, avg@FreeBSD.org, Nick Esborn <nick@desert.net>
References: <20110818.091600.831954331552558249.hrs@allbsd.org>	<CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>	<20110818025550.GA1971@libertas.local.camdensoftware.com>	<20110819.092811.1087267565626420460.hrs@allbsd.org>
	<20110819003759.GC54831@libertas.local.camdensoftware.com>
In-Reply-To: <20110819003759.GC54831@libertas.local.camdensoftware.com>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.71 on IPv6:2607:f3e0:0:1::12
Cc: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 12:55:30 -0000

On 8/18/2011 8:37 PM, Chip Camden wrote:

>> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
>> st> options I had added (though keeping debug symbols).  I'll let you know if
>> st> I experience any more panics.
>>
>>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
>>  think another 24 hours would be sufficient to confirm the stability.
>>  I will see how it works...
>>
>> -- Hiroki
> 
> Likewise:
> 
> $ uptime
>  5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
> 
> So far, so good (knocks on head).
> 


0(ns4)% uptime
 8:55AM  up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
0(ns4)%


So far so good for me too

	---Mike

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 15:06:17 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BEBE6106566B
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 15:06:17 +0000 (UTC)
	(envelope-from sterling@camdensoftware.com)
Received: from wh1.interactivevillages.com (ca.2e.7bae.static.theplanet.com
	[174.123.46.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 8341F8FC0A
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 15:06:17 +0000 (UTC)
Received: from 184-78-197-203.war.clearwire-wmx.net ([184.78.197.203]
	helo=_HOSTNAME_)
	by wh1.interactivevillages.com with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.69) (envelope-from <sterling@camdensoftware.com>)
	id 1QuQeE-0004Hf-S5
	for freebsd-stable@FreeBSD.org; Fri, 19 Aug 2011 08:05:52 -0700
Received: by _HOSTNAME_ (sSMTP sendmail emulation);
	Fri, 19 Aug 2011 08:06:12 -0700
Date: Fri, 19 Aug 2011 08:06:12 -0700
From: Chip Camden <sterling@camdensoftware.com>
To: freebsd-stable@FreeBSD.org
Message-ID: <20110819150612.GA34969@libertas.local.camdensoftware.com>
Mail-Followup-To: freebsd-stable@FreeBSD.org
References: <20110818.091600.831954331552558249.hrs@allbsd.org>
	<CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
	<20110818025550.GA1971@libertas.local.camdensoftware.com>
	<20110819.092811.1087267565626420460.hrs@allbsd.org>
	<20110819003759.GC54831@libertas.local.camdensoftware.com>
	<4E4E5D49.4040502@sentex.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="vkogqOf2sHV7VnPd"
Content-Disposition: inline
In-Reply-To: <4E4E5D49.4040502@sentex.net>
User-Agent: Mutt/1.4.2.3i
Company: Camden Software Consulting
URL: http://camdensoftware.com
X-PGP-Key: http://pgp.mit.edu:11371/pks/lookup?search=0xD6DBAF91
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - wh1.interactivevillages.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - camdensoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: 
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 15:06:17 -0000


--vkogqOf2sHV7VnPd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Quoth Mike Tancsa on Friday, 19 August 2011:
> On 8/18/2011 8:37 PM, Chip Camden wrote:
>
> >> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
> >> st> options I had added (though keeping debug symbols).  I'll let you know if
> >> st> I experience any more panics.
> >>
> >>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
> >>  think another 24 hours would be sufficient to confirm the stability.
> >>  I will see how it works...
> >>
> >> -- Hiroki
> >
> > Likewise:
> >
> > $ uptime
> >  5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
> >
> > So far, so good (knocks on head).
> >
>
>
> 0(ns4)% uptime
>  8:55AM  up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
> 0(ns4)%
>
>
> So far so good for me too
>
> 	---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/

Still up and running here.

 8:02AM  up 1 day, 12:10, 4 users, load averages: 0.08, 0.26, 0.52

After the panics began, I never went more than 12 hours without one before
applying this patch.  I think you nailed it, Attilio.  Or at least, you
moved it.

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com

--vkogqOf2sHV7VnPd
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJOTnvkAAoJEIpckszW26+RbXAH/RmLvrpkuuZU7wUAaXpN/jC/
t6x6ZMWUDJId2AlH3SIORFFDw2VvSQOxck14hZvGGHhBNYsqtfrdrAHi4cZual6S
Lv6hlcCN4asS52wsKCqoBvOasF5xZV+3L+0RARRhwO8kBNh2zKJ0jHYsQijBa/gw
xc1LLE4MPPETQGvZe/yxIQuC/oO5Sdo+zW85g6/8XX84ydDrEZqPSwbPbmtGrj3S
+vGMmexfnhlslgVlHboPnYIOnwRQKMkLb5oM7xejbx4yl6jn8qHtAFo+ltNftj4D
6vhQ/5AsNWimmHdsj/ZGTcTgM537k7gKSgYQvmpJolqdjqJ7hrFZOCRW7ewmPdI=
=S8yN
-----END PGP SIGNATURE-----

--vkogqOf2sHV7VnPd--

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 16:28:08 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 82BE3106566C;
	Fri, 19 Aug 2011 16:28:08 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 609F08FC17;
	Fri, 19 Aug 2011 16:28:06 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA02123;
	Fri, 19 Aug 2011 19:28:05 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4E8F15.5030301@FreeBSD.org>
Date: Fri, 19 Aug 2011 19:28:05 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: John Baldwin <jhb@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org> <4E4D717F.3090802@FreeBSD.org>
	<201108190814.00885.jhb@freebsd.org>
In-Reply-To: <201108190814.00885.jhb@freebsd.org>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 16:28:08 -0000

on 19/08/2011 15:14 John Baldwin said the following:
> Yes, it is a bug in kgdb that it only walks allproc and not zombproc.  Try this:

The patch worked perfectly well for me, thank you!

>> is there an easy way to examine its stack in this case?
> 
> Hmm, you can use something like this from my kgdb macros.

Oh, I completely forgot about them.
I hope I will remember where to search for the tricks next time I need them :-)
Thank you again!



-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 16:32:19 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0AB08106566C
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 16:32:19 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 5A21C8FC16
	for <freebsd-stable@FreeBSD.org>; Fri, 19 Aug 2011 16:32:17 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA02153;
	Fri, 19 Aug 2011 19:32:13 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E4E900D.8010506@FreeBSD.org>
Date: Fri, 19 Aug 2011 19:32:13 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:5.0) Gecko/20110705 Thunderbird/5.0
MIME-Version: 1.0
To: Hans Petter Selasky <hselasky@c2i.net>
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<4E4D460A.2080100@FreeBSD.org>
	<201108182324.58276.hselasky@c2i.net>
In-Reply-To: <201108182324.58276.hselasky@c2i.net>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 16:32:19 -0000

on 19/08/2011 00:24 Hans Petter Selasky said the following:
> On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
>> If you can help Hans to figure out what is wrong with the USB subsystem in
>> this respect, that would help us all.
> 
> Hi,
> 
> usb_busdma.c:   /* we use "mtx_owned()" instead of this function */
> usb_busdma.c:   owned = mtx_owned(uptag->mtx);
> usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
> usb_hub.c:      if (mtx_owned(&bus->bus_mtx)) {
> usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
> usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
> usb_transfer.c:         while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
> usb_transfer.c:         while (mtx_owned(xroot->xfer_mtx)) {
> 
> One fix you will need to do, if mtx_owned is not giving correct value is:

First, could you please clarify what the correct, or rather the expected, value
is in this case?  It's not immediately clear to me whether we should consider all
locks as owned or un-owned in a situation where all locks are actually skipped
behind the scenes.
Maybe the USB code should explicitly check for that condition so as not to make
any unsafe assumptions.

Second, it's not clear to me what the above list actually represents in the
context of this discussion.

> static void
> usbd_callback_wrapper(struct usb_xfer_queue *pq)
> {
>         struct usb_xfer *xfer = pq->curr;
>         struct usb_xfer_root *info = xfer->xroot;
> 
>         USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
>         if (!mtx_owned(info->xfer_mtx)) {
> 
> The above "if" should be anded with && !paniced && !dumping ... or maybe the 
> new not scheduling variable is good for this purpose?
> 
>                 /*
>                  * Cases that end up here:
>                  *
> 
> #if USB_HAVE_BUSDMA
>         if (mtx_owned(xfer->xroot->xfer_mtx)) {
>                 struct usb_xfer_queue *pq;
> 
> 
> This case is more like a BUS-DMA error case, and is not so important to 
> execute.
> 
> --HPS
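
To illustrate the kind of explicit check discussed above, here is a rough
userland model. It is only a sketch and not the USB code: toy_mtx,
toy_scheduler_stopped, toy_mtx_owned() and must_defer() are hypothetical
names, and the real condition would use whatever the kernel provides for
"panicked / scheduler stopped".

```c
#include <stdbool.h>

/* Hypothetical stand-in for a mutex; not the kernel's struct mtx. */
struct toy_mtx {
	int	owner;		/* id of the owning thread, or -1 if unowned */
};

/* Set once a panic has stopped normal scheduling (hypothetical flag). */
static bool toy_scheduler_stopped;

static bool
toy_mtx_owned(const struct toy_mtx *m, int self)
{
	return (m->owner == self);
}

/*
 * Decide whether the caller has to defer its work until the lock is held.
 * While the scheduler is stopped, lock ownership carries no information,
 * so that case is answered explicitly instead of via toy_mtx_owned().
 */
static bool
must_defer(const struct toy_mtx *m, int self)
{
	if (toy_scheduler_stopped)
		return (false);
	return (!toy_mtx_owned(m, self));
}
```

The point of the sketch is only that the "scheduler stopped" case is
decided first, so the result no longer depends on what an ownership test
happens to return while locking is being skipped.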


-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 16:44:39 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2EE0D106564A;
	Fri, 19 Aug 2011 16:44:38 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id CA6068FC13;
	Fri, 19 Aug 2011 16:44:37 +0000 (UTC)
Received: by gyd10 with SMTP id 10so2717217gyd.13
	for <multiple recipients>; Fri, 19 Aug 2011 09:44:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=0Lz5H6GCIvnE1/LQqKZyiedNy77sAEIyOH0cmJVDOFk=;
	b=aBdz3xeAtXi14nYrzdTfbYnnuhNqjufzZl/IQ9oX2bGTTZNnmH34Yd1WsZIPayJECW
	KDX3eLfZRxwYbItBWAvsxE58zhqHvdqB7vtaJClFX4Fx0mQOJotTU2R1foquCqQtbepG
	XTtxiWrQ8g2CvxKbJ/Vb8kVORyJdwvRQtliaQ=
MIME-Version: 1.0
Received: by 10.236.80.9 with SMTP id j9mr8972365yhe.94.1313772277318; Fri, 19
	Aug 2011 09:44:37 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.236.108.33 with HTTP; Fri, 19 Aug 2011 09:44:37 -0700 (PDT)
In-Reply-To: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
Date: Fri, 19 Aug 2011 18:44:37 +0200
X-Google-Sender-Auth: -paXys0lGuRJVgp-FLwxqxKuPdc
Message-ID: <CAJ-FndD6SyzNSG9whzz+zAeXO4mTmRbD8uU4ttNXJhDobdeG-g@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Andrew Boyer <aboyer@averesystems.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-stable@freebsd.org, Eugene Grosbein <egrosbein@rdtc.ru>,
	Vishal.Shah@netapp.com, Andriy Gapon <avg@freebsd.org>,
	Hans Petter Selasky <hselasky@c2i.net>,
	Jeremiah Lott <jlott@averesystems.com>,
	Steven Hartland <killing@multiplay.co.uk>
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 16:44:39 -0000

2011/8/12 Andrew Boyer <aboyer@averesystems.com>:
> Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
> Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable)
> Re: System hang in USB umass module while processing panic (originally on freebsd-usb)
>
> Hello Andriy and Hans,
>
> Sorry for tying in so many discussions on this topic, but I think I have
> an explanation for the problems we have been reporting* with hanging
> coredumps on multicore systems on 8.2-RELEASE, and it has implications
> for Andriy's proposed scheduler patch** and for USB.
>
> In today's 8.X and 9.X branches, nothing that I can find stops the other
> CPUs when the kernel panics, but many parts of the locking code get
> disabled (grep on 'panicstr').  The 'bufwrite: buffer is not busy???'
> panic is caused by the syncer encountering an error.  If that happens
> when it's on the dumping CPU everything hangs.  If it's running on a
> different CPU, it will be blocked and hidden by the panic_cpu spinlock in
> panic(), and the dump continues, polling every attached keyboard for a
> Ctl-C.
>
> But, the new 8.X USB stack relies on multithreading.  (The new stack is
> the variable that broke coredumps for us in the 7.1->8.2 transition, I
> think.)  SVN 224223 fixes a hang that would happen when dumpsys() polls
> the USB keyboard (IPMI KVM, in our case).  That helps, but it only gets
> as far as usb_process(), where it hangs in a loop around a cv_wait()
> call.  This is easy to reproduce by adding code to the watchdog to break
> into the debugger if panicstr is set.
>
> I am experimenting with Andriy's patch** to stop the scheduler and it
> seems to be most of the way there, stopping the CPUs and disabling the
> rest of locking.  There are a few places that still reference panicstr,
> but that's minor.  These are the changes I made to the patch:
>  * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is
>    true, so that we don't hang up in USB.  ukbd_yield() locks up in
>    DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll() locks
>    up trying to drop mutexes.
>  * Changed the call to spinlock_enter() back to critical_enter(), so that
>    interrupts stay enabled and the hardclock still functions.

Which spinlock_enter() are you referring to here?
I think that having fast interrupt handlers running during
panic/shutdown is something we should avoid like hell.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 21:09:10 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C40E1065679
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 21:09:10 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 028A58FC08
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 21:09:09 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id C69B450A09
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 20:50:02 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 29vy5iYDDqPK for <freebsd-stable@freebsd.org>;
	Fri, 19 Aug 2011 21:50:02 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 63A34509F3  
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 20:50:02 +0000 (UTC)
From: Dan Langille <dan@langille.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Fri, 19 Aug 2011 16:50:01 -0400
Message-Id: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
To: freebsd-stable@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Subject: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 21:09:10 -0000

System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011

After a recent power failure, I'm seeing this in my logs:

Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

And gmirror reports:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (100%)
                      ad2

I think the solution is: gmirror rebuild

Comments?


Searching on that error message, I was led to believe that identifying the bad sector and
running dd to read it would cause the HDD to reallocate that bad block.

  http://smartmontools.sourceforge.net/badblockhowto.html

However, since ad2 is one half of a gmirror, I don't think this is the best approach.

Comments?


More information:

smartd, gpart, dh, diskinfo, and fdisk output at http://beta.freebsddiary.org/smart-fixing-bad-sector.php

also:

# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3362720654
Providers:
1. Name: mirror/gm0
   Mediasize: 40027028992 (37G)
   Sectorsize: 512
   Mode: r6w5e14
Consumers:
1. Name: ad0
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 0
   Flags: DIRTY, SYNCHRONIZING
   GenID: 0
   SyncID: 1
   Synchronized: 100%
   ID: 949692477
2. Name: ad2
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY, BROKEN
   GenID: 0
   SyncID: 1
   ID: 3585934016


-- 
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 21:52:02 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C6781106564A
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 21:52:02 +0000 (UTC)
	(envelope-from cswiger@mac.com)
Received: from asmtpout025.mac.com (asmtpout025.mac.com [17.148.16.100])
	by mx1.freebsd.org (Postfix) with ESMTP id AFFAC8FC08
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 21:52:02 +0000 (UTC)
MIME-version: 1.0
Content-transfer-encoding: 7BIT
Content-type: text/plain; CHARSET=US-ASCII
Received: from [17.153.44.144] by asmtp025.mac.com
	(Oracle Communications Messaging Exchange Server 7u4-20.01 64bit (built
	Nov 21 2010)) with ESMTPSA id <0LQ7000ZO3DZCG70@asmtp025.mac.com> for
	freebsd-stable@freebsd.org; Fri, 19 Aug 2011 14:51:37 -0700 (PDT)
X-Proofpoint-Virus-Version: vendor=fsecure
	engine=2.50.10432:5.4.6813,1.0.211,0.0.0000
	definitions=2011-08-19_08:2011-08-19, 2011-08-19,
	1970-01-01 signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0
	ipscore=0 suspectscore=0 phishscore=0 bulkscore=0 adultscore=0
	classifier=spam
	adjust=0 reason=mlx engine=6.0.2-1012030000 definitions=main-1108190264
From: Chuck Swiger <cswiger@mac.com>
In-reply-to: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
Date: Fri, 19 Aug 2011 14:51:34 -0700
Message-id: <65474D95-F56F-4DC7-8029-BA7166C4E46F@mac.com>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
To: Dan Langille <dan@langille.org>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 21:52:02 -0000

On Aug 19, 2011, at 1:50 PM, Dan Langille wrote:
> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.
> 
>  http://smartmontools.sourceforge.net/badblockhowto.html
> 
> However, since ad2 is one half of a gmirror, I don't think this is the best approach.
> 
> Comments?

Reading the underlying failing drive with dd will help identify any other questionable sectors.  However, your drive temps are too high-- many vendors call out either 50C or 55C as the point where drive reliability becomes significantly degraded.

Regards,
-- 
-Chuck


From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 23:21:31 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F335F106564A
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 23:21:30 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.westchester.pa.mail.comcast.net
	(qmta04.westchester.pa.mail.comcast.net [76.96.62.40])
	by mx1.freebsd.org (Postfix) with ESMTP id A28F88FC08
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 23:21:30 +0000 (UTC)
Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71])
	by qmta04.westchester.pa.mail.comcast.net with comcast
	id NPJP1h0021YDfWL54PMWHa; Fri, 19 Aug 2011 23:21:30 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta20.westchester.pa.mail.comcast.net with comcast
	id NPMS1h0191t3BNj3gPMTnv; Fri, 19 Aug 2011 23:21:29 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 5054F102C1A; Fri, 19 Aug 2011 16:21:25 -0700 (PDT)
Date: Fri, 19 Aug 2011 16:21:25 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Dan Langille <dan@langille.org>
Message-ID: <20110819232125.GA4965@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 23:21:31 -0000

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> 
> After a recent power failure, I'm seeing this in my logs:
> 
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

I doubt this is related to a power failure.

> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.
> 
>   http://smartmontools.sourceforge.net/badblockhowto.html

This is incorrect (meaning you've misunderstood what's written there).

Unreadable LBAs can be a result of the LBA being actually bad (as in
uncorrectable), or the LBA being marked "suspect".  In either case the
LBA will return an I/O error when read.

If the LBAs are marked "suspect", the drive will perform re-analysis of
the LBA (to determine if the LBA can be read and the data re-mapped, or
if it cannot then the LBA is marked uncorrectable) when you **write** to
the LBA.

The above smartd output doesn't tell me much.  Providing actual SMART
attribute data (smartctl -a) for the drive would help.  The brand of the
drive, the firmware version, and the model all matter -- every drive
behaves a little differently.
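
For pulling just the sector counters out of that output, here is a minimal sketch. It assumes the usual ten-column attribute table printed by "smartctl -A" (raw value in the tenth column) and the common attribute IDs 5, 197 and 198; not every drive reports all three, and smart_sector_counts is a hypothetical helper name, not part of smartmontools.

```sh
#!/bin/sh
# Hypothetical helper: reads "smartctl -A" output on stdin and prints
# the name and raw value of the reallocated (5), pending (197) and
# offline-uncorrectable (198) attributes, when the drive reports them.
smart_sector_counts() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) { print $2, $10 }'
}

# Usage (device name taken from the log line quoted above):
#   smartctl -A /dev/ad2 | smart_sector_counts
```

The raw value is used rather than the normalised VALUE column because the smartd message counts sectors, and the raw field is where drives normally expose that count.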

Furthermore, if the LBA is re-analysed and determined to be
uncorrectable -- regardless of remapping -- this doesn't actually fix
I/O errors on a filesystem level.  The filesystem itself (and more often
than not in the data section of the file/inode, so things like fsck
can't work around this) can still contain references to the LBA which is
uncorrectable, and will still continue to return I/O errors when read.
There has to be a way to tell the filesystem, when formatted, "avoid use
of this LBA".  How UFS/FFS handles this is unknown to me.  I know of
badsect(8) but I don't know if this works.  "Transparent" remapping I
have never seen work except on SSDs.

If you want me to step you through the procedure of re-testing the LBAs
(assuming they're suspect and not uncorrectable) I can do so, just ask.
Finding the suspect LBAs can be done using a dd loop (I wrote a shell
script for this), or using "smartctl -t select,0-max /dev/XXX" and let
the drive's internal selective test see if it can find them.  From there
it's an issue of submitting a write request to the LBA and seeing what
happens (I do this via dd as well, but the parameters you pass it are
very specific, e.g. don't mix up/misunderstand seek vs. skip).
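
To illustrate the skip vs. seek point, here is a minimal sketch of both steps. This is not the script mentioned above; the function names are hypothetical, and a 512-byte sector size is assumed (matching the Sectorsize in the gmirror output earlier in the thread).

```sh
#!/bin/sh
# Hypothetical helpers; 512-byte sectors are an assumption.

# Read COUNT sectors starting at START, one sector per dd call, and
# print every LBA whose read fails. skip= offsets the INPUT, i.e. the
# disk being read.
scan_unreadable() {
    dev=$1
    lba=$2
    end=$(( $2 + $3 ))
    while [ "$lba" -lt "$end" ]; do
        dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null ||
            echo "$lba"
        lba=$(( lba + 1 ))
    done
}

# Overwrite exactly one sector with zeros. seek= offsets the OUTPUT,
# i.e. the disk being written. Using skip= here by mistake would only
# skip input from /dev/zero and overwrite LBA 0 instead.
# This DESTROYS whatever data was stored in that sector.
rewrite_lba() {
    dd if=/dev/zero of="$1" bs=512 seek="$2" count=1 conv=notrunc 2>/dev/null
}

# Usage (device and LBA values are placeholders):
#   scan_unreadable /dev/XXX 0 1000
#   rewrite_lba /dev/XXX 12345
```

Reading one sector per dd invocation is slow on a whole disk, but it means a failure maps to exactly one LBA. conv=notrunc only matters when trying this against an image file rather than a device.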

I've assisted with this time and time again for folks on forums with
varying success.

I've also found some models of drives which claim there are suspect LBAs
yet an internal surface scan passes with no issues (and these are drives
which I myself have; the only difference between my drives and the
individual's drive is firmware, which leads me to believe there's a
firmware bug on some drives in the field).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Fri Aug 19 23:53:50 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C22BB1065670;
	Fri, 19 Aug 2011 23:53:50 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 560328FC08;
	Fri, 19 Aug 2011 23:53:50 +0000 (UTC)
Received: by gwb15 with SMTP id 15so2335350gwb.13
	for <multiple recipients>; Fri, 19 Aug 2011 16:53:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=Xuj6PAUpGO/iake7/sudbaeNchPxrBxs8fUdliE4DYQ=;
	b=HC/Hu5uIrhD9LJE3SFIMZyz15FZ7mDSnf1mvDABpnP1CwxZXh2/2B/mf5xww+C5M8a
	cC2VrebwGsJcIep4JwD2wA5I3a+hmtg2OtUFOXcEA61SKgebD3Foe4DONj3LoHQ96rqM
	oT0M8wfcZ35IvJGIVGfo8hM7A58W6LOL0raeA=
MIME-Version: 1.0
Received: by 10.236.116.40 with SMTP id f28mr2295591yhh.60.1313798029787; Fri,
	19 Aug 2011 16:53:49 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.236.108.33 with HTTP; Fri, 19 Aug 2011 16:53:49 -0700 (PDT)
In-Reply-To: <4E4E5D49.4040502@sentex.net>
References: <20110818.091600.831954331552558249.hrs@allbsd.org>
	<CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
	<20110818025550.GA1971@libertas.local.camdensoftware.com>
	<20110819.092811.1087267565626420460.hrs@allbsd.org>
	<20110819003759.GC54831@libertas.local.camdensoftware.com>
	<4E4E5D49.4040502@sentex.net>
Date: Sat, 20 Aug 2011 01:53:49 +0200
X-Google-Sender-Auth: qEZlhSvUegqFgRJ99Kef9GaLXuU
Message-ID: <CAJ-FndDHmwa+=LNGgU+5MK2Xmtj8kWHB10JsoytkMGEtVgncYw@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Mike Tancsa <mike@sentex.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: kostikbel@gmail.com, Nick Esborn <nick@desert.net>,
	freebsd-stable@freebsd.org, avg@freebsd.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 23:53:50 -0000

If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours.

Attilio

2011/8/19 Mike Tancsa <mike@sentex.net>:
> On 8/18/2011 8:37 PM, Chip Camden wrote:
>
>>> st> Thanks, Attilio.  I've applied the patch and removed the extra debug
>>> st> options I had added (though keeping debug symbols).  I'll let you know if
>>> st> I experience any more panics.
>>>
>>>  No panic for 20 hours at this moment, FYI.  For my NFS server, I
>>>  think another 24 hours would be sufficient to confirm the stability.
>>>  I will see how it works...
>>>
>>> -- Hiroki
>>
>> Likewise:
>>
>> $ uptime
>>  5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63
>>
>> So far, so good (knocks on head).
>>
>
>
> 0(ns4)% uptime
>  8:55AM  up 22:39, 3 users, load averages: 0.01, 0.00, 0.00
> 0(ns4)%
>
>
> So far so good for me too
>
>        ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/
>


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 00:14:57 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2F203106566B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 00:14:57 +0000 (UTC)
	(envelope-from db@db.net)
Received: from diana.db.net (diana.db.net [66.113.102.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 1B9488FC0A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 00:14:56 +0000 (UTC)
Received: from night.db.net (localhost [127.0.0.1])
	by diana.db.net (Postfix) with ESMTP id 158F62282A;
	Fri, 19 Aug 2011 17:48:20 -0600 (MDT)
Received: by night.db.net (Postfix, from userid 1000)
	id 0FA996533; Fri, 19 Aug 2011 19:57:19 -0400 (EDT)
Date: Fri, 19 Aug 2011 19:57:19 -0400
From: Diane Bruce <db@db.net>
To: Dan Langille <dan@langille.org>
Message-ID: <20110819235719.GA64220@night.db.net>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 00:14:57 -0000

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> 
> After a recent power failure, I'm seeing this in my logs:
> 
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> 

Personally, I'd replace that drive now. 

> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.

No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
block. This could buy you a few more days or a few more weeks. Personally,
I would not wait. Your call.
 
> Comments?
...
> Dan Langille - http://langille.org

- Diane
-- 
- db@FreeBSD.org db@db.net http://www.db.net/~db
  Why leave money to our children if we don't leave them the Earth?

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 00:51:03 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 56142106566B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 00:51:03 +0000 (UTC)
	(envelope-from kob6558@gmail.com)
Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 1912F8FC18
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 00:51:02 +0000 (UTC)
Received: by gwb15 with SMTP id 15so2352159gwb.13
	for <freebsd-stable@freebsd.org>; Fri, 19 Aug 2011 17:51:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=mthNhNDly5UeuehxGgrRZJoGGvFlzBzKGLh3/jP2/RU=;
	b=P2lh2vxgDkC8yPuWt2HSEIrPrLYgHu6CYUMTIi9OS1p+mhxx736x9V//SlygrMtUc7
	O6ni74t7P7dnbjWwRVc8hAwzDmvk8bBcXBCiGo/rwm8fKEr7cS1JCpEzjLnkhWWaUGHK
	GdIjR3M9A6c51UVO6SdlarDDTNqlK1iPZzGu8=
MIME-Version: 1.0
Received: by 10.150.236.9 with SMTP id j9mr36820ybh.167.1313801462341; Fri, 19
	Aug 2011 17:51:02 -0700 (PDT)
Received: by 10.151.98.3 with HTTP; Fri, 19 Aug 2011 17:51:02 -0700 (PDT)
In-Reply-To: <20110819235719.GA64220@night.db.net>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819235719.GA64220@night.db.net>
Date: Fri, 19 Aug 2011 17:51:02 -0700
Message-ID: <CAN6yY1vitKEiry1SGUv4gCe69mvXoqFOTYZn299cFKw+G1VS4g@mail.gmail.com>
From: Kevin Oberman <kob6558@gmail.com>
To: Dan Langille <dan@langille.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 00:51:03 -0000

On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce <db@db.net> wrote:
> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
>>
>> After a recent power failure, I'm seeing this in my logs:
>>
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
>>
>
> Personally, I'd replace that drive now.
>
>> Searching on that error message, I was led to believe that identifying the bad sector and
>> running dd to read it would cause the HDD to reallocate that bad block.
>
> No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> block. This could buy you a few more days or a few more weeks. Personally,
> I would not wait. Your call.
>

While I largely agree, it depends on several factors as to whether I'd
replace the drive.

First, what does SMART show other than these errors?  If the reported
statistics look generally good, and considering that you have a mirror
with one "good" copy of the blocks in question, the impact is zero unless
the other drive fails. That is why the blocks need to be re-written so
that they will be re-located on the drive.

Second, how critical is the data? The mirror gives good integrity, but
you also need good backups. If the data MUST be on-line with high
reliability, buy a replacement drive. You need to look at cost-benefit
(or really the cost of replacement vs. cost of failure).

It's worth mentioning that all drives have bad blocks. Most are hard
bad blocks and are re-mapped before the drive is shipped, but marginal
bad blocks can and do slip through to customers and it is entirely
possible that the drive is just fine for the most part and replacing
it is really a waste of money.

Only you can make the call, but if further bad blocks show up in the
near term, I'll go along with recommending replacement.

-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6558@gmail.com

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 01:14:08 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2EAD31065670
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 01:14:08 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta12.westchester.pa.mail.comcast.net
	(qmta12.westchester.pa.mail.comcast.net [76.96.59.227])
	by mx1.freebsd.org (Postfix) with ESMTP id CFDC28FC13
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 01:14:07 +0000 (UTC)
Received: from omta01.westchester.pa.mail.comcast.net ([76.96.62.11])
	by qmta12.westchester.pa.mail.comcast.net with comcast
	id NRCw1h0020EZKEL5CRE8ze; Sat, 20 Aug 2011 01:14:08 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta01.westchester.pa.mail.comcast.net with comcast
	id NRE61h0191t3BNj3MRE70T; Sat, 20 Aug 2011 01:14:08 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 4D774102C1A; Fri, 19 Aug 2011 18:14:05 -0700 (PDT)
Date: Fri, 19 Aug 2011 18:14:05 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Kevin Oberman <kob6558@gmail.com>
Message-ID: <20110820011405.GA20330@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819235719.GA64220@night.db.net>
	<CAN6yY1vitKEiry1SGUv4gCe69mvXoqFOTYZn299cFKw+G1VS4g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAN6yY1vitKEiry1SGUv4gCe69mvXoqFOTYZn299cFKw+G1VS4g@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org, Dan Langille <dan@langille.org>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 01:14:08 -0000

On Fri, Aug 19, 2011 at 05:51:02PM -0700, Kevin Oberman wrote:
> On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce <db@db.net> wrote:
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> >>
> >> After a recent power failure, I'm seeing this in my logs:
> >>
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> >>
> >
> > Personally, I'd replace that drive now.
> >
> >> Searching on that error message, I was led to believe that identifying the bad sector and
> >> running dd to read it would cause the HDD to reallocate that bad block.
> >
> > No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> > block. This could buy you a few more days or a few more weeks. Personally,
> > I would not wait. Your call.
> >
> 
> While I largely agree, it depends on several factors as to whether I'd
> replace the drive.
> 
> First, what does SMART show other than these errors?  If the reported
> statistics look generally good, and considering that you have a mirror with
> one "good" copy of the blocks in question, the impact is zero unless
> the other drive fails. That is why the blocks need to be re-written so
> that they will be re-located on the drive.
> 
> Second, how critical is the data? The mirror gives good integrity, but
> you also need good backups. If the data MUST be on-line with high
> reliability, buy a replacement drive. You need to look at cost-benefit
> (or really the cost of replacement vs. cost of failure).
> 
> It's worth mentioning that all drives have bad blocks. Most are hard
> bad blocks and are re-mapped before the drive is shipped, but marginal
> bad blocks can and do slip through to customers and it is entirely
> possible that the drive is just fine for the most part and replacing
> it is really a waste of money.
>
> Only you can make the call, but if further bad blocks show up in the
> near term, I'll go along with recommending replacement.

I can expand a bit on this.

With ATA/SATA and SCSI disks, there's a factory default list of LBAs
which are bad (referred to as the "physical defect list").  Everyone by
now is familiar with this.

With SCSI disks there's "grown defects", which is a drive-managed AND
user-managed list of LBAs which are considered bad.  Whether these LBAs
were correctable (remapped) or not is tracked by SMART on SCSI.  I can
provide many examples of this if people want to see what it looks like
(we have quite a collection of Fujitsu disks at my workplace.  They're
one of a few vendors I more or less boycott).

With SCSI, you can clear the grown defect list with ease.  Some drives
support clearing the physical defect list too, but doing that requires a
*true* low-level format to be done afterward.  In the case you issue a
SCSI FORMAT command, any grown defects (as the drive encounters them)
will be "merged" with the physical defect list.  When the FORMAT is
done, the drive will report 0 grown defects.  Again, I can confirm this
exact behaviour with our Fujitsu disks at my workplace; it's easy to get
a list of the physical and grown defects with SCSI.

With ATA/SATA disks it's a different story:

It seems to vary from vendor to vendor and model to model.  The established
theory is that the drive has a list of spare LBAs for remappings, which
is managed entirely by the drive itself -- and not reported back to the
user via SMART or any other means.  This happens entirely without user
intervention, and (on repetitive errors) might show up as the drive
stalling on some I/O or other oddities.  These situations are not
reported back to the OS either -- it's entirely 100% transparent to the
user.

When an ATA/SATA disk begins reporting errors back via SMART, or to the
OS (e.g. I/O error), on certain LBA accesses, then the theory is that
the spare LBA list used by the drive internally has been exhausted, and
it will begin using a different spare list (or an extension of the
existing spares; I'm not sure).

What Diane's getting at (Hi Diane!) is that since the drive is already
at the stage/point of reporting errors back to the OS and SMART, it
means the drive has experienced problems (which it worked around) prior
to this point in time.  Hence her recommendation to replace the drive.

What I still have a bit of trouble stomaching these days is whether or
not the above theories are still used *today* in practise on SATA disks.
Part of me is inclined to believe that **any** errors are reported to
SMART and the OS, and the remapping is reported via SMART, etc.; e.g.
there's no more "transparent" anything.  The problem is that I don't
have a good way to confirm/deny this.

Oh what I'd give for good engineering contacts within Western Digital
and Seagate...

These days, I replace drives depending upon their age (Power_On_Hours)
combined with how many errors are seen and what kind of errors.  For
example, if I have a drive that's been in operation for 20,000 hours and
it now has 2 bad LBAs, I can accept that.  If I have a drive that's been
in operation for 48 hours and it has 30 errors, that drive is getting
RMA'd.
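
One way to put those two cases on the same scale is errors per 1000 power-on hours. The sketch below only does that arithmetic; errors_per_khour is a hypothetical name, and where to draw the line between "acceptable" and "RMA" is deliberately left out, since no threshold is given above.

```sh
#!/bin/sh
# Hypothetical helper: errors per 1000 power-on hours.
#   $1 = number of errors / bad LBAs
#   $2 = Power_On_Hours (must be greater than zero)
errors_per_khour() {
    awk -v e="$1" -v h="$2" 'BEGIN {
        if (h <= 0) exit 1          # avoid dividing by zero
        printf "%.1f\n", e * 1000 / h
    }'
}

# The two examples from the text:
#   errors_per_khour 2 20000
#   errors_per_khour 30 48
```

The first example comes out at a small fraction of an error per 1000 hours and the second at several hundred, which is the gap the rule of thumb is reacting to.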

When I get new or RMA'd/refurbished drives, I test them before putting
them to use.  I do a read-only surface scan using SMART ("smartctl -t
select,0-max /dev/XXX") and let that finish.  Assuming no errors are
shown in the selective scan log, I then proceed with a full disk zero
("dd if=/dev/zero of=/dev/XXX bs=64k").  When finished I check SMART for
any errors.  If there are any, I RMA the drive -- or if it's been RMA'd
already, I get angry at the vendor.  :-)
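
The test sequence above can be written down as a short shell sketch.
The function name burn_in_plan is made up for illustration, and it only
prints the commands instead of running them, because the dd step wipes
the disk.  The "smartctl -l selective" step is an assumption about how
one would view the selective scan log; the other commands are the ones
quoted above.

```shell
#!/bin/sh
# Print (do not run) the burn-in steps for a disk, in order.
# Nothing is executed here because step 3 destroys all data on the disk.
# usage: burn_in_plan /dev/XXX
burn_in_plan() {
    disk="$1"
    # 1. Read-only selective self-test across the whole LBA range.
    echo "smartctl -t select,0-max $disk"
    # 2. After the drive finishes, review the selective self-test log.
    echo "smartctl -l selective $disk"
    # 3. Zero the entire disk (destructive).
    echo "dd if=/dev/zero of=$disk bs=64k"
    # 4. Check SMART attributes and logs for any new errors.
    echo "smartctl -a $disk"
}
```

Run each printed command by hand, waiting for the self-test in step 1
to complete before moving on.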

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 01:39:22 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 06479106566B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 01:39:22 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id CD7218FC0A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 01:39:21 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id 17AB250A09;
	Sat, 20 Aug 2011 01:39:21 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id dptRzAcxz9Zj; Sat, 20 Aug 2011 02:39:20 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 9E15A50A06  ;
	Sat, 20 Aug 2011 01:39:20 +0000 (UTC)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Dan Langille <dan@langille.org>
In-Reply-To: <20110819232125.GA4965@icarus.home.lan>
Date: Fri, 19 Aug 2011 21:39:17 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 01:39:22 -0000


On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:

> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
>>
>> After a recent power failure, I'm seeing this in my logs:
>>
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
>
> I doubt this is related to a power failure.
>
>> Searching on that error message, I was led to believe that identifying the bad sector and
>> running dd to read it would cause the HDD to reallocate that bad block.
>>
>>  http://smartmontools.sourceforge.net/badblockhowto.html
>
> This is incorrect (meaning you've misunderstood what's written there).
>
> Unreadable LBAs can be a result of the LBA being actually bad (as in
> uncorrectable), or the LBA being marked "suspect".  In either case the
> LBA will return an I/O error when read.
>
> If the LBAs are marked "suspect", the drive will perform re-analysis of
> the LBA (to determine if the LBA can be read and the data re-mapped, or
> if it cannot then the LBA is marked uncorrectable) when you **write** to
> the LBA.
>
> The above smartd output doesn't tell me much.  Providing actual SMART
> attribute data (smartctl -a) for the drive would help.  The brand of the
> drive, the firmware version, and the model all matter -- every drive
> behaves a little differently.

Information such as this?  http://beta.freebsddiary.org/smart-fixing-bad-sector.php


-- 
Dan Langille - http://langille.org
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 01:53:10 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0467D1065670;
	Sat, 20 Aug 2011 01:53:10 +0000 (UTC) (envelope-from hrs@FreeBSD.org)
Received: from mail.allbsd.org (gatekeeper-int.allbsd.org
	[IPv6:2001:2f0:104:e002::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 819D48FC0A;
	Sat, 20 Aug 2011 01:53:07 +0000 (UTC)
Received: from alph.allbsd.org ([IPv6:2001:2f0:104:e010:862b:2bff:febc:8956])
	(authenticated bits=128)
	by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p7K1qlhV028971;
	Sat, 20 Aug 2011 10:52:57 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0)
	by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p7K1qivo090479;
	Sat, 20 Aug 2011 10:52:45 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Date: Sat, 20 Aug 2011 10:52:29 +0900 (JST)
Message-Id: <20110820.105229.834911491934932780.hrs@allbsd.org>
To: attilio@FreeBSD.org
From: Hiroki Sato <hrs@FreeBSD.org>
In-Reply-To: <CAJ-FndDHmwa+=LNGgU+5MK2Xmtj8kWHB10JsoytkMGEtVgncYw@mail.gmail.com>
References: <20110819003759.GC54831@libertas.local.camdensoftware.com>
	<4E4E5D49.4040502@sentex.net>
	<CAJ-FndDHmwa+=LNGgU+5MK2Xmtj8kWHB10JsoytkMGEtVgncYw@mail.gmail.com>
X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530  FFD7 4F2C D3D8 2793 CF2D
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Multipart/Signed; protocol="application/pgp-signature";
	micalg=pgp-sha1;
	boundary="--Security_Multipart(Sat_Aug_20_10_52_29_2011_674)--"
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.3
	(mail.allbsd.org [IPv6:2001:2f0:104:e001::32]);
	Sat, 20 Aug 2011 10:52:57 +0900 (JST)
X-Spam-Status: No, score=-104.6 required=13.0 tests=BAYES_00,
	CONTENT_TYPE_PRESENT, RDNS_NONE, SPF_SOFTFAIL,
	USER_IN_WHITELIST autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	gatekeeper.allbsd.org
Cc: kostikbel@gmail.com, nick@desert.net, freebsd-stable@FreeBSD.org,
	avg@FreeBSD.org
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 01:53:10 -0000

----Security_Multipart(Sat_Aug_20_10_52_29_2011_674)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Attilio Rao <attilio@freebsd.org> wrote
  in <CAJ-FndDHmwa+=LNGgU+5MK2Xmtj8kWHB10JsoytkMGEtVgncYw@mail.gmail.com>:

at> If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours.

 Running fine for 45 hours so far.  Please go ahead!

-- Hiroki

----Security_Multipart(Sat_Aug_20_10_52_29_2011_674)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEABECAAYFAk5PE10ACgkQTyzT2CeTzy3lWwCfUKrro8MGV4zpxKks9mpTEPZS
OfsAoNeFETyjH+4n+IJZdwwF5ITdjNHB
=JoJG
-----END PGP SIGNATURE-----

----Security_Multipart(Sat_Aug_20_10_52_29_2011_674)----

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 03:24:41 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F8D0106566B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 03:24:41 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta05.emeryville.ca.mail.comcast.net
	(qmta05.emeryville.ca.mail.comcast.net [76.96.30.48])
	by mx1.freebsd.org (Postfix) with ESMTP id 350BC8FC08
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 03:24:40 +0000 (UTC)
Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44])
	by qmta05.emeryville.ca.mail.comcast.net with comcast
	id NTLZ1h0010x6nqcA5TQc9p; Sat, 20 Aug 2011 03:24:36 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta12.emeryville.ca.mail.comcast.net with comcast
	id NTQZ1h01A1t3BNj8YTQaet; Sat, 20 Aug 2011 03:24:34 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 8ACEF102C1A; Fri, 19 Aug 2011 20:24:38 -0700 (PDT)
Date: Fri, 19 Aug 2011 20:24:38 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Dan Langille <dan@langille.org>
Message-ID: <20110820032438.GA21925@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 03:24:41 -0000

On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
> 
> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
> 
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> >> 
> >> After a recent power failure, I'm seeing this in my logs:
> >> 
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> > 
> > I doubt this is related to a power failure.
> > 
> >> Searching on that error message, I was led to believe that identifying the bad sector and
> >> running dd to read it would cause the HDD to reallocate that bad block.
> >> 
> >>  http://smartmontools.sourceforge.net/badblockhowto.html
> > 
> > This is incorrect (meaning you've misunderstood what's written there).
> > 
> > Unreadable LBAs can be a result of the LBA being actually bad (as in
> > uncorrectable), or the LBA being marked "suspect".  In either case the
> > LBA will return an I/O error when read.
> > 
> > If the LBAs are marked "suspect", the drive will perform re-analysis of
> > the LBA (to determine if the LBA can be read and the data re-mapped, or
> > if it cannot then the LBA is marked uncorrectable) when you **write** to
> > the LBA.
> > 
> > The above smartd output doesn't tell me much.  Providing actual SMART
> > attribute data (smartctl -a) for the drive would help.  The brand of the
> > drive, the firmware version, and the model all matter -- every drive
> > behaves a little differently.
> 
> Information such as this?  http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Yes, perfect.  Thank you.  First thing first: upgrade smartmontools to
5.41.  Your attributes will be the same after you do this (the drive is
already in smartmontools' internal drive DB), but I often have to remind
people that they really need to keep smartmontools updated as often as
possible.  The changes between versions are vast; this is especially
important for people with SSDs (I'm responsible for submitting some
recent improvements for Intel 320 and 510 SSDs).

Anyway, the drive (albeit an old PATA Maxtor) appears to have three
anomalies:

1) One confirmed reallocated LBA (SMART attribute 5)

2) One "suspect" LBA (SMART attribute 197)

3) A very high temperature of 51C (SMART attribute 194).  If this drive
is in an enclosure or in a system with no fans this would be
understandable, otherwise this is a bit high.  My home workstation which
has only one case fan has a drive with more platters than your Maxtor,
and it idles at ~38C.  Possibly this drive has been undergoing constant
I/O recently (which does greatly increase drive temperature)?  Not sure.
I'm not going to focus too much on this one.

The SMART error log also indicates an LBA failure at the 26000 hour mark
(which is 16 hours prior to when you did smartctl -a /dev/ad2).  Whether
that LBA is the remapped one or the suspect one is unknown.  The LBA was
5566440.
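
For anyone who needs to relate that LBA to a byte offset on the disk,
the arithmetic is a one-liner.  This is a small sketch that assumes
512-byte logical sectors (true for a drive of this vintage, not
necessarily for newer ones); the function names are made up for
illustration.

```shell
#!/bin/sh
# Convert between LBA numbers and byte offsets, assuming 512-byte sectors.
lba_to_offset() {
    echo $(( $1 * 512 ))
}

offset_to_lba() {
    # Integer division: any byte inside a sector maps to that sector's LBA.
    echo $(( $1 / 512 ))
}
```

For example, "lba_to_offset 5566440" prints 2850017280, i.e. the failing
LBA starts roughly 2.65 GiB into the disk.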

The SMART tests you did didn't really amount to anything; no surprise.
Short and long tests usually do not test the surface of the disk.  There
are some drives which do it on a long test, but as I said before,
everything varies from drive to drive.

Furthermore, on this model of drive, you cannot do a surface scan via
SMART.  Bummer.  That's indicated in the "Offline data collection
capabilities" section at the top, where it reads:

	No Selective Self-test supported.

So you'll have to use the dd method.  This takes longer than if surface
scanning was supported by the drive, but is acceptable.  I'll get to how
to go about that in a moment.

The reallocated LBA cannot be dealt with aside from re-creating the
filesystem and telling it not to use the LBA.  I see no flags in
newfs(8) that indicate a way to specify LBAs to avoid.  And we don't
know what LBA it is so we can't refer to it right now anyway.

As I said previously, I have no idea how UFS/FFS deals with this.  Using
fsck(8) is not sufficient; fsck does not attempt reading every LBA on
the disk or every LBA that makes up the data portions of an inode.  It
only examines the "structure" of the filesystem.  Is it possible the
remapped LBA lived within a structure region and not data?  Yes.  Is it
likely?  Given the size of the disk, probably not.

As mentioned previously too, there's badsect(8) but I don't know if it
works correctly on present-day FreeBSD, if it works with larger drives,
on 64-bit, etc...  You get the idea.  Plus as I said I don't know what
LBA to tell it to avoid.  You also need to keep something in mind: the
terms "sector" and "LBA" are in some ways interchangeable and in other
ways aren't.  I use the term LBA because nobody in their right mind uses
CHS addressing any more.  badsect(8) claims it wants sectors, which I
want to assume are LBAs.

I hope someone familiar with UFS/FFS can explain how to go about this
process for UFS/FFS.

As for ZFS (because I know someone will ask) -- AFAIK there is no
mechanism to deal with excluding certain LBAs from use.  The attitude is
that disks are cheap, if you see errors replace the disk.  I agree with
this attitude.  You can "deal with" the error with ZFS if the pool
consists of a mirror or raidzN, but you'll never be able to rid yourself
of seeing R/W/CKSUM errors or possible I/O timeouts when accessing those
LBAs.  That's just how it goes.

Anyway -- as for the "suspect" LBA -- we can absolutely determine what
this one is and submit a write request to it to see if it turns out to
be bad (uncorrectable) or if it's remappable.  If remapped, see above
explanation.

Below is a script I wrote for scanning disks with dd.  See script
comments for how to use it.  Quite simple.  Things to note about the
script because I'm 100% certain people will get all spun up about it:

1) It assumes 512-byte LBAs.  Using this on an SSD or a 4KB-sector drive
is probably not wise.

2) It's slow ("unintelligent").  This is by choice -- I wanted to keep
it simple.  It reads 512 bytes at a time, rather than reading larger
chunks (e.g. 64k) and then "working down" to a smaller size when it
encounters a read error to determine what LBA is responsible.  I wanted
something that "just worked" and wasn't fancy.  There may be alternate
utilities out there which do this (dd_rescue?).

3) I needed something that worked on Solaris and FreeBSD regardless of
disk type.  We use PATA, SATA, and SCSI disks at my workplace, and
smartmontools really needs an overhaul for SATA on Solaris; so, shell
scripting for the win.

4) I needed something that didn't depend on third-party tools I had to
compile or deal with (see #5 though).

5) The hashbang refers to bash, though there aren't "bash-isms" in the
script.  The reason for this is Solaris; /bin/sh there is a non-evolved
travesty that I loathe, so I write everything using /usr/local/bin/bash.
You could, on FreeBSD, change this to /bin/sh and it should just work.

That said:

http://jdc.parodius.com/freebsd/bad_block_scan

If you run this on your ad2 drive, I'm hoping what you'll find are two
LBAs which can't be read -- one will be the remapped LBA and one will be
the "suspect" LBA.  If you only get one LBA error then that's fine too,
and will be the "suspect" LBA.
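
For readers who cannot fetch the script, the general idea of a
one-sector-at-a-time read scan looks roughly like the sketch below.
This is not the bad_block_scan script itself, only a minimal
illustration of the approach described above; the function name and
arguments are made up, and it assumes 512-byte LBAs.

```shell
#!/bin/sh
# Read LBAs one at a time and print the number of each LBA that dd
# fails to read.  Slow by design: one dd invocation per sector.
# usage: scan_lbas DEVICE START_LBA COUNT
scan_lbas() {
    dev="$1"
    lba="$2"
    end=$(( $2 + $3 ))
    while [ "$lba" -lt "$end" ]; do
        # Read-only: the data goes to /dev/null, only the exit status matters.
        if ! dd if="$dev" of=/dev/null bs=512 count=1 skip="$lba" 2>/dev/null; then
            echo "$lba"
        fi
        lba=$(( lba + 1 ))
    done
}
```

Something like "scan_lbas /dev/ad2 5566000 1000" would check a window
around the LBA from the SMART error log; a full-disk pass just uses 0
and the total sector count.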

Once you have the LBA(s), you can submit writes to them to get the drive
to re-analyse them (assuming they're "suspect"):

dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=NNNNN

Where XXX is the device and NNNNN is the LBA number.

If this works properly, the dd command should sit there for a little bit
(as the drive does its re-analysis magic) and then should complete.

You'll want to check SMART stats after that; you should see
Current_Pending_Sector drop to 0.  If Offline_Uncorrectable incremented
then the LBA could not be re-read/remapped.  If Reallocated_Sector_Ct
incremented then you now have a total of 2 LBAs which are remapped.  In
the case of remapping, you get to deal with the UFS/FFS thing above.
To get the stats to update in this situation you *might* (but probably
not) have to run "smartctl -t offline /dev/XXX".
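
To compare those counters before and after the write without eyeballing
the whole table, the raw values can be pulled out of "smartctl -A"
output.  A minimal sketch, assuming the usual ten-column attribute table
where the attribute name is the second column and RAW_VALUE is the
tenth; the function name smart_raw is made up.

```shell
#!/bin/sh
# Print the RAW_VALUE column for one named attribute.
# Reads `smartctl -A` output on stdin.
# usage: smartctl -A /dev/XXX | smart_raw Current_Pending_Sector
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

Record Current_Pending_Sector, Offline_Uncorrectable and
Reallocated_Sector_Ct before the dd write, then again afterwards, and
compare the three pairs of numbers.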

You might also be wondering "that dd command writes 512 bytes of zero to
that LBA; what about the old data that was there, in the case that the
drive remaps the LBA?"  This is a great question, and one I've never
actually taken the time to answer because at this present time I have
absolutely *no* bad disks in my possession.  I'm under the impression
that the write does in fact write zeros if the LBA is remapped, but that
might not be true at all.  I've been waiting to test this for quite some
time and document it/write about it.

I still suggest you replace the drive, although given its age I doubt
you'll be able to find a suitable replacement.  I tend to keep disks
like this around for testing/experimental purposes and not for actual
use.

Good luck!

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 03:47:38 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A1C7C106564A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 03:47:38 +0000 (UTC)
	(envelope-from wblock@wonkity.com)
Received: from wonkity.com (wonkity.com [67.158.26.137])
	by mx1.freebsd.org (Postfix) with ESMTP id 5E0ED8FC13
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 03:47:38 +0000 (UTC)
Received: from wonkity.com (localhost [127.0.0.1])
	by wonkity.com (8.14.5/8.14.5) with ESMTP id p7K3lbQ3082870;
	Fri, 19 Aug 2011 21:47:37 -0600 (MDT)
	(envelope-from wblock@wonkity.com)
Received: from localhost (wblock@localhost)
	by wonkity.com (8.14.5/8.14.5/Submit) with ESMTP id p7K3lbws082867;
	Fri, 19 Aug 2011 21:47:37 -0600 (MDT)
	(envelope-from wblock@wonkity.com)
Date: Fri, 19 Aug 2011 21:47:37 -0600 (MDT)
From: Warren Block <wblock@wonkity.com>
To: Chuck Swiger <cswiger@mac.com>
In-Reply-To: <65474D95-F56F-4DC7-8029-BA7166C4E46F@mac.com>
Message-ID: <alpine.BSF.2.00.1108192143160.82697@wonkity.com>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<65474D95-F56F-4DC7-8029-BA7166C4E46F@mac.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
	(wonkity.com [127.0.0.1]); Fri, 19 Aug 2011 21:47:37 -0600 (MDT)
Cc: freebsd-stable@freebsd.org, Dan Langille <dan@langille.org>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 03:47:38 -0000

On Fri, 19 Aug 2011, Chuck Swiger wrote:

> Reading the underlying failing drive with dd will help identify any 
> other questionable sectors.  However, your drive temps are too high-- 
> many vendors call out either 50C or 55C as the point where drive 
> reliability becomes significantly degraded.

The high temperature could be due to impending drive failure.  I've seen 
that exact situation with a failing WD notebook drive.  Lots of read 
failures, and it got very hot.  The same model replacement drive ran 
normally, just warm.

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 06:43:38 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7A530106566C
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 06:43:38 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 0501A8FC13
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 06:43:37 +0000 (UTC)
Received: from digsys236-136.pip.digsys.bg (digsys236-136.pip.digsys.bg
	[193.68.136.236]) (authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7K6hO2n009616
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Sat, 20 Aug 2011 09:43:30 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Mime-Version: 1.0 (Apple Message framework v1244.3)
Content-Type: text/plain; charset=us-ascii
From: Daniel Kalchev <daniel@digsys.bg>
In-Reply-To: <20110820032438.GA21925@icarus.home.lan>
Date: Sat, 20 Aug 2011 09:43:23 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <65623662-0232-4599-B633-6D207A4CF15A@digsys.bg>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
X-Mailer: Apple Mail (2.1244.3)
Cc: freebsd-stable@freebsd.org, Dan Langille <dan@langille.org>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 06:43:38 -0000


On Aug 20, 2011, at 06:24 , Jeremy Chadwick wrote:

> You might also be wondering "that dd command writes 512 bytes of zero to
> that LBA; what about the old data that was there, in the case that the
> drive remaps the LBA?"

If you write zeros at OS level to an LBA, you will end up with zeros at
that LBA. What else did you expect???

The already remapped LBAs in ATA are not visible anymore to the user/OS.
You get a perfectly readable sector. Of course not at the original
location, but as you confirmed we are done with CHS addressing.

The pending bad sectors are almost always 'corrected', that is, remapped
when you write to that LBA.

So your script will find only one unreadable sector and that will be the
sector that is pending reallocation.

It may be that writing zeros to all free space, like

dd if=/dev/zero of=/filesystem/zero bs=1m; rm /filesystem/zero

is enough to remap the pending bad block and not have any unreadable
sectors. But if the unreadable sector is in a file or directory -- bad
luck -- these will need to be rewritten.

Once upon a time, BSD/OS had a wonderful disk 'repair' utility. It could
detect failing disks by reading every sector (had nice visual), or could
re-write the drive by reading and writing back every sector. On bad
blocks it would retry lots of times and eventually average what was read
(with error).
Having said that, I doubt modern ATA drives will let anything be read
from the pending bad block, but.. who knows.

Daniel


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 10:02:32 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 03FB9106564A;
	Sat, 20 Aug 2011 10:02:32 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 157338FC08;
	Sat, 20 Aug 2011 10:02:30 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA10374;
	Sat, 20 Aug 2011 13:02:27 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QuiNn-000NrY-IE; Sat, 20 Aug 2011 13:02:27 +0300
Message-ID: <4E4F8631.1070300@FreeBSD.org>
Date: Sat, 20 Aug 2011 13:02:25 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110819 Thunderbird/6.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
	<4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
In-Reply-To: <4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 10:02:32 -0000

on 18/08/2011 02:15 Steven Hartland said the following:
> In a nutshell the jail manager we're using will attempt to resurrect the jail
> from a dying state in a few specific scenarios.
> 
> Here's an example:-
> 1. jail restart requested
> 2. jail is stopped, so the java process is killed off, but active tcp sessions
> may prevent the timely full shutdown of the jail.
> 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
> starting a new jail we attach to the old one and exec the new java process.
> 4. if an existing jail isn't detected, i.e. where there were no hanging tcp
> sessions and #2 cleanly shutdown the jail, a new jail is created, attached to
> and the java exec'ed.
> 
> The system uses static jailids so it's possible to determine if an existing
> jail for this "service" exists or not. This prevents duplicate services as
> well as making services easy to identify by their jailid.
> 
> So what we could be seeing is a race between the jail shutdown and the attach
> of the new process?

Not a jail expert at all, but a few suggestions...

First, wouldn't the 'persist' jail option simplify your life a little bit?

Second, you may want to try to monitor the value of the prison0.pr_uref
variable (e.g. via kgdb) while executing the various scenarios of what you do
now.  If after finishing a certain scenario you end up with a value lower than
at the start of that scenario, then this is the troublesome one.
Please note that prison0.pr_uref is the sum of the number of non-jailed
processes and the number of top-level jails.  So take this into account when
comparing prison0.pr_uref values - it's better to record the initial value when
no jails are started and it's important to keep the number of non-jailed
processes the same (or to account for its changes).
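
The bookkeeping for that comparison can be sketched as a tiny shell
helper.  It assumes the accounting described above is exact (pr_uref =
non-jailed processes + top-level jails); the function name is made up,
and the three numbers are collected by hand (pr_uref from kgdb, the
other two from whatever process and jail listing you trust).

```shell
#!/bin/sh
# Difference between an observed prison0.pr_uref and the value expected
# from the number of non-jailed processes plus top-level jails.
# usage: uref_drift OBSERVED_UREF NONJAILED_PROCS TOPLEVEL_JAILS
uref_drift() {
    echo $(( $1 - ($2 + $3) ))
}
```

Take one reading before a scenario and one after it.  If the drift
drops between the two readings, that scenario is the one losing a
reference.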

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 10:10:55 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9C3F0106564A;
	Sat, 20 Aug 2011 10:10:55 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 91EF98FC17;
	Sat, 20 Aug 2011 10:10:44 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA10423;
	Sat, 20 Aug 2011 13:10:42 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QuiVl-000Nrj-VF; Sat, 20 Aug 2011 13:10:41 +0300
Message-ID: <4E4F8821.80108@FreeBSD.org>
Date: Sat, 20 Aug 2011 13:10:41 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110819 Thunderbird/6.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
	<4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
	<4E4F8631.1070300@FreeBSD.org>
In-Reply-To: <4E4F8631.1070300@FreeBSD.org>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 10:10:55 -0000

on 20/08/2011 13:02 Andriy Gapon said the following:
> on 18/08/2011 02:15 Steven Hartland said the following:
>> In a nutshell the jail manager we're using will attempt to resurrect the jail
>> from a dying state in a few specific scenarios.
>>
>> Here's an example:-
>> 1. jail restart requested
>> 2. jail is stopped, so the java process is killed off, but active tcp sessions
>> may prevent the timely full shutdown of the jail.
>> 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
>> starting a new jail we attach to the old one and exec the new java process.
>> 4. if an existing jail isn't detected, i.e. where there were no hanging tcp
>> sessions and #2 cleanly shut down the jail, a new jail is created, attached to
>> and the java process exec'ed.
>>
>> The system uses static jailids, so it's possible to determine if an existing
>> jail for this "service" exists or not. This prevents duplicate services as
>> well as making services easy to identify by their jailid.
>>
>> So what we could be seeing is a race between the jail shutdown and the attach
>> of the new process?
> 
> Not a jail expert at all, but a few suggestions...
> 
> First, wouldn't the 'persist' jail option simplify your life a little bit?
> 
> Second, you may want to monitor the value of the prison0.pr_uref variable (e.g.
> via kgdb) while executing the various scenarios of what you do now.  If, after
> finishing a certain scenario, you end up with a value lower than at the start of
> that scenario, then this is the troublesome one.
> Please note that prison0.pr_uref is the sum of the number of non-jailed
> processes and the number of top-level jails.  So take this into account when
> comparing prison0.pr_uref values: it's better to record the initial value when
> no jails are started, and it's important to keep the number of non-jailed
> processes the same (or to account for changes in it).

BTW, I suspect the following scenario, but I am not able to verify it either via
testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner, just kill enough non-jailed processes.
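
To make the suspected sequence concrete, here is a toy model in Python. It is
not the kernel code and has not been checked against it: the Prison class, its
methods and restart_cycles() are hypothetical names, and the model simply
encodes the steps listed above, including the assumption that attaching to a
dying jail does not give prison0 its reference back.

```python
class Prison:
    """Toy stand-in for a jail that tracks only a user reference count."""

    def __init__(self, uref=0, parent=None):
        self.uref = uref
        self.parent = parent

    def attach(self):
        # Suspected bug: reviving a jail whose count hit zero does not
        # restore the reference the parent lost when the jail started dying.
        self.uref += 1

    def exit_process(self):
        self.uref -= 1
        if self.uref == 0 and self.parent is not None:
            # Last process gone: the dying jail drops its parent reference.
            self.parent.uref -= 1


def restart_cycles(cycles, non_jailed_procs):
    """Return prison0's count after the given number of resurrect cycles."""
    prison0 = Prison(uref=non_jailed_procs)
    jail = Prison(parent=prison0)
    prison0.uref += 1            # the new top-level jail is counted once
    jail.attach()                # first process started in the jail
    for _ in range(cycles):
        jail.exit_process()      # last process in the jail exits
        jail.attach()            # manager attaches to the dying jail
    return prison0.uref


print(restart_cycles(0, 5))
print(restart_cycles(6, 5))
```

In this model, with 5 non-jailed processes the count starts at 6 and drops by
one per cycle, so it reaches 0 after six restarts even though nothing outside
the jail has exited.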

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 11:15:15 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4E9A2106566B
	for <stable@freebsd.org>; Sat, 20 Aug 2011 11:15:15 +0000 (UTC)
	(envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id E189B8FC12
	for <stable@freebsd.org>; Sat, 20 Aug 2011 11:15:14 +0000 (UTC)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id EBC7615346B
	for <stable@freebsd.org>; Sat, 20 Aug 2011 13:15:13 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new,
	port 10024) with ESMTP id zrnB3hqJ4huR for <stable@freebsd.org>;
	Sat, 20 Aug 2011 13:15:12 +0200 (CEST)
Received: from [IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195] (unknown
	[IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195])
	by mail.digiware.nl (Postfix) with ESMTP id F2A33153433
	for <stable@freebsd.org>; Sat, 20 Aug 2011 13:15:11 +0200 (CEST)
Message-ID: <4E4F973D.9070706@digiware.nl>
Date: Sat, 20 Aug 2011 13:15:09 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:6.0) Gecko/20110812 Thunderbird/6.0
MIME-Version: 1.0
To: "stable@freebsd.org" <stable@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Remote installing
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 11:15:15 -0000

Hi,

Today I felt like living dangerously, and wanted to upgrade a backup server
from i386 to amd64, just to see if we could.
Otherwise I'd scrap it and install from a USB stick.

So I have my server running an amd64 build of GENERIC, and
/, /var and /usr are exported on the server to be upgraded.

But upgrading world already hits a snag early on:

----
empty changed
         flags expected "schg" found "none" not modified: Operation not supported
----

This is probably where some program wants to set the immutable flag on
/var/tmp/empty...

But it looks like NFS does not grok that.

Now I have seen plenty of suggestions to do it this way, but never saw anybody
come back with this complaint....

So I must be omitting something?

--WjW


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 11:26:34 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E6941106566C
	for <stable@freebsd.org>; Sat, 20 Aug 2011 11:26:34 +0000 (UTC)
	(envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id A8DC28FC0A
	for <stable@freebsd.org>; Sat, 20 Aug 2011 11:26:34 +0000 (UTC)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id 9A04A153434
	for <stable@freebsd.org>; Sat, 20 Aug 2011 13:26:33 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new,
	port 10024) with ESMTP id A4Y-3r82JPf1 for <stable@freebsd.org>;
	Sat, 20 Aug 2011 13:26:31 +0200 (CEST)
Received: from [IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195] (unknown
	[IPv6:2001:4cb8:3:1:1d6a:c449:c682:7195])
	by mail.digiware.nl (Postfix) with ESMTP id AEA2E153433
	for <stable@freebsd.org>; Sat, 20 Aug 2011 13:26:31 +0200 (CEST)
Message-ID: <4E4F99E4.8060009@digiware.nl>
Date: Sat, 20 Aug 2011 13:26:28 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:6.0) Gecko/20110812 Thunderbird/6.0
MIME-Version: 1.0
To: "stable@freebsd.org" <stable@freebsd.org>
References: <4E4F973D.9070706@digiware.nl>
In-Reply-To: <4E4F973D.9070706@digiware.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: Remote installing
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 11:26:35 -0000

On 2011-08-20 13:15, Willem Jan Withagen wrote:
> Hi,
>
> Today I felt like living dangerously, and wanted to upgrade a backup server
> from i386 to amd64, just to see if we could.
> Otherwise I'd scrap it and install from a USB stick.
>
> So I have my server running an amd64 build of GENERIC, and
> /, /var and /usr are exported on the server to be upgraded.
>
> But upgrading world already hits a snag early on:
>
> ----
> empty changed
> flags expected "schg" found "none" not modified: Operation not supported
> ----
>
> This is probably where some program wants to set the immutable flag on
> /var/tmp/empty...
>
> But it looks like NFS does not grok that.
>
> Now I have seen plenty of suggestions to do it this way, but never saw anybody
> come back with this complaint....
>
> So I must be omitting something?

I looked at the work errors.
-----------
cd /mnt/; rm -f /mnt/sys; ln -s usr/src/sys sys
cd /mnt/usr/share/man/en.ISO8859-1; ln -sf ../man* .
ln: ./man1: Permission denied
ln: ./man1aout: Permission denied
ln: ./man2: Permission denied
ln: ./man3: Permission denied
ln: ./man4: Permission denied
ln: ./man5: Permission denied
ln: ./man6: Permission denied
ln: ./man7: Permission denied
ln: ./man8: Permission denied
ln: ./man9: Permission denied
---------

This comes from the distrib-dirs target in etc.

Why would an ln -sf like that fail?
The filesystems are exported with -maproot=0.

--WjW


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 13:24:51 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 05984106566B;
	Sat, 20 Aug 2011 13:24:51 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 1125A8FC0C;
	Sat, 20 Aug 2011 13:24:49 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 14:13:34 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 14:13:34 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014672793.msg;
	Sat, 20 Aug 2011 14:13:34 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <4E55CB4A4F694A7997FEBDF9EADF87F5@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk>
	<4E4C22D6.6070407@FreeBSD.org>
	<4019027648B5493AAC4B654BD821DE88@multiplay.co.uk>
	<4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@FreeBSD.org>
Date: Sat, 20 Aug 2011 14:14:15 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 13:24:51 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> BTW, I suspect the following scenario, but I am not able to
> verify it either via testing or in the code:
> - last process in a dying jail exits
> - pr_uref of the jail reaches zero
> - pr_uref of prison0 gets decremented
> - you attach to the jail and resurrect it
> - but pr_uref of prison0 stays decremented
> 
> Repeat this enough times and prison0.pr_uref reaches zero.
> To reach zero even sooner, just kill enough non-jailed processes.

Ahh, now that explains all of the panic scenarios we've experienced:-
1. jail stop / start causing the panic, but only after at least a
few days' worth of uptime.

Here, what we're seeing is enough "leak" of pr_uref from the restarted
jails to decrement prison0.pr_uref to 0 even with all the standard
unjailed processes still running.

2. A machine reboot, after all jails have been stopped, but after
less uptime than in #1.

In this case we haven't seen enough leakage to decrement
prison0.pr_uref to 0 given the number of prison0 processes, but
it has been incorrectly decremented, so as soon as the reboot kicks
in and prison0 processes start exiting, prison0.pr_uref gets
further decremented and again hits 0 when it shouldn't.

Now if this is the case, we should be able to confirm it with a little
more info.

1. What exactly does pr_uref represent?
2. Can the value it should have be calculated from other details
of the system, i.e. the number of running processes and the number
of running jails?

If we can calculate the value that prison0.pr_uref should be, then
by examining the machines we have which have been up for a while,
we should be able to confirm if an incorrect value is present on
them and hence prove this is the case.

Ideally a little script to run in kgdb to test this would be the
best way to go.

    Regards
    Steve


================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 13:37:57 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DC2571065672;
	Sat, 20 Aug 2011 13:37:57 +0000 (UTC)
	(envelope-from hselasky@c2i.net)
Received: from swip.net (mailfe01.c2i.net [212.247.154.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 32A6F8FC19;
	Sat, 20 Aug 2011 13:37:56 +0000 (UTC)
X-Cloudmark-Score: 0.000000 []
X-Cloudmark-Analysis: v=1.1 cv=ELIg/Y9mGCPhWMyRcSlygtjWSLZJE4Mi+f/g6oC4Nzw=
	c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10
	a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10
	a=i9M/sDlu2rpZ9XS819oYzg==:17 a=_baaTq9UrmHnDPmMK9MA:9
	a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117
Received: from [188.126.198.129] (account mc467741@c2i.net HELO
	laptop002.hselasky.homeunix.org)
	by mailfe01.swip.net (CommuniGate Pro SMTP 5.2.19)
	with ESMTPA id 169096590; Sat, 20 Aug 2011 15:37:52 +0200
From: Hans Petter Selasky <hselasky@c2i.net>
To: Andriy Gapon <avg@freebsd.org>
Date: Sat, 20 Aug 2011 15:35:24 +0200
User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; )
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<201108182324.58276.hselasky@c2i.net>
	<4E4E900D.8010506@FreeBSD.org>
In-Reply-To: <4E4E900D.8010506@FreeBSD.org>
X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5
	%V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC(
	:AuzV9:.hESm-x4h240C`9=w
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108201535.24061.hselasky@c2i.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 13:37:57 -0000

On Friday 19 August 2011 18:32:13 Andriy Gapon wrote:
> on 19/08/2011 00:24 Hans Petter Selasky said the following:
> > On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
> >> If you can help Hans to figure out what is wrong with the USB subsystem
> >> in this respect that would help us all.
> > 
> > Hi,
> > 
> > usb_busdma.c:   /* we use "mtx_owned()" instead of this function */
> > usb_busdma.c:   owned = mtx_owned(uptag->mtx);
> > usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
> > usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
> > usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
> > usb_hub.c:      if (mtx_owned(&bus->bus_mtx)) {
> > usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
> > usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
> > usb_transfer.c:         while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
> > usb_transfer.c:         while (mtx_owned(xroot->xfer_mtx)) {
> 
> > One fix you will need to do, if mtx_owned is not giving the correct value, is:
> First, could you please clarify what the correct, or rather expected,
> value is in this case.  It's not immediately clear to me if we should
> consider all locks as owned or un-owned in a situation where all locks are
> actually skipped behind the scenes.
> Maybe the USB code should explicitly check for that condition so as not to
> make any unsafe assumptions.
> 
> Second, it's not clear to me what the above list actually represents in the
> context of this discussion.

Hi,

mtx_owned() is used not only to assert mutex ownership, but also to figure
out which context the function is being called from. If the correct mutex is
not already locked, we postpone the work until later. In the panic case there
is no way to postpone work, so this check should be skipped when panicking,
because there is no other thread to hand the work to.
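
The decision described here can be modelled roughly as follows. This is a
Python sketch of the idea only, not the USB code: dispatch(), the deferred
list and the panicking flag are hypothetical, with the flag standing in for
whatever test the kernel offers for being in a panic.

```python
def dispatch(work, mutex_owned, panicking, deferred):
    """Run work now or queue it, following the context check described above."""
    if panicking or mutex_owned:
        # During a panic no other thread will ever pick up deferred work,
        # so the ownership check is skipped and the work runs immediately.
        return work()
    # Normal case without the lock held: postpone the work until later.
    deferred.append(work)
    return None


queue = []
print(dispatch(lambda: "done", mutex_owned=False, panicking=False, deferred=queue))
print(len(queue))
print(dispatch(lambda: "done", mutex_owned=False, panicking=True, deferred=queue))
```

The first call queues the work; the last one runs it directly because the
panicking flag is set.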

--HPS

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 15:51:03 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 10E9E106566C;
	Sat, 20 Aug 2011 15:51:03 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 2364E8FC15;
	Sat, 20 Aug 2011 15:51:01 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 16:50:27 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 16:50:27 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014673864.msg;
	Sat, 20 Aug 2011 16:50:26 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org><EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org>
	<4E4F8821.80108@FreeBSD.org>
Date: Sat, 20 Aug 2011 16:51:50 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 15:51:03 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>
 
> BTW, I suspect the following scenario, but I am not able to verify it either via
> testing or in the code:
> - last process in a dying jail exits
> - pr_uref of the jail reaches zero
> - pr_uref of prison0 gets decremented
> - you attach to the jail and resurrect it
> - but pr_uref of prison0 stays decremented
> 
> Repeat this enough times and prison0.pr_uref reaches zero.
> To reach zero even sooner, just kill enough non-jailed processes.

I've just checked across a number of the panic dumps from the
past few days and they all have prison0.pr_uref = 0, which confirms
the cause of the panic.

I've tried scripting continuous jail starts and stops, but even after
thousands of iterations I have been unable to trigger this on my test
machine, so I'm going to dig into the jail code to see if I can find,
by inspection, how it's incorrectly decrementing prison0.

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 16:46:06 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C76F2106566C
	for <freebsd-stable@FreeBSD.org>; Sat, 20 Aug 2011 16:46:06 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 1C5138FC12
	for <freebsd-stable@FreeBSD.org>; Sat, 20 Aug 2011 16:46:05 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA12920;
	Sat, 20 Aug 2011 19:46:00 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QuogK-000O3S-6U; Sat, 20 Aug 2011 19:46:00 +0300
Message-ID: <4E4FE4C5.9030305@FreeBSD.org>
Date: Sat, 20 Aug 2011 19:45:57 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110819 Thunderbird/6.0
MIME-Version: 1.0
To: Hans Petter Selasky <hselasky@c2i.net>
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<201108182324.58276.hselasky@c2i.net>
	<4E4E900D.8010506@FreeBSD.org>
	<201108201535.24061.hselasky@c2i.net>
In-Reply-To: <201108201535.24061.hselasky@c2i.net>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 16:46:06 -0000

on 20/08/2011 16:35 Hans Petter Selasky said the following:
> On Friday 19 August 2011 18:32:13 Andriy Gapon wrote:
>> on 19/08/2011 00:24 Hans Petter Selasky said the following:
>>> On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
>>>> If you can help Hans to figure out what is wrong with the USB subsystem
>>>> in this respect that would help us all.
>>>
>>> Hi,
>>>
>>> usb_busdma.c:   /* we use "mtx_owned()" instead of this function */
>>> usb_busdma.c:   owned = mtx_owned(uptag->mtx);
>>> usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
>>> usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
>>> usb_compat_linux.c:     do_unlock = mtx_owned(&Giant) ? 0 : 1;
>>> usb_hub.c:      if (mtx_owned(&bus->bus_mtx)) {
>>> usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
>>> usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
>>> usb_transfer.c:         while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
>>> usb_transfer.c:         while (mtx_owned(xroot->xfer_mtx)) {
>>
>>> One fix you will need to do, if mtx_owned is not giving the correct value, is:
>> First, could you please clarify what the correct, or rather expected,
>> value is in this case.  It's not immediately clear to me if we should
>> consider all locks as owned or un-owned in a situation where all locks are
>> actually skipped behind the scenes.
>> Maybe the USB code should explicitly check for that condition so as not to
>> make any unsafe assumptions.
>>
>> Second, it's not clear to me what the above list actually represents in the
>> context of this discussion.
> 
> Hi,
> 
> mtx_owned() is used not only to assert mutex ownership, but also to figure
> out which context the function is being called from. If the correct mutex is
> not already locked, we postpone the work until later. In the panic case there
> is no way to postpone work, so this check should be skipped when panicking,
> because there is no other thread to hand the work to.

Now I see, but I still cannot draw a conclusion...
So what would you suggest: should the USB code explicitly check for panicstr (or
SCHEDULER_STOPPED in the future)?  Or what should mtx_owned return, true or
false?

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 16:48:30 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E35821065677;
	Sat, 20 Aug 2011 16:48:30 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id EE0C48FC1C;
	Sat, 20 Aug 2011 16:48:29 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA12940;
	Sat, 20 Aug 2011 19:48:27 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Quoig-000O3V-LU; Sat, 20 Aug 2011 19:48:26 +0300
Message-ID: <4E4FE55A.9000101@FreeBSD.org>
Date: Sat, 20 Aug 2011 19:48:26 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110819 Thunderbird/6.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org>
	<4E4F8821.80108@FreeBSD.org>
	<82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
In-Reply-To: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 16:48:31 -0000

on 20/08/2011 18:51 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
> 
>> BTW, I suspect the following scenario, but I am not able to verify it either via
>> testing or in the code:
>> - last process in a dying jail exits
>> - pr_uref of the jail reaches zero
>> - pr_uref of prison0 gets decremented
>> - you attach to the jail and resurrect it
>> - but pr_uref of prison0 stays decremented
>>
>> Repeat this enough times and prison0.pr_uref reaches zero.
>> To reach zero even sooner just kill enough of non-jailed processes.
> 
> I've just checked across a number of the panic dumps from the
> past few days and they all have prison0.pr_uref = 0 which confirms
> the cause of the panic.
> 
> I've tried scripting continuous jail start stops, but even after 1000's
> of iterations have been unable to trigger this on my test machine, so
> I'm going to dig into the jail code to see if I can find out how its
> incorrectly decrementing prison0 via inspection.

Steve,

thanks for doing this!  I'll reiterate my suspicion just in case - I think that
you should look for the cases where you stop a jail, but then re-attach and
resurrect the jail before it's completely dead.
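
To make the suspected sequence concrete, here is a toy user-space model. It is
not the kernel's jail code: the struct, the helper names and the "buggy"
attach path are hypothetical and encode only the suspicion described above
(only the field name pr_uref is taken from the discussion).

```c
#include <stddef.h>

/* Toy stand-in for struct prison; only pr_uref mirrors a real field name. */
struct toy_prison {
	struct toy_prison *parent;	/* NULL for the prison0 stand-in */
	int pr_uref;			/* user (process) references */
};

/* Drop one user reference; when a count hits zero, drop the parent's too. */
static void
toy_uref_drop(struct toy_prison *pr)
{
	for (; pr != NULL; pr = pr->parent) {
		if (--pr->pr_uref > 0)
			break;
	}
}

/* Suspected buggy path: attaching to a dying jail bumps only the jail. */
static void
toy_attach_dying(struct toy_prison *pr)
{
	pr->pr_uref++;
}

/* Run "last process exits, then re-attach" cycles; return the root count. */
static int
toy_leak_cycles(int root_refs, int cycles)
{
	struct toy_prison root = { NULL, root_refs };
	struct toy_prison jail = { &root, 1 };
	int i;

	for (i = 0; i < cycles && root.pr_uref > 0; i++) {
		toy_uref_drop(&jail);		/* jail -> 0, root loses one */
		toy_attach_dying(&jail);	/* jail -> 1, root not restored */
	}
	return (root.pr_uref);
}
```

In this model every stop/re-attach cycle costs the root one reference, so
toy_leak_cycles(5, 3) leaves 2 and any run of five or more cycles leaves 0.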

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 16:56:53 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 77D68106564A;
	Sat, 20 Aug 2011 16:56:53 +0000 (UTC)
	(envelope-from hselasky@c2i.net)
Received: from swip.net (mailfe03.c2i.net [212.247.154.66])
	by mx1.freebsd.org (Postfix) with ESMTP id CD8EE8FC1C;
	Sat, 20 Aug 2011 16:56:52 +0000 (UTC)
X-Cloudmark-Score: 0.000000 []
X-Cloudmark-Analysis: v=1.1 cv=Ic1eHMOXbQHcCvhs/sz3xt2crOpE4ZQ8e7+3c6x+FwY=
	c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10
	a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10
	a=i9M/sDlu2rpZ9XS819oYzg==:17 a=SkclzYauFElIErlVZ5wA:9
	a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117
Received: from [188.126.198.129] (account mc467741@c2i.net HELO
	laptop002.hselasky.homeunix.org)
	by mailfe03.swip.net (CommuniGate Pro SMTP 5.2.19)
	with ESMTPA id 804488; Sat, 20 Aug 2011 18:56:49 +0200
From: Hans Petter Selasky <hselasky@c2i.net>
To: Andriy Gapon <avg@freebsd.org>
Date: Sat, 20 Aug 2011 18:54:21 +0200
User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; )
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<201108201535.24061.hselasky@c2i.net>
	<4E4FE4C5.9030305@FreeBSD.org>
In-Reply-To: <4E4FE4C5.9030305@FreeBSD.org>
X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5
	%V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC(
	:AuzV9:.hESm-x4h240C`9=w
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108201854.21180.hselasky@c2i.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 16:56:53 -0000

On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
> SCHEDULER_STOPPED

The USB code needs to check for the SCHEDULER_STOPPED and cold at the present 
moment. If this state can be set during bootup, and cleared at the same time 
like "cold", it would be very good.

--HPS

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 17:09:09 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 487091065670
	for <freebsd-stable@FreeBSD.org>; Sat, 20 Aug 2011 17:09:09 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 8FD658FC08
	for <freebsd-stable@FreeBSD.org>; Sat, 20 Aug 2011 17:09:08 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA13081;
	Sat, 20 Aug 2011 20:09:04 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Qup2d-000O4E-Ph; Sat, 20 Aug 2011 20:09:03 +0300
Message-ID: <4E4FEA2E.7050209@FreeBSD.org>
Date: Sat, 20 Aug 2011 20:09:02 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110819 Thunderbird/6.0
MIME-Version: 1.0
To: Hans Petter Selasky <hselasky@c2i.net>
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<201108201535.24061.hselasky@c2i.net>
	<4E4FE4C5.9030305@FreeBSD.org>
	<201108201854.21180.hselasky@c2i.net>
In-Reply-To: <201108201854.21180.hselasky@c2i.net>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 17:09:09 -0000

on 20/08/2011 19:54 Hans Petter Selasky said the following:
> On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
>> SCHEDULER_STOPPED
> 
> The USB code needs to check for the SCHEDULER_STOPPED and cold at the present 
> moment. If this state can be set during bootup, and cleared at the same time 
> like "cold", it would be very good.

Sorry again - not sure if I follow.
SCHEDULER_STOPPED is supposed to be set on panic and never be reset.  It's like
a mirror of 'cold' in a sense.

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 17:21:15 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 484E71065670;
	Sat, 20 Aug 2011 17:21:15 +0000 (UTC)
	(envelope-from hselasky@c2i.net)
Received: from swip.net (mailfe07.c2i.net [212.247.154.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 9EF1E8FC0A;
	Sat, 20 Aug 2011 17:21:14 +0000 (UTC)
X-Cloudmark-Score: 0.000000 []
X-Cloudmark-Analysis: v=1.1 cv=ND3JYWI3bJ4ZiXLCJEAs5I5grFUWsY+sOY5HCnTiTok=
	c=1 sm=1 a=SvYTsOw2Z4kA:10 a=EPV5yV1zpIAA:10 a=WQU8e4WWZSUA:10
	a=8nJEP1OIZ-IA:10 a=CL8lFSKtTFcA:10
	a=i9M/sDlu2rpZ9XS819oYzg==:17 a=_c88UKqYe0xIkabUrEwA:9
	a=wPNLvfGTeEIA:10 a=i9M/sDlu2rpZ9XS819oYzg==:117
Received: from [188.126.198.129] (account mc467741@c2i.net HELO
	laptop002.hselasky.homeunix.org)
	by mailfe07.swip.net (CommuniGate Pro SMTP 5.2.19)
	with ESMTPA id 168343860; Sat, 20 Aug 2011 19:21:12 +0200
From: Hans Petter Selasky <hselasky@c2i.net>
To: Andriy Gapon <avg@freebsd.org>
Date: Sat, 20 Aug 2011 19:18:43 +0200
User-Agent: KMail/1.13.5 (FreeBSD/8.2-STABLE; KDE/4.4.5; amd64; ; )
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
	<201108201854.21180.hselasky@c2i.net>
	<4E4FEA2E.7050209@FreeBSD.org>
In-Reply-To: <4E4FEA2E.7050209@FreeBSD.org>
X-Face: *nPdTl_}RuAI6^PVpA02T?$%Xa^>@hE0uyUIoiha$pC:9TVgl.Oq, NwSZ4V"|LR.+tj}g5
	%V,x^qOs~mnU3]Gn; cQLv&.N>TrxmSFf+p6(30a/{)KUU!s}w\IhQBj}[g}bj0I3^glmC(
	:AuzV9:.hESm-x4h240C`9=w
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201108201918.43978.hselasky@c2i.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: USB/coredump hangs in 8 and 9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 17:21:15 -0000

On Saturday 20 August 2011 19:09:02 Andriy Gapon wrote:
> on 20/08/2011 19:54 Hans Petter Selasky said the following:
> > On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
> >> SCHEDULER_STOPPED
> > 
> > The USB code needs to check for the SCHEDULER_STOPPED and cold at the
> > present moment. If this state can be set during bootup, and cleared at
> > the same time like "cold", it would be very good.
> 
> Sorry again - not sure if I follow.
> SCHEDULER_STOPPED is supposed to be set on panic and never be reset.  It's
> like a mirror of 'cold' in a sense.

OK. Then you should add a test "&& !SCHEDULER_STOPPED" where I pointed out:

static void
usbd_callback_wrapper(struct usb_xfer_queue *pq)
{
        struct usb_xfer *xfer = pq->curr;
        struct usb_xfer_root *info = xfer->xroot;

        USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
        if (!mtx_owned(info->xfer_mtx) && !SCHEDULER_STOPPED) {
                /*
                 * Cases that end up here:
                 *

And also ensure that no mutex asserts can trigger further panics.
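
The decision that the extra test changes can be sketched in user space as
follows. This is only an illustration under stated assumptions: the toy_*
names are hypothetical stand-ins for mtx_owned() and the proposed
SCHEDULER_STOPPED state, and nothing here is the actual USB stack code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed SCHEDULER_STOPPED state:
 * set once on panic and never cleared. */
static bool toy_scheduler_stopped;

/* Hypothetical stand-in for a mutex and what mtx_owned() would report. */
struct toy_mtx {
	bool owned_by_curthread;
};

enum toy_action { TOY_RUN_NOW, TOY_DEFER };

/*
 * Defer the callback only if the transfer mutex is not held AND another
 * thread can still run to pick up the deferred work.  After a panic there
 * is no such thread, so the work has to be done in the current context.
 */
static enum toy_action
toy_callback_decision(const struct toy_mtx *xfer_mtx)
{
	if (!xfer_mtx->owned_by_curthread && !toy_scheduler_stopped)
		return (TOY_DEFER);
	return (TOY_RUN_NOW);
}
```

With the mutex not owned, the sketch defers while the scheduler is running
and runs the callback immediately once toy_scheduler_stopped is set.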

--HPS

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 17:34:45 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 405BC106566C
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 17:34:45 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id D43198FC0A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 17:34:44 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id EBC5850A09;
	Sat, 20 Aug 2011 17:34:43 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id glPSpkeHR3wf; Sat, 20 Aug 2011 18:34:43 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 6B74B50A06  ;
	Sat, 20 Aug 2011 17:34:43 +0000 (UTC)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Dan Langille <dan@langille.org>
In-Reply-To: <20110820032438.GA21925@icarus.home.lan>
Date: Sat, 20 Aug 2011 13:34:41 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 17:34:45 -0000

On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:

> On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
>>=20
>> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
>>=20
>>> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>>>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT =
2011
>>>>=20
>>>> After a recent power failure, I'm seeing this in my logs:
>>>>=20
>>>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently =
unreadable (pending) sectors
>>>=20
>>> I doubt this is related to a power failure.
>>>=20
>>>> Searching on that error message, I was led to believe that =
identifying the bad sector and
>>>> running dd to read it would cause the HDD to reallocate that bad =
block.
>>>>=20
>>>> http://smartmontools.sourceforge.net/badblockhowto.html
>>>=20
>>> This is incorrect (meaning you've misunderstood what's written =
there).
>>>=20
>>> Unreadable LBAs can be a result of the LBA being actually bad (as in
>>> uncorrectable), or the LBA being marked "suspect".  In either case =
the
>>> LBA will return an I/O error when read.
>>>=20
>>> If the LBAs are marked "suspect", the drive will perform re-analysis =
of
>>> the LBA (to determine if the LBA can be read and the data re-mapped, =
or
>>> if it cannot then the LBA is marked uncorrectable) when you =
**write** to
>>> the LBA.
>>>=20
>>> The above smartd output doesn't tell me much.  Providing actual =
SMART
>>> attribute data (smartctl -a) for the drive would help.  The brand of =
the
>>> drive, the firmware version, and the model all matter -- every drive
>>> behaves a little differently.
>>=20
>> Information such as this?  =
http://beta.freebsddiary.org/smart-fixing-bad-sector.php
>=20
> Yes, perfect.  Thank you.  First thing first: upgrade smartmontools to
> 5.41.  Your attributes will be the same after you do this (the drive =
is
> already in smartmontools' internal drive DB), but I often have to =
remind
> people that they really need to keep smartmontools updated as often as
> possible.  The changes between versions are vast; this is especially
> important for people with SSDs (I'm responsible for submitting some
> recent improvements for Intel 320 and 510 SSDs).

Done.

> Anyway, the drive (albeit an old PATA Maxtor) appears to have three
> anomalies:
>=20
> 1) One confirmed reallocated LBA (SMART attribute 5)
>=20
> 2) One "suspect" LBA (SMART attribute 197)
>=20
> 3) A very high temperature of 51C (SMART attribute 194).  If this =
drive
> is in an enclosure or in a system with no fans this would be
> understandable, otherwise this is a bit high.  My home workstation =
which
> has only one case fan has a drive with more platters than your Maxtor,
> and it idles at ~38C.  Possibly this drive has been undergoing =
constant
> I/O recently (which does greatly increase drive temperature)?  Not =
sure.
> I'm not going to focus too much on this one.

This is an older system.  I suspect insufficient ventilation.  I'll look =
at getting
a new case fan, if not some HDD fans.

> The SMART error log also indicates an LBA failure at the 26000 hour =
mark
> (which is 16 hours prior to when you did smartctl -a /dev/ad2).  =
Whether
> that LBA is the remapped one or the suspect one is unknown.  The LBA =
was
> 5566440.
>=20
> The SMART tests you did didn't really amount to anything; no surprise.
> short and long tests usually do not test the surface of the disk.  =
There
> are some drives which do it on a long test, but as I said before,
> everything varies from drive to drive.
>=20
> Furthermore, on this model of drive, you cannot do a surface scans via
> SMART.  Bummer.  That's indicated in the "Offline data collection
> capabilities" section at the top, where it reads:
>=20
> 	No Selective Self-test supported.
>=20
> So you'll have to use the dd method.  This takes longer than if =
surface
> scanning was supported by the drive, but is acceptable.  I'll get to =
how
> to go about that in a moment.

FWIW, I've done a dd read of the entire suspect disk already.  Just two =
errors.
=46rom the URL mentioned above:

[root@bast:~] # dd of=3D/dev/null if=3D/dev/ad2 bs=3D1m conv=3Dnoerror
dd: /dev/ad2: Input/output error
2717+0 records in
2717+0 records out
2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
dd: /dev/ad2: Input/output error
38170+1 records in
38170+1 records out
40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
[root@bast:~] #=20

That seems to indicate two problems.  Are those the values I should be =
using=20
with dd?

I did some more precise testing:

# time dd of=3D/dev/null if=3D/dev/ad2 bs=3D512 iseek=3D5566440
dd: /dev/ad2: Input/output error
9+0 records in
9+0 records out
4608 bytes transferred in 5.368668 secs (858 bytes/sec)

real	0m5.429s
user	0m0.000s
sys	0m0.010s

NOTE: that's 9 blocks later than mentioned in smartctl

The above generated this in /var/log/messages:

Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA =
status=3D51<READY,DSC,ERROR> error=3D40<UNCORRECTABLE> LBA=3D5566449


> [stuff snipped]


> That said:
>=20
> http://jdc.parodius.com/freebsd/bad_block_scan
>=20
> If you run this on your ad2 drive, I'm hoping what you'll find are two
> LBAs which can't be read -- one will be the remapped LBA and one will =
be
> the "suspect" LBA.  If you only get one LBA error then that's fine =
too,
> and will be the "suspect" LBA.

> Once you have the LBA(s), you can submit writes to them to get the =
drive
> to re-analyse them (assuming they're "suspect"):
>=20
> dd if=3D/dev/zero of=3D/dev/XXX bs=3D512 count=3D1 seek=3DNNNNN
>=20
> Where XXX is the device and NNNNN is the LBA number.
>=20
> If this works properly, the dd command should sit there for a little =
bit
> (as the drive does its re-analysis magic) and then should complete.

ad2 is part of a gmirror with ad0.   Does this change things?

I haven't tried the dd yet.

>=20
> You'll want to check SMART stats after that; you should see
> Current_Pending_Sector drop to 0.  If Offline_Uncorrectable =
incremented
> then the LBA could not be re-read/remapped.

It did increment:

197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always =
      -       2

[was 1]

>  If Reallocated_Sector_Ct
> incremented then you now have a total of 2 LBAs which are remapped.

It did increment:

$ diff smarctl.1 smarctl.3 | grep Reallocated_Sector_Ct
<   5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  =
Always       -       1
>   5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  =
Always       -       2


Full output of smartctl has been appended to =
http://beta.freebsddiary.org/smart-fixing-bad-sector.php


> In
> the case of remapping, you get to deal with the UFS/FFS thing above.
> To get the stats to update in this situation you *might* (but probably
> not) have to run "smartctl -t offline /dev/XXX".

I didn't try that...

>=20
> You might also be wondering "that dd command writes 512 bytes of zero =
to
> that LBA; what about the old data that was there, in the case that the
> drive remaps the LBA?"  This is a great question, and one I've never
> actually taken the time to answer because at this present time I have
> absolutely *no* bad disks in my possession.  I'm under the impression
> that the write does in fact write zeros if the LBA is remapped, but =
that
> might not be true at all.  I've been waiting to test this for quite =
some
> time and document it/write about it.
>=20
> I still suggest you replace the drive, although given its age I doubt
> you'll be able to find a suitable replacement.  I tend to keep disks
> like this around for testing/experimental purposes and not for actual
> use.

I have several unused 80GB HDD I can place into this system.  I think =
that's
what I'll wind up doing.  But I'd like to follow this process through =
and get it documented
for future reference.

--=20
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 17:41:56 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2BEA8106564A;
	Sat, 20 Aug 2011 17:41:56 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id BD7428FC14;
	Sat, 20 Aug 2011 17:41:55 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7KHflxW068168
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 20 Aug 2011 20:41:47 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	p7KHflhw008180; Sat, 20 Aug 2011 20:41:47 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7KHflc6008179; 
	Sat, 20 Aug 2011 20:41:47 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sat, 20 Aug 2011 20:41:47 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: alc@freebsd.org
Message-ID: <20110820174147.GW17489@deviant.kiev.zoral.com.ua>
References: <4E4143A6.6030307@digsys.bg>
	<935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com>
	<4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
	<4E4CCA6C.8020408@ipfw.ru>
	<CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="gfR41eDGUhhc/UyZ"
Content-Disposition: inline
In-Reply-To: <CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-stable@freebsd.org, perryh@pluto.rain.com,
	"Alexander V. Chernikov" <melifaro@ipfw.ru>, daniel@digsys.bg
Subject: Re: 32GB limit per swap device?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 17:41:56 -0000


--gfR41eDGUhhc/UyZ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov <melifaro@ipfw.ru=
>wrote:
>=20
> > On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
> >
> >> Chuck Swiger<cswiger@mac.com>  wrote:
> >>
> >>  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
> >>>
> >>>> I am trying to set up 64GB partitions for swap for a system that
> >>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
> >>>> 8-stable as of today I get:
> >>>>
> >>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
> >>>>
> >>>> Is there workaround for this limitation?
> >>>>
> >>>
> > Another interesting question:
> >
> > swap pager operates in page blocks (PAGE_SIZE=3D4k on common arch).
> >
> > Block device size in passed to swaponsomething() in number of _disk_ bl=
ocks
> >  (e.g. in DEV_BSIZE=3D512). After that, kernel b-lists (on top of which=
 swap
> > pager is build) maximum objects check is enforced.
> >
> > The (possible) problem is that real object count we will operate on is =
not
> > the value passed to swaponsomething() since it is calculated in wrong u=
nits.
> >
> > we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value wh=
ich
> > is rough (X / 8) so we should be able to address 32*8=3D256G.
> >
> > The code should look like this:
> >
> > Index: vm/swap_pager.c
> > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D**=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D**=3D=3D=3D=3D=3D=3D=3D
> > --- vm/swap_pager.c     (revision 223877)
> > +++ vm/swap_pager.c     (working copy)
> > @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_lo=
ng
> >        u_long mblocks;
> >
> >        /*
> > +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunk=
s.
> > +        * First chop nblks off to page-align it, then convert.
> > +        *
> > +        * sw->sw_nblks is in page-sized chunks now too.
> > +        */
> > +       nblks &=3D ~(ctodb(1) - 1);
> > +       nblks =3D dbtoc(nblks);
> > +
> > +       /*
> >
> >         * If we go beyond this, we get overflows in the radix
> >         * tree bitmap code.
> >         */
> > @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_lo=
ng
> >                        mblocks);
> >                nblks =3D mblocks;
> >        }
> > -       /*
> > -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunk=
s.
> > -        * First chop nblks off to page-align it, then convert.
> > -        *
> > -        * sw->sw_nblks is in page-sized chunks now too.
> > -        */
> > -       nblks &=3D ~(ctodb(1) - 1);
> > -       nblks =3D dbtoc(nblks);
> >
> >        sp =3D malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
> >        sp->sw_vp =3D vp;
> >
> >
> > (move pages recalculation before b-list check)
> >
> >
> > Can someone comment on this?
> >
> >
> I believe that you are correct.  Have you tried testing this change on a
> large swap device?
I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.

When the initial code was committed, our daddr_t was 32bit, I checked
the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression
right now is that we only utilize the low 32bits of daddr_t.

Esp. interesting looks the following typedef:
typedef	uint32_t	u_daddr_t;	/* unsigned disk address */
which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.

I wonder whether we could just use full 64bit and de-facto remove the
limitation on the swap partition size.
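
For reference, the arithmetic behind the 32G versus 256G figures quoted above
can be written out as a small sketch. The function and macro names are
hypothetical; the block count is the one from the quoted warning, and the
512-byte and 4k unit sizes are the ones named in the quoted mail.

```c
#include <stdint.h>

#define TOY_DEV_BSIZE	512u	/* disk block size, as in the quoted mail */
#define TOY_PAGE_SIZE	4096u	/* page size on common architectures */

/* Bytes addressable when the block limit is applied to a count in the
 * given unit (disk blocks or pages). */
static uint64_t
toy_swap_limit_bytes(uint64_t max_blocks, uint32_t unit_bytes)
{
	return (max_blocks * unit_bytes);
}
```

Applied to 67108864 blocks, the limit comes to 32G when the count is in
disk blocks and 256G when the same count is in pages.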

--gfR41eDGUhhc/UyZ
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk5P8dsACgkQC3+MBN1Mb4gKdwCeK7fVc2QYLxELDvVNP+xeDEdQ
bk8An2aneYCGFD/rDi0TA2tSjFHD5Srd
=Eikm
-----END PGP SIGNATURE-----

--gfR41eDGUhhc/UyZ--

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 17:54:49 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 598F0106566C
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 17:54:49 +0000 (UTC)
	(envelope-from ml@os2.kiev.ua)
Received: from s1.sdv.com.ua (s1.sdv.com.ua [77.120.97.61])
	by mx1.freebsd.org (Postfix) with ESMTP id 14C9F8FC1A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 17:54:48 +0000 (UTC)
Received: from 90-105-243-80.cust.centrio.cz ([80.243.105.90]
	helo=[192.168.100.107])
	by s1.sdv.com.ua with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.76 (FreeBSD)) (envelope-from <ml@os2.kiev.ua>)
	id 1Qupkn-0007Nu-BH; Sat, 20 Aug 2011 20:54:43 +0300
Message-ID: <4E4FF4D6.1090305@os2.kiev.ua>
Date: Sat, 20 Aug 2011 19:54:30 +0200
From: Alex Samorukov <ml@os2.kiev.ua>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11
MIME-Version: 1.0
To: Dan Langille <dan@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>	<20110819232125.GA4965@icarus.home.lan>	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-SA-Score: -1.0
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 17:54:49 -0000

You can run a long self-test with smartmontools (-t long), get the
failed sector number from the self-test log (-l selftest), and then
use dd to write zeros to that specific sector. I also highly recommend
setting up smartd as a daemon and monitoring the number of reallocated
sectors. If that number grows again, it is a good time to retire this
disk.
> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
> dd: /dev/ad2: Input/output error
> 2717+0 records in
> 2717+0 records out
> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
> dd: /dev/ad2: Input/output error
> 38170+1 records in
> 38170+1 records out
> 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
> [root@bast:~] #
>
> That seems to indicate two problems.  Are those the values I should be using
> with dd?
>


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:01:33 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 616D1106566B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:01:33 +0000 (UTC)
	(envelope-from alan.l.cox@gmail.com)
Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com
	[209.85.218.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 206B28FC13
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:01:32 +0000 (UTC)
Received: by yib19 with SMTP id 19so3272232yib.13
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 11:01:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:reply-to:in-reply-to:references:date:message-id
	:subject:from:to:cc:content-type;
	bh=BxwVeO12cVrdIaF5/1P2XlGzxmKj97ajQnDOuvXV/BU=;
	b=Ot6GmqJnnbdVRt98QP92Tp5pCQlRTg2D/pNyx1PSPE89rXcRhLo6T+5orWs7zLSm6U
	4ZXrM8JCN/4pEX5NQE6fufyqWZ11w/ynoidPIz8y5baVvdoubIB4SJbOFY5Mt1LCdiQ+
	PwhsVAPv3XE2xfN00X6Jd01Tol7xSRXhrJLIw=
MIME-Version: 1.0
Received: by 10.42.137.2 with SMTP id w2mr668882ict.116.1313861609094; Sat, 20
	Aug 2011 10:33:29 -0700 (PDT)
Received: by 10.231.192.20 with HTTP; Sat, 20 Aug 2011 10:33:29 -0700 (PDT)
In-Reply-To: <4E4CCA6C.8020408@ipfw.ru>
References: <4E4143A6.6030307@digsys.bg>
	<935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com>
	<4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
	<4E4CCA6C.8020408@ipfw.ru>
Date: Sat, 20 Aug 2011 12:33:29 -0500
Message-ID: <CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
From: Alan Cox <alan.l.cox@gmail.com>
To: "Alexander V. Chernikov" <melifaro@ipfw.ru>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: Kostik Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org,
	perryh@pluto.rain.com, daniel@digsys.bg
Subject: Re: 32GB limit per swap device?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: alc@freebsd.org
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:01:33 -0000

On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov <melifaro@ipfw.ru> wrote:

> On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
>
>> Chuck Swiger<cswiger@mac.com>  wrote:
>>
>>  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
>>>
>>>> I am trying to set up 64GB partitions for swap for a system that
>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
>>>> 8-stable as of today I get:
>>>>
>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
>>>>
>>>> Is there workaround for this limitation?
>>>>
>>>
> Another interesting question:
>
> The swap pager operates in page-sized blocks (PAGE_SIZE=4k on common archs).
>
> The block device size is passed to swaponsomething() as a number of _disk_
> blocks (i.e. in units of DEV_BSIZE=512). After that, the maximum object
> count of the kernel b-lists (on top of which the swap pager is built) is
> enforced.
>
> The (possible) problem is that the real object count we will operate on is
> not the value passed to swaponsomething(), since the check is done in the
> wrong units.
>
> We should check the b-list limit against (X * DEV_BSIZE / PAGE_SIZE), which
> is roughly (X / 8), so we should be able to address 32*8=256G.
>
> The code should look like this:
>
> Index: vm/swap_pager.c
> ===================================================================
> --- vm/swap_pager.c     (revision 223877)
> +++ vm/swap_pager.c     (working copy)
> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
>        u_long mblocks;
>
>        /*
> +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
> +        * First chop nblks off to page-align it, then convert.
> +        *
> +        * sw->sw_nblks is in page-sized chunks now too.
> +        */
> +       nblks &= ~(ctodb(1) - 1);
> +       nblks = dbtoc(nblks);
> +
> +       /*
>         * If we go beyond this, we get overflows in the radix
>         * tree bitmap code.
>         */
> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
>                        mblocks);
>                nblks = mblocks;
>        }
> -       /*
> -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
> -        * First chop nblks off to page-align it, then convert.
> -        *
> -        * sw->sw_nblks is in page-sized chunks now too.
> -        */
> -       nblks &= ~(ctodb(1) - 1);
> -       nblks = dbtoc(nblks);
>
>        sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
>        sp->sw_vp = vp;
>
>
> (i.e. move the page-count conversion ahead of the b-list check)
>
>
> Can someone comment on this?
>
>
I believe that you are correct.  Have you tried testing this change on a
large swap device?
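
For reference, the arithmetic behind the 32G and 256G figures can be
sketched as below. This is only an illustration: the value 16 for
BLIST_META_RADIX is an assumption on my part (it is what makes
0x40000000 / BLIST_META_RADIX come out to the 67108864 blocks in the
warning), and the function names are hypothetical. The script does not
touch any swap device.

```shell
#!/bin/sh
# Sketch of the limit arithmetic only.
# Assumption: BLIST_META_RADIX is 16 (not stated in this thread).
BLIST_META_RADIX=16
DEV_BSIZE=512
PAGE_SIZE=4096

# Maximum number of b-list objects per swap unit.
max_blocks() {
    echo $(( 0x40000000 / BLIST_META_RADIX ))
}

# Addressable size in GiB if each b-list object covers $1 bytes.
limit_gib() {
    echo $(( $(max_blocks) * $1 / 1024 / 1024 / 1024 ))
}

echo "limit: $(max_blocks) blocks per swap unit"
echo "limit applied to DEV_BSIZE units: $(limit_gib "$DEV_BSIZE") GiB"
echo "limit applied to PAGE_SIZE units: $(limit_gib "$PAGE_SIZE") GiB"
```

If the limit is applied to the block count before the conversion to
pages, the cap corresponds to the first figure; applied after the
conversion, as in the proposed patch, it corresponds to the second.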

Alan

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:04:17 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 093C61065675
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:04:17 +0000 (UTC)
	(envelope-from db@db.net)
Received: from diana.db.net (diana.db.net [66.113.102.10])
	by mx1.freebsd.org (Postfix) with ESMTP id E53638FC15
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:04:16 +0000 (UTC)
Received: from night.db.net (localhost [127.0.0.1])
	by diana.db.net (Postfix) with ESMTP id 850E62283B;
	Sat, 20 Aug 2011 11:55:14 -0600 (MDT)
Received: by night.db.net (Postfix, from userid 1000)
	id 589C96914; Sat, 20 Aug 2011 14:04:15 -0400 (EDT)
Date: Sat, 20 Aug 2011 14:04:15 -0400
From: Diane Bruce <db@db.net>
To: Dan Langille <dan@langille.org>
Message-ID: <20110820180415.GA74553@night.db.net>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:04:17 -0000

On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
> 
> > On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
...
> >> Information such as this?  http://beta.freebsddiary.org/smart-fixing-bad-sector.php
...
> > 3) A very high temperature of 51C (SMART attribute 194).  If this drive
> > is in an enclosure or in a system with no fans this would be

...

eh? What's the temperature of the second drive?

...

> This is an older system.  I suspect insufficient ventilation.  I'll look at getting
> a new case fan, if not some HDD fans.

...

> > I still suggest you replace the drive, although given its age I doubt

Older drive and errors starting to happen, replace ASAP.

> > you'll be able to find a suitable replacement.  I tend to keep disks
> > like this around for testing/experimental purposes and not for actual
> > use.
> 
> I have several unused 80GB HDD I can place into this system.  I think that's
> what I'll wind up doing.  But I'd like to follow this process through and get it documented
> for future reference.

If the data is valuable, the sooner the better. 
It's actually somewhat saner if the two drives are not from the same lot.


> -- 
> Dan Langille - http://langille.org
> 

- Diane
-- 
- db@FreeBSD.org db@db.net http://www.db.net/~db
  Why leave money to our children if we don't leave them the Earth?

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:16:00 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BA4FA1065670
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:16:00 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 8ACBE8FC1B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:16:00 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id B740750A09;
	Sat, 20 Aug 2011 18:15:59 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id NNGCgOb+uOFQ; Sat, 20 Aug 2011 19:15:59 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 3D67E50A06  ;
	Sat, 20 Aug 2011 18:15:59 +0000 (UTC)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Dan Langille <dan@langille.org>
In-Reply-To: <4E4FF4D6.1090305@os2.kiev.ua>
Date: Sat, 20 Aug 2011 14:15:57 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <2AB04C16-FF20-467E-9508-AF35CB6323BC@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>	<20110819232125.GA4965@icarus.home.lan>	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<4E4FF4D6.1090305@os2.kiev.ua>
To: Alex Samorukov <ml@os2.kiev.ua>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:16:00 -0000


On Aug 20, 2011, at 1:54 PM, Alex Samorukov wrote:

>> [root@bast:~] # dd of=3D/dev/null if=3D/dev/ad2 bs=3D1m conv=3Dnoerror
>> dd: /dev/ad2: Input/output error
>> 2717+0 records in
>> 2717+0 records out
>> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
>> dd: /dev/ad2: Input/output error
>> 38170+1 records in
>> 38170+1 records out
>> 40025063424 bytes transferred in 1544.671423 secs (25911701 =
bytes/sec)
>> [root@bast:~] #
>>=20
>> That seems to indicate two problems.  Are those the values I should =
be using
>> with dd?
>>=20
>=20


> You can run a long self-test with smartmontools (-t long), get the =
failed sector number from the self-test log (-l selftest), and then use =
dd to write zeros to that specific sector.

Already done: http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Search for 786767

Or did you mean something else?

That doesn't seem to map to a particular sector though... I ran it for a =
while...

# time dd of=3D/dev/null if=3D/dev/ad2 bs=3D512 iseek=3D786767=20
^C4301949+0 records in
4301949+0 records out
2202597888 bytes transferred in 780.245828 secs (2822954 bytes/sec)

real	13m0.256s
user	0m22.087s
sys	3m24.215s


> I also highly recommend setting up smartd as a daemon and monitoring =
the number of reallocated sectors. If that number grows again, it is a =
good time to retire this disk.

It is running, but with nothing custom in the .conf file.

--=20
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:17:39 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 76BCA106567A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:17:39 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 47F8B8FC1D
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:17:39 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id C589750A09;
	Sat, 20 Aug 2011 18:17:38 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id w91gjHj2BT9y; Sat, 20 Aug 2011 19:17:38 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 607D150A06  ;
	Sat, 20 Aug 2011 18:17:38 +0000 (UTC)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Dan Langille <dan@langille.org>
In-Reply-To: <20110820180415.GA74553@night.db.net>
Date: Sat, 20 Aug 2011 14:17:37 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <D383C94D-D7AD-4A6C-ACAD-ACB58ACA4E1E@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<20110820180415.GA74553@night.db.net>
To: Diane Bruce <db@db.net>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:17:39 -0000


On Aug 20, 2011, at 2:04 PM, Diane Bruce wrote:

> On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
>> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
>>=20
>>> On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
> ...
>>>> Information such as this?  =
http://beta.freebsddiary.org/smart-fixing-bad-sector.php
> ...
>>> 3) A very high temperature of 51C (SMART attribute 194).  If this =
drive
>>> is in an enclosure or in a system with no fans this would be
>=20
> ...
>=20
> eh? What's the temperature of the second drive?

Roughly the same:


[root@bast:/home/dan/tmp] # smartctl -a /dev/ad2 | grep -i temp
194 Temperature_Celsius     0x0022   080   076   042    Old_age   Always =
      -       51

[root@bast:/home/dan/tmp] # smartctl -a /dev/ad0 | grep -i temp
194 Temperature_Celsius     0x0022   081   074   042    Old_age   Always =
      -       49
[root@bast:/home/dan/tmp] #=20


FYI, when I first set up smartd, I questioned those values.  The HDD in =
question, at the time,
did not feel hot to the touch.

>=20
> ...
>=20
>> This is an older system.  I suspect insufficient ventilation.  I'll =
look at getting
>> a new case fan, if not some HDD fans.
>=20
> ...
>=20
>>> I still suggest you replace the drive, although given its age I =
doubt
>=20
> Older drive and errors starting to happen, replace ASAP.
>=20
>>> you'll be able to find a suitable replacement.  I tend to keep disks
>>> like this around for testing/experimental purposes and not for =
actual
>>> use.
>>=20
>> I have several unused 80GB HDD I can place into this system.  I think =
that's
>> what I'll wind up doing.  But I'd like to follow this process through =
and get it documented
>> for future reference.
>=20
> If the data is valuable, the sooner the better.=20
> It's actually somewhat saner if the two drives are not from the same =
lot.

Noted.

--=20
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:23:30 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C6852106566B;
	Sat, 20 Aug 2011 18:23:30 +0000 (UTC)
	(envelope-from marquis@roble.com)
Received: from mx5.roble.com (mx5.roble.com [206.40.34.5])
	by mx1.freebsd.org (Postfix) with ESMTP id B7EB08FC08;
	Sat, 20 Aug 2011 18:23:30 +0000 (UTC)
Received: from mx5.roble.com (mx5.roble.com [206.40.34.5])
	by mx5.roble.com (Postfix) with ESMTP id 2FCA867899;
	Sat, 20 Aug 2011 11:10:31 -0700 (PDT)
Date: Sat, 20 Aug 2011 11:10:31 -0700 (PDT)
From: Roger Marquis <marquis@roble.com>
To: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
In-Reply-To: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
References: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Message-Id: <20110820182330.C6852106566B@hub.freebsd.org>
Cc: 
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:23:30 -0000

>> Repeat this enough times and prison0.pr_uref reaches zero.
>> To reach zero even sooner just kill enough of non-jailed processes.

Interesting.  We've been getting kernel panics in -stable but with only
one jail started at boot without being restarted.

Are you using SAS drives by any chance?  Setting ethernet polling and HZ?
How about softupdates, gmirror, and/or anything in sysctl.conf?

Roger Marquis

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:30:46 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ECA10106567A;
	Sat, 20 Aug 2011 18:30:46 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 431938FC22;
	Sat, 20 Aug 2011 18:30:45 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 19:30:11 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 19:30:11 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014675318.msg;
	Sat, 20 Aug 2011 19:30:09 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <A09E66BA62C54161965E5FDE7B097D3B@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Roger Marquis" <marquis@roble.com>, <freebsd-jail@FreeBSD.org>,
	<freebsd-stable@FreeBSD.org>
References: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
	<20110820182330.C6852106566B@hub.freebsd.org>
Date: Sat, 20 Aug 2011 19:31:11 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=response
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: 
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:30:47 -0000

----- Original Message ----- 
From: "Roger Marquis" <marquis@roble.com>
To: <freebsd-jail@FreeBSD.org>; <freebsd-stable@FreeBSD.org>
Sent: Saturday, August 20, 2011 7:10 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


>>> Repeat this enough times and prison0.pr_uref reaches zero.
>>> To reach zero even sooner just kill enough of non-jailed processes.
> 
> Interesting.  We've been getting kernel panics in -stable but with only
> one jail started at boot without being restarted.
> 
> Are you using SAS drives by any chance?  Setting ethernet polling and HZ?
> How about softupdates, gmirror, and/or anything in sysctl.conf?

If you're not restarting things it may be unrelated. No SAS; polling is
compiled in but no devices have it active, and we're using ZFS only.

Are you seeing a double fault panic?

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:35:01 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 329DB1065677
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:35:01 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.emeryville.ca.mail.comcast.net
	(qmta04.emeryville.ca.mail.comcast.net [76.96.30.40])
	by mx1.freebsd.org (Postfix) with ESMTP id 14B7A8FC14
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:35:00 +0000 (UTC)
Received: from omta09.emeryville.ca.mail.comcast.net ([76.96.30.20])
	by qmta04.emeryville.ca.mail.comcast.net with comcast
	id NiZg1h0020S2fkCA4iawSK; Sat, 20 Aug 2011 18:34:56 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta09.emeryville.ca.mail.comcast.net with comcast
	id Niar1h00Z1t3BNj8ViasCe; Sat, 20 Aug 2011 18:34:54 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id C8CCC102C1A; Sat, 20 Aug 2011 11:34:56 -0700 (PDT)
Date: Sat, 20 Aug 2011 11:34:56 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Alex Samorukov <ml@os2.kiev.ua>
Message-ID: <20110820183456.GA38317@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<4E4FF4D6.1090305@os2.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E4FF4D6.1090305@os2.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org, Dan Langille <dan@langille.org>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:35:01 -0000

On Sat, Aug 20, 2011 at 07:54:30PM +0200, Alex Samorukov wrote:
> You can run a long self-test with smartmontools (-t long), get the
> failed sector number from the self-test log (-l selftest), and then
> use dd to write zeros to that specific sector.

This is inaccurate advice.  I covered this in my reply already as well:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063665.html

Quote:

"The SMART tests you did didn't really amount to anything; no surprise.
short and long tests usually do not test the surface of the disk.  There
are some drives which do it on a long test, but as I said before,
everything varies from drive to drive."

TL;DR version: smartctl -t long  !=  smartctl -t select.

The OP's drive does not support selective scans (-t select), and long
turned up nothing (no surprise there either).  So, using dd to find the
bad LBAs is the only choice he has.
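
As a rough sketch of how the dd output already posted narrows things
down: the first error was reported after 2717 complete 1m records, so
the failing read started at byte offset 2717 * 1048576 = 2848980992,
which matches the byte count dd printed. Assuming 512-byte sectors,
that is a 2048-sector window starting at a known LBA, and a second pass
with bs=512 over just that window should report the individual bad
sectors. The helper names below are hypothetical, and the script only
prints the follow-up command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper names; assumes 512-byte sectors.
SECTOR=512

# First LBA covered by the dd record that failed, given the number of
# complete records read before the error ($1) and the dd block size
# in bytes ($2).
first_lba() {
    echo $(( $1 * $2 / SECTOR ))
}

# Number of sectors in one dd record of $1 bytes.
sectors_per_record() {
    echo $(( $1 / SECTOR ))
}

BS=$(( 1024 * 1024 ))            # bs=1m as used in the original run
start=$(first_lba 2717 "$BS")
count=$(sectors_per_record "$BS")

# Only print the follow-up command; check the device name before running it.
echo "dd if=/dev/ad2 of=/dev/null bs=$SECTOR skip=$start count=$count conv=noerror"
```

This covers the first error only; the second one was printed together
with the final totals, so it is not interpreted here.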

> I also highly recommend setting up smartd as a daemon and monitoring
> the number of reallocated sectors. If that number grows again, it is
> a good time to retire this disk.

You have to know what you're looking at and how to interpret the data
smartd gives you for it to be useful.
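
As a small illustration of reading the attribute table, here is a
sketch that pulls the raw value for one attribute ID out of
"smartctl -A"-style output. It assumes the ID is the first
whitespace-separated field and the raw value is the tenth, as in the
temperature line posted earlier in this thread; smart_raw is a
hypothetical name, not part of smartmontools.

```shell
#!/bin/sh
# Print the RAW_VALUE column for SMART attribute ID $1, reading
# attribute-table lines on stdin.
# Assumption: field 1 is the ID and field 10 is the raw value.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Sample line taken from this thread (ad2).
sample='194 Temperature_Celsius     0x0022   080   076   042    Old_age   Always       -       51'
echo "$sample" | smart_raw 194

# On a live system one would feed it real output instead, e.g.:
#   smartctl -A /dev/ad2 | smart_raw 5    # 5 = Reallocated_Sector_Ct
```

The raw number alone says little; what matters is how it changes over
time and how the normalized value compares with its threshold.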

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:36:33 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5C910106564A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:36:33 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta01.westchester.pa.mail.comcast.net
	(qmta01.westchester.pa.mail.comcast.net [76.96.62.16])
	by mx1.freebsd.org (Postfix) with ESMTP id 1D8898FC17
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:36:32 +0000 (UTC)
Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72])
	by qmta01.westchester.pa.mail.comcast.net with comcast
	id NibJ1h0021ZXKqc51icZx5; Sat, 20 Aug 2011 18:36:33 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta21.westchester.pa.mail.comcast.net with comcast
	id NicP1h01q1t3BNj3hicV8Q; Sat, 20 Aug 2011 18:36:31 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id AEA5C102C1A; Sat, 20 Aug 2011 11:36:22 -0700 (PDT)
Date: Sat, 20 Aug 2011 11:36:22 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Dan Langille <dan@langille.org>
Message-ID: <20110820183622.GA38427@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:36:33 -0000

Dan, I will respond to your reply sometime tomorrow.  I do not have time
to review the Email today (~7.7KBytes), but will have time tomorrow.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:36:58 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2ABC610656B6
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:36:58 +0000 (UTC)
	(envelope-from alc@rice.edu)
Received: from mh5.mail.rice.edu (mh5.mail.rice.edu [128.42.199.32])
	by mx1.freebsd.org (Postfix) with ESMTP id E88268FC12
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:36:57 +0000 (UTC)
Received: from mh5.mail.rice.edu (localhost.localdomain [127.0.0.1])
	by mh5.mail.rice.edu (Postfix) with ESMTP id 212CB29021B;
	Sat, 20 Aug 2011 13:20:05 -0500 (CDT)
X-Virus-Scanned: by amavis-2.6.4 at mh5.mail.rice.edu, auth channel
Received: from mh5.mail.rice.edu ([127.0.0.1])
	by mh5.mail.rice.edu (mh5.mail.rice.edu [127.0.0.1]) (amavis,
	port 10026)
	with ESMTP id VY-Q6Bmihokg; Sat, 20 Aug 2011 13:20:05 -0500 (CDT)
Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net
	(adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18])
	(using TLSv1 with cipher RC4-MD5 (128/128 bits))
	(No client certificate requested) (Authenticated sender: alc)
	by mh5.mail.rice.edu (Postfix) with ESMTPSA id 5869E2901AB;
	Sat, 20 Aug 2011 13:20:04 -0500 (CDT)
Message-ID: <4E4FFAD3.4090706@rice.edu>
Date: Sat, 20 Aug 2011 13:20:03 -0500
From: Alan Cox <alc@rice.edu>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US;
	rv:1.9.2.17) Gecko/20110620 Thunderbird/3.1.10
MIME-Version: 1.0
To: Kostik Belousov <kostikbel@gmail.com>
References: <4E4143A6.6030307@digsys.bg>
	<935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com>
	<4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
	<4E4CCA6C.8020408@ipfw.ru>
	<CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
	<20110820174147.GW17489@deviant.kiev.zoral.com.ua>
In-Reply-To: <20110820174147.GW17489@deviant.kiev.zoral.com.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: alc@freebsd.org, freebsd-stable@freebsd.org, perryh@pluto.rain.com,
	"Alexander V. Chernikov" <melifaro@ipfw.ru>, daniel@digsys.bg
Subject: Re: 32GB limit per swap device?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:36:58 -0000

On 08/20/2011 12:41, Kostik Belousov wrote:
> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
>> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov<melifaro@ipfw.ru>wrote:
>>
>>> On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
>>>
>>>> Chuck Swiger<cswiger@mac.com>   wrote:
>>>>
>>>>   On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
>>>>>> I am trying to set up 64GB partitions for swap for a system that
>>>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
>>>>>> 8-stable as of today I get:
>>>>>>
>>>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
>>>>>>
>>>>>> Is there workaround for this limitation?
>>>>>>
>>> Another interesting question:
>>>
>>> swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
>>>
>>> Block device size in passed to swaponsomething() in number of _disk_ blocks
>>>   (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of which swap
>>> pager is build) maximum objects check is enforced.
>>>
>>> The (possible) problem is that real object count we will operate on is not
>>> the value passed to swaponsomething() since it is calculated in wrong units.
>>>
>>> we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value which
>>> is rough (X / 8) so we should be able to address 32*8=256G.
>>>
>>> The code should look like this:
>>>
>>> Index: vm/swap_pager.c
>>> ===================================================================
>>> --- vm/swap_pager.c     (revision 223877)
>>> +++ vm/swap_pager.c     (working copy)
>>> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
>>>         u_long mblocks;
>>>
>>>         /*
>>> +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
>>> +        * First chop nblks off to page-align it, then convert.
>>> +        *
>>> +        * sw->sw_nblks is in page-sized chunks now too.
>>> +        */
>>> +       nblks&= ~(ctodb(1) - 1);
>>> +       nblks = dbtoc(nblks);
>>> +
>>> +       /*
>>>
>>>          * If we go beyond this, we get overflows in the radix
>>>          * tree bitmap code.
>>>          */
>>> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
>>>                         mblocks);
>>>                 nblks = mblocks;
>>>         }
>>> -       /*
>>> -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
>>> -        * First chop nblks off to page-align it, then convert.
>>> -        *
>>> -        * sw->sw_nblks is in page-sized chunks now too.
>>> -        */
>>> -       nblks&= ~(ctodb(1) - 1);
>>> -       nblks = dbtoc(nblks);
>>>
>>>         sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
>>>         sp->sw_vp = vp;
>>>
>>>
>>> (move pages recalculation before b-list check)
>>>
>>>
>>> Can someone comment on this?
>>>
>>>
>> I believe that you are correct.  Have you tried testing this change on a
>> large swap device?
> I probably agree too, but I am in the process of re-reading the swap code,
> and I do not quite believe in the limit.
>

I'm uncertain whether the current limit, "0x40000000 / 
BLIST_META_RADIX", is exact or not, but I doubt that it is too large.
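
To make the unit question concrete, here is a minimal sketch of the two orderings being discussed. It is not the kernel code: the function names are made up for illustration, and the constants (512-byte disk blocks, 4096-byte pages, a meta radix of 16) are assumptions based on the values quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE_SKETCH        512u   /* assumed disk block size */
#define PAGE_SIZE_SKETCH        4096u  /* assumed page size */
#define BLIST_META_RADIX_SKETCH 16u    /* assumed, as quoted in the thread */

/* Limit from the thread: 0x40000000 / BLIST_META_RADIX = 67108864. */
static uint64_t
swap_limit(void)
{
	return 0x40000000u / BLIST_META_RADIX_SKETCH;
}

/* Current order: clamp the count of disk blocks, then convert to pages. */
uint64_t
pages_clamp_then_convert(uint64_t disk_blocks)
{
	assert(PAGE_SIZE_SKETCH % DEV_BSIZE_SKETCH == 0);
	uint64_t per_page = PAGE_SIZE_SKETCH / DEV_BSIZE_SKETCH;

	if (disk_blocks > swap_limit())
		disk_blocks = swap_limit();
	return disk_blocks / per_page;
}

/* Proposed order: convert to pages first, then clamp the page count. */
uint64_t
pages_convert_then_clamp(uint64_t disk_blocks)
{
	assert(PAGE_SIZE_SKETCH % DEV_BSIZE_SKETCH == 0);
	uint64_t per_page = PAGE_SIZE_SKETCH / DEV_BSIZE_SKETCH;
	uint64_t pages = disk_blocks / per_page;

	if (pages > swap_limit())
		pages = swap_limit();
	return pages;
}
```

Under these assumptions a 64GB partition (134217728 disk blocks) ends up with 8388608 pages (32GB) in the first ordering and 16777216 pages (64GB) in the second; the second ordering only starts clamping at 67108864 pages (256GB).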

> When the initial code was committed, our daddr_t was 32bit, I checked
> the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression
> right now is that we only utilize the low 32bits of daddr_t.
>
> Esp. interesting looks the following typedef:
> typedef	uint32_t	u_daddr_t;	/* unsigned disk address */
> which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
>
> I wonder whether we could just use full 64bit and de-facto remove the
> limitation on the swap partition size.

I would rather argue first that the subr_blist code should not be using 
daddr_t at all.  The code is abusing daddr_t and defining u_daddr_t to 
represent things that are not disk addresses.  Instead, it should either 
define its own type or directly use (u)int*_t.  Then, as for choosing 
between 32 and 64 bits, I'm skeptical of using this structure for 
managing more than 32 bits worth of blocks, given the amount of RAM it 
will use.


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:40:24 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4C6101065672
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:40:24 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 1D7F68FC0A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:40:23 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id 73A2750A09;
	Sat, 20 Aug 2011 18:40:23 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id JG+6dJqrxBhn; Sat, 20 Aug 2011 19:40:23 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 3225F50A06  ;
	Sat, 20 Aug 2011 18:40:23 +0000 (UTC)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Dan Langille <dan@langille.org>
In-Reply-To: <20110820183622.GA38427@icarus.home.lan>
Date: Sat, 20 Aug 2011 14:40:21 -0400
Content-Transfer-Encoding: 7bit
Message-Id: <04B6AC2F-A1F5-42B9-B0D2-D2840DFE7917@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<20110820183622.GA38427@icarus.home.lan>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:40:24 -0000

On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote:

> Dan, I will respond to your reply sometime tomorrow.  I do not have time
> to review the Email today (~7.7KBytes), but will have time tomorrow.


No worries.  Thank you.

-- 
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:43:24 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5F07C106566C
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:43:24 +0000 (UTC)
	(envelope-from ml@os2.kiev.ua)
Received: from s1.sdv.com.ua (s1.sdv.com.ua [77.120.97.61])
	by mx1.freebsd.org (Postfix) with ESMTP id 19DEF8FC18
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 18:43:23 +0000 (UTC)
Received: from 90-105-243-80.cust.centrio.cz ([80.243.105.90]
	helo=[192.168.100.107])
	by s1.sdv.com.ua with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.76 (FreeBSD)) (envelope-from <ml@os2.kiev.ua>)
	id 1QuqVr-0009Nq-PW; Sat, 20 Aug 2011 21:43:21 +0300
Message-ID: <4E50003D.30803@os2.kiev.ua>
Date: Sat, 20 Aug 2011 20:43:09 +0200
From: Alex Samorukov <ml@os2.kiev.ua>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<4E4FF4D6.1090305@os2.kiev.ua>
	<20110820183456.GA38317@icarus.home.lan>
In-Reply-To: <20110820183456.GA38317@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-SA-Score: -1.0
Cc: freebsd-stable@freebsd.org, Dan Langille <dan@langille.org>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:43:24 -0000


> "The SMART tests you did didn't really amount to anything; no surprise.
> short and long tests usually do not test the surface of the disk.  There
> are some drives which do it on a long test, but as I said before,
> everything varies from drive to drive."
>
That statement is not correct, sorry. A long test tries to read all the 
data from the surface (and does some other things).

// one of the smartmontools developers and sysutils/smartmontools 
maintainer.


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 18:46:13 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8D01D10656E2;
	Sat, 20 Aug 2011 18:46:13 +0000 (UTC)
	(envelope-from melifaro@ipfw.ru)
Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 159938FC0A;
	Sat, 20 Aug 2011 18:46:13 +0000 (UTC)
Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net)
	by mail.ipfw.ru with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.76 (FreeBSD)) (envelope-from <melifaro@ipfw.ru>)
	id 1QuqYa-0005fe-Sb; Sat, 20 Aug 2011 22:46:09 +0400
Message-ID: <4E500014.6030800@ipfw.ru>
Date: Sat, 20 Aug 2011 22:42:28 +0400
From: "Alexander V. Chernikov" <melifaro@ipfw.ru>
User-Agent: Thunderbird 2.0.0.24 (X11/20100515)
MIME-Version: 1.0
To: Alan Cox <alc@rice.edu>
References: <4E4143A6.6030307@digsys.bg>
	<935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com>
	<4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
	<4E4CCA6C.8020408@ipfw.ru>
	<CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
	<20110820174147.GW17489@deviant.kiev.zoral.com.ua>
	<4E4FFAD3.4090706@rice.edu>
In-Reply-To: <4E4FFAD3.4090706@rice.edu>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Kostik Belousov <kostikbel@gmail.com>, alc@freebsd.org,
	perryh@pluto.rain.com, freebsd-stable@freebsd.org, daniel@digsys.bg
Subject: Re: 32GB limit per swap device?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 18:46:13 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alan Cox wrote:
> On 08/20/2011 12:41, Kostik Belousov wrote:
>> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
>>> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V.
>>> Chernikov<melifaro@ipfw.ru>wrote:
>>>
>>>> On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
>>>>
>>>>> Chuck Swiger<cswiger@mac.com>   wrote:
>>>>>
>>>>>   On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
>>>>>>> I am trying to set up 64GB partitions for swap for a system that
>>>>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
>>>>>>> 8-stable as of today I get:
>>>>>>>
>>>>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
>>>>>>>
>>>>>>> Is there workaround for this limitation?
>>>>>>>
>>>> Another interesting question:
>>>>
>>>> swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
>>>>
>>>> Block device size in passed to swaponsomething() in number of _disk_
>>>> blocks
>>>>   (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of
>>>> which swap
>>>> pager is build) maximum objects check is enforced.
>>>>
>>>> The (possible) problem is that real object count we will operate on
>>>> is not
>>>> the value passed to swaponsomething() since it is calculated in
>>>> wrong units.
>>>>
>>>> we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value
>>>> which
>>>> is rough (X / 8) so we should be able to address 32*8=256G.
>>>>
>>>> The code should look like this:
>>>>
>>>> Index: vm/swap_pager.c
>>>> ===================================================================
>>>> --- vm/swap_pager.c     (revision 223877)
>>>> +++ vm/swap_pager.c     (working copy)
>>>> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id,
>>>> u_long
>>>>         u_long mblocks;
>>>>
>>>>         /*
>>>> +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
>>>> chunks.
>>>> +        * First chop nblks off to page-align it, then convert.
>>>> +        *
>>>> +        * sw->sw_nblks is in page-sized chunks now too.
>>>> +        */
>>>> +       nblks&= ~(ctodb(1) - 1);
>>>> +       nblks = dbtoc(nblks);
>>>> +
>>>> +       /*
>>>>
>>>>          * If we go beyond this, we get overflows in the radix
>>>>          * tree bitmap code.
>>>>          */
>>>> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id,
>>>> u_long
>>>>                         mblocks);
>>>>                 nblks = mblocks;
>>>>         }
>>>> -       /*
>>>> -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
>>>> chunks.
>>>> -        * First chop nblks off to page-align it, then convert.
>>>> -        *
>>>> -        * sw->sw_nblks is in page-sized chunks now too.
>>>> -        */
>>>> -       nblks&= ~(ctodb(1) - 1);
>>>> -       nblks = dbtoc(nblks);
>>>>
>>>>         sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
>>>>         sp->sw_vp = vp;
>>>>
>>>>
>>>> (move pages recalculation before b-list check)
>>>>
>>>>
>>>> Can someone comment on this?
>>>>
>>>>
>>> I believe that you are correct.  Have you tried testing this change on a
>>> large swap device?
I will try tomorrow.

>> I probably agree too, but I am in the process of re-reading the swap
>> code,
>> and I do not quite believe in the limit.
>>
> 
> I'm uncertain whether the current limit, "0x40000000 /
> BLIST_META_RADIX", is exact or not, but I doubt that it is too large.

It is not exact.  It is a rough estimate of
sizeof(blmeta_t) * X < 4G (blist_create() assumes that malloc() cannot
allocate more than 4G; I'm not sure whether that is still true these days),
where X is the number of blocks we need to store.  The actual number,
however, is X / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...),
but it does not differ from X very much.

A blist can be seen as a tree of radix trees, with the metainformation for
all those radix trees allocated in a single allocation, which imposes this
limit.  The metainformation is used to find free blocks more quickly.

A single linear allocation is required so that advancing to the next
radix tree on the same level is very fast:


*   *   *   *   *
**  **  **  **  **
********************
^^^
A rough diagram with 3 levels in the tree and BLIST_META_RADIX=2 (instead
of 16).
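
As a rough illustration of how the metadata grows with the number of blocks, the sketch below counts the nodes of a simplified layered tree. It is only an estimate under stated assumptions, not the real subr_blist.c layout: the function names are hypothetical, each leaf is assumed to cover 32 blocks, each meta node is assumed to have 16 children, and the node size is a parameter because it depends on the width of the types used.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the nodes of a simplified layered radix tree covering `blocks`
 * blocks: each leaf tracks `leaf_radix` blocks and each meta node has
 * `meta_radix` children.  Levels are added until a single root remains.
 */
uint64_t
blist_nodes_estimate(uint64_t blocks, uint64_t leaf_radix, uint64_t meta_radix)
{
	assert(leaf_radix > 0 && meta_radix > 1);
	if (blocks == 0)
		return 0;

	uint64_t level = (blocks + leaf_radix - 1) / leaf_radix;  /* leaves */
	uint64_t total = level;

	while (level > 1) {
		level = (level + meta_radix - 1) / meta_radix;  /* parents */
		total += level;
	}
	return total;
}

/* Bytes of metadata for an assumed leaf radix of 32 and meta radix of 16. */
uint64_t
blist_bytes_estimate(uint64_t blocks, uint64_t node_size)
{
	return blist_nodes_estimate(blocks, 32, 16) * node_size;
}
```

With radix 16 the upper levels add only about 1/15 on top of the leaf count (1/16 + 1/256 + ...), so the total is dominated by the leaves. In this simplified model the memory use scales linearly with the node size, so doubling the node size doubles the total.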


> 
>> When the initial code was committed, our daddr_t was 32bit, I checked
>> the RELENG_4 sources. Current code uses int64_t for daddr_t. My
>> impression
>> right now is that we only utilize the low 32bits of daddr_t.
>>
>> Esp. interesting looks the following typedef:
>> typedef    uint32_t    u_daddr_t;    /* unsigned disk address */
>> which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
>>
>> I wonder whether we could just use full 64bit and de-facto remove the
>> limitation on the swap partition size.

This would double the size of struct blmeta_t and cause 2*X memory usage
for every swap configuration.

> 
> I would rather argue first that the subr_list code should not be using
> daddr_t all.  The code is abusing daddr_t and defining u_daddr_t to
> represent things that are not disk addresses.  Instead, it should either
> define its own type or directly use (u)int*_t.  Then, as for choosing
> between 32 and 64 bits, I'm skeptical of using this structure for
> managing more than 32 bits worth of blocks, given the amount of RAM it
> will use.
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5QABQACgkQwcJ4iSZ1q2kdXwCfWPN48wauijoGOQCUaalYnFCR
BIgAnRLCuDmPwySp1gd0xf+UPG5nC7KJ
=sP6M
-----END PGP SIGNATURE-----

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 19:17:31 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D7EF61065672
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 19:17:31 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta13.emeryville.ca.mail.comcast.net
	(qmta13.emeryville.ca.mail.comcast.net [76.96.27.243])
	by mx1.freebsd.org (Postfix) with ESMTP id BC2748FC15
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 19:17:31 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
	by qmta13.emeryville.ca.mail.comcast.net with comcast
	id NjED1h0011bwxycADjHTjs; Sat, 20 Aug 2011 19:17:27 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta18.emeryville.ca.mail.comcast.net with comcast
	id NjGv1h0061t3BNj8ejGxAZ; Sat, 20 Aug 2011 19:16:57 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id B2DDA102C1A; Sat, 20 Aug 2011 12:17:26 -0700 (PDT)
Date: Sat, 20 Aug 2011 12:17:26 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Alex Samorukov <ml@os2.kiev.ua>
Message-ID: <20110820191726.GA39027@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<4E4FF4D6.1090305@os2.kiev.ua>
	<20110820183456.GA38317@icarus.home.lan>
	<4E50003D.30803@os2.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E50003D.30803@os2.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org, Dan Langille <dan@langille.org>
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 19:17:31 -0000

On Sat, Aug 20, 2011 at 08:43:09PM +0200, Alex Samorukov wrote:
> 
> >"The SMART tests you did didn't really amount to anything; no surprise.
> >short and long tests usually do not test the surface of the disk.  There
> >are some drives which do it on a long test, but as I said before,
> >everything varies from drive to drive."
> >
> It is not correct statement, sorry. Long test trying to read all the
> data from surface (and doing some other things).
>
> // one of the smartmontools developers and sysutils/smartmontools
> maintainer.

That's great, but too bad it's generally not true in practise.  Dan's
long scan on his site proves it, and I've dealt with this situation
myself many times over.

SMART long tests *may* do a surface scan, but in most cases they just
seem to do something that's similar to "short" but over a longer period
of time.  Furthermore, some which *do* do a surface scan on a "long"
test don't always report LBA failures in the self-test log.  I've
personally seen this happen on Western Digital disks (model strings are
unknown, I'm certain I've rid myself of those drives).  Firmware
bug/quirk?  Possibly, but at the end of the day it doesn't matter -- it
means the end-user has wasted 2-3 hours for something that tests OK yet
we know for a fact isn't OK.

I *have* seen a drive do a surface scan on a "long" test and report LBAs
it couldn't read, but as I said, it's rare and varies from vendor to
vendor, drive to drive, and firmware to firmware.  When it happened I
was very, very surprised (and delighted).

The only thing I can trust 100% of the time when it comes to surface
scans is SMART selective scans (if available; again, the OP's drive
does not offer this), or using dd or a read-per-LBA on the OS level
(which works everywhere).
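
For what a read-per-LBA pass on the OS level can look like, here is a minimal sketch. The function name is hypothetical and it is not part of smartmontools or FreeBSD; it assumes that an unreadable sector surfaces as EIO from pread(), that reading at or past the end returns 0, and that sectors are 512 bytes when printing an LBA. On a raw disk device the chunk size should be a multiple of the sector size.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read `path` sequentially in `block_size`-byte chunks and report every
 * chunk that fails with EIO.  Returns the number of unreadable chunks,
 * or -1 if the path cannot be opened or a non-EIO error occurs.
 */
long
count_unreadable_blocks(const char *path, size_t block_size)
{
	assert(block_size > 0);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	char *buf = malloc(block_size);
	if (buf == NULL) {
		close(fd);
		return -1;
	}

	long bad = 0;
	off_t off = 0;
	for (;;) {
		ssize_t n = pread(fd, buf, block_size, off);
		if (n == 0)
			break;			/* end of device or file */
		if (n < 0) {
			if (errno != EIO) {	/* not a media error: give up */
				bad = -1;
				break;
			}
			/* LBA printed assuming 512-byte sectors. */
			fprintf(stderr, "unreadable: offset %lld (LBA %lld)\n",
			    (long long)off, (long long)(off / 512));
			bad++;
			off += (off_t)block_size;	/* skip the bad chunk */
			continue;
		}
		off += n;			/* advance by what was read */
	}

	free(buf);
	close(fd);
	return bad;
}
```

A smaller chunk size narrows down the failing LBA at the cost of a slower pass; running it once with a large chunk and again with 512 bytes around the reported offsets is one way to balance the two.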

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 19:17:39 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8383A1065700;
	Sat, 20 Aug 2011 19:17:39 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id E54DF8FC1B;
	Sat, 20 Aug 2011 19:17:38 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7KJHQON072993
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 20 Aug 2011 22:17:26 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	p7KJHQxE011642; Sat, 20 Aug 2011 22:17:26 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7KJHQoC011641; 
	Sat, 20 Aug 2011 22:17:26 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sat, 20 Aug 2011 22:17:26 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: "Alexander V. Chernikov" <melifaro@ipfw.ru>
Message-ID: <20110820191726.GY17489@deviant.kiev.zoral.com.ua>
References: <4E4143A6.6030307@digsys.bg>
	<935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com>
	<4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com>
	<4E4CCA6C.8020408@ipfw.ru>
	<CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
	<20110820174147.GW17489@deviant.kiev.zoral.com.ua>
	<4E4FFAD3.4090706@rice.edu> <4E500014.6030800@ipfw.ru>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="fXc9gqH37d6mfFz8"
Content-Disposition: inline
In-Reply-To: <4E500014.6030800@ipfw.ru>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: alc@freebsd.org, freebsd-stable@freebsd.org, daniel@digsys.bg,
	perryh@pluto.rain.com, Alan Cox <alc@rice.edu>
Subject: Re: 32GB limit per swap device?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 19:17:39 -0000


--fXc9gqH37d6mfFz8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Aug 20, 2011 at 10:42:28PM +0400, Alexander V. Chernikov wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Alan Cox wrote:
> > On 08/20/2011 12:41, Kostik Belousov wrote:
> >> On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
> >>> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V.
> >>> Chernikov<melifaro@ipfw.ru>wrote:
> >>>
> >>>> On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
> >>>>
> >>>>> Chuck Swiger<cswiger@mac.com>   wrote:
> >>>>>
> >>>>>   On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
> >>>>>>> I am trying to set up 64GB partitions for swap for a system that
> >>>>>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
> >>>>>>> 8-stable as of today I get:
> >>>>>>>
> >>>>>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
> >>>>>>>
> >>>>>>> Is there workaround for this limitation?
> >>>>>>>
> >>>> Another interesting question:
> >>>>
> >>>> swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
> >>>>
> >>>> Block device size in passed to swaponsomething() in number of _disk_
> >>>> blocks
> >>>>   (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of
> >>>> which swap
> >>>> pager is build) maximum objects check is enforced.
> >>>>
> >>>> The (possible) problem is that real object count we will operate on
> >>>> is not
> >>>> the value passed to swaponsomething() since it is calculated in
> >>>> wrong units.
> >>>>
> >>>> we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value
> >>>> which
> >>>> is rough (X / 8) so we should be able to address 32*8=256G.
> >>>>
> >>>> The code should look like this:
> >>>>
> >>>> Index: vm/swap_pager.c
> >>>> ===================================================================
> >>>> --- vm/swap_pager.c     (revision 223877)
> >>>> +++ vm/swap_pager.c     (working copy)
> >>>> @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id,
> >>>> u_long
> >>>>         u_long mblocks;
> >>>>
> >>>>         /*
> >>>> +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
> >>>> chunks.
> >>>> +        * First chop nblks off to page-align it, then convert.
> >>>> +        *
> >>>> +        * sw->sw_nblks is in page-sized chunks now too.
> >>>> +        */
> >>>> +       nblks&= ~(ctodb(1) - 1);
> >>>> +       nblks = dbtoc(nblks);
> >>>> +
> >>>> +       /*
> >>>>
> >>>>          * If we go beyond this, we get overflows in the radix
> >>>>          * tree bitmap code.
> >>>>          */
> >>>> @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id,
> >>>> u_long
> >>>>                         mblocks);
> >>>>                 nblks = mblocks;
> >>>>         }
> >>>> -       /*
> >>>> -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
> >>>> chunks.
> >>>> -        * First chop nblks off to page-align it, then convert.
> >>>> -        *
> >>>> -        * sw->sw_nblks is in page-sized chunks now too.
> >>>> -        */
> >>>> -       nblks &= ~(ctodb(1) - 1);
> >>>> -       nblks = dbtoc(nblks);
> >>>>
> >>>>         sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
> >>>>         sp->sw_vp =3D vp;
> >>>>
> >>>>
> >>>> (move pages recalculation before b-list check)
> >>>>
> >>>>
> >>>> Can someone comment on this?
> >>>>
> >>>>
> >>> I believe that you are correct.  Have you tried testing this change on a
> >>> large swap device?
> I will try tomorrow.
>
> >> I probably agree too, but I am in the process of re-reading the swap
> >> code,
> >> and I do not quite believe in the limit.
> >>
> >
> > I'm uncertain whether the current limit, "0x40000000 /
> > BLIST_META_RADIX", is exact or not, but I doubt that it is too large.
>
> It is not exact.  It is a rough estimate of
> sizeof(blmeta_t) * X < 4G (blist_create() assumes malloc() is not
> able to allocate more than 4G; I'm not sure if that is true these days).
> X is the number of blocks we need to store. The actual number, however, is X
> / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...), but it does not
> differ from X very much.
>
> blist can be seen as a tree of radix trees, with metainformation for all
> those radix trees allocated by a single allocation, which imposes this
> limit. Metainformation is used to find free blocks more quickly.
>
> A single linear allocation is required to advance to the next radix tree on
> the same level very fast:
>
>
> *   *   *   *   *
> **  **  **  **  **
> ********************
> ^^^
> Some kind of schema with 3 levels in the tree and BLIST_META_RADIX=2 (instead
> of 16).
>
>
>
> >
> >> When the initial code was committed, our daddr_t was 32bit, I checked
> >> the RELENG_4 sources. Current code uses int64_t for daddr_t. My
> >> impression
> >> right now is that we only utilize the low 32bits of daddr_t.
> >>
> >> Esp. interesting looks the following typedef:
> >> typedef    uint32_t    u_daddr_t;    /* unsigned disk address */
> >> which (correctly) means that typical mask (u_daddr_t)-1 is 0xffffffff.
> >>
> >> I wonder whether we could just use full 64bit and de-facto remove the
> >> limitation on the swap partition size.
>
> This will increase struct blmeta_t twice and cause 2*X memory usage for
> every swap configuration.
No, daddr_t is already 64bit. Nothing will increase.
My point is the current limitation is artificial.

I think Alan's note referred to the number of radix tree nodes
required to cover a large swap partition. But it could be a good
temporary measure.

I expect to be able to provide some numeric evidence later.
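
For reference, the arithmetic behind the limit discussed above can be
sketched as follows.  This is only an illustration in Python, not kernel
code.  It assumes BLIST_META_RADIX=16, DEV_BSIZE=512 and PAGE_SIZE=4k as
quoted in this thread, and max_swap_bytes() is a made-up helper name.

```python
# Values quoted in this thread; treat them as assumptions for other systems.
BLIST_META_RADIX = 16
DEV_BSIZE = 512        # bytes per disk block
PAGE_SIZE = 4096       # bytes per page on common architectures
GIB = 1 << 30


def max_swap_bytes(unit_size):
    """Largest swap size if the b-list limit is applied to units of unit_size."""
    mblocks = 0x40000000 // BLIST_META_RADIX   # 67108864, as in the warning
    return mblocks * unit_size


if __name__ == "__main__":
    # Limit checked against DEV_BSIZE blocks (check before the conversion).
    print(max_swap_bytes(DEV_BSIZE) // GIB, "GiB")
    # Limit checked against pages (check after the conversion, as in the patch).
    print(max_swap_bytes(PAGE_SIZE) // GIB, "GiB")
```

So the same 67108864-unit cap covers 32G when counted in disk blocks and
256G when counted in pages, which is the factor of 8 mentioned above.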
>
> >
> > I would rather argue first that the subr_blist code should not be using
> > daddr_t at all.  The code is abusing daddr_t and defining u_daddr_t to
> > represent things that are not disk addresses.  Instead, it should either
> > define its own type or directly use (u)int*_t.  Then, as for choosing
> > between 32 and 64 bits, I'm skeptical of using this structure for
> > managing more than 32 bits worth of blocks, given the amount of RAM it
> > will use.
> >
> >
> >
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.14 (FreeBSD)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk5QABQACgkQwcJ4iSZ1q2kdXwCfWPN48wauijoGOQCUaalYnFCR
> BIgAnRLCuDmPwySp1gd0xf+UPG5nC7KJ
> =sP6M
> -----END PGP SIGNATURE-----

--fXc9gqH37d6mfFz8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk5QCEYACgkQC3+MBN1Mb4g3VQCfYlGrzdJOUw3Z2pL0mAWpb9fK
6hsAoLHoHVBteVjYBCRBEfRGCbACp6HU
=BGLI
-----END PGP SIGNATURE-----

--fXc9gqH37d6mfFz8--

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 19:44:46 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E922D106566B
	for <stable@freebsd.org>; Sat, 20 Aug 2011 19:44:46 +0000 (UTC)
	(envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 839148FC0C
	for <stable@freebsd.org>; Sat, 20 Aug 2011 19:44:46 +0000 (UTC)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id F29FC153434
	for <stable@freebsd.org>; Sat, 20 Aug 2011 21:44:44 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new,
	port 10024) with ESMTP id R7qEMSDvelXP for <stable@freebsd.org>;
	Sat, 20 Aug 2011 21:44:43 +0200 (CEST)
Received: from [IPv6:2001:4cb8:3:1:c02b:ce62:71ff:9cbc] (unknown
	[IPv6:2001:4cb8:3:1:c02b:ce62:71ff:9cbc])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.digiware.nl (Postfix) with ESMTPSA id 1BD16153433
	for <stable@freebsd.org>; Sat, 20 Aug 2011 21:44:43 +0200 (CEST)
Message-ID: <4E500EAE.10005@digiware.nl>
Date: Sat, 20 Aug 2011 21:44:46 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:6.0) Gecko/20110812 Thunderbird/6.0
MIME-Version: 1.0
To: "stable@freebsd.org" <stable@freebsd.org>
References: <4E4F973D.9070706@digiware.nl> <4E4F99E4.8060009@digiware.nl>
In-Reply-To: <4E4F99E4.8060009@digiware.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: Remote installing
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 19:44:47 -0000

On 20-8-2011 13:26, Willem Jan Withagen wrote:
> On 2011-08-20 13:15, Willem Jan Withagen wrote:
>> Hi,
>>
>> Today I liked to live dangerously, and want to upgrade a backup server
>> from i386 to amd64. Just to see if we could.
>> And otherwise I'd scrap it and install from usb-stick.
>>
>> So I have my server running amd64 build GENERIC.
>> export /, /var, /usr on the server to be upgraded.
>>
>> But upgrading world does have a snag already early on:
>>
>> ----
>> empty changed
>> flags expected "schg" found "none" not modified: Operation not supported
>> ----
>>
>> This is probably where some program wants to set the immutable flag on
>> /var/tmp/empy...
>>
>> But it looks like NFS does not grok that.
>>
>> Now I've seen plenty of suggestions to do it this way, but never saw anybody
>> come back with this complaint....
>>
>> So I must be omitting something ??
>
> I looked at the work errors.
> -----------
> cd /mnt/; rm -f /mnt/sys; ln -s usr/src/sys sys
> cd /mnt/usr/share/man/en.ISO8859-1; ln -sf ../man* .
> ln: ./man1: Permission denied
> ln: ./man1aout: Permission denied
> ln: ./man2: Permission denied
> ln: ./man3: Permission denied
> ln: ./man4: Permission denied
> ln: ./man5: Permission denied
> ln: ./man6: Permission denied
> ln: ./man7: Permission denied
> ln: ./man8: Permission denied
> ln: ./man9: Permission denied
> ---------
>
> Which comes from the target distrib-dirs in etc
>
> Why would an ln -sf like that fail....
> the filesystems are exported with -maproot=0

Well, it turned out that the easiest fix was to run
	chflags -R noschg /
on the client, because certain files are immutable and once you run into
those, it is hard to fix it after the fact.

Next would be to move /lib and /usr/lib out of the way, so that they don't
cause conflicts in the near future.
That will cause new programs to start to fail, so better make sure
that everything is set before you start upgrading over NFS.

But I did manage to get it "upgraded" from i386 to amd64.

--WjW

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 19:57:05 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 01657106564A
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 19:57:05 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.emeryville.ca.mail.comcast.net
	(qmta07.emeryville.ca.mail.comcast.net [76.96.30.64])
	by mx1.freebsd.org (Postfix) with ESMTP id DA1778FC16
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 19:57:04 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
	by qmta07.emeryville.ca.mail.comcast.net with comcast
	id Njss1h0011bwxycA7jx06k; Sat, 20 Aug 2011 19:57:00 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta18.emeryville.ca.mail.comcast.net with comcast
	id NjwX1h0051t3BNj8ejwXUX; Sat, 20 Aug 2011 19:56:31 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id A4ACA102C1A; Sat, 20 Aug 2011 12:57:02 -0700 (PDT)
Date: Sat, 20 Aug 2011 12:57:02 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Dan Langille <dan@langille.org>
Message-ID: <20110820195702.GA39109@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 19:57:05 -0000

Dan, sorry for the previous mail.  Seems my schedule today has just
unexpectedly changed; I had social events to deal with but as I found out
a few minutes ago those events are cancelled, which means I have time
today to look at your mail.

On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
> > The SMART error log also indicates an LBA failure at the 26000 hour mark
> > (which is 16 hours prior to when you did smartctl -a /dev/ad2).  Whether
> > that LBA is the remapped one or the suspect one is unknown.  The LBA was
> > 5566440.
> > 
> > The SMART tests you did didn't really amount to anything; no surprise.
> > short and long tests usually do not test the surface of the disk.  There
> > are some drives which do it on a long test, but as I said before,
> > everything varies from drive to drive.
> > 
> > Furthermore, on this model of drive, you cannot do surface scans via
> > SMART.  Bummer.  That's indicated in the "Offline data collection
> > capabilities" section at the top, where it reads:
> > 
> > 	No Selective Self-test supported.
> > 
> > So you'll have to use the dd method.  This takes longer than if surface
> > scanning was supported by the drive, but is acceptable.  I'll get to how
> > to go about that in a moment.
> 
> FWIW, I've done a dd read of the entire suspect disk already.  Just two errors.

Actually one error -- keep reading.

> From the URL mentioned above:
> 
> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
> dd: /dev/ad2: Input/output error
> 2717+0 records in
> 2717+0 records out
> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
> dd: /dev/ad2: Input/output error
> 38170+1 records in
> 38170+1 records out
> 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
> [root@bast:~] # 
> 
> That seems to indicate two problems.  Are those the values I should be using 
> with dd?

The "values" you refer to are byte offsets, not LBAs.  Furthermore, you
used a block size of 1 megabyte (not sure why people keep doing this).
LBA size on your drive is 512 bytes; asking for 1 megabyte in dd is
going to make the drive try to read() 1MByte, and an I/O error could
happen anywhere within that 1MByte range.  (1024*1024) / 512 == 2048
LBAs make up 1MByte.
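
To make that concrete, here is a small sketch (Python, purely for the
arithmetic) that turns "N complete records at block size bs" into the
range of LBAs the failing read could have covered.  The helper name
lba_range() is made up for illustration; the 512-byte sector size is the
one stated above.

```python
SECTOR = 512  # bytes per LBA on this drive, as stated above


def lba_range(records_ok, bs, sector=SECTOR):
    """Return (first, last) LBA of the dd block that follows records_ok good blocks."""
    start_byte = records_ok * bs
    first = start_byte // sector
    last = (start_byte + bs) // sector - 1
    return first, last


if __name__ == "__main__":
    # First error in the bs=1m run: 2717 complete records were read before it.
    first, last = lba_range(2717, 1024 * 1024)
    print(first, last)
    # The LBA from the SMART error log falls inside that 2048-LBA window.
    print(first <= 5566440 <= last)
```

In other words, with bs=1m all dd can tell you is that the bad LBA is
somewhere in a window of 2048 LBAs starting at byte offset 2848980992.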

Next, remember that the "noerror" attribute has some quirks associated
with it that need to be kept in mind.  The man page discusses these.

Finally, I believe the last I/O error you see (at byte 40025063424) is
normal given what you told dd to do.  It was trying to use bs=1m, and
your drive has a capacity limit of 40027029504 bytes.  I'm left to
believe you had a "short read" (less than 1MByte), so this is normal.
40027029504 / (1024*1024) == 38172.75, which is not a whole number,
hence the error.
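
The same division as a tiny sketch, in case anyone wants to check their
own drive.  The function name is hypothetical; the capacity is the figure
quoted above.

```python
def full_blocks_and_tail(capacity_bytes, bs):
    """Split a device capacity into whole bs-sized blocks plus leftover bytes."""
    return divmod(capacity_bytes, bs)


if __name__ == "__main__":
    blocks, tail = full_blocks_and_tail(40027029504, 1024 * 1024)
    # Whole 1MByte blocks, then the size of the final short block in bytes.
    print(blocks, tail)
```

A non-zero tail means the last read at that block size has to come up
short.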

> I did some more precise testing:
> 
> # time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440
> dd: /dev/ad2: Input/output error
> 9+0 records in
> 9+0 records out
> 4608 bytes transferred in 5.368668 secs (858 bytes/sec)
> 
> real	0m5.429s
> user	0m0.000s
> sys	0m0.010s
> 
> NOTE: that's 9 blocks later than mentioned in smartctl
> 
> The above generated this in /var/log/messages:
> 
> Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=5566449

Your dd command above is saying "use a block size of 512 bytes, and read
indefinitely from /dev/ad2, starting after a seek to block 5566440 (iseek
counts in units of bs)".  You then get an I/O error "somewhere" between
where you start and where the device ends.  You're assuming that the
"number of bytes transferred" indicates where the actual error happened,
which in my experience is not always true.

What really needs to happen here is use of count=1, and you adjusting
iseek manually per each LBA.  Or you could use the script I wrote and
let the computer do it for you.  :-)
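
For the curious, the "one LBA at a time" idea can be sketched roughly like
this.  To be clear, this is NOT the bad_block_scan script, just a minimal
read-only illustration in Python; scan_lbas() and its arguments are names
made up for this example, and it assumes 512-byte sectors.

```python
import os


def scan_lbas(path, first_lba, count, sector=512):
    """Read `count` sectors one at a time; return the LBAs whose read failed."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for lba in range(first_lba, first_lba + count):
            try:
                # Equivalent in spirit to: dd bs=512 count=1 iseek=<lba>
                data = os.pread(fd, sector, lba * sector)
            except OSError:
                bad.append(lba)   # the read failed; remember this LBA
                continue
            if len(data) < sector:
                break             # ran off the end of the device or file
    finally:
        os.close(fd)
    return bad
```

Something like scan_lbas("/dev/ad2", 5566440, 64) would then list the
unreadable LBAs in that window only.  For what it's worth, in your run 9
records were read before the error, and 5566440 + 9 = 5566449 is the LBA
the kernel logged, so the byte count happened to line up this time.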

I understand what you're getting at, re: "that's 9 blocks later".  But
the OS does some caching of I/O and so on sometimes, or aggregates
block reads larger than physical LBA size, so that may be what's going
on here.  However, if you keep reading, you might find your answer is
that you may (still unsure) have other LBAs which are now marked suspect.

> > That said:
> > 
> > http://jdc.parodius.com/freebsd/bad_block_scan
> > 
> > If you run this on your ad2 drive, I'm hoping what you'll find are two
> > LBAs which can't be read -- one will be the remapped LBA and one will be
> > the "suspect" LBA.  If you only get one LBA error then that's fine too,
> > and will be the "suspect" LBA.
> 
> > Once you have the LBA(s), you can submit writes to them to get the drive
> > to re-analyse them (assuming they're "suspect"):
> > 
> > dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=NNNNN
> > 
> > Where XXX is the device and NNNNN is the LBA number.
> > 
> > If this works properly, the dd command should sit there for a little bit
> > (as the drive does its re-analysis magic) and then should complete.
> 
> ad2 is part of a gmirror with ad0.   Does this change things?
> 
> I haven't tried the dd yet.

It does not change things, but I don't know what's going to happen if
you do write commands to the device directly while the drive is still
attached in gmirror.

When I encounter a disk that's behaving like this, I immediately remove
it from the pool/mirror so I can work on it.  I do not trust the OS to
do things like not panic/crash/behave weirdly when doing these things.

> > You'll want to check SMART stats after that; you should see
> > Current_Pending_Sector drop to 0.  If Offline_Uncorrectable incremented
> > then the LBA could not be re-read/remapped.
> 
> It did increment:
> 
> 197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       2
> 
> [was 1]

What this means is that you have *another* LBA the drive found and
marked suspect.  This could have happened any time; possibly during the
above dd you did, possibly during normal read operation (assuming the
drive is still handling I/O as part of your mirror).

> >  If Reallocated_Sector_Ct
> > incremented then you now have a total of 2 LBAs which are remapped.
> 
> It did increment:
> 
> $ diff smarctl.1 smarctl.3 | grep Reallocated_Sector_Ct
> <   5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       1
> >   5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       2
>
> Full output of smartctl has been appended to http://beta.freebsddiary.org/smart-fixing-bad-sector.php

But you didn't issue any writes to the drive (quote: "I haven't tried
the dd yet"), so I cannot explain why this attribute would increment.
Unless you *did* try the dd?  I don't know; there's not enough
information here for me to ascertain what may have happened between this
paragraph and a couple paragraphs up.

To me, this looks like a write to the drive was issued either manually
(with the dd or if the drive is still in use for I/O by gmirror) and
happened to hit an LBA which was previously marked suspect -- and
induced a remap.

Alternately -- and this is just as plausible as what I just described --
the drive may have a firmware quirk/bug/behavioural difference from what
I'm used to, where Current_Pending_Sector acts as a counter (e.g. it
will never reset to zero).  Maxtor "should" be using
Reallocated_Event_Count for this (since that's what it's for; it
indicates failures OR successes), but as I've said time and time again,
the behaviour varies from drive to drive, model to model, and firmware
to firmware.

Also alternatively, there's the whole "smartctl -t offline" ordeal which
might update the attribute data, but it's labelled Old_age not Offline,
so I don't think this would be the case (unless there's a bug in the
firmware or mislabeling of the attribute in the firmware for this drive).

The thing about bad LBAs is that they often come in groups/bunches; dust
on the drive, some region loses its magnetic integrity, etc...  Your
drive is "old" (27416 hours = 1142 days = 3.1 years) so it's
understandable IMO.

The only way to know for sure would be to do a surface scan on the drive
and see if any more I/O errors show up.  If they do, I would recommend
just writing zeros from LBA 0 all the way to the end of the drive, then
afterward see what the SMART attributes look like.  "dd if=/dev/zero
of=/dev/ad2 bs=64k" would do the trick (in this case 'bs' doesn't matter
since all you're trying to do is zero the drive; doesn't matter if
writes get aggregated or not).

> > In
> > the case of remapping, you get to deal with the UFS/FFS thing above.
> > To get the stats to update in this situation you *might* (but probably
> > not) have to run "smartctl -t offline /dev/XXX".
> 
> I didn't try that...
> 
> > You might also be wondering "that dd command writes 512 bytes of zero to
> > that LBA; what about the old data that was there, in the case that the
> > drive remaps the LBA?"  This is a great question, and one I've never
> > actually taken the time to answer because at this present time I have
> > absolutely *no* bad disks in my possession.  I'm under the impression
> > that the write does in fact write zeros if the LBA is remapped, but that
> > might not be true at all.  I've been waiting to test this for quite some
> > time and document it/write about it.
> > 
> > I still suggest you replace the drive, although given its age I doubt
> > you'll be able to find a suitable replacement.  I tend to keep disks
> > like this around for testing/experimental purposes and not for actual
> > use.
> 
> I have several unused 80GB HDD I can place into this system.  I think that's
> what I'll wind up doing.  But I'd like to follow this process through and get it documented
> for future reference.

Yes, given the behaviour of the drive I would recommend you simply
replace it at this point in time.  What concerns me the most is
Current_Pending_Sector incrementing, but it's impossible for me to
determine if that incrementing means there are other LBAs which are bad,
or if the drive is behaving how its firmware is designed.

Keep the drive around for further experiments/tinkering if you're
interested.  Stuff like this is always interesting/fun as long as your
data isn't at risk, so doing the replacement first would be best
(especially if both drives in your mirror were bought at the same time
from the same place and have similar manufacturing plants/dates on
them).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 20:00:19 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B79E2106566B;
	Sat, 20 Aug 2011 20:00:17 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id C2F038FC0C;
	Sat, 20 Aug 2011 20:00:16 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 20:59:42 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 20:59:41 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014676027.msg;
	Sat, 20 Aug 2011 20:59:41 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org>
	<4E4F8821.80108@Fre eBSD.org>
	<82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk>
	<4E4FE55A.9000101@ FreeBSD.org>
Date: Sat, 20 Aug 2011 21:01:00 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 20:00:20 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> thanks for doing this!  I'll reiterate my suspicion just in case - I think that
> you should look for the cases where you stop a jail, but then re-attach and
> resurrect the jail before it's completely dead.

Yeah, that's where I think it's happening too, but I also suspect it's not just
a dying jail that's needed; I think it's a dying jail in the final stages of
cleanup.

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

Take the scenario of a simple one-level prison setup running a single process,
where the prison has just been stopped.

In the above code pr_uref of the process's prison is decremented. As this is the
last process, pr_uref will hit 0 and the loop continues instead of breaking
early.

Now at the end of the loop iteration the mtx is unlocked, so other processes can
now manipulate the jail; this is where I think the problem may be.

If we now have another process come in and attach to the jail but then instantly
exit, this process may allow another kernel thread to hit this same bit of code,
and so two processes for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow, where 1# = process1
and 2# = process2:
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)

It seems like the action on the parent prison to decrement the pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jail's mtx, so causing a race condition.
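
To illustrate the flow above, here is a rough model in Python. It is only a
sketch of my hypothesis, not the kernel code: the Prison class, deuref() and
replay_flow() are made-up names, the starting counts are assumed, and it assumes
that attaching to the dying jail only bumps the child's pr_uref.

```python
class Prison:
    """Just the fields the flow above touches (hypothetical model)."""

    def __init__(self, name, pr_uref, pr_parent=None):
        self.name = name
        self.pr_uref = pr_uref
        self.pr_parent = pr_parent


def deuref(pr):
    """Mimic the PD_DEUREF loop: keep walking up while a count drops to 0."""
    tpr = pr
    while tpr is not None:
        tpr.pr_uref -= 1
        if tpr.pr_uref > 0:
            break
        # The C code drops tpr's mutex here, before touching the parent.
        tpr = tpr.pr_parent


def replay_flow():
    # Assumed starting counts: prison0 has 2 user references, one of them
    # held on behalf of prison1; prison1 runs a single process.
    prison0 = Prison("prison0", 2)
    prison1 = Prison("prison1", 1, prison0)

    deuref(prison1)        # process 1 exits: prison1 1 -> 0, prison0 2 -> 1
    prison1.pr_uref += 1   # process 2 attaches to the dying jail: 0 -> 1
    deuref(prison1)        # process 2 exits: prison1 1 -> 0, prison0 1 -> 0
    return prison0.pr_uref, prison1.pr_uref


if __name__ == "__main__":
    print(replay_flow())
```

With these assumed counts prison0 ends up at 0, although only one child
reference went away and it should have ended at 1.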

I think the fix is to move the decrement of the parent prison's pr_uref down
so it only takes place if the jail is "really" being removed. Either that or
to change the locking semantics so that once the lock is acquired in
prison_deref it's not unlocked until the function completes.

What do people think?

    Regards
    Steve




From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 20:07:47 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DDABF1065674
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 20:07:47 +0000 (UTC)
	(envelope-from dan@langille.org)
Received: from nyi.unixathome.org (nyi.unixathome.org [64.147.113.42])
	by mx1.freebsd.org (Postfix) with ESMTP id AC1428FC15
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 20:07:47 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by nyi.unixathome.org (Postfix) with ESMTP id E34C950A09;
	Sat, 20 Aug 2011 20:07:46 +0000 (UTC)
X-Virus-Scanned: amavisd-new at unixathome.org
Received: from nyi.unixathome.org ([127.0.0.1])
	by localhost (nyi.unixathome.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 1Xf1KD8QjThm; Sat, 20 Aug 2011 21:07:46 +0100 (BST)
Received: from smtp-auth.unixathome.org (smtp-auth.unixathome.org [10.4.7.7])
	(Authenticated sender: hidden)
	by nyi.unixathome.org (Postfix) with ESMTPSA id 781F1509F3  ;
	Sat, 20 Aug 2011 20:07:46 +0000 (UTC)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Dan Langille <dan@langille.org>
In-Reply-To: <20110820195702.GA39109@icarus.home.lan>
Date: Sat, 20 Aug 2011 16:07:44 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <09FB1664-127D-4835-88C4-BF5CD3A320C1@langille.org>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<20110820195702.GA39109@icarus.home.lan>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 20:07:47 -0000

On Aug 20, 2011, at 3:57 PM, Jeremy Chadwick wrote:

>>> I still suggest you replace the drive, although given its age I doubt
>>> you'll be able to find a suitable replacement.  I tend to keep disks
>>> like this around for testing/experimental purposes and not for actual
>>> use.
>>
>> I have several unused 80GB HDD I can place into this system.  I think that's
>> what I'll wind up doing.  But I'd like to follow this process through and get it documented
>> for future reference.
>
> Yes, given the behaviour of the drive I would recommend you simply
> replace it at this point in time.  What concerns me the most is
> Current_Pending_Sector incrementing, but it's impossible for me to
> determine if that incrementing means there are other LBAs which are =
bad,
> or if the drive is behaving how its firmware is designed.
>=20
> Keep the drive around for further experiments/tinkering if you're
> interested.  Stuff like this is always interesting/fun as long as your
> data isn't at risk, so doing the replacement first would be best
> (especially if both drives in your mirror were bought at the same time
> from the same place and have similar manufacturing plants/dates on
> them).


I'm happy to send you this drive for your experimentation pleasure.

If you'd like it, please email me an address off-list.  You don't have a disk with
errors, and it seems you should have one.

I'll wipe it first.  I'm sure I have a destroyer CD here somewhere....

-- 
Dan Langille - http://langille.org


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 20:19:19 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E1D2E106566B
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 20:19:19 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta03.westchester.pa.mail.comcast.net
	(qmta03.westchester.pa.mail.comcast.net [76.96.62.32])
	by mx1.freebsd.org (Postfix) with ESMTP id 8D1578FC13
	for <freebsd-stable@freebsd.org>; Sat, 20 Aug 2011 20:19:19 +0000 (UTC)
Received: from omta23.westchester.pa.mail.comcast.net ([76.96.62.74])
	by qmta03.westchester.pa.mail.comcast.net with comcast
	id NkCF1h0061c6gX853kKKcB; Sat, 20 Aug 2011 20:19:19 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta23.westchester.pa.mail.comcast.net with comcast
	id NkKE1h0101t3BNj3jkKGCM; Sat, 20 Aug 2011 20:19:17 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 829EA102C1A; Sat, 20 Aug 2011 13:19:13 -0700 (PDT)
Date: Sat, 20 Aug 2011 13:19:13 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Dan Langille <dan@langille.org>
Message-ID: <20110820201913.GA39827@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<20110820195702.GA39109@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110820195702.GA39109@icarus.home.lan>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 20:19:20 -0000

A follow-up given that I just viewed the SMART attribute data at the
very bottom of this page as of this writing (Sat Aug 20 13:00:09 PDT
2011):

http://beta.freebsddiary.org/smart-fixing-bad-sector.php

And I see this:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always   -           2
  9 Power_On_Hours          0x0012   059   059   001    Old_age   Always   -           27440
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline  -           1
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always   -           2
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline  -           0

These attributes USUALLY mean:

1) Reallocated_Sector_Ct   == There are 2 remapped LBAs.
2) Reallocated_Event_Count == There is 1 remapping event which has been
                              noticed (either failure or success).
3) Current_Pending_Sector  == There are 2 LBAs which are suspect.

Now, given my previous statement about this particular model of drive,
Maxtor may have a firmware quirk or other oddity that keeps
Current_Pending_Sector from dropping to 0 or Reallocated_Event_Count from
matching reality.  I simply don't know.  But keep reading.

And remember, this is what we started with:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always   -           1
  9 Power_On_Hours          0x0012   059   059   001    Old_age   Always   -           27416
196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline  -           0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always   -           1
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline  -           0
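
For reference, the movement between the two snapshots can be tabulated
mechanically.  What follows is only a minimal C sketch: the struct and
function names are made up for illustration, and the raw values are copied
by hand from the two smartctl tables above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container: one SMART attribute's raw value at two points in time. */
struct smart_sample {
    int id;
    const char *name;
    long before;    /* raw value in the starting snapshot */
    long after;     /* raw value in the later snapshot */
};

/* Raw values copied from the two tables in this message. */
static const struct smart_sample samples[] = {
    {   5, "Reallocated_Sector_Ct",       1,     2 },
    {   9, "Power_On_Hours",          27416, 27440 },
    { 196, "Reallocated_Event_Count",     0,     1 },
    { 197, "Current_Pending_Sector",      1,     2 },
    { 198, "Offline_Uncorrectable",       0,     0 },
};

#define NSAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Return (after - before) for the attribute with the given ID. */
long smart_delta(int id)
{
    for (size_t i = 0; i < NSAMPLES; i++)
        if (samples[i].id == id)
            return samples[i].after - samples[i].before;
    assert(!"unknown attribute id");
    return 0;
}

/* Print one line per attribute showing how the raw value moved. */
void print_deltas(void)
{
    for (size_t i = 0; i < NSAMPLES; i++)
        printf("%3d %-24s %5ld -> %5ld (%+ld)\n",
            samples[i].id, samples[i].name,
            samples[i].before, samples[i].after,
            samples[i].after - samples[i].before);
}
```

Calling print_deltas() from a main() lists the five attributes; the point is
simply that three of the counters moved by one over 24 power-on hours.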

Anyway, in the SMART error log, I see 3 entries (2 new ones since the
last time I saw the web page):

* Error 3 occurred at disk power-on lifetime: 27422 hours (1142 days + 14 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440
* Error 2 occurred at disk power-on lifetime: 27421 hours (1142 days + 13 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440
* Error 1 occurred at disk power-on lifetime: 27400 hours (1141 days + 16 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440

These are all for the same LBA -- 5566440.
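
As a cross-check, the LBA can be reassembled from the seven register bytes
in each log entry.  The sketch below assumes the bytes appear in the order
smartctl labels them in its error log (ER ST SC SN CL CH DH) and that the
drive is using 28-bit addressing; the type and function names are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATA_ER_UNC 0x40    /* error register bit 6: uncorrectable data */

/* Register bytes, assumed to be in smartctl's order: ER ST SC SN CL CH DH. */
struct ata_regs {
    uint8_t er, st, sc, sn, cl, ch, dh;
};

/* The bytes logged for all three errors: 40 59 18 e8 ef 54 e0 */
const struct ata_regs logged_error = { 0x40, 0x59, 0x18, 0xe8, 0xef, 0x54, 0xe0 };

/* 28-bit LBA: SN = bits 0-7, CL = 8-15, CH = 16-23, low nibble of DH = 24-27. */
uint32_t ata_lba28(const struct ata_regs *r)
{
    assert(r != NULL);
    return ((uint32_t)(r->dh & 0x0f) << 24) |
           ((uint32_t)r->ch << 16) |
           ((uint32_t)r->cl << 8) |
           (uint32_t)r->sn;
}

/* True if the error register reports an uncorrectable (UNC) error. */
int ata_is_unc(const struct ata_regs *r)
{
    assert(r != NULL);
    return (r->er & ATA_ER_UNC) != 0;
}

/* Split power-on hours the way the log prints them: whole days plus hours. */
unsigned poh_days(unsigned hours)  { return hours / 24; }
unsigned poh_hours(unsigned hours) { return hours % 24; }
```

With the logged bytes this gives 0x0054efe8 (5566440), a sector count byte of
0x18 (24), and 27422 hours as 1142 days + 14 hours, matching what smartctl
printed.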

"Error 1" was something we already saw on the page the first time.  So
where did the other two come from?  Earlier on the web page I saw these
commands being executed:

sh ./bad_block_scan /dev/ad2 5566400 5566500   <-- will hit bad LBA
sh ./bad_block_scan /dev/ad2 5566000 5566500   <-- will hit bad LBA
sh ./bad_block_scan /dev/ad2 5560000 5566000   <-- will not hit bad LBA
sh ./bad_block_scan /dev/ad2 5560000 5566000   <-- will not hit bad LBA

So there's the explanation for the two newly-added entries in the SMART
error log.  I'd be very surprised if bad_block_scan did not echo that it
had encountered read errors on LBA 5566440.  It should have, unless I
left the script in some weird state.  The commands to use to verify
would be:

dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566439
dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566440
dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566441

(I tend to check "around" that LBA area as well, just to make sure;
that's why there are 3 commands, with -1 and +1 LBAs.)  One of these should
return an I/O error, unless the LBA has been remapped already, in which
case none should.
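
The same three-sector probe can be sketched in C for anyone who wants to
script it.  This is only an illustration under stated assumptions: 512-byte
sectors (matching bs=512 above), a descriptor already opened read-only on
the disk, and helper names (read_sector, probe_around) that are made up
here.  Treating a short read as EIO is a choice made for the sketch.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/* Byte offset of an LBA, assuming 512-byte sectors. */
off_t lba_to_offset(uint64_t lba)
{
    return (off_t)lba * SECTOR_SIZE;
}

/*
 * Read one sector at `lba` from an already-open descriptor.
 * Returns 0 on success or an errno value on failure; a bad sector is
 * expected to show up as EIO.  A short read is reported as EIO as well.
 */
int read_sector(int fd, uint64_t lba, unsigned char buf[SECTOR_SIZE])
{
    ssize_t n;

    assert(buf != NULL);
    n = pread(fd, buf, SECTOR_SIZE, lba_to_offset(lba));
    if (n < 0)
        return errno;
    if (n != SECTOR_SIZE)
        return EIO;
    return 0;
}

/* Probe lba-1, lba and lba+1; return how many of those reads failed. */
int probe_around(int fd, uint64_t lba)
{
    unsigned char buf[SECTOR_SIZE];
    uint64_t first = (lba > 0) ? lba - 1 : lba;
    int failures = 0;

    for (uint64_t cur = first; cur <= lba + 1; cur++)
        if (read_sector(fd, cur, buf) != 0)
            failures++;
    return failures;
}
```

Usage would be along the lines of opening /dev/ad2 with O_RDONLY and calling
probe_around(fd, 5566440); a result of 0 would suggest the sector has been
remapped or is readable again.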

Finally, there's this very interesting piece of information in the SMART
self-test log (not selective scan log, but the self-test log; meaning
this was the result of "smartctl -t long /dev/ad2" at some point):

Num  Test_Description    Status                  Remaining     LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     27416            786767

So it seems this is one of those drives which does do a surface scan on
a long test.

But that's interesting -- LBA 786767.

If that's true, then issuing the same dd commands as above (but with
"skip" changed appropriately) should return an I/O error as well.
Naturally check the SMART error log for verification.

So, it's possible that there are actually two bad LBAs on this drive --
LBA 5566440 and LBA 786767.  I simply don't know about the latter, but
the former is confirmed in the SMART error log.

If either of these LBAs is one of those which Current_Pending_Sector is
referring to, then a write to it should be sufficient to induce
re-analysis.  E.g.:

dd if=/dev/zero of=/dev/ad2 bs=512 count=1 seek=5566440
dd if=/dev/zero of=/dev/ad2 bs=512 count=1 seek=786767

The offsets for seek (not skip!!!) should probably be based on what the
dd reads done earlier would show.  Unless of course what we're seeing is
just a batch of LBAs in a small region that are getting worse the more
they're read from (possible).

No idea if LBA 5566440 and LBA 786767 are anywhere near one another on
the physical media.  I don't have a way to determine that (way too
complex).

That's about all the light I can shed on this for now.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 20:23:41 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DC564106567A;
	Sat, 20 Aug 2011 20:23:41 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 076718FC2A;
	Sat, 20 Aug 2011 20:23:39 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:23:06 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:23:06 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014676211.msg;
	Sat, 20 Aug 2011 21:23:05 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@
	FreeBSD.org> <B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk>
Date: Sat, 20 Aug 2011 21:24:30 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=response
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 20:23:42 -0000

----- Original Message ----- 
From: "Steven Hartland" <killing@multiplay.co.uk>
> Looking through the code I believe I may have noticed a scenario which could
> trigger the problem.
> 
> Given the following code:-
> 
> static void
> prison_deref(struct prison *pr, int flags)
> {
>    struct prison *ppr, *tpr;
>    int vfslocked;
> 
>    if (!(flags & PD_LOCKED))
>        mtx_lock(&pr->pr_mtx);
>    /* Decrement the user references in a separate loop. */
>    if (flags & PD_DEUREF) {
>        for (tpr = pr;; tpr = tpr->pr_parent) {
>            if (tpr != pr)
>                mtx_lock(&tpr->pr_mtx);
>            if (--tpr->pr_uref > 0)
>                break;
>            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>            mtx_unlock(&tpr->pr_mtx);
>        }
>        /* Done if there were only user references to remove. */
>        if (!(flags & PD_DEREF)) {
>            mtx_unlock(&tpr->pr_mtx);
>            if (flags & PD_LIST_SLOCKED)
>                sx_sunlock(&allprison_lock);
>            else if (flags & PD_LIST_XLOCKED)
>                sx_xunlock(&allprison_lock);
>            return;
>        }
>        if (tpr != pr) {
>            mtx_unlock(&tpr->pr_mtx);
>            mtx_lock(&pr->pr_mtx);
>        }
>    }
> 
> Take a scenario of a simple one-level prison setup running a single process,
> where the prison has just been stopped.
> 
> In the above code pr_uref of the process's prison is decremented. As this is the
> last process, pr_uref will hit 0 and the loop continues instead of breaking
> early.
> 
> Now at the end of the loop iteration the mtx is unlocked so other processes can
> now manipulate the jail; this is where I think the problem may be.
> 
> If we now have another process come in and attach to the jail but then instantly
> exit, this process may allow another kernel thread to hit this same bit of code
> and so two processes for the same prison get into the section which decrements
> prison0's pr_uref, instead of only one.
> 
> In essence I think we can get the following flow where 1# = process1
> and 2# = process2
> 1#1. prison1.pr_uref = 1 (single process jail)
> 1#2. prison_deref( prison1,...
> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
> 1#4. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
> 1#5. prison0.pr_uref--
> 2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
> 2#2. process2.exit
> 2#3. prison_deref( prison1,...
> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
> 2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)
> 
> It seems like the action on the parent prison to decrement the pr_uref is
> happening too early, while the jail can still be used and without the lock on
> the child jail's mtx, so causing a race condition.
> 
> I think the fix is to move the decrement of the parent prison's pr_uref down
> so it only takes place if the jail is "really" being removed. Either that or
> to change the locking semantics so that once the lock is acquired in
> prison_deref it's not unlocked until the function completes.
> 
> What do people think?

After reviewing the changes to prison_deref in the commit which added hierarchical
jails, I think the release of the lock on the passed-in prison by the initial loop
may be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h

If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
                        if (--tpr->pr_uref > 0)
                                break;
                        KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-                       mtx_unlock(&tpr->pr_mtx);
+                       if (tpr != pr)
+                               mtx_unlock(&tpr->pr_mtx);
                }
                /* Done if there were only user references to remove. */
                if (!(flags & PD_DEREF)) {
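
To make the suspected interleaving concrete, here is a small single-threaded
C model that replays the sequence of events described in the quoted flow.
It is only a toy: struct toy_prison and the helper names are invented for
this sketch, attach is modelled as a bare increment of the child's count,
and nothing here establishes that the real kernel allows an attach in that
window.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct prison: just the user reference count and parent. */
struct toy_prison {
    int uref;
    struct toy_prison *parent;
};

/*
 * First half of the PD_DEUREF walk: drop the child's user reference.
 * Returns 1 if the count hit zero, meaning the walk moves on to the parent
 * after the child's mutex has been released.
 */
int drop_child_uref(struct toy_prison *pr)
{
    assert(pr != NULL && pr->uref > 0);
    return --pr->uref == 0;
}

/* Second half: the decrement applied to the parent. */
void drop_parent_uref(struct toy_prison *pr)
{
    assert(pr != NULL && pr->parent != NULL);
    --pr->parent->uref;
}

/* Assumption for this model: attach only bumps the child's count. */
void toy_attach(struct toy_prison *pr)
{
    assert(pr != NULL);
    pr->uref++;
}

/*
 * Replay the interleaving: process 1 is paused between the two halves
 * while process 2 attaches to the jail and exits again.
 * Returns how many user references prison0 lost in total.
 */
int replay_race(int prison0_uref)
{
    struct toy_prison prison0 = { prison0_uref, NULL };
    struct toy_prison prison1 = { 1, &prison0 };
    int p1_continues, p2_continues;

    p1_continues = drop_child_uref(&prison1);  /* process 1: 1 -> 0, lock dropped */
    toy_attach(&prison1);                      /* process 2 attaches: 0 -> 1 */
    p2_continues = drop_child_uref(&prison1);  /* process 2 exits: 1 -> 0 */
    if (p2_continues)
        drop_parent_uref(&prison1);            /* process 2 decrements prison0 */
    if (p1_continues)
        drop_parent_uref(&prison1);            /* process 1 resumes, decrements again */
    return prison0_uref - prison0.uref;
}
```

In this model replay_race() reports that prison0 loses two references for
what was a single jail, which is the imbalance that would eventually trip
the "prison0 pr_uref=0" KASSERT.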

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.


From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 20:34:56 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 42F871065672;
	Sat, 20 Aug 2011 20:34:56 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 6FC888FC0C;
	Sat, 20 Aug 2011 20:34:55 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA14332;
	Sat, 20 Aug 2011 23:34:52 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QusFo-000O9N-7B; Sat, 20 Aug 2011 23:34:52 +0300
Message-ID: <4E501A6A.3030801@FreeBSD.org>
Date: Sat, 20 Aug 2011 23:34:50 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110819 Thunderbird/6.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@
	FreeBSD.org> <B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk>
	<D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk>
In-Reply-To: <D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 20:34:56 -0000

on 20/08/2011 23:24 Steven Hartland said the following:
> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>> Looking through the code I believe I may have noticed a scenario which could
>> trigger the problem.
>>
>> Given the following code:-
>>
>> static void
>> prison_deref(struct prison *pr, int flags)
>> {
>>    struct prison *ppr, *tpr;
>>    int vfslocked;
>>
>>    if (!(flags & PD_LOCKED))
>>        mtx_lock(&pr->pr_mtx);
>>    /* Decrement the user references in a separate loop. */
>>    if (flags & PD_DEUREF) {
>>        for (tpr = pr;; tpr = tpr->pr_parent) {
>>            if (tpr != pr)
>>                mtx_lock(&tpr->pr_mtx);
>>            if (--tpr->pr_uref > 0)
>>                break;
>>            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>>            mtx_unlock(&tpr->pr_mtx);
>>        }
>>        /* Done if there were only user references to remove. */
>>        if (!(flags & PD_DEREF)) {
>>            mtx_unlock(&tpr->pr_mtx);
>>            if (flags & PD_LIST_SLOCKED)
>>                sx_sunlock(&allprison_lock);
>>            else if (flags & PD_LIST_XLOCKED)
>>                sx_xunlock(&allprison_lock);
>>            return;
>>        }
>>        if (tpr != pr) {
>>            mtx_unlock(&tpr->pr_mtx);
>>            mtx_lock(&pr->pr_mtx);
>>        }
>>    }
>>
>> Take a scenario of a simple one-level prison setup running a single process,
>> where the prison has just been stopped.
>>
>> In the above code pr_uref of the process's prison is decremented. As this is the
>> last process, pr_uref will hit 0 and the loop continues instead of breaking
>> early.
>>
>> Now at the end of the loop iteration the mtx is unlocked so other processes can
>> now manipulate the jail; this is where I think the problem may be.
>>
>> If we now have another process come in and attach to the jail but then instantly
>> exit, this process may allow another kernel thread to hit this same bit of code
>> and so two processes for the same prison get into the section which decrements
>> prison0's pr_uref, instead of only one.
>>
>> In essence I think we can get the following flow where 1# = process1
>> and 2# = process2
>> 1#1. prison1.pr_uref = 1 (single process jail)
>> 1#2. prison_deref( prison1,...
>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 1#4. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 1#5. prison0.pr_uref--
>> 2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
>> 2#2. process2.exit
>> 2#3. prison_deref( prison1,...
>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)
>>
>> It seems like the action on the parent prison to decrement the pr_uref is
>> happening too early, while the jail can still be used and without the lock on
>> the child jail's mtx, so causing a race condition.
>>
>> I think the fix is to move the decrement of the parent prison's pr_uref down
>> so it only takes place if the jail is "really" being removed. Either that or
>> to change the locking semantics so that once the lock is acquired in
>> prison_deref it's not unlocked until the function completes.
>>
>> What do people think?
> 
> After reviewing the changes to prison_deref in the commit which added hierarchical
> jails, I think the release of the lock on the passed-in prison by the initial loop
> may be unintentional.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
> 
> 
> If so the following may be all that's needed to fix this issue:-
> 
> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
> --- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
> +++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
> @@ -2455,7 +2455,8 @@
>                        if (--tpr->pr_uref > 0)
>                                break;
>                        KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
> -                       mtx_unlock(&tpr->pr_mtx);
> +                       if (tpr != pr)
> +                               mtx_unlock(&tpr->pr_mtx);
>                }
>                /* Done if there were only user references to remove. */
>                if (!(flags & PD_DEREF)) {

Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.
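
One way to read that concern is to track how often each mutex is held while
stepping through the quoted code with the patch applied.  The C model below
does only that; it is a sketch, not kernel code.  The toy_* names are
invented, `also_deref` stands in for the PD_DEREF flag, and it assumes
pr_mtx is not a recursive mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Toy prison: user refcount, a hold counter in place of pr_mtx, and parent. */
struct toy_prison {
    int uref;
    int held;                   /* times the toy mutex is currently held */
    struct toy_prison *parent;
};

static void toy_lock(struct toy_prison *p)   { p->held++; }
static void toy_unlock(struct toy_prison *p) { assert(p->held > 0); p->held--; }

/*
 * The PD_DEUREF section of prison_deref() as quoted, with the proposed
 * change applied.  Entered with pr already locked once.
 */
void patched_deuref(struct toy_prison *pr, int also_deref)
{
    struct toy_prison *tpr;

    for (tpr = pr;; tpr = tpr->parent) {
        if (tpr != pr)
            toy_lock(tpr);
        if (--tpr->uref > 0)
            break;
        assert(tpr->parent != NULL);    /* stands in for the KASSERT */
        if (tpr != pr)                  /* the proposed change */
            toy_unlock(tpr);
    }
    /* Done if there were only user references to remove. */
    if (!also_deref) {
        toy_unlock(tpr);
        return;
    }
    if (tpr != pr) {
        toy_unlock(tpr);
        toy_lock(pr);                   /* pr was never released above */
    }
}

/* Run the model for a single-process child jail; return pr's hold count. */
int child_hold_count_after(int also_deref)
{
    struct toy_prison prison0 = { 2, 0, NULL };
    struct toy_prison prison1 = { 1, 1, &prison0 };  /* locked once on entry */

    patched_deuref(&prison1, also_deref);
    assert(prison0.held == 0);
    return prison1.held;
}
```

In the model, the PD_DEREF path ends with the child's mutex held twice (a
recursive acquire), and the early-return path leaves it held once even
though the function has returned.  Either would need handling before the
patch could go in as is.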

-- 
Andriy Gapon

From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 20:49:58 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7B68F106566C;
	Sat, 20 Aug 2011 20:49:58 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 9306A8FC15;
	Sat, 20 Aug 2011 20:49:57 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:49:23 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 21:49:23 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014676446.msg;
	Sat, 20 Aug 2011 21:49:23 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <7585E1DAE11E47488CD5A7F038957F4D@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@FreeBSD.org>
References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org>
	<B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk><D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk>
	<4E501A6A.3030801@FreeBSD.org>
Date: Sat, 20 Aug 2011 21:50:51 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 20:49:58 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

>> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
>> --- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
>> +++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
>> @@ -2455,7 +2455,8 @@
>>                        if (--tpr->pr_uref > 0)
>>                                break;
>>                        KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>> -                       mtx_unlock(&tpr->pr_mtx);
>> +                       if (tpr != pr)
>> +                               mtx_unlock(&tpr->pr_mtx);
>>                }
>>                /* Done if there were only user references to remove. */
>>                if (!(flags & PD_DEREF)) {
> 
> Not sure if this would fly as is - please double check the later block where
> pr->pr_mtx is re-locked.

Will do. I'm now 99.9% sure this is the problem and, even better, I now have a
reproducible scenario :)

Something else you may be more interested in, Andriy:-
I added the debugging options DDB & INVARIANTS to see if I can get more
useful info, and the panic results in a looping panic constantly scrolling up
the console. Not sure if this is a side effect of the patches we've been
trying.

Going to see if I can confirm that; let me know if there's something you want me
to try.

    Regards
    Steve



From owner-freebsd-stable@FreeBSD.ORG  Sat Aug 20 21:38:56 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F0C5A106564A;
	Sat, 20 Aug 2011 21:38:56 +0000 (UTC)
	(envelope-from prvs=12137168ef=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A5868FC0A;
	Sat, 20 Aug 2011 21:38:55 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 22:38:21 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Sat, 20 Aug 2011 22:38:21 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
	by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	(MDaemon PRO v10.0.4) with ESMTP id md50014676893.msg;
	Sat, 20 Aug 2011 22:38:20 +0100
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=12137168ef=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <75D250E28B9A424EAF387E07CA223213@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Steven Hartland" <killing@multiplay.co.uk>,
	"Andriy Gapon" <avg@FreeBSD.org>
References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org><B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk><D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk><4E501A6A.3030801@FreeBSD.org>
	<7585E1DAE11E47488CD5A7F038957F4D@multiplay.co.uk>
Date: Sat, 20 Aug 2011 22:38:49 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=response
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Aug 2011 21:38:57 -0000


----- Original Message ----- 
From: "Steven Hartland" <killing@multiplay.co.uk>

> Something else you may be more interested in, Andriy:-
> I added the debugging options DDB & INVARIANTS to see if I can get more
> useful info, and the panic results in a looping panic constantly scrolling up
> the console. Not sure if this is a side effect of the patches we've been
> trying.
> 
> Going to see if I can confirm that; let me know if there's something you want me
> to try.

Seems the stop_scheduler_on_panic.8.x.patch is the cause of this.

Removing it allows me to drop to ddb when the panic due to the KASSERT
happens.

    Regards
    Steve
