From owner-freebsd-stable@FreeBSD.ORG Thu Oct 12 15:18:10 2006
X-Original-To: freebsd-stable@FreeBSD.org
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6FB7616A407 for ; Thu, 12 Oct 2006 15:18:10 +0000 (UTC) (envelope-from enatiello@broadviewnet.net)
Received: from unix29.broadviewnet.net (smtp-01.broadviewnet.net [64.115.0.67]) by mx1.FreeBSD.org (Postfix) with SMTP id 01DDB43D72 for ; Thu, 12 Oct 2006 15:18:05 +0000 (GMT) (envelope-from enatiello@broadviewnet.net)
Received: (qmail 22997 invoked by uid 32008); 12 Oct 2006 11:19:29 -0400
Received: from unknown (HELO enatiello-01.broadviewnet.net) (64.115.0.249) by unix29.broadviewnet.net with SMTP; 12 Oct 2006 11:19:29 -0400
From: Ernest Natiello
To: Gleb Smirnoff
In-Reply-To: <20061012101525.GM59833@cell.sick.ru>
References: <20061012091309.GK59833@FreeBSD.org> <20061012101525.GM59833@cell.sick.ru>
Content-Type: text/plain
Date: Thu, 12 Oct 2006 11:18:03 -0400
Message-Id: <1160666283.5159.22.camel@localhost>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.1
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: freebsd panic on HP Proliant DL360
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 12 Oct 2006 15:18:10 -0000

Hello,

Thank you very much for all of the help. I am trying to understand this issue, because it has been plaguing me for quite some time.

So, judging from the kgdb output below, should I assume that the process that triggered the panic is tcpserver? And should I further infer that tcpserver would trigger this panic on any FreeBSD RELENG_6 system, regardless of hardware?

I have three other servers, HP Proliant DL380s (2U), which operate in a _similar_ capacity (incoming vs. outgoing mail servers) and run the exact same software, and they have never had a problem. These three servers are running:

FreeBSD unix29 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Mon Mar 27 10:42:56 EST 2006 root@unix34.broadviewnet.net:/usr/obj/usr/src/sys/UNIX34 i386

The operating system on this machine was rsync'd from one of the servers that is having the panic issue, yet it continues to run flawlessly.

I suppose I could swap the services between two of the servers and see whether the panics follow the move. Does that sound like a reasonable test?

Thank you very much,
Ernest

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x104
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0679cd1
stack pointer           = 0x28:0xe9226af0
frame pointer           = 0x28:0xe9226afc
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 71782 (tcpserver)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 1d7h12m9s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 2047MB (524026 pages) 2032 2016 2000 1984 1968 1952 1936 1920 1904 1888 1872 1856 1840 1824 1808 1792 1776 1760 1744 1728 1712 1696 1680 1664 1648 1632 1616 1600 1584 1568 1552 1536 1520 1504 1488 1472 1456 1440 1424 1408 1392 1376 1360 1344 1328 1312 1296 1280 1264 1248 1232 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 1040 1024 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));

On Thu, 2006-10-12 at 14:15 +0400, Gleb Smirnoff wrote:
> On Thu, Oct 12, 2006 at 11:03:36AM +0100, Pete French wrote:
> P> > This is a known problem.
> P> > It is fixed in HEAD, but unfortunately it
> P> > isn't mergeable to RELENG_6. The problem isn't related to either pf,
> P> > ipf or NIC drivers.
> P>
> P> This is a little alarming, because what you seem to be saying is that
> P> if you have DL360s then you need to either run current, or accept that
> P> they will panic every so often for as long as you are running RELENG_6.
> P> We are looking to change our hardware soon, and DL360s were top of the
> P> list for replacements!
>
> Again, this has nothing to do with hardware. It is a general problem in RELENG_6.
>
> P> Is there a PR reference for this describing the solution to the problem
> P> in HEAD somewhere that I could take a look at?
>
> The problem wasn't fixed with a single commit. Maybe Robert, who is carbon
> copied, can provide more details.
>