From: Mel
To: freebsd-questions@freebsd.org
Cc: Keith
Date: Mon, 1 Dec 2008 20:31:24 +0100
Subject: Re: Page Fault.
Message-Id: <200812012031.25424.fbsd.questions@rachie.is-a-geek.net>
In-Reply-To: <20081201101311.C81770@pop.citytel.net>

On Monday 01 December 2008 19:32:59 Keith wrote:
> Have a machine, Dell dual CPU/quad core Xeon. Runs FBSD 6.2.
> Custom kernel, with IPFW compiled in and using SMP.
>
> FreeBSD 6.2-RELEASE FreeBSD 6.2-RELEASE #1: Wed Jan 23
> 12:17:29 PST 2008
>
> It runs, Dovecot, Postfix, Mysql, Apache. Standard email stuff. Put into
> production in March, ran perfect until July 29th when it rebooted by
> itself.
>
> It rebooted 2 more times in the last few months on its own. But in the
> last 6 weeks it has become a weekly occurrence, with uptime no more than
> 6-7 days at most.
>
> The last 2 times I have cores and have run kgdb on them. Both vmcore's
> show the same things. Same pointers etc, the only difference is what the
> cpuid was at the time.
>
> ======
> kernel trap 12 with interrupts disabled
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address   = 0x104
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc066ca51
> stack pointer           = 0x28:0xe6ec0c90
> frame pointer           = 0x28:0xe6ec0c9c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 9 (thread taskq)
> trap number             = 12
> panic: page fault
> cpuid = 2
> Uptime: 6d6h23m45s
> Dumping 3327 MB (2 chunks)
> chunk 0: 1MB (159 pages) ... ok
> chunk 1: 3327MB (851624 pages) 3311 3295 3279 3263 3247 3231 3215 3199
> 3183 3167 3151 3135 3119 3103 3087 3071 3055 3039 3023 3007 2991 2975 2959
> 2943 2927 2911 2895 2879 2863 2847 2831 2815 2799 2783 2767 2751 2735 2719
> 2703 2687 2671 2655 2639 2623 2607 2591 2575 2559 2543 2527 2511 2495 2479
> 2463 2447 2431 2415 2399 2383 2367 2351 2335 2319 2303 2287 2271 2255 2239
> 2223 2207 2191 2175 2159 2143 2127 2111 2095 2079 2063 2047 2031 2015 1999
> 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759
> 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519
> 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279
> 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039
> 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751
> 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463
> 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175
> 159 143 127 111 95 79 63 47 31 15
>
> #0 doadump () at pcpu.h:165
> 165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> ============
>
> What might be the cause for this? It is in the same place every time.
> Once the machine hung and had to be powercycled. But on the screen was the
> same page fault error on the same process.
>

Frame 0 is useless here: it only shows doadump(), the code that writes the
dump. You need the frame after calltrap(). And given:

> instruction pointer = 0x20:0xc066ca51

run this in kgdb:

list *0xc066ca51

Generally a bt will show the needed information.

Likely cause: file system corruption, caused by background_fsck, but a
backtrace should show more.

-- 
Mel

Problem with today's modular software: they start with the modules and never
get to the software part.
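
For illustration only, here is a small Python sketch of the step described in the reply: it pulls the offset out of the "instruction pointer" line of the trap message and turns it into the two kgdb commands mentioned (bt and list *ADDRESS). The function names and the PANIC_TEXT excerpt are made up for this example and are not part of any FreeBSD tool; the regular expression assumes the "selector:offset" layout shown in the quoted panic.

```python
import re

# Excerpt of the trap message quoted in the thread (illustrative input).
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x104
instruction pointer     = 0x20:0xc066ca51
"""


def instruction_pointer(panic_text):
    """Return the offset part of 'instruction pointer = SEL:OFFSET', or None."""
    match = re.search(
        r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)",
        panic_text,
    )
    return match.group(1) if match else None


def kgdb_commands(panic_text):
    """Build the kgdb commands from the reply: a backtrace, then a source
    listing at the faulting address if one was found in the message."""
    commands = ["bt"]
    address = instruction_pointer(panic_text)
    if address is not None:
        commands.append("list *" + address)
    return commands


if __name__ == "__main__":
    # Print the commands so they can be typed at the (kgdb) prompt.
    print("\n".join(kgdb_commands(PANIC_TEXT)))
```

The output is meant to be typed at the (kgdb) prompt after opening the debug kernel together with the vmcore; the script itself does not run kgdb or read the dump.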