From: viper@fx-services.com
To: freebsd-questions@freebsd.org
Date: Thu, 05 Apr 2007 13:23:07 +0200
Subject: Random reboots 5.4 with CPanel

Hi guys,

For a couple of years now I've been trying to find out why our hosting machine reboots at random.
I've posted about this to the list before and got some tips, mostly about
hardware. What happens is that both the main server and the backup server
(which is just idling) simply reboot, sometimes after 60 days, sometimes
after one day. No logs, no strange traffic patterns, nothing.

I enabled kernel debugging and caught a crashdump on our backup machine,
which I've posted below. The process that crashes is the cPanel CPU monitor
(dcpumon). When I disabled it, the crash just moved to other processes
(httpd, perl, etc.). I've tried disabling ACPI, rebuilding world with just
-O in make.conf, and so on. This morning the main server rebooted again and
didn't even leave a dump in /var/crash.

The hardware isn't the same either: I've seen this behaviour on dual Athlons
(two different mainboards) and on dual Xeons, so it seems related to the SMP
code. I've also played around with the idle and hyperthreading settings in
sysctl. Nothing seems to make any difference at all.

The crashdump is below. Does anyone have ANY idea what might cause this? I
think it has to be the cPanel hosting panel, but an application like that
shouldn't be able to crash the OS...

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 01
fault virtual address   = 0x98
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc06b7f1e
stack pointer           = 0x28:0xece5f730
frame pointer           = 0x28:0xece5f774
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 69885 (dcpumon)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 2d22h1m13s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 2047MB (523904 pages)

#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc063efca in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc063f396 in panic (fmt=0xc0870bd4 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc082e16c in trap_fatal (frame=0xece5f6f0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:831
#4  0xc082de52 in trap_pfault (frame=0xece5f6f0, usermode=0, eva=152)
    at /usr/src/sys/i386/i386/trap.c:742
#5  0xc082da02 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 4, tf_esi = 0,
       tf_ebp = -320473228, tf_isp = -320473316, tf_ebx = 4098,
       tf_edx = -1002850048, tf_ecx = 0, tf_eax = 4, tf_trapno = 12,
       tf_err = 2, tf_eip = -1066696930, tf_cs = 32, tf_eflags = 66118,
       tf_esp = -320473100, tf_ss = 1017})
    at /usr/src/sys/i386/i386/trap.c:432
#6  0xc0817d0a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc06b7f1e in vn_lock (vp=0x0, flags=4098, td=0xc439b900) at atomic.h:149
#8  0xc05eee46 in procfs_doprocfile (td=0xc439b900, p=0xc9068830,
    pn=0xc35f3900, sb=0x4, uio=0x0) at /usr/src/sys/fs/procfs/procfs.c:73
#9  0xc05f3f5b in pfs_readlink (va=0x4) at pcpu.h:162
#10 0xc0841a13 in VOP_READLINK_APV (vop=0x4, a=0xc439b900) at vnode_if.c:1481
#11 0xc06b14e3 in kern_readlink (td=0xc439b900, path=0xc439b900 ",
    bufseg=4, count=1024) at vnode_if.h:772
#12 0xc06b13e8 in readlink (td=0x4, uap=0xc439b900)
    at /usr/src/sys/kern/vfs_syscalls.c:2261
#13 0xc082e573 in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 135512892,
       tf_esi = 135663632, tf_ebp = -1077940936, tf_isp = -320471708,
       tf_ebx = 674109588, tf_edx = -1077941960, tf_ecx = 0, tf_eax = 58,
       tf_trapno = 0, tf_err = 2, tf_eip = 672579140, tf_cs = 51,
       tf_eflags = 647, tf_esp = -1077942020, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:976
#14 0xc0817d5f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

/Robin
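A note for anyone reading the dump: kgdb prints the i386 trapframe registers in frames #5 and #13 as signed decimal integers, while the panic header prints the same registers in hex. The short Python sketch below is purely illustrative (the names `TRAP_FRAME`, `EVA`, and `to_u32_hex` are hypothetical helpers, not part of FreeBSD or kgdb). It reinterprets the values as unsigned 32-bit numbers, which lines tf_eip up with the instruction pointer 0xc06b7f1e, tf_ebp with the frame pointer 0xece5f774, and eva=152 from frame #4 with the fault virtual address 0x98.

```python
# Illustrative sketch only: values copied by hand from the kgdb backtrace.
# kgdb showed the trapframe registers as signed 32-bit integers; the panic
# header shows the same registers as unsigned hex.
TRAP_FRAME = {
    "tf_eip": -1066696930,  # frame #5, trap(frame=...)
    "tf_ebp": -320473228,   # frame #5, trap(frame=...)
}
EVA = 152  # frame #4, trap_pfault(..., eva=152): the faulting address


def to_u32_hex(value: int) -> str:
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    for name, value in TRAP_FRAME.items():
        print(f"{name:7} {value:12d} -> {to_u32_hex(value)}")
    print(f"eva     {EVA:12d} -> {hex(EVA)}")
```

Running it prints one line per value, and each hex form matches the corresponding field in the "Fatal trap 12" header, which confirms that the backtrace and the header describe the same fault.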