From: Deomid Ryabkov
Date: Fri, 30 Mar 2007 14:42:08 +0100
To: freebsd-hackers@freebsd.org
Subject: 6.2: reproducible hang on amd64, traced to 24h of commits
Message-ID: <460D13B0.5070500@rojer.pp.ru>
List-Id: Technical Discussions relating to FreeBSD

ok, now that the machine has been up for 10 days, i am reasonably sure i've gotten close enough to this one.

back in january i cvsupped to -STABLE and the box (dual head opteron box) started hanging. and i mean it dies completely.
i have all debug options and a working serial console, but still it just dies and both the serial and system consoles are unresponsive. no panic message on either, nothing. pretty sad.

the kernel config is vanilla SMP GENERIC, with all the debug options i could think of enabled (after it started hanging).

so the first thing i did after rebooting the box a couple of times was fall back to kernel.old (6.1-STABLE circa august '06). no hangs. i then started incrementally updating, gradually getting closer to jan 22.

long story short, i seem to have isolated the problem to commits made between date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00. the last hang i had was when running the 12/29 kernel; now it's the 12/28 kernel and the box has been up for 2 weeks already. based on previous experience i'm pretty certain that this is it: with a bad kernel the box would never stay up more than a few days, never more than 5.

between 12/28 and 12/29 i see some changes to /sys/amd64/ and /sys/pci/, which might be the cause. i will probably start looking into individual changes, but if anyone more experienced than me could take a look, it'd be appreciated. i am willing to try patches.

i confirmed that recent -STABLE (as of about 3 weeks ago) still has this problem.

thanks in advance.
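for anyone wanting to repeat this kind of narrowing-down, here is a minimal sketch of a date bisection that prints checkout dates in the same date=YYYY.MM.DD.HH.MM.SS form quoted above. it is only an illustration, not the exact procedure used here: the function names are made up for this example, the august 1 start date is an assumed stand-in for "circa august '06", and whether a given kernel "hung" still has to be decided by hand after running it for several days.

```python
from datetime import datetime

# Same layout as the dates quoted in this message.
CVSUP_FMT = "date=%Y.%m.%d.%H.%M.%S"


def cvsup_date(d):
    """Format a datetime as a date= checkout string."""
    return d.strftime(CVSUP_FMT)


def midpoint(good, bad):
    """Midnight of the day halfway between a known-good and known-bad date."""
    mid = good + (bad - good) / 2
    return mid.replace(hour=0, minute=0, second=0, microsecond=0)


def narrow(good, bad, hung):
    """Shrink the window after testing the kernel built at midpoint(good, bad).

    hung is a manual observation: True if that kernel hung.
    """
    mid = midpoint(good, bad)
    return (good, mid) if hung else (mid, bad)


def done(good, bad):
    """Stop once the window is a single day of commits."""
    return (bad - good).days <= 1


if __name__ == "__main__":
    # Assumed endpoints: the day in august is a guess, jan 22 is from the text.
    good, bad = datetime(2006, 8, 1), datetime(2007, 1, 22)
    print("next kernel to test:", cvsup_date(midpoint(good, bad)))
```

since a bad kernel can take up to 5 days to hang, each step of such a bisection costs about a week of uptime before the midpoint can be called good.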
====
files under /sys that were changed between 12/28 and 12/29:

 Edit src/sys/amd64/amd64/mptable_pci.c
 Edit src/sys/amd64/pci/pci_bus.c
 Edit src/sys/contrib/dev/ath/public/wackelf.c
 Edit src/sys/dev/acpica/acpi_pci.c
 Edit src/sys/dev/acpica/acpi_pcib_acpi.c
 Edit src/sys/dev/acpica/acpi_pcib_pci.c
 Checkout src/sys/dev/ath/if_ath.c
 Edit src/sys/dev/cardbus/cardbus.c
 Edit src/sys/dev/drm/drm_agpsupport.c
 Edit src/sys/dev/pci/pci.c
 Edit src/sys/dev/pci/pci_if.m
 Edit src/sys/dev/pci/pci_pci.c
 Edit src/sys/dev/pci/pci_private.h
 Edit src/sys/dev/pci/pcib_private.h
 Edit src/sys/dev/pci/pcivar.h
 Edit src/sys/i386/i386/mptable_pci.c
 Edit src/sys/i386/pci/pci_bus.c
 Edit src/sys/kern/subr_bus.c
 Checkout src/sys/netgraph/ng_deflate.h
 Edit src/sys/pci/agp.c
 Edit src/sys/pci/agpreg.h
 Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
 Edit src/sys/sparc64/pci/apb.c
 Edit src/sys/sparc64/pci/ofw_pcib.c
 Edit src/sys/sparc64/pci/ofw_pcibus.c
 Edit src/sys/sys/param.h

====
kernel configuration used:

include GENERIC
options SMP
options KDB
options DDB
makeoptions DEBUG=-g
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC

====

-- 
Deomid Ryabkov aka Rojer
myself@rojer.pp.ru
rojer@sysadmins.ru
ICQ: 8025844