From: Jeremy Chadwick
To: current@freebsd.org
Date: Wed, 4 Aug 2004 13:34:56 -0700
Subject: Re: Postgresql locks up server - no response at all
Message-ID: <20040804203456.GA46377@parodius.com>
In-Reply-To: <1091649533.29481.42.camel@lanshark.dmv.com>

I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld.
Please note that the server is "heavily loaded" (note the quotes); usually
a load of around 0.50 to 1.00 at all times, with mysqld being the top
process.  Server runs all latest -CURRENT builds.

Many people over in freebsd-threads mentioned this problem, and recommended
all sorts of different workarounds.  I tried every one available to me,
except mucking with PREEMPTION (as I did not feel comfortable tinkering with
a random .h file on the box; seemed to be a kernel-related thing, so I'd
rather have just an "options" line for it -- I'm conditionally lazy).

The locks are exactly as you describe: random hard-locks.  No KDB/DDB/GDB.
Just hard-locks with nothing in logs anywhere.

There's been (very recent) discussion here about lock-up problems seeming
load-related.  This is starting to sound very probable for a lot of reasons.

Here's a list of all the combinations of things I've tried to *no avail*.
The solution for us was to move mysqld to a 4.x machine.  Since then, the
-CURRENT box has managed to stay up for 3.5 days without any trouble:

=====
SuperMicro SuperServer 5013C-T
P4, 2.6GHz (for HTT settings, see below)
1GB ECC DDR400

For many months this machine worked fine under heavy load, SMP enabled,
ACPI enabled, APIC enabled.  Sometime in early-to-mid July things became
unstable; I update my kernel/world every 1-2 weeks.

The only other difference between "then and now" is that the box runs
MySQL (mysqld) 4.0.20; mysqld is not very heavily loaded (at least in
comparison to some other posters' systems I've seen...)

System can usually stay up about 48-72 hours before dying.

Initial configuration

* KERNEL: SCHED_ULE
* KERNEL: Disabled INVARIANT* and WITNESS*
* KERNEL: SMP enabled, APIC enabled
* BIOS: HTT enabled, APIC enabled, ACPI enabled
* /etc/make.conf has CPUTYPE=p4 (seems to be required for mysqld to work,
  else sig11)

Now the problems begin.  Here are my attempted changes...

* KERNEL: SCHED_4BSD --> SCHED_ULE
  KERNEL: Enabled KDB and DDB
  !! Random locks.

* KERNEL: Enabled INVARIANT* and WITNESS*
  !! Random locks.

* LOADER: Temporary ACPI disable (via loader(8) only; BIOS still has ACPI
  enabled).
  Kernel panic:

    pci0: on pcib0
    panic: Multiple entries for PCI IRQ 18
    cpuid = 0;
    KDB: enter: panic
    [thread 0]
    Stopped at kdb_enter+0x30: movl %ebp,%esp

* BIOS: MPS 1.4 --> 1.1
  No idea if this worked, because we did the following after reading
  freebsd-threads:

* BIOS: Disabled HTT
  BIOS: MPS 1.1 --> 1.4
  KERNEL: SCHED_ULE --> SCHED_4BSD
  KERNEL: Disabled INVARIANT* and WITNESS*
  !! Random locks.

Thu Jul 29 04:16 PDT
* BIOS: Disabled APIC
  KERNEL: Disabled SMP, disabled APIC
  KERNEL: Enabled INVARIANT* and WITNESS*
  NOTE: Because of the latest gcc 3.4 import, I was forced to rebuild
        world too.
  NOTE: Prior to now, world was built WITHOUT CPUTYPE=p4.  If this matters
        at all...
  !! Random locks.

Sat Jul 31 13:08 PDT
* MYSQL: Recompiled 4.0.20 with WITH_PROC_SCOPE_PTH=yes.
  MYSQL: The 4.0.20 rebuild obviously now included CPUTYPE=p4.
  !! Random locks.

Sun Aug 1 03:01:09 PDT 2004
* Ended up moving mysql server portion to a 4.x box, in attempt to see if
  the 5.x box still hard-locks without mysqld.

Wed Aug 4 13:28:35 PDT 2004
* -CURRENT box is still alive and well.
=====

Since our situation has shown that even a pure single CPU (i.e. no HTT and
no SMP in the kernel) has exhibited lock-ups, as mentioned, I'm starting to
think high load causes it.

--
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.                             |

On Wed, Aug 04, 2004 at 03:58:53PM -0400, Sven Willenberger wrote:
> FreeBSD 5.2.1-P8 running on dual Xeon supermicro system with vinum data
> drive and em network interfaces. I have been having a problem with the
> system simply locking up every couple days. No response from the
> keyboard, network, nothing. As if it is in some state of IRQ locking. I
> see nothing in the messages, even with DDB and DDB_UNATTENDED enabled in
> kernel.
> The system runs 4GB of ram with the following modifications to
> kernel:
>
> cpu             I486_CPU
> cpu             I586_CPU
> cpu             I686_CPU
>
> options         SHMMAXPGS=65536     # ********************
> options         SEMMNI=40           # added for posgresql
> options         SEMMNS=240          # allows for around
> options         SEMUME=40           # 180 simultaneous connections
> options         SEMMNU=120          # ********************
>
> # Debugging for use in -current
> options         DDB                 #Enable the kernel debugger
> options         DDB_UNATTENDED      #Don't panic on DDB but log it
> #options        INVARIANTS          #Enable calls of extra sanity checking
> options         INVARIANT_SUPPORT   #Extra sanity checks of internal
> #options        WITNESS             #Enable checks to detect dead ..
> #options        WITNESS_SKIPSPIN    #Don't run witness on spinlocks
> # Deal with kmem issues
> options         VM_KMEM_SIZE_SCALE="4"
> options         VM_KMEM_SIZE_MAX="(512*1024*1024)"
> options         KVA_PAGES=512
>
>
> /boot/loader.conf:
> vinum_load="YES"
> vinum.autostart="YES"
> #kern.maxdsiz="1073741824"
> #kern.dfldsiz="1073741824"
>
> I had experimented in loader.conf with the dsiz settings to no avail,
> still get lockups. Got lockups with and without the DDB settings. It
> would be helpful if I could see some type of error being generated, but
> nothing; the attached terminal has utterly no messages beyond normal
> system messages, everything just stops responding.
>
> After the last lockup and reboot, I sysctl machdep.hlt_logical_cpus=1 to
> see if that had any effect. Any other recommendations? adaptive_mutexes?
> Any ideas on how to actually find out what is happening?
>
> Sven
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"