From: Scott Long
Date: Fri, 11 Aug 2006 17:37:44 -0600
To: Pavel Merdin
Cc: freebsd-stable@freebsd.org
Subject: Re: 6-stable locking problem
Message-ID: <44DD14C8.8080208@samsco.org>
In-Reply-To: <292315388.20060810140751@fotki.com>

Darn, I thought that that was already fixed. I'll go dig up my patches
and take care of this.

Scott

Pavel Merdin wrote:
> Hello.
>
> There's a problem with a very busy server (ad server, CPU is close to
> 0% idle most of the time).
> Configuration: Dual AMD Opteron 252 2.6GHz
> Chipset: AMD 8131
> Integrated LAN Controller: Broadcom BCM5704 dual-channel GbE Gigabit
> Adaptec AIC-7902W Ultra 320 SCSI controller
> amr0:
>
> We tried both 6.1-RELEASE and 6-STABLE amd64 kernels. (bge driver is
> always from recent stable with full Broadcom support).
>
> The server hangs one or more times a day. It even hangs for some time
> right after boot sequence finishes (when "login:" prompt occurs).
> During a hang everything stops, even keyboard (interrupts).
>
> We already removed PREEMPTION and linux support.
> Sometimes the server can panic with:
> Sleeping thread (tid 100006, pid 4) owns a non-sleepable lock
> panic: sleeping thread
> cpuid=0
> KDB: enter: panic
> and hangs there without even starting a debugger.
> pid 4 seems to be [g_down]
>
> Today I compiled a kernel with INVARIANTS and WITNESS.
> Right after booting sequence I got the following:
>
> Aug 10 04:37:09 ad1 kernel: lock order reversal: (Giant after non-sleepable)
> Aug 10 04:37:09 ad1 kernel: 1st 0xffffff026c4ebe70 AMR List Lock (AMR List Lock) @ dev/amr/amr.c:403
> Aug 10 04:37:09 ad1 kernel: 2nd 0xffffffff8073adc0 Giant (Giant) @ vm/vm_contig.c:579
> Aug 10 04:37:09 ad1 kernel: KDB: stack backtrace:
> Aug 10 04:37:09 ad1 kernel: kdb_backtrace() at kdb_backtrace+0x37
> Aug 10 04:37:09 ad1 kernel: witness_checkorder() at witness_checkorder+0x6fb
> Aug 10 04:37:09 ad1 kernel: _mtx_lock_flags() at _mtx_lock_flags+0x9a
> Aug 10 04:37:09 ad1 kernel: contigmalloc() at contigmalloc+0x57
> Aug 10 04:37:09 ad1 kernel: alloc_bounce_pages() at alloc_bounce_pages+0x75
> Aug 10 04:37:09 ad1 kernel: bus_dmamap_create() at bus_dmamap_create+0x149
> Aug 10 04:37:09 ad1 kernel: amr_alloccmd_cluster() at amr_alloccmd_cluster+0x102
> Aug 10 04:37:09 ad1 kernel: amr_alloccmd() at amr_alloccmd+0x55
> Aug 10 04:37:09 ad1 kernel: amr_bio_command() at amr_bio_command+0x27
> Aug 10 04:37:09 ad1 kernel: amr_startio() at amr_startio+0x6a
> Aug 10 04:37:09 ad1 kernel: amr_submit_bio() at amr_submit_bio+0x51
> Aug 10 04:37:09 ad1 kernel: amrd_strategy() at amrd_strategy+0x23
> Aug 10 04:37:09 ad1 kernel: g_disk_start() at g_disk_start+0x17d
> Aug 10 04:37:09 ad1 kernel: g_io_schedule_down() at g_io_schedule_down+0x189
> Aug 10 04:37:09 ad1 kernel: g_down_procbody() at g_down_procbody+0x80
> Aug 10 04:37:09 ad1 kernel: fork_exit() at fork_exit+0xdf
> Aug 10 04:37:09 ad1 kernel: fork_trampoline() at fork_trampoline+0xe
> Aug 10 04:37:09 ad1 kernel: --- trap 0, rip = 0, rsp = 0xffffffffb8e8bd00, rbp = 0 ---
>
> Any advice (except suggestion of switching to Linux) ?
>