From owner-freebsd-ia64@FreeBSD.ORG Mon Aug 2 20:30:08 2010
Return-Path:
Delivered-To: freebsd-ia64@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E953C1065679 for ; Mon, 2 Aug 2010 20:30:08 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A90F38FC1A for ; Mon, 2 Aug 2010 20:30:08 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o72KU8rd087794 for ; Mon, 2 Aug 2010 20:30:08 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o72KU8Zj087791; Mon, 2 Aug 2010 20:30:08 GMT (envelope-from gnats)
Resent-Date: Mon, 2 Aug 2010 20:30:08 GMT
Resent-Message-Id: <201008022030.o72KU8Zj087791@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-ia64@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Karl
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADC631065676 for ; Mon, 2 Aug 2010 20:20:09 +0000 (UTC) (envelope-from karl@tickerforum.org)
Received: from tickerforum.org (tickerforum.org [67.23.181.70]) by mx1.freebsd.org (Postfix) with ESMTP id 774AD8FC08 for ; Mon, 2 Aug 2010 20:20:09 +0000 (UTC)
Received: from tickerforum.org (localhost [127.0.0.1]) by tickerforum.org (8.14.3/8.14.3) with ESMTP id o72JkwM5057403 for ; Mon, 2 Aug 2010 15:46:58 -0400 (EDT) (envelope-from karl@tickerforum.org)
Received: (from root@localhost) by tickerforum.org (8.14.3/8.14.3/Submit) id o72JkwdW057402; Mon, 2 Aug 2010 15:46:58 -0400 (EDT) (envelope-from karl)
Message-Id: <201008021946.o72JkwdW057402@tickerforum.org>
Date: Mon, 2 Aug 2010 15:46:58 -0400 (EDT)
From: Karl
To: FreeBSD-gnats-submit@FreeBSD.org
X-Send-Pr-Version: 3.113
Cc:
Subject: ia64/149208: mksnap_ffs hang/deadlock
X-BeenThere: freebsd-ia64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Karl
List-Id: Porting FreeBSD to the IA-64
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 02 Aug 2010 20:30:09 -0000

>Number:         149208
>Category:       ia64
>Synopsis:       mksnap_ffs hang/deadlock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-ia64
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 02 20:30:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Karl
>Release:        FreeBSD 8.0-STABLE amd64
>Organization:   Cuda Systems LLC
>Environment:
System: FreeBSD tickerforum.org 8.0-STABLE FreeBSD 8.0-STABLE #3: Fri Dec 18 17:41:53 EST 2009 karl@tickerforum.org:/usr/obj/usr/src/sys/GENERIC amd64

>Description:
The automated nightly backup hung during the mksnap_ffs call made by dump; all I/O to that disk blocked, leaving the processes involved in the "D" (disk wait) state. The system was otherwise functional. There were no errors in the system logs or on the console, and the RAID adapter was fully functional. Raw I/O to the disk devices (e.g. reads of the "c" partition of the wedged disk) remained fully functional during the event.

The disk adapter is a 3ware 9650: firmware FE9x 4.08.00.006, driver 3.70.05.001, BBU enabled. It has no faults recorded against it or against the drives attached to it.

This looks like a race condition or deadlock of some sort in the mksnap_ffs code. The only unique "feature" of this event was that the disk in question was a DBMS storage RAID array holding a VERY large directory (many thousands of files) of log segments: sequential, compressed log files, typically ~1-2MB each and numbering many thousands.

The system could not be cleared; mksnap_ffs could not be killed, and ultimately the machine had to be forcibly rebooted without a sync().

>How-To-Repeat:
Attempting to replicate at this time; I will update the PR if I am able to do so. The subject machine had been up for more than six months at the time of the fault, and the fault has not recurred as of the submission of this PR. Migration to 8.1 on this cluster is under way, but I am going to put a significant amount of time into attempting to replicate the problem on 8.0-STABLE (same kernel and binaries) on my test cluster to see if I can provoke the failure; I have an isolated environment in which I can make a decent attempt at recreating the event.

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: