From: Dave Dolson <ddolson@sandvine.com>
To: freebsd-scsi@freebsd.org
Date: Wed, 6 Aug 2003 15:27:14 -0400
Subject: Swapping deadlock due to aic/scsi errors?

We have a reproducible bug in which the system becomes unresponsive, although the kernel debugger (DDB) can still be entered.

The system is based on FreeBSD 4.7 (i386) and uses the aic79xx SCSI driver.

Common elements:

  - pagedaemon is sleeping in wswbuf0 (waiting for a free async swap-write buffer, consistent with nsw_wcount_async=0?)
  - swapper is sleeping in vmwait (waiting for a free page from disk?)
  - nsw_wcount_async=0

If any processes page fault, they wait in swread, and the following message is then printed once every 20 seconds:

  swap_pager: indefinite wait buffer: device: #da/0x30001, blkno: 10352, size: 4096

I believe the swapper is waiting for the SCSI driver to call vunmapbuf() after the page has been sent asynchronously to be swapped out.

The following message is sometimes seen, followed by a "dump card state":

  "SCB 0x1f - timed out"

I would like to add some debugging to detect the lost command and possibly retry it.
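A rough userspace model of the hypothesis above, under stated assumptions: a fixed pool of async swap-write buffers (the counter stands in for nsw_wcount_async), where a buffer is taken when a page-out starts and returned only when the driver completes the I/O. If completions are lost, the pool drains to zero and every later page-out stalls, which would match pagedaemon sleeping in wswbuf0. The struct and function names are hypothetical, time is simulated in seconds, and the periodic warning is only similar in spirit to the swap pager's "indefinite wait buffer" message; this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: free_async stands in for nsw_wcount_async. */
struct swap_model {
    int free_async;  /* async write buffers still available */
    int in_flight;   /* writes issued but not yet completed */
};

/* Start an async page-out. Returns false when no buffer is free;
 * the real caller would sleep at that point (cf. "wswbuf0"). */
static bool start_async_write(struct swap_model *m)
{
    if (m->free_async == 0)
        return false;
    m->free_async--;
    m->in_flight++;
    return true;
}

/* Completion path: the driver finishes the I/O and the buffer is
 * returned to the pool. If this never runs for a command, that
 * buffer is gone for good. */
static void async_write_done(struct swap_model *m)
{
    assert(m->in_flight > 0);
    m->in_flight--;
    m->free_async++;
}

/* Wait up to max_secs simulated seconds for a free buffer, printing a
 * warning every report_secs. It only reports and never recovers.
 * Returns the number of warnings printed. */
static int wait_for_buffer(struct swap_model *m, int max_secs, int report_secs)
{
    int warnings = 0;
    for (int t = 1; t <= max_secs; t++) {
        if (m->free_async > 0)
            return warnings;
        if (t % report_secs == 0) {
            printf("still waiting after %d s (in flight: %d)\n",
                   t, m->in_flight);
            warnings++;
        }
    }
    return warnings;
}
```

With a pool of two buffers, one completion refills the pool once; if the remaining two writes never complete, the waiter warns at 20, 40 and 60 simulated seconds without ever making progress. Only the missing completions could refill the pool.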
Can someone suggest where the lost command is supposed to be detected, and where the retry is supposed to occur? (I've been looking through the CAM and ahd code, but need some direction.)

Thanks in advance,
David Dolson (ddolson@sandvine.com, www.sandvine.com)
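The detect-and-retry pattern being asked about can be sketched as a per-command deadline plus a retry budget: a watchdog that finds an expired, uncompleted command either re-issues it with a fresh deadline or fails it upward, so nothing is left pending forever. In CAM, each CCB header carries a timeout (ccb_h.timeout) and a retry count (ccb_h.retry_count), with the SIM driver (here ahd) enforcing the timeout and the peripheral driver (da) deciding on retries; the exact 4.7 code path should be checked against the sources. The struct, enum and function below are hypothetical and use simulated ticks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command record: a deadline plus a retry budget. */
struct cmd {
    int id;
    int deadline;      /* simulated tick at which the command times out */
    int retries_left;  /* retries remaining before failing upward */
    bool done;         /* set by the (hypothetical) completion path */
};

enum cmd_state { CMD_OK, CMD_PENDING, CMD_RETRIED, CMD_FAILED };

/* Called on each tick by a hypothetical watchdog. A completed command
 * is reported OK; one still inside its deadline is left alone; an
 * expired one is re-issued with a fresh deadline until its retries run
 * out, after which it is failed instead of waited on forever. */
static enum cmd_state check_cmd(struct cmd *c, int now, int timeout)
{
    assert(timeout > 0);
    if (c->done)
        return CMD_OK;
    if (now < c->deadline)
        return CMD_PENDING;
    if (c->retries_left > 0) {
        c->retries_left--;
        c->deadline = now + timeout;  /* re-issue: rearm the timer */
        return CMD_RETRIED;
    }
    return CMD_FAILED;
}
```

The design choice is that every command ends in either completion or an explicit error; a waiter such as the swap pager then has something to wake up on rather than a buffer that never comes back.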