From: Tracy Camp
To: freebsd-gnats-submit@FreeBSD.org
Date: Mon, 9 Jul 2001 10:18:21 -0700 (PDT)
Message-Id: <200107091718.f69HIL899370@freefall.freebsd.org>
Subject: kern/28840: Possible interrupt masking trouble in sys/cam/cam_xpt.c

>Number:         28840
>Category:       kern
>Synopsis:       Possible interrupt masking trouble in sys/cam/cam_xpt.c
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 09 10:20:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Tracy Camp
>Release:        4.3-RELEASE
>Organization:   MiraLink Corp
>Environment:
FreeBSD 4.3-RELEASE FreeBSD 4.3-RELEASE #126: Fri Jul 6 13:37:26 EDT 2001 root@bsd-4-3.miralink.com:/usr/src/sys/compile/TEST i386
>Description:
Over a long period of time we have been experiencing seemingly random crashes, first with FreeBSD 3.1 and now with FreeBSD 4.3, all related to the disk I/O subsystem.

Our application uses a custom driver with a QLogic ISP controller operating in target mode. After extensive auditing of our driver source we could find no further problems in it. However, sanity checks added to portions of sys/cam/cam_xpt.c showed what appeared to be queue corruption caused by incorrect interrupt masking. The problem only shows up under fairly heavy load. Admittedly, our driver does a fair amount of work at interrupt level, so that may be what triggers the underlying problem. Nevertheless, replacing every splsoftcam() call in sys/cam/cam_xpt.c with splcam() eliminated the problem entirely.

Specific problems we encountered:

1. devstat_end_transaction HELP!! busy_count for da2 < 0 (-1)

   We showed that this always resulted from a devstat_end_transaction_buf() call made within sys/cam/scsi/scsi_da.c:dadone().

2. panic: xpt_run_dev_allocq: Device on queue without any work to do

   After some testing this was found to be directly related to the next problem.

3. Fatal trap 12: page fault while in kernel mode

   This occurred within xpt_run_dev_allocq() and was actually caused by camq_remove() returning a NULL pointer from the device queue.

Checks added to camq_insert() and camq_remove() showed that occasionally an entry could be added to a queue and, before camq_insert() had finished, the entries count would be 0 rather than the expected 1. Particularly convincing was a test, inserted into camq_insert(), that did something similar to this:

    camq_insert(..)
    {
        /* near the top */
        saved_entries = queue->entries;

        /* later */
        if (queue->entries < 1) {
            printf("entries < 1 %d", queue->entries);
        } else {

>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: