Date: Sun, 7 Sep 2003 19:51:22 -0700
From: Aaron Smith
To: sos@freebsd.org
cc: freebsd-current@freebsd.org
Subject: pst driver: timeout explosion? (patch is attached)
Message-ID: <20030908025121.GQ560@gelatinous.com>

Hi, I think I may have found the cause of the pst timeout panics.

I'm using the Promise SX6000 RAID on -CURRENT, using the pst driver.
Unfortunately, under sufficiently high I/O load, the box starts
printing:

"pst: timeout mfa=0x00327b90 cmd=0x01"

The 'mfa' address varies. It starts printing more and more rapidly, and
then eventually the machine wedges solid. Sometimes it makes it to:

"panic: timeout table full"

Here's what I think is happening. Two timeouts are being scheduled
every time a timeout triggers, because pst_timeout schedules a timeout
before calling pst_rw to retry the operation. Then pst_rw schedules
ANOTHER timeout.
Both of these timeouts call pst_timeout, so they double every 10 seconds
until there are a large enough number of timeouts firing, retrying the
same I/O operation, that the table fills and the machine panics.

Check out the following diff:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h

This is where pst_rw was changed to schedule its own timeouts, but the
timeout function didn't have its own timeout() call removed.

Do you think this could be the correct explanation? It seems like once
pst_timeout is called, the machine is doomed... I'm recompiling my
kernel now to test the fix under load.

--Aaron

Attachment: pst-raid.c.patch

Index: /sys/dev/pst/pst-raid.c
===================================================================
RCS file: /usr/cvs/src/sys/dev/pst/pst-raid.c,v
retrieving revision 1.11
diff -u -r1.11 pst-raid.c
--- /sys/dev/pst/pst-raid.c	24 Aug 2003 17:54:17 -0000	1.11
+++ /sys/dev/pst/pst-raid.c	8 Sep 2003 02:32:58 -0000
@@ -316,11 +316,6 @@
         mtx_unlock(&request->psc->iop->mtx);
         return;
     }
-    if (dumping)
-        request->timeout_handle.callout = NULL;
-    else
-        request->timeout_handle =
-            timeout((timeout_t*)pst_timeout, request, 10 * hz);
     if (pst_rw(request)) {
         iop_free_mfa(request->psc->iop, request->mfa);
         biofinish(request->bp, NULL, EIO);
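
The doubling described in the message can be illustrated with a small
stand-alone C model. This is only a sketch and not driver code: it
assumes the retried request never completes, so every pending timeout
fires once per 10-second round, and each firing schedules either two new
timeouts (the unpatched path: one in pst_timeout, one in pst_rw) or one
(the patched path). The function names and the table size used in the
usage note are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical model, not kernel code.
 * per_fire is the number of new timeouts scheduled each time one fires:
 *   2 = unpatched (pst_timeout schedules one, then pst_rw schedules another)
 *   1 = patched   (only pst_rw schedules one)
 */

/* Number of pending timeouts for one stuck request after `rounds` firings. */
unsigned long pending_after(unsigned rounds, unsigned per_fire)
{
    unsigned long pending = 1;  /* the timeout armed by the original pst_rw */
    unsigned i;

    assert(per_fire == 1 || per_fire == 2);
    assert(rounds < 32);        /* keep 2^rounds inside unsigned long */

    for (i = 0; i < rounds; i++)
        pending *= per_fire;    /* each firing is replaced by per_fire new ones */
    return pending;
}

/*
 * Smallest number of rounds after which the pending timeouts exceed
 * table_size, or -1 if that does not happen within max_rounds.
 */
int rounds_until_full(unsigned per_fire, unsigned long table_size,
                      int max_rounds)
{
    unsigned long pending = 1;
    int r;

    assert(per_fire == 1 || per_fire == 2);
    assert(table_size <= ULONG_MAX / per_fire);  /* no overflow below */

    for (r = 1; r <= max_rounds; r++) {
        pending *= per_fire;
        if (pending > table_size)
            return r;
    }
    return -1;
}
```

Under these assumptions, rounds_until_full(2, 256, 31) returns 9: with a
hypothetical 256-entry table, a single stuck request would overflow it
after roughly 90 seconds of 10-second timeouts. With per_fire set to 1
the count stays at one pending timeout and the function returns -1.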