Date: Sun, 7 Sep 2003 19:51:22 -0700
From: Aaron Smith <aaron@mutex.org>
To: sos@freebsd.org
Cc: freebsd-current@freebsd.org
Subject: pst driver: timeout explosion? (patch is attached)
Message-ID: <20030908025121.GQ560@gelatinous.com>

Hi,

I think I may have found the cause of the pst timeout panics.

I'm using the Promise SX6000 RAID on -CURRENT, using the pst driver.
Unfortunately, under sufficiently high I/O load, the box starts printing:

    "pst: timeout mfa=0x00327b90 cmd=0x01"

The 'mfa' address varies. It starts printing more and more rapidly, and
then eventually the machine wedges solid. Sometimes it makes it to:

    "panic: timeout table full"

Here's what I think is happening. Two timeouts are being scheduled every
time a timeout triggers, because pst_timeout schedules a timeout before
calling pst_rw to retry the operation. Then pst_rw schedules ANOTHER
timeout. Both of these timeouts call pst_timeout, so the number of
pending timeouts doubles every 10 seconds, each one retrying the same
I/O operation, until the timeout table fills and the machine panics.

Check out the following diff:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h

This is where pst_rw was changed to schedule its own timeouts, but the
timeout function didn't have its own removed.

Do you think this could be the correct explanation? It seems like once
pst_timeout is called, the machine is doomed...

I'm recompiling my kernel now to test the fix under load.

--Aaron

Attachment: pst-raid.c.patch

Index: /sys/dev/pst/pst-raid.c
===================================================================
RCS file: /usr/cvs/src/sys/dev/pst/pst-raid.c,v
retrieving revision 1.11
diff -u -r1.11 pst-raid.c
--- /sys/dev/pst/pst-raid.c 24 Aug 2003 17:54:17 -0000 1.11
+++ /sys/dev/pst/pst-raid.c 8 Sep 2003 02:32:58 -0000
@@ -316,11 +316,6 @@
         mtx_unlock(&request->psc->iop->mtx);
         return;
     }
-    if (dumping)
-        request->timeout_handle.callout = NULL;
-    else
-        request->timeout_handle =
-            timeout((timeout_t*)pst_timeout, request, 10 * hz);
     if (pst_rw(request)) {
         iop_free_mfa(request->psc->iop, request->mfa);
         biofinish(request->bp, NULL, EIO);