Date: Wed, 14 Sep 2011 10:08:31 +0200
From: Dennis Koegel <dk@neveragain.de>
To: stable@freebsd.org
Subject: System freeze: Adaptec (aac) timeouts (releng 8)
Message-ID: <20110914080831.GB41431@neveragain.de>
Cheers,

we have a reproducible system freeze due to Adaptec driver (aac) timeouts:

Sep 3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep 3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep 3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
<dozens more of these...>

Once this happens, the userland seems to be alive, but the controller
is completely dead. As soon as the disk subsystem is involved, any
process hangs forever (e.g. SSH crypto-exchange still happens, but a
shell won't even start anymore).

We observe the same issue on two systems of (mostly) identical spec,
so it's not a hardware issue.

Apparently this only happens under heavy disk I/O and high CPU load.
Notably, high write throughput plus a 'zpool scrub' on a large
GELI-backed zpool usually triggers the problem after a few hours.
Without high activity, they run smoothly for weeks.

Both systems are amd64 with an Adaptec 5805 controller and 16 disks
(of which two form a RAID-1 system volume (UFS), and the remaining 14
serve as JBOD for a large zpool -- a total of 15 "aacd" devices).

Both were running 8.2R originally. I've taken them to 8-STABLE now and
also applied svn r222951 (where the MFC was forgotten, it seems), but
the problem remains.

Any help is greatly appreciated.

Thanks,
- D.