Date:      Wed, 14 Sep 2011 10:38:20 -0600
From:      Elliot Finley <efinley.lists@gmail.com>
To:        Dennis Koegel <dk@neveragain.de>
Cc:        stable@freebsd.org
Subject:   Re: System freeze: Adaptec (aac) timeouts (releng 8)
Message-ID:  <CACRGtSO-7er=Nhofkj7E=05iM6qzVumNJQpr=pwg6196-1K2MA@mail.gmail.com>
In-Reply-To: <20110914080831.GB41431@neveragain.de>
References:  <20110914080831.GB41431@neveragain.de>

I was having the exact same problem using an Adaptec 52445. After
downloading and using the latest driver from the Adaptec website, the
problems stopped. I haven't had a single freeze since using the new
code. The newest driver from the website comes with source code, so
it shouldn't be that big of a deal to incorporate it into the base
system. I emailed the authors of the aac driver (Mike Smith and
Scott Long), but they have both retired, so I'm not really sure how
to get this code into the base. If anyone knows, please take up the
charge.

On Wed, Sep 14, 2011 at 2:08 AM, Dennis Koegel <dk@neveragain.de> wrote:
> Cheers,
>
> we have a reproducible system freeze due to Adaptec driver (aac) timeouts:
>
> Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
> Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
> Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
> <dozens more of these...>
>
> Once this happens, the userland seems to be alive, but the controller is
> completely dead. As soon as the disk subsystem is involved, any process
> hangs forever (e.g. SSH crypto-exchange still happens, but a shell won't
> even start anymore).
>
> We observe the same issue on two systems of (mostly) identical spec, so
> it's not a hardware issue.
>
> Apparently this only happens under heavy disk I/O and high CPU load.
> Notably, high write throughput plus a 'zpool scrub' on a large
> GELI-backed zpool usually triggers the problem after a few hours.
> Without high activity, they run smoothly for weeks.
>
> Both systems are amd64 with an Adaptec 5805 controller and 16 disks (of
> which two form a RAID-1 system volume (UFS), and the remaining 14 serve
> as JBOD for a large zpool -- a total of 15 "aacd" devices).
>
> Both were running 8.2R originally. I've taken them to 8-STABLE now and
> also applied svn r222951 (where the MFC was forgotten, it seems), but
> the problem remains.
>
> Any help is greatly appreciated.
>
> Thanks,
> - D.
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>


