Date:      Sat, 20 Jan 2007 12:08:03 -0500
From:      Jeff Royle <lists@qwirky.net>
To:        LI Xin <delphij@delphij.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 6.2 Release -  Adaptec 2130SLP driver?? issue - aac driver
Message-ID:  <45B24C73.3010807@qwirky.net>
In-Reply-To: <45B0F758.70408@delphij.net>
References:  <45B0D996.8070704@qwirky.net> <45B0F61A.8020507@qwirky.net> <45B0F758.70408@delphij.net>

LI Xin wrote:
> Jeff Royle wrote:
>> Jeff Royle wrote:
>>> I could use some advice on this issue I have had with my raid controller.
>>> I am not really running much on the system yet, postfix, Pf + pflogd,
>>> rlogind, ssh, bsnmp and ntpd.  While I was just reading a file with
>>> less the system stopped responding.   I thought it was the network
>>> interfaces but I was able to ping the interface. Once I plugged a
>>> monitor into the system I saw this (roughly):
>>>
>>> AAC0: COMMAND <SOME HEX> TIMEOUT AFTER X number of seconds
>>>
>>> Not good :)
>>>
>>> Reset of the system resolved the issue and it booted fine.    Since
>>> the controller stopped responding nothing was recorded to my logs.
>>>
>>> Now I have to figure out how to prevent that from happening again.
>>>
>>> Basic run down on the system and some history...
>>>
>>> P4 3.2Ghz
>>> Asus P5MT-S MB
>>> 2 x 1GB DDR2 667 memory
>>> Adaptec 2130SLP Raid Controller + battery backup module
>>> 2 Seagate Ultra320 73GB 15k RPM (mirrored)
>>>
>>> I have run this same system hardware testing 6.2-BETA3, RC-1 and RC-2
>>> without this issue.    I was using the driver released by Adaptec
>>> while testing the pre-release installs
>>> (http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm).  
>>> You could say I am fairly confident in the hardware itself.   I have
>>> put this system through a lot of testing since BETA3.
>>>
>>> The 6.2 release kernel has not been customized all that much, I just
>>> pulled out all the drivers I would never use.    To be safe I kept
>>> just about all scsi devices/card models still in as I continued my
>>> testing of 6.2 release. Right now I am going to try taking out aac and
>>> aacp then try the driver I used in my previous tests.    However,
>>> since I have run a week without this issue it will be hard/impossible
>>> to tell whether this did anything to resolve it...I almost want a crash on the
>>> old driver :)
>>>
>>> So I need some advice...  How best do I debug this issue?
>>>
>>> Thanks in advance for any direction you guys can offer me.
>>>
>>> Cheers,
>>>
>>> Jeff
>>>
>>>
>> It appears the driver I was using in my pre-release testing is newer
>> than the release driver.
>>
>> Stock driver in 6.2r dmesg:
>>
>> aac0: <Adaptec SCSI RAID 2130S> mem
>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>> aac0: New comm. interface enabled
>> aac0: Adaptec Raid Controller 2.0.0-1
>> aacp0: <SCSI Passthrough Bus> on aac0
>>
>> Currently using:
>>
>> aacu0: <Adaptec SCSI RAID 2130S> mem
>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>> aacu0: New comm. interface enabled
>> aacu0: Adaptec Raid Controller 2.0.7-1
>> aacpu0: <SCSI Passthrough Bus> on aacu0
>>
>> Going to continue testing with the newer driver.
> 
> I have some preliminary work on merging the Adaptec driver:
> 
> http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
> 
> But one of the reviewers has advised me to request broader testing,
> especially against old cards and CLI tools, so I am holding the commit
> for now.
> 
> Cheers,

Well, the driver patch applied cleanly; no issues to report there.

Performance, measured with bonnie and simple dd tests, is where I expected
it to be based on my previous testing.

So far the TIMEOUT error I noted above has not shown itself again; only
time will tell on this one.

However, I have encountered an intermittent bug on boot.

Sometimes, roughly once every 5-10 boots, the system hangs while probing
the SCSI bus for the drives. I saw this happen once before with the aacdu
2.0.7-1 binary driver I was using in my 6.2-RC1 / 6.2-RC2 testing, but
with the patched driver it is happening considerably more often.

Here is where it hangs...

Hung dmesg output:

-- snip ---
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
--- end snip ---

The system does not go on to probe the drives, as it does in a normal
boot dmesg:

--- snip ---
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 target 3 lun 0
pass1: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass1: 3.300MB/s transfers
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/aacd0s1a
-- end snip --

In an effort to resolve this I increased the SCSI probe delay in the kernel
config from 5 seconds to 10 seconds (SCSI_DELAY is given in milliseconds):

options         SCSI_DELAY=10000

It *may* have helped on one of my reboot tests: I thought it was going
to hang again, but it proceeded. However, it definitely did not solve the
issue.
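With a hang that shows up only about once every 5-10 boots, a handful of clean reboots says little about whether a change such as the larger SCSI_DELAY actually helped. The sketch below estimates how many consecutive clean boots are needed before "no hang" becomes convincing. It assumes, as a simplification, that each boot hangs independently with a fixed probability p; a real probe race may not behave that way. The function name `clean_boots_needed` is hypothetical and not part of any FreeBSD tool.

```python
import math


def clean_boots_needed(p_hang: float, confidence: float = 0.95) -> int:
    """Smallest n such that an unfixed system, assumed to hang independently
    with probability p_hang on each boot, would survive n boots in a row
    with probability at most 1 - confidence."""
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # (1 - p)^n <= 1 - confidence  =>  n >= ln(1 - confidence) / ln(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


if __name__ == "__main__":
    # "roughly once every 5-10 boots" -> about 1/10 to 1/5 per boot
    for p in (1 / 10, 1 / 5):
        print(f"p = {p:.2f}: {clean_boots_needed(p)} clean boots in a row")
```

Under these assumptions, a 1-in-10 hang needs 29 clean boots in a row and a 1-in-5 hang needs 14 before the odds of the bug simply not triggering drop below 5%.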

Once I am back in the office I will see if I can get some debug output 
for you.

Cheers,

Jeff


