Date: Sat, 20 Jan 2007 12:08:03 -0500
From: Jeff Royle <lists@qwirky.net>
To: LI Xin <delphij@delphij.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.2 Release - Adaptec 2130SLP driver?? issue - aac driver
Message-ID: <45B24C73.3010807@qwirky.net>
In-Reply-To: <45B0F758.70408@delphij.net>
References: <45B0D996.8070704@qwirky.net> <45B0F61A.8020507@qwirky.net> <45B0F758.70408@delphij.net>

LI Xin wrote:
> Jeff Royle wrote:
>> Jeff Royle wrote:
>>> I could use some advice on this issue I have had with my raid controller.
>>> I am not really running much on the system yet, postfix, Pf + pflogd,
>>> rlogind, ssh, bsnmp and ntpd. While I was just reading a file with
>>> less the system stopped responding. I thought it was the network
>>> interfaces but I was able to ping the interface. Once I plugged a
>>> monitor into the system I saw this (roughly):
>>>
>>> AAC0: COMMAND <SOME HEX> TIMEOUT AFTER X number of seconds
>>>
>>> Not good :)
>>>
>>> Reset of the system resolved the issue and it booted fine. Since
>>> the controller stopped responding nothing was recorded to my logs.
>>>
>>> Now I have to figure out how to prevent that from happening again.
>>>
>>> Basic run down on the system and some history...
>>>
>>> P4 3.2Ghz
>>> Asus P5MT-S MB
>>> 2 x 1GB DDR2 667 memory
>>> Adaptec 2130SLP Raid Controller + battery backup module
>>> 2 Seagate Ultra320 73GB 15k RPM (mirrored)
>>>
>>> I have run this same system hardware testing 6.2-BETA3, RC-1 and RC-2
>>> without this issue. I was using the driver released by Adaptec
>>> while testing the pre-release installs
>>> (http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm).
>>> You could say I am fairly confident in the hardware itself. I have
>>> put this system through a lot of testing since BETA3.
>>>
>>> The 6.2 release kernel has not been customized all that much, I just
>>> pulled out all the drivers I would never use. To be safe I kept
>>> just about all scsi devices/card models still in as I continued my
>>> testing of 6.2 release. Right now I am going to try taking out aac and
>>> aacp then try the driver I used in my previous tests. However,
>>> since I have run a week without this issue it will be hard/impossible
>>> to tell if this did anything to resolve it...I almost want a crash on the
>>> old driver :)
>>>
>>> So I need some advice...
>>> How best do I debug this issue?
>>>
>>> Thanks in advance for any direction you guys can offer me.
>>>
>>> Cheers,
>>>
>>> Jeff
>>>
>>>
>> It appears the driver I was using in my pre-release testing is newer
>> than the release driver.
>>
>> Stock driver in 6.2r dmesg:
>>
>> aac0: <Adaptec SCSI RAID 2130S> mem
>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>> aac0: New comm. interface enabled
>> aac0: Adaptec Raid Controller 2.0.0-1
>> aacp0: <SCSI Passthrough Bus> on aac0
>>
>> Currently using:
>>
>> aacu0: <Adaptec SCSI RAID 2130S> mem
>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>> aacu0: New comm. interface enabled
>> aacu0: Adaptec Raid Controller 2.0.7-1
>> aacpu0: <SCSI Passthrough Bus> on aacu0
>>
>> Going to continue testing with the newer driver.
>
> I have some preliminary work on merging the Adaptec driver:
>
> http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
>
> But one of the reviewers has advised me to request broader testing,
> especially against old cards and CLI tools, so I have held the commit
> for now.
>
> Cheers,

Well, the driver patched fine, no issues to report there. Throughput in
bonnie and simple dd tests is where I expected it to be, based on my
previous testing.

So far the TIMEOUT error I noted above has not shown itself again; time
will tell on this one.

However, I have encountered an intermittent bug on boot. Sometimes, say
once every 5-10 boots, the system will hang while probing the SCSI bus for
the drives. I have seen this happen once before on the aacdu 2.0.7-1
binary driver I was using in my 6.2-RC1 / 6.2-RC2 testing. It is
happening a fair bit more often now.

Here is where it hangs...

Hung dmesg output:

--- snip ---
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
--- end snip ---

The system does not continue on and probe the drives, as seen in a
normal boot dmesg:

--- snip ---
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 target 3 lun 0
pass1: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass1: 3.300MB/s transfers
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/aacd0s1a
--- end snip ---

In an effort to resolve this I increased the SCSI delay in the kernel
from 5 seconds to 10 seconds (the option takes milliseconds):

options SCSI_DELAY=10000

It *may* have helped on one of my reboot tests: I thought it was going to
hang again, but it proceeded. However, it definitely did not solve the
issue.

Once I am back in the office I will see if I can get some debug output
for you.

Cheers,

Jeff
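
For anyone comparing captures from a hung boot against a good one, here is
a minimal Python sketch of one way to find where the hung boot stopped. It
takes the last lines seen on the hung console and the dmesg from a normal
boot, and reports the first line the hung boot never printed. The function
name next_expected_line and the HUNG/NORMAL lists are illustrative
assumptions, not part of any FreeBSD tool; the sample lines are copied from
the output in this message.

```python
def next_expected_line(hung_lines, normal_lines):
    """Return the line a normal boot prints right after the hung boot's last line.

    Returns None if the hung capture is empty, if its last line does not
    appear in the normal boot, or if nothing follows it there.
    """
    hung = [line.strip() for line in hung_lines if line.strip()]
    normal = [line.strip() for line in normal_lines if line.strip()]
    if not hung:
        return None
    last = hung[-1]
    # Search backwards so a repeated line matches its final occurrence.
    for i in range(len(normal) - 1, -1, -1):
        if normal[i] == last:
            return normal[i + 1] if i + 1 < len(normal) else None
    return None


# Tail of the hung boot, as seen on the console.
HUNG = [
    "acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33",
    "aacd0: <RAID 1 (Mirror)> on aac0",
    "aacd0: 69889MB (143132672 sectors)",
]

# Same region of a normal boot.
NORMAL = [
    "acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33",
    "aacd0: <RAID 1 (Mirror)> on aac0",
    "aacd0: 69889MB (143132672 sectors)",
    "pass0 at aacp0 bus 0 target 0 lun 0",
    "pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device",
    "pass0: 3.300MB/s transfers",
]

if __name__ == "__main__":
    print(next_expected_line(HUNG, NORMAL))
```

With the captures above it reports the pass0 attach line, which matches the
observation that the hang comes after aacd0 is sized and before the
passthrough devices show up.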