Date: Wed, 13 Aug 2003 13:49:09 -0500
From: "P. Larry Nelson" <lnelson@uiuc.edu>
To: aic7xxx@freebsd.org
Subject: scsi errors only when writing to Promise disks
Message-ID: <3F3A8825.9EE28ED3@uiuc.edu>
I have just joined this list to get some guidance as to what might be
wrong, or at least where to turn for help, as I don't seem to be getting
very far with RedHat. Google searches on the errors or on the particular
Promise raid system lead nowhere.

I'm seeing thousands (last count was 26,000) of the following errors in
/var/log/messages when running some large write tests on an external disk
connected to an Adaptec 29160 (details of the ad hoc test further below):

[sample two line entry:]
<date/time> <hostname> kernel: (scsi1:A:5:0): parity error detected in Data-out phase.  SEQADDR(0x1a3) SCSIRATE(0xc2)
<date/time> <hostname> kernel: ^INo terminal CRC packet received

[note that the address in SEQADDR is the only thing that changes in
previous and subsequent messages]

If you know what's going on, you can stop reading here and email me
the problem, solution, hints, workarounds, commiserations, whatever.

Otherwise, here are many more details.

System description:

Software:
  Red Hat Linux release 9 (Shrike)
  Linux version 2.4.20-18.9smp (bhcompile@porky.devel.redhat.com)
  (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5))
  #1 SMP Thu May 29 06:55:05 EDT 2003
  [BTW, same problem occurs with RedHat 8]

Drivers loaded:
  Module                  Size  Used by    Not tainted
  soundcore               7044   0  (autoclean)
  lp                      9188   0  (autoclean)
  parport                39072   0  (autoclean) [lp]
  nfs                    84600   2  (autoclean)
  lockd                  59536   1  (autoclean) [nfs]
  sunrpc                 87516   1  (autoclean) [nfs lockd]
  e1000                  60704   2
  microcode               5184   0  (autoclean)
  loop                   12888   0  (autoclean)
  keybdev                 2976   0  (unused)
  mousedev                5688   1
  hid                    22404   0  (unused)
  input                   6208   0  [keybdev mousedev hid]
  usb-uhci               27468   0  (unused)
  usbcore                82816   1  [hid usb-uhci]
  ext3                   73376   4
  jbd                    56368   4  [ext3]
  lvm-mod                64512   1
  aic7xxx               142516   5
  sd_mod                 13452  10
  scsi_mod              110904   2  [aic7xxx sd_mod]

I don't know what version of the aic driver is used - how does one tell?
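[A minimal sketch, not part of the original test: one way to tally the
parity-error entries quoted above and see which SEQADDR values recur.
The regular expression is inferred from the single sample entry in this
message, and the function name summarize_parity_errors is hypothetical.]

```python
import re
from collections import Counter

# Pattern inferred from the one sample log entry in this message; other
# kernel or driver versions may format the line differently (assumption).
PARITY_RE = re.compile(
    r"\((?P<dev>scsi\d+:[A-Z]:\d+:\d+)\): parity error detected in "
    r"(?P<phase>[\w-]+) phase\.\s+SEQADDR\((?P<seqaddr>0x[0-9a-fA-F]+)\)\s+"
    r"SCSIRATE\((?P<scsirate>0x[0-9a-fA-F]+)\)"
)


def summarize_parity_errors(lines):
    """Count parity-error entries, grouped by device and by SEQADDR."""
    total = 0
    by_device = Counter()
    by_seqaddr = Counter()
    for line in lines:
        match = PARITY_RE.search(line)
        if match is None:
            continue  # e.g. the "No terminal CRC packet received" line
        total += 1
        by_device[match.group("dev")] += 1
        by_seqaddr[match.group("seqaddr")] += 1
    return total, by_device, by_seqaddr


# Possible use against the log file named in this message:
#   with open("/var/log/messages", errors="replace") as fh:
#       total, by_device, by_seqaddr = summarize_parity_errors(fh)
#   print(total, by_device.most_common(), by_seqaddr.most_common(10))
```

[Grouping by SEQADDR is only a convenience for spotting whether a few
addresses dominate; it says nothing by itself about the cause.]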
Hardware:
  Open Storage Solutions 2U rack mount server with Intel SE7500WV2
  motherboard, dual Xeon 2GHz processors, 2GB RAM, 18GB & 73GB internal
  scsi disks, Adaptec AIC29160 scsi card, external Promise Ultratrak
  RM 15000 raid system connected to the AIC29160.
  [all disks are set up for journaling, i.e., ext3]

Test details:

The test consists of doing some relatively large copies of files to the
external disk, which mounts just fine and shows no errors at all with
smallish writes. It seems that any write (file copy) over, say, 300,000
bytes will generate the error. For example, the following command
generates two occurrences of the pair of lines listed above:

  'cp /boot/vmlinuz /mnt2'

In this case, the file is a little over 1MB.

- the same test does not generate any errors when writing to the
  internal disks.

- moved the internal disks to the Adaptec 29160 and tried the write test
  again - no errors.

- get the same errors regardless of whether the test is done against a
  raid set on the external Promise or against a single jbod disk in the
  Promise.

- when the exact same hardware setup had Win2k loaded, there were
  no errors writing to the Promise.

- when the Promise disk raid was attached to an Alpha running Tru64 unix,
  there were no errors when writing to the disks.
  [in other words, this Promise Raid system has been checked out on
  other systems with no problems at all]

- a different scsi controller was not tried (I have no others; besides,
  it worked fine when it was part of the Win2k setup).

- neither was a different linux tried (like debian or suse, etc.)

In other words, the errors only come when trying to do >~300KB writes
thru the Adaptec 29160 controller, on RedHat, to a Promise Ultratrak
RM 15000 raid system.

There doesn't seem to be anything wrong with the files - a diff of the
original and copy shows no differences.
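[A minimal sketch of how the ad hoc write test above could be scripted to
look for the size at which errors begin. This is an illustration, not the
procedure actually used: the function name write_and_verify, the file
naming, and the sizes in the usage comment are assumptions; /mnt2 is the
mount point from the cp example.]

```python
import os


def write_and_verify(dest_dir, sizes):
    """Write one random file per size into dest_dir, read it back, compare.

    Returns a list of (size, matches) tuples. The read-back may be served
    from the page cache, so a match does not prove the on-disk copy is
    intact; the kernel log still has to be checked after each write.
    """
    results = []
    for size in sizes:
        data = os.urandom(size)
        path = os.path.join(dest_dir, "writetest_%d.bin" % size)
        with open(path, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push the data out to the device
        with open(path, "rb") as fh:
            readback = fh.read()
        results.append((size, readback == data))
        os.remove(path)
    return results


# Possible use, bracketing the roughly 300,000 byte threshold reported
# above (the exact sizes are arbitrary choices):
#   for size, ok in write_and_verify("/mnt2", [100000, 300000, 1000000]):
#       print(size, "ok" if ok else "MISMATCH")
```

[Running it once per size and checking /var/log/messages in between
would show which write sizes trigger the parity errors.]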
This is all particularly bothersome as I need to set up a number of
these systems as large (multi-terabyte) file servers in order to
handle massive amounts of experimental data. Another problem I
discovered (as we migrate away from Alphas) is that I'm limited (at
present) to 2 TB logical volumes in LVM, and I need to make LVs of
upwards of 6 TB, but I digress and that's another story....
(I understand that the 2.6 kernel can handle these)

One final note: I am bound to the use of RedHat because of software
constraints imposed by the national lab where the data is being
generated (they're using RedHat, so we have to, also).

Many thanks in advance!
- Larry

--
P. Larry Nelson (217-244-9855) | Systems/Network Administrator
461 Loomis Lab                 | U of I, CITES Departmental Services
1110 W. Green St., Urbana, IL  | Consultant to: High Energy Physics Group
MailTo:lnelson@uiuc.edu        | http://www.uiuc.edu/ph/www/lnelson
-------------------------------------------------------------------------
"Information without accountability is just noise."  - P.L. Nelson