Date:      Mon, 18 Aug 2003 15:43:55 -0500
From:      "P. Larry Nelson" <lnelson@uiuc.edu>
To:        aic7xxx@freebsd.org
Subject:   Re: scsi errors only when writing to Promise disks
Message-ID:  <3F413A8B.86454E18@uiuc.edu>
References:  <3F3A8825.9EE28ED3@uiuc.edu>

Problem appears to be solved (in case anyone else encounters the same
scenario).  

The Promise Ultratrak RM 15000 we have was running firmware 1.1.0.7, the
latest release according to their web site.

I reported the problem to Promise's tech support and they just sent me
a zip file with a beta 1.1.0.9 firmware release that takes care of the
problem, as far as I can tell.

I'd like to thank Justin Gibbs (scsiguy.com) for some insights on what
was going on with the write errors.

- Larry

"P. Larry Nelson" wrote:
> 
> I have just joined this list hoping to get some guidance as to what
> might be wrong, or at least where to turn for help, as I don't seem to
> be getting very far with RedHat.  Google searches on the errors or on
> this particular Promise RAID system lead nowhere.
> 
> I'm seeing thousands (last count was 26,000) of the following errors
> in /var/log/messages when running some large write tests on an external
> disk connected to an Adaptec 29160 (details of the ad hoc test further below):
> 
> [sample two line entry:]
>  <date/time> <hostname> kernel: (scsi1:A:5:0): parity error detected in Data-out phase. SEQADDR(0x1a3) SCSIRATE(0xc2)
>  <date/time> <hostname> kernel: ^INo terminal CRC packet received
> [note that the address in SEQADDR is the only thing that changes in previous
>  and subsequent messages]
> 
> If you know what's going on, you can stop reading here and email me
> the problem, solution, hints, workarounds, commiserations, whatever.
> Otherwise, here are many more details.
> 
> System description:
>  Software:
>   Red Hat Linux release 9 (Shrike)
>   Linux version 2.4.20-18.9smp (bhcompile@porky.devel.redhat.com)
>   (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Thu May 29
>   06:55:05 EDT 2003
>  [BTW, the same problem occurs with RedHat 8]
> 
>  Drivers loaded:
>   Module                  Size  Used by    Not tainted
>   soundcore               7044   0  (autoclean)
>   lp                      9188   0  (autoclean)
>   parport                39072   0  (autoclean) [lp]
>   nfs                    84600   2  (autoclean)
>   lockd                  59536   1  (autoclean) [nfs]
>   sunrpc                 87516   1  (autoclean) [nfs lockd]
>   e1000                  60704   2
>   microcode               5184   0  (autoclean)
>   loop                   12888   0  (autoclean)
>   keybdev                 2976   0  (unused)
>   mousedev                5688   1
>   hid                    22404   0  (unused)
>   input                   6208   0  [keybdev mousedev hid]
>   usb-uhci               27468   0  (unused)
>   usbcore                82816   1  [hid usb-uhci]
>   ext3                   73376   4
>   jbd                    56368   4  [ext3]
>   lvm-mod                64512   1
>   aic7xxx               142516   5
>   sd_mod                 13452  10
>   scsi_mod              110904   2  [aic7xxx sd_mod]
> 
>  I don't know what version of the aic driver is used - how does one tell?
> 
>  Hardware:
>   Open Storage Solutions 2U rack mount server with Intel SE7500WV2 motherboard,
>   dual Xeon 2GHz processors, 2GB RAM, 18GB and 73GB internal SCSI disks, Adaptec
>   AIC29160 SCSI card, and an external Promise Ultratrak RM 15000 RAID system
>   connected to the AIC29160. [all disks are set up for journaling, i.e., ext3]
> 
>  Test details:
>   The test consists of copying some relatively large files to the external
>   disk, which mounts just fine and shows no errors at all with smallish
>   writes.  It seems that any write (file copy) over roughly 300,000 bytes
>   will generate the error.  For example, the following command generates
>   two occurrences of the pair of lines listed above:
>    'cp /boot/vmlinuz /mnt2'
>   In this case, the file is a little over 1 MB.
>  - same test does not generate any errors when writing to the internal disks.
>  - moved internal disks to the Adaptec 29160 and tried the write test again -
>    no errors.
>  - I get the same errors regardless of whether the test is done against a RAID
>    set on the external Promise or a single JBOD disk in the Promise.
>  - when the exact same hardware setup had Win2k loaded, there were no errors
>    writing to the Promise.
>  - when the Promise disk raid was attached to an Alpha running Tru64 unix,
>    there were no errors when writing to the disks.
> [in other words, this Promise Raid system has been checked out on other systems
> with no problems at all]
>  - a different SCSI controller was not tried (I have no others; besides, it
>    worked fine when it was part of the Win2k setup).
>  - neither was a different linux tried (like debian or suse, etc.)
> 
> In other words, the errors only occur on writes larger than roughly 300 KB
> through the Adaptec 29160 controller, on RedHat, to a Promise Ultratrak RM 15000 RAID
> system.  There doesn't seem to be anything wrong with the files - a diff
> of the original and copy shows no differences.
> 
> This is all particularly bothersome as I need to set up a number of these
> systems as large (multi-terabyte) file servers in order to handle massive
> amounts of experimental data.  Another problem I discovered (as we migrate
> away from Alphas) is that I'm limited (at present) to 2 TB logical volumes
> in LVM, and I need to make logical volumes of upwards of 6 TB, but I digress
> and that's another story....  (I understand that the 2.6 kernel can handle these.)
> 
> One final note: I am bound to the use of RedHat because of software constraints
> imposed by the national lab where the data is being generated (they're using
> RedHat, so we have to, also).
> 
> Many thanks in advance!
> - Larry
> --
> P. Larry Nelson (217-244-9855) | Systems/Network Administrator
> 461 Loomis Lab                 | U of I, CITES Departmental Services
> 1110 W. Green St., Urbana, IL  | Consultant to: High Energy Physics Group
> MailTo:lnelson@uiuc.edu        | http://www.uiuc.edu/ph/www/lnelson
> -------------------------------------------------------------------------
>  "Information without accountability is just noise."  - P.L. Nelson
> _______________________________________________
> aic7xxx@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/aic7xxx
> To unsubscribe, send any mail to "aic7xxx-unsubscribe@freebsd.org"




