Date: Fri, 30 Jul 2004 16:33:32 +0100
From: Jason Thomson <jason.thomson@mintel.com>
To: Vinod Kashyap <vkashyap@amcc.com>
Cc: Paul Saab <ps@mu.org>
Subject: Re: Reproducible FreeBSD 4.10-STABLE (Jul 7) , 3ware 7506-4 lockup.
Message-ID: <410A6A4C.4060008@mintel.com>
In-Reply-To: <I0YQI602.N07@hadar.amcc.com>
References: <I0YQI602.N07@hadar.amcc.com>
Vinod Kashyap wrote:
> After the system locks up, from the DDB prompt, do a
> 'tr, 20'. What does it say?
>
> Please check the drive compatibility list at:
> http://www.3ware.com/products/pdf/Drive_compatibility_list.pdf
>
> If you suspect a problem with any of the 3ware components,
> I strongly encourage you to contact 3ware support.

Apologies for taking so long to reply. I've finally got a serial console
connected to this machine. When the machine locks up (after the
controller reports an error), breaking into the debugger from the
console just shows:

twe0: AEN: <twe0: port 3: sector repair occurred>
db> tr, 20
siointr1(c326b000,c04d1cc8,0,ffc08ff4,c039fd70) at siointr1+0xc1
siointr(c326b000) at siointr+0x17
Xfastintr4(0,ffc09000,0,0,ddaac000) at Xfastintr4+0x20
idle_loop() at idle_loop+0x44

Does this mean that the kernel itself is not locked up, and it's just
the disk controller / driver that is frozen? I've included the process
list at the bottom of this mail.

I'm out of ideas about what else to look at. I can provide access to the
serial console on this machine over the internet if anyone is able to
help debug this. Please reply in private mail.

(To recap, I can reproduce the problem by dd'ing from the disk to
/dev/null: when it hits a bad sector on the disk, no further twe I/O
takes place. Contrary to a previous report, it doesn't always seem to
hit a bad sector in the same place.)

With respect to the drive compatibility list, the drives we are using
are not on the list, but drives from the same range are: ours are
Maxtor 5A300J0 and 4A320J8 drives, and the Maxtor 4A300J0 is on the
list. I don't suspect a problem with these specific 3ware components,
since we've had the same problem occur on 3 different machines (all
Dell 1600SCs with 7506-4LP controllers). I don't know whether there is
a design fault in the 3ware hardware or the Maxtor disks that means
they don't play well together.
I would guess this is a fairly popular hardware configuration, and I
haven't read any problem reports about operating systems other than
FreeBSD.

BTW, I did contact 3ware support but heard nothing back; this may be
because my problem report was too vague. I will try again if you think
they might be able to help.

db> ps
  pid proc     addr      uid  ppid  pgrp     flag stat  wmesg    wchan    cmd
  229 dffbcc20 dfffc000    0   227   227   004004    3  getblk   cfa1a03c atrun
  228 dffbcdc0 dffe0000    0   226   226  8000004    3  spread   cfa161d4 sh
  227 dffbcf60 dffd8000    0   225   227   004084    3  wait     dffbcf60 sh
  226 dffbd2a0 dffe7000    0   224   226   004084    3  wait     dffbd2a0 sh
  225 dffbd100 dfff3000    0    92    92   000084    3  piperd   dfebe3e0 cron
  224 dffbd780 dffaa000    0    92    92   000084    3  piperd   dfebe700 cron
  218 dffbd440 dffdc000 1003   210   218   004106    3  inode    c3503d00 systat
  210 dffbd5e0 dffc7000 1003   209   210  2004086    3  pause    dffc7260 csh
  209 dffbde00 dffbe000 1003   207    94   000184    3  select   c04bd588 sshd
  207 dffbd920 dffc2000    0    94    94   000184    3  sbwait   ddac4268 sshd
  196 dffbdc60 dffca000    0   162   196   004086    3  ttyin    c1ddb430 csh
  181 dffbdac0 dffcf000    0   155   181   004006    3  physstr  cfa16088 dd
  171 dc059ea0 dffae000 1003   170   171   004086    3  ttyin    c3506830 csh
  170 dc05a1e0 dff9c000 1003   159    94   000184    3  select   c04bd588 sshd
  162 dc05a040 dffa1000 1003   161   162  2004086    3  pause    dffa1260 csh
  161 dc05a520 dff7d000 1003   157    94   000184    3  select   c04bd588 sshd
  159 dc05a380 dff96000    0    94    94   000184    3  sbwait   ddac47a8 sshd
  157 dc05a6c0 dff88000    0    94    94   000184    3  sbwait   ddac4348 sshd
  155 dc05cdc0 dfeb8000    0   151   155  2004086    3  pause    dfeb8260 csh
  151 dc05a860 dff6b000    0     1   151   004186    3  wait     dc05a860 login
  150 dc05aa00 dff67000    0     1   150   004086    3  ttyin    c3571210 getty
  149 dc05aba0 dff63000    0     1   149   004086    3  ttyin    c3571410 getty
  148 dc05ad40 dff5f000    0     1   148   004086    3  ttyin    c3571610 getty
  147 dc05aee0 dff5b000    0     1   147   004086    3  ttyin    c3571810 getty
  146 dc05b3c0 dff45000    0     1   146   004086    3  ttyin    c3571a10 getty
  145 dc05b560 dff3a000    0     1   145   004086    3  ttyin    c3571c10 getty
  144 dc05ba40 dff32000    0     1   144   004086    3  ttyin    c356be10 getty
  143 dc05cf60 dfeb0000    0     1   143   004086    3  ttyin    c318d110 getty
  140 dc05b080 dff50000    0     1   140   000085    3  select   c04bd588 nmbd
  138 dc05b220 dff3f000    0     1   138   000085    3  select   c04bd588 smbd
  132 dc05b8a0 dff36000    0   130    10   000086    3  nanslp   c04a3910 3dmd
  131 dc05c740 dfef5000    0   130    10   000086    3  accept   ddac2ff2 3dmd
  130 dc05bf20 dff19000    0     1    10   000086    3  nanslp   c04a3910 3dmd
  129 dc05b700 dff2b000    0     1   129   000084    3  select   c04bd588 rsync
  102 dc05bbe0 dff25000   25     1   102  2000184    3  pause    dff25260 sendmail
   99 dc05bd80 dff21000    0     1    99   000184    3  select   c04bd588 sendmail
   96 dc05c0c0 dff15000    0     1    96   000084    3  select   c04bd588 usbd
   94 dc05c260 dff11000    0     1    94   000184    3  select   c04bd588 sshd
   92 dc05c400 dff0b000    0     1    92   000084    3  nanslp   c04a3910 cron
   90 dc05c5a0 dfef9000    0     1    90   000084    3  select   c04bd588 inetd
   83 dc05c8e0 dfec9000    0     1    83   000084    3  select   c04bd588 ntpd
   79 dc05ca80 dfec4000    0     1    79   000004    3  getblk   cfa1ea28 syslogd
   31 dc05cc20 dfec0000    0     1    31  2000084    3  pause    dfec0260 adjkerntz
    9 dc05d100 deb18000    0     0     0   000204    3  getblk   cfa1a03c syncer
    8 dc05d2a0 deb15000    0     0     0   000204    3  vlruwt   dc05d2a0 vnlru
    7 dc05d440 deb12000    0     0     0   000204    3  psleep   c04a3ae4 bufdaemon
    6 dc05d5e0 deb0f000    0     0     0   000204    3  psleep   c04b2c20 vmdaemon
    5 dc05d780 deb0c000    0     0     0   000204    3  psleep   c047cdf8 pagedaemon
    4 dc05d920 dda8e000    0     0     0   000204    3  usbtsk   c04c2778 usbtask
    3 dc05dac0 dda8b000    0     0     0   000204    3  usbevt   c318f210 usb0
    2 dc05dc60 dda65000    0     0     0   000204    3  tqthr    c04bd584 taskqueue
    1 dc05de00 dc062000    0     0     1   004284    3  wait     dc05de00 init
    0 c04bc8a0 c0579000    0     0     0   000204    3  sched    c04bc8a0 swapper
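To separate the processes that look blocked on disk I/O from the ones that are simply idle, the ddb ps rows can be filtered on the wmesg column. The following is a minimal Python sketch under stated assumptions: it expects the 11-column ddb ps layout in the listing above, the helper names parse_ps and blocked_on_io are hypothetical, and the set of "I/O-related" wait messages it checks (getblk, physstr, spread, inode) is an illustrative assumption, not an authoritative list taken from the FreeBSD sources.

```python
# Sketch: pick out processes in a FreeBSD 4.x DDB "ps" dump whose wait
# message (wmesg) suggests they are blocked on disk I/O.
# ASSUMPTION: this set of I/O-related wait messages is illustrative only,
# not an authoritative list from the FreeBSD kernel sources.
IO_WMESGS = {"getblk", "physstr", "spread", "inode"}

# Column names as printed in the ddb "ps" header.
COLUMNS = ["pid", "proc", "addr", "uid", "ppid", "pgrp",
           "flag", "stat", "wmesg", "wchan", "cmd"]


def parse_ps(text):
    """Parse ddb 'ps' rows into dicts, skipping the header and prompt lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 11 fields and start with a numeric pid.
        # Rows with an empty wmesg would have fewer fields and are skipped.
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def blocked_on_io(rows, wmesgs=IO_WMESGS):
    """Return (pid, cmd, wmesg) for processes sleeping on an I/O-related wmesg."""
    return [(int(r["pid"]), r["cmd"], r["wmesg"])
            for r in rows if r["wmesg"] in wmesgs]


if __name__ == "__main__":
    # A few rows copied from the listing above.
    sample = """\
db> ps
  pid proc     addr      uid  ppid  pgrp     flag stat  wmesg    wchan    cmd
  181 dffbdac0 dffcf000    0   155   181   004006    3  physstr  cfa16088 dd
    9 dc05d100 deb18000    0     0     0   000204    3  getblk   cfa1a03c syncer
   94 dc05c260 dff11000    0     1    94   000184    3  select   c04bd588 sshd
"""
    for pid, cmd, wmesg in blocked_on_io(parse_ps(sample)):
        print(f"{pid:5d} {cmd:10s} {wmesg}")
```

Applied to the full listing, a filter with this set would flag dd (physstr), sh (spread), systat (inode), and atrun, syslogd and syncer (getblk), while the daemons sleeping in select, ttyin or nanslp would not appear.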