Date: Thu, 21 Aug 1997 09:16:52 +0200
From: "Ulrich Windl" <ulrich.windl@rz.uni-regensburg.de>
To: Doug Ledford <dledford@dialnet.net>
Cc: aic7xxx@freebsd.org, linux-scsi@vger.rutgers.edu, Harald Koenig <koenig@tat.physik.uni-tuebingen.de>, Hubert Mantel <mantel@suse.de>
Subject: "read defect list" with 2.0.30-pre7 and patch Aug19
Message-ID: <5CA15F646FE@rkdvmks1.ngate.uni-regensburg.de>
(My floppy with the Aug19-2 patch had a CRC error, so I had to use the
Aug19 patch.)

The driver (compared to the stock version of 2.0.30) gave a new warning
about automatic termination being enabled. That is correct for my
AHA2940 with BIOS 1.21, and it really works, even though I don't
understand how.

Having enabled the statistics, I found out that I have statistics for
non-existing SCSI IDs and LUNs -- maybe the read was there, but not the
LUN ;-) The question is: if you want to support SCSI plug and play,
which condition should you check? At least accesses in two categories?
Apart from that, the information given should be much more compact; for
cat /proc/scsi/aic7xxx/0 I got a bunch of:

    ...possible overflow at loop 0:8 0:8 1:8 0:8 1:8 2:8 0:8 1:8 2:8 0:8 1:8 2:8

Resource allocation: Shouldn't the driver use a hardware identifier
instead of a software identifier when registering resources? Currently
the driver uses the generic "aic7xxx", not the actual chip, and not the
PCI bus & device. With multiple cards this approach seems ambiguous
(talking about /proc/ioports and /proc/interrupts).

Unfortunately the kernel still bombs out badly, but I was able to get at
least some information into a file on my IDE harddisk; I even had
symbolic information. I added another log to show how consistent the
fault is.

Still, as expected earlier, there seems to be an undetected buffer
overflow in the kernel that overwrites some SCSI data structures (at
least). The code at the fault looked OK, but the memory accesses
probably used bad values.

I'll add Harald Koenig to the CC:, because he brought up the issue with
the overflow. Adding Hubert Mantel to the CC: because he is a great fan
of 2940 variants (not to mention the driver...).

The good thing about the issue is that my SCSI harddisk is now more
valuable for SCSI developers than for the average user ;-)

Ulrich

Edited syslog (with shorter lines):
----------------------------------
22:02:50 restart.
22:03:08 klogd 1.3-0, log source = /proc/kmsg started.
22:03:08 Loaded 4129 symbols from /usr/src/linux/System.map.
22:03:08 Symbols match kernel version.
22:04:11 scsi0 channel 0 : resetting for second half of retries.
22:04:11 SCSI bus is being reset for host 0 channel 0.
22:04:11 Unable to handle kernel paging request at virtual address c5e7024b
22:04:11 current->tss.cr3 = 00101000, hr3 = 00101000
22:04:11 *pde = 00000000
22:04:11 Oops: 0002
22:04:11 CPU: 0
22:04:11 EIP: 0010:[scsi_mark_host_reset+15/28]
22:04:11 EFLAGS: 00010006
22:04:11 eax: 05e70200 ebx: 00000202 ecx: 0060cf24 edx: 00090018
22:04:11 esi: 00008018 edi: 00090410 ebp: 00000001 esp: 001dec98
22:04:11 ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
22:04:11 Process swapper (pid: 0, process nr: 0, stackpage=001dce04)
22:04:11 Stack: 0019a5fb 00008018 00000001 00090410 00000000 00000027 0019a02e 00090410
22:04:11 00000001 001d18dd 00000000 00000000 00000046 00089edc 00008068 00092058
22:04:11 00008068 00000001 00000000 00070000 00008018 001a6401 00090410 0009e1f8
22:04:11 Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [hard_idle+31/56] [sys_idle+59/112]
22:04:11 [system_call+85/128] [init+0/656] [start_kernel+429/440]
22:04:11 Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
22:04:11 Aiee, killing interrupt handler
22:04:11 kfree of non-kmalloced memory: 001dee4c, next= 00000000, order=0
22:04:11 kfree of non-kmalloced memory: 001dee3c, next= 00000000, order=0
22:04:11 kfree of non-kmalloced memory: 001df350, next= 00000000, order=0
22:04:11 idle task may not sleep
22:04:11 elf last message repeated 4 times
00:30:33 restart.
00:33:24 klogd 1.3-0, log source = /proc/kmsg started.
00:33:24 Loaded 4129 symbols from /usr/src/linux/System.map.
00:33:24 Symbols match kernel version.
00:34:00 scsi0 channel 0 : resetting for second half of retries.
00:34:00 SCSI bus is being reset for host 0 channel 0.
00:34:00 Unable to handle kernel paging request at virtual address c5e7024b
00:34:00 current->tss.cr3 = 00101000, hr3 = 00101000
00:34:00 *pde = 00000000
00:34:00 Oops: 0002
00:34:00 CPU: 0
00:34:00 EIP: 0010:[scsi_mark_host_reset+15/28]
00:34:00 EFLAGS: 00010006
00:34:00 eax: 05e70200 ebx: 00000202 ecx: 00559f24 edx: 00090018
00:34:00 esi: 00008018 edi: 00090410 ebp: 00000001 esp: 001dec98
00:34:00 ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
00:34:00 Process swapper (pid: 0, process nr: 0, stackpage=001dce04)
00:34:00 Stack: 0019a5fb 00008018 00000001 00090410 00000000 00000027 0019a02e 00090410
00:34:00 00000001 001d18dd 00000000 00000000 00000046 00089de0 00008068 00092038
00:34:00 00008068 00000001 00000000 00070000 00008018 001a6401 00090410 0009e1f8
00:34:00 Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [hard_idle+31/56] [sys_idle+59/112]
00:34:00 [system_call+85/128] [init+0/656] [start_kernel+429/440]
00:34:00 Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
00:34:00 Aiee, killing interrupt handler
00:34:00 kfree of non-kmalloced memory: 001dee4c, next= 00000000, order=0
00:34:00 kfree of non-kmalloced memory: 001dee3c, next= 00000000, order=0
00:34:00 kfree of non-kmalloced memory: 001df350, next= 00000000, order=0
00:34:00 idle task may not sleep
22:37:12 restart.

A gdb session:
-------------
(gdb) disass 0x0019a428
Dump of assembler code for function scsi_mark_host_reset:
0x19a428 <scsi_mark_host_reset>:        movl   0x4(%esp,1),%eax
0x19a42c <scsi_mark_host_reset+4>:      movl   0x10(%eax),%edx
0x19a42f <scsi_mark_host_reset+7>:      testl  %edx,%edx
0x19a431 <scsi_mark_host_reset+9>:      je     0x19a442 <scsi_mark_host_reset+26>
0x19a433 <scsi_mark_host_reset+11>:     nop
0x19a434 <scsi_mark_host_reset+12>:     movl   0x4(%edx),%eax
0x19a437 <scsi_mark_host_reset+15>:     orb    $0xc0,0x4b(%eax)
0x19a43b <scsi_mark_host_reset+19>:     movl   0x10(%edx),%edx
0x19a43e <scsi_mark_host_reset+22>:     testl  %edx,%edx
0x19a440 <scsi_mark_host_reset+24>:     jne    0x19a434 <scsi_mark_host_reset+12>
0x19a442 <scsi_mark_host_reset+26>:     ret
0x19a443 <scsi_mark_host_reset+27>:     nop
End of assembler dump.

I suspect it's not the aic7xxx, I suspect someone else shot some memory
with an undetected overflow...

Kernel ends (quoting System.map) at:
0020fb7d A _end

And I did not configure SCSI generic support -- Should I?
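
[Editor's note] The numbers in the oops and in the gdb session can be cross-checked with a little arithmetic. The sketch below is only an illustration: all constants are copied from the log and the disassembly above, and the helper names (faulting_instruction, linear_address, decode_modrm) are made up for this note. It assumes, as appears to be the case for 2.0.x kernels on i386, that the kernel segment base is 0xC0000000, so the reported fault address is the segment base plus the offset the instruction used.

```python
# Values copied from the oops and the gdb session above.
FUNC_START = 0x19A428        # start of scsi_mark_host_reset (gdb)
EIP_OFFSET = 15              # "EIP: 0010:[scsi_mark_host_reset+15/28]"
EAX = 0x05E70200             # eax at the time of the fault
DISPLACEMENT = 0x4B          # from "orb $0xc0,0x4b(%eax)"
FAULT_ADDRESS = 0xC5E7024B   # "paging request at virtual address ..."
KERNEL_END = 0x0020FB7D      # "_end" from System.map
CODE_BYTES = [0x80, 0x48, 0x4B, 0xC0]  # first four bytes of "Code:"

# ASSUMPTION (not stated in the mail): kernel segment base on i386.
KERNEL_SEGMENT_BASE = 0xC0000000


def faulting_instruction(func_start, offset):
    """Address of the instruction named by 'symbol+offset' in the oops."""
    return func_start + offset


def linear_address(base, register, displacement):
    """32-bit linear address for a base + register + displacement access."""
    return (base + register + displacement) & 0xFFFFFFFF


def decode_modrm(byte):
    """Split an x86 ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7


if __name__ == "__main__":
    # EIP should land on the orb instruction at 0x19a437.
    print(hex(faulting_instruction(FUNC_START, EIP_OFFSET)))
    # The computed address should equal the reported fault address.
    addr = linear_address(KERNEL_SEGMENT_BASE, EAX, DISPLACEMENT)
    print(hex(addr), addr == FAULT_ADDRESS)
    # Opcode 0x80 with reg field 1 is OR r/m8, imm8; rm 0 is %eax.
    print(decode_modrm(CODE_BYTES[1]), hex(CODE_BYTES[2]), hex(CODE_BYTES[3]))
    # The pointer in eax lies far beyond the end of the kernel image.
    print(EAX > KERNEL_END)
```

Under that assumption the fault address is fully explained by the value in eax, i.e. the pointer loaded by "movl 0x4(%edx),%eax" was already bad before the orb touched it. That fits the suspicion above that a data structure was overwritten, but it does not by itself say who overwrote it.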