Date:      Thu, 21 Aug 1997 09:16:52 +0200
From:      "Ulrich Windl" <ulrich.windl@rz.uni-regensburg.de>
To:        Doug Ledford <dledford@dialnet.net>
Cc:        aic7xxx@freebsd.org, linux-scsi@vger.rutgers.edu, Harald Koenig <koenig@tat.physik.uni-tuebingen.de>, Hubert Mantel <mantel@suse.de>
Subject:   "read defect list" with 2.0.30-pre7 and patch Aug19
Message-ID:  <5CA15F646FE@rkdvmks1.ngate.uni-regensburg.de>

(My floppy with the Aug19-2 patch had a CRC error, so I had to use 
the Aug19 patch)

The driver (compared to the stock version of 2.0.30) gave a new 
warning about automatic termination being enabled. That is correct 
for my AHA2940 with BIOS 1.21, and it really works, even though I 
don't understand how.

Having enabled the statistics, I found out that I have statistics for
nonexistent SCSI IDs and LUNs -- maybe the read was there, but not the
LUN ;-) The question is: if you want to support SCSI plug and play,
what condition should you check? At least accesses in two categories?

Apart from that, the information given should be much more compact; for
cat /proc/scsi/aic7xxx/0 I got a bunch of:

...possible overflow at loop 0:8
                             0:8
                             1:8
                             0:8
                             1:8
                             2:8
                             0:8
                             1:8
                             2:8
                             0:8
                             1:8
                             2:8

Resource allocation: Shouldn't the driver use a hardware identifier instead
of a software identifier when registering resources? Currently the driver
uses the generic "aic7xxx", not the actual chip, and not the PCI bus & device.
With multiple cards that approach seems ambiguous (I am talking about
/proc/ioports and /proc/interrupts).


Unfortunately the kernel still bombs out badly, but I was able to get 
at least some information onto a file on my IDE harddisk; I even had 
symbolic information. I added another log to show how consistent the
fault is.

Still, as suspected earlier, there seems to be an undetected buffer 
overflow in the kernel that overwrites some SCSI data structures (at 
least). The code at the fault location looked OK, but the memory 
accesses probably used bad values.

I'll add Harald Koenig to the CC:, because he brought up the issue 
with the overflow. I'm adding Hubert Mantel to the CC: because he is a great
fan of 2940 variants (not to mention the driver...).

The good thing about the issue is that my SCSI harddisk is more valuable for
SCSI developers now than for the average users ;-)

Ulrich

Edited syslog (with shorter lines):
----------------------------------
22:02:50 restart.
22:03:08 klogd 1.3-0, log source = /proc/kmsg started.
22:03:08 Loaded 4129 symbols from /usr/src/linux/System.map.
22:03:08 Symbols match kernel version.
22:04:11 scsi0 channel 0 : resetting for second half of retries.
22:04:11 SCSI bus is being reset for host 0 channel 0.
22:04:11 Unable to handle kernel paging request at virtual address c5e7024b
22:04:11 current->tss.cr3 = 00101000, hr3 = 00101000
22:04:11 *pde = 00000000
22:04:11 Oops: 0002
22:04:11 CPU:    0
22:04:11 EIP:    0010:[scsi_mark_host_reset+15/28]
22:04:11 EFLAGS: 00010006
22:04:11 eax: 05e70200   ebx: 00000202   ecx: 0060cf24   edx: 00090018
22:04:11 esi: 00008018   edi: 00090410   ebp: 00000001   esp: 001dec98
22:04:11 ds: 0018   es: 0018   fs: 002b   gs: 0018   ss: 0018
22:04:11 Process swapper (pid: 0, process nr: 0, stackpage=001dce04)
22:04:11 Stack: 0019a5fb 00008018 00000001 00090410 00000000 00000027 0019a02e 00090410 
22:04:11        00000001 001d18dd 00000000 00000000 00000046 00089edc 00008068 00092058 
22:04:11        00008068 00000001 00000000 00070000 00008018 001a6401 00090410 0009e1f8 
22:04:11 Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [hard_idle+31/56] [sys_idle+59/112] 
22:04:11        [system_call+85/128] [init+0/656] [start_kernel+429/440] 
22:04:11 Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24 
22:04:11 Aiee, killing interrupt handler
22:04:11 kfree of non-kmalloced memory: 001dee4c, next= 00000000, order=0
22:04:11 kfree of non-kmalloced memory: 001dee3c, next= 00000000, order=0
22:04:11 kfree of non-kmalloced memory: 001df350, next= 00000000, order=0
22:04:11 idle task may not sleep
22:04:11 elf last message repeated 4 times
00:30:33 restart.
00:33:24 klogd 1.3-0, log source = /proc/kmsg started.
00:33:24 Loaded 4129 symbols from /usr/src/linux/System.map.
00:33:24 Symbols match kernel version.
00:34:00 scsi0 channel 0 : resetting for second half of retries.
00:34:00 SCSI bus is being reset for host 0 channel 0.
00:34:00 Unable to handle kernel paging request at virtual address c5e7024b
00:34:00 current->tss.cr3 = 00101000, hr3 = 00101000
00:34:00 *pde = 00000000
00:34:00 Oops: 0002
00:34:00 CPU:    0
00:34:00 EIP:    0010:[scsi_mark_host_reset+15/28]
00:34:00 EFLAGS: 00010006
00:34:00 eax: 05e70200   ebx: 00000202   ecx: 00559f24   edx: 00090018
00:34:00 esi: 00008018   edi: 00090410   ebp: 00000001   esp: 001dec98
00:34:00 ds: 0018   es: 0018   fs: 002b   gs: 0018   ss: 0018
00:34:00 Process swapper (pid: 0, process nr: 0, stackpage=001dce04)
00:34:00 Stack: 0019a5fb 00008018 00000001 00090410 00000000 00000027 0019a02e 00090410 
00:34:00        00000001 001d18dd 00000000 00000000 00000046 00089de0 00008068 00092038 
00:34:00        00008068 00000001 00000000 00070000 00008018 001a6401 00090410 0009e1f8 
00:34:00 Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [hard_idle+31/56] [sys_idle+59/112] 
00:34:00        [system_call+85/128] [init+0/656] [start_kernel+429/440] 
00:34:00 Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24 
00:34:00 Aiee, killing interrupt handler
00:34:00 kfree of non-kmalloced memory: 001dee4c, next= 00000000, order=0
00:34:00 kfree of non-kmalloced memory: 001dee3c, next= 00000000, order=0
00:34:00 kfree of non-kmalloced memory: 001df350, next= 00000000, order=0
00:34:00 idle task may not sleep
22:37:12 restart.
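
To make the consistency of the two oopses concrete, the register lines can be
compared mechanically. The following is only a small sketch: the register
values are copied from the two logs above, while the helper names
(parse_registers, differing_registers) are made up for this illustration.

```python
import re

# Register lines copied from the 22:04:11 and 00:34:00 oopses above.
FIRST = """
eax: 05e70200   ebx: 00000202   ecx: 0060cf24   edx: 00090018
esi: 00008018   edi: 00090410   ebp: 00000001   esp: 001dec98
"""
SECOND = """
eax: 05e70200   ebx: 00000202   ecx: 00559f24   edx: 00090018
esi: 00008018   edi: 00090410   ebp: 00000001   esp: 001dec98
"""


def parse_registers(text):
    """Return {register name: value} for every 'name: hex' pair in text."""
    pairs = re.findall(r"\b([a-z]{3}): ([0-9a-f]{8})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def differing_registers(a, b):
    """Return the sorted names of registers whose values differ."""
    ra, rb = parse_registers(a), parse_registers(b)
    return sorted(name for name in ra if ra[name] != rb.get(name))


if __name__ == "__main__":
    # Only ecx differs between the two crashes.
    print(differing_registers(FIRST, SECOND))
```

Everything else, including the bad pointer in eax and the faulting EIP, is
identical across both boots.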

A gdb session:
-------------
(gdb) disass 0x0019a428
Dump of assembler code for function scsi_mark_host_reset:
0x19a428 <scsi_mark_host_reset>:        movl   0x4(%esp,1),%eax
0x19a42c <scsi_mark_host_reset+4>:      movl   0x10(%eax),%edx
0x19a42f <scsi_mark_host_reset+7>:      testl  %edx,%edx
0x19a431 <scsi_mark_host_reset+9>:
    je     0x19a442 <scsi_mark_host_reset+26>
0x19a433 <scsi_mark_host_reset+11>:     nop
0x19a434 <scsi_mark_host_reset+12>:     movl   0x4(%edx),%eax
0x19a437 <scsi_mark_host_reset+15>:     orb    $0xc0,0x4b(%eax)
0x19a43b <scsi_mark_host_reset+19>:     movl   0x10(%edx),%edx
0x19a43e <scsi_mark_host_reset+22>:     testl  %edx,%edx
0x19a440 <scsi_mark_host_reset+24>:
    jne    0x19a434 <scsi_mark_host_reset+12>
0x19a442 <scsi_mark_host_reset+26>:     ret
0x19a443 <scsi_mark_host_reset+27>:     nop
End of assembler dump.
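
The disassembly can be cross-checked against the oops with a few lines of
arithmetic. The sketch below uses only numbers from the log and the gdb
session, plus one assumption that is not stated in the message: that the
kernel data segment has base 0xC0000000, so the reported fault address is the
segment offset plus that base. The function names are invented for the
example.

```python
# Values copied from the oops and the gdb session above.
FUNC_START = 0x19a428          # scsi_mark_host_reset
EIP_OFFSET = 15                # from "scsi_mark_host_reset+15/28"
EAX = 0x05e70200               # eax at the time of the fault
DISPLACEMENT = 0x4b            # from "orb $0xc0,0x4b(%eax)"
FAULT_ADDRESS = 0xc5e7024b     # from "paging request at virtual address"
CODE_BYTES = bytes.fromhex("80484bc0")  # first four bytes of the "Code:" line

# Assumption (not from the message): kernel segment base of 0xC0000000.
SEGMENT_BASE = 0xC0000000


def faulting_eip():
    """Absolute address of the instruction named by symbol+offset."""
    return FUNC_START + EIP_OFFSET


def linear_fault_address():
    """Address the orb would touch, as seen by the page fault handler."""
    return (EAX + DISPLACEMENT + SEGMENT_BASE) & 0xFFFFFFFF


def decode_orb(code):
    """Decode 'orb $imm8, disp8(%eax)': opcode 0x80, ModRM 0x48 (/1, eax)."""
    opcode, modrm, disp, imm = code[:4]
    if opcode != 0x80 or modrm != 0x48:
        raise ValueError("not orb $imm8, disp8(%eax)")
    return disp, imm


if __name__ == "__main__":
    print(hex(faulting_eip()))
    print(hex(linear_fault_address()), hex(FAULT_ADDRESS))
    print([hex(v) for v in decode_orb(CODE_BYTES)])
```

Under that assumption the numbers agree: the EIP lands on the orb at
0x19a437, the "Code:" bytes decode to the same instruction, and eax + 0x4b
accounts for the reported fault address. So the instruction itself is the one
gdb shows, and the bad value is the pointer loaded into eax just before it.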

I suspect it's not the aic7xxx, I suspect someone else shot some memory with
an undetected overflow...

Kernel ends (quoting System.map) at: 0020fb7d A _end
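
With _end known, the addresses in the oops can be sorted into "inside the
static kernel image" and "outside it". This is a rough sketch: the start of
the image at 0x00100000 is an assumption, and the labels are made up; the
addresses themselves are taken from the log above.

```python
KERNEL_START = 0x00100000  # assumption: image loaded at 1 MB
KERNEL_END = 0x0020fb7d    # _end from System.map

# Addresses copied from the oops above.
ADDRESSES = {
    "esp": 0x001dec98,
    "stackpage": 0x001dce04,
    "kfree 1": 0x001dee4c,
    "kfree 2": 0x001dee3c,
    "kfree 3": 0x001df350,
    "eax (bad pointer)": 0x05e70200,
}


def inside_image(address):
    """True if address lies within [KERNEL_START, KERNEL_END)."""
    return KERNEL_START <= address < KERNEL_END


def outside_image(addresses):
    """Return the sorted labels of addresses outside the static image."""
    return sorted(label for label, addr in addresses.items()
                  if not inside_image(addr))


if __name__ == "__main__":
    for label, addr in ADDRESSES.items():
        where = "inside" if inside_image(addr) else "outside"
        print(f"{label:18s} {addr:08x} {where}")
```

The three kfree addresses fall inside the static image, which fits the
"non-kmalloced" complaint. The pointer in eax lies far outside it; that alone
does not prove it is garbage, since dynamically allocated memory also lives
past _end, but together with *pde = 00000000 it points to an unmapped address.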

And I did not configure SCSI generic support -- Should I?


