Date: Mon, 5 Feb 1996 13:19:22 +0100 (GMT-1:00)
From: "Jesus A. Mora Marin" <amora@obelix.cica.es>
To: undisclosed-recipients:;
Message-ID: <199602051219.NAA11830@obelix.cica.es>
Hi.

Weekend came, so time to work on what I do like (no more code-grinding
stupid apps in a brain-damaged 4GL five days a week, no more telling lusers
that ttys tend to work faster when plugged in. Barfulous, but I have to
earn my living).

I have recompiled my custom kernel with `config -g', backed up a version
with the debug info (more than 6MB long!), set the dumpdev option, rebooted
with the new stripped kernel and forced the crash. This time everything
worked and I could savecore and run a `gdb -k' session on the crash dump.
Here you are:

(kgdb) symbol-file kernel.debug
Reading symbols from kernel.debug...done.
(kgdb) exec-file /usr/crash/kernel.0   # My /var fs is only 10MB -enough for me-
(kgdb) core-file /usr/crash/vmcore.0
IdlePTD 1a1000
current pcb at 195374
panic: page fault
#0  boot (howto=256) at ../../i386/i386/machdep.c:892
892                     dumppcb.pcb_ptd = rcr3();
(kgdb) bt
#0  boot (howto=256) at ../../i386/i386/machdep.c:892
#1  0xf0112aa3 in panic (fmt=0xf016b6fc "page fault")
    at ../../kern/subr_prf.c:124
#2  0xf016c1ee in trap_fatal (frame=0xf0189f18)
    at ../../i386/i386/trap.c:745
#3  0xf016bd60 in trap_pfault (frame=0xf0189f18, usermode=0)
    at ../../i386/i386/trap.c:667
#4  0xf016b9ff in trap (frame={tf_es = 16, tf_ds = -252706800, tf_edi = 0,
      tf_esi = -266780848, tf_ebp = -266821732, tf_isp = -266876354,
      tf_ebx = 85, tf_edx = 560, tf_ecx = 561, tf_eax = -236834816,
      tf_trapno = 12, tf_err = 2, tf_eip = -266876354, tf_cs = -267255800,
      tf_eflags = 66118, tf_esp = 144, tf_ss = -266876864})
    at ../../i386/i386/trap.c:307
#5  0xf0164c9d in calltrap ()
#6  0xf017ca3e in matcd_blockread (state=144)
    at ../../i386/isa/matcd/matcd.c:2043
#7  0xf0107110 in softclock () at ../../kern/kern_clock.c:654
#8  0xf0165ff7 in doreti_swi ()
#9  0xf016b3ec in cpu_switch ()
(kgdb) up 4
#4  0xf016b9ff in trap (frame={tf_es = 16, tf_ds = -252706800, tf_edi = 0,
      tf_esi = -266780848, tf_ebp = -266821732, tf_isp = -266876354,
      tf_ebx = 85, tf_edx = 560, tf_ecx = 561, tf_eax = -236834816,
      tf_trapno = 12,
      tf_err = 2, tf_eip = -266876354, tf_cs = -267255800,
      tf_eflags = 66118, tf_esp = 144, tf_ss = -266876864})
    at ../../i386/i386/trap.c:307
307                     (void) trap_pfault(&frame, FALSE);
(kgdb) frame frame->tf_ebp frame->tf_eip
#0  0xf017ca3e in matcd_blockread (state=144)
    at ../../i386/isa/matcd/matcd.c:2043
2043                            *addr++=inb(port+DATA);
(kgdb) list     # Modified by hand. Not compiled lines deleted.
2033            addr=bp->b_un.b_addr + mbx->skip;
2039            if (iftype==0) {                /*<20>Creative host I/F*/
2040                    outb(port+PHASE,1);     /*Enable data read*/
2041                    while((inb(port+STATUS) &
2042                          (DTEN|STEN))==STEN) {/*<19>*/
2043                            *addr++=inb(port+DATA);
                                ^^^^^^^
2047                    }
2048                    outb(port+PHASE, 0);    /* Disable read */
(kgdb) print addr
$1 = 0xf1e23000 <Address 0xf1e23000 out of bounds>
(kgdb) print bp
$2 = (struct buf *) 0xf0f04858
(kgdb) print bp->b_un.b_addr
$3 = 0xf1e22800 ..... (lot of pretty struct fields)
(kgdb) print mbx->skip
$4 = 0
(kgdb) quit

For the sake of completeness, some other vars inspected in the same
session:

mbx->nblk                       1
mbx->partition                  0
mbx->sz                         2048
cd->partflags[mbx->partition]   1
iftype                          0
i                               85
blknum                          0xf1e23000
ldrive                          0
cdrive                          0
port                            560 (0x230)
cmd                             { 0x00, 0xfc, 0x62, 0xf0, 0x00, 0xfc,
                                  0x62, 0xf0, 0x00, 0xf6, 0x62, 0xf0 }
phase                           0
state                           0 (it was 0x90 at the start of function)

That is, the offending source line is exactly the one pointed out by
J"org Wunsch in msg <199601301131.MAA13737@uriah.heep.sax.de>. Further, at
the start of the read loop, `addr' pointed to 0xf1e22800
(bp->b_un.b_addr + mbx->skip), and at the time of the panic it was pointing
to an address 2048 bytes up (0xf1e23000 - 0xf1e22800 = 0x800), i.e. the
loop was reading the 2049th byte of the block. The problem seems to be
hardware/firmware related (the mbx values denote that a 2048 byte block was
to be read, I guess). For some reason, the drive wants to transfer more
than the 2KB expected.

> The only way that more than 2048 bytes could be read (assuming no drive
> malfunction) is if the "c" partition was opened, which causes the drive
> to read 2532 byte sectors.
> This is intentional. Partition "c" must
> never be mounted.

Er... I have NEVER tried to mount a "c" partition. I read the manpage for
matcd and was warned against it.

> .... It is possible that the firmware he has in the system
> has a bug and the drive returns more data than it should, which could cause
> a GPF, but such an action would break the Windows driver that also reads
> bytes until the drive says "All Done" as matcd does.

In fact, there are no problems with Windows, nor with Linux.

> I was planning to change the way this loop was done anyway to improve
> speed (the 6x TEAC drive bogs badly here) and simply pull in the expected
> number of bytes and deal with any excess later, but I really can't blame
> the current implementation as the cause of this reported failure since it
> won't fail here.

Following this idea I have patched the read loop, keeping a count of the
bytes read in a block, `breaking' out of the loop when it reaches the
expected 2048 bytes, and simply ignoring the excess. I am not sure this is
a proper solution but it has proved to work great. Yes, the panic has gone,
although I wouldn't bet that this crude approach cannot raise some
unexpected and `creative' problems later :). For example, something must be
done to deal with raw blocks.

Frank, I think that if you implemented the driver this way, you had your
reasons. So, any clue how to deal with the nasty behaviour of my hardware?
I tell you, for now this simplistic patch works fine for accessing data in
a mounted fs on CD-ROM -enough for me-, but...

> Again, I do not have the precise drive firmware revision Jesus has.

I have reviewed the text of my original report, and have seen I stated:
Creative Lab, Matsushita/Panasonic CR-563-x 0.81. Oops! Sorry, this is the
signature of the MS-DOS CD-ROM driver. More details about this: my CD-ROM
drive is a Creative Labs CR-563-B, manufactured in January '95.
I cannot extract more info from the messy bunch of numbers and codes on the
drive's labels. It is connected to the socket on a Creative Labs
SoundBlaster AWE 32, model CT2760. The board has a signature '02 95' on it.
Draw your own conclusions.

(NB: -You CAN skip this comment- I know of at least two CR-563 drives that
died miserably because of a broken design in the heatsinking of the IC that
drives the spindle motor. They simply stopped working, without even some
smoke to let you know they had joined their ancestors. I was told that some
changes were introduced in the hardware of model CR-563-B, but nobody is
sure whether this has been fixed -so you couldn't :)

That's all. Comments, ideas, suggestions about this question will be highly
appreciated.

Thanks.

Jesus
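
For reference, a minimal sketch of the bounded read loop described above,
written as plain C so it can be tried outside the kernel. It is NOT the
actual patch to matcd.c: `fake_drive', `drive_has_data' and
`drive_read_byte' are hypothetical stand-ins for the
inb(port+STATUS)/inb(port+DATA) accesses, and BLOCK_SIZE is assumed from
the mbx->sz value seen in the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 2048 /* assumed: expected bytes per block (mbx->sz) */

/* Hypothetical stand-in for the drive: it offers `available' bytes in
 * total, which may exceed BLOCK_SIZE on misbehaving hardware. */
struct fake_drive {
    size_t available;
    size_t consumed;
};

/* Stand-in for (inb(port+STATUS) & (DTEN|STEN)) == STEN. */
int drive_has_data(const struct fake_drive *d)
{
    return d->consumed < d->available;
}

/* Stand-in for inb(port+DATA): hands out one byte and advances. */
uint8_t drive_read_byte(struct fake_drive *d)
{
    return (uint8_t)(d->consumed++ & 0xff);
}

/* Store at most `expected' bytes at addr, then leave the loop even if
 * the drive still claims to have data. Returns the bytes stored. */
size_t read_block_bounded(struct fake_drive *d, uint8_t *addr,
                          size_t expected)
{
    size_t count = 0;

    assert(d != NULL && addr != NULL);
    while (count < expected && drive_has_data(d))
        addr[count++] = drive_read_byte(d);
    return count; /* any excess is simply left unread */
}
```

With a drive that offers 2049 bytes, the loop stops after 2048 and never
touches the byte past the end of the buffer. Whether the unread excess has
to be drained from a real drive before the next command is something this
sketch does not answer.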