Date: Tue, 28 Jun 2005 22:58:04 +1000 From: Stephen McKay <smckay@internode.on.net> To: Peter Jeremy <PeterJeremy@optushome.com.au> Cc: stable@freebsd.org, Stephen McKay <smckay@internode.on.net> Subject: Re: Data corruption in cd9660 on FreeBSD 4.11? Message-ID: <200506281258.j5SCw46l010235@dungeon.home> In-Reply-To: <20050624213433.GA50157@cirb503493.alcatel.com.au> from Peter Jeremy at "Sat, 25 Jun 2005 07:34:33 %2B1000" References: <200506241231.j5OCV6jp047730@dungeon.home> <20050624213433.GA50157@cirb503493.alcatel.com.au>
next in thread | previous in thread | raw e-mail | index | archive | help
I haven't finished all the suggested tests, but since I'm taking so long to do so, I thought I should send what I have so far. On Saturday, 25th June 2005, Peter Jeremy wrote: >On Fri, 2005-Jun-24 22:31:06 +1000, Stephen McKay wrote: >>I'm experiencing data corruption when reading CDs and DVDs on FreeBSD 4.11. >... >>So, can anyone suggest any more tests I could try? Or is there a kind of >>hardware fault that could cause this substitution of whole blocks read from >>CDs without causing any other problems? > >You might like to post the relevant sections of a verbose boot - the >ATA and CD probes. I've appended it to this messages, so that the flow is not ruined. Note that I am not currently using ATAPI-CAM for my tests. I am using /dev/acd0a and /dev/acd1a to mount the CDs in the DVD-ROM and DVD-R respectively. Also the "non-ATA66 cable" thing is true; it is a plain ATA33 cable. >Are you running the CD/DVD drives in PIO or UDMA modes? I normally run both DVD drives at UDMA33. My test runs normally fail every 2nd or 3rd run. I've seen it do 5 OK runs in a row once though, so I don't yet have a very good test. I tested with PIO4 and ran 12 consecutive test runs without error. It was a little slower at 150 seconds per run instead of the normal 135, possibly because 75% to 80% of the cpu was dedicated to interrupt handling (doing pio, I assume). It seems that either DMA or ATA interrupts (or maybe both) are required to cause the problem. Also, I tried some tests with the noclusterr mount option on the CD. The test ran much slower (approx 232 seconds instead of 135) but I also saw no failures (with only 6 test runs though as I was pressed for time). The noclusterr option is interesting because it defeats read clustering resulting in the ATA driver issuing only 2K reads instead of up to 64K at a time. I assume that the 64K reads would require scatter-gather DMA, so maybe this is relevant to the problem. Oddly, I noticed that a fixed value of 65534 is found in atapi-all.c as a request size limit. No, not 65536 = 2^16, but 2 bytes less. Puzzling. >Have you tried anything other than ISO9660 filesystems on a physical CD? I have not tried anything but cd9660 file systems on CDs and DVDs. I will see if I can build a UFS file system to test with, when I get a chance. >What happens if you just dd the CD-ROM? When I dd the CD-ROM it seems to work correctly. I have done this only infrequently however, so I may just be lucky to not have had a failure. I've now done 6 consecutive dd reads of my test CD-ROM in UDMA33 mode with no errors. It only takes 125 seconds, so it's a bit faster than comparing directory trees. Only 6 tests isn't many, so I'll do more later, this time with other system activity. >What happens if you use a vnode >mount (see vnconfig(8)) of an ISO filesystem sitting in a UFS filesystem? I'll test this when I get a chance. >Anything unusual in your kernel config file? Nothing too unusual. I'm running a uni-processor kernel with HTT disabled. I skimmed through my config and this is the only interesting thing: HZ=500 I don't think that's too dangerous. On the other hand, it does increase the rate of interrupts, and if there's a race somewhere, it may make it worse. >Have you tried building a kernel with WITNESS and/or DIAGNOSTIC? I'm now running with INVARIANTS, INVARIANT_SUPPORT, and DIAGNOSTIC on 4.11. No change in the failure rate and no significant slowdown either. >Any chance of you repeating the tests with a 5.x system? Maybe >on a spare small partition or using a 5.4-RELEASE disk1 as a live >filesystem. I was experimenting with current in late April, so I installed that drive for testing. So far, I have not been able to reproduce the failure on April's current though I've only had time for a quick run of 6 repetitions. Current takes the same time (135 seconds, on average) to read and compare the CD. That seems good, considering all the debugging is still enabled. I'm pretty sure that ATA MK III is in this kernel. Sadly, it panics immediately if I run "atacontrol mode 1" so I'm just assuming it is running in DMA mode by the speed of it. (And I have hw.ata.atapi_dma=1 in /boot/loader.conf). That's where I'm up to so far in stress testing. Right now I'm trying to understand some unusual looking code in ata_dmasetupd_cb() in 4.11's ata-dma.c. The attached comment is "A maximum segment size was specified for bus_dma_tag_create, but some busdma code does not seem to honor this, so fix up if needed." The "fix-up" code seems to be gone in current, so it looks suspicious to me. When I work out what it does, I'll report back. Stephen. ------------------------------------------------------------------ Verbose boot of 4.11-p10 (the ata related parts, at least): atapci0: <Intel ICH5 ATA100 controller> port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 irq 0 at device 31.1 on pci0 ata0: iobase=0x01f0 altiobase=0x03f6 bmaddr=0xfc00 ata0: mask=03 ostat0=50 ostat2=00 ata0-master: ATAPI 00 00 ata0-slave: ATAPI 00 00 ata0: mask=03 stat0=50 stat1=00 ata0-master: ATA 01 a5 ata0: devices=01 ata0: at 0x1f0 irq 14 on atapci0 ata1: iobase=0x0170 altiobase=0x0376 bmaddr=0xfc08 ata1: mask=03 ostat0=50 ostat2=50 ata1-slave: ATAPI 14 eb ata1-master: ATAPI 14 eb ata1: mask=03 stat0=00 stat1=00 ata1: devices=0c ata1: at 0x170 irq 15 on atapci0 pci0: <unknown card> (vendor=0x8086, dev=0x24d3) at 31.3 irq 10 ata-: ata0 exists, using next available unit number ata-: ata1 exists, using next available unit number Trying Read_Port at 203 Trying Read_Port at 243 Trying Read_Port at 283 Trying Read_Port at 2c3 Trying Read_Port at 303 Trying Read_Port at 343 Trying Read_Port at 383 Trying Read_Port at 3c3 isa_probe_children: disabling PnP devices isa_probe_children: probing non-PnP devices orm0: <Option ROM> at iomem 0xc0000-0xcefff on isa0 pmtimer0 on isa0 fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold fd0: <1440-KB 3.5" drive> on fdc0 drive 0 ata2 failed to probe at port 0x1f0 irq 14 on isa0 ata3 failed to probe at port 0x170 irq 15 on isa0 [Irrelevant stuff snipped] ad0: success setting UDMA5 on Intel chip Creating DISK ad0 ar: FreeBSD check1 failed ad0: <Maxtor 6Y120L0/YAR41VW0> ATA-7 disk at ata0-master ad0: 117246MB (240121728 sectors), 238216 C, 16 H, 63 S, 512 B ad0: 16 secs/int, 1 depth queue, UDMA100 ad0: piomode=4 dmamode=2 udmamode=6 cblid=1 ata1-master: piomode=4 dmamode=2 udmamode=2 dmaflag=1 ata1-master: success setting UDMA2 on Intel chip acd0: <Pioneer DVD-ROM ATAPIModel DVD-103S 0116/E1.16> DVD-ROM drive at ata1 as master acd0: 512KB buffer, UDMA33 acd0: Reads: CD-R, CD-RW, CD-DA stream, DVD-ROM, DVD-R acd0: Writes: acd0: Audio: play, 256 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: no/blank disc ata1-slave: piomode=4 dmamode=2 udmamode=4 dmaflag=1 ata1-slave: DMA limited to UDMA33, non-ATA66 cable or device ata1-slave: success setting UDMA2 on Intel chip acd1: <LITE-ON DVDRW SOHW-1673S/JS01> DVD-R drive at ata1 as slave acd1: read 8268KB/s (4782KB/s) write 8268KB/s (8268KB/s), 2048KB buffer, UDMA33 acd1: Reads: CD-R, CD-RW, CD-DA stream, DVD-ROM, DVD-R, packet acd1: Writes: CD-R, CD-RW, DVD-R, test write, burnproof acd1: Audio: play, 256 volume levels acd1: Mechanism: ejectable tray, unlocked acd1: Medium: no/blank disc (probe2:ata1:0:0:0): INQUIRY. CDB: 12 1 80 0 ff 0 (probe2:ata1:0:0:0): ILLEGAL REQUEST asc:24,0 (probe2:ata1:0:0:0): Invalid field in CDB (probe3:ata1:0:1:0): INQUIRY. CDB: 12 1 80 0 ff 0 (probe3:ata1:0:1:0): ILLEGAL REQUEST asc:24,0 (probe3:ata1:0:1:0): Invalid field in CDB (probe2:ata1:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe2:ata1:0:0:0): NOT READY asc:3a,0 (probe2:ata1:0:0:0): Medium not present (probe3:ata1:0:1:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe3:ata1:0:1:0): NOT READY asc:3a,0 (probe3:ata1:0:1:0): Medium not present (probe2:ata1:0:0:0): INQUIRY. CDB: 12 1 80 0 ff 0 (probe2:ata1:0:0:0): ILLEGAL REQUEST asc:24,0 (probe2:ata1:0:0:0): Invalid field in CDB (probe3:ata1:0:1:0): INQUIRY. CDB: 12 1 80 0 ff 0 (probe3:ata1:0:1:0): ILLEGAL REQUEST asc:24,0 (probe3:ata1:0:1:0): Invalid field in CDB Creating DISK cd0 Creating DISK cd1 pass0 at ata1 bus 0 target 0 lun 0 pass0: <PIONEER DVD-ROM DVD-103 1.16> Removable CD-ROM SCSI-0 device pass0: 33.000MB/s transfers pass1 at ata1 bus 0 target 1 lun 0 pass1: <LITE-ON DVDRW SOHW-1673S JS01> Removable CD-ROM SCSI-0 device pass1: 33.000MB/s transfers Mounting root from ufs:/dev/ad0s2a (cd0:ata1:0:0:0): READ CD RECORDED CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 (cd0:ata1:0:0:0): NOT READY asc:3a,0 (cd0:ata1:0:0:0): Medium not present cd0 at ata1 bus 0 target 0 lun 0 cd0: <PIONEER DVD-ROM DVD-103 1.16> Removable CD-ROM SCSI-0 device cd0: 33.000MB/s transfers cd0: Attempt to query device size failed: NOT READY, Medium not present (cd1:ata1:0:1:0): READ CD RECORDED CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 (cd1:ata1:0:1:0): NOT READY asc:3a,0 (cd1:ata1:0:1:0): Medium not present cd1 at ata1 bus 0 target 1 lun 0 cd1: <LITE-ON DVDRW SOHW-1673S JS01> Removable CD-ROM SCSI-0 device cd1: 33.000MB/s transfers cd1: Attempt to query device size failed: NOT READY, Medium not present ad0s1: type 0x7, start 63, end = 12289724, size 12289662 : OK ad0s2: type 0xa5, start 12289725, end = 22523129, size 10233405 : OK ad0s3: type 0x83, start 22523130, end = 30716279, size 8193150 ad0s3: C/H/S start 1023/0/1 (16434495) != start 22523130: invalid ad0s4: type 0x5, start 30716280, end = 240107489, size 209391210 ad0s4: C/H/S start 1023/0/1 (16434495) != start 30716280: invalid ad0s5: type 0x82, start 30716343, end = 32820794, size 2104452 ad0s5: C/H/S start 1023/1/1 (16434558) != start 30716343: invalid ad0<extended>: type 0x5, start 32820795, end = 41013944, size 8193150 ad0<extended>: C/H/S start 1023/0/1 (16434495) != start 32820795: invalid ad0s6: type 0x83, start 32820858, end = 41013944, size 8193087 ad0s6: C/H/S start 1023/1/1 (16434558) != start 32820858: invalid ad0<extended>: type 0x5, start 41013945, end = 171526004, size 130512060 ad0<extended>: C/H/S start 1023/0/1 (16434495) != start 41013945: invalid ad0s7: type 0x7, start 41014008, end = 171526004, size 130511997 ad0s7: C/H/S start 1023/1/1 (16434558) != start 41014008: invalid ad0<extended>: type 0x5, start 171526005, end = 240107489, size 68581485 ad0<extended>: C/H/S start 1023/0/1 (16434495) != start 171526005: invalid ad0s8: type 0xa5, start 171526068, end = 240107489, size 68581422 ad0s8: C/H/S start 1023/1/1 (16434558) != start 171526068: invalid ------------------------------------------------------------------
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200506281258.j5SCw46l010235>