Date: Sun, 25 May 2008 13:34:16 +0200
From: "Martin Laabs" <martin.laabs@mailbox.tu-dresden.de>
To: "Bruce Evans" <brde@optusnet.com.au>
Cc: "freebsd-gnats-submit@freebsd.org" <freebsd-bugs@freebsd.org>
Subject: Re: misc/123939: msdosfs corruptes new files
Message-ID: <op.ubpjreaw724k7f@martin>
In-Reply-To: <20080525100023.D17089@besplex.bde.org>
References: <200805231916.m4NJGVXP001708@www.freebsd.org> <20080524134012.L69478@delplex.bde.org> <op.ubnw4xiy724k7f@martin> <20080525100023.D17089@besplex.bde.org>
Hi,

> This thread switched to private mail.  Did you mean that?  I don't mind,
> but sometimes useful PR info gets lost because it is not public.

Oh no - this was not my intention. I have now added the two CCs and,
for that reason, full-quoted the last mail.

Using "cmp -lx" directly on the device does not work - I think because
of the wrong block size.
Under the assumption that writing and reading with bs=2k works
properly, I tried to find out whether read, write, or both are affected
by the bug.

su:~$ dd if=/boot/kernel/kernel of=/dev/da4 bs=2k count=200
200+0 records in
200+0 records out
409600 bytes transferred in 1.399798 secs (292614 bytes/sec)

su:~$ dd if=/dev/da4 bs=2k|cmp -lx /boot/kernel/kernel - |head -n 15
00064000 f6 55
00064001 46 89
00064002 30 e5
[...]

This is OK since I wrote exactly 0x64000 bytes.
Next I tried 4k and 8k, which also worked fine. With a block size of
10k I get a mismatch at address 0:

su:~$ dd if=/dev/da4 bs=10k|cmp -lx /boot/kernel/kernel - |head -n 15
00000000 7f 1d
00000001 45 04
00000002 4c 00
00000003 46 00
00000004 01 c4
00000005 01 16
00000006 01 00
00000007 09 00
[...]

I tried to determine the offset of the data that is read with bs>8k and
bs<=16k. It is exactly 0x2000 (8k). With bs>16k, bs<=24k the offset is
0x4000; with bs>24k, bs<=32k it is 0x6000.
Until now I have only checked the data around address 0.

Now the writing experiment:
As seen above, bs=2k works OK. Now I try 4k:

su:~$ dd if=/boot/kernel/kernel of=/dev/da4 bs=4k count=100
100+0 records in
100+0 records out
409600 bytes transferred in 1.002872 secs (408427 bytes/sec)

su:~$ dd if=/dev/da4 bs=2k|cmp -lx /boot/kernel/kernel - |head -n 15
00064000 f6 00
00064001 46 00
00064002 30 00
[...]

And now 8k:

su:~$ dd if=/boot/kernel/kernel of=/dev/da4 bs=8k count=50
50+0 records in
50+0 records out
409600 bytes transferred in 0.748899 secs (546936 bytes/sec)

su:~$ dd if=/dev/da4 bs=2k|cmp -lx /boot/kernel/kernel - |head -n 15
00064000 f6 00
00064001 46 00
00064002 30 00
[...]
Both are OK. With a bs of 10k I get the first byte mismatch at
0x2000 (8k):

su:~$ dd if=/boot/kernel/kernel of=/dev/da4 bs=10k count=40
40+0 records in
40+0 records out
409600 bytes transferred in 0.699931 secs (585201 bytes/sec)

su:~$ dd if=/dev/da4 bs=2k|cmp -lx /boot/kernel/kernel - |head -n 15
00002000 1d 7f
00002001 04 45
00002002 00 4c
00002003 00 46
00002004 c4 01
00002005 16 01
[...]

The offset of the readback data is -0x2000: the data found at 0x2000 on
the stick originally belongs at 0x0. Since it *is* already at 0x0 as
well (cmp did not report any difference between the file and the first
0x2000 bytes), it is on the stick a second time. This means that the
data that originally belongs at 0x2000 is lost.
The length of this "discontinuity" is 0x800, with not quite regular
spacing (the write block size was 10k):

su:~$ dd if=/dev/da4 bs=2k|cmp -lx /boot/kernel/kernel - |less
00002000 1d 7f
00002001 04 45
00002002 00 4c
[...]
000027f9 13 00
000027fc 00 49
000027fd 00 1e
00004800 de 00
00004801 0f 00
00004804 00 4f
[...]
00004ff9 00 02
00004ffc 00 3d
00004ffd 00 1e
00007000 00 bc
00007001 00 15
00007004 a0 be
[...]
000077f8 8a 83
000077f9 0f 1a
000077fc 12 dd
00009800 00 90
00009801 00 1d
00009804 00 83

So far,
 Martin

--------------------:<----------------------------

>>> This is probably a bug in the umass or da driver.  da claims to support
>>> i/o's of DFLTPHYS = 64K, so lower level drivers must support this even
>>> if the hardware doesn't, but apparently some usb drives have a lower
>>> limit.
>>
>> Hey - you are right. First I tried direct copy with bs=2k (which
>> is the sector size of that device.) This was OK:
>>
>> u:~$ dd if=/boot/kernel/kernel of=/dev/da7 bs=2k
>> dd: /dev/da7: Invalid argument
>> 4501+1 records in
>> 4501+0 records out
>> 9218048 bytes transferred in 31.502305 secs (292615 bytes/sec)
>
> It's another bug that gives the EINVAL error for writing at EOF.
> This complicates debugging a little.
> I think the disk size is not
> a multiple of the block size (2K here), so the last block would
> strictly cross the boundary at the end of the disk, and none of
> it is written, but the error handling would be different/better
> if the block were at the boundary, and maybe different/worse if
> the block were strictly beyond the boundary.  For larger blocks,
> the last one would be more likely to strictly cross the boundary.
> So just note the error above so as to ignore similar errors for
> larger blocks.
>
>> su:~$ dd if=/boot/kernel/kernel bs=2k count=4501 of=test.fs
>> 4501+0 records in
>> 4501+0 records out
>> 9218048 bytes transferred in 0.134802 secs (68382200 bytes/sec)
>
> No errors since it didn't go near EOF for either the input or output.
>
>> su:~$ diff test.umass test.fs
>>
>> Now I tried this with a block size of 128k and it did not work
>> anymore:
>>
>> su:~$ dd if=/boot/kernel/kernel of=/dev/da7 bs=128k
>> dd: /dev/da7: Invalid argument
>> 70+1 records in
>> 70+0 records out
>> 9175040 bytes transferred in 12.484369 secs (734922 bytes/sec)
>
> Better write only 70 blocks to avoid secondary errors.  I think you
> eliminated the secondary error above by checking only 70 blocks later.
>
>> su:~$ dd if=/dev/da7 of=test.umass bs=128k count=70
>> 70+0 records in
>> 70+0 records out
>> 9175040 bytes transferred in 9.297371 secs (986842 bytes/sec)
>>
>> su:~$ dd if=/boot/kernel/kernel of=test.fs bs=128k count=70
>> 70+0 records in
>> 70+0 records out
>> 9175040 bytes transferred in 0.127474 secs (71975736 bytes/sec)
>>
>> su:~$ diff test.umass test.fs
>> Files test.umass and test.fs differ
>
> Use cmp -lx to locate the error(s), especially the first one (expect
> a lot).  Copying from the disk using dd is good for eliminating
> secondary errors, but cmp -lx directly on the disk should work for
> a quick check, with a better chance of working than for diff.
> (It depends on cmp's block size working (the block size only needs
> to be a multiple of the sector size, with no partial block at EOF) and
> not being large enough to cause the suspected error on input.  The
> suspected bug may affect input, output, or both.  I suspect both.)
>
> Bruce
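
The write experiments in this message (write with a given block size,
read back with the known-good 2k size, compare with cmp) can be repeated
in a loop. What follows is only a rough sketch under stated assumptions:
the names WORK, SRC, TARGET, TOTAL and check_bs are made up for
illustration and do not come from the thread, the reference data is
random bytes rather than /boot/kernel/kernel, and TARGET is a scratch
file so the script is harmless to run as is.

```shell
#!/bin/sh
# Sketch of a block-size sweep: write the reference data with each block
# size, read it back with bs=2k, and compare against the reference.
# WORK, SRC, TARGET, TOTAL and check_bs are illustrative names only.
# TARGET is a scratch file here; pointing it at a device such as
# /dev/da4 would OVERWRITE the start of that device.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
SRC=$WORK/pattern.bin
TARGET=$WORK/target.img
TOTAL=40960                      # a multiple of every block size tested

# Create the reference data (the mail used /boot/kernel/kernel instead).
dd if=/dev/urandom of="$SRC" bs=2048 count=$((TOTAL / 2048)) 2>/dev/null

check_bs() {
    bs=$1
    # Write exactly TOTAL bytes using the block size under test.
    dd if="$SRC" of="$TARGET" bs="$bs" count=$((TOTAL / bs)) 2>/dev/null
    # Read the same range back with 2k blocks; cmp -s only sets the
    # exit status, "-" makes it read the second input from the pipe.
    if dd if="$TARGET" bs=2048 count=$((TOTAL / 2048)) 2>/dev/null |
        cmp -s "$SRC" -; then
        echo "bs=$bs ok"
    else
        echo "bs=$bs MISMATCH"
    fi
}

for bs in 2048 4096 8192 10240; do
    check_bs "$bs"
done
```

Against a regular file every block size should report ok. Going by the
results above, a run against the affected stick would be expected to
report a mismatch for 10240 and not for the smaller sizes. TOTAL is kept
small so that the loop stays away from the end of the device and the
EINVAL-at-EOF issue mentioned in the quoted mail.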
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?op.ubpjreaw724k7f>
