Date: Sun, 14 Aug 2005 21:41:38 +0200
From: Søren Schmidt <sos@FreeBSD.org>
To: Chris@LainOS.org
Cc: freebsd-current@FreeBSD.org
Subject: Re: Panic during install on Sparc64 - Only with large HDD
Message-ID: <DDA34AD5-6279-4E7F-B40E-2537389591CE@FreeBSD.org>
In-Reply-To: <200508142016.17769.Chris@LainOS.org>
References: <200508132321.37654.Chris@LainOS.org> <200508142016.17769.Chris@LainOS.org>

On 14/08/2005, at 20:16, Chris Gilbert wrote:

> Also, it seems that setting hw.ata.ata_dma=0 (forcing it into PIO mode)
> fixes the issue.
>
> # sysctl -a hw.ata.ata_dma
> hw.ata.ata_dma: 0
>
> # dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
> 1+0 records in
> 0+1 records out
> 512 bytes transferred in 0.001390 secs (368351 bytes/sec)
>
> Also, it seems there is a bug submitted on this, and a posting to the
> freebsd-sparc64 mailing list.
>
> http://lists.freebsd.org/pipermail/freebsd-sparc64/2005-June/003212.html
>
> Will continue looking into the chipset docs and FreeBSD driver... but
> thought I should point this out.

Actually the problem is in the Acer chip: it can't handle 48-bit
addressing in DMA mode unless the version is above 0xc4, IIRC.

Either you should use disks smaller than 137GB, or you need to
engage PIO mode.

A workaround in ATA could be to use PIO mode when crossing the
boundary, but there is no framework for quirks like that present yet.
It could be pretty easily done, though, so give me a few days (I'm busy
as usual).

-Søren

>
> --
> Thanks,
> Chris (Lance) Gilbert
> Ph: +45 33 73 29 31 (UTC +0100)
>
> On Saturday 13 August 2005 23:21, Chris Gilbert wrote:
>
>> Well, I've continued looking into this problem as I really _really_
>> want to see it fixed for 6.0-RELEASE.
>>
>> I did some general device stress-testing to make sure that it was
>> directly triggerable and reproducible, and was not just an
>> intermittent failure.
>>
>> I have successfully created, and installed FreeBSD on (without any
>> errors):
>>
>> /dev/ad0a
>> /dev/ad0b
>> /dev/ad0c
>> /dev/ad0d
>> /dev/ad0e
>> /dev/ad0f
>>
>> Even though the newfs on it failed, creating the slice itself worked
>> for my large partition (/dev/ad0g).
>>
>> Therefore, I can dd data to it, but I can't write a UFS filesystem to
>> it in order to mount.
>>
>> I then went about writing data to this filesystem for long periods of
>> time to try and hit the problem:
>>
>> # time dd if=/dev/urandom of=/dev/ad0g
>> 143337401+0 records in
>> 143337401+0 records out
>> 73388749312 bytes transferred in 89392.318911 secs (820974 bytes/sec)
>> 614.444u 41826.640s 24:49:52.35 47.4% 244+1708k 0+0io 0pf+0w
>>
>> After this ran without a single error for about 20 hours, I stopped it
>> and started trying to hit the block that triggered the issue manually.
>>
>> After a few hours of "double and half(ing)" I finally managed to find
>> the block:
>>
>> # dd count=1 obs=1024 seek=93321655 if=/dev/urandom of=/dev/ad0g
>> 1+0 records in
>> 0+1 records out
>> 512 bytes transferred in 0.001470 secs (348278 bytes/sec)
>>
>> This one was successful... but the very next one:
>>
>> # dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
>> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435456
>> ad0: FAILURE - WRITE_DMA timed out LBA=268435456
>> dd: /dev/ad0g: Input/output error
>> 1+0 records in
>> 0+0 records out
>> 0 bytes transferred in 16.453833 secs (0 bytes/sec)
>>
>> And incrementing this by one block shows:
>>
>> # dd count=1 obs=1024 seek=93321657 if=/dev/urandom of=/dev/ad0g
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
>> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435458
>> ad0: FAILURE - WRITE_DMA timed out LBA=268435458
>> dd: /dev/ad0g: Input/output error
>> 1+0 records in
>> 0+0 records out
>> 0 bytes transferred in 16.452722 secs (0 bytes/sec)
>>
>> This makes perfect sense because my block size is specified at 1024 on
>> the dd command, and the default blocksize is 512. Therefore,
>> incrementing it by a single 1024 size block would return 2 blocks
>> further in the LBA.
>>
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
>> (then...)
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
>>
>> Bingo! We've finally found the wall!
>>
>> I'm going to look further into the IDE chipset (atapci0: <AcerLabs
>> M5229 UDMA66 controller>) tonight. Both for its whitepapers (to see
>> if it has some sort of quirk or limitation around this area) and its
>> FreeBSD driver, to see if something funky is going on.
>>
>> As I said before, if anyone is interested in helping me resolve this I
>> would appreciate it greatly. This is a bug which has haunted me and
>> several others since FreeBSD 5.2-RC2 and it needs to be fixed.
>>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
>

- Søren
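
For anyone following the numbers in this thread, here is a rough sketch of the arithmetic behind the failing block and the 137GB figure. It only does sums and never touches a disk. PART_START (the sector where ad0g begins) is an assumption: it is not stated in the thread and is inferred from the logged LBAs as 268435456 - 2 * 93321656. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: reproduces the arithmetic from the thread.

SECTOR_BYTES=512          # default dd/disk block size mentioned in the thread
OBS_BYTES=1024            # obs= value used in the dd commands
LBA28_LIMIT=$((1 << 28))  # first sector number that no longer fits in 28 bits

# Assumption: start of ad0g in sectors, inferred from the logged LBAs,
# not taken from a disklabel.
PART_START=81792144

# Convert a dd seek= value (counted in OBS_BYTES units) to an absolute LBA.
seek_to_lba() {
    echo $(( PART_START + $1 * OBS_BYTES / SECTOR_BYTES ))
}

# Print "lba48" if the sector needs 48-bit addressing, else "lba28".
addressing_for() {
    if [ "$1" -ge "$LBA28_LIMIT" ]; then echo lba48; else echo lba28; fi
}

for seek in 93321655 93321656 93321657; do
    lba=$(seek_to_lba "$seek")
    echo "seek=$seek LBA=$lba $(addressing_for "$lba")"
done

# Capacity reachable with 28-bit addressing, in bytes.
echo "28-bit limit: $(( LBA28_LIMIT * SECTOR_BYTES )) bytes"
```

Under that assumption, seek=93321655 stays below the 28-bit boundary, while seek=93321656 lands exactly on LBA 268435456 (2^28), which matches the first WRITE_DMA timeout in the log. 2^28 sectors of 512 bytes is 137438953472 bytes, which is where the "less than 137GB" advice comes from.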