Date:      Sun, 14 Aug 2005 21:41:38 +0200
From:      Søren Schmidt <sos@FreeBSD.org>
To:        Chris@LainOS.org
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: Panic during install on Sparc64 - Only with large HDD
Message-ID:  <DDA34AD5-6279-4E7F-B40E-2537389591CE@FreeBSD.org>
In-Reply-To: <200508142016.17769.Chris@LainOS.org>
References:  <200508132321.37654.Chris@LainOS.org> <200508142016.17769.Chris@LainOS.org>


On 14/08/2005, at 20:16, Chris Gilbert wrote:

> Also, it seems that setting hw.ata.ata_dma=0 (forcing it into PIO mode)
> fixes the issue.
>
> # sysctl -a hw.ata.ata_dma
> hw.ata.ata_dma: 0
>
> # dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
> 1+0 records in
> 0+1 records out
> 512 bytes transferred in 0.001390 secs (368351 bytes/sec)
>
> Also, it seems there is a bug submitted on this, and a posting to the
> freebsd-sparc64 mailing list.
>
> http://lists.freebsd.org/pipermail/freebsd-sparc64/2005-June/003212.html
>
> Will continue looking into the chipset docs and FreeBSD driver... but
> thought I should point this out.

Actually the problem is in the Acer chip: it can't handle 48-bit
addressing in DMA mode unless the version is above 0xc4, IIRC.

Either you should use disks smaller than 137GB, or you need to
engage PIO mode.
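
For reference, the 137GB figure is just the 28-bit LBA limit. A quick
arithmetic sketch in Python, assuming 512-byte sectors; the names
below are made up for illustration and the boundary test is simplified,
it is not what the driver does:

```python
SECTOR_SIZE = 512   # bytes per sector (assumed)
LBA28_BITS = 28     # address width of the non-48-bit ATA commands

# Number of sectors, and bytes, reachable with a 28-bit sector address.
lba28_sectors = 1 << LBA28_BITS
lba28_bytes = lba28_sectors * SECTOR_SIZE


def needs_lba48(lba, count=1):
    """True if a transfer of `count` sectors starting at `lba`
    reaches past the 28-bit range (simplified, illustrative check)."""
    return lba + count > lba28_sectors


print(lba28_sectors)            # 268435456
print(lba28_bytes)              # 137438953472, i.e. about 137GB
print(needs_lba48(268435454))   # False
print(needs_lba48(268435456))   # True
```

Note that 268435456 is exactly the LBA in your WRITE_DMA timeouts, so
the first failing write is the first one that needs 48-bit addressing.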

A workaround in ATA could be to use PIO mode when crossing the
boundary, but there is no framework for quirks like that present yet.
It could be done pretty easily though, so give me a few days (I'm busy
as usual).
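
Roughly, the decision such a quirk would make per request looks like
this. It is only a Python model of the idea, not driver code: the
function, its parameters and the revision cutoff are assumptions (the
0xc4 value is from memory and unverified), and nothing here is an
existing ata(4) interface.

```python
LBA28_LIMIT = 1 << 28    # first sector number that no longer fits in 28 bits
BROKEN_REV_MAX = 0xc4    # assumed cutoff: revisions above this are taken as OK


def pick_mode(chip_rev, lba, count, dma_enabled=True):
    """Return "DMA" or "PIO" for one request (hypothetical helper)."""
    if not dma_enabled:
        # DMA switched off globally, e.g. hw.ata.ata_dma=0.
        return "PIO"
    crosses = lba + count > LBA28_LIMIT
    if crosses and chip_rev <= BROKEN_REV_MAX:
        # Affected chip and the request needs 48-bit addressing:
        # fall back to PIO for this request only.
        return "PIO"
    return "DMA"


print(pick_mode(0xc2, 268435454, 1))   # DMA
print(pick_mode(0xc2, 268435456, 1))   # PIO
print(pick_mode(0xc5, 268435456, 1))   # DMA
```

The point of doing it per request is that everything below the
boundary keeps running at DMA speed, and only the requests that need
48-bit addressing pay the PIO penalty.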

-Søren

>
> -- 
> Thanks,
> Chris (Lance) Gilbert
> Ph: +45 33 73 29 31 (UTC +0100)
>
> On Saturday 13 August 2005 23:21, Chris Gilbert wrote:
>
>> Well, I've continued looking into this problem as I really _really_
>> want to see it fixed for 6.0-RELEASE.
>>
>> I did some general device stress-testing to make sure that it was
>> directly triggerable and reproducible, and was not just an intermittent
>> failure.
>>
>> I have successfully created, and installed FreeBSD on (without any
>> errors):
>>
>> /dev/ad0a
>> /dev/ad0b
>> /dev/ad0c
>> /dev/ad0d
>> /dev/ad0e
>> /dev/ad0f
>>
>> Even though the newfs on it failed, creating the slice itself worked
>> for my large partition (/dev/ad0g).
>>
>> Therefore, I can dd data to it, but I can't write a UFS filesystem to
>> it in order to mount.
>>
>> I then went about writing data to this filesystem for long periods of
>> time to try and hit the problem:
>>
>> # time dd if=/dev/urandom of=/dev/ad0g
>> 143337401+0 records in
>> 143337401+0 records out
>> 73388749312 bytes transferred in 89392.318911 secs (820974 bytes/sec)
>> 614.444u 41826.640s 24:49:52.35 47.4%   244+1708k 0+0io 0pf+0w
>>
>> After this ran without a single error for about 20 hours, I stopped it
>> and started trying to hit the block that triggered the issue manually.
>>
>> After a few hours of "double and half(ing)" I finally managed to find
>> the block:
>>
>> # dd count=1 obs=1024 seek=93321655 if=/dev/urandom of=/dev/ad0g
>> 1+0 records in
>> 0+1 records out
>> 512 bytes transferred in 0.001470 secs (348278 bytes/sec)
>>
>> This one was successful... but the very next one:
>>
>> # dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
>> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435456
>> ad0: FAILURE - WRITE_DMA timed out LBA=268435456
>> dd: /dev/ad0g: Input/output error
>> 1+0 records in
>> 0+0 records out
>> 0 bytes transferred in 16.453833 secs (0 bytes/sec)
>>
>> And incrementing this by one block shows:
>>
>> # dd count=1 obs=1024 seek=93321657 if=/dev/urandom of=/dev/ad0g
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
>> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435458
>> ad0: FAILURE - WRITE_DMA timed out LBA=268435458
>> dd: /dev/ad0g: Input/output error
>> 1+0 records in
>> 0+0 records out
>> 0 bytes transferred in 16.452722 secs (0 bytes/sec)
>>
>> This makes perfect sense because my output block size is specified as
>> 1024 on the dd command, and the default sector size is 512. Therefore,
>> incrementing the seek by a single 1024-byte block lands 2 sectors
>> further along in the LBA.
>>
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
>> (then...)
>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
>>
>> Bingo! We've finally found the wall!
>>
>> I'm going to look further into the IDE chipset (atapci0: <AcerLabs M5229
>> UDMA66 controller>) tonight. Both for its whitepapers (to see if it has
>> some sort of quirk or limitation around this area) and its FreeBSD
>> driver, to see if something funky is going on.
>>
>> As I said before, if anyone is interested in helping me resolve this I
>> would appreciate it greatly. This is a bug which has haunted me and
>> several others since FreeBSD 5.2-RC2 and it needs to be fixed.
>>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
>

- Søren





