Date: Fri, 5 Aug 2011 10:59:57 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-hackers@freebsd.org
Subject: Re: cam / ata timeout limited to 2147 due to overflow bug?
Message-ID: <20110805075957.GP17489@deviant.kiev.zoral.com.ua>
In-Reply-To: <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>
References: <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>

On Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
> I'm working on adding security methods to camcontrol and have
> come up against a strange issue. It seems that the timeout
> value for cam, at least on ata (ahci), is limited to less than
> 2148 seconds.
>
> This can be seen by running:-
> camcontrol identify ada0 -t 2148 -v
> (pass0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (pass0:ahcich0:0:0:0): CAM status: Command timeout
>
> Also seen in /var/log/messages at this time is:-
> Aug 4 23:29:51 cfdev kernel: ahcich0: Timeout on slot 24
> Aug 4 23:29:51 cfdev kernel: ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd d0 serr 00000000
>
> Dropping the timeout down to 2147 and the command runs fine.
>
> I've done some digging and it seems like this is implemented via:-
> sys/dev/ahci/ahci.c
> ahci_execute_transaction(struct ahci_slot *slot)
> {
> ...
> /* Start command execution timeout */
> callout_reset(&slot->timeout, (int)ccb->ccb_h.timeout * hz / 2000,
> (timeout_t*)ahci_timeout, slot);
>
> Now it's documented that:-
> "Non-positive values of ticks are silently converted to the value 1"
>
> So I suspect that this is what's happening, resulting in an extremely
> small timeout instead of a large one. I know that the value passed in
> for the timeout is seconds * 1000, so we should be seeing 2148000
> for ccb->ccb_h.timeout. Multiply that by 1000 (hz) and you're over
> the int wrap point of 2147483647.
>
> So instead of the wrap point being 2147483 seconds (24 days), I suspect
> that because of the way this is structured it's actually 2147 seconds
> (about 36 minutes).
>
> If this is the case the fix is likely to be something like:-
> callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),
For hz == 1000, hz / 2000 == 0 under C's integer division rules, so the
result will always be 0.
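
To make the arithmetic concrete, here is a minimal sketch, not the driver's
code. All three function names are hypothetical. It assumes a 32-bit int and
treats the timeout as an unsigned 32-bit millisecond count. The first helper
emulates the wrap in 64-bit arithmetic so the demonstration itself avoids
signed overflow; the last one is only one possible way to widen the
multiplication, and it keeps the / 2000 from the original expression
unchanged.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Emulates what (int)timeout * hz / 2000 yields when the 32-bit
 * product wraps. The wrap is computed by hand in 64 bits, so this
 * helper never overflows a signed type.
 */
static int32_t
ticks_as_written(uint32_t timeout_ms, int hz)
{
	int64_t w;

	assert(hz > 0);
	/* Keep only the low 32 bits of the product. */
	w = (int64_t)(uint32_t)((int64_t)timeout_ms * hz);
	/* Reinterpret those bits as a two's-complement 32-bit value. */
	if (w > INT32_MAX)
		w -= INT64_C(1) << 32;
	return ((int32_t)(w / 2000));
}

/* The quoted suggestion: hz / 2000 is evaluated first and truncates. */
static int
ticks_suggested(uint32_t timeout_ms, int hz)
{
	assert(hz > 0);
	return ((int)(timeout_ms * (uint32_t)(hz / 2000)));
}

/* Hypothetical alternative: widen before multiplying, then clamp. */
static int
ticks_widened(uint32_t timeout_ms, int hz)
{
	int64_t ticks;

	assert(hz > 0);
	ticks = (int64_t)timeout_ms * hz / 2000;
	if (ticks > INT_MAX)
		ticks = INT_MAX;
	if (ticks < 1)
		ticks = 1;
	return ((int)ticks);
}
```

With hz == 1000, ticks_as_written(2147000, 1000) is positive while
ticks_as_written(2148000, 1000) comes out negative, which matches the
2147/2148 second boundary reported above. ticks_suggested() returns 0 for
either value, and ticks_widened(2148000, 1000) gives 1074000.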
>
> Does this sound reasonable? What I don't understand is why the /2000?
>
> For reference, the reason for wanting a large timeout is that a
> secure erase of large media could take many hours, so I'm using
> the erase time reported by the drive for this, which in my case is
> 400 minutes.
>
> Currently this instantly fails with a Command timeout which is
> clearly not right.
>
> Regards
> Steve
