Date: Fri, 5 Aug 2011 00:02:19 +0100
From: "Steven Hartland" <killing@multiplay.co.uk>
To: <freebsd-hackers@freebsd.org>
Subject: cam / ata timeout limited to 2147 due to overflow bug?
Message-ID: <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>
I'm working on adding security methods to camcontrol and have come up
against a strange issue. It seems that the timeout value for cam, at least
on ata (ahci), is limited to less than 2148 seconds.

This can be seen by running:-

    camcontrol identify ada0 -t 2148 -v
    (pass0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    (pass0:ahcich0:0:0:0): CAM status: Command timeout

Also seen in /var/log/messages at this time is:-

    Aug 4 23:29:51 cfdev kernel: ahcich0: Timeout on slot 24
    Aug 4 23:29:51 cfdev kernel: ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd d0 serr 00000000

Dropping the timeout down to 2147, the command runs fine.

I've done some digging and it seems like this is implemented via:-

    sys/dev/ahci/ahci.c
    ahci_execute_transaction(struct ahci_slot *slot)
    {
    ...
        /* Start command execution timeout */
        callout_reset(&slot->timeout, (int)ccb->ccb_h.timeout * hz / 2000,
            (timeout_t*)ahci_timeout, slot);

Now it's documented that:-

    "Non-positive values of ticks are silently converted to the value 1"

So I suspect that this is what's happening, resulting in an extremely small
timeout instead of a large one.

I know that the value passed in as the timeout is seconds * 1000, so we
should be seeing 2148000 for ccb->ccb_h.timeout. Multiply that by 1000 (hz)
and you're over the int wrap point of 2147483647.

So instead of the wrap point being 2147483 seconds (24 days), I suspect
that because of the way this is structured it's actually 2147 seconds
(about 36 minutes).

If this is the case the fix is likely to be something like:-

    callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),

Does this sound reasonable? What I don't understand is why the /2000?

For reference, the reason for wanting a large timeout is that a secure
erase of large media could take many hours, so I'm using the erase time
reported by the drive for this, which in my case here is 400 minutes.

Currently this instantly fails with a Command timeout, which is clearly
not right.
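The arithmetic above can be checked outside the kernel. What follows is a minimal sketch, not code from ahci.c: the three helper names are hypothetical, the timeout is assumed to be a 32-bit millisecond count, int is assumed to be 32 bits wide, and hz is assumed to be 1000 as in the message. It emulates the expression as written, the parenthesised variant suggested above, and one possible alternative that widens the intermediate to 64 bits. Note that the parenthesised variant evaluates hz / 2000 in integer arithmetic, which is 0 whenever hz is below 2000.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; none of these exist in ahci.c.
 * Each one converts a timeout in milliseconds to a tick count, keeping the
 * driver's division by 2000.
 */

/*
 * Emulates "(int)timeout * hz / 2000" with a 32-bit int. The product is
 * computed in 64 bits and then wrapped by hand, so the sketch itself never
 * relies on undefined signed overflow.
 */
static int
ticks_as_written(uint32_t timeout_ms, int hz)
{
	assert(timeout_ms <= INT32_MAX && hz > 0);

	int64_t product = (int64_t)timeout_ms * hz;
	int64_t wrapped = (int64_t)(uint32_t)product;	/* keep the low 32 bits */

	if (wrapped > INT32_MAX)
		wrapped -= INT64_C(1) << 32;		/* two's-complement wrap */
	return ((int)(wrapped / 2000));
}

/* The variant from the message: "(int)(timeout * (hz / 2000))". */
static int
ticks_proposed(uint32_t timeout_ms, int hz)
{
	assert(hz > 0);

	/* hz / 2000 is integer division, so it is 0 for any hz below 2000. */
	return ((int)(timeout_ms * (uint32_t)(hz / 2000)));
}

/* One possible alternative: 64-bit intermediate, clamped to [1, INT_MAX]. */
static int
ticks_wide(uint32_t timeout_ms, int hz)
{
	assert(hz > 0);

	int64_t t = (int64_t)timeout_ms * hz / 2000;

	if (t < 1)
		return (1);
	if (t > INT_MAX)
		return (INT_MAX);
	return ((int)t);
}
```

With hz = 1000, ticks_as_written(2147000, 1000) gives 1073500, while ticks_as_written(2148000, 1000) goes negative, which matches the 2147 / 2148 second boundary seen with camcontrol. ticks_proposed(2148000, 1000) gives 0, and ticks_wide(24000000, 1000), the 400 minute erase time, gives 12000000.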
Regards
Steve