Date:      Fri, 5 Aug 2011 10:59:43 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Eygene Ryabinkin" <rea@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, mav@freebsd.org
Subject:   Re: cam / ata timeout limited to 2147 due to overflow bug?
Message-ID:  <FEC6035595894C6CBD490CCEDEA977FB@multiplay.co.uk>
References:  <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk> <L31KSlcfsHEIWijui9oQC3siWnE@H1uwQtuDamiOTxQ5dYkc3ncI/0w>

> Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
>> So I suspect that this is what's happening, resulting in an extremely
>> small timeout instead of a large one. I know that the value passed in
>> for the timeout is seconds * 1000, so we should be seeing 2148000
>> for ccb->ccb_h.timeout. Multiply that by 1000 (hz) and you're over
>> the int wrap point, 2147483647.
>> 
>> So instead of the wrap point being 2147483 seconds (24 days), I suspect
>> because of the way this is structured it's actually 2147 seconds (about 36 mins).
>> 
>> If this is the case the fix is likely to be something like:-
>>  callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),
>
> It will give you 0 timeout for all values of hz that are lower than
> 2000: hz is int, so you'll get integer division.  Since ccb_h.timeout
> is u_int32_t, the proper way to handle this situation would be
> {{{
> ((u_int64_t)ccb->ccb_h.timeout * (u_int32_t)hz)/2000
> }}}
> as long as the value of hz won't be greater than 2^32.
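
The arithmetic discussed above can be checked outside the kernel with a
small userland sketch. It is only an illustration, not the driver code: the
function names are made up for this example, the fixed-width types stand in
for the kernel's u_int32_t/u_int64_t, and the 32-bit wrap is modelled
explicitly (assuming two's complement wraparound) rather than relying on
signed overflow, which is undefined behaviour in C.

```c
#include <stdint.h>

/*
 * Hypothetical helper: models timeout_ms * hz computed in 32 bits and then
 * reinterpreted as a signed int, followed by the /2000 from the thread.
 */
static int32_t ticks_wrapped32(uint32_t timeout_ms, int32_t hz)
{
    /* Unsigned multiplication wraps modulo 2^32, which is well defined. */
    uint32_t product = timeout_ms * (uint32_t)hz;
    int64_t as_signed = (int64_t)product;

    /* Map values above INT32_MAX to what a two's complement int would hold. */
    if (as_signed > INT32_MAX)
        as_signed -= 4294967296LL;
    return (int32_t)(as_signed / 2000);
}

/*
 * Hypothetical helper: the first suggested fix. hz / 2000 is an integer
 * division, so it is 0 for every hz below 2000.
 */
static int ticks_int_division(uint32_t timeout_ms, int hz)
{
    return (int)(timeout_ms * (uint32_t)(hz / 2000));
}

/*
 * Hypothetical helper: the 64-bit form. Widening before the multiplication
 * keeps the full product, so the division sees the real value.
 */
static uint64_t ticks_wide(uint32_t timeout_ms, int hz)
{
    return ((uint64_t)timeout_ms * (uint32_t)hz) / 2000;
}
```

With timeout_ms = 2148000 and hz = 1000, ticks_wrapped32() returns a negative
value (-1073483), ticks_int_division() returns 0, and ticks_wide() returns
1074000, which is half of the 2148000 ticks that the full timeout spans.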

Ahh of course, was late ;-)

> Can you try the patch at
>  http://codelabs.ru/fbsd/patches/ahci/AHCI-properly-convert-CAM-timeout-to-ticks.diff
>
>> What I don't understand is why the /2000
> 
> It gives (timeout_in_ticks)/2.  The code in ahci_timeout does the following:
> {{{
>        /* Check if slot was not being executed last time we checked. */
>        if (slot->state < AHCI_SLOT_EXECUTING) {
snip..
>
> So, my theory is that the first half of the timeout time is devoted
> to the transition from AHCI_SLOT_RUNNING -> AHCI_SLOT_EXECUTING and
> the second one is the transition from AHCI_SLOT_RUNNING -> TIMEOUT
> to give the whole process the duration of a full timeout.  However,
> judging by the code, if the slot doesn't start executing by the first
> invocation of ahci_timeout that was spawned by the callout armed in
> ahci_execute_transaction, the timeout can last longer than the
> specified amount of time.  And if the slot never starts its
> execution, the callout will spin forever, unless I am missing something
> important here.
>
> Maybe Alexander can shed some light on this?
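
The behaviour described in that theory can be illustrated with a toy model.
This is a sketch built only from the description above, not from the elided
driver code: the enum, the function name and its parameters are all
hypothetical, and it assumes that each callout firing either marks the slot
as executing and re-arms for another half timeout, or, if the slot was
already marked as executing, declares the timeout.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the slot states named in the thread. */
enum slot_state { SLOT_RUNNING, SLOT_EXECUTING };

/*
 * Returns the tick at which the timeout is finally declared, or -1 if it
 * is not declared within max_firings callout firings.
 *
 * half_ticks: interval the callout is armed with (half the full timeout)
 * start_tick: tick at which the slot starts executing, or -1 for never
 */
static int64_t model_timeout_tick(int64_t half_ticks, int64_t start_tick,
    int max_firings)
{
    enum slot_state state = SLOT_RUNNING;
    int64_t now = 0;

    for (int i = 0; i < max_firings; i++) {
        now += half_ticks;              /* the callout fires */
        if (state < SLOT_EXECUTING) {
            /* Slot was not executing at the previous check. */
            if (start_tick >= 0 && start_tick <= now)
                state = SLOT_EXECUTING;
            continue;                   /* re-arm for another half */
        }
        return now;                     /* executing for a full half: timeout */
    }
    return -1;                          /* kept re-arming, never timed out */
}
```

Under these assumptions, with half_ticks = 500 a slot that starts executing
at tick 0 times out at tick 1000 (the full timeout), a slot that starts at
tick 700 times out at tick 1500, and a slot that never starts keeps
re-arming for as long as the model runs.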

Interesting, thanks for the explanation.

I've tried the patch and it had a few cut and paste errors, which I've fixed,
and confirmed it works as expected, so thanks for that :)

There's also a load more drivers with the same issue, so I've gone through
and fixed all the occurrences I can find. Here's the updated patch:-
http://blog.multiplay.co.uk/dropzone/freebsd/ccb_timeout.patch

    Regards
    Steve




