Date:      Mon, 08 Aug 2011 14:27:09 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Eygene Ryabinkin <rea@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Steven Hartland <killing@multiplay.co.uk>
Subject:   Re: cam / ata timeout limited to 2147 due to overflow bug?
Message-ID:  <4E3FC80D.6090704@FreeBSD.org>
In-Reply-To: <L31KSlcfsHEIWijui9oQC3siWnE@H1uwQtuDamiOTxQ5dYkc3ncI/0w>
References:  <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk> <L31KSlcfsHEIWijui9oQC3siWnE@H1uwQtuDamiOTxQ5dYkc3ncI/0w>

On 05.08.2011 11:11, Eygene Ryabinkin wrote:
>> What I don't understand is why the /2000
>
> It gives (timeout_in_ticks)/2.  The code in ahci_timeout does the following:
> {{{
>          /* Check if slot was not being executed last time we checked. */
>          if (slot->state < AHCI_SLOT_EXECUTING) {
>                  /* Check if slot started executing. */
>                  sstatus = ATA_INL(ch->r_mem, AHCI_P_SACT);
>                  ccs = (ATA_INL(ch->r_mem, AHCI_P_CMD) & AHCI_P_CMD_CCS_MASK)
>                      >> AHCI_P_CMD_CCS_SHIFT;
>                  if ((sstatus & (1 << slot->slot)) != 0 || ccs == slot->slot ||
>                      ch->fbs_enabled)
>                          slot->state = AHCI_SLOT_EXECUTING;
>
>                  callout_reset(&slot->timeout,
>                      (int)slot->ccb->ccb_h.timeout * hz / 2000,
>                      (timeout_t*)ahci_timeout, slot);
>                  return;
>          }
> }}}
>
> So, my theory is that the first half of the timeout is devoted
> to the transition from AHCI_SLOT_RUNNING -> AHCI_SLOT_EXECUTING and
> the second half to the transition from AHCI_SLOT_RUNNING -> TIMEOUT,
> giving the whole process the duration of a full timeout.  However,
> judging by the code, if the slot does not start executing by the first
> invocation of ahci_timeout (spawned by the callout armed in
> ahci_execute_transaction), the effective timeout can be longer than
> the specified amount of time.  And if the slot never starts executing,
> the callout will keep rearming itself forever, unless I am missing
> something important here.
>
> Maybe Alexander can shed some light on this?

Your understanding is right. A command may never trigger its timeout if 
some other command executes indefinitely. My goal was to find the commands 
that are really executing and may really cause delays. It would not be 
fair if commands depended on each other, so that a short command's timeout 
reset the device while a long command was trying to do something big. The 
algorithm implemented in ahci(4) is supposed to avoid such false timeouts. 
Unfortunately, I have found a case where that algorithm can indeed fail. A 
patch fixing it was committed and merged down recently.

-- 
Alexander Motin


