Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 5 Aug 2011 12:11:18 +0400
From:      Eygene Ryabinkin <rea@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-hackers@freebsd.org, mav@freebsd.org
Subject:   Re: cam / ata timeout limited to 2147 due to overflow bug?
Message-ID:  <L31KSlcfsHEIWijui9oQC3siWnE@H1uwQtuDamiOTxQ5dYkc3ncI/0w>
In-Reply-To: <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>
References:  <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>

next in thread | previous in thread | raw e-mail | index | archive | help

--mP3DRpeJDSE+ciuQ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Steven, good day.

Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
> So I suspect that this is what's happening resulting in an extremely
> small timeout instead of a large one. Now I know that passed in value
> to the timeout is seconds * 1000 so we should be seeing 2148000
> for ccb->ccb_h.timeout now multiply that by 1000 (hz) and your over
> the int wrap point 2147483647.
>=20
> So instead of the wrap point being 2147483 seconds (24 days), I suspect
> because of the way this is structured its actually 2147 seconds (26mins).
>=20
> If this is the case the fix is likely to be something like:-
>  callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),

It will give you 0 timeout for all values of hz that are lower than
2000: hz is int, so you'll get integer division.  Since ccb_h.timeout
is u_int32_t, the proper way to handle this situation would be
{{{
	(u_int64_t)ccb->ccb_h.timeout * (u_int32_t)hz)/2000
}}}
as long as the value of hz won't be greater than 2^32.

Can you try the patch at
  http://codelabs.ru/fbsd/patches/ahci/AHCI-properly-convert-CAM-timeout-to=
-ticks.diff

> What I don't understand is why the /2000

It gives (timeout_in_ticks)/2.  The code in ahci_timeout does the following:
{{{
        /* Check if slot was not being executed last time we checked. */
        if (slot->state < AHCI_SLOT_EXECUTING) {
                /* Check if slot started executing. */
                sstatus =3D ATA_INL(ch->r_mem, AHCI_P_SACT);
                ccs =3D (ATA_INL(ch->r_mem, AHCI_P_CMD) & AHCI_P_CMD_CCS_MA=
SK)
                    >> AHCI_P_CMD_CCS_SHIFT;
                if ((sstatus & (1 << slot->slot)) !=3D 0 || ccs =3D=3D slot=
->slot ||
                    ch->fbs_enabled)
                        slot->state =3D AHCI_SLOT_EXECUTING;

                callout_reset(&slot->timeout,
                    (int)slot->ccb->ccb_h.timeout * hz / 2000,
                    (timeout_t*)ahci_timeout, slot);
                return;
        }
}}}

So, my theory is that the first half of the timeout time is devoted
to the transition from AHCI_SLOT_RUNNING -> AHCI_SLOT_EXECUTING and
the second one is the transition from AHCI_SLOT_RUNNING -> TIMEOUT
to give the whole process the duration of a full timeout.  However,
judging by the code, if the slot won't start executing at the first
invocation of ahci_timeout that was spawned by the callout armed in
ahci_execute_transaction, we can have timeouts more than for the
specified amount of time.  And if the slot will never start its
execution, the callout will spin forever, unless I am missing something
important here.

May be Alexander can shed some light into this?
--=20
Eygene Ryabinkin                                        ,,,^..^,,,
[ Life's unfair - but root password helps!           | codelabs.ru ]
[ 82FE 06BC D497 C0DE 49EC  4FF0 16AF 9EAE 8152 ECFB | freebsd.org ]

--mP3DRpeJDSE+ciuQ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)

iF4EAREIAAYFAk47paYACgkQFq+eroFS7PshggD7BjGIRUl6F0iBu2jazwBmcM72
8cIbhC6QN+zbvLSFE2wBAJQlebM+hbMjdT6dAPwo8NXacDd7UMvmUTtwyueekbHU
=pWkn
-----END PGP SIGNATURE-----

--mP3DRpeJDSE+ciuQ--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?L31KSlcfsHEIWijui9oQC3siWnE>