Date: Fri, 5 Aug 2011 12:11:18 +0400
From: Eygene Ryabinkin
Sender: rea@codelabs.ru
To: Steven Hartland
Cc: freebsd-hackers@freebsd.org, mav@freebsd.org
Subject: Re: cam / ata timeout limited to 2147 due to overflow bug?
In-Reply-To: <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>

Steven, good day.

Fri, Aug 05, 2011 at 12:02:19AM +0100, Steven Hartland wrote:
> So I suspect that this is what's happening resulting in an extremely
> small timeout instead of a large one. Now I know that passed in value
> to the timeout is seconds * 1000 so we should be seeing 2148000
> for ccb->ccb_h.timeout now multiply that by 1000 (hz) and your over
> the int wrap point 2147483647.
>
> So instead of the wrap point being 2147483 seconds (24 days), I suspect
> because of the way this is structured its actually 2147 seconds (26mins).
>
> If this is the case the fix is likely to be something like:-
> callout_reset(&slot->timeout, (int)(ccb->ccb_h.timeout * (hz / 2000)),

That will give you a timeout of 0 for every value of hz below 2000:
hz is an int, so hz / 2000 is an integer division. Since ccb_h.timeout
is u_int32_t, the proper way to handle this situation would be
{{{
((u_int64_t)ccb->ccb_h.timeout * (u_int32_t)hz)/2000
}}}
as long as the value of hz is not greater than 2^32. Can you try
the patch at
  http://codelabs.ru/fbsd/patches/ahci/AHCI-properly-convert-CAM-timeout-to-ticks.diff
?

> What I don't understand is why the /2000

It gives (timeout_in_ticks)/2. The code in ahci_timeout does the
following:
{{{
	/* Check if slot was not being executed last time we checked. */
	if (slot->state < AHCI_SLOT_EXECUTING) {
		/* Check if slot started executing. */
		sstatus = ATA_INL(ch->r_mem, AHCI_P_SACT);
		ccs = (ATA_INL(ch->r_mem, AHCI_P_CMD) & AHCI_P_CMD_CCS_MASK)
		    >> AHCI_P_CMD_CCS_SHIFT;
		if ((sstatus & (1 << slot->slot)) != 0 || ccs == slot->slot ||
		    ch->fbs_enabled)
			slot->state = AHCI_SLOT_EXECUTING;

		callout_reset(&slot->timeout,
		    (int)slot->ccb->ccb_h.timeout * hz / 2000,
		    (timeout_t*)ahci_timeout, slot);
		return;
	}
}}}

So my theory is that the first half of the timeout is devoted to the
transition AHCI_SLOT_RUNNING -> AHCI_SLOT_EXECUTING and the second half
to the transition AHCI_SLOT_EXECUTING -> TIMEOUT, so that the whole
process lasts one full timeout.

However, judging by the code, if the slot has not started executing by
the first invocation of ahci_timeout (the one fired by the callout
armed in ahci_execute_transaction), the total wait can exceed the
specified timeout. And if the slot never starts executing, the callout
will be re-armed forever, unless I am missing something important here.

Maybe Alexander can shed some light on this?
-- 
Eygene Ryabinkin                                        ,,,^..^,,,
[ Life's unfair - but root password helps!           | codelabs.ru ]
[ 82FE 06BC D497 C0DE 49EC  4FF0 16AF 9EAE 8152 ECFB | freebsd.org ]