From: Alexander Motin
Date: Mon, 08 Aug 2011 14:27:09 +0300
To: Eygene Ryabinkin
Cc: freebsd-hackers@freebsd.org, Steven Hartland
Subject: Re: cam / ata timeout limited to 2147 due to overflow bug?
Message-ID: <4E3FC80D.6090704@FreeBSD.org>
References: <4CAD348034DD463E80C89DD5A0BDD71B@multiplay.co.uk>

On 05.08.2011 11:11, Eygene Ryabinkin wrote:
>> What I don't understand is why the /2000
>
> It gives (timeout_in_ticks)/2. The code in ahci_timeout does the following:
> {{{
>     /* Check if slot was not being executed last time we checked. */
>     if (slot->state < AHCI_SLOT_EXECUTING) {
>         /* Check if slot started executing. */
>         sstatus = ATA_INL(ch->r_mem, AHCI_P_SACT);
>         ccs = (ATA_INL(ch->r_mem, AHCI_P_CMD) & AHCI_P_CMD_CCS_MASK)
>             >> AHCI_P_CMD_CCS_SHIFT;
>         if ((sstatus & (1 << slot->slot)) != 0 || ccs == slot->slot ||
>             ch->fbs_enabled)
>             slot->state = AHCI_SLOT_EXECUTING;
>
>         callout_reset(&slot->timeout,
>             (int)slot->ccb->ccb_h.timeout * hz / 2000,
>             (timeout_t*)ahci_timeout, slot);
>         return;
>     }
> }}}
>
> So, my theory is that the first half of the timeout time is devoted
> to the transition from AHCI_SLOT_RUNNING -> AHCI_SLOT_EXECUTING and
> the second one is the transition from AHCI_SLOT_RUNNING -> TIMEOUT
> to give the whole process the duration of a full timeout. However,
> judging by the code, if the slot won't start executing at the first
> invocation of ahci_timeout that was spawned by the callout armed in
> ahci_execute_transaction, we can have timeouts more than for the
> specified amount of time. And if the slot will never start its
> execution, the callout will spin forever, unless I am missing something
> important here.
>
> May be Alexander can shed some light into this?

Your understanding is right. A command may never trigger its timeout if
some other command executes indefinitely. My goal was to find the commands
that are really executing and may really be causing delays.
It would not be fair if commands depend on each other and a short
command's timeout resets the device while a long command is trying to do
something big. The algorithm implemented in the ahci(4) code is supposed
to avoid such false timeouts. Unfortunately, I've found a case where that
algorithm can indeed fail. A patch fixing that was committed and merged
down recently.

-- 
Alexander Motin
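
For reference, the arithmetic behind the /2000 and behind the 2147 figure
in the subject line can be sketched in standalone C. This is only an
illustration, not the ahci(4) code: the function names are hypothetical,
and hz = 1000 and a 32-bit int are assumptions. The timeout is in
milliseconds, so timeout * hz / 1000 is the full timeout in ticks and
timeout * hz / 2000 is half of it; the product timeout * hz is what
stops fitting in a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest timeout (in ms) for which timeout_ms * hz still fits in a
 * signed 32-bit int. Hypothetical helper, not part of the driver.
 */
uint32_t
max_safe_timeout_ms(int hz)
{
	assert(hz > 0);
	return ((uint32_t)(INT32_MAX / hz));
}

/*
 * Returns 1 if the product timeout_ms * hz would not fit in a signed
 * 32-bit int, 0 otherwise. The check itself is done in 64 bits.
 */
int
product_overflows_int32(uint32_t timeout_ms, int hz)
{
	assert(hz > 0);
	return ((int64_t)timeout_ms * hz > INT32_MAX);
}

/*
 * Half of the timeout expressed in ticks, using a 64-bit intermediate
 * so the multiplication cannot overflow for any 32-bit timeout value.
 */
int64_t
half_timeout_ticks(uint32_t timeout_ms, int hz)
{
	assert(hz > 0);
	return ((int64_t)timeout_ms * hz / 2000);
}
```

Under these assumptions, max_safe_timeout_ms(1000) is 2147483 ms, i.e.
about 2147 seconds, and half_timeout_ticks(30000, 1000) is 15000 ticks,
i.e. half of a 30 second timeout.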