From: Alexander Motin
Date: Wed, 28 Jul 2010 21:39:40 +0300
To: David Naylor
Cc: "freebsd-current@freebsd.org"
Subject: Re: Interrupt Problems

David Naylor wrote:
> I have been having interrupt related problems with various subsystems. I
> suspect this is related to the changes in the event timer infrastructure.
>
> The subsystems that have experienced interrupt problems:
> - hda: this is the easiest to reproduce and what I used to isolate the
> commits. I get ``pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt
> timeout, channel dead'' reported and sound no longer plays.
> - nfe: this has happened on occasion with no reliable way to reproduce.
> ``watchdog timeouts'' are reported. After this happens all network traffic dies
> and doing `ifconfig nfe0 down; ifconfig nfe0 up' panics the computer.
> - dc: same thing as above.
> - nvidia: has reported interrupt timeouts. This is independent of the
> locking problem (that is fixed with recently published patch). No reliable way
> to reproduce, appears to happen when under heavy load. X freezes as a result.
> - ata: I had a HDD detach twice. I am not sure if this is related. I have
> two HDD, each attached to a different controller.
>
> I tested this by using a kernel built from a cvsup date of 2010/06/20 and
> 2010/06/22 (at midnight for both, aka 00:00:00). The former kernel does not
> exhibit any problems while the latter does. This problem is also present with
> a kernel from today.
>
> The motherboard is a N650SLI-DS4L with one graphics card. See attached for
> more system information.
>
> Is there anything I can do to help diagnose the problem?

I can hardly explain how the timer-related changes could cause problems with such a long list of devices using different IRQs. The MCP51 seems to have a rather colorful history of problems (I know at least about SATA and HDA MSI issues), so I wouldn't be very surprised if this is yet another hardware-specific issue.

Does the problem happen randomly, or can it be triggered somehow? Have you looked at what happens to interrupts during/after the problem appears? Do all of them die, or only selected ones each time? Is there a way to restore operation after the problem occurs?

Have you tried switching to other event timers? HPET event timers were never used before these changes, so their bugs have not been studied yet.

PS: A verbose dmesg would be more useful.

--
Alexander Motin
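A minimal sketch of the interrupt check asked about above: capture "vmstat -i" output twice, a few seconds apart, while the problem is present, and list the interrupt sources whose counters did not grow. It assumes the usual vmstat -i layout (source name, then total and rate columns) and is written in Python purely for illustration; the helper names parse_vmstat_i and stalled_sources, and the sample device lines, are hypothetical.

```python
"""Compare two `vmstat -i` snapshots to spot interrupt sources that stopped firing.

Assumption: each data line ends with two integer columns (total, rate) and
everything before them is the source name, e.g. "irq16: hdac0".
"""


def parse_vmstat_i(text: str) -> dict[str, int]:
    """Map each interrupt source name to its cumulative interrupt count."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Need at least a name plus the total and rate columns.
        if len(fields) < 3:
            continue
        name = " ".join(fields[:-2])
        total, rate = fields[-2], fields[-1]
        # Skip the column header ("interrupt total rate") and the "Total" summary.
        if not (total.isdigit() and rate.isdigit()) or name == "Total":
            continue
        counts[name] = int(total)
    return counts


def stalled_sources(before: str, after: str) -> list[str]:
    """Return sources present in both snapshots whose count did not increase.

    Order follows the first snapshot.
    """
    old, new = parse_vmstat_i(before), parse_vmstat_i(after)
    return [name for name in old if name in new and new[name] <= old[name]]


if __name__ == "__main__":
    # Hypothetical snapshots; in practice paste real `vmstat -i` output here.
    before = """interrupt                          total       rate
irq1: atkbd0                         568          0
irq16: hdac0                       90210         45
irq22: nfe0                       123456         61
Total                             214234        106
"""
    after = """interrupt                          total       rate
irq1: atkbd0                         568          0
irq16: hdac0                       90210         44
irq22: nfe0                       123999         61
Total                             214777        105
"""
    print(stalled_sources(before, after))
```

A device that is simply idle (a keyboard, for instance) also shows no increase, so the result is a list of candidates to compare against the devices that misbehave, not proof that an interrupt was lost.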