From owner-freebsd-current@FreeBSD.ORG Fri Dec 26 02:24:09 2008
Date: Fri, 26 Dec 2008 11:23:59 +0900
From: Pyun YongHyeon
To: Bruce Simpson
Message-ID: <20081226022359.GB2700@cdnetworks.co.kr>
References: <20070520174124.GA14987@Athena.infor.org> <495258E7.5070309@incunabulum.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="WhfpMioaduB5tiZL"
Content-Disposition: inline
In-Reply-To: <495258E7.5070309@incunabulum.net>
User-Agent: Mutt/1.4.2.1i
Cc: freebsd-current@FreeBSD.org
Subject: Re: msk watchdog timeout
Reply-To: pyunyh@gmail.com
List-Id: Discussions about the use of FreeBSD-current

--WhfpMioaduB5tiZL
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Dec 24, 2008 at 03:44:39PM +0000, Bruce Simpson wrote:
> Hi,
>
> I just observed a similar issue with the onboard msk0 on my ASUS Vintage
> AH-1.
>
> The symptoms occurred in the last hour, when attempting to download the
> 7.1-RC1-i386-dvd1.iso.gz from ftp.plig.net mirror.
> It is triggered when the data rate of the wget job hit around 890 KiB/sec.
>
> Let me know if you need a PR raised for this.
>
> uname -a:
> %%%
> FreeBSD anglepoise.lon.incunabulum.net 7.1-PRERELEASE FreeBSD
> 7.1-PRERELEASE #0: Wed Dec 3 17:03:33 GMT 2008
> root@anglepoise.lon.incunabulum.net:/home/obj/usr/src/sys/ANGLEPOISE7 amd64
> %%%
>
> dmesg output from syslog:
> %%%
> Dec 24 15:08:05 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:09:32 anglepoise kernel: msk0: watchdog timeout
> Dec 24 15:09:32 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:09:34 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:09:46 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:10:08 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:10:32 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:11:03 anglepoise kernel: msk0: watchdog timeout
> Dec 24 15:11:03 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:11:05 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:12:10 anglepoise kernel: msk0: watchdog timeout
> Dec 24 15:12:10 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:12:12 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:12:20 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:12:58 anglepoise last message repeated 3 times
> Dec 24 15:14:28 anglepoise last message repeated 12 times
> Dec 24 15:14:29 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:14:31 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:14:39 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:15:06 anglepoise last message repeated 3 times
> Dec 24 15:15:21 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:18:27 anglepoise dhclient[339]: connection closed
> Dec 24 15:18:27 anglepoise dhclient[339]: exiting.
> Dec 24 15:18:33 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:18:35 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:18:35 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:18:37 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:18:46 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:18:49 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:18:51 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:19:00 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:19:38 anglepoise last message repeated 4 times
> Dec 24 15:19:47 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> Dec 24 15:18:46 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:18:49 anglepoise kernel: msk0: link state changed to DOWN
> Dec 24 15:18:51 anglepoise kernel: msk0: link state changed to UP
> Dec 24 15:19:00 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:19:38 anglepoise last message repeated 4 times
> Dec 24 15:19:47 anglepoise kernel: msk0: watchdog timeout (missed Tx
> interrupts)
> -- recovering
> Dec 24 15:25:48 anglepoise last message repeated 6 times
> Dec 24 15:28:04 anglepoise kernel: msk0: promiscuous mode enabled
> Dec 24 15:28:56 anglepoise kernel: msk0: promiscuous mode disabled
> Dec 24 15:29:41 anglepoise sudo: bms : TTY=ttyp4 ; PWD=/home/bms ;
> USER=roo
> t ; COMMAND=/sbin/reboot
> Dec 24 15:29:41 anglepoise reboot: rebooted by bms
> Dec 24 15:29:41 anglepoise syslogd: exiting on signal 15
> %%%
>
> The DHCP lease is lost, msk0 appears to stop receiving traffic.
>
> I *did* re-patch the cable on my switch around this point in time, and
> it's possible this triggered the condition.
> Perhaps this is a receive DMA descriptor problem, or a PHY interrupt

No, if this were the root cause of the issue, msk(4) would have shown
"Rx descriptor error" on the console. Of course, this assumes the
controller can detect such errors.

> problem?
>

msk(4) doesn't rely on the PHY status change interrupt, though the
interrupt is enabled by default. My vague guess is that link state
change handling in msk(4) is not right, as msk(4) only checks for link
UP/DOWN events. I'm working on improving link state handling to
support the 88E8040, but it still requires a lot of code and
workarounds.

> I confirmed that neither the cabling itself nor other network
> infrastructure were responsible.
>

Ok. Yukon controllers look really buggy and seem to require a
different workaround for each controller/revision. There was a fix for
one silicon bug in Yukon controllers, so it would be even better if
you could apply the workaround in HEAD (r183346). However, one user
also reported watchdog timeouts on CURRENT, so there still seem to be
unresolved issues.
I couldn't reproduce the issue on my box, but would you try the
attached patch? Also, please show me your dmesg output so I can see
what revision you have (this information is not available with
pciconf(8)).

-- 
Regards,
Pyun YongHyeon

--WhfpMioaduB5tiZL
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="msk.watchdog.diff"

Index: sys/dev/msk/if_msk.c
===================================================================
--- sys/dev/msk/if_msk.c	(revision 186497)
+++ sys/dev/msk/if_msk.c	(working copy)
@@ -1355,27 +1355,25 @@
 	CSR_WRITE_4(sc, STAT_LIST_ADDR_HI, MSK_ADDR_HI(addr));
 	/* Set the status list last index. */
 	CSR_WRITE_2(sc, STAT_LAST_IDX, MSK_STAT_RING_CNT - 1);
-	if (sc->msk_hw_id == CHIP_ID_YUKON_EC &&
-	    sc->msk_hw_rev == CHIP_REV_YU_EC_A1) {
-		/* WA for dev. #4.3 */
-		CSR_WRITE_2(sc, STAT_TX_IDX_TH, ST_TXTH_IDX_MASK);
-		/* WA for dev. #4.18 */
-		CSR_WRITE_1(sc, STAT_FIFO_WM, 0x21);
-		CSR_WRITE_1(sc, STAT_FIFO_ISR_WM, 0x07);
-	} else {
-		CSR_WRITE_2(sc, STAT_TX_IDX_TH, 0x0a);
-		CSR_WRITE_1(sc, STAT_FIFO_WM, 0x10);
-		if (sc->msk_hw_id == CHIP_ID_YUKON_XL &&
-		    sc->msk_hw_rev == CHIP_REV_YU_XL_A0)
-			CSR_WRITE_1(sc, STAT_FIFO_ISR_WM, 0x04);
-		else
-			CSR_WRITE_1(sc, STAT_FIFO_ISR_WM, 0x10);
-		CSR_WRITE_4(sc, STAT_ISR_TIMER_INI, 0x0190);
-	}
 	/*
-	 * Use default value for STAT_ISR_TIMER_INI, STAT_LEV_TIMER_INI.
+	 * Interrupt moderation and coalescing frames should be
+	 * controllable with sysctl variables or loader tunables
+	 * but the relationship between status updates and
+	 * interrupt moderation are not clear. Some hardware
+	 * revisions seem to very sensitive to these parameters
+	 * and could be resulted in poor performance as well as
+	 * non-working situation if improper values were chosen.
 	 */
+	CSR_WRITE_2(sc, STAT_TX_IDX_TH, 0x0a);
+	CSR_WRITE_1(sc, STAT_FIFO_WM, 0x10);
+	if (sc->msk_hw_id == CHIP_ID_YUKON_XL &&
+	    sc->msk_hw_rev == CHIP_REV_YU_XL_A0)
+		CSR_WRITE_1(sc, STAT_FIFO_ISR_WM, 0x04);
+	else
+		CSR_WRITE_1(sc, STAT_FIFO_ISR_WM, 0x10);
 	CSR_WRITE_4(sc, STAT_TX_TIMER_INI, MSK_USECS(sc, 1000));
+	CSR_WRITE_4(sc, STAT_ISR_TIMER_INI, MSK_USECS(sc, 30));
+	CSR_WRITE_4(sc, STAT_LEV_TIMER_INI, MSK_USECS(sc, 50));
 	/* Enable status unit. */
 	CSR_WRITE_4(sc, STAT_CTRL, SC_STAT_OP_ON);
 
@@ -3586,6 +3584,10 @@
 	domore = msk_handle_events(sc);
 	if ((status & Y2_IS_STAT_BMU) != 0)
 		CSR_WRITE_4(sc, STAT_CTRL, SC_STAT_CLR_IRQ);
+	if (CSR_READ_1(sc, STAT_TX_TIMER_CTRL) == TIM_START) {
+		CSR_WRITE_1(sc, STAT_TX_TIMER_CTRL, TIM_STOP);
+		CSR_WRITE_1(sc, STAT_TX_TIMER_CTRL, TIM_START);
+	}
 
 	if (ifp0 != NULL && (ifp0->if_drv_flags & IFF_DRV_RUNNING) != 0 &&
 	    !IFQ_DRV_IS_EMPTY(&ifp0->if_snd))
--WhfpMioaduB5tiZL--