From: John Baldwin
To: Søren Schmidt
Cc: Barney Cordoba, freebsd-current@freebsd.org
Date: Wed, 28 Nov 2007 09:38:37 -0500
Subject: Re: Any successful installs on a Broadcom HT1000 chipset?

On Wednesday 28 November 2007 08:51:38 am Søren Schmidt wrote:
> John Baldwin wrote:
> > On Wednesday 28 November 2007 02:45:16 am Søren Schmidt wrote:
> >> John Baldwin wrote:
> >>> FYI, I've seen weird in-memory corruption with machines with the HT1000_S1
> >>> atapci device.
> >>> In all the cases I've seen so far, a single page is corrupted
> >>> with garbage and the page happens to be used by UMA to hold credentials
> >>> including proc0's credentials. I've seen this corruption (trashed creds for
> >>> proc0 and other creds in that page) on many of the same boxes (Dell 1435's
> >>> IIRC) running on 6.2. I've tried switching the HT1000_S1 to use SWKSMIO
> >>> rather than SWKS100 as I mentioned to you in an earlier e-mail (the Linux driver
> >>> uses the equivalent of SWKSMIO FWIW) but don't have any conclusive tests on that.
> >>>
> >> OK, seems the chipset has some real problems. I have dug through all
> >> the (very little) docs and info I got from ServerWorks back when, and
> >> the only thing I can find is that the chip doesn't support MSI in any
> >> shape or fashion or it will do really strange things.
> >> Now on my system it seems to be disabled but I'm not sure yet how it's
> >> determined to be that way. Would be worth it for you guys to check what the
> >> sysctls "hw.pci.enable_msi" and "hw.pci.enable_msix" are set to.
> >> I haven't looked into this yet, but I'm pretty sure we added MSI support
> >> in the 6.2 -> 7.0 timeframe, so that might have uncovered this chipset
> >> bug, and possibly the Promise data corruption one as well.
> >
> > The ata driver doesn't use MSI (no calls to pci_msi_count or pci_msi_alloc,
> > etc.), so this isn't an issue. Also, the boxes I've seen the corruption on
> > already have MSI disabled (it's still disabled by default in 6.x).
> >
> OK, it must be *totally* disabled, not just for ATA but for everything
> on those chipsets, or they'll barf all over the place.
> If we do that already we need to look into other places.
> However, if we are dealing with in-memory corruption this is going to
> get "interesting"....
> Does that also happen if nothing uses DMA?
Again, on the machines I'm seeing this on, MSI was totally disabled. I don't
think I can totally disable DMA on these machines (NICs etc. must use DMA).
They are also in production, and I only see the corruption as an after-effect
when the boxes panic or deadlock for another reason, so I'm not easily able to
reproduce this. Also, we do disable MSI for devices behind HT2000 chipsets
because of a chip bug, but not on HT1000 currently. However, MSI isn't
enabled on 6.x anyway.

-- 
John Baldwin
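
For anyone following the thread who wants to check the two tunables mentioned above, here is a minimal sketch in Python. It assumes a `sysctl` binary on the PATH that accepts `-n` (as sysctl(8) on FreeBSD does); the names `run_sysctl` and `msi_settings` are made up for this illustration and are not part of any FreeBSD tool. A tunable that cannot be read is reported as unavailable rather than guessed at.

```python
import subprocess
from typing import Callable, Dict, Optional

# The two tunables named in the thread.
MSI_SYSCTLS = ("hw.pci.enable_msi", "hw.pci.enable_msix")


def run_sysctl(name: str) -> Optional[str]:
    """Return the raw output of `sysctl -n NAME`, or None if it can't be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # sysctl binary missing or not executable
    if result.returncode != 0:
        return None  # unknown OID, e.g. on a system without these tunables
    return result.stdout


def msi_settings(
    reader: Callable[[str], Optional[str]] = run_sysctl,
) -> Dict[str, Optional[int]]:
    """Map each MSI tunable to its integer value, or None if unreadable."""
    settings: Dict[str, Optional[int]] = {}
    for name in MSI_SYSCTLS:
        raw = reader(name)
        try:
            settings[name] = int(raw.strip()) if raw is not None else None
        except ValueError:
            settings[name] = None  # output was not a plain integer
    return settings


if __name__ == "__main__":
    for name, value in msi_settings().items():
        if value is None:
            state = "unavailable"
        else:
            state = "enabled" if value else "disabled"
        print(f"{name}: {state}")
```

The `reader` parameter exists only so the parsing can be exercised without a FreeBSD box at hand. Note that these tunables only say whether the PCI layer allows MSI at all; they do not show whether a given driver actually allocated MSI vectors.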