Date: Sat, 28 Aug 1999 11:00:31 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199908281800.LAA05485@apollo.backplane.com>
To: Matthew Jacob
Cc: "Justin T. Gibbs", hackers@FreeBSD.ORG
Subject: Re: Should cam_imask be part of bio_imask ?

:I strongly doubt that this is a CAM isr problem- the error pattern isn't
:entirely clear from what you said, but it looks more like a FIFO or CACHE
:LINE sized type of problem- it looks to be < 16 bytes, but not a short
:count. Because this isn't one of the wacky systems I spent most of my
:career on at Sun, where the first and usual suspect was a system memory
:cache line because IO wasn't cache coherent on Suns between the Sun
:3/{50,60,75,150} and the advent of the SuperSparc Viking chipset, I'd
:guess a FIFO somewhere in the I/O movement path.
:
:Justin- any changes lately where flushing a FIFO in the Adaptec at the end
:of transfer might have been spoodged?
:
:-matt

The problem is definitely aligned in some way. Here's a diff of a hexdump
of one error. Sometimes I lose a whole page, sometimes two pages, sometimes
16 bytes, but the error is always page aligned.

1536c1536
< 0005ff0 3333 2033 3434 3434 7c20 207c 3030 3030
---
> 0005ff0 7365 3d20 3120 093b 2309 6720 6f6c 6162

A cache-line problem would fit the symptoms. I know it isn't the
hardware... this 1xCPU PPro/200 system has been with me for several years
and this test didn't fail like this a month ago.
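The page-alignment observation in the hexdump diff above can be checked mechanically. Below is a minimal sketch, in Python, that finds the runs of differing bytes between an original file and its copy and reports whether each run starts or ends on a page or cache-line boundary. It assumes a 4096-byte page (the i386 page size) and a 32-byte cache line for the Pentium Pro; both are plain parameters. The helper names (`hexdump_words_to_bytes`, `differing_runs`, `alignment_report`, `report_files`) are hypothetical and not part of any FreeBSD tool. It also relies on the default hexdump(1) format printing 16-bit words in host byte order, so on little-endian x86 the two bytes of each word appear swapped relative to file order.

```python
# Hypothetical helpers for locating corruption in a copied file and checking
# how each damaged region lines up with page and cache-line boundaries.

PAGE_SIZE = 4096   # i386 page size
CACHE_LINE = 32    # assumed Pentium Pro cache-line size; adjust for other CPUs


def hexdump_words_to_bytes(words: str) -> bytes:
    """Turn one row of default hexdump(1) words back into file bytes.

    hexdump prints 16-bit words in host byte order; on little-endian x86
    the low byte of each word comes first in the file.
    """
    return b"".join(int(w, 16).to_bytes(2, "little") for w in words.split())


def differing_runs(good: bytes, bad: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for each maximal run of differing bytes.

    Only the common prefix is compared; a length mismatch (a short copy)
    is a different failure mode and is not reported here.
    """
    runs = []
    start = None
    n = min(len(good), len(bad))
    for i in range(n):
        if good[i] != bad[i]:
            if start is None:
                start = i          # a new damaged run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None           # bytes agree again; close the run
    if start is not None:
        runs.append((start, n - start))
    return runs


def alignment_report(offset: int, length: int,
                     page: int = PAGE_SIZE, line: int = CACHE_LINE) -> dict:
    """Describe how a damaged run sits relative to page/line boundaries."""
    end = offset + length
    return {
        "offset": hex(offset),
        "length": length,
        "starts_on_page": offset % page == 0,
        "ends_on_page": end % page == 0,
        "starts_on_line": offset % line == 0,
        "ends_on_line": end % line == 0,
    }


def report_files(path_a: str, path_b: str) -> list[dict]:
    """Compare two files (e.g. an original and its cp -r copy)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return [alignment_report(off, n) for off, n in differing_runs(a, b)]
```

Fed the two rows from the diff above, placed at offset 0x5ff0, the sketch reports a single 16-byte run that starts mid-page (and, with a 32-byte line, mid-line) but ends exactly on the 0x6000 page boundary. One caveat of this design: a byte that happens to match inside a corrupted region splits it into two runs, so the reported runs are a lower bound on the damaged area.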
When I updated the machine last (unfortunately with about a month's worth
of changes), my buildworlds started failing with odd errors. I then switched
from the failing buildworlds (which take an hour) to doing cp -r's followed
by diff -r's (which take only 20 minutes), and as you can see I'm still
seeing the problem.

Maybe this is DMA related. Perhaps the cache is not getting cleared? Maybe
an MMU optimization someone threw in recently?

-Matt
Matthew Dillon