Date: Sat, 1 Mar 1997 08:10:13 -0500 (EST)
From: Thomas David Rivers
Message-Id: <199703011310.IAA10154@lakes.water.net>
To: ponds!cet.co.jp!michaelh, ponds!lakes.water.net!rivers
Cc: ponds!freebsd.org!freebsd-hackers
Subject: Re: Another installment of the "dup alloc"/"bad dir" panic problems.

>
> On Fri, 28 Feb 1997, Thomas David Rivers wrote:
>
> >
> > It now appears that having the printf()s in disksort() affects the problem
> > in a positive manner (that is, I'm not able to demonstrate the previous
> > "non-writing" behaviour I had seen; the inode in question is reliably
> > filled with zeros.)
>
> The old "printf's makes the problem go away effect".  I hate when that
> happens. ;-)
>
> Can you do a gcc -v and show us the flags you use to compile the kernel?
> I missed these, in case you've already posted them.

Good idea - I'll have to dig this up.

To explain more, I'm using a boot floppy image made with the "release"
target of the standard makefiles.  So, this is a BOOTMFS kernel.
What I do is change a kernel file (usually adding printf()s) and then
"make release".
Boot from the resulting floppy image, insert a fixit floppy, and then
start up newfs's to test things... [rather involved; but the file system
that gives me the reliable reproduction happens to be the root one on
my test machine...]

Here's an example of one of the kernel compiles that results from that
(chopped up to fit 80 columns):

  cc -c -O -W -Wreturn-type -Wcomment -Wredundant-decls -Wimplicit
     -nostdinc -I. -I../.. -I../../sys -I../../../include -DBOOTMFS
     -DI586_CPU -DI486_CPU -DI386_CPU -DMFS_ROOT=1450 -DUSERCONFIG_BOOT
     -DMAXCONS=4 -DNFS_NOSERVER -DMFS -DATAPI -DVISUAL_USERCONFIG
     -DUSERCONFIG -DUCONSOLE -DBOUNCE_BUFFERS -DSCSI_DELAY=15 -DCOMPAT_43
     -DCD9660 -DMSDOSFS -DNFS -DFFS -DINET -DMATH_EMULATE -DKERNEL -Di386
     -DLOAD_ADDRESS=0xF0100000 ../../i386/i386/trap.c

However, I was skeptical myself.  So, I removed *every* printf() from
my sources (retrieving the original sources) and remade my kernel.
This resulted in a kernel that demonstrated the problem and, of course,
was built with the gcc flags you see above.  This means I can rebuild a
kernel that fails, which, to some extent, indicates that my build
process is consistent.

My next idea is to add the printf()s back, one at a time, to see which
one (or which combination) masks the problem.

I have a difficult time believing that slowing down disksort()
(especially since this isn't an appreciable slow-down) does anything.
But it would add delay to its calling routine (in this case, the SCSI
driver), which may be significant....

But, again, this happens with both SCSI and IDE, so I'm guessing the
problem lies outside of the low-level drivers and is to be found at
some common level (which is what led me to the shared disksort()
routine.)

Because of this commonality, I'm betting it's something in
/sys/ufs/ufs - which is where I'm investigating... (well, blindly
stumbling.)

>
> Regards,
>
> Mike

	- Dave Rivers -