Date: Fri, 23 Mar 2001 18:20:32 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Dave Tweten <tweten@nas.nasa.gov>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: 4.3-RC Kernel Buffer Corruption (Was: 4.3-BETA makeworld of current STABLE Fails)
Message-ID: <200103240220.f2O2KWZ89481@earth.backplane.com>
References: <200103240208.f2O28G602193@gilmore.nas.nasa.gov>

:
:> * Is the corruption in the same file every time you try the buildworld?
:
:That depends.  If I reboot the machine to create a controlled set of
:conditions and immediately do a buildworld, then yes, it seems to strike the
:same file at the same offset every time.  If I cvsup some more updates first,
:or start the buildworld after the machine has been doing other random stuff,
:then no, it strikes various files, but always while at "stage 4: building
:libraries," according to the world.log file I capture.

    A hexdump of that reproducibly corrupted file would be invaluable.
    Are the contents of the corruption the same every time (in the
    reboot/buildworld case) or different?

:> * Is the corruption at the same offset/length?
:
:No/yes.  The corruption always starts at a multiple of 1024 bytes into the
:file, and is always 1024 bytes long.  The offset's actual multiple of 1024
:varies, but has not yet been 0.

    Is it always at or near the end of a file (within the last 8K) or
    sometimes in the middle?  Does it sometimes hit large files, or only
    small files?

:> * How are you monitoring the corruption?  ktrace?  cat?  vi?
:
:When the buildworld croaks, I track down the file from information in
:world.log and look at it with emacs.  Interestingly, the corruption always has
:a "look" to it.  The first few bytes of corruption in a recent example looked
:in emacs like
:
:    \244\201^A^@^@^@^@^@\377
:
:and so forth.  The corruption is always rich in "^@" which I think is Emacs
:for ASCII NUL.

    Yah.

:> * How definitive a kernel -stable date can you lock the corruption
:>   down at?  Judging from this and prior messages, somewhere between
:>   Feb16 and Mar1 ?
:
:Unfortunately, that's the best I can do.  My newly installed automatic weekly
:cvsup-buildworld-buildkernel failed a couple of consecutive weekends before I
:became convinced that it wasn't just a transient in STABLE or an error on my
:part causing real corruption in /usr/src.

    This is extremely helpful.  It limits the scope to commits made on
    the 16th and during the week following the 16th.

:> * Do you have softupdates enabled?  If so, try turning them off.
:
:I didn't in the previous config file.  That file had remained utterly

    That is also extremely helpful.  It means softupdates is almost
    certainly not responsible for the problem, which greatly reduces the
    amount of code I have to go through :-)

:Incidentally, thanks for stepping up.
:--
:M/S 258-5 | 1024-bit PGP fingerprint: | tweten@nas.nasa.gov
:NASA Ames Research Center | 41 B0 89 0A 8F 94 6C 59 | (650) 604-4416

    Repeatable corruption is the holy grail of kernel debugging.  A couple
    of us have been tearing our hair out trying to track down a filesystem
    corruption case that has been occurring very infrequently for months.
    We've fixed a number of things, but weird things still occur
    occasionally.  I'm hoping that the corruption you are able to
    reproduce is due to a kernel bug and not due to something else.
    Strange hope, eh?

                                        -Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
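
The questions in this message (same contents each time? at what multiple of 1024? within the last 8K of the file?) could be answered mechanically by comparing a corrupted file against a known-good copy. The following is only a rough sketch under stated assumptions, not something from the thread: it assumes a clean copy of the file is available (for example from a fresh checkout), and the names `corrupted_blocks` and `report` are made up for illustration.

```python
import os

BLOCK = 1024      # corruption length reported in the thread
TAIL = 8 * 1024   # "within the last 8K" from the question in the message


def corrupted_blocks(bad_path, good_path, block=BLOCK, tail=TAIL):
    """Compare two copies of a file block by block.

    Returns one dict per differing block, recording its byte offset,
    whether the block starts within the last `tail` bytes of the
    corrupted file, how many NUL bytes it holds, and a hex dump of
    its first 16 bytes.
    """
    size = os.path.getsize(bad_path)
    results = []
    with open(bad_path, "rb") as bad, open(good_path, "rb") as good:
        offset = 0
        while True:
            b = bad.read(block)
            g = good.read(block)
            if not b and not g:
                break  # both files exhausted
            if b != g:
                results.append({
                    "offset": offset,
                    "block_index": offset // block,
                    "near_end": size - offset <= tail,
                    "nul_bytes": b.count(0),
                    "head": b[:16].hex(),
                })
            offset += block
    return results


def report(bad_path, good_path):
    """Print one summary line per differing block."""
    for r in corrupted_blocks(bad_path, good_path):
        print("offset=%d (block %d) near_end=%s nul_bytes=%d head=%s" % (
            r["offset"], r["block_index"], r["near_end"],
            r["nul_bytes"], r["head"]))
```

Calling `report()` with the corrupted file and its clean copy after each failed buildworld, and saving the output, would show whether the offset and the leading bytes repeat from run to run. The paths are whatever the world.log points at; none are assumed here.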