Date:      Fri, 23 Mar 2001 18:08:16 -0800
From:      Dave Tweten <tweten@nas.nasa.gov>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        freebsd-stable@FreeBSD.ORG
Subject:   Re: 4.3-RC Kernel Buffer Corruption (Was: 4.3-BETA makeworld of  current STABLE Fails)
Message-ID:  <200103240208.f2O28G602193@gilmore.nas.nasa.gov>
In-Reply-To: Message from Matt Dillon <dillon@earth.backplane.com>  of "Fri, 23 Mar 2001 17:28:02 PST." <200103240128.f2O1S2V87488@earth.backplane.com> 


dillon@earth.backplane.com said:
>    A bunch of questions:

>    * Is the corruption in the same file every time you try the buildworld? 

That depends.  If I reboot the machine to create a controlled set of 
conditions and immediately do a buildworld, then yes, it seems to strike the 
same file at the same offset every time.  If I cvsup some more updates first, 
or start the buildworld after the machine has been doing other random stuff, 
then no, it strikes various files, but always while at "stage 4: building 
libraries," according to the world.log file I capture.

>    * Is the corruption at the same offset/length? 

No/yes.  The corruption always starts at a multiple of 1024 bytes into the 
file, and is always 1024 bytes long.  The offset's actual multiple of 1024 
varies, but has not yet been 0.
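
A 1024-byte-aligned, 1024-byte-long pattern like the one described above can be checked mechanically when a known-good copy of the damaged file is available (for example, a fresh checkout). The following is only a minimal sketch under that assumption; the function names and the idea of a reference copy are illustrative and do not come from the original message.

```python
BLOCK_SIZE = 1024  # corruption granularity reported in the message


def differing_blocks(good: bytes, suspect: bytes, block_size: int = BLOCK_SIZE):
    """Return the indices of block_size-aligned blocks where the buffers differ."""
    longest = max(len(good), len(suspect))
    bad = []
    for offset in range(0, longest, block_size):
        # Slices past the end of a shorter buffer are simply shorter (or empty),
        # so a length mismatch also shows up as a differing block.
        if good[offset:offset + block_size] != suspect[offset:offset + block_size]:
            bad.append(offset // block_size)
    return bad


def differing_blocks_in_files(good_path, suspect_path, block_size=BLOCK_SIZE):
    """Hypothetical convenience wrapper: compare two files on disk."""
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        return differing_blocks(good.read(), suspect.read(), block_size)
```

If the result is a single index greater than 0, that matches the report: exactly one damaged 1024-byte block, never the first block of the file.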

>    * How are you monitoring the corruption?  ktrace?  cat?  vi? 

When the buildworld croaks, I track down the file from information in 
world.log and look at it with emacs.  Interestingly, the corruption always has 
a "look" to it.  The first few bytes of corruption in a recent example looked 
in emacs like

	\244\201^A^@^@^@^@^@\377

and so forth.  The corruption is always rich in "^@" which I think is Emacs 
for ASCII NUL.
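
To turn a pasted Emacs display string like the one above back into raw byte values, the following sketch may help. It assumes the usual Emacs conventions: "\NNN" is a three-digit octal escape for one byte, and "^X" is caret notation for a control character (so "^@" is byte 0, i.e. NUL). The helper name is made up for illustration, and a literal "^" or "\" that was really in the file cannot be told apart from the notation in a plain-text paste.

```python
import re

# Three octal digits that fit in one byte (\000 through \377).
_OCTAL = re.compile(r"[0-3][0-7]{2}")


def decode_emacs_display(text: str) -> bytes:
    """Decode octal escapes and caret notation from an ASCII display string."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and _OCTAL.fullmatch(text[i + 1:i + 4]):
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        elif ch == "^" and i + 1 < len(text):
            # Caret notation: ^@ -> 0x00, ^A -> 0x01, ..., ^? -> 0x7F.
            out.append(ord(text[i + 1]) ^ 0x40)
            i += 2
        else:
            out.append(ord(ch))  # assumes plain ASCII input
            i += 1
    return bytes(out)


sample = decode_emacs_display(r"\244\201^A^@^@^@^@^@\377")
print(sample.hex(" "))   # a4 81 01 00 00 00 00 00 ff
print(sample.count(0))   # 5
```

For the nine characters quoted above, five of the nine bytes are NUL.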

>    * How definitive a kernel -stable date can you lock the corruption
>      down at?  Judging from this and prior messages, somewhere between
>      Feb16 and Mar1 ? 

Unfortunately, that's the best I can do.  My newly installed automatic weekly 
cvsup-buildworld-buildkernel failed a couple of consecutive weekends before I 
became convinced that it wasn't just a transient in STABLE or an error on my 
part causing real corruption in /usr/src.

>    * Do you have softupdates enabled?  If so, try turning them off.

I didn't in the previous config file, which had remained utterly unchanged 
for months.  I did have it enabled in the work-in-progress config file I 
used for the just-previous test.  I'm currently building again with 
softupdates, and anything else that strikes me as the least bit 
adventuresome, commented out.

>    * Are you using any special sysctl's ? 

No.

Incidentally, thanks for stepping up.
-- 
M/S 258-5                     | 1024-bit PGP fingerprint: | tweten@nas.nasa.gov
NASA Ames Research Center     |  41 B0 89 0A  8F 94 6C 59 |      (650) 604-4416
Moffett Field, CA  94035-1000 |  7C 80 10 20  25 C7 2F E6 | FAX: (650) 604-4377
We each earn what freedom of speech we defend for those who most offend us.






