Date: Tue, 12 Nov 2002 10:18:03 -0800 (PST)
From: David Wolfskill
Message-Id: <200211121818.gACII3oB052659@bunrab.catwhisker.org>
To: current@freebsd.org
Subject: Weird error during "make installworld" [executable becomes "data"]
Sender: owner-freebsd-current@FreeBSD.ORG

OK; this is a bit strange, and I've come up with a circumvention (read
"really ugly bloody hack"), but my real concern is that this may be a
manifestation or symptom of something broken in some subtle way.

The note is rather long-winded; sorry about that, but I didn't see a
better way to do this.

Background: I track -CURRENT (and -STABLE) daily on both an SMP "build
machine" and on my laptop. Other than differences imposed by the
different hardware types, and the fact that the build machine is
normally run headless (with a serial console), the machines are set up
fairly similarly -- in particular, I mount /tmp on /dev/md10 on both
machines.
However, I have never seen this problem on the build machine, but I have
been seeing it regularly on the laptop for the past week or so.

Here's an excerpt of the typescript from the "make installworld":

---%<----- snip! ------------------------------------------
>>> Installing everything..
...
===> share/examples
...
if [ -L /usr/share/examples/bootforth ]; then rm -f /usr/share/examples/bootforth; fi
if [ -L /usr/share/examples/cvsup ]; then rm -f /usr/share/examples/cvsup; fi
...
if [ -L /usr/share/examples/startslip ]; then rm -f /usr/share/examples/startslip; fi
if [ -L /usr/share/examples/sunrpc ]; then rm -f /usr/share/examples/sunrpc; fi
if [ -L /usr/share/examples/worm ]; then rm -f /usr/share/examples/worm; fi
mtree -deU -f /usr/src/share/examples/../../etc/mtree/BSD.usr.dist -p /usr
: not found
: not found
: not found
: not found
: not found
: not found
: not found
: not found
: not found
: not found
: not found
: not found
! : not found
: not found
/tmp/install.O6fzOZh4/mtree: 62: Syntax error: "(" unexpected
*** Error code 2
---%<----- snip! ------------------------------------------

Now, watch this:

g1-9(5.0-C)[4] file /tmp/install.O6fzOZh4/* |grep data
/tmp/install.O6fzOZh4/mtree: data
g1-9(5.0-C)[5]

Huh??!?

The symptom seems to be associated with some small number of the
executables that are stuffed away in /tmp/install.* for execution during
the "make installworld" starting out OK (yes, I verified this), but at
some point during the installworld, "file" stops identifying them as
executables, and identifies them as mere "data".
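For anyone who wants to check the same thing without relying on file(1), here is a minimal sketch in POSIX sh that reads the first four bytes of each file and compares them against the ELF magic number (7f 45 4c 46). The function names magic_of, is_elf and report_non_elf are made up for illustration and are not part of installworld, and the sketch assumes that every regular file in the directory is supposed to be an ELF executable, which may not hold on every system.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of installworld.

# Print the first four bytes of a file as lowercase hex digits,
# e.g. "7f454c46" for an ELF binary.
magic_of() {
    od -An -tx1 -N4 "$1" | tr -d '[:space:]'
}

# Succeed only if the file starts with the ELF magic number 7f 45 4c 46.
is_elf() {
    [ "$(magic_of "$1")" = "7f454c46" ]
}

# List every regular file in the given directory that does not start
# with the ELF magic (assumption: all of them are meant to be ELF).
report_non_elf() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        is_elf "$f" || printf '%s: not ELF (starts with %s)\n' "$f" "$(magic_of "$f")"
    done
}
```

After sourcing these functions, something like "report_non_elf /tmp/install.O6fzOZh4" would name any file whose header has been overwritten and show the bytes found in place of the magic number.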
This does not appear to merely be a matter of file modes and flags:

g1-9(5.0-C)[5] ls -lo /tmp/install.O6fzOZh4/mtree
-r-xr-xr-x 1 root wheel - 27128 Nov 12 09:37 /tmp/install.O6fzOZh4/mtree

Rather, I suspect that the (swap-backed) image is getting corrupted in
some way:

g1-9(5.0-C)[6] hd -n 32 !$
hd -n 32 /tmp/install.O6fzOZh4/mtree
00000000  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  |................|
00000010  00 00 00 00 02 00 00 00  00 00 00 00 00 00 03 00  |................|
00000020
g1-9(5.0-C)[7] hd -n 32 /tmp/install.O6fzOZh4/mv
00000000  7f 45 4c 46 01 01 01 09  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  c0 80 04 08 34 00 00 00  |........À...4...|
00000020

The circumvention has had 100% success rate in 3 tries so far. It
consists of starting up the following in another window once the "make
installworld" is under way:

while (1)
  file /tmp/install.*/* | grep data && date && break; sleep 5
end

Just *why* that appears to be effective is not something I can even
guess right now, but that it does is what suggests to me that there is
something subtly broken somewhere that could bite us badly.

Sometimes it's mtree that gets clobbered; less often, it's zic. These
are the only two victims I recall at present.

I may experiment with tweaking the part of installworld that creates the
/tmp/install* directory to make the directory & its contents immutable
as a different possible circumvention. (In my case, since /tmp is only
swap-backed, I have a fair degree of confidence that anything put there
really is ephemeral, regardless of flags.)

In the case in question, I just blew away the old /tmp/install.O6fzOZh4
directory, fired up the "make installworld" again, then started the
above-cited loop. And the "make installworld" has now finished,
apparently successfully.

And yes, I know that the "make installworld" should be done in
single-user mode. I tried that; it does not appear to help.
I *think* I also saw the symptom one time when I tried doing the
installworld in single-user mode without creating a separate swap-backed
/tmp -- though I confess I am not certain of that as of this writing.

I don't recall seeing similar symptoms being mentioned by anyone else.

Is it plausible that there's something weird about this laptop (Dell
Inspiron 5000e) that might contribute to these symptoms?

Thanks,
david
--
David H. Wolfskill                              david@catwhisker.org
I have no confidence in results obtained through the use of Microsoft products.