Date: Thu, 16 Nov 2000 09:23:08 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: will@physics.purdue.edu
Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.org
Subject: Re: Turning on debugging in GENERIC
Message-ID: <200011160923.CAA01986@usr02.primenet.com>
In-Reply-To: <20001115180257.B26516@puck.firepipe.net> from "Will Andrews" at Nov 15, 2000 06:02:57 PM

The real point of this thread should be that a 386 taking forever to start up because it's not fast at generating pseudo-randomness is not an acceptable state of affairs.

There are plenty of laptop and other so-called "green" processors today which will downgrade CPU power to the level of an old 386, merely to extend battery life or to be considered "eco-friendly", so throwing out the old systems will not solve the problem.

The answer to this slow-start "dilemma" is _not_ to throw out the slow processors which can't run the hulking, slow-running new "improved" code, just for the sake of not making that code run efficiently.

It's obvious to me that this has the same fix as if someone had put a big "for" loop in the idle proc: tolerate it for a while, if it does something useful, and if it doesn't get fixed after a while, take it out and shoot it, like PHK did to Julian's slice code, even if it _does_ something useful.

---

I will answer your points after the ^L below, even though they are now wildly off-subject, as build engineering _is_ at least on topic for the -arch list; those not interested can stop the remainder of the message in their mail client now:

If you reply to this, please change the Subject: line.

> > Many people "try" -current on small scratch disks that they
> > install from snapshots, rather than polluting their local
> > trees with -current bits, particularly since the answer to
> > their bug reports is pretty much to ignore them and tell the
> > reporter to "resup" or ask "have you tried the snapshot?".
>
> Um, Terry, are you even on bugs@ ?  The fact of life is, many folks who
> "try" -current that report bugs do not give enough details, so they in
> return get vague suggestions like these.

No, I'm not.
But I run -current on scratch disks from snapshots because I can't afford the bandwidth to cvsup all the time, and because when you use cvsup, the lack of an interlock means that the result is often unbuildable, particularly when it comes to -current. If I can't afford one, then I can't afford the second one that would be necessary, were I to have a failure.

Unfortunately, the cvsup date is not useful information for use in a bug report, either. I would have to change my strategy to doing a cvsup, and then backing off by date in GMT, until I got something that compiled (not a winning strategy on a 386, in any case). That would give me a baseline from which I (or others) could then report bugs that would be repeatable, even if not bleeding-edge -current.

It takes a prohibitive amount of disk cycles to do something like this, and hosted cross-builds are still not that easy, unless you want to dirty your main source tree and the /sys link, or unless your scratch disks are really massive suckers.

Using snapshots avoids the CPU cycles problem and the cvsup data synchronization problem: a snapshot is not made available until it can at least successfully compile.

So you could amend my statement, I suppose, to ``clued people "try" snapshots to reduce the number of useless answers to their bug reports''.

So in summary, I don't need to be on "bugs@" today, since I'm already well aware of the dynamics involved, and nothing has changed from the past about them that would change the dynamics; and it's the dynamics which lead to the lack of details and the vague responses.

> Besides, people are told to resup when the newer -current has
> fixed the problem, and using a snapshot is an easy way to
> determine points of infection.

Agreed; I think all reports should be made against snapshots, to have a clear demarcation between development and testing, if nothing else.
But even though I advocate using them, in my post, which you quoted above, they are still susceptible to the problems I noted. Snapshots have a relatively short archival life expectancy, and so they aren't really useful for developers for repeating a user-reported problem. Even with an exact date, a developer is probably not going to be able to rebuild a system against which they can run gdb with a user-supplied crash dump.

So where are we left?

With a bunch of developers who would like help testing their code against a lot of different hardware, and a much larger group of users who would like to help them out, but have huge impediments in the communications channel between them and the developers.

How can we resolve this impasse?

There are a couple of simple procedural fixes, actually:

1)	Snapshots could be made rebuildable from sources, such that a kernel debugging session would work. This could be done by build-tagging the repository sources (the tags could be removed as the snapshots were removed), or by using explicit dates to check out the snapshot trees for the build.

2)	Snapshots could be trusted to hang around for long enough for a developer and a bug reporter to be able to rendezvous on one, and fix a problem.

3)	Kernels for snapshots could be built with full debugging symbols (-g), and only stripped for the snapshot, with the unstripped version kept around with the snapshot for use by a developer wanting to debug a crash (this would mostly eliminate the need for #1 for debugging -- but not for bug fixing, since you would want to rebuild with more error diagnostics and retry the failure, etc., until the fault was isolated).

Even Whistle, which was about as ad-hoc as it gets about using the local source repository to communicate between developers in adjacent cubicles (a practice which can result in frequently unbuildable source trees), knew enough, institutionally, about build engineering to at least make all successful builds (not just releases) rebuildable.
A separate build engineer role, and a willingness to tag builds in the repository after reverting changes which prevented builds, were helpful in making this a nightly (or more frequent) occurrence, but FreeBSD doesn't have so strong an "it works" requirement, nor such formal testing vs. requirements and regression of closed but not verified-resolved bugs, that it could not afford some delay between snapshot instances.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
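
The "cvsup, then back off by date in GMT until something compiles" strategy described in the message can be illustrated with a small sketch. This is only a rough model under stated assumptions, not anything from the FreeBSD tools: the function name find_buildable_date, the six-hour step, and the builds_ok callback (which stands in for "check out the tree as of this GMT date and try to build it") are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def find_buildable_date(
    start: datetime,
    builds_ok: Callable[[datetime], bool],
    step: timedelta = timedelta(hours=6),
    max_steps: int = 40,
) -> Optional[datetime]:
    """Walk backwards from `start` until a checkout date builds.

    `builds_ok` is a hypothetical callback: it is expected to check out
    the source tree as of the given GMT date and report whether the
    result compiled. No real checkout or build happens in this sketch.
    """
    candidate = start
    for _ in range(max_steps):
        if builds_ok(candidate):
            return candidate  # a baseline date that a bug report can cite
        candidate -= step  # back off and try an older tree
    return None  # nothing buildable within the search window


if __name__ == "__main__":
    # Stand-in for a real build: pretend a commit at 2000-11-15 12:00 GMT
    # broke the tree, so only dates before that one "compile".
    broken_since = datetime(2000, 11, 15, 12, 0, tzinfo=timezone.utc)
    start = datetime(2000, 11, 16, 9, 0, tzinfo=timezone.utc)
    print(find_buildable_date(start, lambda d: d < broken_since))
```

With the made-up breakage above, the search lands on 2000-11-15 09:00 GMT after four failed attempts. Each failed attempt is a full checkout and build, which is the cost the message objects to on slow hardware, and which a snapshot that is known to compile avoids.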