From: Garance A Drosihn
To: freebsd-arch@FreeBSD.ORG
Cc: Maxim Sobolev
Cc: "David G. Lawrence"
Date: Thu, 24 Feb 2005 01:24:49 -0500
Subject: Bug in #! processing - One More Time

Sometimes it's the simplest little changes which can suck the life out of you...

I am aware that this is a trivial issue, but now that I've figured out what is really going on, I am not sure what the "best" fix would be.

To recap some history:

a) In Jan 2000, someone sent in a PR pointing out that the perl documentation (including the famous "Camel" book from O'Reilly) claims that users can start a script with the line:

       #!/bin/sh -- # -*- perl -*- -p

   to avoid a variety of issues when writing cross-platform scripts. Ignore the question of "but why?"
   for the moment; it *is* documented by perl (and in books on some other scripting languages). He proposed a fix, and that was committed to src/sys/kern/imgact_shell.c as revision 1.21 on Feb 15, 2000 (predating 4.0-release). It was MFC'ed into release 3.5 on March 20, 2000. The PR is:

       http://www.FreeBSD.org/cgi/query-pr.cgi?pr=16393

   NOTE: People *do* use this "feature".
   Counter: This feature doesn't actually work on recent releases of Red Hat Linux. I don't know about other Linuxes.

b) In 2002, some other user updated that PR saying that the new behavior wasn't quite right either. I assume nothing much was done at the time, but he spent time collecting a lot of details (which are given below).

c) In 2004, after 5.3-release, the issue came up again. I assume that is in another PR, but I haven't checked. In any case, kern/imgact_shell.c was changed to remove that special processing for '#', after discussion in -current. The change was committed to HEAD (6.x) on October 31st as revision 1.27. It was MFC'ed to 5.3-stable on November 8th. This broke scripts which depended on the special handling of '#', but the conclusion in -current was that /bin/sh should handle such processing (if it wanted to), and not execve().

d) In January I was finally bitten by this running 6.x-current, and a friend of mine happened to get hit by it at the same time running 5.3-stable. So I wrote up a quick fix and did some minimal testing. I posted that to -current on Jan 31st, but I didn't want to commit it until I did more testing, which I wanted to do *after* I brought my systems up-to-date.

e) On January 29th, sobomax committed an "unrelated" fix to kern/imgact_shell.c, except that it just happened to bring back the special '#' processing which had been removed in October...

f) I updated my systems, did extensive testing of my patch, and committed it once I was confident it worked in all situations.
   However, I didn't notice that the shell was no longer even *seeing* the parameters after '#' (I had tested that part back in step (d)), so it turns out that the key loop I had added was never actually getting triggered. I committed it to 6.x-current last week.

g) On Monday I get ready to MFC the change to 5.3 (ahead of the rush to beat the code-freeze!). But... the damn thing does NOT work right in some common situations!! WTF?!?

So, I figure out all the above history, and I locally modify kern/imgact_shell.c to again remove the special '#'-processing. I go to fix my patch to /bin/sh, and I realize... there is no simple, "make everyone happy" fix for it. Sigh.

The problem is in the way the execve() system call passes all arguments to the shell. Given a script named /tmp/list_args.pl, which starts out as:

    #!/bin/sh -x -- # -*- perl -*- -p

and is executed via:

    /tmp/list_args.pl aaa bbb

/bin/sh sees these arguments:

    arg[0] == '-x'
    arg[1] == '--'
    arg[2] == '#'
    arg[3] == '-*-'
    arg[4] == 'perl'
    arg[5] == '-*-'
    arg[6] == '-p'
    arg[7] == '/tmp/list_args.pl'
    arg[8] == 'aaa'
    arg[9] == 'bbb'

The problem is that /bin/sh has no way of knowing where the "shebang-line options" end and the "command-line options" start. (Or does it? I couldn't think of any reliable way, given that the '#' could be followed by any totally arbitrary strings.)

Going back to the follow-up to PR 16393, part of the challenge with fixing this is that many other OSes do *not* break up the options on the shebang line the way FreeBSD does.
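To make the ambiguity concrete, here is a small user-space model of the three ways this thread describes the "#!" line being turned into an argument vector: split on every blank (what /bin/sh receives once the '#'-processing is removed), split but stop at the first word beginning with '#' (roughly the FreeBSD 4.0 result in the PR), and pass everything after the interpreter as one string (the Linux-style result). It is only a sketch, not the kernel code: build_argv(), enum shebang_mode and MAX_ARGS are hypothetical names, and real kernels also deal with argv[0] naming and length limits, which are ignored here. It numbers from argv[0] (the interpreter); the listing above appears to start after the interpreter's own argv[0], so its arg[0] == '-x' corresponds to argv[1] in this model.

```c
#include <string.h>

#define MAX_ARGS 64 /* hypothetical limit, chosen only for this sketch */

/* Three ways to turn the text after "#!" into interpreter arguments. */
enum shebang_mode {
    SPLIT_ALL,     /* every blank-separated word is its own argument */
    SPLIT_TO_HASH, /* like SPLIT_ALL, but stop at a word starting with '#' */
    COALESCE       /* everything after the interpreter is one argument */
};

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/*
 * Build the argv an interpreter would receive when `script` (whose first
 * line is "#!" followed by `line`) is run with the caller's arguments
 * user[0..nuser-1]. `line` is modified in place and out[] points into it.
 * Returns argc (out[argc] is NULL), or -1 on an empty line or overflow.
 */
static int build_argv(char *line, enum shebang_mode mode, const char *script,
                      int nuser, char *const user[], const char *out[MAX_ARGS])
{
    int argc = 0;
    char *p = line;

    while (is_blank(*p))
        p++;
    if (*p == '\0')
        return -1;
    out[argc++] = p;                  /* argv[0]: interpreter path */
    while (*p != '\0' && !is_blank(*p))
        p++;
    if (*p != '\0')
        *p++ = '\0';                  /* terminate the interpreter path */

    if (mode == COALESCE) {
        while (is_blank(*p))
            p++;
        size_t len = strlen(p);
        while (len > 0 && is_blank(p[len - 1]))
            p[--len] = '\0';          /* trim trailing blanks */
        if (len > 0)
            out[argc++] = p;          /* one argument, inner blanks kept */
    } else {
        for (char *w = strtok(p, " \t"); w != NULL; w = strtok(NULL, " \t")) {
            if (mode == SPLIT_TO_HASH && w[0] == '#')
                break;                /* treat the rest of the line as a comment */
            if (argc >= MAX_ARGS)
                return -1;
            out[argc++] = w;
        }
    }

    if (argc + nuser + 2 > MAX_ARGS)  /* room for script, user args, NULL */
        return -1;
    out[argc++] = script;             /* the script being executed */
    for (int i = 0; i < nuser; i++)
        out[argc++] = user[i];        /* arguments from the command line */
    out[argc] = NULL;
    return argc;
}
```

With SPLIT_ALL, the /tmp/list_args.pl example reproduces the list above: argv[1] through argv[7] are the seven words after /bin/sh, argv[8] is the script path, and argv[9] and argv[10] are aaa and bbb. The number of words that came from the "#!" line is known only inside build_argv() and is not recorded anywhere in the resulting vector, which is exactly the information /bin/sh would need to separate shebang options from command-line options.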
From the PR:

    Given a file called '/tmp/x2' with the shebang line:

        #!/tmp/interp -a -b -c #dee eee

    If /tmp/x2 is exec'd, the operating system runs /tmp/interp with the following arguments:

        Solaris 8:      args: "/tmp/interp" "-a" "/tmp/x2"
        Tru64 4.0:      args: "interp" "-a -b -c #dee eee" "/tmp/x2"
        FreeBSD 2.2.7:  args: "/tmp/interp" "-a" "-b" "-c" "#dee" "eee" "/tmp/x2"
        FreeBSD 4.0:    args: "/tmp/interp" "-a" "-b" "-c" "/tmp/x2"
        Linux 2.4.12:   args: "/tmp/interp" "-a -b -c #dee eee" "/tmp/x2"
        Linux 2.2.19:   args: "interp" "-a -b -c #dee eee" "/tmp/x2"
        Irix 6.5:       args: "/tmp/interp" "-a -b -c #dee eee" "/tmp/x2"
        HPUX 11.00:     args: "/tmp/x2" "-a -b -c #dee eee" "/tmp/x2"
        AIX 4.3:        args: "interp" "-a -b -c #dee eee" "/tmp/x2"
        Mac OS X:       args: "interp" "-a -b -c #dee eee" "/tmp/x2"

    The most common behavior is:

        argv[0]: full path of interpreter
        argv[1]: all remaining args, coalesced into one string
        argv[2]: the file exec'd

The change committed back in 2000 included the comment: "This complies to POSIX 1003.2, in that Posix says the implementation is free to choose whatever it likes."

I actually like the idea that FreeBSD splits up the arguments from the shebang line, but that leaves us with the problem of distinguishing shebang options from user-specified options given on the command line.

As I see it, we have the following choices to fix this:

1) MFC the January 31st change to kern/imgact_shell.c to 5.3-stable, as it is. This means we haven't fixed the problem that people complained about in 2002 and again in 2004. And I still think it is "not appropriate" for the execve() system call to be deciding what '#' means on that line. The biggest advantage is that 5.4-release will behave exactly the same as 3.5 through 5.3-release have behaved.

2) Remove '#'-processing from kern/imgact_shell.c, and remove my change to bin/sh/options.c (which doesn't work right once we do that).
   This breaks shell scripts which use the feature as documented by perl (and other scripting languages), and fixes the problem people complained about in 2002/2004.

3) Change kern/imgact_shell.c to process shebang options the same way other (non-BSD?) operating systems do. By that I mean: send the entire string as arg[1], and let the scripting language sort it out. This is an incompatible change from FreeBSD 5.3 to 5.4, but would make us "more consistent" with other operating systems.

4) Provide some way for /bin/sh to find out where the shebang options end and the user-specified options begin. This could make everyone happy, but it's more work, and right now (this close to 5.4-release) that wouldn't make me particularly happy...

Or we could do #1 for now, and plan to do #4 after 5.4-release. Or do #1 now in 5.3-stable, and go with some incompatible change (#2 or #3) only in 6.x-current.

What do people think?

I know this is a mind-numbingly trivial issue to care about, but I figured that if I just went ahead with any particular solution, someone would be irritated with me and assume I must not have understood "the issues". They would then commit yet *another* change which undoes whatever I did, while they fix something they feel that I broke.

And if nothing else, this is proof that one can't just blindly MFC some change, no matter how trivial it seems.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu