Date:      Thu, 14 May 1998 11:00:01 -0700 (PDT)
From:      woods@zeus.leitch.com (Greg A. Woods)
To:        freebsd-bugs@FreeBSD.ORG
Subject:   Re: bin/6557: /bin/sh && IFS
Message-ID:  <199805141800.LAA08266@freefall.freebsd.org>

The following reply was made to PR bin/6557; it has been noted by GNATS.

From: woods@zeus.leitch.com (Greg A. Woods)
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Re: bin/6557: /bin/sh && IFS
Date: Thu, 14 May 1998 14:00:38 -0400 (EDT)

 [ On Wed, May 13, 1998 at 02:00:02 (-0700), Martin Cracauer wrote: ]
 > Subject: Re: bin/6557: /bin/sh && IFS
 >
 >  Hm, Solaris' ksh and sh don't agree completely (Solaris 2.6/SPARC):
 
 Actually, with your example the original Bourne Shell is the odd man
 out.  Ksh-88i, Ksh93, ash (both NetBSD & FreeBSD), and pdksh-5.2.13
 all behave similarly with your example (which is, I think, the one that
 gets right down to the meat of the problem).
 
 I've finally found the rationale in POSIX 1003.2 Draft 11.2 that talks
 about this, and it does seem to make a certain amount of sense, though
 it introduces strange magic that can lead to very unexpected results:
 
 
                Copyright c 1991 IEEE.  All rights reserved.
       This is an unapproved IEEE Standards Draft, subject to change.
 
  BEGIN_RATIONALE
 
  3.6.5.1  Field Splitting Rationale. (This subclause is not a part of
           P1003.2)
 
  The operation of field splitting using IFS as described in earlier drafts
  was based on the way the KornShell splits words, but is incompatible with
  other common versions of the shell.  However, each has merit, and so a
  decision was made to allow both.  If the IFS variable is unset, or is
  <space><tab><newline>, the operation is equivalent to the way the
  System V shell splits words.  Using characters outside the
  <space><tab><newline> set yields the KornShell behavior, where each of
  the non-<space><tab><newline> characters is significant.  This behavior,
  which affords the most flexibility, was taken from the way the original
  awk handled field splitting.
 
  The (3) rule can be summarized as a pseudo ERE:
 
        (s*ns*|s+)
 
  where s is an IFS white-space character and n is a character in the IFS
  that is not white space.  Any string matching that ERE delimits a field,
  except that the s+ form does not delimit fields at the beginning or the
  end of a line.  For example, if IFS is <space><comma>, the string
 
        <space><space>red<space><space>,<space>white<space>blue
 
  yields the three colors as the delimited fields.
 
  END_RATIONALE
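
The red/white/blue example from the rationale can be tried directly. What follows is only a minimal sketch, assuming a shell that follows the rules quoted above; split_fields is a hypothetical helper of my own naming, not anything from the draft. It sets IFS from its first argument, splits its second argument, and prints each resulting field in brackets.

```shell
#!/bin/sh
# split_fields is a hypothetical helper (not from the draft).
# $1 is the IFS value to use, $2 is the string to split.
# The body runs in a subshell so the IFS and set -f changes do not leak out.
split_fields() (
    IFS=$1
    set -f          # turn off pathname expansion so only field splitting is shown
    set -- $2       # unquoted expansion: this is where field splitting happens
    for field
    do
        printf '[%s]\n' "$field"
    done
)

# IFS is <space><comma>: s*ns* and s+ both delimit, leading white space is dropped.
split_fields ' ,' '  red  , white blue'
```

On a conforming shell this should print the three colors, one per line. Passing 'a,,b' with the same IFS is a quick way to see that every non-white-space IFS character is significant: the two adjacent commas leave an empty field between a and b.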
 
 >  Hm, so what are the arguments to `for` (or to any command)?
 >  
 >  As far as I can tell, they are
 >  - not parameter expansion
 >  - not command substitution
 >  - not arithmetic expansion
 > 
 >  The paragraph above says that only results of these expansions and
 >  substitutions are subject to field splitting. What kind of
 >  substitution or expansion are command arguments a result of?
 
 Command arguments are not a valid concept here at all.  A deep and dark
 alley full of many horrors awaits anyone trying to think of things in
 those terms.  Another section from P1003.2D11.2 may clear the fog (and
 also gives concrete reasons for siding with Korn on these mechanisms):
 
                Copyright c 1991 IEEE.  All rights reserved.
       This is an unapproved IEEE Standards Draft, subject to change.
 
  BEGIN_RATIONALE
 
  3.6.0.1  Word Expansions Rationale. (This subclause is not a part of
           P1003.2)
 
  IFS is used for performing field splitting on the results of parameter
  and command substitution; it is not used for splitting all fields.
  Previous versions of the shell used it for splitting all fields during
  field splitting, but this has severe problems because the shell can no
  longer parse its own script.  There are also important security
  implications caused by this behavior.  All useful applications of IFS use
  it for parsing input of the read utility and for splitting the results of
  parameter and command substitution.  New versions of the shell have fixed
  this bug, and POSIX.2 requires the corrected behavior.
 
  The rule concerning expansion to a single field requires that if foo=abc
  and bar=def, that
 
        "$foo""$bar"
 
  expands to the single field
 
        abcdef
 
  The rule concerning empty fields can be illustrated by:
 
        $ unset foo
        $ set $foo bar '' xyz "$foo" abc
        $ for i
        > do
        >       echo "-$i-"
        > done
        -bar-
        --
        -xyz-
        --
        -abc-
 
  Step (1) indicates that Tilde Expansion, Parameter Expansion, Command
  Substitution, and Arithmetic Expansion are all processed simultaneously
  as they are scanned.  For example, the following is valid arithmetic:
 
        x=1
        echo $(( $(echo 3)+$x ))
 
  An earlier draft stated that Tilde Expansion preceded the other steps,
  but this is not the case in known historical implementations; if it were,
  and a referenced home directory contained a $ character, expansions would
  result within the directory name.
 
  END_RATIONALE
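
The point that IFS applies only to the results of expansions, and not to words written literally in the script, can be checked with a short test. This is a sketch under the assumption of a shell with the corrected behavior described in the rationale; count_fields and the variable names are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# count_fields is a hypothetical helper: it prints how many fields it received.
count_fields() {
    echo "$#"
}

var='a:b'
old_ifs=$IFS
IFS=':'

literal=$(count_fields a:b)       # literal word in the script: not split by IFS
expanded=$(count_fields $var)     # unquoted parameter expansion: split on ':'
quoted=$(count_fields "$var")     # quoted expansion: always a single field

IFS=$old_ifs
echo "literal=$literal expanded=$expanded quoted=$quoted"
```

With the corrected behavior the literal word and the quoted expansion each count as one field, and only the unquoted expansion is split in two. A shell that still split all fields on IFS would report two for the literal word as well, which is the "cannot parse its own script" problem the rationale mentions.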
 
 If that didn't quite do it, then perhaps this will (the actual rules
 that appear before the above quoted rationale).  This next section
 also answers your last question about the empty field (i.e. pdksh is
 wrong):
 
                Copyright c 1991 IEEE.  All rights reserved.
       This is an unapproved IEEE Standards Draft, subject to change.
 
  3.6  Word Expansions
 
  This clause describes the various expansions that are performed on words.
  Not all expansions are performed on every word, as explained in the
  following subclauses.
 
  Tilde expansions, parameter expansions, command substitutions, arithmetic
  expansions, and quote removals that occur within a single word expand to
  a single field.  It is only field splitting or pathname expansion that
  can create multiple fields from a single word.  The single exception to
  this rule is the expansion of the special parameter @ within double-
  quotes, as is described in 3.5.2.
 
  The order of word expansion shall be as follows:
 
      (1)  Tilde Expansion (see 3.6.1), Parameter Expansion (see 3.6.2),
           Command Substitution (see 3.6.3), and Arithmetic Expansion (see
           3.6.4) shall be performed, beginning to end.  [See item (5) in
           3.3.]
 
      (2)  Field Splitting (see 3.6.5) shall be performed on fields
           generated by step (1) unless IFS is null.
 
 [[NOTE: there's a minor inconsistency in the above vs. the rationale
 quoted first in this message, specifically the earlier rationale stated
 "If the IFS variable is unset, or is <space><tab><newline>, the
 operation is equivalent to the way the System V shell splits words."
 which would imply more magic happens than the above actual rule allows.
 Hopefully nobody's implemented the extra magic given in the rationale.]]
 
      (3)  Pathname Expansion (see 3.6.6) shall be performed, unless set -f
           is in effect.
 
      (4)  Quote Removal (see 3.6.7) shall always be performed last.
 
  The expansions described in this clause shall occur in the same shell
  environment as that in which the command is executed.
 
  If the complete expansion appropriate for a word results in an empty
  field, that empty field shall be deleted from the list of fields that
  form the completely expanded command, unless the original word contained
  single-quote or double-quote characters.
 
  The $ character is used to introduce parameter expansion, command
  substitution, or arithmetic evaluation.  If an unquoted $ is followed by
  a character that is either not numeric, the name of one of the special
  parameters (see 3.5.2), a valid first character of a variable name, a
  left curly brace ({), or a left parenthesis, the result is unspecified.
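
Two of the rules quoted above are easy to probe from a script: step (2), which skips field splitting only when IFS is null, and the deletion of empty fields unless the original word was quoted. The following is a minimal sketch, assuming a shell that implements the rules as written; count_fields and the variable names are hypothetical.

```shell
#!/bin/sh
# count_fields is a hypothetical helper: it prints how many fields it was given.
count_fields() {
    echo "$#"
}

words='one two  three'
empty=''

# Step (2): splitting is skipped only when IFS is null, not when it is unset.
# Each command substitution runs in its own subshell, so IFS here is untouched.
unset_count=$(unset IFS; count_fields $words)
null_count=$(IFS=''; count_fields $words)

# Empty-field rule: an unquoted empty expansion is deleted, a quoted one is kept.
dropped_count=$(count_fields $empty)
kept_count=$(count_fields "$empty")

echo "unset=$unset_count null=$null_count dropped=$dropped_count kept=$kept_count"
```

An unset IFS should still split on space, tab and newline (three fields here), while a null IFS should leave the expansion as one field. The last two counts show the empty-field rule: zero fields for the bare $empty, one for "$empty".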
 
 -- 
 							Greg A. Woods
 
 +1 416 443-1734      VE3TCP      <gwoods@acm.org>      <robohack!woods>
 Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
