Date: Wed, 23 Oct 2019 19:20:58 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 241441] inconsistency between allowed empty regex for `awk -F` and split()
Message-ID: <bug-241441-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241441

            Bug ID: 241441
           Summary: inconsistency between allowed empty regex for `awk -F` and split()
           Product: Base System
           Version: 12.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: freebsd@tim.thechases.com

I get an error when I try to use an empty regex for the field separator:

  $ echo hello | awk -F '' '{print $2}'
  awk: field separator FS is empty

but awk has no issues splitting things on an empty regex:

  $ awk 'BEGIN{s="hello"; split(s, a, ""); print a[1]}'
  h

Over on gawk, I get the expected behavior:

  $ echo hello | awk -F '' '{print $1}'
  h

This is somewhat similar to #226112:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226112

I get that awk uses EREs and `man re_format` says that "A (modern [Extended]) RE is one or more non-empty branches, separated by '|'", but

1) that's not what split() does;
2) it's not what gawk's -F parameter does;
3) permitting an empty regex for splitting already seems supported in the awk code (as the split example shows) and shouldn't break any existing usage;
4) as a non-workaround, `man re_format` says that the atom "()" matches the null string, but

  $ echo hello | awk -F '()' '{print $1}'

doesn't split the row on the null regular expression (FWIW, gawk gives the same results when using "()" as the split pattern).

In an ideal world, the behavior would match the behavior of gawk & the split() function, splitting the record into each individual character.

-- 
You are receiving this mail because:
You are the assignee for the bug.
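Since the report shows that FreeBSD's awk already accepts an empty separator in split(), one possible interim workaround is to split each record explicitly instead of relying on `-F ''`. The following is only a minimal sketch under stated assumptions: `nth_char` is a hypothetical helper name, and it assumes an awk whose split() breaks a string into single characters when given "" as the separator. That behavior is demonstrated in the report but is not guaranteed by POSIX.

```shell
# Hypothetical helper: print the Nth character of each input line,
# emulating gawk's `awk -F '' '{print $N}'`.
# Assumption: split() with an empty separator yields one array element
# per character (shown working in FreeBSD awk above; not POSIX-mandated).
nth_char() {
    awk -v n="$1" '{
        # Split the whole record into individual characters.
        split($0, c, "")
        # Elements past the end of the line are empty, so short lines
        # print an empty line rather than failing.
        print c[n]
    }'
}
```

For example, `echo hello | nth_char 2` prints `e`, which is what `awk -F '' '{print $2}'` would print if the empty field separator were accepted.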