Date: Wed, 19 May 2021 14:59:46 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 243229] awk length() function in base system produces an incorrect results for UTF-8 strings
Message-ID: <bug-243229-227-797NunqJn4@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-243229-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

Frédéric Fauberteau <triaxx@NetBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triaxx@NetBSD.org

--- Comment #2 from Frédéric Fauberteau <triaxx@NetBSD.org> ---
I don't know if this issue is related to that bug report, but the following
command prints 'bin':

% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'

while this one prints nothing:

echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'

The range from ' ' to '~' includes alphabetical characters when the locale is
not utf-8 but does not when the locale is utf-8.
We can notice that '/^[\t -~]/' matches "bin" with C.UTF-8.

-- 
You are receiving this mail because:
You are the assignee for the bug.
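The comparison in the comment above can be repeated across several locales
with a small loop. This is only a sketch: check_range is a hypothetical
helper name, and it sets LC_ALL rather than LANG (an assumption made here so
that other LC_* variables in the environment cannot override the locale under
test). Results depend on the awk implementation and on which locales are
installed, so no output is claimed for the non-C locales.

```shell
# Hypothetical helper: report whether "bin" matches the bracket range
# /^[\t -~]/ when awk runs under the locale given as $1.
check_range() {
    loc=$1
    # awk exits 0 if the record matched, 1 otherwise.
    if echo "bin" | LC_ALL="$loc" awk '$1 ~ /^[\t -~]/ { found = 1 } END { exit !found }'; then
        echo "$loc: match"
    else
        echo "$loc: no match"
    fi
}

# Locales named in the report, plus plain C as a baseline.
for loc in C en_US en_US.UTF-8 C.UTF-8; do
    check_range "$loc"
done
```

In the C locale the range from ' ' to '~' covers the printable ASCII
characters, so the C line should report a match; the other lines show how the
installed awk treats the same range under each locale.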
