Date: Wed, 19 May 2021 14:59:46 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 243229] awk length() function in base system produces an incorrect results for UTF-8 strings
Message-ID: <bug-243229-227-797NunqJn4@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-243229-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-243229-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

Frédéric Fauberteau <triaxx@NetBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triaxx@NetBSD.org

--- Comment #2 from Frédéric Fauberteau <triaxx@NetBSD.org> ---
I don't know if this issue is related to that bug report, but the following
command prints 'bin':

% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'

while this one prints nothing:

% echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'

The range from ' ' to '~' includes alphabetical characters when the locale is
not utf-8 but does not when the locale is utf-8.

We can notice that '/^[\t -~]/' matches "bin" with C.UTF-8.

-- 
You are receiving this mail because:
You are the assignee for the bug.
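
The locale comparison in comment #2 could be rerun side by side with a small script along these lines. This is only a sketch: `try_locale` is a hypothetical helper name, and it sets LC_ALL rather than LANG (as the comment did) so that an LC_ALL or LC_COLLATE already present in the environment cannot mask the test. It also assumes the named locales are installed; if one is missing, awk will most likely fall back to the C locale and the result for that line says nothing about the bug.

```shell
#!/bin/sh
# try_locale is a hypothetical helper, not part of awk or FreeBSD.
# It feeds "bin" to the bracket expression from comment #2 under the
# locale given as $1 and prints whatever awk prints (possibly nothing).
try_locale() {
    # LC_ALL overrides LANG and every LC_* category for this one command.
    printf 'bin\n' | LC_ALL="$1" awk '$1 ~ /^[\t -~]/ { print $0 }'
}

# Locale names are taken from the comment, plus plain C as a baseline.
for loc in C en_US en_US.UTF-8 C.UTF-8; do
    printf '%s: [%s]\n' "$loc" "$(try_locale "$loc")"
done
```

Empty brackets on a line mean the range ' ' to '~' did not match "bin" under that locale. In the C locale the range is compared by byte value, so that line serves as the baseline for what the expression is expected to match.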