Date:      Wed, 19 May 2021 14:59:46 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 243229] awk length() function in base system produces an incorrect results for UTF-8 strings
Message-ID:  <bug-243229-227-797NunqJn4@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-243229-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-243229-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

Frédéric Fauberteau <triaxx@NetBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triaxx@NetBSD.org

--- Comment #2 from Frédéric Fauberteau <triaxx@NetBSD.org> ---
I don't know if this issue is related to that bug report, but the following
command prints 'bin':
% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'
while this one prints nothing:
% echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'

The range from ' ' to '~' includes alphabetical characters when the locale is
not UTF-8 but does not when the locale is UTF-8.

Note that '/^[\t -~]/' does match "bin" with C.UTF-8.
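
One way to check whether the difference comes from how the locale orders
the characters in the range is to rerun the same pattern with byte-wise
collation. The following is only a sketch: using LC_ALL=C is an assumption
added here for illustration, not something proposed in this report. In the
C locale the range ' ' to '~' covers bytes 0x20 through 0x7E, which
includes 'b', so the pattern should match:

```shell
# Diagnostic sketch (assumption: LC_ALL=C is used only to force byte-wise
# ordering of the bracket range; it is not taken from the bug report).
# LC_ALL overrides LANG and every LC_* category for this one awk call.
out=$(printf 'bin\n' | LC_ALL=C awk '$1 ~ /^[\t -~]/ {print $0}')

# Show what awk matched; an empty line would mean the range did not match.
printf '%s\n' "$out"
```

If this prints "bin" while the en_US.UTF-8 invocation prints nothing, that
would point at the locale's ordering of the range rather than at the
pattern itself.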

-- 
You are receiving this mail because:
You are the assignee for the bug.


