Date:      Wed, 19 May 2021 14:59:46 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 243229] awk length() function in base system produces an incorrect results for UTF-8 strings
Message-ID:  <bug-243229-227-797NunqJn4@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-243229-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

Frédéric Fauberteau <triaxx@NetBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |triaxx@NetBSD.org

--- Comment #2 from Frédéric Fauberteau <triaxx@NetBSD.org> ---
I don't know whether this issue is related to this bug report, but the
following command prints 'bin':
% echo "bin" | LANG=en_US awk '$1 ~ /^[\t -~]/ {print $0}'
while this one prints nothing:
% echo "bin" | LANG=en_US.UTF-8 awk '$1 ~ /^[\t -~]/ {print $0}'

The range from ' ' to '~' includes alphabetic characters when the locale is
en_US (not UTF-8) but does not when the locale is en_US.UTF-8.

Note that '/^[\t -~]/' does match "bin" with C.UTF-8.
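
For comparison, here is a minimal sketch of what the range covers when it is
evaluated by byte value. It assumes, without confirmation in this report, that
the difference under en_US.UTF-8 comes from the range being resolved in
collation order rather than by code point. Forcing LC_ALL=C is only a
hypothetical workaround for this one command, not a fix for the bug.

```shell
# Byte values of the range endpoints and of 'b'. In POSIX printf, an argument
# starting with a quote yields the numeric value of the next character.
codes=$(printf '%d %d %d' "' " "'b" "'~")
echo "$codes"    # 32 98 126, so 'b' lies between ' ' and '~'

# Assumption: in the C locale the range is compared by byte value,
# so the same pattern matches "bin".
out=$(echo "bin" | LC_ALL=C awk '$1 ~ /^[\t -~]/ {print $0}')
echo "$out"      # bin
```

If the collation assumption holds, this is consistent with the pattern
matching under C.UTF-8 while failing under en_US.UTF-8.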

-- 
You are receiving this mail because:
You are the assignee for the bug.
