Date:      Thu, 09 Jan 2020 21:20:49 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 243229] awk in base system does not work with UTF-8 strings correctly
Message-ID:  <bug-243229-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

            Bug ID: 243229
           Summary: awk in base system does not work with UTF-8 strings
                    correctly
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: sv@ulstu.ru

I tried using the function length() with UTF-8 strings, and it produces an
incorrect result: it counts bytes rather than characters. For Cyrillic text,
where each character is encoded as two bytes in UTF-8, the reported length is
therefore twice the number of characters.

Steps to reproduce (with LANG=ru_RU.UTF-8):

echo 'Привет' | awk '{print length($1)}'

The word has six characters (12 bytes in UTF-8), but base awk prints 12.
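A minimal sketch of the byte/character mismatch described above, written in Python purely for illustration (Python is not part of the report). The word Привет consists of six Cyrillic letters, each encoded as two bytes in UTF-8. The helper names `utf8_byte_length` and `utf8_char_count` are hypothetical: the first returns what a byte-oriented length() sees, the second counts code points by skipping UTF-8 continuation bytes (bytes of the form 10xxxxxx). This is one simple way to count characters, not a description of how gawk or base awk is implemented.

```python
def utf8_byte_length(s: str) -> int:
    """Length as a byte-oriented awk sees it: the size of the UTF-8 encoding."""
    return len(s.encode("utf-8"))


def utf8_char_count(data: bytes) -> int:
    """Count code points in UTF-8 data by skipping continuation bytes (0b10xxxxxx).

    Assumes well-formed UTF-8 input; no validation is performed.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


word = "Привет"  # six Cyrillic letters, two bytes each in UTF-8
raw = word.encode("utf-8")

print(utf8_byte_length(word))  # 12: byte count, the doubled value from the report
print(utf8_char_count(raw))    # 6: character (code point) count
```

Counting code points differs from counting user-perceived characters when combining marks are involved, but for precomposed Cyrillic text like this the two agree.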

When I use the function length() from lang/gawk instead, the length of the
UTF-8 string is calculated correctly.

Are you planning to update awk in the base system to support UTF-8 strings in
the near future?

-- 
You are receiving this mail because:
You are the assignee for the bug.


