Date: Thu, 09 Jan 2020 21:20:49 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 243229] awk in base system does not work with UTF-8 strings correctly
Message-ID: <bug-243229-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243229

            Bug ID: 243229
           Summary: awk in base system does not work with UTF-8 strings correctly
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: sv@ulstu.ru

I tried using the length() function with UTF-8 strings, and it produces an
incorrect result. The function counts bytes rather than characters, so the
reported length is twice the number of characters in the string.

Steps to reproduce (with LANG=ru_RU.UTF-8):

echo 'Привет' | awk '{print length($1)}'

With lang/gawk, the length of the same UTF-8 string is calculated correctly.

Are you planning to update awk in the base system to support UTF-8 strings in
the near future?

-- 
You are receiving this mail because:
You are the assignee for the bug.
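
The byte-versus-character discrepancy described in the report can be illustrated outside of awk. The sketch below is only an illustration in Python, not awk code and not a description of how either awk implementation works internally. It assumes the test string is the six-letter Cyrillic word from the reproduction step, where every letter occupies two bytes in UTF-8. The variable names are hypothetical.

```python
# Illustration only: compare the character count of the test string
# with the length of its UTF-8 encoding in bytes.
text = "Привет"  # the word used in the reproduction step

char_count = len(text)                  # number of Unicode code points
byte_count = len(text.encode("utf-8"))  # number of bytes after UTF-8 encoding

print("characters:", char_count)
print("bytes:", byte_count)
```

A length() that counts bytes would report the second number, and one that counts characters would report the first. The factor of two holds only for strings made up entirely of two-byte characters such as these Cyrillic letters; ASCII characters take one byte, and many other scripts take three or four.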