Date: Wed, 05 Jul 2023 14:51:52 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 272386] The iconv converter from EUC-JP to UTF-8 accepts second and third bytes outside of the valid range
Message-ID: <bug-272386-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272386

    Bug ID: 272386
   Summary: The iconv converter from EUC-JP to UTF-8 accepts second and third bytes outside of the valid range
   Product: Base System
   Version: 13.2-RELEASE
  Hardware: Any
        OS: Any
    Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: bin
  Assignee: bugs@FreeBSD.org
  Reporter: bruno@clisp.org

The structure of the EUC-JP encoding is explained in
https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP .
The second byte of a two- or three-byte sequence must be in the range
0xA1..0xFE for the sequence to be valid. The third byte of a three-byte
sequence must also be in the range 0xA1..0xFE. So, bytes in the range
0x00..0x7F are only valid as the first byte.

The FreeBSD 13.2 converter from EUC-JP to UTF-8 accepts bytes < 0x80 in
these positions. This is not helpful, because it breaks detection of
encodings by applications.

How to reproduce:
$ cc -Wall -o table-from table-from.c
$ ./table-from EUC-JP > EUC-JP.TXT

Attached are the actual and the expected EUC-JP.TXT.
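
For illustration, the byte-range rule described above can be sketched as a small C check. This is not the FreeBSD iconv code and not the attached table-from.c; the helper names is_trail and eucjp_seq_len are hypothetical. It assumes the usual EUC-JP lead bytes (0x8F introduces a three-byte sequence, 0x8E or 0xA1..0xFE a two-byte one) and only tests structure, not whether the resulting code point is actually assigned.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if b is allowed as a second or third byte. */
static int is_trail(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/*
 * Hypothetical helper: returns the length (1, 2 or 3) of the structurally
 * valid EUC-JP sequence starting at s, or 0 if the sequence is invalid or
 * truncated. n is the number of bytes available at s.
 *
 * Note: after 0x8E, EUC-JP in fact only uses 0xA1..0xDF; this sketch
 * applies the wider 0xA1..0xFE range stated in the report.
 */
static size_t eucjp_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return 0;
    if (s[0] <= 0x7F)                 /* ASCII: valid only as a first byte */
        return 1;
    if (s[0] == 0x8F)                 /* three-byte sequence */
        return (n >= 3 && is_trail(s[1]) && is_trail(s[2])) ? 3 : 0;
    if (s[0] == 0x8E || is_trail(s[0]))   /* two-byte sequence */
        return (n >= 2 && is_trail(s[1])) ? 2 : 0;
    return 0;                         /* any other lead byte is rejected */
}
```

With a check like this, a sequence such as 0xA4 0x41 is rejected, because 0x41 is below 0xA1 and so cannot be a second byte. That is the behavior the report expects from the converter.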