Date:      Wed, 05 Jul 2023 14:51:52 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 272386] The iconv converter from EUC-JP to UTF-8 accepts second and third bytes outside of the valid range
Message-ID:  <bug-272386-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272386

            Bug ID: 272386
           Summary: The iconv converter from EUC-JP to UTF-8 accepts
                    second and third bytes outside of the valid range
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: bruno@clisp.org

The structure of the EUC-JP encoding is explained in
https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP . The second byte of a
two- or three-byte sequence must be in the range 0xA1..0xFE for the sequence
to be valid. The third byte of a three-byte sequence must be in the range
0xA1..0xFE for the sequence to be valid. So, bytes in the range 0x00..0x7F are
valid only as the first byte of a sequence.
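
The byte ranges above can be sketched as a small structural check. This is only an illustration under stated assumptions: the function name eucjp_seq_len is hypothetical, the lead bytes (0x8E for two-byte, 0x8F for three-byte, 0xA1..0xFE for two-byte sequences) follow the Wikipedia description linked above, and the check says nothing about whether a structurally valid sequence maps to an assigned character.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length (1, 2, or 3) of the structurally
   valid EUC-JP sequence at the start of buf, or 0 if the bytes are invalid
   or the sequence is truncated. Only byte ranges are checked, not whether
   the code point is actually assigned. */
static size_t eucjp_seq_len(const unsigned char *buf, size_t n)
{
    size_t need;
    size_t i;

    if (n == 0)
        return 0;

    if (buf[0] < 0x80)
        return 1;               /* 0x00..0x7F: single byte */

    if (buf[0] == 0x8F)
        need = 3;               /* three-byte sequence */
    else if (buf[0] == 0x8E || (buf[0] >= 0xA1 && buf[0] <= 0xFE))
        need = 2;               /* two-byte sequence */
    else
        return 0;               /* not a valid first byte */

    if (n < need)
        return 0;               /* truncated */

    /* Every trailing byte must lie in 0xA1..0xFE. */
    for (i = 1; i < need; i++) {
        if (buf[i] < 0xA1 || buf[i] > 0xFE)
            return 0;
    }
    return need;
}
```

With this check, the bytes 0xA4 0xA2 yield 2, while 0xA4 0x41 yields 0 because the second byte is below 0xA1.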

The FreeBSD 13.2 converter from EUC-JP to UTF-8 accepts bytes < 0x80 in these
positions. This is not helpful, because it breaks detection of encodings by
applications.

How to reproduce:
$ cc -Wall -o table-from table-from.c
$ ./table-from EUC-JP > EUC-JP.TXT

Attached are the actual and the expected EUC-JP.TXT.
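
Independently of the attached table-from.c (which is not reproduced here), a single byte sequence can be probed with the standard iconv(3) interface. The following is a minimal sketch, not the reporter's program; the helper name eucjp_accepts is hypothetical, and on some systems the program may need to be linked with -liconv.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not taken from table-from.c.
   Returns 1 if the system's EUC-JP to UTF-8 converter accepts the bytes,
   0 if it rejects them or leaves input unconsumed, and -1 if the converter
   cannot be opened at all. */
static int eucjp_accepts(const char *bytes, size_t len)
{
    char out[32];
    char *inp = (char *)bytes;  /* iconv() takes char **; input is not modified */
    char *outp = out;
    size_t inleft = len;
    size_t outleft = sizeof out;
    size_t r;

    iconv_t cd = iconv_open("UTF-8", "EUC-JP");
    if (cd == (iconv_t)-1)
        return -1;

    r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    /* (size_t)-1 signals an invalid or incomplete input sequence
       (or insufficient output space, which 32 bytes should rule out
       for inputs of a few bytes). */
    return (r != (size_t)-1 && inleft == 0) ? 1 : 0;
}
```

A converter that enforces the ranges described above would be expected to return 0 for eucjp_accepts("\xA4\x41", 2), since 0x41 is not a valid second byte; a converter with the behavior reported here would presumably return 1.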
