Date:      Tue, 06 Dec 2022 08:22:58 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 268189] BSD tar incorrectly encodes UTF-8 sequences
Message-ID:  <bug-268189-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268189

            Bug ID: 268189
           Summary: BSD tar incorrectly encodes UTF-8 sequences
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: aeder@list.ru

BSD tar incorrectly encodes UTF-8 sequences

How to repeat:
Create two directories with (UTF-8) names:

d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86
d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9

("полевой" and "полевой"). They look exactly the same, but they are
actually different names.

The difference is that the sequence 'd0 b9' encodes the Cyrillic character
'й', but 'd0 b8 cc 86' actually encodes two characters: the Cyrillic 'и' and
a combining diacritic (U+0306 COMBINING BREVE), which I can't enter here.
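To make the difference concrete, here is a small Python sketch (an illustration, not part of the original report). It decodes both byte sequences, lists their code points, and checks that they are the Unicode NFD (decomposed) and NFC (precomposed) forms of the same visible string. It only shows the encoding and says nothing about what bsdtar does internally.

```python
import unicodedata

# The two directory names from the report, as raw UTF-8 bytes.
decomposed = bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86")
precomposed = bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9")

def code_points(raw: bytes) -> list[str]:
    """Decode UTF-8 bytes and list each code point as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode("utf-8")]

# Different byte strings, so a filesystem comparing bytes sees two names.
print(decomposed != precomposed)          # True
print(code_points(decomposed)[-2:])       # ['U+0438', 'U+0306']
print(code_points(precomposed)[-1:])      # ['U+0439']

# Unicode normalization maps one form onto the other.
text_nfd = decomposed.decode("utf-8")
text_nfc = precomposed.decode("utf-8")
print(unicodedata.normalize("NFC", text_nfd) == text_nfc)  # True
print(unicodedata.normalize("NFD", text_nfc) == text_nfd)  # True
```

Both names are therefore canonically equivalent in Unicode terms, although they are distinct as byte strings.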

You can create such directories or files, but when they are archived with
BSD tar, the second name is replaced by the first.
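Before archiving, it may help to find out which names in a directory would collapse into one if they were normalized. The Python sketch below assumes, as a working hypothesis that is not confirmed in this report, that the collision comes from Unicode normalization. Both `find_normalization_collisions` and `scan_directory` are hypothetical helpers, not part of tar or libarchive.

```python
import os
import unicodedata
from collections import defaultdict

def find_normalization_collisions(names, form="NFC"):
    """Hypothetical helper: group byte-string names that differ as bytes
    but become identical after Unicode normalization to `form`."""
    groups = defaultdict(set)
    for raw in names:
        # surrogateescape keeps bytes that are not valid UTF-8 instead of raising.
        key = unicodedata.normalize(form, raw.decode("utf-8", "surrogateescape"))
        groups[key].add(raw)
    # Only groups with more than one distinct byte string are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]

def scan_directory(path: bytes):
    """Hypothetical helper: check one directory's entries.
    A bytes path makes os.listdir return raw bytes names."""
    return find_normalization_collisions(os.listdir(path))
```

For example, `scan_directory(b".")` run in the directory from the report should return one group containing both byte names.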

Adding the --posix option or setting LC_ALL=C doesn't help.

GNU tar handles such files correctly, as separate files/directories.

I think at least --posix (or some other option) should make it possible to
COMPLETELY disable all filename encoding/decoding operations.
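This Python sketch, independent of both BSD tar and GNU tar, shows that the tar format itself can carry both names as separate entries. It uses Python's standard `tarfile` module, which applies no Unicode normalization, to build a PAX archive in memory containing the two directory names. It then checks that the names read back byte-for-byte. This illustrates the format's capability only and does not describe how bsdtar is implemented.

```python
import io
import tarfile

# The two names from the report, as raw UTF-8 bytes (NFD form first, NFC second).
names = [
    bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86"),
    bytes.fromhex("d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9"),
]

buf = io.BytesIO()
# PAX format records non-ASCII names in extended headers; tarfile does not
# normalize them, so each name is written exactly as given.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  encoding="utf-8", errors="surrogateescape") as tar:
    for raw in names:
        info = tarfile.TarInfo(raw.decode("utf-8", "surrogateescape"))
        info.type = tarfile.DIRTYPE
        info.mode = 0o755
        tar.addfile(info)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r", encoding="utf-8",
                  errors="surrogateescape") as tar:
    # Directory members may carry a trailing "/", so strip it before comparing.
    stored = [m.name.rstrip("/").encode("utf-8", "surrogateescape")
              for m in tar.getmembers()]

print(stored == names)  # True: two distinct entries, byte-for-byte intact
```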

The problem also occurs in 12.3-RELEASE, but seems to be absent in the 10.x
releases.

-- 
You are receiving this mail because:
You are the assignee for the bug.


