Date: Tue, 06 Dec 2022 08:22:58 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 268189] BSD tar incorrectly encodes UTF-8 sequences
Message-ID: <bug-268189-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268189

Bug ID: 268189
Summary: BSD tar incorrectly encodes UTF-8 sequences
Product: Base System
Version: 13.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: bin
Assignee: bugs@FreeBSD.org
Reporter: aeder@list.ru

BSD tar incorrectly encodes UTF-8 sequences.

How to repeat:

Create two directories with the following (UTF-8) names:

d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86
d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9

("полевой" and "полевой").

They look exactly the same, but they are actually different names. The difference is that the sequence 'd0 b9' encodes the single Cyrillic character 'й', whereas 'd0 b8 cc 86' actually encodes two characters: Cyrillic 'и' followed by a combining diacritic which I can't enter here.

Such directories or files can be created, but when they are archived with BSD tar, the second name is replaced by the first. Adding the --posix option or setting LC_ALL=C doesn't help.

GNU tar handles such files correctly, as separate files/directories.

I think at least --posix (or some other option) should make it possible to COMPLETELY disable all filename encoding/decoding operations.

The problem also occurs in 12.3-RELEASE, but seems to be absent in the 10-RELEASE series.

-- 
You are receiving this mail because:
You are the assignee for the bug.
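
The two names in the report can be compared at the code point level with a short Python sketch. It decodes the byte sequences quoted above and shows that they are distinct strings which become identical under Unicode NFC normalization. The report does not say why BSD tar merges the names; normalization during filename conversion is only one possible explanation and is assumed here purely for illustration. The helper name codepoints is hypothetical.

```python
import unicodedata

# Byte sequences exactly as quoted in the report.
decomposed = bytes.fromhex("d0bfd0bed0bbd0b5d0b2d0bed0b8cc86").decode("utf-8")
precomposed = bytes.fromhex("d0bfd0bed0bbd0b5d0b2d0bed0b9").decode("utf-8")


def codepoints(name):
    """Return the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in name]


# The raw names are different strings, so a filesystem that stores
# bytes verbatim can hold both side by side.
names_differ = decomposed != precomposed

# Assumption for illustration: if a tool normalized names to NFC,
# both would map to the same string and collide.
collide_after_nfc = (
    unicodedata.normalize("NFC", decomposed)
    == unicodedata.normalize("NFC", precomposed)
)

print(codepoints(decomposed)[-2:])   # last two code points of the first name
print(codepoints(precomposed)[-1:])  # last code point of the second name
print(names_differ, collide_after_nfc)
```

The first name ends in U+0438 followed by U+0306 (combining breve), while the second ends in the single code point U+0439, which is why they render identically but compare as different.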