Date: Mon, 22 Jun 2020 17:39:05 -0500
From: Zhihao Yuan <lichray@gmail.com>
To: Gleb Smirnoff <glebius@freebsd.org>
Cc: Yuri Pankov <yuripv@yuripv.dev>, Yuri Pankov <yuripv@freebsd.org>, svn-src-head@freebsd.org
Subject: Re: svn commit: r362148 - head/contrib/nvi/common
Message-ID: <CAGsORuDZ-WpvAzOXjKQiC2F6f1=iaAuzs21VW-NUqqgGf5V%2BPg@mail.gmail.com>
In-Reply-To: <20200622222448.GB31842@FreeBSD.org>
References: <202006131411.05DEB2mP097868@repo.freebsd.org> <20200622221144.GA31842@FreeBSD.org> <3fe4705c-e036-6999-b6b0-6e05f7cf8321@yuripv.dev> <20200622222448.GB31842@FreeBSD.org>

On Mon, Jun 22, 2020 at 5:24 PM Gleb Smirnoff <glebius@freebsd.org> wrote:
>
> My first attempt was this fix:
>
> --- common/exf.c        (revision 362200)
> +++ common/exf.c        (working copy)
> @@ -1252,7 +1252,8 @@ file_encinit(SCR *sp)
>         else if (O_ISSET(sp, O_FILEENCODING) &&
>             strcasecmp(O_STR(sp, O_FILEENCODING), "utf-8") != 0)
>                 /* Use fileencoding as is */ ;
> -       else if (strcasecmp(codeset(), "utf-8") != 0)
> +       else if (strncasecmp(codeset() + strlen(codeset()) - 5, "utf-8", 5) !=
> +           0)
>                 o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
>         else
>                 o_set(sp, O_FILEENCODING, OS_STRDUP, "iso8859-1", 0);
>
> But it appeared to be not the case. To my surprise, codeset()
> which is wrapper around nl_langinfo() in my case returns US-ASCII.
>

That sounds strange.

1. Can you set LC_CTYPE as well and see if anything changes?
2. Can you revert to the previous version and see what nl_langinfo gives?

There is another issue... I'm sorry. I totally forgot what looks_utf8
actually does. Here is its behavior (encoding.c):

Returns
  -1: invalid UTF-8
   0: uses odd control characters, so doesn't look like text
   1: 7-bit text
   2: definitely UTF-8 text (valid high-bit set bytes)

So if looks_utf8() > 1, it means the file itself is UTF-8 for sure.
If you opened a file with 7-bit text or with control characters,
:set fileencoding should set the encoding intended for writing.
But the behavior in HEAD is that you can't input Unicode.

I'm reverting upstream.

-- 
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
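
For reference, the four return codes listed above for looks_utf8 can be illustrated with a small stand-alone classifier. This is only a sketch under stated assumptions: classify_text() is a hypothetical name, it is not the looks_utf8() from encoding.c, and it simplifies validation (it does not reject overlong or surrogate encodings, and it treats a sequence truncated at the end of the buffer as invalid).

```c
#include <stddef.h>

/*
 * Hypothetical illustration of the return codes described above.
 * NOT the real looks_utf8(); validation is deliberately simplified.
 *
 * Returns -1: invalid UTF-8
 *          0: odd control characters present
 *          1: 7-bit text
 *          2: valid UTF-8 with at least one multi-byte sequence
 */
int
classify_text(const unsigned char *buf, size_t len)
{
	int saw_ctrl = 0, saw_high = 0;
	size_t i = 0, k, need;

	while (i < len) {
		unsigned char c = buf[i];

		if (c < 0x80) {
			/* Assumption: only these controls count as "text". */
			if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r' &&
			    c != '\f' && c != 0x1b) || c == 0x7f)
				saw_ctrl = 1;
			i++;
			continue;
		}
		/* Lead byte decides how many continuation bytes follow. */
		if (c >= 0xc2 && c <= 0xdf)
			need = 1;
		else if (c >= 0xe0 && c <= 0xef)
			need = 2;
		else if (c >= 0xf0 && c <= 0xf4)
			need = 3;
		else
			return (-1);	/* stray continuation or bad lead byte */
		if (len - i - 1 < need)
			return (-1);	/* truncated sequence */
		for (k = 1; k <= need; k++)
			if ((buf[i + k] & 0xc0) != 0x80)
				return (-1);
		saw_high = 1;
		i += need + 1;
	}
	if (saw_ctrl)
		return (0);
	return (saw_high ? 2 : 1);
}
```

With a classifier of this shape, a result greater than 1 corresponds to the "UTF-8 for sure" case, while results 0 and 1 are the cases where the fileencoding option would have to decide the encoding used for writing.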