Date:      Tue, 23 Jun 2020 01:46:57 +0300
From:      Yuri Pankov <yuripv@yuripv.dev>
To:        Zhihao Yuan <lichray@gmail.com>, Gleb Smirnoff <glebius@freebsd.org>
Cc:        Yuri Pankov <yuripv@freebsd.org>, svn-src-head@freebsd.org
Subject:   Re: svn commit: r362148 - head/contrib/nvi/common
Message-ID:  <fd88b39a-eec3-e42f-4182-036fc1f8e644@yuripv.dev>
In-Reply-To: <CAGsORuDZ-WpvAzOXjKQiC2F6f1=iaAuzs21VW-NUqqgGf5V+Pg@mail.gmail.com>
References:  <202006131411.05DEB2mP097868@repo.freebsd.org> <20200622221144.GA31842@FreeBSD.org> <3fe4705c-e036-6999-b6b0-6e05f7cf8321@yuripv.dev> <20200622222448.GB31842@FreeBSD.org> <CAGsORuDZ-WpvAzOXjKQiC2F6f1=iaAuzs21VW-NUqqgGf5V+Pg@mail.gmail.com>

Zhihao Yuan wrote:
> On Mon, Jun 22, 2020 at 5:24 PM Gleb Smirnoff <glebius@freebsd.org> wrote:
> 
> 
>     My first attempt was this fix:
> 
>     --- common/exf.c        (revision 362200)
>     +++ common/exf.c        (working copy)
>     @@ -1252,7 +1252,8 @@ file_encinit(SCR *sp)
>              else if (O_ISSET(sp, O_FILEENCODING) &&
>                  strcasecmp(O_STR(sp, O_FILEENCODING), "utf-8") != 0)
>                      /* Use fileencoding as is */ ;
>     -       else if (strcasecmp(codeset(), "utf-8") != 0)
>     +       else if (strncasecmp(codeset() + strlen(codeset()) - 5, "utf-8", 5) !=
>     +           0)
>                      o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
>              else
>                      o_set(sp, O_FILEENCODING, OS_STRDUP, "iso8859-1", 0);
> 
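The suffix test in the quoted patch computes `codeset() + strlen(codeset()) - 5`, which points before the start of the string whenever the codeset name is shorter than five characters. A minimal sketch of a length-checked variant follows; the helper name `ends_with_nocase` is hypothetical and not part of nvi.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() */

/*
 * Hypothetical helper (not from nvi): return nonzero if `s` ends with
 * `suffix`, comparing case-insensitively.  The length check keeps the
 * pointer arithmetic inside the string.
 */
int
ends_with_nocase(const char *s, const char *suffix)
{
	size_t slen = strlen(s);
	size_t xlen = strlen(suffix);

	if (slen < xlen)
		return (0);
	return (strncasecmp(s + (slen - xlen), suffix, xlen) == 0);
}
```

With such a helper, the condition in the patch could be written as `!ends_with_nocase(codeset(), "utf-8")`.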
>     But that turned out not to be the case. To my surprise, codeset(),
>     which is a wrapper around nl_langinfo(), returns US-ASCII in my case.
> 
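To check what nl_langinfo() reports for a given locale, a small standalone probe can help. This is only a sketch: the function name `codeset_for_locale` is hypothetical, and the exact string returned for the "C" locale varies by platform (glibc, for example, reports ANSI_X3.4-1968 rather than US-ASCII).

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical probe (not from nvi): select LC_CTYPE by name and return
 * the codeset that nl_langinfo() reports for it, or NULL if the locale
 * cannot be set.  Pass "" to use the environment (LC_ALL, LC_CTYPE, LANG).
 */
const char *
codeset_for_locale(const char *locale_name)
{
	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return (NULL);
	return (nl_langinfo(CODESET));
}
```

Comparing `codeset_for_locale("C")` with `codeset_for_locale("")` shows whether the environment's locale is actually being picked up.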
> 
> That sounds strange.
> 
>    1. Can you set LC_CTYPE as well and see
>      if anything changes?
>    2. Can you revert to the previous version
>      and see what nl_langinfo gives?
> 
> There is another issue... I'm sorry.  I totally forgot what
> looks_utf8 actually does.
> 
> Here is its behavior (encoding.c):
> 
>   Returns
>   -1: invalid UTF-8
>    0: uses odd control characters, so doesn't look like text
>    1: 7-bit text
>    2: definitely UTF-8 text (valid high-bit set bytes)
> 
> So if looks_utf8() > 1, the file itself is definitely
> UTF-8.  If you opened a file with 7-bit text or with
> control characters, :set fileencoding should set
> the encoding to use when writing.  But the behavior
> in HEAD is that you can't input Unicode.
> 
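The intended behavior described above can be sketched as a small decision function. This is not the nvi code: the name `pick_fileencoding` and its parameters are hypothetical, the return codes are taken from the list quoted from encoding.c, and treating -1 (invalid UTF-8) the same as 0 and 1 is an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical sketch (not from nvi).  `looks` is the looks_utf8() result:
 *   -1 invalid UTF-8, 0 not text, 1 7-bit text, 2 definitely UTF-8.
 * `user_enc` is the value from ":set fileencoding" (NULL or "" if unset);
 * `locale_enc` is the codeset reported for the current locale.
 */
const char *
pick_fileencoding(int looks, const char *user_enc, const char *locale_enc)
{
	/* The content itself proves UTF-8, so nothing else is consulted. */
	if (looks > 1)
		return ("utf-8");
	/* Otherwise an explicit user setting decides the encoding to write. */
	if (user_enc != NULL && user_enc[0] != '\0')
		return (user_enc);
	/* Assumption: fall back to the locale's codeset. */
	return (locale_enc);
}
```

Under this sketch, a 7-bit file opened with fileencoding set to utf-8 would still be written as UTF-8, which is the case the message says HEAD currently breaks.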
> I'm reverting upstream.

Yes, I will revert for now.


