Date:      Mon, 22 Jun 2020 15:24:48 -0700
From:      Gleb Smirnoff <glebius@freebsd.org>
To:        Yuri Pankov <yuripv@yuripv.dev>
Cc:        Yuri Pankov <yuripv@freebsd.org>, Zhihao Yuan <lichray@gmail.com>, svn-src-head@freebsd.org
Subject:   Re: svn commit: r362148 - head/contrib/nvi/common
Message-ID:  <20200622222448.GB31842@FreeBSD.org>
In-Reply-To: <3fe4705c-e036-6999-b6b0-6e05f7cf8321@yuripv.dev>
References:  <202006131411.05DEB2mP097868@repo.freebsd.org> <20200622221144.GA31842@FreeBSD.org> <3fe4705c-e036-6999-b6b0-6e05f7cf8321@yuripv.dev>

On Tue, Jun 23, 2020 at 01:20:01AM +0300, Yuri Pankov wrote:
Y> Gleb Smirnoff wrote:
Y> >    Yuri, Zhihao,
Y> > 
Y> > this commit totally broke Russian input for me in nvi. After
Y> > exiting edit mode, nvi immediately converts all text to ???????.
Y> > 
Y> > I don't have any special settings in my environment. All I have
Y> > is "russian" class for my user which yields in these environment
Y> > variables:
Y> > 
Y> > declare -x LANG="ru_RU.UTF-8"
Y> > declare -x MM_CHARSET="UTF-8"
Y> > declare -x XTERM_LOCALE="ru_RU.UTF-8"
Y> > 
Y> > I'm already digging into that problem, but may be you have
Y> > a clue immediately.
Y> 
Y> My bad, yes, I see the problem, looking into it.

My first attempt was this fix:

--- common/exf.c        (revision 362200)
+++ common/exf.c        (working copy)
@@ -1252,7 +1252,8 @@ file_encinit(SCR *sp)
        else if (O_ISSET(sp, O_FILEENCODING) &&
            strcasecmp(O_STR(sp, O_FILEENCODING), "utf-8") != 0)
                /* Use fileencoding as is */ ;
-       else if (strcasecmp(codeset(), "utf-8") != 0)
+       else if (strncasecmp(codeset() + strlen(codeset()) - 5, "utf-8", 5) !=
+           0)
                o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
        else
                o_set(sp, O_FILEENCODING, OS_STRDUP, "iso8859-1", 0);
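
A side note on the comparison in the diff above: codeset() + strlen(codeset()) - 5
points before the start of the string whenever the codeset name is shorter
than five characters (for example "Big5"), which is undefined behaviour. A
minimal sketch of a bounds-checked suffix test follows. The helper name
ends_with_ci() is hypothetical and is not part of nvi.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper, not part of nvi.
 * Returns 1 if s ends with suffix, compared case-insensitively,
 * and 0 otherwise. Never reads before the start of s.
 */
int
ends_with_ci(const char *s, const char *suffix)
{
	size_t slen, xlen;

	assert(s != NULL && suffix != NULL);
	slen = strlen(s);
	xlen = strlen(suffix);
	if (slen < xlen)
		return (0);	/* too short to contain the suffix */
	return (strncasecmp(s + (slen - xlen), suffix, xlen) == 0);
}
```

With such a helper the condition would read
!ends_with_ci(codeset(), "utf-8"), assuming a suffix match is what is wanted
at all.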

But that turned out not to be the cause. To my surprise, codeset(),
which is a wrapper around nl_langinfo(), returns US-ASCII in my case.
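
One way to narrow this down outside nvi is to ask nl_langinfo(CODESET)
directly under a known locale. A process that has not called setlocale() runs
in the "C" locale regardless of LANG, and in that locale the reported codeset
is typically an ASCII name rather than UTF-8. The sketch below is only a
diagnostic aid under that assumption; the function name codeset_in_locale()
is hypothetical, and nothing here claims to be what nvi's codeset() does
internally.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical diagnostic helper, not part of nvi.
 * Switches LC_CTYPE to the named locale ("" means "take it from the
 * environment", i.e. LC_ALL / LC_CTYPE / LANG) and returns the codeset
 * name reported for it. Returns NULL if the locale cannot be set.
 */
const char *
codeset_in_locale(const char *name)
{
	assert(name != NULL);
	if (setlocale(LC_CTYPE, name) == NULL)
		return (NULL);	/* locale unknown; LC_CTYPE is unchanged */
	return (nl_langinfo(CODESET));
}
```

Printing codeset_in_locale("C") next to codeset_in_locale("") with
LANG=ru_RU.UTF-8 exported shows whether the environment is being picked up.
If the second call also reports an ASCII name, the locale itself is the
place to look.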

Y> > On Sat, Jun 13, 2020 at 02:11:02PM +0000, Yuri Pankov wrote:
Y> > Y> Author: yuripv
Y> > Y> Date: Sat Jun 13 14:11:02 2020
Y> > Y> New Revision: 362148
Y> > Y> URL: https://svnweb.freebsd.org/changeset/base/362148
Y> > Y>
Y> > Y> Log:
Y> > Y>   nvi: fallback to ISO8859-1 as last resort
Y> > Y>
Y> > Y>   Current logic of using user's locale encoding that is UTF-8 doesn't make
Y> > Y>   much sense if we already failed the looks_utf8() check and skipped
Y> > Y>   encoding set using "fileencoding" as being UTF-8 as well; fallback to
Y> > Y>   ISO8859-1 in that case.
Y> > Y>
Y> > Y>   Reviewed by:	Zhihao Yuan <lichray@gmail.com>
Y> > Y>   Differential Revision:	https://reviews.freebsd.org/D24919
Y> > Y>
Y> > Y> Modified:
Y> > Y>   head/contrib/nvi/common/exf.c
Y> > Y>
Y> > Y> Modified: head/contrib/nvi/common/exf.c
Y> > Y> ==============================================================================
Y> > Y> --- head/contrib/nvi/common/exf.c	Sat Jun 13 09:16:07 2020	(r362147)
Y> > Y> +++ head/contrib/nvi/common/exf.c	Sat Jun 13 14:11:02 2020	(r362148)
Y> > Y> @@ -1237,7 +1237,10 @@ file_encinit(SCR *sp)
Y> > Y>  	}
Y> > Y>
Y> > Y>  	/*
Y> > Y> -	 * Detect UTF-8 and fallback to the locale/preset encoding.
Y> > Y> +	 * 1. Check for valid UTF-8.
Y> > Y> +	 * 2. Check if fallback fileencoding is set and is NOT UTF-8.
Y> > Y> +	 * 3. Check if user locale's encoding is NOT UTF-8.
Y> > Y> +	 * 4. Use ISO8859-1 as last resort.
Y> > Y>  	 *
Y> > Y>  	 * XXX
Y> > Y>  	 * A manually set O_FILEENCODING indicates the "fallback
Y> > Y> @@ -1246,9 +1249,13 @@ file_encinit(SCR *sp)
Y> > Y>  	 */
Y> > Y>  	if (looks_utf8(buf, blen) > 1)
Y> > Y>  		o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
Y> > Y> -	else if (!O_ISSET(sp, O_FILEENCODING) ||
Y> > Y> -	    !strcasecmp(O_STR(sp, O_FILEENCODING), "utf-8"))
Y> > Y> +	else if (O_ISSET(sp, O_FILEENCODING) &&
Y> > Y> +	    strcasecmp(O_STR(sp, O_FILEENCODING), "utf-8") != 0)
Y> > Y> +		/* Use fileencoding as is */ ;
Y> > Y> +	else if (strcasecmp(codeset(), "utf-8") != 0)
Y> > Y>  		o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
Y> > Y> +	else
Y> > Y> +		o_set(sp, O_FILEENCODING, OS_STRDUP, "iso8859-1", 0);
Y> > Y>
Y> > Y>  	conv_enc(sp, O_FILEENCODING, 0);
Y> > Y>  #endif
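
For reference, the selection order introduced by r362148 can be sketched as a
pure function. This is an illustration of the four steps listed in the commit
comment, not the nvi code itself: pick_fileencoding() and its parameters are
hypothetical, looks_like_utf8 stands in for the looks_utf8(buf, blen) > 1
test, and a NULL fileencoding stands in for the option being unset.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical stand-in for the encoding choice in file_encinit().
 *   looks_like_utf8: nonzero if the buffer was detected as valid UTF-8
 *   fileencoding:    the preset "fileencoding" option, or NULL if unset
 *   locale_codeset:  what codeset() reported for the user's locale
 */
const char *
pick_fileencoding(int looks_like_utf8, const char *fileencoding,
    const char *locale_codeset)
{
	assert(locale_codeset != NULL);

	/* 1. Valid UTF-8 wins. */
	if (looks_like_utf8)
		return ("utf-8");
	/* 2. A preset fileencoding that is NOT UTF-8 is used as is. */
	if (fileencoding != NULL && strcasecmp(fileencoding, "utf-8") != 0)
		return (fileencoding);
	/* 3. A locale encoding that is NOT UTF-8 is used next. */
	if (strcasecmp(locale_codeset, "utf-8") != 0)
		return (locale_codeset);
	/* 4. Last resort. */
	return ("iso8859-1");
}
```

Under these rules, pick_fileencoding(0, NULL, "US-ASCII") selects "US-ASCII"
at step 3, so whatever codeset() reports matters as soon as the buffer is not
detected as UTF-8.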

-- 
Gleb Smirnoff


