Date:      Tue, 19 May 2020 20:01:35 +0300
From:      Yuri Pankov <ypankov@fastmail.com>
To:        "Ronald F. Guilmette" <rfg@tristatelogic.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: (character) Conversion error (in vi) ?
Message-ID:  <5c384499-c87a-e121-2337-3598adf7fef0@fastmail.com>
In-Reply-To: <72824.1589685787@segfault.tristatelogic.com>
References:  <72824.1589685787@segfault.tristatelogic.com>

Ronald F. Guilmette wrote:
> In message <ec8735ff-fcf1-7ee4-1aed-4aa9b87c655c@fastmail.com>,
> Yuri Pankov <ypankov@fastmail.com> wrote:
> 
>> No, it's not that bug after all.  The issue is that (n)vi now (for quite
>> some time :-) defaults to UTF-8 when it can't reliably detect the file
>> encoding, so you'll just have to help it a bit by adding the following to
>> ~/.nexrc:
>>
>> set fileencoding=iso8859-1
>>
>> This way (n)vi will check if file encoding looks like UTF-8, and if not,
>> it will use ISO8859-1 as fallback.
> 
> Ahhhhhh... I did what you said and yes, that fixed it!
> 
> Thanks ever so much!  This has been bugging me for quite a while.
> 
> And my apologies for being too lazy/preoccupied to dredge deeply
> enough into the man pages to be able to find this solution on
> my own.
> 
> If you were my fairy godmother, then I'd ask you to grant me
> one more wish, which would be to have (n)vi always be able to
> automagically correctly detect the content encoding in any given
> file it is asked to load.  But you're not, so I won't. :-)
> 
> Still, it seems like it ought to be possible to do.  It appears
> that a human (you) didn't have much trouble figuring out the
> correct encoding type in this instance, so one would think
> that this one piece of software might be able to do a better
> job in this particular guessing game.  (Should I bother to
> submit a PR / enhancement request for that?)

I do agree that falling back to the user locale's encoding when that 
encoding is UTF-8 doesn't make much sense, as we already know that it will 
fail.  I'll put up a change that makes us try ISO8859-1 (as it seems to be 
the most widely used single-byte encoding?) instead if all of the checks 
below fail (as added to the code):

1. Check for valid UTF-8.
2. Check if fallback fileencoding is set and is NOT UTF-8.
3. Check if user locale's encoding is NOT UTF-8.
4. Use ISO8859-1 as last resort.
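The fallback order above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not nvi's actual C code (the review linked below has that); the function names and the normalization of encoding names via `codecs.lookup` are hypothetical choices made for the example.

```python
import codecs
import locale


def is_valid_utf8(data: bytes) -> bool:
    """Step 1: does the raw file content decode cleanly as UTF-8?"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def _is_utf8_name(name: str) -> bool:
    # Assumption: treat "UTF-8", "utf8", etc. as the same encoding by
    # normalizing through Python's codec registry. Unknown names are not UTF-8.
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        return False


def choose_encoding(data: bytes, fileencoding=None, locale_encoding=None) -> str:
    """Hypothetical helper mirroring the four-step fallback order."""
    if locale_encoding is None:
        # The user locale's encoding (e.g. "UTF-8" under en_US.UTF-8).
        locale_encoding = locale.getpreferredencoding(False)

    # 1. Valid UTF-8 wins.
    if is_valid_utf8(data):
        return "utf-8"
    # 2. The fallback fileencoding, if set and NOT UTF-8.
    if fileencoding and not _is_utf8_name(fileencoding):
        return fileencoding
    # 3. The user locale's encoding, if NOT UTF-8.
    if locale_encoding and not _is_utf8_name(locale_encoding):
        return locale_encoding
    # 4. Last resort.
    return "iso8859-1"
```

For example, `choose_encoding(b"caf\xe9", None, "UTF-8")` skips steps 1 to 3 and lands on `"iso8859-1"`. ISO8859-1 makes a safe last resort because it assigns a character to every one of the 256 byte values, so decoding with it never fails; it may simply show the wrong characters if the file was really in some other single-byte encoding.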

As for autodetecting the single-byte encoding, I don't think it's 
doable in base without adding too many dependencies; there are tools 
in ports for this, but if you really need it, can I just say the magic 
word, "vim"? :-)

The review is at https://reviews.freebsd.org/D24919.


