Date: Thu, 13 Aug 2015 19:57:48 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 202290] /usr/bin/vi conversion error on valid character
Message-ID: <bug-202290-8-YHAbzrK1B0@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-202290-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-202290-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202290

--- Comment #1 from lampa@fit.vutbr.cz ---
Looking at /usr/src/contrib/nvi/common/exf.c:

    file_encinit(SCR *sp)
    ...
        if (looks_utf8(buf, blen) > 1)
            o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
        else if (!O_ISSET(sp, O_FILEENCODING) ||
            !strncasecmp(O_STR(sp, O_FILEENCODING), "utf-8", 5))
            o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
        conv_enc(sp, O_FILEENCODING, 0);
    }

1. There is no way to disable auto-detection of the encoding. If
looks_utf8() returns 2, the file encoding is forced to utf-8 and you are
lost! You can set fileencoding in your .exrc, but it will be ignored!

2. But why does looks_utf8() detect 0xe1 0x20 as valid UTF-8? IT IS NOT
VALID! Looking at /usr/src/contrib/nvi/common/encoding.c:

    looks_utf8(const char *ibuf, size_t nbytes)
    ...
        for (n = 0; n < following; n++) {
            i++;
            if (i >= nbytes)
                goto done;
            if (buf[i] & 0x40)    /* 10xxxxxx */
                return -1;
        }

That is completely wrong: it only rejects bytes with bit 6 set and never
tests whether bit 7 is set in the succeeding bytes, so an ASCII byte such
as 0x20 passes as a continuation byte. It should be:

        for (n = 0; n < following; n++) {
            i++;
            if (i >= nbytes)
                goto done;
            if ((buf[i] & 0xc0) != 0x80)    /* 10xxxxxx */
                return -1;
        }

This change was tested and works.

Please fix at least the broken "auto detection" before 10.2-RELEASE! But
some option to disable auto-detection, or to honor the user's setting in
.exrc, is also required.

-- 
You are receiving this mail because:
You are the assignee for the bug.
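
The continuation-byte test discussed in this comment can be tried in isolation. The following is only a minimal sketch, not the nvi code: is_continuation(), check_following() and old_check() are hypothetical helpers written for illustration, and the return values (1 valid, 0 truncated, -1 invalid) are an assumption of this sketch rather than the convention used by looks_utf8(). It assumes the intended mask test is (b & 0xc0) == 0x80, which is the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nvi): true if b has the form 10xxxxxx. */
static int
is_continuation(unsigned char b)
{
	return (b & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: verify that the `following` bytes after the lead
 * byte at index i are all continuation bytes.
 * Returns 1 if they are, 0 if the buffer ends first, -1 if one is invalid.
 */
static int
check_following(const unsigned char *buf, size_t nbytes, size_t i,
    int following)
{
	int n;

	for (n = 0; n < following; n++) {
		i++;
		if (i >= nbytes)
			return 0;	/* truncated sequence */
		if (!is_continuation(buf[i]))
			return -1;	/* not 10xxxxxx */
	}
	return 1;
}

/*
 * The test as quoted from encoding.c, for comparison: it looks only at
 * bit 6, so any byte below 0x40 (for example a space) is accepted.
 * Returns -1 on rejection, 0 otherwise.
 */
static int
old_check(unsigned char b)
{
	return (b & 0x40) ? -1 : 0;
}
```

With the lead byte 0xe1 (which announces two following bytes) and the buffer { 0xe1, 0x20 }, check_following(buf, 2, 0, 2) rejects the sequence, while old_check(0x20) lets the space through.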