Date:      Sat, 4 Feb 2023 13:16:37 +0900
From:      Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To:        stable@freebsd.org
Subject:   Re: Grep with non-ascii
Message-ID:  <20230204131637.4e8e66e086eea57f4bb27b12@dec.sakura.ne.jp>
In-Reply-To: <c77ad75b-b8c4-9014-0bc7-f1a0ec78272c@m5p.com>
References:  <20230203110642.70e4a076@elg.hjerdalen.lokalnett> <819a4336-9689-bdbe-a90d-8f1d7b842662@grosbein.net> <20230203151853.02732bd6@elg.hjerdalen.lokalnett> <20230204010605.4874609f80eed28543407807@dec.sakura.ne.jp> <c77ad75b-b8c4-9014-0bc7-f1a0ec78272c@m5p.com>

On Fri, 3 Feb 2023 12:36:47 -0500
George Mitchell <george+freebsd@m5p.com> wrote:

> On 2/3/23 11:06, Tomoaki AOKI wrote:
> > [...]
> > If this is the case like above, the only solution is to move to
> > character set containing ALL characters all over the world.
> > 
> > AFAIK, the only candidates are only two, TRON code [1] and Unicode (UCS,
> > ISO/IEC 10646) [2]. And TRON code is very rarely used, actual candidate
> > would be Unicode only.
> > Note that Unicode is usually encoded to any of UTF-8, UTF-16 or UTF-32
> > for data transfer (sometimes raw UCS-2?).
> > [...]
> 
> The one positive development in the world of computing that I would
> credit to Java is the earliest big push toward the adoption of UTF-8.
> I strongly hope UTF-8 becomes universally used sooner rather than
> later.                                                     -- George

And FreeBSD already has UTF-8. ;-)

Drawbacks of UTF-8 (more precisely, of Unicode itself) are...
  *Han unification. Characters in Japanese, Chinese and Korean that
   look alike but are not exactly the same were mistakenly unified,
   which is a fatal flaw.

  *Lack of proper support for variant forms of characters.
   Maybe Unicode should have two more dimensions: one for
   distinguishing wrongly unified CJK characters and another for
   variants.

  *Font sets. Very few fonts cover all of the Unicode code points
   that have actually been assigned to a character.

  *FreeBSD base does not have a full Unicode font for vt yet.
   (Input methods are a separate problem, though.)
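
To make the first two points concrete, here is a minimal Python sketch
(my own illustration, nothing from FreeBSD base). It assumes U+76F4 as
an example of a unified ideograph that Japanese and Chinese fonts
commonly draw differently, and U+845B followed by U+E0100 as an example
of a variation sequence. Whether a given font has a distinct glyph for
that sequence is an assumption. The helper name encoded_sizes is
hypothetical.

```python
import unicodedata

# One unified code point: the text itself carries no hint of whether
# the Japanese or the Chinese glyph shape is wanted.
han = "\u76f4"


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),   # -le: no BOM added
        "utf-32": len(text.encode("utf-32-le")),   # -le: no BOM added
    }


print(f"U+{ord(han):04X}", unicodedata.name(han))
print(encoded_sizes(han))

# Base character plus a variation selector (U+E0100, VARIATION
# SELECTOR-17). This is two code points requesting one glyph variant.
variant = "\u845b\U000E0100"
print(len(variant), unicodedata.name(variant[1]))
print(encoded_sizes(variant))
```

The same code point comes out of every encoding form, so choosing
UTF-8 over UTF-16 or UTF-32 changes only the byte count, not which
glyph is meant. The language still has to be supplied from outside the
text (font choice, locale, markup).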

-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>


