Date:      Thu, 6 Apr 2000 01:32:46 -0700 (PDT)
From:      Alex Belits <abelits@phobos.illtel.denver.co.us>
To:        Anatoly Vorobey <mellon@pobox.com>
Cc:        hackers@freebsd.org
Subject:   Re: Unicode on FreeBSD
Message-ID:  <Pine.LNX.4.20.0004060104340.16140-100000@phobos.illtel.denver.co.us>
In-Reply-To: <20000406081726.A22343@happy.checkpoint.com>

On Thu, 6 Apr 2000, Anatoly Vorobey wrote:

> >   Can you guess which one of the multiple Cyrillic charsets was never
> > actually used in Russia?
> > 
> >   ISO 8859-5.
> 
> It's actually being used quite often now by users of MS Outlook 2000
> (those of them not sophisticated enough to select their own outgoing
> encoding).

  Unless Microsoft reversed its encoding policy one more time last year,
Outlook uses Windows CP-1251 for Cyrillic by default.

> 
> >   And which is still the standard in Russian-language newsgroups,
> > for russian Unix users and most of Russian-language web pages?
> 
> Cyrillic!=Russian.

  The same applies to the use of encodings for the Ukrainian language,
except that koi8-u (which is a superset of koi8-r) is used instead. Other
languages either aren't used widely enough to provide any statistics (such
as Belorussian), or use one of the existing charsets other than iso8859-5.
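
  The relationship between koi8-r and koi8-u can be checked with a short
sketch. It assumes Python 3 and its standard "koi8_r" and "koi8_u" codecs;
the helper name and the sample string are made up for illustration. It lists
the byte values that the two charsets decode differently and confirms that
plain Russian text comes out as the same bytes in both. In Python's tables
the differing positions hold box-drawing characters in koi8-r, so "superset"
here is about the letters, not about every byte position.

```python
def differing_bytes(codec_a="koi8_r", codec_b="koi8_u"):
    """Map each byte value that decodes differently to its two characters."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs


DIFFS = differing_bytes()
for value, (in_r, in_u) in sorted(DIFFS.items()):
    print(f"0x{value:02X}: koi8-r {in_r!r}  koi8-u {in_u!r}")

# Hypothetical sample: ordinary Russian text should encode identically.
RUSSIAN_SAMPLE = "Привет, мир"
SAME_BYTES = RUSSIAN_SAMPLE.encode("koi8_r") == RUSSIAN_SAMPLE.encode("koi8_u")
print("Russian text identical in both charsets:", SAME_BYTES)
```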

> >   koi8-r, one of the oldest cyrillic charsets, primarily designed to keep
> 
> This is untrue. cp1251 is used in almost all Russian web pages, and
> koi8-r is the minority (for no good reason, of course, primarily because
> too many people never learned to set the right charset in the outgoing
> HTTP headers).

  While the number of Russian pages in CP-1251 is increasing, I probably
look at the "wrong" web sites, because the absolute majority of what I have
seen either uses koi8-r or offers multiple encodings, including koi8-r
and CP-1251 but never iso 8859-5.

> > "intuitive" mapping to ASCII, to remain usable after passing through
> > characters-mangling old software and to be readable on 7-bit dumb
> > terminals -- and the last mentioned property is still saving a lot of
> > trouble for Russians that use mail-to-pager systems. History is more 
> > complex than some people think.
> 
> And with all its attractive properties, it's still missing the letter
> "yat'" that I need. It's there in Unicode, of course (and in 8859-5).

  With multiple-charset support it can still be available; however, this
is not the point. The reality is that this letter has been completely
excluded from any real-life use for more than 70 years. That is, everything
published in modern Russian, even if it is a re-published work that
originally used the pre-reform Russian language, is printed in the
post-reform version of the language, works of Pushkin and Tolstoy included.
The only cases where "yat'" is used are ones where exact reproduction of
works in documents is necessary, and those are generally treated by Russians
as texts in a language that is no longer recognized as Russian (as is the
even earlier version of Russian that had a significantly different alphabet
and can't be read by modern Russians without archaic-language training).
In other words, you are talking about a completely different language.
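
  The 7-bit property of koi8-r quoted earlier in this message can be
illustrated with a minimal sketch, assuming Python 3 and its standard
"koi8_r" codec. The function names are invented for this example. Clearing
the high bit of each koi8-r byte, as a 7-bit channel or dumb terminal would,
leaves a Latin letter that sounds roughly like the Cyrillic one, with the
case inverted.

```python
def strip_high_bit(text, codec="koi8_r"):
    """Encode text, clear bit 7 of every byte, and return the ASCII result."""
    raw = text.encode(codec)
    return bytes(b & 0x7F for b in raw).decode("ascii")


def readable_fallback(text):
    """Undo the case inversion that high-bit stripping introduces."""
    return strip_high_bit(text).swapcase()


print(strip_high_bit("Привет"))     # what a 7-bit terminal would show
print(readable_fallback("Привет"))  # the same text with case swapped back
```

The result is a rough transliteration rather than a standard one (for
example, "в" comes out as "w"), but it stays legible without any charset
support on the receiving side.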

-- 
Alex

----------------------------------------------------------------------
 Excellent.. now give users the option to cut your hair you hippie!
                                                  -- Anonymous Coward



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message