From: Alex Belits
To: Anatoly Vorobey
Cc: hackers@freebsd.org
Subject: Re: Unicode on FreeBSD
Date: Thu, 6 Apr 2000 01:32:46 -0700 (PDT)
In-Reply-To: <20000406081726.A22343@happy.checkpoint.com>

On Thu, 6 Apr 2000, Anatoly Vorobey wrote:

> >   Can you guess, which one of the multiple cyrillic charsets never was
> > actually used in Russia?
> >
> >   ISO 8859-5.
>
> It's actually being used quite often now by users of MS Outlook 2000
> (those of them not sophisticated enough to select their own outgoing
> encoding).

  Unless Microsoft turned around its encodings policy one more time last
year, Outlook by default uses Windows CP-1251 for Cyrillic.

> >   And which is still the standard in Russian-language newsgroups,
> > for russian Unix users and most of Russian-language web pages?
>
> Cyrillic!=Russian.

  The same applies to the use of encodings for the Ukrainian language,
except that koi8-u (which is a superset of koi8-r) is used instead. Other
languages either aren't used widely enough to provide any statistics
(such as Belorussian), or use one of the existing charsets other than
iso8859-5.

> >   koi8-r, one of the oldest cyrillic charsets, primarily designed to keep
>
> This is untrue.
> cp1251 is used in almost all Russian web pages, and
> koi8-r is the minority (for no good reason, of course, primarily because
> too many people never learned to set the right charset in the outgoing
> HTTP headers).

  While the number of Russian pages in CP-1251 is increasing, I probably
look at the "wrong" web sites, because the absolute majority of what I
have seen either uses koi8-r or offers multiple encodings, including
koi8-r and CP-1251 but never ISO 8859-5.

> > "intuitive" mapping to ASCII, to remain usable after passing through
> > characters-mangling old software and to be readable on 7-bit dumb
> > terminals -- and the last mentioned property is still saving a lot of
> > trouble for Russians that use mail-to-pager systems. History is more
> > complex than some people think.
>
> And with all its attractive properties, it's still missing the letter
> "yat'" that I need. It's there in Unicode, of course (and in 8859-5).

  With multiple-charset support it can still be available; however, this
is not the point. The reality is that this letter has been completely
excluded from any real-life use for more than 70 years. That is,
everything published in modern Russian, even a re-published work that
originally used the pre-reform Russian language, is printed in the
post-reform version of the language, works of Pushkin and Tolstoy
included. The only cases where "yat'" is used are ones where exact
reproduction of works in documents is necessary, and such texts are
generally treated by Russians as written in a language that is not
recognized as Russian anymore (just like the even earlier version of
Russian that had a significantly different alphabet and can't be read by
modern Russians without archaic-language training).

  In other words, you are talking about a completely different language.

--
Alex

----------------------------------------------------------------------
 Excellent.. now give users the option to cut your hair you hippie!
                                                -- Anonymous Coward
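
For readers unfamiliar with the "readable on 7-bit dumb terminals" property of koi8-r mentioned in the message, here is a rough Python sketch of what happens when the high bit of every byte is stripped. It assumes Python's built-in koi8_r and cp1251 codecs; the helper name strip_high_bit and the sample word are illustrative and do not come from the thread.

```python
def strip_high_bit(text: str, encoding: str) -> str:
    """Encode text in the given charset, clear bit 7 of every byte
    (as a 7-bit channel or terminal would), and read the result as ASCII."""
    raw = text.encode(encoding)
    return bytes(b & 0x7F for b in raw).decode("ascii")


if __name__ == "__main__":
    word = "Привет"  # Russian for "hello"; hypothetical sample input

    # koi8-r places Cyrillic letters 0x80 above similar-sounding Latin
    # letters (with the case inverted), so the stripped text stays legible.
    print(strip_high_bit(word, "koi8_r"))

    # cp1251 orders letters alphabetically, so stripping gives unrelated letters.
    print(strip_high_bit(word, "cp1251"))
```

With koi8-r the stripped word comes out as "pRIWET", a case-inverted transliteration that a Russian reader can still make out; with cp1251 the same word becomes "Ophber".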