From owner-freebsd-hackers Tue Apr 4 19:42: 6 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from postal.linkfast.net (postal.linkfast.net [208.160.105.16])
	by hub.freebsd.org (Postfix) with ESMTP id 8FCB337B732
	for ; Tue, 4 Apr 2000 19:41:52 -0700 (PDT)
	(envelope-from grasshacker@linkfast.net)
Received: from gh (modem138.linkfast.net [208.160.105.138])
	by postal.linkfast.net (Postfix) with SMTP id 9E8F99B0F
	for ; Tue, 4 Apr 2000 21:41:50 -0500 (CDT)
Message-ID: <002101bf9ea8$7e6e10a0$fc69a0d0@linkfast.net.linkfast.net>
From: "gh"
To:
References:
Subject: Re: Unicode on FreeBSD
Date: Tue, 4 Apr 2000 21:41:49 -0500
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Regardless of how you feel about Unicode--whatever, just think of how
horribly terrible things would be if people actually had to *speak* to one
another. gah, the torture. ;-)

Dan
gh

> On Tue, 4 Apr 2000, G. Adam Stanislav wrote:
>
> > On Tue, Apr 04, 2000 at 05:05:05PM -0700, Alex Belits wrote:
> > > The existing "market" of multilingual application is so small, and it's
> > >based on so simplistic requirements (to be able to display and print
> > >characters, and make multilingual "web pages"), that even solution so much
> > >flawed as standardization on Unicode can survive. Unicode is positioned as
> > >the _replacement_ for languages/charsets handling infrastructure -- "we
> > >know all the characters, so we can write all the words, right?".
> >
> > Not so. Unicode is a character map. One of many. It just happens to be
> > the most inclusive one in existence.
>
> It is. However if you look at the current efforts of its "adoption", it
> is not used as one.
> It's touted as the solution to all language-related
> problems, as a replacement of language/charset labeling infrastructure
> and as the necessary prerequisite for any multilingual text processing.
>
> [skipped]
>
> > It does not, for example, provide sorting order. It cannot. Unicode is
> > not about linguistics, it is about mapping characters regardless of their
> > use in specific languages. And different languages sort characters
> > differently. For example, in Slovak, "ch" is considered a character
> > which belongs after the "h". In other languages it is sorted differently.
> > And in most languages, it is just two unrelated characters.
>
> This is the kind of work that currently nonexistent language support
> infrastructure should do -- when some language is encountered in
> "multilingual" document/protocol/... its name can be used to load the
> procedures (in this case sorting but it may be hyphenation, phonetic
> match, etc.) for that particular language, and if no matched language is
> known or supported, data should be just left alone. The same
> infrastructure can be designed to support charsets and encodings, doing
> conversion between them (and unicode) only where possible and necessary,
> and providing the text in either "original" or "preferred", "supported",
> etc. encoding for the language for the particular operation that should be
> performed on the text. If such thing will be implemented, all existing
> charset-specific routines that now exist in various places, can be reused,
> and compatibility with existing software can be achieved without any
> significant pain.
>
> > Unicode is not simplistic. It does what its stated goal is, and it does
> > it well. How we use it, is up to us.
> >
> > Cheers,
> > Adam
> >
> > P.S. Hmmm... Interesting. I noticed my random quote contains a C-caron.
> > I wonder how it is going to be handled. :)
>
> It was handled pretty well for such a primitive system as pine in
> xterm.
> Since your charset was iso 8859-2, it was marked as such in the
> Content-Type header of the message. pine gave me a warning:
>
> ---8<---
> [ The following text is in the "iso-8859-2" character set. ]
> [ Your display is set for the "koi8-r" character set. ]
> [ Some characters may be displayed incorrectly. ]
> --->8---
>
> and displayed the text. xterm used the default font that happened to be in
> koi8-r charset, displaying C-caron as cyrillic ha. I read the
> warning, manually switched xterm to a font in iso 8859-2 charset, and the text
> was displayed correctly. If I used a gui-based MUA such as Netscape (which
> I didn't, because Netscape Messenger sucks for reasons that have nothing to
> do with its charsets support), it would just display the message in the
> charset defined in the header.
>
> --
> Alex
>
> ----------------------------------------------------------------------
> Excellent.. now give users the option to cut your hair you hippie!
> -- Anonymous Coward
>
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
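
The C-caron mix-up described above can be reproduced in a few lines of
Python. This is only an illustrative sketch, not anything pine or xterm
actually runs: the helper name show_as is made up for the example, and it
assumes Python's standard iso-8859-2 and koi8-r codecs. It encodes the
letter the way the sender's charset would, then decodes the same byte the
way a display set up for a different charset would.

```python
# Illustrative sketch only; show_as is a hypothetical helper,
# not part of pine, xterm, or any mail software.

def show_as(text: str, sent_charset: str, display_charset: str) -> str:
    """Encode text in the sender's charset, then decode the same bytes
    the way a display configured for another charset would."""
    raw = text.encode(sent_charset)
    return raw.decode(display_charset, errors="replace")


c_caron = "\u010c"  # LATIN CAPITAL LETTER C WITH CARON

# The single byte this letter occupies in ISO 8859-2.
print(c_caron.encode("iso-8859-2"))  # b'\xc8'

# The same byte seen through a KOI8-R font comes out as a Cyrillic ha.
print(show_as(c_caron, "iso-8859-2", "koi8-r"))

# When the display charset matches the label, the letter survives.
print(show_as(c_caron, "iso-8859-2", "iso-8859-2") == c_caron)  # True
```

The bytes never change here; only the interpretation does, which is why
switching the xterm font to match the charset named in the Content-Type
header was enough to fix the display.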