Date:      Fri, 3 Aug 2012 16:15:39 +0200
From:      Ulrich Spörlein <uqs@FreeBSD.org>
To:        "Simon L. B. Nielsen" <simon@FreeBSD.org>
Cc:        doc@FreeBSD.org, Gabor Kovesdan <gabor@FreeBSD.org>, www@FreeBSD.org
Subject:   Re: RFC: doc/www cleanup
Message-ID:  <20120803141538.GG1202@acme.spoerlein.net>
In-Reply-To: <CAC8HS2E2ekMKJgY04qPrQGbEe_tPJ%2BHrf5_ToERptf0yawYoQA@mail.gmail.com>
References:  <501BAFBD.3010008@FreeBSD.org> <CAC8HS2E2ekMKJgY04qPrQGbEe_tPJ%2BHrf5_ToERptf0yawYoQA@mail.gmail.com>

On Fri, 2012-08-03 at 14:33:04 +0100, Simon L. B. Nielsen wrote:
> On Fri, Aug 3, 2012 at 12:02 PM, Gabor Kovesdan <gabor@freebsd.org> wrote:
> > 2, Relaxing character entity usage: To be able to read non-ASCII characters
> > on ASCII-only systems, we have been using character entities, like &aacute;.
> > But in CJK languages, Greek and Russian every character is non-ASCII so
> > practically they cannot be used nor were they used. So they are only used in
> > ISO-8859 encodings (except Greek, which is also from this family). In fact,
> > displaying these Latin-based characters nowadays isn't that problematic any
> > more. Furthermore, if you edit text in a given language then we can suppose
> > that you understand the language so you know what you should see and you
> > know how to configure your system if you don't see the desired result. As a
> > result, these entities nowadays don't have any real advantage any more but
> > they highly "pollute" the text and make it much harder to edit and read. One
> 
> I agree that the entities should generally not be used. I think we
> should just switch to UTF-8 as the character set wherever possible to
> simplify it even more.
> 
> And on that note, kill the useless character-set part of all our
> language directories which generate horrible paths with no additional
> value.
> 
> > exception is using characters in a specific language that aren't present
> > there, e.g. a non-English developer name in the English documentation, etc.
> 
> UTF-8 would fix that.

Last time I brought this up (trying to get rid of silly entities and
the bogus charset name of the directories), I was told that our
toolchain didn't fully grok UTF-8 yet, which was the reason we still had
this de_DE.ISO8859-1 nonsense.

The move to XML should really, really convert all files to UTF-8, drop
that from the directories, and get rid of entities like &auml; or
&eacute;, etc.
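
As a rough illustration of what such a conversion could look like, here
is a minimal Python sketch. It is not the project's actual tooling. The
function names are made up for this example, and it assumes the sources
are ISO-8859-1, that only the standard HTML named entities should be
expanded, and that the charset is whatever follows the first dot in the
directory name. XML's predefined entities and any project-defined
entities are left untouched.

```python
import re
from html.entities import name2codepoint

# XML's own predefined entities must stay escaped, or the markup breaks.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entities such as &auml; (numeric references are not handled).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_entities(text: str) -> str:
    """Replace named character entities like &auml; with the literal character."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave markup-significant and unknown (e.g. project-defined) entities alone.
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(substitute, text)


def convert_to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode from the legacy encoding, expand entities, re-encode as UTF-8."""
    return replace_entities(raw.decode(source_encoding)).encode("utf-8")


def strip_charset_suffix(dirname: str) -> str:
    """Hypothetical rename rule: drop everything after the first dot."""
    return dirname.split(".", 1)[0]
```

For example, strip_charset_suffix("de_DE.ISO8859-1") gives "de_DE", and
convert_to_utf8() turns an ISO-8859-1 byte string containing &auml; into
UTF-8 bytes with a literal ä.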

Just my two cents
Uli


