Date:      Sat, 29 Dec 2012 14:03:19 +0100
From:      Gabor Kovesdan <gabor@FreeBSD.org>
To:        Ulrich Spörlein <uqs@FreeBSD.org>
Cc:        doc@FreeBSD.org
Subject:   Re: Please review, small SGML entity cleanup
Message-ID:  <50DEEA17.90802@FreeBSD.org>
In-Reply-To: <20121229124833.GC69724@acme.spoerlein.net>
References:  <20121228171424.GZ69724@acme.spoerlein.net> <50DECF3E.2020502@FreeBSD.org> <20121229124833.GC69724@acme.spoerlein.net>

On 2012.12.29. 13:48, Ulrich Spörlein wrote:
> On Sat, 2012-12-29 at 12:08:46 +0100, Gabor Kovesdan wrote:
>> On 2012.12.28. 18:14, Ulrich Spörlein wrote:
>>> The DE and FR articles are a hodgepodge of SGML entities and direct,
>>> 8bit chars, with the former being the majority. This patch cleans this
>>> up a little, although we should eventually switch this all to UTF-8,
>>> obviously.
>> Don't they work with direct chars? Once we made a step to that direction
> They probably will, and I have no clue why we used entities for German
> and French, but the usual encodings for Russian and Japanese, etc.
>
>> so this one would be one step back. If possible, it would be better to
>> convert the entities to direct chars instead of the opposite.
> In the end, sure. But that's a larger project of moving from
> de_DE.ISO8859-1 -> de_DE (with an implied UTF-8 encoding, as is required
> by XML anyway, the implied part, not the exact encoding).
UTF-8 isn't required by XML. If you omit the encoding part of the XML
declaration, the content is treated as UTF-8 by default, but that is
only a default, not a requirement.
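
To illustrate the point, here is a minimal sketch using Python's standard
xml.etree.ElementTree parser. The sample element and the helper name
parses_ok are made up for the demonstration and are not taken from the
doc tree.

```python
import xml.etree.ElementTree as ET

TEXT = "<p>für</p>"  # hypothetical sample content


def parses_ok(data: bytes) -> bool:
    """Return True if the parser accepts the given bytes as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No encoding declaration: the parser assumes UTF-8.
utf8_doc = TEXT.encode("utf-8")

# Explicit declaration: ISO-8859-1 is just as valid.
latin1_doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>' + TEXT.encode("iso-8859-1")
)

# ISO-8859-1 bytes without a declaration are misread as UTF-8 and rejected.
undeclared_latin1_doc = TEXT.encode("iso-8859-1")
```

Both utf8_doc and latin1_doc parse to the same text, while
undeclared_latin1_doc is rejected, so the encoding only has to be
declared, not changed.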
>
> I don't think this commit is a step back, because the documents need to
> be converted using a long series of s/&uuml;/ü/g, anyway. And the
> current mish-mash is just weird.
I don't see any reason why we cannot do this conversion right now in
ISO-8859-1. (Actually, I did it, but people kept introducing new redundant
entities.) ISO-8859-1 isn't any harder to type than UTF-8. If you want
consistency (which imho isn't that important at this point since there
are lots of upcoming changes) then why not move in the right direction?
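
A rough sketch of what such a conversion could look like, assuming
Python and its built-in HTML entity table as a stand-in for the ISO
entity sets the documents actually use. The function name
entities_to_latin1 and the KEEP set are hypothetical. Anything that is
not a known character in the ISO-8859-1 range (for example our own
general entities) is left untouched.

```python
import re
from html.entities import name2codepoint

# Matches named entity references such as &uuml; or &eacute;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Markup-significant or invisible characters stay as entities.
KEEP = {"amp", "lt", "gt", "quot", "apos", "nbsp"}


def entities_to_latin1(text: str) -> str:
    """Replace named entities with direct chars where ISO-8859-1 has them."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in KEEP:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        # Unknown entity, or character outside ISO-8859-1: leave it alone.
        if codepoint is None or codepoint > 0xFF:
            return match.group(0)
        return chr(codepoint)

    return ENTITY.sub(replace, text)


if __name__ == "__main__":
    sample = "f&uuml;r &os; &amp; caf&eacute;"
    print(entities_to_latin1(sample))
```

The files would have to be read and written with encoding="iso-8859-1"
so that the direct chars end up in the encoding the documents declare.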

Gabor


