From: Gabor Kovesdan
Date: Sat, 29 Dec 2012 14:03:19 +0100
To: Ulrich Spörlein
Cc: doc@FreeBSD.org
Subject: Re: Please review, small SGML entity cleanup

On 2012.12.29. 13:48, Ulrich Spörlein wrote:
> On Sat, 2012-12-29 at 12:08:46 +0100, Gabor Kovesdan wrote:
>> On 2012.12.28. 18:14, Ulrich Spörlein wrote:
>>> The DE and FR articles are a hodgepodge of SGML entities and direct,
>>> 8bit chars, with the former being the majority. This patch cleans this
>>> up a little, although we should eventually switch this all to UTF-8,
>>> obviously.
>> Don't they work with direct chars? Once we made a step to that direction
> They probably will, and I have no clue why we used entities for German
> and French, but the usual encodings for Russian and Japanese, etc.
>
>> so this one would be one step back. If possible, it would be better to
>> convert the entities to direct chars instead of the opposite.
> In the end, sure. But that's a larger project of moving from
> de_DE.ISO8859-1 -> de_DE (with an implied UTF-8 encoding, as is required
> by XML anyway, the implied part, not the exact encoding).

It isn't required by XML. If you omit the encoding part of the XML
declaration, the content is treated as UTF-8 but it is not a requirement
at all.

> I don't think this commit is a step back, because the documents need to
> be converted using a long series of s/&uuml;/ü/g, anyway. And the
> current mish-mash is just weird.

I don't see any reason why we cannot do this conversion right now in
ISO-8859-1. (Actually, I did, but people kept introducing new redundant
entities.) ISO-8859-1 isn't any harder to type than UTF-8. If you want
consistency (which imho isn't that important at this point since there
are lots of upcoming changes) then why not move to the right direction
of consistency?

Gabor
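
The entity-to-character conversion discussed above could be scripted rather than written as one substitution per entity. What follows is only a minimal sketch in Python, not the tooling the project actually used. It assumes the character entities in the documents use the same names as the HTML 4 set shipped in Python's `html.entities` module. The function name `entities_to_chars` and the `KEEP_AS_ENTITY` set are hypothetical, introduced for illustration. Project-specific entities such as `&os;` are not in that table and are left untouched, as is any character the target encoding cannot represent.

```python
import re
from html.entities import name2codepoint

# Assumption: these stay as entities. The first four are markup-significant,
# and a literal non-breaking space would be invisible in an editor.
KEEP_AS_ENTITY = {"amp", "lt", "gt", "quot", "nbsp"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_chars(text, encoding="iso-8859-1"):
    """Replace named character entities with direct characters.

    An entity is replaced only if its name is known, it is not in
    KEEP_AS_ENTITY, and the character can be encoded in `encoding`.
    """
    def replace(match):
        name = match.group(1)
        if name in KEEP_AS_ENTITY or name not in name2codepoint:
            return match.group(0)  # unknown or protected: leave as-is
        char = chr(name2codepoint[name])
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return match.group(0)  # not representable in the target encoding
        return char

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(entities_to_chars("f&uuml;r &amp; &os;"))  # prints: für &amp; &os;
```

With the default `encoding="iso-8859-1"` the sketch matches the "convert now, in ISO-8859-1" position. Passing `encoding="utf-8"` would additionally convert entities such as `&euro;` that have no ISO-8859-1 representation.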