From: Ulrich Spörlein
Date: Fri, 3 Aug 2012 16:15:39 +0200
To: "Simon L. B. Nielsen"
Cc: doc@FreeBSD.org, Gabor Kovesdan, www@FreeBSD.org
Subject: Re: RFC: doc/www cleanup

On Fri, 2012-08-03 at 14:33:04 +0100, Simon L. B. Nielsen wrote:
> On Fri, Aug 3, 2012 at 12:02 PM, Gabor Kovesdan wrote:
> > 2. Relaxing character entity usage: To be able to read non-ASCII characters
> > on ASCII-only systems, we have been using character entities, like &aacute;.
> > But in CJK languages, Greek and Russian every character is non-ASCII, so in
> > practice entities could not be used there, and they were not. So they are
> > only used in ISO-8859 encodings (except Greek, which is also from this family).
> > In fact, displaying these Latin-based characters nowadays isn't that
> > problematic any more. Furthermore, if you edit text in a given language,
> > we can assume that you understand the language, so you know what you
> > should see and you know how to configure your system if you don't see the
> > desired result. As a result, these entities no longer have any real
> > advantage, but they heavily "pollute" the text and make it much harder to
> > edit and read. One
>
> I agree that the entities should generally not be used. I think we
> should just switch to UTF-8 as the character set wherever possible to
> simplify it even more.
>
> And on that note, kill the useless character-set part of all our
> language directory names, which produces horrible paths with no added
> value.
>
> > exception is using characters that aren't present in a specific
> > language, e.g. a non-English developer name in the English documentation.
>
> UTF-8 would fix that.

Last time I brought this up (trying to get rid of silly entities and the
bogus charset name of the directories), I was told that our toolchain didn't
fully grok UTF-8 yet, which was the reason we still had this de_DE.ISO8859-1
nonsense. The move to XML should really, really convert all files to UTF-8,
drop the charset suffix from the directory names, and get rid of entities
like &auml; or &eacute;, etc.

Just my two cents,
Uli
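For illustration only: a minimal Python sketch of the file conversion described above, assuming the sources are ISO-8859-1 and that only the standard HTML 4 named entities (Python's `html.entities.name2codepoint` table) should become literal characters. The function names are hypothetical, and the sketch does not update XML encoding declarations or rename directories. XML's predefined entities (`&amp;`, `&lt;`, ...) stay escaped because they are markup-significant, and unknown entities (e.g. project-defined ones) are left untouched.

```python
import re
from html.entities import name2codepoint

# XML's predefined entities must stay escaped; they carry markup meaning.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches simple named entity references such as &auml; or &eacute;.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text: str) -> str:
    """Replace known HTML 4 named entities with the literal character."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Keep markup-significant and unknown entities as they are.
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(sub, text)


def convert_to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode a legacy-encoded file, drop character entities, re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    return replace_named_entities(text).encode("utf-8")
```

For example, `convert_to_utf8("M&uuml;ller &amp; Sp\xf6rlein".encode("latin-1"))` returns the UTF-8 bytes of `Müller &amp; Spörlein`: the entity and the Latin-1 byte both become plain UTF-8 characters, while `&amp;` survives.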