From owner-freebsd-doc  Sat Feb 23 16:29:51 2002
Delivered-To: freebsd-doc@freebsd.org
Received: from eos.ocn.ne.jp (eos.ocn.ne.jp [210.190.142.171])
	by hub.freebsd.org (Postfix) with ESMTP id A469B37B41A
	for <freebsd-doc@FreeBSD.ORG>; Sat, 23 Feb 2002 16:29:17 -0800 (PST)
Received: from mail.hrslab.yi.org (p0775-ip01funabasi.chiba.ocn.ne.jp [61.119.148.13])
	by eos.ocn.ne.jp (OCN) with ESMTP id JAA21365;
	Sun, 24 Feb 2002 09:29:14 +0900 (JST)
Received: from localhost (alph.hrslab.yi.org [192.168.0.10])
	by mail.hrslab.yi.org (8.9.3/3.7W/DomainMaster) with ESMTP id JAA13667;
	Sun, 24 Feb 2002 09:13:26 +0900 (JST)
	(envelope-from hrs@eos.ocn.ne.jp)
Date: Sun, 24 Feb 2002 09:09:26 +0900 (JST)
Message-Id: <20020224.090926.85420473.hrs@eos.ocn.ne.jp>
To: sziszi@bsd.hu
Cc: freebsd-doc@FreeBSD.ORG
Subject: Re: Entities in translations
From: Hiroki Sato <hrs@eos.ocn.ne.jp>
In-Reply-To: <20020223115416.GA1152@fonix.adamsfamily.xx>
References: <20020223115416.GA1152@fonix.adamsfamily.xx>
X-Mailer: Mew version 2.1 on Emacs 20.7 / Mule 4.0 (HANANOEN)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-doc@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-doc.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-doc>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-doc>
X-Loop: FreeBSD.org

Szilveszter Adam <sziszi@bsd.hu> wrote
  in <20020223115416.GA1152@fonix.adamsfamily.xx>:

sziszi> So now, please advise. Which method should I follow? Should I stick to
sziszi> entities for non-ASCII characters? But then how do I make them display
sziszi> in the HTML rendering in their expanded form instead of the entity
sziszi> itself? Or should I just follow the lead of the non Latin-1 teams and
sziszi> start inputting these characters as-is? What is, for example the Greek
sziszi> Doc Project doing about this?

 For localized documents, I do not think it is always necessary to
 stick to entities for non-ASCII characters.  Almost all Japanese
 characters are non-ASCII, but we cannot use entities for them
 because there are far too many.

 The advantage of using entities lies mainly in the original (English)
 documents.  Each language used by the translation teams assigns its
 own character codes to non-ASCII characters, so if non-ASCII
 characters are included as-is, we may misinterpret what a
 character means.  In addition, translators often use localized
 tools (e.g. editors, web browsers) for their work, and such tools
 often cannot handle non-ASCII characters from other languages
 properly.
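 To illustrate the ambiguity, here is a minimal Python 3 sketch (an
 illustration only, not part of the FreeBSD doc tools).  The same byte
 0xE9 decodes to different characters under ISO-8859-1 and KOI8-R,
 while an entity reference names the character independently of any
 charset.  The helper to_entities() is hypothetical; it uses the HTML 4
 entity names from Python's html.entities module, which I assume match
 the ISO Latin-1 entity names used in our SGML sources, and falls back
 to numeric character references for everything else.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with an
    entity reference (named if HTML 4 defines one, numeric otherwise)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII, keep as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")                 # numeric character reference
    return "".join(out)


# One byte, two meanings, depending on which charset the reader's tools assume.
raw = b"\xe9"
latin1 = raw.decode("iso-8859-1")  # a Latin-1 letter (e with acute)
koi8 = raw.decode("koi8-r")        # a Cyrillic letter

print(to_entities(latin1))          # &eacute;
print(to_entities(koi8))            # a numeric reference, since HTML 4 has no name for it
print(to_entities("\u00c1d\u00e1m"))  # &Aacute;d&aacute;m
```

 For Japanese text nearly every character would end up as a numeric
 reference, which is why using entities is impractical there.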

 As you pointed out, this does not become a problem for translated
 documents.  I think you can use non-ASCII characters as-is to
 write your language, but I also think that wherever we can use
 entities we should, because entities do not mislead readers about
 the meaning of a character in non-English documents.  The Japanese
 team uses entities for Latin characters, even though our Japanese
 character set has its own set of Latin characters.

 Any other comments or suggestions?

--
| Hiroki Sato  <hrs@FreeBSD.org>
|              <hrs@eos.ocn.ne.jp>
