Date: Thu, 13 May 1999 21:14:58 +0100
From: Nik Clayton
To: doc@freebsd.org, freebsd-translate@ngo.org.uk
Subject: FDP Directory Reorganisation

Folks,

[ Sent to:

    doc@freebsd.org                 FreeBSD Documentation Project
    freebsd-translate@ngo.org.uk    FreeBSD Translation Teams

  Bcc'd (so that they don't get caught up in people doing "group replies" unless they want to) to:

    ache@freebsd.org     Andrey Chernov, listed as being responsible for "Internationalization" in the FreeBSD Handbook.
    terry@lambert.org    Terry Lambert. My addled brain recalls that Terry's occasionally posted messages about i18n and l10n issues to the FreeBSD mailing lists, and I thought he might have useful input to make.

  To Andrey and Terry: I'd be particularly interested in your thoughts about the sense and viability of organising the documentation by language and character set encoding that I outline here. ]

This is an attempt to set down all my thoughts about my plans for an FDP directory reorganisation so they can be critiqued. Comments welcome to the mailing list please.
_________________________________________________________________

Overview

The FreeBSD Documentation Project (FDP) directory structure has grown haphazardly over time.
This was tolerable when the FDP repository only contained English versions of the documentation. However, as more translations are added to the repository it becomes important to have a consistent directory naming scheme followed by each translation.

A consistent directory naming scheme will make it easier to write software that can automatically process FDP documentation without being told exactly where that documentation is in the tree; automated tools will be able to deduce this.

Moving existing content that conflicts with this scheme will make automated tools simpler, as they will not need to handle exceptions to the rules.

Finally, a consistent approach is much easier to document and to learn. Anything that can reduce the learning curve required before people can contribute to the FDP is a good thing.
_________________________________________________________________

Current situation

At the time of writing, the doc/ repository contains the following directories (ignoring empty directories):

    doc/
        FAQ/
        en/
            handbook/
            tutorials/
                docproj-primer/
                fonts/
                ...
            share/
                sgml/
        es/
            FAQ/
        ja/
            FAQ/
            handbook/
            man/
        ru/
            FAQ/
        share/
            sgml/
            mk/
        zh/
            FAQ/

There are a number of anomalies and potential problems with this structure. It also gets a few things right.

  * doc/FAQ is out of place. It is the English version of the FreeBSD FAQ, and is a holdover from when the repository only contained the English documentation.

  * The English tutorials are one level lower in the tree than the English Handbook. Any command that processes the documentation and relies on relative paths has to compensate for this before it runs. See the current DOC_PREFIX kludge for an example of this.

  * Some of the documentation in tutorials/ should not be considered to be tutorials. A more neutral term would better describe the content.

  * No attempt is made to specify the character set used to write the documentation.
    While this is not a problem for the English documentation, other languages, such as Japanese, Korean, and Chinese, have multiple character sets that could be used to encode the documentation. Some way of differentiating between these character sets should be provided, as should a mechanism for allowing multiple translations to the same language differing only in the choice of character set.

  * There is a proposed plan to split the Handbook up, and replace it with a number of smaller books with a tighter focus. The existing layout does not support this approach at all.

  * The use of share/ directories to contain files that are language neutral (in the first case) or can be used by all translations to a specific language (in the second case) is a good idea.
_________________________________________________________________

The change

Migrate to a new directory structure that follows this layout:

    doc/
        <lang>/
            <encoding>/
                articles/
                    fonts/
                    ...
                books/
                    FAQ/
                    FDP-primer/
                    printing/
                    ...
                man/
                    ...
                share/
                    sgml/
            share/
                ...
        share/
            sgml/
                ...
            mk/
                ...

There are two kinds of top level directory. <lang> represents the language code, as we currently use it. The language codes are defined in ISO 639, which can be found in /usr/share/misc/iso639 on a relatively recent FreeBSD system. The second top level directory is share/, which will contain language neutral files.

Under each <lang> directory is at least one directory named after the character set encoding used. This approach will be followed even if there is only one character set that could be considered ``standard'' for that language.

I understand that for some languages (such as English) this introduces an additional directory where one is not strictly needed. However, this will ensure that the SGML source files are kept at the same level in the directory tree relative to one another. This helps avoid ambiguities with relative paths, and the need to special-case between languages that have multiple possible character sets, and languages that need only one character set.
After all, a language with one character set is just a special case of a language that can be encoded in multiple character sets.

There might also be a share/ directory directly under <lang>, to contain files that can be shared by all translations to this language, regardless of character set.

Below the <encoding> directory the documentation is categorised further. There are three categories that each document might be in:

articles/
    An article is a short piece of documentation (although ``short'' is a relative term). In general, if the documentation does not contain any chapters then it is an article, and should be placed in a subdirectory of this directory.

    ``article'' is a neutral term that does not convey information about the nature of the information contained within the article (unlike ``tutorials'').

    Examples of existing documentation that would fall into this category are:

      + Using FreeBSD with other Operating Systems
      + ``Making the world'' your own
      + This document.

books/
    Books are longer sets of documentation, characterised by their organisation into multiple chapters.

    Examples of existing documentation that would fall into this category are:

      + FreeBSD FAQ
      + FreeBSD Handbook
      + FDP Primer

man/
    The system manual pages, translated to the target language.

    While it is feasible that the English manual pages could move out of the src/ repository and into doc/, I don't see this actually happening any time soon (certainly not within my lifetime). The historical pressure to keep them in src/ is too great.

share/
    Content that can be shared between different documentation, but is language and character set specific.

    For example, as a translation team translates the documentation there will be sections that haven't been translated yet. You can put the translation of the phrase ``This section has not been translated yet'' into a file in this directory, and then use a general entity to include it in all the documentation where it is necessary.
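
To illustrate the point about automated tools deducing where things are, here is a minimal sketch in Python of how a tool might take a document directory apart under the proposed layout, written here as doc/<lang>/<encoding>/<category>/<name>. None of this exists in the FDP tree; the names DocLocation, parse_doc_path, share_dirs and CATEGORIES are hypothetical, and the order in which share/ directories are listed (most specific first) is an assumption rather than part of the proposal.

```python
from pathlib import PurePosixPath
from typing import List, NamedTuple

# The three document categories named in the proposal.
# The constant name is hypothetical.
CATEGORIES = ("articles", "books", "man")


class DocLocation(NamedTuple):
    lang: str      # ISO 639 language code, e.g. "en"
    encoding: str  # character set encoding, e.g. "iso8859-1"
    category: str  # one of CATEGORIES
    name: str      # the document's own directory, e.g. "handbook"


def parse_doc_path(path: str) -> DocLocation:
    """Split doc/<lang>/<encoding>/<category>/<name> into its parts."""
    parts = PurePosixPath(path).parts
    if len(parts) != 5 or parts[0] != "doc":
        raise ValueError(f"not a document directory: {path!r}")
    _, lang, encoding, category, name = parts
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return DocLocation(lang, encoding, category, name)


def share_dirs(loc: DocLocation) -> List[str]:
    """List the share/ directories a document can draw on.

    Assumption: most specific first (encoding, then language, then global).
    """
    return [
        f"doc/{loc.lang}/{loc.encoding}/share",
        f"doc/{loc.lang}/share",
        "doc/share",
    ]
```

Because every document sits at the same depth, parse_doc_path("doc/ja/euc-jp/books/handbook") needs no per-language special cases, and a path from the old layout such as doc/FAQ is rejected rather than guessed at.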

So, there are three levels of shared content between the language projects: content that is shared globally (doc/share/), content that is specific to a particular language (doc/<lang>/share/), and content that is specific to a particular language and character set encoding (doc/<lang>/<encoding>/share/).

Each one of these share/ directories can (and will) contain subdirectories: share/sgml/ for SGML content, share/mk/ for includable Makefiles, and so on.

Based on the current doc/, the converted directory structure will look like this:

    doc/
        en/
            share/
                sgml/
            iso8859-1/
                articles/
                    writing-device-drivers/
                    programming-tools/
                    formatting-media/
                    ...
                books/
                    FAQ/
                    FDP-primer/
                    handbook/
                    ...
                share/
                    sgml/
        ja/
            share/
                sgml/
            euc-jp/
                books/
                    FAQ/
                    handbook/
                    ...
                man/
                    ...
                share/
                    sgml/
        zh/
            share/
                sgml/
            big5/
                books/
                    FAQ/
                share/
                    sgml/
            gb/
                books/
                    FAQ/
                share/
                    sgml/
        fr/
            share/
                sgml/
            iso8859-1/
                books/
                    handbook/
                share/
                    sgml/
        share/
            sgml/
            mk/

I don't know everything I need to know about i18n and l10n yet, so there may be some problems with the above example. For example, is euc-jp the correct name to use for a character set, or is there a more precise term for it (perhaps an ISO number?) that should be used instead?
_________________________________________________________________

Making the change

This is quite a large change, and will need careful thought about how to carry it out. In particular, we want to avoid bloating the CVS repository any more than we have to.

How files are moved will depend on their current DTD.

All documentation that is already marked up according to the DocBook DTD (and the manual pages) can be moved within the repository by the repository managers (Peter Wemm and John Polstra). Some of the Makefiles will then need small changes made to them to reflect the new directory names, but that should be about all.

All documentation that is marked up according to the LinuxDoc DTD is treated differently. The original files are left where they are.
Then, when the documentation is converted to DocBook, the original LinuxDoc files stay in place and the new DocBook files are stored in the new directories as appropriate. We will then have two versions of the document in the repository, one marked up in LinuxDoc, one in DocBook. The Makefiles can continue to point to the LinuxDoc version until the DocBook conversion has completed. When the DocBook conversion of a document has been completed, its LinuxDoc version can be removed.

The conversion will be complete when the last piece of LinuxDoc documentation has been removed from the tree.
_________________________________________________________________

Additional resources

I've found the following links useful while trying to find out more information about i18n and l10n.

http://czyborra.com/charsets/
    Lots of information about different character sets, the iso8859* characters, and so on.

http://www.ora.com/people/authors/lunde/cjk_inf.html
    The Chinese, Japanese, Korean information page has lots of information about how to encode these languages.

http://www.vlsivie.tuwien.ac.at/mike/8bit/FAQ-ISO-8859-1
    The ISO8859-1 FAQ contains useful information.

-- 
There's some milk in the fridge about to go off. . . and there it goes.