Date: Fri, 03 Aug 2012 13:02:39 +0200
From: Gabor Kovesdan <gabor@FreeBSD.org>
To: doc@FreeBSD.org
Cc: www@FreeBSD.org
Subject: RFC: doc/www cleanup
Message-ID: <501BAFCF.9010600@FreeBSD.org>
Hi Doc Fellows,

the XML migration that is in progress now is also a big cleanup that will probably simplify documentation authoring. While working on it I've encountered several old constructs and several things that made me think of further directions. I'd like to discuss these changes with you before proceeding with them:

1, Removing emacs PSGML comments:
PSGML is an emacs mode for SGML editing. Its behavior can be set either by SGML comments or separately with a configuration file (described in fdp-primer). Our documentation is littered with PSGML comments like this:

    <!--
         Local Variables:
         mode: sgml
         sgml-indent-data: t
         sgml-omittag: nil
         sgml-always-quote-attributes: t
         End:
    -->

XML requires tags to be closed and attributes to be always quoted, so this loses most of its utility, and these comments just confuse people who don't know what they mean. Indenting or any other specific option can be configured in the .emacs file. I propose dropping these comments.

2, Relaxing character entity usage:
To be able to read non-ASCII characters on ASCII-only systems, we have been using character entities, like &aacute;. But in CJK languages, Greek and Russian every character is non-ASCII, so in practice entities cannot be used there, nor were they ever used. So they are only used in ISO-8859 encodings (except Greek, which is also from this family). In fact, displaying these Latin-based characters isn't that problematic any more. Furthermore, if you edit text in a given language, we can assume that you understand the language, so you know what you should see and you know how to configure your system if you don't see the desired result. As a result, these entities no longer have any real advantage, but they heavily "pollute" the text and make it much harder to edit and read. One exception is characters that aren't present in the given language, e.g. a non-English developer name in the English documentation.
So I propose that every translation convert entities back to normal characters and keep only those for characters that aren't present in the given language. The abundance of character entities used to cause difficulties for new documentation people, especially for those who don't have much IT background. This change would make the texts more natural.

3, Preferring XML/XSLT over scripts:
Some parts of the web, like the A-Z index and sitemap pages, have their own format that is processed with shell scripts. It would be more consistent to use an XML data file with an XSLT stylesheet for this purpose. It would give us more flexibility for further changes and would reduce the number of different methods we use to generate things.

4, Stricter XHTML:
I don't propose going directly to XHTML 1.0 Strict, but there are very inconsistently marked up <hr/>'s, <table>'s, etc. I would like to make them more consistent and prefer CSS styling when applicable. There are also empty paragraphs used as line breaks, which should also be eliminated. This would give us a more consistent look and more structure-oriented webpage files.

And after the migration, I plan:

5, Identifying obsolete webpages:
There are moved pages, both in the English pages and in the translations, that only serve for redirection. These pages were moved a very long time ago, so any interested party could update her bookmarks. I would like to finally remove these. On the other hand, there are leftovers in translations, i.e. pages that were removed from the English web but not from the translations. I would like to generate a list of them and send patches to translation projects to clean these up.

Thanks in advance for your comments,

Gabor
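The conversion rule in item 2 can be illustrated with a small sketch. It is not part of the proposal and not an existing tool in the doc tree. It assumes that Python's `html.entities` table is an adequate stand-in for the entity set used in the documentation, and that "present in the given language" can be approximated by "encodable in that translation's character encoding". The name `relax_entities` and the `KEEP` set are hypothetical.

```python
import re
from html.entities import name2codepoint

# Assumption: these entities stay escaped. The first five are XML markup
# characters; nbsp is kept because the literal character is invisible.
KEEP = {"amp", "lt", "gt", "quot", "apos", "nbsp"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def relax_entities(text, encoding):
    """Replace named entities by literal characters where the target
    encoding can represent them; leave every other entity untouched."""

    def replace(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # reserved or unknown: keep as written
        char = chr(name2codepoint[name])
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return match.group(0)  # not representable: keep the entity
        return char

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    sample = "Kov&aacute;cs &amp; &scaron;"
    # ISO-8859-1 can represent a-acute but not s-caron.
    print(relax_entities(sample, "iso-8859-1"))
    # ISO-8859-2 can represent both.
    print(relax_entities(sample, "iso-8859-2"))
```

Under these assumptions, an entity survives only when its character cannot be written in the file's own encoding, which matches the exception described for names from other languages.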