Date: Tue, 25 Sep 2001 17:32:40 +0100
From: Nik Clayton <nik@freebsd.org>
To: Nik Clayton <nik@freebsd.org>
Cc: doc@freebsd.org, www@freebsd.org
Subject: Re: Branching www/ for XML development
Message-ID: <20010925173240.F31744@clan.nothing-going-on.org>
In-Reply-To: <20010922113521.W1162@clan.nothing-going-on.org>; from nik@freebsd.org on Sat, Sep 22, 2001 at 11:35:21AM +0100
References: <20010921001517.N1162@clan.nothing-going-on.org> <20010922113521.W1162@clan.nothing-going-on.org>
On Sat, Sep 22, 2001 at 11:35:21AM +0100, Nik Clayton wrote:
> On Fri, Sep 21, 2001 at 12:15:17AM +0100, Nik Clayton wrote:
> > I'm thinking about branching www/ to do some XML development work.
>
> Another alternative that occurred to me earlier today. Don't branch, but
> do the work in directories that parallel the doc/ locale directories.
Which people seem to think is a terrible idea, so we'll branch.
Here are some thoughts and jottings on what we can do with the next generation
web site.
Separation of content and formatting
------------------------------------
Where possible, I'd like the content that people author to be separated from
the formatting. We should provide:
* A standard template for people to write new pages
* Stylesheets, and classes, for specific presentation effects
* No author should need to write "<small>" in their document, or use
other formatting tricks like that. We either use CSS or, if browser
support is poor, we still use CSS in the documents and let the XSL
stylesheets embed the browser-specific formatting as necessary
* A library of reusable content that can be dropped in to the page
* Facilities to help ensure that internal links are correct
Page layout
-----------
We should adopt a standard layout for each page wherever possible. In some
places this won't be immediately possible -- the output from some CGI scripts,
the results of doing DocBook -> HTML conversion, and so on. But we should try
to do so for all static and semi-static content.
Definition: Static content is a page that doesn't change each time it
            is converted to HTML.
            Semi-static content is a page that does change when it is
            converted, but only because it syndicates content from other
            pages. It does not change as a result of a user's choice (i.e.,
            it's not CGI).
A rough layout sees the page divided into five areas:
+----------------------------------------------------------------------+
| +------------------------------------------------------------------+ |
| |                                                                  | |
| |                              Header                              | |
| |                                                                  | |
| +------------------------------------------------------------------+ |
| +------------+ +------------------------------------+ +------------+ |
| |            | |                                    | |            | |
| |  Left nav  | |                                    | | Right nav  | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                Body                | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| |            | |                                    | |            | |
| +------------+ +------------------------------------+ +------------+ |
| +------------------------------------------------------------------+ |
| |                                                                  | |
| |                              Footer                              | |
| |                                                                  | |
| +------------------------------------------------------------------+ |
+----------------------------------------------------------------------+
Probable content
    Header:     Static across all pages. Contains the logo and the drop-down
                to select languages/mirrors.
    Left nav:   Provides the top level "you are here" indicator, the search
                box, and perhaps some key links that we want to appear on
                every page, irrespective of the other content of the page --
                perhaps links to the FAQ and Handbook, maybe current release
                information, and so on. Content authors do not get to change
                this content.
    Body:       The body copy of the page.
    Right nav:  Links to information that is pertinent to the content of the
                body. Typically:
                  * Links to related pages on the FreeBSD web site
                  * Links to related pages on other web sites
                  * Content syndicated from other portions of the site (e.g.,
                    news headlines).
                We should provide standard ways for authors to include this
                content, particularly content from other portions of the
                FreeBSD site.
    Footer:     Contact details, modification time, version number, a "report
                a problem about this page" link, copyright notice. Cannot be
                modified by the content authors.
Page description language
-------------------------
To enforce the style guidelines, don't let authors write a complete HTML
page. Instead, give them a cut-down markup language suitable for what they
need to do, allowing us to enforce the look and feel.
My experiments so far have used the following schema:
<page>
  <title>Title goes here</title>
  <!-- Becomes the HTML "<title>", as well as the top level heading in
       the body copy -->
  <section name="news"/>
  <!-- Specifies what section of the site this page is in, for the
       purposes of highlighting in the "You are here" left hand
       navigator. We can't do this based on the content's directory,
       because we might want to split content that's in the same
       directory into different logical "sections". E.g., the project
       news pages and press pages are in the "news/" directory. But one
       of them is the "news" section, the other is the "press"
       section. -->
  <cvs:keywords xmlns:cvs="...">
    <cvs:keyword name="freebsd">$FreeBSD$</cvs:keyword>
  </cvs:keywords>
  <!-- CVS keywords. Put these in an element so we can find them in
       the stylesheets, rather than relying on structured comments,
       and other hacks. The URI to use in the xmlns declaration is to be
       confirmed. -->
  <body>
    <!-- Body copy goes here. All HTML (formally, XHTML 1.0) is valid.
         In theory, this includes letting the author write things like
         <html> ... </html>
         If they want to shoot themselves in the foot like that, let
         them. -->
  </body>
  <!-- Zero or more <sidebar> elements. Each element may have a "class"
       attribute, in which case it contains standard content that
       the stylesheets will generate (and possibly document-specific
       content as well). -->
  <sidebar class="release-info"/>
  <!-- Includes a "release info" box, with information about the current
       release, and links to the announcement, errata, and so on. -->
  <sidebar class="news-headlines"/>
  <!-- Syndicates the news headlines from the news pages, and includes
       the top 'n' here. We probably need a mechanism for specifying what
       value 'n' should have ('5', '10', etc). -->
  <sidebar class="press-stories" count="5"/>
  <!-- Syndicates the names of the most recent articles in the press,
       in a similar fashion to the news-headlines class.
       This is one way we could specify how many items to include. -->
  <sidebar class="related-local">
    <!-- Contains links to pages of related information on this web site.
         Links are wrapped up in a <links> container, with a <link> element
         denoting each link, like this. -->
    <links>
      <link href="pressreleases.html">Press Releases</link>
      <link href="../publish.html#newsletter">Newsletter</link>
      <link href="press.html">Press articles</link>
      <link href="status/status.html">Status reports</link>
    </links>
  </sidebar>
  <sidebar class="related-web">
    <!-- Contains links to pages of related information on other web sites.
         "related-local" and "related-web" are kept separate so that we can
         visually distinguish between them on the web site, include
         disclaimers, and so on. -->
    <links>
      <link href="http://www.daemonnews.org/">DaemonNews</link>
    </links>
  </sidebar>
  <sidebar>
    <!-- Use this format when the author wants to include information
         not covered by the standard classes. -->
    <title>The title</title>
    <body>
      <!-- HTML -->
    </body>
  </sidebar>
</page>
Comments? We probably need some additional <sidebar> classes.
"security-advisories", that sort of thing. Any suggestions?
I also thought we might want to support <sidebar align="left"> and
align="right", to let authors choose which of the left/right nav bars the
content goes in. But I think for consistency it might be better to just
insist on sidebars only appearing on the right.
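To make the shape of the <page> schema concrete, here is a minimal sketch (in Python rather than XSLT, purely for illustration) of reading a page and listing its title, section, and sidebars. The sample document and the function name summarise_page are assumptions, not part of the proposal, and the cvs:keywords element is left out because its namespace URI is still to be confirmed.

```python
import xml.etree.ElementTree as ET

# A cut-down <page> document in the schema sketched above (sample content
# is made up; cvs:keywords is omitted because its namespace URI is TBC).
PAGE = """\
<page>
  <title>Project news</title>
  <section name="news"/>
  <body><p>Body copy goes here.</p></body>
  <sidebar class="release-info"/>
  <sidebar class="press-stories" count="5"/>
  <sidebar>
    <title>The title</title>
    <body><p>Custom content.</p></body>
  </sidebar>
</page>
"""


def summarise_page(xml_text):
    """Return the title, section name, and sidebars of a <page> document."""
    page = ET.fromstring(xml_text)
    sidebars = []
    for sidebar in page.findall("sidebar"):
        count = sidebar.get("count")
        sidebars.append({
            # Standard sidebars carry a class; free-form ones do not.
            "class": sidebar.get("class"),
            # "count" is optional, so record None when it is absent.
            "count": int(count) if count else None,
        })
    return {
        # findtext("title") only looks at direct children of <page>, so
        # the <title> inside a free-form sidebar is not picked up here.
        "title": page.findtext("title"),
        "section": page.find("section").get("name"),
        "sidebars": sidebars,
    }


if __name__ == "__main__":
    print(summarise_page(PAGE))
```

A real stylesheet would do the equivalent selection with XPath; the point here is only that every piece the layout needs (title, section, sidebars) is addressable without inspecting the body copy.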
Building the web site
---------------------
The majority of files on the web site will be XML, marked up in the schema
outlined above (or whatever schema we eventually decide on). For those files,
we will need a stylesheet that can convert them to XHTML.
However, some of the files will need an intermediate step first. For example,
the current www/en/news/news.xml is a completely different schema. We will
need two stylesheets: one to convert news.xml into the <page> schema outlined
above, and then the 'global' stylesheet that can convert <page>s into HTML.
This suggests to me that naming everything with a .xml extension might not be
the best way to do it. If everything has a .xml extension, irrespective of
its schema, then we can't write Makefiles that use suffix rules to
automatically convert documents to HTML.
For example, consider the top level index file (the site's home page), which
we'll call www/en/index.xml. The make(1) rule for this would be something
like
    index.html: index.xml
            xsltproc stylesheet.xsl index.xml > index.html
Which we could turn in to a generic XML -> HTML suffix rule easily.
However, this breaks down when you need to pre-process the files through
multiple stylesheets. To use the news example, where news.xml has to be
pre-processed, the Makefile fragment looks something like this:
    # news-page.xsl is the stylesheet that converts news.xml into the
    # <page> schema.
    news.html: news-page.xsl news.xml
            xsltproc news-page.xsl news.xml | \
                xsltproc stylesheet.xsl - > news.html
Perhaps, instead, we should say that pages have a .page suffix, and that
generic XML data has the .xml suffix. Further, we mandate that for every
foo.xml there must also be a foo.xsl that can convert foo.xml in to foo.page.
The news example above then becomes:
    news.html: news.page
            xsltproc stylesheet.xsl news.page > news.html
    news.page: news.xsl news.xml
            xsltproc news.xsl news.xml > news.page
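This can easily be encapsulated in make's suffix rules:
    .SUFFIXES: .xml .page .html
    # .page -> .html: every page goes through the global stylesheet.
    .page.html:
            xsltproc stylesheet.xsl ${.IMPSRC} > ${.TARGET}
    # .xml -> .page: foo.xml is converted by its own foo.xsl.
    .xml.page:
            xsltproc ${.IMPSRC:R}.xsl ${.IMPSRC} > ${.TARGET}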
Of course, this assumes that we can go from .xml to .page in one go, without
needing any more intermediate translations. I think that's probably the case,
but if you can think of a counter example then please say so.
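The two-stage chain (.xml to .page, then .page to .html) can also be sketched outside make, which may help when checking whether one intermediate step really is enough. This is an illustrative Python sketch, not a proposed build tool; the function name build_commands and the idea of passing in the set of existing files are assumptions.

```python
def build_commands(stem, sources):
    """Return the xsltproc commands needed to build <stem>.html.

    `sources` is the set of file names that already exist. A page that
    is authored directly as <stem>.page needs one step; generic XML data
    in <stem>.xml needs its own <stem>.xsl to produce the page first.
    """
    commands = []
    page = stem + ".page"
    if page not in sources:
        xml = stem + ".xml"
        if xml not in sources:
            raise FileNotFoundError(f"no {page} or {xml} to build from")
        # .xml -> .page uses the per-file stylesheet, e.g. news.xsl.
        commands.append(f"xsltproc {stem}.xsl {xml} > {page}")
    # .page -> .html always uses the one global stylesheet.
    commands.append(f"xsltproc stylesheet.xsl {page} > {stem}.html")
    return commands


if __name__ == "__main__":
    for line in build_commands("news", {"news.xml", "news.xsl"}):
        print(line)
```

Anything that needed a second intermediate schema would not fit this function without another branch, which is the same limitation the suffix rules have.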
There will still be a few oddities. For example, news.rdf is generated from
news-rdf.xsl and news.xml, but I think they'll be so rare that we can special
case them in the Makefiles as necessary.
Dependencies are going to be another interesting challenge. The HTML output
file does not (necessarily) only depend on the input .page file. For example,
anything that has a
<sidebar class="news-headlines"/>
in it has a dependency on en/news/news.xml. I think this means:
(a) We will need a "make depend" step in the build process.
(b) We will need to write a tool that can process a .page file, chase
down all its dependencies, and write that out in a format that make(1)
can understand.
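A rough sketch of what the tool in (b) might look like, assuming dependencies can be derived from sidebar classes alone. Only the news-headlines mapping comes from the example above; the press-stories path, the table name SIDEBAR_SOURCES, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Maps a sidebar class to the files its content is syndicated from.
# Only the news-headlines entry is taken from the discussion above.
SIDEBAR_SOURCES = {
    "news-headlines": ["en/news/news.xml"],
    "press-stories": ["en/news/press.xml"],  # hypothetical path
}


def page_dependencies(page_xml):
    """Return the sorted extra files that a .page document depends on."""
    root = ET.fromstring(page_xml)
    deps = set()
    for sidebar in root.iter("sidebar"):
        # Sidebars without a class, or with an unknown class, add nothing.
        deps.update(SIDEBAR_SOURCES.get(sidebar.get("class"), []))
    return sorted(deps)


def make_rule(page_path, page_xml):
    """Format the dependencies as a make(1) rule with no commands."""
    target = page_path.rsplit(".", 1)[0] + ".html"
    prereqs = [page_path] + page_dependencies(page_xml)
    return f"{target}: {' '.join(prereqs)}"


if __name__ == "__main__":
    sample = '<page><title>Home</title><sidebar class="news-headlines"/></page>'
    print(make_rule("index.page", sample))
```

The output of a "make depend" run would then be one such line per page, collected into a file that the Makefile includes. Dependencies introduced by other means (for example, links whose text is pulled from the target page) would need extra handling.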
Link checking
-------------
For internal links, I'm thinking about introducing a new <link> element. It
would have all the same attributes and functionality as the existing <a>
element in HTML. But if the target is local and the <link> has no content,
the stylesheet would parse the target document to extract its title and use
that as the link text, similar to <xref>'s processing expectations in DocBook.
Comments?
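To illustrate the behaviour I have in mind for <link>, here is a small Python sketch of the lookup. In the real build the stylesheet would do this; the function name resolve_link and the load_document callback (which maps an href to the XML source of the local page it points at) are assumptions for the example, as is treating any href containing "://" as non-local.

```python
import xml.etree.ElementTree as ET


def resolve_link(link_xml, load_document):
    """Return (href, text) for a <link> element.

    `load_document` is a hypothetical callback that takes a local href
    (without any #fragment) and returns the XML source of that page.
    """
    link = ET.fromstring(link_xml)
    href = link.get("href", "")
    text = (link.text or "").strip()
    if not text and "://" not in href:
        # Empty local link: borrow the target page's <title>.
        path = href.split("#", 1)[0]
        target = ET.fromstring(load_document(path))
        text = target.findtext("title", default=href)
    return href, text


if __name__ == "__main__":
    pages = {"press.html": "<page><title>Press articles</title></page>"}
    print(resolve_link('<link href="press.html"/>', pages.__getitem__))
```

A side effect of this approach is that an empty <link> to a page that doesn't exist fails at build time (here, with a KeyError from the lookup), which would give us some of the internal link checking for free.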
N
--
FreeBSD: The Power to Serve http://www.freebsd.org/
FreeBSD Documentation Project http://www.freebsd.org/docproj/
--- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 ---
