FreeBSD Mail Archives

Date:      Tue, 25 Sep 2001 17:32:40 +0100
From:      Nik Clayton <nik@freebsd.org>
To:        Nik Clayton <nik@freebsd.org>
Cc:        doc@freebsd.org, www@freebsd.org
Subject:   Re: Branching www/ for XML development
Message-ID:  <20010925173240.F31744@clan.nothing-going-on.org>
In-Reply-To: <20010922113521.W1162@clan.nothing-going-on.org>; from nik@freebsd.org on Sat, Sep 22, 2001 at 11:35:21AM +0100
References:  <20010921001517.N1162@clan.nothing-going-on.org> <20010922113521.W1162@clan.nothing-going-on.org>


On Sat, Sep 22, 2001 at 11:35:21AM +0100, Nik Clayton wrote:
> On Fri, Sep 21, 2001 at 12:15:17AM +0100, Nik Clayton wrote:
> > I'm thinking about branching www/ to do some XML development work.
>
> Another alternative that occurred to me earlier today.  Don't branch, but
> do the work in directories that parallel the doc/ locale directories.

Which people seem to think is a terrible idea, so we'll branch.

Here are some thoughts and jottings on what we can do with the next generation
web site.

Separation of content and formatting
------------------------------------

Where possible, I'd like the content that people author to be separated from
the formatting.  We should provide:

  * A standard template for people to write new pages
  * Stylesheets, and classes, for specific presentation effects
  * No author should need to write "<small>" in their document, or use
    other formatting tricks like that.  Authors use CSS; where browser
    support for CSS is poor, the XSL stylesheets embed the browser-specific
    formatting in the generated HTML as necessary
  * A library of reusable content that can be dropped into the page
  * Facilities to help ensure that internal links are correct

Page layout
-----------

We should adopt a standard layout for each page wherever possible.  In some
places this won't be immediately possible -- the output from some CGI scripts,
the results of doing DocBook -> HTML conversion, and so on.  But we should try
to do so for all static and semi-static content.

  Definition: static content is a page that doesn't change each time it
              is converted to HTML.

              semi-static content is a page that does change when it is
	      converted, but only because it syndicates content from other
	      pages.  It does not change as a result of a user's choice (i.e.,
	      it's not CGI).

A rough layout sees the page divided into five areas:

 +----------------------------------------------------------------------+
 | +------------------------------------------------------------------+ |
 | |                                                                  | |
 | |                            Header                                | |
 | |                                                                  | |
 | +------------------------------------------------------------------+ |
 | +------------+ +------------------------------------+ +------------+ |
 | |            | |                                    | |            | |
 | |  Left nav  | |                                    | | Right nav  | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |              Body                  | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | |            | |                                    | |            | |
 | +------------+ +------------------------------------+ +------------+ |
 | +------------------------------------------------------------------+ |
 | |                                                                  | |
 | |                             Footer                               | |
 | |                                                                  | |
 | +------------------------------------------------------------------+ |
 +----------------------------------------------------------------------+

Probable content

  Header:	Static across all pages.  Contains the logo and the drop-down
		menu for selecting languages/mirrors.

  Left nav:	Provides the top level "you are here" indicator, the search
		box, and perhaps some key links that we want to appear on
		every page, irrespective of the other content of the page --
		perhaps links to the FAQ and Handbook, maybe current release
		information, and so on.  Content authors do not get to change
		this content.

  Body:		The body copy of the page.

  Right nav:	Links to information that is pertinent to the content of the
		body.  Typically:

		  * Links to related pages on the FreeBSD web site

		  * Links to related pages on other web sites

		  * Content syndicated from other portions of the site (e.g.,
		    news headlines).

		We should provide standard ways for authors to include this
		content, particularly content from other portions of the
		FreeBSD site.

  Footer:	Contact details, modification time, version number, a "report
		a problem about this page" link, copyright notice.  Cannot be
		modified by the content authors.
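In the real build this assembly would live in the XSL stylesheets.  Purely to
make the division of responsibility concrete, here is a minimal Python sketch,
assuming a simple div-per-area XHTML skeleton.  The class names, the SECTIONS
list and the placeholder header/footer text are all hypothetical.  Only the
body and the right nav come from the author; everything else is supplied by
the site.

```python
from html import escape

# Placeholder site-wide fragments; authors never edit these.
HEADER = '<div class="header">logo, language/mirror selector</div>'
FOOTER = '<div class="footer">contact, modification time, copyright</div>'

# Hypothetical top-level sections for the "you are here" navigator.
SECTIONS = ["news", "press", "docs", "support"]


def left_nav(current):
    """Fixed navigator; only the highlighted section varies per page."""
    items = "".join(
        f'<li class="{"here" if name == current else "other"}">{name}</li>'
        for name in SECTIONS
    )
    return f'<div class="left-nav"><ul>{items}</ul></div>'


def render_page(title, section, body_html, right_nav_html):
    """Assemble the five areas; only body and right nav are author content."""
    return "\n".join([
        f"<html><head><title>{escape(title)}</title></head><body>",
        HEADER,
        left_nav(section),
        f'<div class="body"><h1>{escape(title)}</h1>{body_html}</div>',
        f'<div class="right-nav">{right_nav_html}</div>',
        FOOTER,
        "</body></html>",
    ])
```

Because render_page() takes no header, left nav or footer arguments, an
author has no way to change them, which is the point of the layout.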

Page description language
-------------------------

To enforce the style guidelines, don't let authors write a complete HTML
page.  Instead, give them a cut-down markup language suitable for what they
need to do, allowing us to enforce the look and feel.

My experiments so far have used the following schema:

  <page>
    <title>Title goes here</title>

      <!-- Becomes the HTML "<title>", as well as the top level heading in
           the body copy -->

    <section name="news"/>

      <!-- Specifies what section of the site this page is in, for the
           purposes of highlighting in the "You are here" left hand
           navigator.  We can't do this based on the content's directory,
           because we might want to split content that's in the same
           directory into different logical "sections".  E.g., the project
           news pages and press pages are in the "news/" directory.  But one
           of them is the "news" section, the other is the "press"
           section. -->

    <cvs:keywords xmlns:cvs="...">
      <cvs:keyword name="freebsd">$FreeBSD$</cvs:keyword>
    </cvs:keywords>

      <!-- CVS keywords.  Put these in an element so we can find them in
           the stylesheets, rather than relying on structured comments,
	   and other hacks.  The URI to use in the xmlns declaration is to be
	   confirmed. -->

    <body>
      <!-- Body copy goes here.  All HTML (formally, XHTML 1.0) is valid.
           In theory, this includes letting the author write things like

             <html> ... </html>

           If they want to shoot themselves in the foot like that, let
           them. -->
    </body>

    <!-- Zero or more <sidebar> elements.  Each element may have a "class"
         attribute, in which case it contains standard content that
         the stylesheets will generate (and possibly document-specific
         content as well). -->

    <sidebar class="release-info"/>

      <!-- Includes a "release info" box, with information about the current
           release, and links to the announcement, errata, and so on. -->

    <sidebar class="news-headlines"/>

      <!-- Syndicates the news headlines from the news pages, and includes
           the top 'n' here.  We probably need a mechanism for specifying what
           value 'n' should have ('5', '10', etc). -->

    <sidebar class="press-stories" count="5"/>

      <!-- Syndicates the names of the most recent articles in the press,
           in a similar fashion to the news-headlines class.

           This is one way we could specify how many items to include. -->

    <sidebar class="related-local">

      <!-- Contains links to pages of related information on this web site.
           Links are wrapped up in a <links> container, with a <link> element
           denoting each link, like this. -->

      <links>
        <link href="pressreleases.html">Press Releases</link>
        <link href="../publish.html#newsletter">Newsletter</link>
        <link href="press.html">Press articles</link>
        <link href="status/status.html">Status reports</link>
      </links>
    </sidebar>

    <sidebar class="related-web">

      <!-- Contains links to pages of related information on other web sites.
           "related-local" and "related-web" are kept separate so that we can
           visually distinguish between them on the web site, include
           disclaimers, and so on. -->

      <links>
        <link href="http://www.daemonnews.org/">DaemonNews</link>
      </links>
    </sidebar>

    <sidebar>

      <!-- Use this format when the author wants to include information
           not covered by the standard classes. -->

      <title>The title</title>

      <body>
        <!-- HTML -->
      </body>
    </sidebar>
  </page>
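To check that the schema is easy to consume, here is a minimal Python sketch
that pulls out the parts the stylesheets care about, using the standard
library's xml.etree.ElementTree.  It assumes the element names above; the
function names are hypothetical, and since the cvs namespace URI is still to
be confirmed, it matches the keyword element by its local name rather than by
URI.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def parse_page(text):
    """Extract title, section, CVS keywords and sidebars from a <page>."""
    root = ET.fromstring(text)
    if root.tag != "page":
        raise ValueError("root element must be <page>")

    title = root.findtext("title")  # direct child only, not sidebar titles
    if not title or not title.strip():
        raise ValueError("<page> needs a non-empty <title>")

    section = root.find("section")
    keywords = {
        el.get("name"): el.text
        for el in root.iter()
        if local_name(el.tag) == "keyword"
    }
    sidebars = [
        {"class": sb.get("class"), "count": sb.get("count")}
        for sb in root.findall("sidebar")
    ]
    return {
        "title": title.strip(),
        "section": section.get("name") if section is not None else None,
        "cvs": keywords,
        "sidebars": sidebars,
    }
```

A page with no <title> is rejected outright, which is one of the ways a
cut-down schema lets us enforce the house style before any HTML is produced.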

Comments?  We probably need some additional <sidebar> classes.
"security-advisories", that sort of thing.  Any suggestions?

I also thought we might want to allow <sidebar align="left"> and
align="right", if we wanted to let authors choose which of the left/right
nav bars the content goes in.  But I think for consistency it might be
better to just insist on sidebars only appearing on the right.
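One way to keep the set of standard sidebar classes manageable is a single
registry, so that adding something like "security-advisories" is a one-line
change and a misspelled class fails loudly instead of silently producing an
empty box.  A minimal Python sketch, assuming a default count of 5 and a
hypothetical en/news/press.xml source for press stories (only
en/news/news.xml is named in this proposal):

```python
import xml.etree.ElementTree as ET

DEFAULT_COUNT = 5  # assumed default when count="..." is omitted

# Hypothetical registry: class name -> file it syndicates items from
# (None for classes that do not syndicate a list of items).
STANDARD_CLASSES = {
    "release-info": None,
    "news-headlines": "en/news/news.xml",
    "press-stories": "en/news/press.xml",  # hypothetical source file
    "related-local": None,
    "related-web": None,
}


def sidebar_plan(sidebar):
    """Validate one <sidebar> element; return (class, source, count)."""
    cls = sidebar.get("class")
    if cls is None:
        # Free-form sidebar: the author supplies <title> and <body>.
        return ("custom", sidebar.findtext("title"), None)
    if cls not in STANDARD_CLASSES:
        raise ValueError(f"unknown sidebar class {cls!r}")
    source = STANDARD_CLASSES[cls]
    count = None
    if source is not None:
        count = int(sidebar.get("count", DEFAULT_COUNT))
        if count < 1:
            raise ValueError(f"{cls}: count must be at least 1")
    return (cls, source, count)


if __name__ == "__main__":
    print(sidebar_plan(ET.fromstring('<sidebar class="press-stories" count="3"/>')))
```

There is deliberately no align attribute here: every sidebar goes in the
right nav, as suggested above.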

Building the web site
---------------------

The majority of files on the web site will be XML, marked up in the schema
outlined above (or whatever schema we eventually decide on).  For those files,
we will need a stylesheet that can convert them to XHTML.

However, some of the files will need an intermediate step first.  For example,
the current www/en/news/news.xml uses a completely different schema.  We will
need two stylesheets: one to convert news.xml into the <page> schema outlined
above, and then the 'global' stylesheet that can convert <page>s into HTML.

This suggests to me that naming everything with a .xml extension might not be
the best way to do it.  If everything has a .xml extension, irrespective of
its schema, then we can't write Makefiles that use suffix rules to
automatically convert documents to HTML.

For example, consider the top level index file (the site's home page), which
we'll call www/en/index.xml.  The make(1) rule for this would be something
like

    index.html: index.xml
            xsltproc stylesheet.xsl index.xml > index.html

We could easily turn this into a generic XML -> HTML suffix rule.

However, this breaks down when you need to pre-process the files through
multiple stylesheets.  To use the news example, where news.xml has to be
pre-processed, the Makefile fragment looks something like this:

    # news-page.xsl is the stylesheet that converts news.xml into the
    # <page> schema.
    news.html: news-page.xsl news.xml
            xsltproc news-page.xsl news.xml | \
	            xsltproc stylesheet.xsl - > news.html

Perhaps, instead, we should say that pages have a .page suffix, and that
generic XML data has the .xml suffix.  Further, we mandate that for every
foo.xml there must also be a foo.xsl that can convert foo.xml into foo.page.
The news example above then becomes

    news.html: news.page
            xsltproc stylesheet.xsl news.page > news.html

    news.page: news.xsl news.xml
            xsltproc news.xsl news.xml > news.page

This can easily be encapsulated in make(1) suffix rules:

    .SUFFIXES: .xml .page .html

    # .page -> .html, using the global stylesheet
    .page.html:
            xsltproc stylesheet.xsl ${.IMPSRC} > ${.TARGET}

    # foo.xml -> foo.page, using the matching foo.xsl
    .xml.page:
            xsltproc ${.IMPSRC:R}.xsl ${.IMPSRC} > ${.TARGET}

Of course, this assumes that we can go from .xml to .page in one go, without
needing any more intermediate translations.  I think that's probably the case,
but if you can think of a counter example then please say so.
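To make the one-intermediate-step assumption concrete, here is a hypothetical
Python sketch of the plan the two suffix rules produce for a given .html
target.  It assumes the set of existing source files is passed in and that
stylesheet.xsl is the global stylesheet; it does not run anything, it only
lists the xsltproc commands in the order make(1) would run them.

```python
import posixpath

GLOBAL_STYLESHEET = "stylesheet.xsl"  # the <page> -> XHTML stylesheet


def build_steps(target, sources):
    """List the xsltproc commands the suffix rules would run for target.

    sources is the set of files that exist in the tree.  A foo.page is used
    directly if present; otherwise it is generated from foo.xml via foo.xsl.
    """
    base, ext = posixpath.splitext(target)
    if ext != ".html":
        raise ValueError(f"{target}: only .html targets are handled")
    page = base + ".page"
    steps = []
    if page not in sources:
        xml = base + ".xml"
        if xml not in sources:
            raise ValueError(f"{target}: no {page} or {xml} to build from")
        steps.append(f"xsltproc {base}.xsl {xml} > {page}")
    steps.append(f"xsltproc {GLOBAL_STYLESHEET} {page} > {target}")
    return steps
```

A document that needed a second intermediate step would not fit this
two-step shape, which is exactly the kind of counter example asked for.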

There will still be a few oddities.  For example, news.rdf is generated from
news-rdf.xsl and news.xml, but I think they'll be so rare that we can special
case them in the Makefiles as necessary.
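The foo.xml / foo.xsl mandate is easy to check mechanically, which would
catch a missing stylesheet before make(1) fails halfway through a build.  A
minimal Python sketch (the function names are hypothetical); extra
stylesheets such as news-rdf.xsl are simply ignored, so the special cases do
not trip it.

```python
import os
import sys


def missing_stylesheets(paths):
    """Return every foo.xml in paths that has no foo.xsl beside it."""
    present = set(paths)
    return sorted(
        p for p in present
        if p.endswith(".xml") and p[: -len(".xml")] + ".xsl" not in present
    )


def tree_files(root):
    """Yield every file under root as a path relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    bad = missing_stylesheets(tree_files(top))
    for path in bad:
        print(f"{path}: no matching .xsl")
    sys.exit(1 if bad else 0)
```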

Dependencies are going to be another interesting challenge.  The HTML output
file does not (necessarily) depend only on the input .page file.  For example,
anything that has a

    <sidebar class="news-headlines"/>

in it has a dependency on en/news/news.xml.  I think this means:

  (a)  We will need a "make depend" step in the build process.

  (b)  We will need to write a tool that can process a .page file, chase
       down all its dependencies, and write them out in a format that
       make(1) can understand.
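A first cut at the tool in (b) could be quite small.  This Python sketch is
hypothetical throughout: the SIDEBAR_DEPS table (only the news-headlines ->
en/news/news.xml entry comes from this proposal), the assumption that paths
are relative to the top of www/, and the choice to list stylesheet.xsl as a
dependency of every page.  It prints one make(1) dependency line per .page
file, suitable for redirecting into a file the Makefile includes.

```python
import posixpath
import sys
import xml.etree.ElementTree as ET

GLOBAL_STYLESHEET = "stylesheet.xsl"

# Hypothetical map from sidebar class to the files its content comes from.
SIDEBAR_DEPS = {
    "news-headlines": ["en/news/news.xml"],
}


def page_deps(page_path, text):
    """Return the files that page_path's HTML output depends on."""
    deps = [page_path, GLOBAL_STYLESHEET]
    for sidebar in ET.fromstring(text).iter("sidebar"):
        for dep in SIDEBAR_DEPS.get(sidebar.get("class"), []):
            if dep not in deps:
                deps.append(dep)
    return deps


def make_rule(page_path, text):
    """Format the dependencies as a single make(1) dependency line."""
    target = posixpath.splitext(page_path)[0] + ".html"
    return f"{target}: " + " ".join(page_deps(page_path, text))


if __name__ == "__main__":
    # e.g. python pagedeps.py en/index.page en/news/press.page
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(make_rule(path, f.read()))
```

This only looks one level deep; if a syndicated file such as news.xml were
itself generated, its own dependencies would still need chasing.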

Link checking
-------------

For internal links, I'm thinking about introducing a new <link> element.  It
would have all the same attributes and functionality as the existing <a>
element in HTML.  But, if the target was local, and the <link> had no
content, the stylesheet would parse the target document to extract its title.
This is similar to the processing expectations for <xref> in DocBook.
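The title extraction could work roughly like this.  A minimal Python sketch
with xml.etree.ElementTree, assuming that a link to foo.html is authored as
foo.page alongside it and that anything containing "://" or starting with
"mailto:" is external; resolve_links and its load parameter are hypothetical.
A missing target surfaces as an error from load, which doubles as internal
link checking.

```python
import posixpath
import xml.etree.ElementTree as ET


def is_local(href):
    """Treat anything with a scheme or a mailto: as an external link."""
    return "://" not in href and not href.startswith("mailto:")


def resolve_links(body, base_dir, load):
    """Give every empty local <link> the <title> of the page it points at.

    load(path) must return the text of the target .page file; a missing
    file should raise, which is what turns this into a link checker.
    """
    for link in body.iter("link"):
        href = link.get("href", "")
        empty = not (link.text or "").strip() and len(link) == 0
        if not (is_local(href) and empty):
            continue
        target = href.split("#", 1)[0]
        if not target:
            continue  # in-page anchor such as "#top"
        source = posixpath.splitext(target)[0] + ".page"  # foo.html -> foo.page
        path = posixpath.normpath(posixpath.join(base_dir, source))
        title = ET.fromstring(load(path)).findtext("title")
        if not title or not title.strip():
            raise ValueError(f"{href}: {path} has no <title>")
        link.text = title.strip()
    return body
```

In a real build, load could be something like
lambda p: pathlib.Path(p).read_text(encoding="utf-8").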

Comments?

N
-- 
FreeBSD: The Power to Serve             http://www.freebsd.org/
FreeBSD Documentation Project           http://www.freebsd.org/docproj/

          --- 15B8 3FFC DDB4 34B0 AA5F  94B7 93A8 0764 2C37 E375 ---
