Date:      Fri, 13 Mar 1998 17:03:42 +0000
From:      nik@iii.co.uk
To:        Jun Kuriyama <kuriyama@opt.phys.waseda.ac.jp>
Cc:        freebsd-doc@FreeBSD.ORG
Subject:   Re: Convert to DocBook (was Re: ps2pdf)
Message-ID:  <19980313170342.60995@iii.co.uk>
In-Reply-To: <35095A81.F35369D9@opt.phys.waseda.ac.jp>; from Jun Kuriyama on Sat, Mar 14, 1998 at 01:10:41AM +0900
References:  <199803052049.PAA00460@hawk.pearson.udel.edu.> <19980305215748.36179@iii.co.uk> <19980311170823.64512@shale.csir.co.za> <19980311152540.43318@iii.co.uk> <35095A81.F35369D9@opt.phys.waseda.ac.jp>

On Sat, Mar 14, 1998 at 01:10:41AM +0900, Jun Kuriyama wrote:
> nik@iii.co.uk wrote:
> > I hope it will be. Right now I'm just waiting for someone from the Japanese
> > team to get back to me about some issues regarding the DocBook conversion.
> 
>   Do you mean, "Is there a problem with handling Japanese in DocBook
> or not?"

No. Well, not quite.

What follows is part of a message I've sent to a couple of people, which 
covers the situation pretty much as I see it.

My specific questions are:

  1. Has anyone on the Japanese side used Jade to process SGML with
     the japanese characters suitably encoded? Did it work?

  2. Do you see any problems with the approach used (i.e., a series of
     commits, each one concerning itself with just one structural change,
     and those changes automated wherever possible)?

Comments on this welcome (to nik@freebsd.org if possible, I've set the
reply-to header accordingly).

  I've been working on getting the handbook converted from the LinuxDoc
  DTD to DocBook, cleaning up the markup, reorganising the file structure
  and so on.

  Inevitably, this is going to have a large impact on the work you (and
  others) have been doing to get the handbook translated. 

  Before I start actually doing anything serious (and coming close to
  committing stuff) I reckon I need to work out the issues involved with
  you (and anyone else you think needs to be brought in on this) to
  ensure that the process is as painless as possible for everyone involved.

  To give you some idea of what this entails, here's an outline of where 
  I am at the moment.

  Trying to convert the handbook from LinuxDoc to DocBook on a
  file-by-file basis won't work. At least, not without an immense amount
  of grief.

  Instead, the aim is to use John Fieber's linuxdoc-docbook translation
  specification and the 'instant' program to do the conversion. This has
  a number of 'interesting' consequences.

    1. All the entities will be expanded during the conversion, so things
       like &a.jkh; will be lost.

       I have a workaround for this, so it's not a problem.

    2. All the comments in the individual source files will be lost. This
       is (potentially) a problem, since some of those comments contain
       copyright notices, reminders and so forth.

       They will need to be manually added back in.

       I've spent some time trying to see if there's a way to pass the
       comments straight through, but I don't see any way of doing it.

    3. The result of this conversion is a syntactically valid but very
       ugly DocBook file. DocBook is a more expressive markup language 
       than LinuxDoc, but the translation doesn't take advantage of this
       expressiveness (it can't).

    4. The converted file is not (yet) ready to be converted to HTML;
       some of the entity definitions need to be put back.

  However, this process has the one redeeming feature that it's completely
  automatic. This should allow anyone working on a translation to do the
  same thing on the translated version of the document and get similar
  results.

  Once this has been done, the translated file will be cleaned up. So far,
  I've identified the following stages in the clean-up process:

    - Reindent the file so that the structure of the document is more
      apparent. This can be automated (in [x]emacs).

    - Reformat ('fill') the paragraphs. Mostly for consistency. This can
      be automated (in [x]emacs).
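For what it's worth, the 'fill' step can also be sketched outside [x]emacs. Here is a minimal illustration in Python; the function name and column width are my own choices, and a real pass would have to leave verbatim regions (<programlisting>, <screen>) untouched, which this sketch does not attempt:

```python
# Minimal sketch of the 'fill' step, assuming plain prose paragraphs.
# Caveat: a real pass must skip verbatim regions such as
# <programlisting> and <screen>; this sketch does not handle them.
import textwrap

def fill_paragraph(text, width=72):
    """Collapse internal whitespace, then re-wrap to the given width."""
    return textwrap.fill(" ".join(text.split()), width=width)

para = ("Trying   to convert the handbook from LinuxDoc to\nDocBook "
        "on a file by file basis won't work.")
print(fill_paragraph(para, width=40))
```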

    - Add in the missing entity definitions. This means restoring
      things like

          <!ENTITY % authors SYSTEM "authors.sgml">
          %authors;

      which got stripped in the conversion process. This has to be done
      by hand.

    - Replace comments that were in the original files but have not been
      carried over into this file. This has to be done by hand, but may
      not be necessary (see above).

    - Split this large file into its component parts. This split
      does not (and probably should not) correspond to the way the
      handbook is currently split.

      I'm planning on doing this along the lines of the way the handbook
      is logically split at the moment. Each 'part' gets its own 
      directory, and each chapter is a file in that directory. Parameter
      entities can be used (as they are now) to refer to individual files.

      It should be possible to automate this.
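As a rough illustration (not the actual tooling), the split could be automated along these lines, under the assumption that each chapter opens with a <chapter> tag at the start of a line:

```python
# Naive sketch of the split step.  Assumes every chapter begins with
# '<chapter' at the start of a line; real markup may need a proper
# SGML parse rather than a regex.
import re

def split_chapters(sgml):
    """Split a document into [prolog, chapter1, chapter2, ...]."""
    parts = re.split(r'(?m)^(?=<chapter)', sgml)
    return [p for p in parts if p.strip()]

doc = ("<!-- prolog -->\n"
       "<chapter><title>One</title></chapter>\n"
       "<chapter><title>Two</title></chapter>\n")
chapters = split_chapters(doc)
print(len(chapters))  # prolog plus two chapters
```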

    - Add an Emacs local variables section to the first line of each split
      file. This would look something like

 <!-- -*- sgml-parent-document: ("../handbook.sgml" "" "part" "book") -*- -->

      Not strictly necessary, but it does make it easier to handle the 
      document in [x]emacs.

    - Add a DOCTYPE to each of the split files, but do it inside a marked
      section. That way each split file can be processed individually,
      or the entire handbook can be processed as one.

      I see this as benefiting people who want to see what their changes
      to a chapter or part look like, but who don't want to have to 
      regenerate the entire handbook each time they want to test a change.

      This would involve adding something like

          <![ %doctype [
	    <!DOCTYPE CHAPTER PUBLIC "-//Davenport//DTD DocBook V3.0//EN">
	  ]]>

      to the beginning of each split file. When processing the handbook as
      a whole, %doctype would expand to 'IGNORE', but when the user wants
      to process an individual file they would set it to 'INCLUDE'.

      This can (probably) be automated.
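To make the per-file prolog concrete, here is a hypothetical generator for it (my own sketch; the element name varies per file, and I believe the jade/nsgmls '-i doctype' option is one way to flip the marked section to INCLUDE from the command line without editing the file, though check your version's documentation):

```python
# Hypothetical generator for the marked-section prolog of a split file.
# The public identifier is the one quoted in the text; the element name
# varies per file (CHAPTER, PART, ...).
def doctype_prolog(element="CHAPTER"):
    return ('<![ %doctype [\n'
            '  <!DOCTYPE {0} PUBLIC "-//Davenport//DTD DocBook V3.0//EN">\n'
            ']]>\n').format(element)

print(doctype_prolog("PART"))
```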

  Each one of these steps is a separate commit.

  Then, go through the converted document, looking for markup that's either
  incorrectly used, or too generic.

  I envisage this happening in stages, with one particular type of markup 
  being examined at a time.

  For example, there are many areas where 

      <emphasis role="tt">/a/file/name</emphasis>

  is used (in the converted document) where

      <filename>/a/file/name</filename>

  would be the correct approach. Each commit would aim to fix all
  occurrences of this one problem. So there'd be one commit to fix up
  filename references, one commit to fix up 'notes', one commit to fix
  up 'warnings', and so on.
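A sketch of what one such automated fixup pass might look like (the pattern and function name are mine, and every hit would still need eyeballing, since not every role="tt" span is really a file name):

```python
# Sketch of one markup-fixup pass: rewrite <emphasis role="tt"> spans
# as <filename>.  Results still need manual review, since role="tt"
# is also used for things that are not file names.
import re

PATTERN = re.compile(r'<emphasis role="tt">([^<]*)</emphasis>')

def fix_filenames(sgml):
    return PATTERN.sub(r'<filename>\1</filename>', sgml)

print(fix_filenames('<emphasis role="tt">/a/file/name</emphasis>'))
# -> <filename>/a/file/name</filename>
```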

  At this point, no content changes have been made to the Handbook; they
  are all markup changes. Hopefully these are easier to track and/or
  automatically merge (although I can see how that might be problematic).

  I'm planning on all this happening on a separate CVS branch, so it 
  doesn't interfere with the 'live' handbook.

  To convert the handbook to HTML, James Clark's SGML/DSSSL processor
  'jade' will be used, along with Norm Walsh's modular DocBook 
  stylesheets.

  The stylesheets work relatively well, although there are probably a few
  knobs that need tweaking to customise the HTML output. However, I have
  no idea what they'll do when presented with encoded Japanese text. Is
  this something you've looked at?


-- 
Work: nik@iii.co.uk                       | FreeBSD + Perl + Apache
Rest: nik@nothing-going-on.demon.co.uk    | Remind me again why we need
Play: nik@freebsd.org                     | Microsoft?
