FreeBSD Mail Archives

Date:      Fri, 25 Jun 1999 10:45:00 +0100
From:      Nik Clayton <nclayton@lehman.com>
To:        Jun Kuriyama <kuriyama@sky.rim.or.jp>, doc@freebsd.org
Cc:        freebsd-translate@ngo.org.uk, jdp@freebsd.org
Subject:   Re: Resolution:  FDP reorganisation
Message-ID:  <19990625104500.F15628@lehman.com>
In-Reply-To: <37724BD3.D1C377CC@sky.rim.or.jp>; from Jun Kuriyama on Fri, Jun 25, 1999 at 12:16:35AM +0900
References:  <19990623231441.N42442@catkin.nothing-going-on.org> <37724BD3.D1C377CC@sky.rim.or.jp>

Hi,

On Fri, Jun 25, 1999 at 12:16:35AM +0900, Jun Kuriyama wrote:
> Nik Clayton wrote:
> > I will then RE-IMPORT the non-English docs into the tree into the new
> > directories.  This means that all the non-English docs will revert to
> > revision 1.1 when you next see them.
> 
> I don't like this.  Commit logs for Japanese files include many
> "Submitted by:" record which has contributor's name.  If it is possible,
> I want it to be reserved.

OK.  There are two pieces of information here.

The first is the record that <foo@bar> has submitted changes to the 
documentation.  This is relatively easy to preserve, and actually should
be, probably as another section in "Contributors to FreeBSD" chapter.
Certainly the members of the translation teams that have put so much effort
into translating the documentation should be recognised, and I have
absolutely no problem with that whatsoever.

The second is tying <foo@bar> to a specific piece of translation.  If
we're starting the non-English repositories from scratch this is obviously
much more difficult.

Note that it's not impossible.  I am prepared to take a local copy of the
CVS tree, generate diffs for *every single version* of the translated
documentation, and then recommit it with the original commit message,
including the "Submitted by" line.

Obviously, we'd lose the record of who did the actual commit (because it
would be me), and the precise time at which each commit happened.
But the CVS deltas would be preserved.

As I say, if you want that, I'll do it (although it's a lot more work. . .)

Which would the Japanese team prefer?  (And, to the other translation teams
reading this: I'm prepared to make the same effort for your docs as well,
if that's what you want.)

> > [1] There are two changes to that message.  In conjunction with the recently
> >     committed change that moved /usr/share/locale/zh_TW.BIG5 to zh_TW.Big5,
> >     the "Big5" variant will be used in the new repository as well.
> > 
> >     And after comments from the Japanese Doc. Proj., the directory for
> >     Japanese docs in the repository will be ja_JP.EUC-JP.
> 
> Is it difficult to keep it as "ja"?  

Yes.  To reiterate, here's some functionality that I want to add that would
not be possible if it stays as "ja".

There are two Japanese encodings, eucJP and SJIS.  We can mechanically
convert between the two; however, the Japanese team have decided that the
primary format will be eucJP.
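To make "mechanically convert" concrete, here is a minimal sketch in Python.  It assumes Python's standard euc_jp and shift_jis codecs as a stand-in for the converters in the ports tree; the function names and the sample string are illustrative only.

```python
# Sketch: lossless conversion between eucJP and SJIS, using Python's
# standard "euc_jp" and "shift_jis" codecs in place of the ports-tree tools.


def euc_to_sjis(data: bytes) -> bytes:
    """Re-encode eucJP bytes as Shift_JIS (SJIS)."""
    return data.decode("euc_jp").encode("shift_jis")


def sjis_to_euc(data: bytes) -> bytes:
    """Re-encode Shift_JIS (SJIS) bytes as eucJP."""
    return data.decode("shift_jis").encode("euc_jp")


if __name__ == "__main__":
    text = "FreeBSD ハンドブック"  # "FreeBSD Handbook" in Japanese
    euc = text.encode("euc_jp")
    sjis = euc_to_sjis(euc)
    # The byte sequences differ, but the round trip loses nothing.
    print(euc != sjis)               # True
    print(sjis_to_euc(sjis) == euc)  # True
```

Because the conversion is lossless for these characters, only one of the two encodings needs to live in the repository.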

That does not preclude a Japanese admin from wanting to install docs in the
SJIS encoding.  They might want to install these docs *instead of* the
eucJP docs, or they might want to install these docs *as well as* the eucJP
docs.

So, we need a way of allowing the admin to install documentation in two 
formats.  

Well, we could do this by defaulting to "ja", and installing the SJIS docs
somewhere like ".../docs/ja_JP.SJIS".  But then we've got two
directories, one called "ja", one called "ja_JP.SJIS".

More consistent would be to install one into ".../docs/ja_JP.eucJP" and
the other into ".../docs/ja_JP.SJIS".  The admin can then use whichever
one they prefer, and, if they have a local preference as to which one they
want as a default, they can make a symlink from ".../docs/ja" to point to
whichever one of those directories they prefer.
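The admin's side of this can be sketched in Python.  The helper name set_default_docs() is hypothetical, and a temporary directory stands in for ".../docs" on a real system.

```python
import os
import tempfile


def set_default_docs(docroot: str, lang: str, preferred: str) -> None:
    """Point docroot/<lang> (e.g. "ja") at the preferred encoded directory.

    The link target is relative, so "ja -> ja_JP.SJIS" keeps working
    even if the whole docs tree is moved elsewhere.
    """
    link = os.path.join(docroot, lang)
    if os.path.islink(link):
        os.remove(link)  # replace any existing default
    os.symlink(preferred, link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        # Both encodings installed side by side.
        for name in ("ja_JP.eucJP", "ja_JP.SJIS"):
            os.mkdir(os.path.join(docroot, name))
        # This particular site prefers SJIS as its default "ja".
        set_default_docs(docroot, "ja", "ja_JP.SJIS")
        print(os.readlink(os.path.join(docroot, "ja")))  # ja_JP.SJIS
```

Ports and other tools that look for ".../docs/ja" keep working, whichever encoding the link points at.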

That's the argument for (eventually) allowing the encoding name to appear
in the installation directory -- but please note that the "ja" name will
still be retained, so ports and other things don't need to be changed to
cope with this.  

What's the argument for separating the directories in the CVS repository?

<hypothetical>

Suppose that we only had Japanese in there, in a "doc/ja" directory,
encoded with eucJP.  In that directory is a Makefile, supporting the
standard targets (all, install, clean, etc).  

Everything works fine.  This is pretty much what we have now.

Suppose you want to provide hooks to let the admin automatically convert
the docs from eucJP to SJIS.  As Satoshi points out, we have tools in
the ports tree that can do this.

How do you do it?

Well, the simplest approach is probably to patch the Makefile.  Add a new
variable (perhaps called ${ENCODING}, or something like that) that holds
the default encoding.  And add a test in the Makefile that says something
like "If ${ENCODING} is 'SJIS' then run the docs through the encoding 
converter first, and then convert them to HTML or whatever".
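Here is the patched Makefile's logic, modelled in Python rather than make.  ENCODING, build() and to_html() are hypothetical names, and to_html() is only a placeholder for the real formatting step.  The point is the extra branch that callers now have to know about.

```python
import os
from typing import Optional

# Hypothetical mapping from ENCODING values to Python codec names.
CODECS = {"eucJP": "euc_jp", "SJIS": "shift_jis"}


def to_html(text: str) -> str:
    """Placeholder for the real "convert them to HTML" step."""
    return "<p>" + text + "</p>"


def build(source: bytes, encoding: Optional[str] = None) -> bytes:
    """Build one eucJP source file, honouring the ENCODING knob."""
    if encoding is None:
        # The new part of the interface, like `make install ENCODING=SJIS`.
        encoding = os.environ.get("ENCODING", "eucJP")
    if encoding not in CODECS:
        raise ValueError("unsupported ENCODING: " + encoding)
    if encoding == "SJIS":
        # "Run the docs through the encoding converter first..."
        source = source.decode("euc_jp").encode("shift_jis")
    # "...and then convert them to HTML", in the chosen encoding.
    codec = CODECS[encoding]
    return to_html(source.decode(codec)).encode(codec)


if __name__ == "__main__":
    src = "ハンドブック".encode("euc_jp")
    print(build(src, "eucJP") == "<p>ハンドブック</p>".encode("euc_jp"))
    print(build(src, "SJIS") == "<p>ハンドブック</p>".encode("shift_jis"))
```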

Nice and simple.  But also wrong.  The interface exposed by the Makefile
has changed.  Instead of just being able to do "make install" you now have
this ${ENCODING} variable which you might need to set.

But, it's not the end of the world.  It'll still work -- you might need to
write a bit more documentation about it to explain how it works, and it
breaks POLA a little bit, but we've done worse.

Now introduce English-language docs into the equation.  Well, not a lot
changes.  The English language docs can be in the repository under 
".../doc/en", which is what we have now.

OK, so the Makefile for the English docs won't support the ${ENCODING}
variable.  So there's now a small difference between the interface exported
by the two different Makefiles.  But it's not really a big deal, you can
live with it.

By now you've probably written a top level Makefile to help you build the
documentation.  It probably has a ${LANG} variable containing "en ja", 
which it uses to iterate over the subdirectories.

Also, there's probably a test to see if ${LANG} is "ja", and if it is 
to build the docs twice, once with ${ENCODING} set to eucJP, and once 
with it set to SJIS.  Of course, the English-language docs don't need
this test.
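Modelled in Python, a top-level Makefile with this special case might look roughly like the following.  LANG mirrors the ${LANG} variable above, and build_targets() is a hypothetical name.

```python
from typing import List, Tuple

LANG = "en ja"  # mirrors ${LANG} in the top-level Makefile


def build_targets(langs: str = LANG) -> List[Tuple[str, str]]:
    """Return the (subdirectory, ENCODING) pairs to build, in order."""
    targets = []
    for lang in langs.split():
        if lang == "ja":
            # The special case: build Japanese twice, once per encoding.
            for enc in ("eucJP", "SJIS"):
                targets.append((lang, enc))
        else:
            # English has no ENCODING variable at all.
            targets.append((lang, ""))
    return targets


print(build_targets())  # [('en', ''), ('ja', 'eucJP'), ('ja', 'SJIS')]
```

Every further language that has more than one encoding means another branch in that loop.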

You can see how trying to keep all the Japanese encodings under one 
directory is starting to cause small kludges and workarounds to appear
elsewhere, right?

But, it all still works, pretty much.  It's not completely intuitive,
and there are small foibles to bite the unwary who don't read the Makefile
source, but it's all working, and people have got used to it.

Now introduce Chinese into the picture.  Things start getting more
complicated.

Suppose that we have just one Chinese translation of the documents (say,
in 'Big5').  Well, that's not a problem, we can just stick it under
"doc/zh", and carry on as normal.

Indeed you can.

And then someone comes along with the Chinese documentation translated
into the other encoding format.

Shit.  What do you do?

Well, you can't do the same thing you did for the Japanese docs.  As 
others have pointed out, you can't mechanically translate between the
two encodings.  You're actually going to have to store this documentation
somewhere in the CVS tree.

How do you do it?  Well, you could put it under zh/.  Create two new
directories under there, big5/ and euc/ perhaps.  Then move the existing
docs under the big5/ directory, and import the new docs under euc/.

OK, that'll work.  But suddenly you're supporting encoding information for
two different languages (Japanese and Chinese) in two completely different
ways.  The code in your Makefiles is going to be different, which is a 
pain to maintain, and you've got to document these two approaches, so 
that other people don't get bitten by them.

Hmm.  Is there a better way?  Well, you could create a new top level
directory, say zh_EUC or something like that, and import the docs under
there instead.  So now you've got a "zh/" and a "zh_EUC/" directory.

It's not particularly clean, and it's different from how you've handled
the Japanese encoding issues, but hey, it works, so what the hell.

</hypothetical>

The above might look a little bit familiar :-)

I want to break out of this cycle and impose a cleaner structure: one that
should be more future-proof than what we have at the moment, internally
consistent, easier to code, easier to document, and ready for Unicode when
the time comes.
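With the encoding in the directory name, the top-level build no longer needs per-language branches.  Here is a sketch in Python; parse_docdir() and DIRS are hypothetical names, and "en_US.ISO8859-1" is an assumed name for the English directory (this message only settles on the ja_JP.* and zh_TW.Big5 forms).

```python
from typing import NamedTuple


class DocDir(NamedTuple):
    language: str   # e.g. "ja"
    territory: str  # e.g. "JP"
    encoding: str   # e.g. "eucJP"


def parse_docdir(name: str) -> DocDir:
    """Split "language_TERRITORY.encoding" into its three parts."""
    locale, _, encoding = name.partition(".")
    language, _, territory = locale.partition("_")
    return DocDir(language, territory, encoding)


DIRS = ["en_US.ISO8859-1", "ja_JP.eucJP", "ja_JP.SJIS", "zh_TW.Big5"]

for name in DIRS:
    d = parse_docdir(name)
    # The same rule applies to every directory; a second Chinese encoding
    # would just be one more name in DIRS, not a new special case.
    print(f"build {name}: language={d.language}, encoding={d.encoding}")
```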

Whichever way we do this is going to cause some pain.  It should probably
have been done a few years ago -- I should certainly have raised this 
sooner, and in my defence I'll offer up the DocBook conversion, which 
took a large chunk of my available FreeBSD time.

The more we postpone this, the more painful it'll be when the time comes.
I'd like to bite the bullet and get on with it.

> As Satoshi said, we need more discussion for correct encoding name.  

Provided that it starts with "ja_JP.", I'm not fussed what the encoding
name is, and you and the other translators are in a much better position
to suggest the right thing than I am, so I'll go with your choice.  The
two favourites at the moment seem to be "eucJP" and "SJIS".

N
-- 
--+==[ Systems Administrator, Year 2000 Test Lab, Lehman Brothers, Inc. ]==+--
--+==[      1 Broadgate, London, EC2M 7HA     0171-601-0011 x5514       ]==+--
--+==[              Year 2000 Testing: It's about time. . .             ]==+--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990625104500.F15628>
