Date: Fri, 25 Jun 1999 10:45:00 +0100
From: Nik Clayton <nclayton@lehman.com>
To: Jun Kuriyama <kuriyama@sky.rim.or.jp>, doc@freebsd.org
Cc: freebsd-translate@ngo.org.uk, jdp@freebsd.org
Subject: Re: Resolution: FDP reorganisation
Message-ID: <19990625104500.F15628@lehman.com>
In-Reply-To: <37724BD3.D1C377CC@sky.rim.or.jp>; from Jun Kuriyama on Fri, Jun 25, 1999 at 12:16:35AM +0900
References: <19990623231441.N42442@catkin.nothing-going-on.org> <37724BD3.D1C377CC@sky.rim.or.jp>
Hi,

On Fri, Jun 25, 1999 at 12:16:35AM +0900, Jun Kuriyama wrote:
> Nik Clayton wrote:
> > I will then RE-IMPORT the non-English docs into the tree into the new
> > directories. This means that all the non-English docs will revert to
> > revision 1.1 when you next see them.
>
> I don't like this. Commit logs for Japanese files include many
> "Submitted by:" record which has contributor's name. If it is possible,
> I want it to be reserved.

OK. There are two bits of information here.

The first is the record that <foo@bar> has submitted changes to the documentation. This is relatively easy to preserve, and it should be, probably as another section in the "Contributors to FreeBSD" chapter. Certainly the members of the translation teams who have put so much effort into translating the documentation should be recognised, and I have absolutely no problem with that whatsoever.

The second is tying <foo@bar> to a specific piece of translation. If we're starting the non-English repositories from scratch, this is obviously much more difficult.

Note that it's not impossible. I am prepared to take a local copy of the CVS tree, generate diffs for *every single version* of the translated documentation, and then recommit each one with its original commit message, including the "Submitted by" line.

Obviously, we'd lose the record of who did the actual commit (because it would be me) and the precise time at which each commit happened. But the CVS deltas would be preserved.

As I say, if you want that, I'll do it (although it's a lot more work. . .).

Which would the Japanese team like? (And to the other translation teams reading this: I'm prepared to make the same effort for your docs as well, if that's what you want.)

> > [1] There are two changes to that message. In conjunction with the recently
> > committed change that moved /usr/share/local/zh_TW.BIG5 to zh_TW.Big5,
> > the "Big5" variant will be used in the new repository as well.
> >
> > And after comments from the Japanese Doc. Proj., the directory for
> > Japanese docs in the repository will be ja_JP.EUC-JP.
>
> Is it difficult to keep it as "ja"?

Yes. To reiterate, here's some functionality that I want to add that would not be possible if it stays as "ja".

There are two encodings, eucJP and SJIS. We can mechanically convert between the two; however, the Japanese team have decided that the primary format will be eucJP.

That does not preclude a Japanese admin from wanting to install docs in the SJIS encoding. They might want to install these docs *instead of* the eucJP docs, or they might want to install these docs *as well as* the eucJP docs.

So, we need a way of allowing the admin to install documentation in two formats.

Well, we could do this by defaulting to "ja", and installing the SJIS docs into somewhere like ".../docs/ja_JP.SJIS". But then we've got two directories, one called "ja", one called "ja_JP.SJIS".

More consistent would be to install one into ".../docs/ja_JP.eucJP" and the other into ".../docs/ja_JP.SJIS". The admin can then use whichever one they prefer, and, if they have a local preference as to which one they want as a default, they can make a symlink from ".../docs/ja" to whichever of those directories they prefer.

That's the argument for (eventually) allowing the encoding name to appear in the installation directory -- but please note that the "ja" name will still be retained, so ports and other things don't need to be changed to cope with this.

What's the argument for separating the directories in the CVS repository?

<hypothetical>

Suppose that we only had Japanese in there, in a "doc/ja" directory, encoded with eucJP. In that directory is a Makefile, supporting the standard targets (all, install, clean, etc.). Everything works fine. This is pretty much what we have now.

Suppose you want to provide hooks to let the admin automatically convert the docs from eucJP to SJIS.
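For ordinary Japanese text (characters from JIS X 0208), eucJP and SJIS are two byte encodings of the same characters, which is why this conversion can be mechanical. A minimal sketch, using Python's standard `euc_jp` and `shift_jis` codecs as a stand-in for whichever converter from the ports tree a Makefile would actually call; the function name `eucjp_to_sjis` is hypothetical:

```python
def eucjp_to_sjis(data: bytes) -> bytes:
    """Re-encode eucJP bytes as SJIS (hypothetical helper, illustration only).

    Decode to Unicode, then encode again. With the default 'strict' error
    handling, any character that cannot be represented raises an error
    instead of being silently dropped.
    """
    return data.decode("euc_jp").encode("shift_jis")


if __name__ == "__main__":
    source = "日本語のドキュメント".encode("euc_jp")
    converted = eucjp_to_sjis(source)
    # The byte sequences differ, but both decode to the same text.
    print(source.hex())
    print(converted.hex())
    print(converted.decode("shift_jis") == source.decode("euc_jp"))
```

EUC-JP can also carry JIS X 0212 characters, which SJIS has no codes for, so a real converter would need a policy for those; this sketch simply fails on them.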
As Satoshi points out, we have tools in the ports tree that can do this.

How do you do it? Well, the simplest approach is probably to patch the Makefile. Add a new variable (perhaps called ${ENCODING}, or something like that) that holds the default encoding. And add a test in the Makefile that says something like "If ${ENCODING} is 'SJIS', then run the docs through the encoding converter first, and then convert them to HTML or whatever". Nice and simple.

But also wrong. The interface exposed by the Makefile has changed. Instead of just being able to do "make install", you now have this ${ENCODING} variable which you might need to set. But it's not the end of the world. It'll still work -- you might need to write a bit more documentation to explain how it works, and it breaks POLA a little bit, but we've done worse.

Now introduce English language docs into the equation. Well, not a lot changes. The English language docs can be in the repository under ".../doc/en", which is what we have now. OK, so the Makefile for the English docs won't support the ${ENCODING} variable. So there's now a small difference between the interfaces exported by the two Makefiles. But it's not really a big deal; you can live with it.

By now you've probably written a top level Makefile to help you build the documentation. It probably has a ${LANG} variable containing "en ja", which it uses to iterate over the subdirectories. Also, there's probably a test to see if the current language is "ja", and if it is, to build the docs twice, once with ${ENCODING} set to eucJP, and once with it set to SJIS. Of course, the English language docs don't need this test.

You can see how trying to keep all the Japanese encodings under one directory is starting to cause small kludges and workarounds to appear elsewhere, right? But it all still works, pretty much.
It's not completely intuitive, and there are small foibles to bite the unwary who don't read the Makefile source, but it's all working, and people have got used to it.

Now introduce Chinese into the picture. Things start getting more complicated.

Suppose that we have just one translation of the Chinese documents (say, 'Big5'). Well, that's not a problem; we can just stick them under "doc/zh" and carry on as normal. Indeed you can.

And then someone comes along with the Chinese documentation translated into the other encoding format. Shit. What do you do?

Well, you can't do the same thing you did for the Japanese docs. As others have pointed out, you can't mechanically translate between the two encodings. You're actually going to have to store this documentation somewhere in the CVS tree. How do you do it?

Well, you could put it under zh/. Create two new directories under there, big5/ and euc/ perhaps. Then move the existing docs under the big5/ directory, and import the new docs under euc/.

OK, that'll work. But suddenly you're supporting encoding information for two different languages (Japanese and Chinese) in two completely different ways. The code in your Makefiles is going to be different, which is a pain to maintain, and you've got to document these two approaches so that other people don't get bitten by them.

Hmm. Is there a better way?

Well, you could create a new top level directory, say zh_EUC or something like that, and import the docs under there instead. So now you've got a "zh/" and a "zh_EUC/" directory. It's not particularly clean, and it's different from how you've handled the Japanese encoding issues, but hey, it works, so what the hell.

</hypothetical>

The above might look a little bit familiar :-)

I want to break out of this cycle and impose a cleaner structure: one that should be more future-proof than what we have at the moment, internally consistent, easier to code, easier to document, and ready for Unicode when the time comes.
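To make the proposed structure concrete, here is a minimal sketch, assuming directory names of the form "language_TERRITORY.encoding" (ja_JP.eucJP, ja_JP.SJIS and zh_TW.Big5, as used in this thread). Every name is parsed by the same rule, so no language needs a special case, and the bare "ja" name survives as a symlink the admin points at their preferred tree. The helpers `parse_doc_dir` and `install_default_links` are hypothetical, and Python stands in for what would really be make(1) and install logic.

```python
import os
import tempfile
from typing import Dict, NamedTuple


class DocDir(NamedTuple):
    """Components of a 'language_TERRITORY.encoding' directory name."""
    language: str   # e.g. "ja"
    territory: str  # e.g. "JP"
    encoding: str   # e.g. "eucJP"


def parse_doc_dir(name: str) -> DocDir:
    """Split a directory name such as 'ja_JP.eucJP' into its parts.

    The same rule applies to every language (hypothetical helper).
    """
    locale_part, encoding = name.split(".", 1)
    language, territory = locale_part.split("_", 1)
    return DocDir(language, territory, encoding)


def install_default_links(docs_root: str, preferred: Dict[str, str]) -> None:
    """Point a bare-language name (e.g. 'ja') at the admin's preferred tree.

    `preferred` maps a language to a full directory name, for example
    {"ja": "ja_JP.SJIS"}. Hypothetical helper, for illustration only.
    """
    for language, target in preferred.items():
        if parse_doc_dir(target).language != language:
            raise ValueError(f"{target!r} is not a {language!r} directory")
        link = os.path.join(docs_root, language)
        if os.path.islink(link):
            os.remove(link)  # replace any earlier choice
        # A relative target keeps the link valid if docs_root is moved.
        os.symlink(target, link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("ja_JP.eucJP", "ja_JP.SJIS", "zh_TW.Big5"):
            os.makedirs(os.path.join(root, name))
            print(name, "->", parse_doc_dir(name))
        install_default_links(root, {"ja": "ja_JP.SJIS"})
        print("ja ->", os.readlink(os.path.join(root, "ja")))
```

Adding a second Chinese encoding would then just be one more directory, handled by exactly the same code.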
Whichever way we do this is going to cause some pain. It should probably have been done a few years ago -- I should certainly have raised this sooner, and in my defence I'll offer up the DocBook conversion, which took a large chunk of my available FreeBSD time. The more we postpone this, the more painful it'll be when the time comes. I'd like to bite the bullet and get on with it.

> As Satoshi said, we need more discussion for correct encoding name.

Subject to the proviso that it starts "ja_JP.", I'm not fussed what the encoding name is. You and the other translators are in a much better position to suggest the right thing than I am, so I'll go with your choice. The two favourites at the moment seem to be "eucJP" and "SJIS".

N
-- 
--+==[ Systems Administrator, Year 2000 Test Lab, Lehman Brothers, Inc. ]==+--
--+==[ 1 Broadgate, London, EC2M 7HA 0171-601-0011 x5514 ]==+--
--+==[ Year 2000 Testing: It's about time. . . ]==+--