Date: Sat, 24 Jan 2009 13:53:38 -0800
From: Murray Stokely <murray@stokely.org>
To: Chuck Robey <chuckr@telenix.org>
Cc: freebsd-chat@freebsd.org
Subject: Re: text formatting tools.
Message-ID: <2a7894eb0901241353l56be13b4s9860b9e949bc9ec2@mail.gmail.com>
In-Reply-To: <497B77C7.90001@telenix.org>
References: <497B77C7.90001@telenix.org>

On Sat, Jan 24, 2009 at 12:19 PM, Chuck Robey <chuckr@telenix.org> wrote:
> I didn't want to get all that oud about xml/xsl, because I felt that given time,
> hopefully, better tools would appear. While the ability to spend money on that
> has hugely expanded, and the number of incompatible macro sets have also hugely
> appreared, the minimum size of any free software toolsets remains gigantic. If
> I'm wrong here, PLEASE, tell me, I would be only too happy to be proved wrong,

I think your criticism of the distribution size of the tools is
accurate, but you are focusing on a dimension that the rest of the
world has chosen to ignore in the era of 1TB disk drives. You are
correct that any XML/XSLT based solution uses far more disk space than
any groff based solution. I do not think many people care about this.
Separation of content and presentation is worth far more to me than a
few bits on a disk, to say nothing of the greater portability and
programmability of XML.

> Documentation? Well, I could point to the book named "Unix Text Processing", by
> Dougherty/O'Reilly. It's out of print, which is actually really pretty nice,

That one would be in the minority of O'Reilly books if it was typeset
with groff rather than DocBook XML.

> OK, I've described 2 of my reasons for liking it, that it's relatively tiny, and
> that it's far more flexibile in allowing an author to take their own approach.

The first reason is granted, but I think the second depends on a very
particular definition of "flexible". Many reasonable people would
disagree with it and argue that XML is the more flexible solution.

> The fact that xml forces one to regard a document more like it is a database is
> probably a good thing for things like Web pages which are actually electronic
> salespeople, but it's a LOUSY method to force upon authors. Most books just
> aren't approached with preplanning and hierarchical control which is an endemic
> requirement for a sales database tool.
> So if you're not writing something like

Technical manuals are generally highly hierarchical, as are most books
actually, with parts, chapters, sections, and paragraphs. Even those
items need not be imposed on anyone with an XML/XSL tool.

> "newegg.com", well, maybe you do like it, but I never, ever, heard of anyone
> using any approach like this in any major piece of fiction, at least before some
> businesses (in another case of follow the leaderism) required it. Just like
> many commercial companies require you use MicroSoft Word, nothing but marketing
> propaganda. Heard of this before?

I think the MS connection is a pretty big leap, as is reducing XML's
benefits to newegg.com without argument.

> I know we use this tool in our very good tool, the handbook. So, what we've
> done is deny to a large number of folks the ability to format the handbooks
> unelss they're willing to install a set of enormous tools. Used to be the
> Handbook formatted directly out of the OS with no added tools needed. Think
> that's difficult for a non-fiction tool? Ask Richard Stevens ghost, because his
> books could have been formatted using only the base FreeBSD IS also.

Sure, your problem could be solved by importing more XML tools into
the base system, but I think that is the opposite of the direction we
are going in. A number of base system tools are in FreeBSD because
they were historically part of BSD, but would today be kept as
ports/packages if they weren't already there. LiveCD distributions
such as PC-BSD could have a much larger base system pre-installed if
this is something you care a lot about in an operating system
distribution.

> OK, I don't know of any negative to using groff, except that you don't get to
> point at your toolset and claim it's the latest. All that internationalization,

I can think of dozens of reasons why we're not using groff for the
Handbook. Off the top of my head, I'll list a few:

1. How would you identify the first occurrence of each technical
   acronym in the Handbook so that it could be rendered with a
   mouseover definition or link to the glossary in hypertext versions
   of the Handbook? (Only the first occurrence, because these
   presentation details would be distracting and make the text
   difficult to read if applied to every occurrence.)

2. How do you programmatically extract all of the Armenian FTP sites
   mentioned in a groff version of the Handbook (so they can be
   listed on the web site separately)?

3. How do you pull in content from other sites on the net and
   dynamically include parts of it, in a structured way, each time
   you rebuild the document? (E.g. the way we pull in external RSS
   feeds on the website, the way we pull in the results of the latest
   kernel stress tests to add to the release TODO page, etc.)

4. How do you render the same content in multiple presentation styles
   in the same output format? E.g. maybe one web based version with
   one color scheme for the website, and another web based version
   with a different layout and color scheme? Or one with per-chapter
   tables of contents and one with only a per-book table of contents
   for a printed format? All configurable with make flags to the
   build script, and with the key separation of content and
   presentation, since different people with very different skill
   sets will generally be responsible for those two tasks.

5. How do you generate texts for electronic book readers, OpenOffice,
   or other modern formats?

I use groff occasionally, but am a novice, so I am sure there are
solutions to some of these problems, but the ones I know of are
clearly sub-optimal.

> Groff even produces html, and it does a really bang-up job of formatting ASCII
> text pages, something which xml tools have never been able to do. I just don't

Sure, but those are basic output formats we've supported for a decade
with XML based tools. What about Amazon Kindle ebooks? Mobi ebooks?
OpenOffice documents?
We distribute more than just those three very basic output formats.

> get the reason to go with xml, except a bad case of follow the leader. What's
> the benefit that the users, or even the authors, accrue? And don't fail to

Why don't you ask the publisher of the book you just cited? Or better
yet, the author of groff, James Clark, who moved on to write most of
the open source SGML/XML tools we use in building the Handbook. I must
admit to not following him closely and only reading his blog rarely --
did he work for Microsoft or something? Still trying to find where
that connection comes in.

> realize that our groff cames with a set of ancillary tools like "pic", to be a
> very good job of doing technical drawings. That's what Richard Stevens did, so
> don't argue that it's either impossible, or even difficult to do well. If you
> argue this, please drop all the marketing propaganda, drop all references to

Richard Stevens is a highly technical network engineer. He created
great figures, as people often do with pic. Whether you are using
groff, LaTeX, or XML tools, however, you can hardly argue that manually
writing code in a programming language is a better way to generate
diagrams than a graphical tool for most needs. Sure, I get great
figures with pic or pstricks, but some of my best figures are drawn
with OmniGraffle in a fraction of the time.

> God, the amount of marketing crap that has gone out to push dynamic features
> (which web pages really do need) upon paper authors is impressive, but I never
> saw any use of this in any piece of fiction, or even in any technical
> dissertation, anything not destined for presentation via paper. Many companies
> depend on this for their future, so I'm skeptical.

This seems to change the scope of your argument significantly. If you
are now conceding the general usefulness of XML for things like the
Handbook and only saying it is overkill for a paper-only document,
then I'd tend to agree.
I'd go straight to LaTeX, but many would go straight to groff. To each
his own. Kind of makes me wonder why all the ranting about newegg.com
and Microsoft and evil XML vendors.

> Show me a book which needs these features, a book that would be better written

Any book published by O'Reilly -- because they need to publish not
just PDFs but hyperlinkable electronic book versions in addition to
dead tree versions.

> to a complicated monstrosity (which our handbook did) was a mistake in strategy,
> trying to go the popular way. Making it so the number of folks who can format
> the sources is limited to folks which have the resources.

You've done a very poor job in this rant of pointing out any
advantages to using groff for the Handbook over DocBook XML, and I
think you know this, which is why you sent it to an unrelated list. I
will grant you that you listed one solid advantage in this mail: the
tools use less disk space. I assure you that your points will be
discussed and listened to if you try again without all the ranting and
weird logical fallacies, and if you post it to the appropriate place,
doc@FreeBSD.org.

Thanks!

 - Murray
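
For questions 1 and 2 in the list above, here is a minimal sketch in
Python of what structured markup makes possible. The sample fragment,
host names, and helper names are hypothetical; only the DocBook
element names "acronym" and "ulink" are real. This is an illustration
under those assumptions, not how the Handbook build actually does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-style fragment (not taken from the Handbook sources).
SAMPLE = """\
<chapter>
  <para>Use <acronym>FTP</acronym> or <acronym>NFS</acronym> to install.</para>
  <para>Mirrors: <ulink url="ftp://ftp.am.example.org/pub/">Armenia</ulink> and
  <ulink url="ftp://ftp.de.example.org/pub/">Germany</ulink>.</para>
  <para><acronym>FTP</acronym> is also used for updates.</para>
</chapter>
"""


def first_acronyms(root):
    """Map each distinct acronym text to its first <acronym> element.

    Element.iter() walks the tree in document order, so the first
    element seen for a given text is its first occurrence.
    """
    seen = {}
    for elem in root.iter("acronym"):
        text = (elem.text or "").strip()
        if text and text not in seen:
            seen[text] = elem
    return seen


def ftp_urls(root, host_part):
    """Return ftp:// URLs from <ulink> elements whose URL contains host_part."""
    urls = []
    for link in root.iter("ulink"):
        url = link.get("url", "")
        if url.startswith("ftp://") and host_part in url:
            urls.append(url)
    return urls


root = ET.fromstring(SAMPLE)
firsts = first_acronyms(root)
print(list(firsts))            # acronyms in order of first appearance
print(ftp_urls(root, ".am."))  # FTP sites under the (assumed) .am. host pattern
```

A stylesheet or build script could then decorate only the elements in
"firsts" with a glossary link and leave later occurrences plain, and
could write the filtered URL list to a separate page.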