Date: Wed, 10 Jul 1996 18:25:26 -0500 (EST)
From: John Fieber
To: Wolfram Schneider
cc: doc@FreeBSD.org
Subject: Re: FYI: IDML
In-Reply-To: <199607091430.QAA24241@caramba.cs.tu-berlin.de>

On Tue, 9 Jul 1996, Wolfram Schneider wrote:

> http://www.identify.com/welcome/idml-faq.html

EEEeeeeeewwwwww!! This makes my stomach turn. Not only is it a
brain-damaged application of SGML, it amounts to nothing more than a
database with a fixed set of fields that are woefully inadequate for
describing much of anything useful.

Just consider the SUBJECT attribute. First, it specifies "no more
than three (3)". Well, I'm sorry, but that's a pretty lame restriction
to place on someone categorizing something and, the way they set it
up, there is no way to enforce the rule. SGML could enforce it *if*
they bothered to use SGML properly.

If that isn't enough, their pre-defined subject categories are an
utter insult. The LC subject headings take up 4 large volumes, each
about 4 inches thick with small print, and even they can't begin to
capture many subtleties required in distinguishing entities.
Then you have things like the National Library of Medicine subject
headings, a 3 inch thick fine-print listing of subject categories just
within the field of medicine! And these identity people think that a
couple hundred headings are sufficient for everything anyone would
want to put on the internet?

Okay, then look at the LOCATION and LANGUAGE attributes. They too
have severely limited canned lists of countries, place names and
languages. The US Geological Survey geographic names database takes
up a whole CD-ROM with millions of entries just for the United States.
The Library of Congress list of language codes for use in MARC records
is much longer than the ISO list they use. What about these other
languages?

But wait, there is more! What about the KEYWORDS attribute? Isn't
this somewhat redundant with the SUBJECT field? I'm not aware of any
study that shows searching an uncontrolled keyword vocabulary to be
any more effective than free-text searching.

If you want to look at some *useful* discussion of metadata standards,
look at http://www.nlc-bnc.ca/ifla/II/metadata.htm. In particular,
the Dublin Core has a proposal for HTML files that uses a slightly
modified tag and is much better thought out than this IDML crap.
Library science has been researching this sort of thing for decades
and there is plenty of sound research and literature on the subject.

Just say NO to IDML!

Oh, and by the way, an underscore (_) is NOT permitted in a tag name
(eg ) or attribute name (eg STREET_ADDRESS) according to the reference
standard SGML declaration, or the SGML declaration used by HTML. So
much for being HTML compatible....

-john

== jfieber@indiana.edu ===========================================
== http://fallout.campusview.indiana.edu/~jfieber ================
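
The underscore point above is easy to check mechanically. What
follows is only a rough sketch, not part of any SGML toolkit: it
assumes the name characters of the reference concrete syntax (a letter
to start, then letters, digits, "." and "-"), it ignores name length
limits entirely, and the function names are made up for illustration.

```python
import string

# Assumption: name characters as in the reference concrete syntax.
# A name starts with a letter; letters, digits, "." and "-" may follow.
NAME_START = set(string.ascii_letters)
NAME_CHARS = NAME_START | set(string.digits) | {".", "-"}


def is_valid_sgml_name(name: str) -> bool:
    """Return True if `name` uses only the permitted name characters.

    Length limits (NAMELEN) are deliberately not checked here.
    """
    if not name or name[0] not in NAME_START:
        return False
    return all(ch in NAME_CHARS for ch in name)


def illegal_chars(name: str) -> list:
    """Return the sorted, de-duplicated characters that are not name characters."""
    return sorted({ch for ch in name if ch not in NAME_CHARS})
```

With these definitions, is_valid_sgml_name("STREET_ADDRESS") is False
and illegal_chars("STREET_ADDRESS") reports only the underscore, while
a hyphenated spelling such as "STREET-ADDRESS" passes the character
check.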