Date:      Mon, 10 Jun 1996 01:14:08 +0200 (SAT)
From:      Robert Nordier <rnordier@iafrica.com>
To:        terry@lambert.org (Terry Lambert)
Cc:        hackers@freebsd.org
Subject:   Re: bit 7 in filenames
Message-ID:  <199606092314.BAA00185@eac.iafrica.com>
In-Reply-To: <199606092059.NAA02136@phaeton.artisoft.com> from "Terry Lambert" at Jun 9, 96 01:59:35 pm

Terry Lambert wrote:
> 
> > The vfatfs (== rewritten msdosfs) will not actually create files
> > containing illegal DOS filename characters.
> > 
> > Currently, however, it offers a `translate' option which does a
> > semi-intelligent mapping between characters valid on BSD and DOS.
> > 
> > (Invalid DOS filename characters are those below 0x20, as well as
> > the following sixteen:
> > 
> >      " * + , . / : ; < = > ? [ \ ] |
> > 
> > All other characters including 0x20 and characters >= 0x80 are
> > legal.)
> > 
> > With the translate option enabled, Bruce's example would be
> > acceptable and would be mapped to (say)
> > 
> >      /msdosfs/a2345678 this is a very long not to mention invalid
> >      msdos path.name
> > 
> > (which DOS itself would accept) and would result in the file
> > 
> >      A2345678.NAM
> > 
> > on a FAT filesystem.
> 
> Actually, the IFS documentation with the SDK states that a directory
> name can contain:
> 
> o   $ \ % ' - _ @ ~ ` ! ( )
> ^ ^
> | `- blank space
> `- degree symbol
> 
> A file name may contain:
> 
> o $ \ % ' - _ @ ~ ` ! ( )

Thanks.  Though the whole business of Microsoft documentation versus
Microsoft practice tends to be rather a sore point.

Over the last few years, I've disassembled and commented probably
several thousand lines of MS-DOS 3.30, 5.00, and 6.22 code, including
large chunks of 'io.sys' and 'msdos.sys', as well as much of the
relevant code in 'format.com' and some parts of 'fdisk.exe'.  And
recently I've also been running seemingly endless tests on 'scandisk',
in the course of developing the 'fsck_msdos' utility.

Some observations to come out of this are:

     (a) If Microsoft documents any technical details about DOS,
	 it almost invariably gets them wrong.

     (b) No two programmers at Microsoft seem to have the same idea
	 about what is and isn't legal, at least for the FAT FS.

That the Microsoft programmers don't seem to know what the $\%'-_@~`!()
the real technical details are, half the time, tends to be evident
in all sorts of ways.  For instance, the problems that have arisen
relating to use of the 0xe5 character in filenames _should_ have
been at least somewhat predictable.  And evidently whoever implemented
filename checking in 'scandisk' has his own personal ideas about
what is (0x7f) and what isn't (0x00) acceptable.... [List of further
boring and abstruse technical details reluctantly omitted.]

There is also the further issue of compatibility with various
non-Microsoft versions of DOS.  These include not only the IBM and
ex-Digital Research stuff, but systems like Mike Podanoffsky's
RxDOS and Pat Villani's DOS-C (used by the `Free-DOS Project').

I think the point is that, ultimately, it has to be a matter of
`do as we do', not `do as we say'.  Which doesn't, of course, mean
that knowing what the party line is isn't useful and even interesting.

As regards specifics, a 

   creat("a c e g .i k", 0666);

is certainly acceptable to MS-DOS 6.22, so it is hard to know quite
what to make of the directory/file-naming distinction for the space
character, for instance.

> 
> The following special characters can also be used in long file names
> (but not short ones):
> 
> : + , ; = [ ]
> 
> Blank spaces can be anywhere in the long name, but blank spaces and
> periods at the end of a long name are ignored.
> 
> Case is preserved on storage, but ignored on lookup (DOS has separate
> interfaces for directory lookup as opposed to file opening).
> 
> 
> I can also give you the "short name generation rules" (which aren't
> really documented anywhere).  They require directory iteration and
> use of a monotonically increasing numeric "tail" substitution into
> the file name (not affecting the extension, if any).
> 
> 
> I have somewhat of an advantage, having been involved in a project
> that ported the Heidemann framework and some of the FS modules and
> most of the BSD FS kernel environment to Windows 95.  8-).

I'd certainly appreciate all the information you can supply, if you
don't mind taking the trouble.  I've tested a lot of Linux, Mach,
NetBSD, and GNU DOS FS-related code in the last few months, and what is
particularly evident is a lack of rigorous attention to detail.  Besides
that, even in the generalities, I'm sure I could learn a lot from your
experience.

> The conversion to parsed-path structures greatly aids in use of
> Unicode and DOS code-page interoperability... you will need to
> incorporate a number of patches if you expect to be able to
> support two-name binding, lookup, or Unicode storage (We have a
> UFS where we have made these modifications).

Yes, this is an area in the new vfatfs implementation that still needs
work.

-- 
Robert Nordier


