From owner-freebsd-hackers  Thu Aug  1 12:31:25 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id MAA02732
          for hackers-outgoing; Thu, 1 Aug 1996 12:31:25 -0700 (PDT)
Received: from eac.iafrica.com (196-7-101-132.iafrica.com [196.7.101.132])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id MAA02707
          for <hackers@freebsd.org>; Thu, 1 Aug 1996 12:31:16 -0700 (PDT)
Received: (from rnordier@localhost) by eac.iafrica.com (8.6.12/8.6.12) id VAA00586; Thu, 1 Aug 1996 21:28:08 +0200
From: Robert Nordier <rnordier@iafrica.com>
Message-Id: <199608011928.VAA00586@eac.iafrica.com>
Subject: Re: anyone working on upgrading the msdosfs to NetBSD levels?
To: terry@lambert.org (Terry Lambert)
Date: Thu, 1 Aug 1996 21:28:07 +0200 (SAT)
Cc: hackers@freebsd.org
In-Reply-To: <199607311829.LAA02458@phaeton.artisoft.com> from "Terry Lambert" at Jul 31, 96 11:29:44 am
X-Mailer: ELM [version 2.4 PL24 ME8a]
Content-Type: text
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Terry Lambert wrote:
> 
> > The FAT fs primitives are done and tested and I'm currently working
> > on VFAT support.
> 
> Before you get too far on that, I have the algorithm it uses to avoid
> short name name space collisions.  It's not pretty, but it works, and
> I think that was all Microsoft really cared about.
> 
> I'm also not sure about the long name space storage which is in
> ISO-10646/16 (16 bit Unicode), since it is not possible to pass
> Unicode across the lookup interface (this will be a problem for
> any NTFS as well -- Linux is unfortunately way ahead of BSD here).
> 
> I can't help with the Unicode stuff given the current state of the BSD
> VFS; my suggestion is to punt, and treat the high byte as zero in all
> cases, converting it to ISO-8859-1 (Latin 1).  This will damage utility
> for anyone outside the Latin 1 scope, but that can't be helped without
> the underlying VFS changes (apologies to non-Latin 1 using countries
> up front).
> 
> If you get to where you need to work on name collision, let me know,
> and I can describe the algorithm in a couple of pages.

I was doing some work on this just recently.  When you have the
time, I'd appreciate your description.  There may be a few points
that my derived algorithm misses.

Following your suggestion of dropping the Unicode high byte, a
primary concern is that this will itself lead to name space
collisions.

I'm a bit vague on the complete range of encodings, but I assume
that two LFNs could coexist in a directory while differing only in
bits 8-15.  Once those bits are masked off, the names are identical.

Alternatively, masking off the high byte may result in a value of
(binary) zero embedded in the LFN, or something equally undesirable.
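
To make both cases concrete, here is a minimal sketch in C.  It assumes
LFN characters are held as 16-bit units; the helper names are
hypothetical and do not come from any existing msdosfs code.  For
example, U+0141 narrows to 0x41 ('A') and so collides with a plain 'A',
U+0100 narrows to a NUL byte, and U+012F narrows to 0x2F ('/').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: narrow a 16-bit Unicode name of n units to
 * 8 bits by discarding bits 8-15 of each unit.  dst must have room
 * for n + 1 bytes; a terminating NUL is appended.
 */
static void
lfn_drop_high_byte(const uint16_t *src, size_t n, unsigned char *dst)
{
	size_t i;

	assert(src != NULL && dst != NULL);
	for (i = 0; i < n; i++)
		dst[i] = (unsigned char)(src[i] & 0xff);
	dst[n] = '\0';
}

/*
 * Return 1 if two different n-unit names become byte-for-byte
 * identical after narrowing, 0 otherwise.
 */
static int
lfn_collides(const uint16_t *a, const uint16_t *b, size_t n)
{
	unsigned char na[256], nb[256];

	assert(n < sizeof(na));
	lfn_drop_high_byte(a, n, na);
	lfn_drop_high_byte(b, n, nb);
	return (memcmp(a, b, n * sizeof(*a)) != 0 &&
	    memcmp(na, nb, n) == 0);
}

/*
 * Return 1 if narrowing would place a NUL byte inside the name,
 * i.e. some non-zero unit has a zero low byte.
 */
static int
lfn_embeds_nul(const uint16_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (src[i] != 0 && (src[i] & 0xff) == 0)
			return (1);
	return (0);
}
```

A check along the lines of lfn_embeds_nul() could be extended to reject
'/' as well, but that only detects the problem; it does not give the
colliding names distinct representations.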

Won't this entail a further algorithm to produce distinct BSD LFN
representations, or do you foresee another way around this?

--
Robert Nordier