Date: Fri, 28 Aug 1998 10:42:29 -0700 (PDT)
From: Julian Elischer
To: jallison@engr.sci.com
cc: archie@whistle.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: Warning: Change to netatalk's file name handling (fwd)

---------- Forwarded message ----------
Date: Fri, 28 Aug 1998 00:36:04 +0200 (CEST)
From: Stefan Bethke
To: Terry Lambert
Cc: archie@whistle.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: Warning: Change to netatalk's file name handling

On Thu, 27 Aug 1998, Terry Lambert wrote:

> > > Netatalk seems like the wrong place to modify behavior to solve this
> > > problem, which is a display problem, not an encoding problem.
> >
> > Where is the encoding defined for character values in the ranges
> > 0x01 to 0x1f and 0x7f to 0xff in terms of UFS, POSIX, whatever?
>
> ISO 8859?

Is this a standardized encoding for POSIX file names, or just a
convention? If it is only a convention, what will users of non-Latin
scripts think about it? How do we discriminate between the different
8859 encodings? (Yeah, I see your point about "locales".)

> > If you were right, it would be OK for afpd to store all chars literally.
> > While this does work, it is definitely awkward to work with in the
> > shell, and possibly also together with other applications such as
> > Samba. It's not merely a display issue; it's an interoperability
> > issue. I feel that too many things expect file names to be confined
> > to printable ASCII, and unless this changes, I opt to fix what in my
> > eyes is an obvious bug in afpd (that is, escaping 0x80 to 0xff, but
> > leaving 0x01 to 0x1f and 0x7f untouched).
>
> Per interoperability: This presumes, incorrectly, that Macs support
> the same idiotic idea of code pages as SAMBA must.

Macs, in this sense, use a single "code page." I believe there is an
escape mechanism to change the encoding to non-Latin scripts, but I will
have to look that up in Inside Mac. For AFP 2.1 (which netatalk claims
to support to the extent the Macs use it), there is a single encoding
defined, without any escape mechanism.

> > It won't change anything for the worse; the only problem is that
> > existing files with names containing control characters (custom
> > icons on folders probably being the only source of such names) will
> > stop working and will need manual assistance from an operator.
>
> It will break a number of things. It already breaks the file name
> length limitation in SAMBA. Duplicating this break into Appletalk is,
> IMO, a bad idea.

I don't know much about SMB/CIFS/Samba. What is its filename length
limit (as opposed, possibly, to its pathname limit)? AFP limits
filenames to 31 bytes (one byte per character). All Unix-based AFP
servers I know of choose to drop files with longer names. Also, at
least two commercial products use the same mechanism for escaping
non-ASCII chars.

> If you are going to push this hard, you should consider International
> representation of file names by client locale, and how it is already
> handled.

Would you mind pointing me to any information shedding light on
standardisation efforts for file name representation?
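For concreteness, the afpd fix argued for here (escaping 0x01 to 0x1f and 0x7f as well as 0x80 to 0xff) can be sketched as a reversible byte mapping. This is a minimal sketch, not netatalk's code: the escape syntax (":" followed by two hex digits), the function names afp_to_unix and unix_to_afp, and the choice to also escape "/" are assumptions made for illustration. ":" is a convenient escape character because it is the AFP path delimiter and so never occurs literally in an AFP filename, which keeps decoding unambiguous.

```python
# Hypothetical sketch of a reversible AFP <-> Unix filename escape.
# Assumed syntax: a byte that needs escaping becomes ':' followed by two
# lowercase hex digits. ':' never appears literally in an AFP name (it is
# the AFP path delimiter), so every ':' on the Unix side starts an escape.

ESCAPE = ord(":")
AFP_MAX_NAME = 31          # AFP filename limit in bytes
HEX_DIGITS = b"0123456789abcdefABCDEF"


def needs_escape(b: int) -> bool:
    """Control chars 0x01-0x1f, DEL 0x7f, the high half 0x80-0xff, and '/'
    (legal in an AFP name, but the Unix path separator)."""
    return b < 0x20 or b >= 0x7F or b == ord("/")


def afp_to_unix(name: bytes) -> bytes:
    """Map an AFP filename to a printable-ASCII Unix filename."""
    if not 1 <= len(name) <= AFP_MAX_NAME:
        raise ValueError("AFP filename must be 1 to 31 bytes")
    if 0 in name or ESCAPE in name:
        raise ValueError("NUL and ':' are not legal in AFP filenames")
    out = bytearray()
    for b in name:
        if needs_escape(b):
            out += b":%02x" % b   # e.g. 0x0d -> b":0d"
        else:
            out.append(b)
    return bytes(out)


def unix_to_afp(name: bytes) -> bytes:
    """Inverse of afp_to_unix: decode every ':xx' escape."""
    out = bytearray()
    i = 0
    while i < len(name):
        if name[i] == ESCAPE:
            digits = name[i + 1:i + 3]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed escape in %r" % name)
            out.append(int(digits, 16))
            i += 3
        else:
            out.append(name[i])
            i += 1
    return bytes(out)
```

With this mapping, the custom folder icon file, whose Mac name is "Icon" followed by a carriage return (0x0d), would be stored as Icon:0d: easy to handle in the shell, and it round-trips to the original name, so the full AFP filename space is preserved.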
In terms of "locale", would this mean that "Mac" or "AFP" is its own
locale in terms of file name character encoding?

As I see it, there are three possible ways:

- improve interoperability by confining names to printable ASCII (or
  ISO-8859-1, or...) and not escaping other glyphs, thus breaking AFP
  conformance;
- escape all glyphs (or rather their encodings) in a way that preserves
  the full AFP filename encoding space (for filenames, this is 0x01 to
  0xff, with ":" being illegal as it is the path delimiter), but use
  printable ASCII where possible (this is, I believe, what netatalk
  tries to do, but doesn't, due to a stupid bug);
- translate the AFP filename encoding space into some larger glyph
  encoding space, such as Unicode, or, more specifically, UTF-8.

The last one is probably the way to go, but (to me at least) it would
require some assurance that Unicode in general, and UTF-8 in particular,
is the way to go for file names in FreeBSD. This would, of course,
probably start other interoperability problems with NFS and the like,
and it would require Samba to deal with its code page (CP) bogosities
in its own right instead of putting them in the face of every other
app.

> Novell servers are another case where the server assumes all clients
> exist in a given locale; this would be a mistake to buy into...

Yep.

Cheers,
Stefan

-- 
Stefan Bethke
Muehlendamm 12            Phone: +49-40-256848, +49-177-3504009
D-22087 Hamburg
Hamburg, Germany

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message