Date: Mon, 18 Sep 1995 13:19:02 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: phk@critter.tfs.com (Poul-Henning Kamp)
Cc: bde@zeta.org.au, hackers@FreeBSD.ORG, terry@lambert.org
Subject: Re: Policy on printf format specifiers?
Message-ID: <199509182019.NAA08435@phaeton.artisoft.com>
In-Reply-To: <6760.811430864@critter.tfs.com> from "Poul-Henning Kamp" at Sep 18, 95 06:27:44 am
> > >I'd like to add a format specifier '%S' to the list of format specifiers
> > >accepted by printf. Well, kernel printf, anyway.
> >
> > I don't want wchar_t's in the kernel.
>
> I also fail to see the need for this, and even if I did see the need, I
> still think we shouldn't have them in the kernel...

The need is Unicode encoding of file names without yanking around the
value of MAXPATHLEN or MAXNAMLEN through runic (multibyte) encoding of
the file name data.

There is no way to determine the runic length required for an unknown
file name before it is entered. File name entry will occur in the
process encoding. By divorcing process and storage encoding, you
increase the length of a non-7-bit-ASCII string unpredictably when
transforming it into the storage encoding.

How many characters do you let the user enter in the "file name" dialog
before you tell them, by visual or auditory feedback, that they've
entered too many?

If your storage encoding is UTF-8, as in Plan 9, then the answer is that
you can allow them no more than 51 characters per file name (255 bytes
at a worst case of 5 bytes per character), unless you provide a
prohibitively expensive (in terms of interactive response time) "check"
callback for character entry.

Even if you implement such an expensive callback (after all, everyone
will be running P6's, right?), you still end up with a limit where using
one set of glyphs rather than another varies the overall length allowed.
That is, if you use ISO-8859-1 characters and they are all in the range
0x80-0xff, you get a length limit of 127 characters for your file name
(2 bytes each), whereas if they are all in the range 0x00-0x7f, you get
the full 255. Characters outside the 0x00-0xff range of 8859-1 (for
instance, all of the characters in 8859-2 through 8859-9 not
intersecting with 8859-1) take 3-5 bytes to encode, depending on their
code point value.

Say "goodbye, fixed-field input", say "goodbye, fixed-length record
storage", say "hello, record-oriented file systems", say "hello, user
interface rewrite for all internationally sold products".
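The length arithmetic above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not anything from the FreeBSD tree: it assumes a 255-byte limit per file name component (the "full 255" above) and UTF-8 as originally specified, with sequences of up to 6 bytes for 31-bit values. NAME_BYTES, utf8_len(), name_fits() and max_chars() are hypothetical names; name_fits() stands in for the per-keystroke "check" callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte limit for one file name component; 255 matches the
 * "full 255" figure in the text and is a stand-in, not a header value. */
#define NAME_BYTES 255

/* Bytes needed to store one code point in UTF-8 as originally specified
 * (up to 6 bytes, covering 31-bit values).  Present-day UTF-8 (RFC 3629)
 * stops at 4 bytes and U+10FFFF. */
size_t utf8_len(uint32_t cp)
{
    assert(cp <= 0x7FFFFFFF);           /* original UTF-8 covered 31 bits */
    if (cp < 0x80)
        return 1;                       /* 7 bits: plain ASCII */
    if (cp < 0x800)
        return 2;                       /* 11 bits: includes 0x80-0xff */
    if (cp < 0x10000)
        return 3;                       /* 16 bits */
    if (cp < 0x200000)
        return 4;                       /* 21 bits */
    if (cp < 0x4000000)
        return 5;                       /* 26 bits */
    return 6;                           /* 31 bits */
}

/* Hypothetical "check" callback: returns 1 if the name typed so far
 * (given as code points) still fits in NAME_BYTES once encoded, else 0.
 * This version simply rescans the whole name on every call. */
int name_fits(const uint32_t *cps, size_t n)
{
    size_t bytes = 0;

    for (size_t i = 0; i < n; i++) {
        bytes += utf8_len(cps[i]);
        if (bytes > NAME_BYTES)
            return 0;                   /* too long once encoded */
    }
    return 1;
}

/* Longest name the limit allows if every character takes `width` bytes
 * (width is assumed to be 1..6). */
size_t max_chars(size_t width)
{
    return NAME_BYTES / width;
}
```

With these, max_chars(utf8_len(0x41)) is 255, max_chars(utf8_len(0xE9)) is 127, and max_chars(5) is 51, which reproduces the figures in the text. A real input widget could keep a running byte count instead of rescanning, but it still could not tell the user in advance how many characters will fit, because that depends on which characters are typed.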
Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
