Date: Fri, 20 May 2011 13:12:03 +0200
From: Frank Bonnet <f.bonnet@esiee.fr>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-apache@freebsd.org
Subject: Re: Where to define HTTP_ACCEPT_LANGUAGE=fr-fr ???
Message-ID: <4DD64C83.1070903@esiee.fr>
In-Reply-To: <20110520103725.GA19494@icarus.home.lan>
References: <4DD624E4.5000408@esiee.fr> <20110520092755.GA18041@icarus.home.lan> <4DD63698.3030907@esiee.fr> <20110520103725.GA19494@icarus.home.lan>

On 05/20/2011 12:37 PM, Jeremy Chadwick wrote:

stuff deleted

OK Jeremy,

thank you for your complete and good technical answer, I'm gonna check
all your recommendations, then let you know if it has worked.

Thanks again.

Frank

> here is the problem
> This looks like a character set issue of the browser vs. the filename on
> the server. Specifically: the browser is requesting to download a
> filename that's in utf-8 (Unicode), while what's on the actual server is
> a filename encoded in iso-8859-1.
>
> I'm also making the assumption the letter which shows up in your Email
> above is actually the "é" character (latin small letter e with an
> acute (raising) accent above it). I hope the below examples therefore
> render correctly for you.
>
> Let me explain the two differences:
>
> utf-8
> =======
> - Filename (visually): 11_EE_APP_FE_CV_CISSE_Kalissé.docx
> - Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xc3><0xa9>.docx
> - Filename (as URL): 11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
>
> iso-8859-1
> ============
> - Filename (visually): 11_EE_APP_FE_CV_CISSE_Kalissé.docx
> - Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xe9>.docx
> - Filename (as URL): 11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
>
> URLs, per official RFC 1738, with regards to iso-8859-1, do not permit
> characters above 0x7f to make it into the URL. So, technically
> speaking, the URL of:
>
> http://somesite/11_EE_APP_FE_CV_CISSE_Kalissé.docx
>
> Should fail or not work. Some browsers may try and "be smart" and turn
> the accented small e character into %E9, which would then become:
>
> http://somesite/11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
>
> Which would work just fine.
>
> I'm not sure that HTTP_ACCEPT_LANGUAGE would fix this problem.
>
> If you have a CGI, PHP script, web software, etc. which is generating
> filenames and things like that, and is using utf-8 as its character set
> (meaning either via an HTTP header or via HTML <meta http-equiv> tag),
> then that's going to mess things up. You need to be using the
> iso-8859-1 character set instead. A good browser will be able to show
> you what character set the page shows up as.
>
> What's the alternative? Simple: you start using utf-8 in your
> filenames. I should note, however, that FreeBSD (including 8.2-STABLE)
> does not have very good Unicode support. It's hit-or-miss, and using
> things like LANG/LC_CTYPE results in some serious problems with utilities
> that rely on locale(7). So, I would be very careful going this route on
> FreeBSD.
>
> The short version is this: if you're going to use utf-8, you need to use
> it absolutely 100% of the time. You cannot reliably mix-match character
> sets like that.
>
> Hope this helps.
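
The difference between the two URL forms quoted above can be reproduced
with a minimal Python sketch. It is only an illustration, not something
from the thread: it assumes Python 3 and the standard urllib.parse
module, and simply percent-encodes the same filename once per character
set, then shows what happens when the decoding side guesses the wrong one.

```python
from urllib.parse import quote, unquote

# The filename from the thread, with U+00E9 (e with acute accent).
name = "11_EE_APP_FE_CV_CISSE_Kaliss\u00e9.docx"

# Percent-encode the same name under each character set.
utf8_url = quote(name, encoding="utf-8")
latin1_url = quote(name, encoding="iso-8859-1")

print(utf8_url)    # 11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
print(latin1_url)  # 11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx

# Decoding with the wrong character set does not give the name back:
# the two UTF-8 bytes read as iso-8859-1 become two separate characters,
# and the lone 0xE9 byte read as UTF-8 is invalid, so it is replaced
# by U+FFFD (the Unicode replacement character).
wrong_latin1 = unquote(utf8_url, encoding="iso-8859-1")
wrong_utf8 = unquote(latin1_url, encoding="utf-8", errors="replace")

print(wrong_latin1 == name)  # False
print(wrong_utf8 == name)    # False
```

Either encoding round-trips cleanly as long as both ends agree on it,
which is the point of the "use it 100% of the time" advice above.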