Date: Fri, 20 May 2011 03:37:25 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Frank Bonnet <f.bonnet@esiee.fr>
Cc: freebsd-apache@freebsd.org
Subject: Re: Where to define HTTP_ACCEPT_LANGUAGE=fr-fr ???
Message-ID: <20110520103725.GA19494@icarus.home.lan>
In-Reply-To: <4DD63698.3030907@esiee.fr>
References: <4DD624E4.5000408@esiee.fr> <20110520092755.GA18041@icarus.home.lan> <4DD63698.3030907@esiee.fr>

On Fri, May 20, 2011 at 11:38:32AM +0200, Frank Bonnet wrote:
> On 05/20/2011 11:27 AM, Jeremy Chadwick wrote:
> >On Fri, May 20, 2011 at 10:23:00AM +0200, Frank Bonnet wrote:
> >>How and WHERE to define this variable in apache22 configuration ???
> >>I need the web server to understand French characters in filenames
> >I haven't worked with this before, but what does "need the webserver to
> >understand French characters in filenames" mean exactly?  More details
> >are needed, particularly technical ones.  How is Apache "not working"
> >with French characters in filenames?
> >
> Apache is working BUT if a filename contains a "french" character
> I get a 404 error from apache ( file not found)
>
> here is such error message
>
> xxx.xxx.xxx.xxx - - [20/May/2011:10:55:06 +0200] "GET /cv/ESIEE_ENGINEERING/CV_electronique/11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
> HTTP/1.1" 404 1221
>
> in fact the file do exists
>
> -rw-r--r--  1 www-data  www-data  15494 20 mai 03:00
> 11_EE_APP_FE_CV_CISSE_Kaliss?.docx
>                           ^^^^^
> here is the problem

This looks like a character set issue of the browser vs. the filename on
the server.  Specifically: the browser is requesting to download a
filename that's in utf-8 (Unicode), while what's on the actual server is
a filename encoded in iso-8859-1.

I'm also making the assumption the letter which shows up in your Email
above is actually the "é" character (latin small letter e with an acute
(raising) accent above it).  I hope the below examples therefore render
correctly for you.

Let me explain the difference between the two:

utf-8
=======
- Filename (visually):  11_EE_APP_FE_CV_CISSE_Kalissé.docx
- Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xc3><0xa9>.docx
- Filename (as URL):    11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx

iso-8859-1
============
- Filename (visually):  11_EE_APP_FE_CV_CISSE_Kalissé.docx
- Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xe9>.docx
- Filename (as URL):    11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx

URLs, per official RFC 1738, do not permit raw characters above 0x7f
(which covers the accented letters in iso-8859-1); such bytes have to be
percent-encoded.  So, technically speaking, the URL of:

http://somesite/11_EE_APP_FE_CV_CISSE_Kalissé.docx

Should fail or not work.  Some browsers may try to "be smart" and turn
the accented small e character into %E9, which would then become:

http://somesite/11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx

Which would work just fine, since that matches the iso-8859-1 filename
on the server.

I'm not sure that HTTP_ACCEPT_LANGUAGE would fix this problem.

If you have a CGI, PHP script, web software, etc. which is generating
filenames and things like that, and is using utf-8 as its character set
(meaning either via an HTTP header or via HTML <meta http-equiv> tag),
then that's going to mess things up.  You need to be using the
iso-8859-1 character set instead.  A good browser will be able to show
you what character set the page shows up as.

What's the alternative?  Simple: you start using utf-8 in your
filenames.

I should note, however, that FreeBSD (including 8.2-STABLE) does not
have very good Unicode support.  It's hit-or-miss, and using things like
LANG/LC_CTYPE can result in some serious problems with utilities that
rely on locale(7).  So, I would be very careful going this route on
FreeBSD.

The short version is this: if you're going to use utf-8, you need to use
it absolutely 100% of the time.  You cannot reliably mix and match
character sets like that.

Hope this helps.

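For anyone who wants to check the two encodings by hand, here is a
minimal Python sketch of the comparison above.  It is only an
illustration: the helper names url_form and matches_on_disk are made up
for this example, and the byte-for-byte comparison is a simplified
stand-in for how a filename lookup behaves, not Apache's actual code
path.

```python
from urllib.parse import quote, unquote_to_bytes

# The filename from the thread, written with an explicit escape for "é".
NAME = "11_EE_APP_FE_CV_CISSE_Kaliss\u00e9.docx"


def url_form(name: str, charset: str) -> str:
    """Percent-encode a filename as it would appear in a URL for a charset."""
    return quote(name.encode(charset))


def matches_on_disk(request_path: str, disk_name: bytes) -> bool:
    """Compare the percent-decoded request bytes with the on-disk name bytes."""
    return unquote_to_bytes(request_path) == disk_name


utf8_url = url_form(NAME, "utf-8")          # ends in Kaliss%C3%A9.docx
latin1_url = url_form(NAME, "iso-8859-1")   # ends in Kaliss%E9.docx
print(utf8_url)
print(latin1_url)

# Assumption: the file on the server was created with an iso-8859-1 name.
disk_name = NAME.encode("iso-8859-1")
print(matches_on_disk(utf8_url, disk_name))    # utf-8 request: no match
print(matches_on_disk(latin1_url, disk_name))  # iso-8859-1 request: match

# Re-encoding the on-disk name as utf-8 makes the utf-8 request match.
fixed = disk_name.decode("iso-8859-1").encode("utf-8")
print(matches_on_disk(utf8_url, fixed))
```

The last step only computes what a utf-8 name would look like; it does
not rename anything on disk.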
-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |