From: Jeremy Chadwick
To: Frank Bonnet
Cc: freebsd-apache@freebsd.org
Date: Fri, 20 May 2011 03:37:25 -0700
Message-ID: <20110520103725.GA19494@icarus.home.lan>
In-Reply-To: <4DD63698.3030907@esiee.fr>
Subject: Re: Where to define HTTP_ACCEPT_LANGUAGE=fr-fr ???

On Fri, May 20, 2011 at 11:38:32AM +0200, Frank Bonnet wrote:
> On 05/20/2011 11:27 AM, Jeremy Chadwick wrote:
> >On Fri, May 20, 2011 at 10:23:00AM +0200, Frank Bonnet wrote:
> >>How and WHERE to define this variable in apache22 configuration ???
> >>I need the web server to understand French characters in filenames
> >I haven't worked with this before, but what does "need the webserver to
> >understand French characters in filenames" mean exactly?  More details
> >are needed, particularly technical ones.  How is Apache "not working"
> >with French characters in filenames?
> >
> Apache is working BUT if a filename contains a "french" character
> I get a 404 error from apache ( file not found)
>
> here is such error message
>
> xxx.xxx.xxx.xxx - - [20/May/2011:10:55:06 +0200] "GET /cv/ESIEE_ENGINEERING/CV_electronique/11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
> HTTP/1.1" 404 1221
>
> in fact the file do exists
>
> -rw-r--r-- 1 www-data www-data 15494 20 mai 03:00
> 11_EE_APP_FE_CV_CISSE_Kaliss?.docx
>                        ^^^^^
> here is the problem

This looks like a character set mismatch between what the browser
requests and the filename on the server.  Specifically: the browser is
requesting a filename encoded in utf-8 (Unicode), while the filename on
the actual server is encoded in iso-8859-1.

I'm also assuming that the letter which shows up as "?" in your Email
above is actually the "é" character (latin small letter e with an acute
accent above it).  I hope the below examples therefore render correctly
for you.

Let me explain the difference between the two:

utf-8
=======
- Filename (visually):  11_EE_APP_FE_CV_CISSE_Kalissé.docx
- Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xc3><0xa9>.docx
- Filename (as URL):    11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx

iso-8859-1
============
- Filename (visually):  11_EE_APP_FE_CV_CISSE_Kalissé.docx
- Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xe9>.docx
- Filename (as URL):    11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx

URLs, per RFC 1738, do not permit raw characters above 0x7f (which
includes the upper half of iso-8859-1) to make it into the URL; they
have to be percent-encoded.  So, technically speaking, the URL of:

http://somesite/11_EE_APP_FE_CV_CISSE_Kalissé.docx

Should fail or not work.  Some browsers may try to "be smart" and turn
the accented small e character into %E9, which would then become:

http://somesite/11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx

Which would work just fine, since it matches the iso-8859-1 filename on
disk.

I'm not sure that HTTP_ACCEPT_LANGUAGE would fix this problem.

If you have a CGI, PHP script, web software, etc. which is generating
filenames and things like that, and is using utf-8 as its character set
(declared either via an HTTP header or via an HTML tag), then that's
going to mess things up.  You need to be using the iso-8859-1 character
set instead, so that it matches the filenames on disk.  A good browser
will be able to show you which character set it is using for the page.

What's the alternative?  Simple: you start using utf-8 in your
filenames.

I should note, however, that FreeBSD (including 8.2-STABLE) does not
have very good Unicode support.  It's hit-or-miss, and using things like
LANG/LC_CTYPE results in some serious problems with utilities that rely
on locale(7).  So, I would be very careful going this route on FreeBSD.

The short version is this: if you're going to use utf-8, you need to use
it absolutely 100% of the time.  You cannot reliably mix and match
character sets like that.

Hope this helps.
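
P.S. If it helps to see the byte-level difference described above, here
is a minimal Python sketch.  It is only an illustration, not something
Apache runs: the helper name url_form() is made up for this example, and
it assumes (as I did above) that the on-disk name is iso-8859-1 while
the browser sends utf-8.

```python
from urllib.parse import quote, unquote_to_bytes

# The filename as a human reads it; \u00e9 is "e with acute accent".
name = "11_EE_APP_FE_CV_CISSE_Kaliss\u00e9.docx"


def url_form(filename: str, charset: str) -> str:
    """Hypothetical helper: encode the name in `charset`, then percent-encode the bytes."""
    return quote(filename.encode(charset))


utf8_url = url_form(name, "utf-8")          # what the browser requested
latin1_url = url_form(name, "iso-8859-1")   # what matches the file on disk (assumed)

print(utf8_url)     # 11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
print(latin1_url)   # 11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx

# The server compares raw bytes, so the utf-8 request does not match
# a filename stored as iso-8859-1 bytes.
requested = unquote_to_bytes(utf8_url)
on_disk = name.encode("iso-8859-1")
print(requested == on_disk)   # False

# Re-encoding the requested name from utf-8 to iso-8859-1 yields the on-disk bytes.
print(requested.decode("utf-8").encode("iso-8859-1") == on_disk)   # True
```

The last comparison is the whole problem in one line: same visible
name, different bytes, so the lookup fails until both sides agree on one
character set.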

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |