Date: Thu, 29 Jan 2015 19:55:33 +0100
From: Roland Smith
To: Robert Simmons
Subject: Re: Unicode Problem
Message-ID: <20150129185533.GA38445@slackbox.erewhon.home>
Mail-Followup-To: Robert Simmons , freebsd-python@freebsd.org
References: <20150129072908.GA37127@slackbox.erewhon.home>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="KsGdsel6WgEHnImy"
Content-Disposition: inline
In-Reply-To:
X-GPG-Fingerprint: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725
X-GPG-Key: http://www.xs4all.nl/~rsmith/pubkey.txt
X-GPG-Notice: If this message is not signed, don't assume I sent it!
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: freebsd-python@freebsd.org

--KsGdsel6WgEHnImy
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Thu, Jan 29, 2015 at 02:42:31AM -0500, Robert Simmons wrote:
> On Thu, Jan 29, 2015 at 2:29 AM, Roland Smith wrote:
> > On Thu, Jan 29, 2015 at 01:38:21AM -0500, Robert Simmons wrote:
> >> I'm having a unicode problem on FreeBSD lang/python34 that does not
> >> appear on MacOS X. I've condensed the problem to one single line to
> >> enter in the interpreter:
> >>
> >> FreeBSD:
> >> Python 3.4.2 (default, Jan 28 2015, 22:23:57)
> >> [GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final
> >> 208032)] on freebsd10
> >> Type "help", "copyright", "credits" or "license" for more information.
> >> >>> b'\xc3\xa2'.decode('utf-8')
> >> '\xe2'
> >>
> >> MacOS X:
> >> Python 3.4.2 (default, Oct 19 2014, 17:55:38)
> >> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.54)] on darwin
> >> Type "help", "copyright", "credits" or "license" for more information.
> >> >>> b'\xc3\xa2'.decode('utf-8')
> >> 'â'
> >>
> >> Why is Python on FreeBSD incorrectly decoding this?
> >
> > Works fine here (FreeBSD 10.1-STABLE #0 r276653 amd64):
> >
> > Python 3.4.2 (default, Nov 4 2014, 19:34:48)
> > [GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> b'\xc3\xa2'.decode('utf-8')
> > 'â'

(please don't top-post)

> What is the output from print(sys.stdout.encoding) on your system?

Python 3.4.2 (default, Nov 4 2014, 19:34:48)
[GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout.encoding)
UTF-8

> And, can you explain how to change that on mine so that it is UTF-8?
> Mine is a default fresh install, btw.

In /etc/login.conf, I set LC_ALL=en_US.UTF-8;

default:\
        :passwd_format=sha512:\
        :copyright=/etc/COPYRIGHT:\
        :welcome=/etc/motd:\
        :setenv=MAIL=/var/mail/$,BLOCKSIZE=K,LC_ALL=en_US.UTF-8:\
        :path=/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin /usr/local/bin

And I use a unicode aware X terminal (rxvt-unicode). In case you're not using
X11, the new vt(4) device uses UTF-8, but the old sc(4) doesn't support it at
all, AFAIK.

Roland
-- 
R.F.Smith http://rsmith.home.xs4all.nl/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 5753 3324 1661 B0FE 8D93 FCED 40F6 D5DC A38A 33E0 (keyID: A38A33E0)

--KsGdsel6WgEHnImy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAABAgAGBQJUyoIlAAoJEED21dyjijPgeKkQALSYLDcpUFyzf2xMIDAMtpoN
2U7sZHEf+Afj/nMbOoj90fk2IjirziXB8LRUdnJgPHrvARecqsY/bi0rgrdv5jjt
W0/fYjuu17qKgSEGpijA9tqLfDyi1wAY91SmJlOhFiogJThiEZ/lFZRHRR4uhGo6
IcO2chbTO8ppV3Ch+mj7tC4MJofeUsdJaDDyhp8KIsakR/F5SEQ4kilwxAPMqllP
f2UrwJXLnSYvu8E4Ap0sNBz5k3K8DspNjK1HjzDa3twTmAwCJGdxVHjVlPQFEiqr
8Ps+oZX/3M5WMcP6yOWXun/fh3zDDWHTtC81B8oh9HVFCyqJTv/Q1nbANQCkgmA9
lpLhRrsVzWX87V0J8i8Hzf7jROVlT2PBuCxunWEXwvLyDU5ySterLz7jVFibwg0k
bLuVCfwXmgYyGLzV5iu2ldZxAPQUEv07Ef3vH3xuIDyWVS+TnFomdWgpAhdA8vio
9ZMqYD8cq4N6MRsVpDtyDw3EOONTh7sN2+8u0ztMPZfJf4cPKkDj+szC3WJexZT1
OCwFwS7pquG5wMzMIjs0F/0+logOqnWJCWFkeO+5NE3rDGpG08dTngtFMuTR6A6t
MDVTRtdyZ5f4M73UVRrh5LBmbypowC3aL7bqhsMfVutBv9VRj3bc7PSsR968i9Me
pRtmYwFy7QjzglUENTeC
=YSs+
-----END PGP SIGNATURE-----

--KsGdsel6WgEHnImy--
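
A note on why the two prompts in this thread print different things: as far as
I understand it, the decode gives the same one-character string on both
systems, and only the way the interactive prompt renders repr() depends on
sys.stdout.encoding. The sketch below imitates that rendering under this
assumption. shown_as() is a made-up helper for illustration, not part of
Python, and it only approximates what the prompt does.

```python
import sys

# The decode result is the same string everywhere: one character, U+00E2.
decoded = b'\xc3\xa2'.decode('utf-8')
assert decoded == '\u00e2'
assert len(decoded) == 1


def shown_as(value, encoding):
    """Approximate how the interactive prompt would display repr(value)
    on an output stream that uses the given encoding.

    Hypothetical helper: characters the encoding cannot represent are
    replaced by backslash escapes instead of raising an error.
    """
    return repr(value).encode(encoding, 'backslashreplace').decode(encoding)


# With an ASCII-only stream the character is escaped, as in the FreeBSD report.
print(shown_as(decoded, 'ascii'))

# What this process would show, given its own stdout encoding.
own_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
print(own_encoding)
print(shown_as(decoded, own_encoding))
```

If the last line comes out escaped, the string is still correct; the locale
(for example LC_ALL, as set via login.conf above) is most likely not a UTF-8
one.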