From: Per Hedeland
To: "Ronald F. Guilmette"
Cc: freebsd-questions@freebsd.org
Subject: Re: sort is broken
Date: Sun, 3 Nov 2019 03:38:25 +0100
References: <8847.1572745058@segfault.tristatelogic.com>
In-Reply-To: <8847.1572745058@segfault.tristatelogic.com>

On 2019-11-03 02:37, Ronald F.
Guilmette wrote:
> In message <20191102233528.CFE66E4728E@ary.local>, you wrote:
>
>> In article <7668.1572729288@segfault.tristatelogic.com> you write:
>>> Not a question, just an expression of grief and deep dismay.
>>>
>>> It is a sad day when even very fundamental tools, used in billions
>>> of scripts, such as /usr/bin/sort turn up broken.
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241679
>>
>> I tried it on 11.3 and 12.0 and it works fine.
>>
>> What's in your environment, particularly what's LC_ALL set to?
>
> In my env, LC_ALL is not set at all.
>
> I do have these, but not sure if they make any difference:
>
> LANG=en_US.UTF-8

This, in combination with trying to sort a file whose content *isn't*
valid UTF-8, is the reason for the behavior you observe - see my
previous post. The specification of how LANG and the LC_* variables
(should) interact can be found at
https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html - I
believe setting only LANG is the "normal" way to specify a locale.

If you convert your file to UTF-8, e.g. using the strange behavior of
'sort':

$ sort test > test.utf8

- or more "properly" (assuming you have the libiconv package
installed):

$ iconv -f ISO-8859-1 -t UTF-8 test > test.utf8

- you will find that the test.utf8 file is handled correctly by
'sort', both as filename argument and as stdin.

> XTERM_LOCALE=en_US.UTF-8

This - which is actually set by xterm based on how it was started -
implies that your xterm will decode UTF-8 and display the "real"
character.

--Per Hedeland
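
P.S. As a rough illustration of the precedence described at the
envvar.html link above (LC_ALL first, then the category's own LC_*
variable, then LANG), here is a minimal sketch in sh. The function name
effective_locale is made up for this example and is not a standard
utility; the final fallback is implementation-defined, so the sketch
only prints a placeholder for it.

```shell
# Sketch only: report which locale name applies to one locale category,
# following the XBD precedence LC_ALL > LC_<category> > LANG.
# "effective_locale" is a hypothetical helper name, not a real command.
effective_locale() {
    category=$1    # e.g. LC_COLLATE (used by sort) or LC_CTYPE

    # Accept only the standard category names so the eval below is safe.
    case $category in
        LC_COLLATE|LC_CTYPE|LC_MESSAGES|LC_MONETARY|LC_NUMERIC|LC_TIME) ;;
        *) echo "unknown category: $category" >&2; return 2 ;;
    esac

    # Fetch the value of the variable whose name is in $category.
    eval "category_value=\${$category:-}"

    if [ -n "${LC_ALL:-}" ]; then
        # 1. LC_ALL overrides everything when set and non-empty.
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        # 2. Otherwise the category's own variable applies.
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        # 3. Otherwise LANG supplies the locale.
        printf '%s\n' "$LANG"
    else
        # 4. Nothing set: the default is implementation-defined.
        printf '%s\n' "(implementation-defined default)"
    fi
}
```

With only LANG=en_US.UTF-8 set (LC_ALL and LC_COLLATE both unset or
empty), 'effective_locale LC_COLLATE' prints en_US.UTF-8, which is why
setting LANG alone is enough to put 'sort' into a UTF-8 locale.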