From: Eir Nym
To: Polytropon
Cc: Daniël de Kok, freebsd-questions@freebsd.org
Subject: Re: grep and anchoring
Date: Sun, 26 Jun 2016 16:41:47 +0200

> On 26 Jun 2016, at 16:34, Polytropon wrote:
>
> On Sun, 26 Jun 2016 15:10:57 +0200, Daniël de Kok wrote:
>> Dear all,
>>
>> After a BSD hiatus of many years, I am tinkering with FreeBSD again.
>> I’ve run into some strange issue with grep and beginning of line (^)
>> anchoring:
>>
>> —
>> % echo "1234 1234 1234" | egrep -o '^….'
>> 1234
>>  123
>> 4 12
>> % echo "123412341234" | egrep -o '^....'
>> 1234
>> 1234
>> 1234
>> —
>>
>> Any idea what is going on here?
>
> I think what you see here is a typical "UTF-8 fsck-up".
> The first search pattern contains an ellipsis ("…",
> 3 bytes long in UTF-8, one character standing in for three dots)
> and a single dot (".", one byte long, 1 character); the second
> pattern contains four dots (4 x ".", each 1 byte long, 1 character).
> Of course grep interprets "…" and "..." differently.
> In my mailer, I can see the difference clearly as the
> ellipsis … is displayed in monospace font as a _one_
> character wide symbol on the screen.
>

I think this was automatic spell correction, and he meant four dot
symbols (.), not a ‘…’ and a ‘.’

> Or is this just an "enrichment" your MUA added? :-)
>
> I'm quite sure you run into similar problems when you
> include ligatures (like st, ft, ffi, ck or the like)
> or one of the many different hyphens and spaces in a
> search pattern. :-)
>
> Otherwise, your example seems to show the expected
> behaviour.
>
> % echo "1234 1234 1234" | egrep -o '^....'
> 1234
>  123
> 4 12
>
> % echo "123412341234" | egrep -o '^....'
> 1234
> 1234
> 1234
>
> First 4-character pattern is "1234", next is " 123",
> and last is "4 12" (each 4 characters wide, as the
> space character " " is also "any character" that matches
> the . pattern). In the second example, the groups match
> 4 characters each ("1234" x 3).
>
> What different results did you expect? Or am I misinterpreting
> your question?
>
>
> -- 
> Polytropon
> Magdeburg, Germany
> Happy FreeBSD user since 4.0
> Andra moi ennepe, Mousa, ...
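
For anyone who wants to check whether a pattern really contains an
ellipsis character rather than three dots, here is a small sketch in
plain sh. The helper names (byte_count, hex_bytes, match_count) are
made up for illustration and are not from this thread. The ellipsis is
built from its UTF-8 bytes so that no mail client can "correct" it. It
only compares the two strings and counts matching lines; it does not
try to reproduce the repeated -o output quoted above.

```shell
#!/bin/sh
# Illustrative sketch; the helper names below are hypothetical.

# U+2026 HORIZONTAL ELLIPSIS, written as its three UTF-8 bytes in octal
# so that this script stays pure ASCII.
ellipsis=$(printf '\342\200\246')
dots='...'

# Count the bytes in a string (wc -c counts bytes, not characters).
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Dump a string as hex bytes with all whitespace removed.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Count the input lines ($2) matched by a pattern ($1). grep -c exits
# non-zero when nothing matches, so the exit status is ignored.
match_count() {
    printf '%s\n' "$2" | grep -c -e "$1" || true
}

echo "ellipsis: $(byte_count "$ellipsis") bytes, hex $(hex_bytes "$ellipsis")"
echo "dots:     $(byte_count "$dots") bytes, hex $(hex_bytes "$dots")"
echo "lines matching '^<ellipsis>.': $(match_count "^${ellipsis}." '1234 1234 1234')"
echo "lines matching '^....':        $(match_count '^....' '1234 1234 1234')"
```

Both strings are three bytes long, but the bytes differ: in a pattern
the ellipsis is an ordinary literal character, so it cannot match
anything in "1234 1234 1234", while each "." matches any character.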