From: Warner Losh
Date: Sun, 7 Jun 2020 09:40:57 -0600
Subject: Re: svn commit: r361884 - in head/usr.bin/sed: . tests
To: Kyle Evans
Cc: "Rodney W. Grimes", src-committers, svn-src-all, svn-src-head

On Sun, Jun 7, 2020, 8:04 AM Kyle Evans wrote:

> On Sun, Jun 7, 2020 at 8:31 AM Rodney W. Grimes wrote:
> >
> > > Author: kevans
> > > Date: Sun Jun 7 04:32:38 2020
> > > New Revision: 361884
> > > URL: https://svnweb.freebsd.org/changeset/base/361884
> > >
> > > Log:
> > > sed: attempt to learn about hex escapes (e.g. \x27)
> > >
> > > Somewhat predictably, software often wants to use \x27/\x24 among others
> > > so that they can decline worrying about ugly escaping, if said escaping is
> > > even possible. Right now, this software is using these and getting the
> > > wrong results, as we'll interpret those as x27 and x24 respectively.
> > > Examples of this found in an exp-run were science/octopus and misc/vifm.
> > >
> > > Go ahead and process these at all times. We allow either one or two
> > > digits, and the tests account for both. If extra digits are specified,
> > > e.g. \x2727, then the third and fourth digits are interpreted literally,
> > > as one might expect.
> >
> > Does it work to do \\x27? I.e., I want it to NOT do \x27 so I can sed
> > on files that contain sequences of escapes.
>
> I'm so glad you asked this. :-) For your immediate answer: yes, the
> semantics there work as you expect.
>
> For the long answer, that's actually what you should have been doing
> all along; raising awareness of that fact is what PR 229925 aims to
> do, by switching our interpretation of the undefined behavior (UB) of
> escaping ordinary characters so that such an escape is an error when
> it isn't specially interpreted.
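The commit log quoted above states the rule for hex escapes: `\x` takes one or two hex digits, and anything after the second digit is literal text. The C sketch below illustrates only that rule, plus Kyle's answer that `\\x27` is left alone. It is not the code from r361884; the names `hexval`, `decode_hex_escapes`, and `decodes_to` are hypothetical, and because it works on a plain C string it ignores sed-specific details such as delimiters and the `\x00` case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Illustrative sketch, not sed's implementation: copy src to dst,
 * replacing "\xH" or "\xHH" with the byte it names.  At most two hex
 * digits are consumed, so "\x2727" becomes "'" followed by a literal
 * "27".  An escaped backslash ("\\") is copied through unchanged, so
 * "\\x27" is not decoded.  The output is never longer than the input,
 * so dst needs strlen(src) + 1 bytes.  A "\x00" escape would embed a
 * NUL and cut the C string short; real code must handle that.
 */
static void decode_hex_escapes(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);
	while (*src != '\0') {
		if (src[0] == '\\' && src[1] == '\\') {
			/* Escaped backslash: keep both characters. */
			*dst++ = *src++;
			*dst++ = *src++;
		} else if (src[0] == '\\' && src[1] == 'x' &&
		    isxdigit((unsigned char)src[2])) {
			int val = hexval(src[2]);

			src += 3;
			/* Optional second digit; anything after it is literal. */
			if (isxdigit((unsigned char)*src))
				val = val * 16 + hexval(*src++);
			*dst++ = (char)val;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}

/* Convenience check for trying the sketch: does decoding `in` give `want`? */
static int decodes_to(const char *in, const char *want)
{
	char buf[128];

	assert(strlen(in) < sizeof(buf));
	decode_hex_escapes(buf, in);
	return strcmp(buf, want) == 0;
}
```

With this sketch, `\x27` decodes to `'`, `\x2727` to `'27`, and `\x41B` to `AB` (only two digits are consumed even though `B` is a hex digit), while `\\x27` comes back unchanged and a bare `\xg` is left as-is.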
>
> Prior to this change, if you had:
>
> printf "\\\\x27\n" | sed -e 's/\x27//'
>
> what you ended up with was actually *not* an empty string with a
> newline, but just a single backslash! The \x27 in the search pattern
> was passed through to the underlying regex(3) implementation, which
> then happily interpreted \x => x and replaced the literal 'x27',
> leaving \ -- which is not what you might have expected if \x27 had no
> special meaning, and almost certainly isn't what you wanted. With the
> new sed, you can change 'x27' to 'b27' in both strings above to see
> what I mean.
>
> In the New World Order, all regex(3) users will be forced to be
> precise here so that we don't get it wrong. This is especially
> important when I add GNU extensions to libregex, because some of those
> escaped ordinaries will now be granted special meaning, so \s will no
> longer match a literal s but instead [[:space:]]. Using the
> unadulterated libc regex(3) interface instead will give you an error,
> which lets you detect whether you're accidentally using libc regex(3)
> rather than the GNU-extended libregex.
>
> This is going to be a large and potentially world-breaking change for
> many, but I think we'll all be better for it in the end. The symbol
> version of regcomp will get bumped, so that older binaries will
> continue to operate with the old escaping behavior in case that was
> actually pertinent to their functionality.
>

Thanks for taking this on. We are actually stuck between two POLAs
(principle of least astonishment) here: existing behavior, and what
users of other systems expect on FreeBSD. Given how edge-casey the
breakage will be, I'm glad you've decided to try the full new
semantics. I've downloaded *LOTS* of code where I had to hack sed into
gsed for exactly this reason. I think it is one area where we've failed
to keep up, and one where the anti-Linux bias of the project's early
days is hurting us now. Thanks for seeing how feasible this is and
retiring this technical debt.

Warner

> Thanks,
>
> Kyle Evans
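Kyle's description of the three behaviors (old libc regex(3), where an escaped ordinary character silently means itself; strict regex(3) after PR 229925, where it is an error; and GNU-extended libregex, where escapes such as `\s` become `[[:space:]]`) can be made concrete with a small C sketch. It is a hypothetical textual pre-pass in front of POSIX `regcomp`/`regexec`, not how libc or libregex actually implement the change; `esc_mode`, `gnu_escape`, `translate_escapes`, and `bre_match` are illustrative names, the set of accepted escapes is a simplified assumption (BRE specials, groups, intervals, backreferences), and bracket expressions are not parsed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* The three behaviours described in the thread (names are hypothetical). */
enum esc_mode {
	ESC_OLD,	/* old libc regex(3): "\q" quietly means "q" */
	ESC_STRICT,	/* strict regex(3): escaped ordinary is an error */
	ESC_GNU		/* GNU-extended: \s \S \w \W gain meaning, others error */
};

/* GNU-style escapes this sketch knows about, as POSIX bracket expressions. */
static const char *gnu_escape(char c)
{
	switch (c) {
	case 's': return "[[:space:]]";
	case 'S': return "[^[:space:]]";
	case 'w': return "[[:alnum:]_]";
	case 'W': return "[^[:alnum:]_]";
	default:  return NULL;
	}
}

/*
 * Hypothetical textual pre-pass: rewrite pat into out (a POSIX BRE)
 * according to mode.  Bracket expressions are not parsed, so escapes
 * inside [...] are mishandled.  Returns 0 on success, -1 if an escape
 * is rejected or out is too small.
 */
static int translate_escapes(const char *pat, char *out, size_t outlen,
    enum esc_mode mode)
{
	size_t n = 0;

	assert(outlen > 0);
	for (; *pat != '\0'; pat++) {
		char tmp[3] = { 0 };
		const char *rep = tmp;
		size_t len;

		if (*pat != '\\') {
			tmp[0] = *pat;
		} else {
			char c = *++pat;

			if (c == '\0')
				return -1;	/* trailing backslash */
			if (strchr(".[\\*^$(){}123456789", c) != NULL) {
				/* BRE special, group, interval or backreference. */
				tmp[0] = '\\';
				tmp[1] = c;
			} else if (mode == ESC_GNU && gnu_escape(c) != NULL) {
				rep = gnu_escape(c);
			} else if (mode == ESC_OLD) {
				tmp[0] = c;	/* "\q" silently becomes "q" */
			} else {
				return -1;	/* escaped ordinary character */
			}
		}
		len = strlen(rep);
		if (n + len >= outlen)
			return -1;
		memcpy(out + n, rep, len);
		n += len;
	}
	out[n] = '\0';
	return 0;
}

/* Translate, compile and run; 1 = match, 0 = no match, -1 = error. */
static int bre_match(const char *pat, const char *s, enum esc_mode mode)
{
	char buf[256];
	regex_t re;
	int rc;

	if (translate_escapes(pat, buf, sizeof(buf), mode) != 0)
		return -1;
	if (regcomp(&re, buf, 0) != 0)
		return -1;
	rc = regexec(&re, s, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

Under `ESC_OLD`, the pattern `\x27` silently becomes `x27` and matches the text `\x27`, which is the surprise in Kyle's printf example; `ESC_STRICT` and `ESC_GNU` both reject it, since `\x` is not a recognized escape in this sketch. Likewise `a\sb` matches `a b` only under `ESC_GNU`, matches `asb` under `ESC_OLD`, and is an error under `ESC_STRICT`.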