From: Mark Millard
Date: Mon, 26 Apr 2021 20:14:32 -0700
Subject: Re: Regular expression compilation fail in current
To: Fernando Apesteguía
Cc: FreeBSD Hackers
List-Id: Technical discussions relating to FreeBSD

On 2021-Apr-26, at 06:31, Fernando Apesteguía wrote:

> Hi there,
>
> I'm working with this port PR
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255182
>
> and the problem seems to boil down to a regular expression that does
> not compile on current but it does in 12.2.
>
> The minimum repro is this one:
>
> #include <regex.h>
> #include <stdio.h>
>
> int
> main()
> {
>     regex_t regexp;
>     int ret = regcomp(&regexp, "\\s*", REG_EXTENDED | REG_ICASE |
>         REG_NOSUB);

Here is my stab at notes for this . . .

It is not all that uncommon for an error case to be silently
mistreated at first and for later versions of the software to
reject it instead. I suspect that is what is going on here.
The details seem to be as follows.

Using C++11's raw string literal notation to specify string
content, "\\s*" is:

R"%(\s*)%"

In other words, the content of the string is just:

\s*

(3 characters, plus a terminating '\0').
It is this latter string content that the regcomp 2nd parameter
points to and that leads to the error report.

The "s" is not valid after the backslash for Basic Regular
Expressions or for Extended Regular Expressions. See:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

REG_EESCAPE is described at:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html

as:

QUOTE
REG_EESCAPE
Trailing <backslash> character in pattern.
END QUOTE

In other words: a backslash that is not paired with anything
valid just after it, so it is in effect trailing whatever came
before it.

If you meant the parameter received to point in memory to:

\\s*

(4 characters, plus a terminating '\0' after it, a.k.a.
R"%(\\s*)%")

you likely want the C-string:

"\\\\s*"

as the argument, shown below:

regcomp(&regexp, "\\\\s*", REG_EXTENDED | REG_ICASE | REG_NOSUB)

If you meant some other character sequence in memory, I'd have to
know what it was to try to back-translate it to C source that
would produce the correct content in the memory pointed to.

>     if ( ret != 0) {
>         printf("regexp compilation failed: %d\n", ret);
>     }
>
>     return 0;
> }
>
> This one works in 12.2

It might not be rejected, but what does it do? And is that
conformant with:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

?

> but fails to compile the regexp in FreeBSD
> 14.0-CURRENT #11 main-n245984-15221c552b3c with error 5 REG_EESCAPE
> `\' applied to unescapable character.
>
> Any help is appreciated.

Note: While I used C++11's notation as one way of indicating
string content, no C standard has the notation to my knowledge.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)