From: Pedro Giffuni
To: Andrey Chernov, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r301461 - in head/lib/libc: gen locale regex
Date: Mon, 6 Jun 2016 08:43:12 -0500

On 06/05/16 14:49, Andrey Chernov wrote:
> On 05.06.2016 22:12, Pedro F. Giffuni wrote:
>> --- head/lib/libc/regex/regcomp.c Sun Jun 5 18:16:33 2016 (r301460)
>> +++ head/lib/libc/regex/regcomp.c Sun Jun 5 19:12:52 2016 (r301461)
>> @@ -821,10 +821,10 @@ p_b_term(struct parse *p, cset *cs)
>>          (void)REQUIRE((uch)start <= (uch)finish, REG_ERANGE);
>>          CHaddrange(p, cs, start, finish);
>>      } else {
>> -        (void)REQUIRE(__collate_range_cmp(table, start, finish) <= 0, REG_ERANGE);
>> +        (void)REQUIRE(__wcollate_range_cmp(table, start, finish) <= 0, REG_ERANGE);
>>          for (i = 0; i <= UCHAR_MAX; i++) {
>> -            if ( __collate_range_cmp(table, start, i) <= 0
>> -                && __collate_range_cmp(table, i, finish) <= 0
>> +            if ( __wcollate_range_cmp(table, start, i) <= 0
>> +                && __wcollate_range_cmp(table, i, finish) <= 0
>>              )
>>                  CHadd(p, cs, i);
>>          }
>>
>
> As I already mentioned in the PR, we have had a broken regcomp since
> someone added wchar_t support there. Now regcomp ranges work only for
> the first 256 wchars of the current locale; notice the loop's upper
> limit:
>     for (i = 0; i <= UCHAR_MAX; i++) {
> In general, ranges in regcomp are now either broken or memory-eating.
> We have a bitmask only for the first 256 wchars; all others are added
> to the range literally. Imagine what happens if someone specifies the
> full Unicode range in a regexp.
>
> The proper fix would be adding a bitmask for the whole Unicode range,
> and even in that case regcomp's attempt to use collation in ranges
> would be _very_ slow, since it needs to check all Unicode chars in its
>     for (i = 0; i <= Max_Unicode_wchar; i++) {
> loop.
>
> Better to stop pretending that we are able to support collation in
> ranges, since POSIX cares only about its own locale here:
> "In the POSIX locale, a range expression represents the set of collating
> elements that fall between two elements in the collation sequence,
> inclusive.
> In other locales, a range expression has unspecified
> behavior: strictly conforming applications shall not rely on whether the
> range expression is valid, or on the set of collating elements matched."
>
> Until a bitmask for the whole Unicode range is implemented (if ever),
> we had better stop pretending to honor collation order (we just can't
> do it with wchars now) and do what NetBSD/OpenBSD do (using wchar_t)
> instead. That does not prevent memory eating on big ranges (a bitmask
> is needed, see above), but it at least fixes the problem that only the
> first 256 wchars are considered.
>

Sadly, regex is one part of the system that could use a maintainer :(
I have been forced to look at it more than I'd like to, but I don't
really use the collation support at all.

Pedro.