From owner-svn-src-all@freebsd.org Sat Dec 8 19:45:07 2018 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1DD7013279E1; Sat, 8 Dec 2018 19:45:07 +0000 (UTC) (envelope-from yuripv@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B9F068D5A8; Sat, 8 Dec 2018 19:45:06 +0000 (UTC) (envelope-from yuripv@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 7E35F269EC; Sat, 8 Dec 2018 19:45:06 +0000 (UTC) (envelope-from yuripv@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id wB8Jj67S023784; Sat, 8 Dec 2018 19:45:06 GMT (envelope-from yuripv@FreeBSD.org) Received: (from yuripv@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id wB8Jj5jh023781; Sat, 8 Dec 2018 19:45:05 GMT (envelope-from yuripv@FreeBSD.org) Message-Id: <201812081945.wB8Jj5jh023781@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: yuripv set sender to yuripv@FreeBSD.org using -f From: Yuri Pankov Date: Sat, 8 Dec 2018 19:45:05 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org Subject: svn commit: r341745 - in stable/12/lib/libc: regex tests/regex X-SVN-Group: stable-12 X-SVN-Commit-Author: yuripv X-SVN-Commit-Paths: in stable/12/lib/libc: regex tests/regex X-SVN-Commit-Revision: 341745 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: B9F068D5A8 X-Spamd-Result: default: False [-2.96 / 15.00]; local_wl_from(0.00)[FreeBSD.org]; NEURAL_HAM_MEDIUM(-1.00)[-0.997,0]; NEURAL_HAM_LONG(-0.99)[-0.990,0]; NEURAL_HAM_SHORT(-0.97)[-0.972,0] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Dec 2018 19:45:07 -0000 Author: yuripv Date: Sat Dec 8 19:45:05 2018 New Revision: 341745 URL: https://svnweb.freebsd.org/changeset/base/341745 Log: MFC r340835: regexec: fix processing multibyte strings. Matcher function incorrectly assumed that moffset that we get from findmust is in bytes. Fix this by introducing a stepback function, taking short path if MB_CUR_MAX is 1, and going back byte-by-byte, checking if we have a legal character sequence otherwise. PR: 153502 Reviewed by: pfg, kevans Differential revision: https://reviews.freebsd.org/D18297 Added: stable/12/lib/libc/tests/regex/multibyte.sh - copied unchanged from r340835, head/lib/libc/tests/regex/multibyte.sh Modified: stable/12/lib/libc/regex/engine.c stable/12/lib/libc/tests/regex/Makefile Directory Properties: stable/12/ (props changed) Modified: stable/12/lib/libc/regex/engine.c ============================================================================== --- stable/12/lib/libc/regex/engine.c Sat Dec 8 19:42:01 2018 (r341744) +++ stable/12/lib/libc/regex/engine.c Sat Dec 8 19:45:05 2018 (r341745) @@ -48,6 +48,7 @@ __FBSDID("$FreeBSD$"); */ #ifdef SNAMES +#define stepback sstepback #define matcher smatcher #define walk swalk #define dissect sdissect @@ -58,6 +59,7 @@ __FBSDID("$FreeBSD$"); #define match smat #endif #ifdef LNAMES +#define stepback lstepback #define matcher lmatcher #define walk lwalk #define dissect ldissect @@ -68,6 +70,7 @@ __FBSDID("$FreeBSD$"); #define match lmat #endif #ifdef MNAMES +#define stepback mstepback #define matcher mmatcher #define walk mwalk #define dissect mdissect @@ -142,6 +145,39 @@ static const char *pchar(int ch); #endif /* + * Given a multibyte string pointed to by start, step back nchar characters + * from current position pointed to by cur. + */ +static const char * +stepback(const char *start, const char *cur, int nchar) +{ + const char *ret; + int wc, mbc; + mbstate_t mbs; + size_t clen; + + if (MB_CUR_MAX == 1) + return ((cur - nchar) > start ? cur - nchar : NULL); + + ret = cur; + for (wc = nchar; wc > 0; wc--) { + for (mbc = 1; mbc <= MB_CUR_MAX; mbc++) { + if ((ret - mbc) < start) + return (NULL); + memset(&mbs, 0, sizeof(mbs)); + clen = mbrtowc(NULL, ret - mbc, mbc, &mbs); + if (clen != (size_t)-1 && clen != (size_t)-2) + break; + } + if (mbc > MB_CUR_MAX) + return (NULL); + ret -= mbc; + } + + return (ret); +} + +/* - matcher - the actual matching engine == static int matcher(struct re_guts *g, const char *string, \ == size_t nmatch, regmatch_t pmatch[], int eflags); @@ -244,9 +280,14 @@ matcher(struct re_guts *g, ZAPSTATE(&m->mbs); /* Adjust start according to moffset, to speed things up */ - if (dp != NULL && g->moffset > -1) - start = ((dp - g->moffset) < start) ? start : dp - g->moffset; + if (dp != NULL && g->moffset > -1) { + const char *nstart; + nstart = stepback(start, dp, g->moffset); + if (nstart != NULL) + start = nstart; + } + SP("mloop", m->st, *start); /* this loop does only one repetition except for backrefs */ @@ -1083,6 +1124,7 @@ pchar(int ch) #endif #endif +#undef stepback #undef matcher #undef walk #undef dissect Modified: stable/12/lib/libc/tests/regex/Makefile ============================================================================== --- stable/12/lib/libc/tests/regex/Makefile Sat Dec 8 19:42:01 2018 (r341744) +++ stable/12/lib/libc/tests/regex/Makefile Sat Dec 8 19:45:05 2018 (r341745) @@ -2,6 +2,9 @@ PACKAGE= tests +# local test cases +ATF_TESTS_SH+= multibyte + .include "Makefile.inc" .include "${.CURDIR:H}/Makefile.netbsd-tests" .include Copied: stable/12/lib/libc/tests/regex/multibyte.sh (from r340835, head/lib/libc/tests/regex/multibyte.sh) ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ stable/12/lib/libc/tests/regex/multibyte.sh Sat Dec 8 19:45:05 2018 (r341745, copy of r340835, head/lib/libc/tests/regex/multibyte.sh) @@ -0,0 +1,35 @@ +# $FreeBSD$ + +atf_test_case multibyte +multibyte_head() +{ + atf_set "descr" "Check matching multibyte characters (PR153502)" +} +multibyte_body() +{ + export LC_CTYPE="C.UTF-8" + + printf 'é' | atf_check -o "inline:é" \ + sed -ne '/^.$/p' + printf 'éé' | atf_check -o "inline:éé" \ + sed -ne '/^..$/p' + printf 'aéa' | atf_check -o "inline:aéa" \ + sed -ne '/a.a/p' + printf 'aéa'| atf_check -o "inline:aéa" \ + sed -ne '/a.*a/p' + printf 'aaéaa' | atf_check -o "inline:aaéaa" \ + sed -ne '/aa.aa/p' + printf 'aéaéa' | atf_check -o "inline:aéaéa" \ + sed -ne '/a.a.a/p' + printf 'éa' | atf_check -o "inline:éa" \ + sed -ne '/.a/p' + printf 'aéaa' | atf_check -o "inline:aéaa" \ + sed -ne '/a.aa/p' + printf 'éaé' | atf_check -o "inline:éaé" \ + sed -ne '/.a./p' +} + +atf_init_test_cases() +{ + atf_add_test_case multibyte +}