FreeBSD Mail Archives

Date:      Tue, 23 Jan 2018 03:53:19 +0300
From:      Yuri Pankov <yuripv@icloud.com>
To:        freebsd-hackers <freebsd-hackers@freebsd.org>, Kyle Evans <kevans@FreeBSD.org>
Subject:   libc/regex: r302824 added invalid check breaking collating ranges
Message-ID:  <a0d9abd8-19b8-cdf6-5451-e184fa182b38@icloud.com>


(CCing Kyle as he's working on regex at the moment and not because he 
broke something)

Hi,

r302824 added an invalid check which breaks collating ranges:

-if (table->__collate_load_error) {
-    (void)REQUIRE((uch)start <= (uch)finish, REG_ERANGE);
+if (table->__collate_load_error || MB_CUR_MAX > 1) {
+    (void)REQUIRE(start <= finish, REG_ERANGE);

The "MB_CUR_MAX > 1" condition is wrong: in a multibyte locale we should
compare the range endpoints according to the current locale's collation,
not simply compare their wchar_t values.
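
To make the difference concrete, here is a minimal sketch of how the two
orderings can be told apart. The helper name order_disagrees() is made up
for this illustration and is not part of libc; it only relies on the
standard wcscoll().

```c
#include <stdbool.h>
#include <wchar.h>

/* Reduce any int to -1, 0 or 1 so two comparison results can be matched. */
static int
sign(int x)
{
	return ((x > 0) - (x < 0));
}

/*
 * Hypothetical helper (not in libc): returns true when ordering a and b
 * by raw wchar_t value disagrees with ordering them by the current
 * LC_COLLATE, which is what wcscoll() consults.
 */
static bool
order_disagrees(wchar_t a, wchar_t b)
{
	wchar_t sa[2] = { a, L'\0' };
	wchar_t sb[2] = { b, L'\0' };
	int by_value = (a > b) - (a < b);

	return (by_value != sign(wcscoll(sa, sb)));
}
```

In the "C" locale the two orderings should agree for any pair; a locale
whose collation places an accented letter before 'z' is where a check
based on wchar_t values would be expected to go wrong.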

Example -- see Table 1 in http://www.unicode.org/reports/tr10/:

Let's try Swedish collation:
$ echo 'test' | LC_COLLATE=sv_SE.UTF-8 grep '[ö-z]'
grep: invalid character range
$ echo 'test' | LC_COLLATE=sv_SE.UTF-8 grep '[z-ö]'

OK, the above seems to be correct: 'ö' > 'z' in Swedish collation. But
we just got lucky here, as the wchar_t comparison happens to give the
same result.

Now German one:
$ echo 'test' | LC_COLLATE=de_DE.UTF-8 grep '[ö-z]'
grep: invalid character range
$ echo 'test' | LC_COLLATE=de_DE.UTF-8 grep '[z-ö]'

Same result, but according to the table, 'ö' < 'z' in German collation!
So '[ö-z]' is a valid range there, and '[z-ö]' is the one that should
have been rejected.

I think the fix here would be to drop the "if
(table->__collate_load_error || MB_CUR_MAX > 1)" block entirely. We no
longer use "table", so there's no point in getting it and checking for a
load error: wcscoll(), which is eventually called in p_range_cmp(), does
the table handling itself. And we can't use the direct comparison for
anything other than the 'C' locale (I'm not sure it's applicable even
there).
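
As a rough sketch of what the range check could look like with that
block gone: the function name check_range() is hypothetical and this is
not the actual regcomp.c code. It only assumes wcscoll() from <wchar.h>
and REG_ERANGE from <regex.h>.

```c
#include <regex.h>
#include <wchar.h>

/*
 * Hypothetical range validation (not the real regcomp.c code):
 * returns 0 when start collates at or before finish in the current
 * locale, and REG_ERANGE otherwise.
 */
static int
check_range(wchar_t start, wchar_t finish)
{
	wchar_t s[2] = { start, L'\0' };
	wchar_t f[2] = { finish, L'\0' };

	/*
	 * No collation table lookup and no MB_CUR_MAX test here:
	 * wcscoll() consults the current LC_COLLATE on its own.
	 */
	return (wcscoll(s, f) <= 0 ? 0 : REG_ERANGE);
}
```

The same code path would then serve single-byte and multibyte locales
alike, with the locale deciding which endpoint order is valid.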
