Date: Tue, 23 Jan 2018 03:53:19 +0300
From: Yuri Pankov <yuripv@icloud.com>
To: freebsd-hackers <freebsd-hackers@freebsd.org>, Kyle Evans <kevans@FreeBSD.org>
Subject: libc/regex: r302824 added invalid check breaking collating ranges
Message-ID: <a0d9abd8-19b8-cdf6-5451-e184fa182b38@icloud.com>
(CCing Kyle as he's working on regex at the moment and not because he
broke something)
Hi,
r302824 added an invalid check which breaks collating ranges:
-    if (table->__collate_load_error) {
-        (void)REQUIRE((uch)start <= (uch)finish, REG_ERANGE);
+    if (table->__collate_load_error || MB_CUR_MAX > 1) {
+        (void)REQUIRE(start <= finish, REG_ERANGE);
The "MB_CUR_MAX > 1" condition is wrong: we should be doing a proper
comparison according to the current locale's collation, not simply
comparing the wchar_t values.
Example -- see Table 1 in http://www.unicode.org/reports/tr10/:
Let's try Swedish collation:
$ echo 'test' | LC_COLLATE=se_SE.UTF-8 grep '[ö-z]'
grep: invalid character range
$ echo 'test' | LC_COLLATE=se_SE.UTF-8 grep '[z-ö]'
OK, the above seems to be correct: 'ö' > 'z' in Swedish collation. But
we just got lucky here, as the wchar_t comparison happens to give the
same result.
Now the German one:
$ echo 'test' | LC_COLLATE=de_DE.UTF-8 grep '[ö-z]'
grep: invalid character range
$ echo 'test' | LC_COLLATE=de_DE.UTF-8 grep '[z-ö]'
Same result, but according to the table, 'ö' < 'z' in German collation,
so '[ö-z]' should have been accepted as a valid range!
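To make the difference concrete, here is a minimal sketch of the two kinds of comparison. The helper names cmp_codepoint() and cmp_collate() are made up for illustration and are not the libc code; only wcscoll() is a real interface.

```c
#include <wchar.h>

/*
 * Hypothetical helper: compare two wide characters by raw wchar_t value.
 * This is the kind of comparison the "MB_CUR_MAX > 1" branch performs.
 * Returns -1, 0 or 1.
 */
static int
cmp_codepoint(wchar_t a, wchar_t b)
{
	return ((a > b) - (a < b));
}

/*
 * Hypothetical helper: compare two wide characters according to the
 * LC_COLLATE category of the current locale, using wcscoll() on
 * one-character strings.  Returns -1, 0 or 1.
 */
static int
cmp_collate(wchar_t a, wchar_t b)
{
	wchar_t s1[2] = { a, L'\0' };
	wchar_t s2[2] = { b, L'\0' };
	int r = wcscoll(s1, s2);

	return ((r > 0) - (r < 0));
}
```

cmp_codepoint() never depends on the locale, so 'ö' (U+00F6) always sorts after 'z' (U+007A). cmp_collate() depends on whatever setlocale() selected; with a German locale whose collation data follows the table, it would be expected to put 'ö' before 'z'.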
I think the fix here would be to drop the "if
(table->__collate_load_error || MB_CUR_MAX > 1)" block entirely. We no
longer use the "table", so there's no point in getting it and checking
for the load error; wcscoll(), which would be called eventually in
p_range_cmp(), does the table handling itself. And we can't use the
direct comparison for anything other than the 'C' locale (not sure if
it's applicable even there).
