Date: Mon, 13 Nov 2017 14:35:29 -0500
From: mfv <mfv@bway.net>
To: "James B. Byrne via freebsd-questions" <freebsd-questions@freebsd.org>
Cc: byrnejb@harte-lyne.ca
Subject: Re: Regex character and collation class documentation
Message-ID: <20171113143529.572a4b76@gecko4>
In-Reply-To: <b0835f510ae66a82808725fa8ae8c7d0.squirrel@webmail.harte-lyne.ca>
References: <mailman.90.1510315202.51235.freebsd-questions@freebsd.org>
 <68be33ca89aab31e068253dffe129021.squirrel@webmail.harte-lyne.ca>
 <20171111104543.11279fb7@gecko4>
 <b0835f510ae66a82808725fa8ae8c7d0.squirrel@webmail.harte-lyne.ca>

> On Mon, 2017-11-13 at 09:09 "James B. Byrne via freebsd-questions"
> <freebsd-questions@freebsd.org> wrote:
>
>On Sat, November 11, 2017 10:45, mfv wrote:
>
>> As a result I did some more digging and discovered that the valid
>> names for [[.<name>.]] are contained in
>> /usr/src/lib/libc/regex/cname.h.  The names in "man ascii" are a
>> subset of cname.h.
>>
>> It also explains why [[.SP.]] generates an error message.  Even
>> though SP is listed in "man ascii" it is not specified in cname.h.
>>
>> Cheers ...
>>
>> Marek
>>
>
>A file named cname.h does not even exist on my system.  At least if it
>does then find does not report it.  On the other hand, this file:
>
>/usr/local/include/nstring.h
>
>contains this:
>
>/* The standard C library routines isdigit(), for some weird
>   historical reason, does not take a character (type 'char') as its
>   argument.  Instead it takes an integer.  When the integer is a whole
>   number, it represents a character in the obvious way using the local
>   character set encoding.  When the integer is negative, the results
>   are undefined.
>
>   Passing a character to isdigit(), which expects an integer,
>   results in isdigit() sometimes getting a negative number.
>
>   On some systems, when the integer is negative, it represents exactly
>   the character you want it to anyway (e.g. -1 is the character that
>   is encoded 0xFF).  But on others, it does not.
>
>   (The same is true of other routines like isdigit()).
>
>   Therefore, we have the substitutes for isdigit() etc. that take an
>   actual character (type 'char') as an argument.
>*/
>
>#define ISALNUM(C) (isalnum((unsigned char)(C)))
>#define ISALPHA(C) (isalpha((unsigned char)(C)))
>#define ISCNTRL(C) (iscntrl((unsigned char)(C)))
>#define ISDIGIT(C) (isdigit((unsigned char)(C)))
>#define ISGRAPH(C) (isgraph((unsigned char)(C)))
>#define ISLOWER(C) (islower((unsigned char)(C)))
>#define ISPRINT(C) (isprint((unsigned char)(C)))
>#define ISPUNCT(C) (ispunct((unsigned char)(C)))
>#define ISSPACE(C) (isspace((unsigned char)(C)))
>#define ISUPPER(C) (isupper((unsigned char)(C)))
>#define ISXDIGIT(C) (isxdigit((unsigned char)(C)))
>#define TOUPPER(C) ((char)toupper((unsigned char)(C)))
>
>But nowhere can I find 'isnul' or 'ISNUL'.
>
>

Hello James,

Do you have /usr/src on your system?  The directories under /usr/src
hold the source code used to build FreeBSD on one's own computer.  If
not, here is a link to the Git repository where the source code for
/usr/src/lib/libc/regex/cname.h can be seen:

https://github.com/freebsd/freebsd/blob/master/lib/libc/regex/cname.h

Each name listed on the left can be used in sed to match the character
on its right.  For example, /[[.asterisk.]]{3}/ matches ***.

Some of the characters have two names.  For example, the control
character with octal code '\007' is represented by 'BEL' as well as
'alert'.

I do not know the purpose of /usr/local/include/nstring.h, so I cannot
shed any light on that particular file.

Cheers ...

Marek