Date: Thu, 9 Nov 2017 16:36:44 -0500
From: mfv <mfv@bway.net>
To: "James B. Byrne via freebsd-questions" <freebsd-questions@freebsd.org>
Cc: byrnejb@harte-lyne.ca
Subject: Re: Regex character and collation calss documentation
Message-ID: <20171109163644.3338c824@gecko4>
In-Reply-To: <41c47638eec0e1a562f4446c7fe5a2df.squirrel@webmail.harte-lyne.ca>
References: <41c47638eec0e1a562f4446c7fe5a2df.squirrel@webmail.harte-lyne.ca>
> On Wed, 2017-11-08 at 12:47 "James B. Byrne via freebsd-questions"
> <freebsd-questions@freebsd.org> wrote:
>
>I have been perusing the available documentation respecting regex on
>FreeBSD and cannot find a reference to [.NUL.]. Everything that I have
>found points to ctype.h. The only class names I can find therein are:
>
>int isalnum(int);      [:alnum:]
>int isalpha(int);      [:alpha:]
>int iscntrl(int);      [:cntrl:]
>int isdigit(int);      [:digit:]
>int isgraph(int);      [:graph:]
>int islower(int);      [:lower:]
>int isprint(int);      [:print:]
>int ispunct(int);      [:punct:]
>int isspace(int);      [:space:]
>int isupper(int);      [:upper:]
>int isxdigit(int);     [:xdigit:]
>
>From reading the reference at
>https://docs.freebsd.org/info/regex/regex.pdf and comparing it to the
>uncommented lines in ctype.h on my FreeBSD-11.1 desktop host one could
>reasonably deduce that the following should be available on FreeBSD in
>addition to the above:
>
>int isascii(int);      [:ascii:]
>
>int isblank(int);      [:blank:]
>
>int ishexnumber(int);  [:hexnumber:]
>int isideogram(int);   [:ideogram:]
>int isnumber(int);     [:number:]
>int isphonogram(int);  [:phonogram:]
>int isrune(int);       [:rune:]
>int isspecial(int);    [:special:]
>
>But of these only [[:blank:]] is recognized by grep; whatever else
>might employ the rest.
>
>[[:ascii:]]
>grep: Invalid character class name
>[[:hexnumber:]]
>grep: Invalid character class name
>[[:ideogram:]]
>grep: Invalid character class name
>[[:number:]]
>grep: Invalid character class name
>[[:phonogram:]]
>grep: Invalid character class name
>[[:rune:]]
>grep: Invalid character class name
>[[:special:]]
>grep: Invalid character class name
>
>
>However I see no reference to [.NUL.] anywhere. The sed man page has
>no reference to nul or NUL at all and tr only has this to say:
>
>  The tr utility has historically not permitted the manipulation
>  of NUL bytes in its input and, additionally, stripped NUL's from
>  its input stream. This implementation has removed this behavior
>  as a bug.
>
>
>Is there a master list of character/collation classes for FreeBSD
>regex? I have read the man pages for grep and re_format. In no case
>is the character or collation class NUL mentioned.
>
>Where is the usage of [.NUL.] documented?
>

Hello James,

This may help you with a bit of hacking. I asked myself the same
question but could not find a satisfactory answer. After remembering
that "man ascii" has names for all non-printable ASCII characters, I
placed some of these characters in a text file and then removed the
same characters using their names. Thus:

  - the character ^@ was removed using [[.NUL.]]
  - the character ^G was removed using [[.BEL.]]
  - the character ^F was removed using [[.ACK.]]
  - etc.

I did not try all non-printable characters, but a large sampling
followed this pattern. Trying to use SP for a space produced the
following error:

  sed: 1: "/[[.SP.]]/d": RE error: invalid collating element

Perhaps there are other exceptions similar to SP.

This syntax also recognises printable characters. For example, the
character 'A' was removed using 's/[[.A.]]//g'.

I would have preferred some formal documentation on this matter but,
like yourself, am still searching.

Cheers ...

Marek
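
For anyone who wants to cross-check the results outside sed, the
name-to-byte mapping used in the experiment above can be emulated in a
few lines. This is only a sketch under stated assumptions: it uses the
standard ASCII control-character abbreviations, it does not call
FreeBSD's regex code, and CONTROL_NAMES, NAME_TO_BYTE and strip_named
are hypothetical names made up for this illustration. SP is left out of
the table on purpose, so asking for it fails, much like the sed error
reported above.

```python
# Emulation only: this does NOT use FreeBSD's regex library.
# Assumption: names follow the standard ASCII abbreviations, codes 0-31.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

# Map each name to its byte value; the list position is the ASCII code.
NAME_TO_BYTE = {name: code for code, name in enumerate(CONTROL_NAMES)}
NAME_TO_BYTE["DEL"] = 0x7F


def strip_named(data: bytes, *names: str) -> bytes:
    """Delete every byte whose name is given, roughly like
    sed 's/[[.NAME.]]//g' did in the experiment (hypothetical helper).
    An unknown name raises KeyError."""
    unwanted = bytes(NAME_TO_BYTE[name] for name in names)
    return data.translate(None, unwanted)


# A sample line containing ^@ (NUL), ^F (ACK) and ^G (BEL).
sample = b"a\x00b\x06c\x07d\n"
cleaned = strip_named(sample, "NUL", "ACK", "BEL")
print(cleaned)
```

Comparing the bytes this produces with the output of the sed commands
(for example through "od -c") is one way to confirm which names a given
regex implementation actually accepts.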