Date: Mon, 13 Nov 2017 14:35:29 -0500
From: mfv
To: "James B. Byrne via freebsd-questions"
Cc: byrnejb@harte-lyne.ca
Subject: Re: Regex character and collation class documentation
Message-ID: <20171113143529.572a4b76@gecko4>
References: <68be33ca89aab31e068253dffe129021.squirrel@webmail.harte-lyne.ca> <20171111104543.11279fb7@gecko4>

> On Mon, 2017-11-13 at 09:09 "James B. Byrne via freebsd-questions"
> wrote:
>
>On Sat, November 11, 2017 10:45, mfv wrote:
>
>> As a result I did some more digging and discovered that the valid
>> names for [[..]] are contained in /usr/src/lib/libc/regex/cname.h.
>> The names in "man ascii" are a subset of cname.h.
>>
>> It also explains why [[.SP.]] generates an error message. Even
>> though SP is listed in "man ascii" it is not specified in cname.h.
>>
>> Cheers ...
>>
>> Marek
>>
>
>A file named cname.h does not even exist on my system. At least if it
>does then find does not report it. On the other hand, this file:
>
>/usr/local/include/nstring.h
>
>contains this:
>
>/* The standard C library routines isdigit(), for some weird
> historical reason, does not take a character (type 'char') as its
> argument. Instead it takes an integer. When the integer is a whole
> number, it represents a character in the obvious way using the local
> character set encoding. When the integer is negative, the results
> are undefined.
>
> Passing a character to isdigit(), which expects an integer,
> results in isdigit() sometimes getting a negative number.
>
> On some systems, when the integer is negative, it represents exactly
> the character you want it to anyway (e.g. -1 is the character that
> is encoded 0xFF). But on others, it does not.
>
> (The same is true of other routines like isdigit()).
>
> Therefore, we have the substitutes for isdigit() etc. that take an
> actual character (type 'char') as an argument.
>*/
>
>#define ISALNUM(C) (isalnum((unsigned char)(C)))
>#define ISALPHA(C) (isalpha((unsigned char)(C)))
>#define ISCNTRL(C) (iscntrl((unsigned char)(C)))
>#define ISDIGIT(C) (isdigit((unsigned char)(C)))
>#define ISGRAPH(C) (isgraph((unsigned char)(C)))
>#define ISLOWER(C) (islower((unsigned char)(C)))
>#define ISPRINT(C) (isprint((unsigned char)(C)))
>#define ISPUNCT(C) (ispunct((unsigned char)(C)))
>#define ISSPACE(C) (isspace((unsigned char)(C)))
>#define ISUPPER(C) (isupper((unsigned char)(C)))
>#define ISXDIGIT(C) (isxdigit((unsigned char)(C)))
>#define TOUPPER(C) ((char)toupper((unsigned char)(C)))
>
>But nowhere can I find 'isnul' or ISNUL'.
>

Hello James,

Do you have /usr/src on your system?
The directories under /usr/src hold the source code used to build FreeBSD on one's own computer. If you do not have it, here is a link to the Git repository where the source code for /usr/src/lib/libc/regex/cname.h can be seen:

https://github.com/freebsd/freebsd/blob/master/lib/libc/regex/cname.h

Any name listed in the left-hand column can be used in sed, inside a bracket expression, to match the character in the right-hand column. For example, /[[.asterisk.]]{3}/ matches *** when sed is run with -E (extended regular expressions); in a basic regular expression the interval must be written as \{3\}, i.e. /[[.asterisk.]]\{3\}/.

Some characters have two names: the control character with octal code \007, for instance, is named both 'BEL' and 'alert'.

I do not know the purpose of /usr/local/include/nstring.h, so I cannot shed any light on that particular file.

Cheers ...

Marek