Date: Fri, 20 Jun 2014 20:10:11 -0500 From: Pedro Giffuni <pfg@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, Stefan Farfeleder <stefanf@freebsd.org>, src-committers@freebsd.org Subject: Re: svn commit: r267675 - head/lib/libc/regex Message-ID: <B74D36BF-1BF7-40E4-9583-76E6A917ECDC@freebsd.org> In-Reply-To: <20140621072634.I921@besplex.bde.org> References: <201406201529.s5KFTAEB068038@svn.freebsd.org> <20140620182311.GA1214@mole.fafoe.narf.at> <53A48D62.4060801@FreeBSD.org> <20140621072634.I921@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Il giorno 20/giu/2014, alle ore 16:59, Bruce Evans = <brde@optusnet.com.au> ha scritto: >=20 >> El 6/20/2014 1:23 PM, Stefan Farfeleder escribi=F3: >>> On Fri, Jun 20, 2014 at 03:29:10PM +0000, Pedro F. Giffuni wrote: >>>> ... >>>> Log: >>>> regex: Make use of reallocf(). >>>> Use of reallocf is useful in libraries as we are not certain the >>>> application will exit after NULL. >>>> ... >>>> This somewhat reduces portability but if since you are building >>>> this as part of libc it is likely you have our non-standard >>>> reallocf(3) already. >=20 > It also somewhat increases namespace pollution. >=20 >>>> Modified: head/lib/libc/regex/regcomp.c >>>> = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D >>>> --- head/lib/libc/regex/regcomp.c Fri Jun 20 13:26:49 2014 = (r267674) >>>> +++ head/lib/libc/regex/regcomp.c Fri Jun 20 15:29:09 2014 = (r267675) >>>> @@ -1111,7 +1111,7 @@ allocset(struct parse *p) >>>> { >>>> cset *cs, *ncs; >>>> - ncs =3D realloc(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs)); >>>> + ncs =3D reallocf(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs)); >>>> if (ncs =3D=3D NULL) { >>>> SETERROR(REG_ESPACE); >>>> return (NULL); >>>> @@ -1174,7 +1174,7 @@ CHadd(struct parse *p, cset *cs, wint_t >>>> if (ch < NC) >>>> cs->bmp[ch >> 3] |=3D 1 << (ch & 7); >>>> else { >>>> - newwides =3D realloc(cs->wides, (cs->nwides + 1) * >>>> + newwides =3D reallocf(cs->wides, (cs->nwides + 1) * >>>> sizeof(*cs->wides)); >>>> if (newwides =3D=3D NULL) { >>>> SETERROR(REG_ESPACE); >>> I don't think these changes are OK. If reallocf() fails here, the >>> cs->wides pointer will be freed and later freeset() will call >>> free(cs->wides), probably crashing. The other cases are most = probably >>> similar though I haven't examined them closely. >>=20 >> OK ... >>=20 >> I don't think there is any problem: >>=20 >> If reallocf fails, newwides will be set to NULL and if free() is = called it doesn't do anything when the argument is NULL. >>=20 >> Also freeset() is meant to be called to "free a now-unused set" and = it is not called within the library. I would think using a value when = the allocation has failed is a much more serious issue than attempting = to fail to free it after trying to use it. ;-). >=20 > It doesn't look OK to me. It turns the careful use of newwides, etc., > into garbage, and leaves a garbage pointer in cs->wides, etc., to = cause > problems later. It might work without this careful use of a temporary > variable, but even that is not clear. Consider: >=20 > - start with a consistent data structure with some pointers to = malloced > storage in it; foo->p say > - simple reallocf() allocation strategy: > foo->p =3D reallocf(foo->p, size) > if (foo->p =3D=3D NULL) > return (ERROR); > This leaves the data structure consistent if and only if the only = thing > in it relevant to p is p itself, with foo->p serving as a flag for > validity. > - more complicated reallocf() allocation strategy: > foo->p =3D reallocf(foo->p, size) > if (foo->p =3D=3D NULL) { > foo->psize =3D 0; > foo->pvalid =3D 0; > foo->foovalid =3D 0; > ... > return (ERROR); > } > This still can't do things like cleaning up what foo->p points to, = since > reallocf() clobbered that. > This shows that reallocf() is only useful in simple cases. In = general, you > must keep the pointer valid to clean things up: > newp =3D reallocf(foo->p, size) > if (newp =3D=3D NULL) { > for (i =3D 0; i < foo->psize; i++) > free(foo->p[i]); > foo->p =3D NULL; > foo->psize =3D 0; > foo->pvalid =3D 0; > foo->foovalid =3D 0; > ... > return (ERROR); > } >=20 > Of course, malloc never fails so all this error checking is more a = source > of complexity and bugs than useful. >=20 This is interesting, however I have problems understanding that a = library would let you ignore an error condition and pass you garbage as = part of it=92s normal behavior. Plus being a standard library I am not = sure if you can count on other implementations doing the same ... > Re-quoting/recovering some context: >=20 > @ Modified: head/lib/libc/regex/regcomp.c > @ = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > @ --- head/lib/libc/regex/regcomp.c Fri Jun 20 13:26:49 2014 = (r267674) > @ +++ head/lib/libc/regex/regcomp.c Fri Jun 20 15:29:09 2014 = (r267675) > @ @@ -1111,7 +1111,7 @@ allocset(struct parse *p) > @ { > @ cset *cs, *ncs; > @ @ - ncs =3D realloc(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs)); > @ + ncs =3D reallocf(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs)); > @ if (ncs =3D=3D NULL) { > @ SETERROR(REG_ESPACE); > @ return (NULL); >=20 > This code shows that even Henry didn't know how to program memory = allocation > in 1988. The code was much worse in 4.4BSD-Lite2: >=20 > old@ if (p->g->sets =3D=3D NULL) > old@ p->g->sets =3D (cset *)malloc(nc * = sizeof(cset)); > old@ else > old@ p->g->sets =3D (cset *)realloc((char = *)p->g->sets, > old@ nc * = sizeof(cset)); >=20 > This just leaked memory. It was improved by assigning to ncs and by > removing the pre-C90 and C++ casts. Now it is unimproved by turning = the > careful use of ncs into garbage and leaving garbage in *p. >=20 > old@ if (p->g->setbits =3D=3D NULL) > old@ p->g->setbits =3D (uch *)malloc(nbytes); > old@ else { > old@ p->g->setbits =3D (uch *)realloc((char = *)p->g->setbits, > old@ nbytes); > old@ /* xxx this isn't right if setbits is now NULL = */ > old@ for (i =3D 0; i < no; i++) > old@ p->g->sets[i].ptr =3D p->g->setbits + = css*(i/CHAR_BIT); > old@ } >=20 > Honest memory mismanagement. It just assumes that malloc() and = realloc() > can't fail. In -current, the function is much simpler and doesn't = have > this code. I think part of the simplication is to use realloc() of = NULL > instead of malloc() to start up. >=20 > @ @@ -1174,7 +1174,7 @@ CHadd(struct parse *p, cset *cs, wint_t @ = if (ch < NC) > @ cs->bmp[ch >> 3] |=3D 1 << (ch & 7); > @ else { > @ - newwides =3D realloc(cs->wides, (cs->nwides + 1) * > @ + newwides =3D reallocf(cs->wides, (cs->nwides + 1) * > @ sizeof(*cs->wides)); > @ if (newwides =3D=3D NULL) { > @ SETERROR(REG_ESPACE); > @ @@ -1203,7 +1203,7 @@ CHaddrange(struct parse *p, cset *cs, wi > @ CHadd(p, cs, min); > @ if (min >=3D max) > @ return; > @ - newranges =3D realloc(cs->ranges, (cs->nranges + 1) * > @ + newranges =3D reallocf(cs->ranges, (cs->nranges + 1) * > @ sizeof(*cs->ranges)); > @ if (newranges =3D=3D NULL) { > @ SETERROR(REG_ESPACE); > @ @@ -1227,7 +1227,7 @@ CHaddtype(struct parse *p, cset *cs, wct > @ for (i =3D 0; i < NC; i++) > @ if (iswctype(i, wct)) > @ CHadd(p, cs, i); > @ - newtypes =3D realloc(cs->types, (cs->ntypes + 1) * > @ + newtypes =3D reallocf(cs->types, (cs->ntypes + 1) * > @ sizeof(*cs->types)); > @ if (newtypes =3D=3D NULL) { > @ SETERROR(REG_ESPACE); > @ @@ -1350,7 +1350,7 @@ enlarge(struct parse *p, sopno size) > @ if (p->ssize >=3D size) > @ return 1; > @ @ - sp =3D (sop *)realloc(p->strip, size*sizeof(sop)); > @ + sp =3D (sop *)reallocf(p->strip, size*sizeof(sop)); > @ if (sp =3D=3D NULL) { > @ SETERROR(REG_ESPACE); > @ return 0; >=20 > Similarly in 4 more cases, except most of them weren't in old = versions. >=20 > @ @@ -1368,7 +1368,7 @@ static void > @ stripsnug(struct parse *p, struct re_guts *g) > @ { > @ g->nstates =3D p->slen; > @ - g->strip =3D (sop *)realloc((char *)p->strip, p->slen * = sizeof(sop)); > @ + g->strip =3D (sop *)reallocf((char *)p->strip, p->slen * = sizeof(sop)); > @ if (g->strip =3D=3D NULL) { > @ SETERROR(REG_ESPACE); > @ g->strip =3D p->strip; >=20 > This was the only realloc() that had not been cleaned up, so using > reallocf() has a chance of improving it. It still has the pre-C90 and > C++ casts. But there is an obvious new bug visible without reading = more > than the patch context: > - the local variable might not be needed now, since the variable = assigned > to, g->strip, is different from the variable realloced, p->strip > - on realloc failure, p->strip becomes invalid > - the error handling is completely broken. It points g->strip at the = old > (now deallocated) storage. This is immediate use of the garbage = pointer > created by using reallocf(). >=20 > Clearly, only realloc() code of the form "p =3D realloc(p, size);", = where the > pointer assigned to is the same as the pointer realloced, can be = improved > by blindly converting it to use reallocf(). >=20 Ah yes, my fault there. > The last function is most interesting. It seems to be to reduce the = size. > So an allocation failure is even less likely than usually, except lots = of > small allocations and reallocations are more likely to waste space = than > save it. But if failure occurs it is almost harmless, and the error > handling of keeping using the previous allocation was good. Maybe = Henry > knew how to program memory allocation after all. (The last function > survived previously-unchanged from 4.4BSDLite-2 except for removing = K&R > support and 'register=92. OK, it=92s pretty tricky, but I agree that reallocf() simply won=92t do = anything here. I will revert. Thanks! Pedro.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B74D36BF-1BF7-40E4-9583-76E6A917ECDC>