Date: Tue, 23 Oct 2001 20:54:16 -0500
From: Mike Meyer <mwm@mired.org>
To: Shill <shill@free.fr>
Cc: questions@freebsd.org
Subject: Re: regex(3), re_format(7) and shortest match
Message-ID: <15318.8008.241466.858831@guru.mired.org>
In-Reply-To: <87041511@toto.iv>
Shill <shill@free.fr> types:
> >> Say I have the string:
> >> "yo <a --- </a> hiphop <a +++ </a> fun"
> >>
> >> Can anyone tell me how to achieve the shortest match?
> >>
> >> i.e. the first call to regexec() would return "<a --- </a>"
> >> and the second would return "<a +++ </a>".
> >
> > Try this: (<a[^<]*</a>)
>
> What if '<' '/' 'a' '>' are all valid characters inside "<a ... </a>"?
>
> i.e. if I have the string "yo <a %<%/%a%> </a> hiphop" a call to
> regexec() should return "<a %<%/%a%> </a>".
>
> If I use the (<a[^<]*</a>) regex, it would just return REG_NOMATCH.

Use Python's re module (or Perl, if you must), and then "<a.*?</a>" will
work as your regular expression. The "?" makes the "*" match the shortest
possible string that still lets the whole pattern match, which is what you
want.

If you have to invoke it from C, the Perl sources can be found in
/usr/src/contrib/perl5. Be warned that this is GNU-licensed code, not
BSD-licensed code, so check the license restrictions before distributing
the source or anything built on it.

	<mike
-- 
Mike Meyer <mwm@mired.org>			http://www.mired.org/home/mwm/
Q: How do you make the gods laugh?		A: Tell them your plans.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
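The sketch below runs the suggested pattern with Python's standard re module. It is a minimal illustration, not code from the original thread. The helper name shortest_matches is hypothetical. The test strings are the two from the question. The sketch also contrasts the lazy ".*?" with a greedy ".*", which would swallow everything up to the last "</a>". re.findall returns the non-overlapping matches left to right, much like calling regexec() repeatedly from the end of the previous match.

```python
import re

# Lazy (non-greedy) quantifier: ".*?" takes as few characters as possible,
# so each match ends at the first "</a>" following "<a".
LAZY = re.compile(r"<a.*?</a>")

def shortest_matches(text):
    """Hypothetical helper: return each shortest "<a ... </a>" span, left to right."""
    return LAZY.findall(text)

print(shortest_matches("yo <a --- </a> hiphop <a +++ </a> fun"))
# ['<a --- </a>', '<a +++ </a>']

# '<', '/', 'a' and '>' inside the span are fine; only a full "</a>" ends it.
print(shortest_matches("yo <a %<%/%a%> </a> hiphop"))
# ['<a %<%/%a%> </a>']

# For contrast, the greedy ".*" runs to the last "</a>" in the string.
print(re.findall(r"<a.*</a>", "yo <a --- </a> hiphop <a +++ </a> fun"))
# ['<a --- </a> hiphop <a +++ </a>']
```

If a "<a ... </a>" span can cross line breaks, compile with re.DOTALL. By default "." does not match a newline.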