Date:      Tue, 23 Oct 2001 20:54:16 -0500
From:      Mike Meyer <mwm@mired.org>
To:        Shill <shill@free.fr>
Cc:        questions@freebsd.org
Subject:   Re: regex(3), re_format(7) and shortest match
Message-ID:  <15318.8008.241466.858831@guru.mired.org>
In-Reply-To: <87041511@toto.iv>

Shill <shill@free.fr> types:
> >> Say I have the string:
> >> "yo <a --- </a> hiphop <a +++ </a> fun"
> >> 
> >> Can anyone tell me how to achieve the shortest match?
> >> 
> >> i.e. the first call to regexec() would return "<a --- </a>"
> >> and the second would return "<a +++ </a>".
> 
> > Try this: (<a[^<]*</a>)
> 
> What if '<' '/' 'a' '>' are all valid characters inside "<a ... </a>"?
> 
> i.e. if I have the string "yo <a %<%/%a%> </a> hiphop" a call to
> regexec() should return "<a %<%/%a%> </a>".
> 
> If I use the (<a[^<]*</a>) regex, it would just return REG_NOMATCH.

Use Python's re module (or Perl, if you must) and "<a.*?</a>" will
work as your regular expression. The "?" makes the "*" non-greedy: it
matches the shortest possible string that still lets the rest of the
pattern match, which is what you want.
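
As a rough illustration of the non-greedy match, here is a minimal
Python sketch run against the two sample strings from this thread. The
helper name shortest_matches and the re.DOTALL flag are assumptions
added for the example; they are not part of the original suggestion.

```python
import re

# Non-greedy pattern from the message: ".*?" stops at the first "</a>".
# re.DOTALL is an assumption added here so a match may span line breaks.
SHORTEST = re.compile(r"<a.*?</a>", re.DOTALL)

# Earlier suggestion from the thread, kept for comparison.
NO_ANGLE = re.compile(r"<a[^<]*</a>")


def shortest_matches(text):
    """Return every shortest "<a ... </a>" span, scanning left to right.

    Hypothetical helper name, used only for this illustration.
    """
    return SHORTEST.findall(text)


if __name__ == "__main__":
    print(shortest_matches("yo <a --- </a> hiphop <a +++ </a> fun"))
    # → ['<a --- </a>', '<a +++ </a>']
    print(shortest_matches("yo <a %<%/%a%> </a> hiphop"))
    # → ['<a %<%/%a%> </a>']
    print(NO_ANGLE.findall("yo <a %<%/%a%> </a> hiphop"))
    # → []
```

The last call shows why the [^<]* form finds nothing in the second
string: the "<" inside the element stops the character class before
"</a>" is reached.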

If you have to invoke it from C, the sources for Perl's regex engine
can be found in /usr/src/contrib/perl5. Be warned that this is not
BSD-licensed code (Perl is distributed under the GPL or the Artistic
License), so check the license restrictions before distributing the
source or anything built on it.

	<mike
--
Mike Meyer <mwm@mired.org>			http://www.mired.org/home/mwm/
Q: How do you make the gods laugh?		A: Tell them your plans.
