Date: Tue, 26 Aug 2008 07:34:57 -0400
From: An <anmichel@gmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: sed html tags
Message-ID: <db2611860808260434l2a3fe744y5e94c46d581bc25a@mail.gmail.com>
In-Reply-To: <48B39A4E.1@gmail.com>
References: <41baaeae-0c1d-4a73-9540-8049b837261c@l64g2000hse.googlegroups.com> <48B356BE.3080501@datapipe.com> <db2611860808252119g25adf379wf7b5825bbd4cd694@mail.gmail.com> <48B39A4E.1@gmail.com>

Well, thanks, Yuri! That worked much better than anything I had done!

But I have the problem that I don't know what characters to expect:
accents, ñ, etc. So I really need a "get everything between the
<span xxxx> and the first </span>".

Regarding perl, it is perfect! Thanks! The ? is critical! Is that what
makes the .* non-greedy?

Thanks,

An M

On Tue, Aug 26, 2008 at 1:53 AM, Yuri Pankov <yuri.pankov@gmail.com> wrote:

> An wrote:
> > unfortunately not... see:
> >
> > # cat file
> > <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>
> >
> > # sed -e 's/<\/?span[^>]*>//g' file
> > <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>
> >
> > (...nothing happens, the file is returned with no substitutions done)
> >
> >
> > I could do it with a perl script, which basically does what I would
> > expect sed would do:
> >
> > # cat pscript.pl
> > #!/usr/bin/perl -w
> > $text = "<span xxxx> 111 </span> 2222 <span yyyy> 3333 </span> <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>";
> > $text =~ s/<span x[^>]*>[^\(<\/span>\)]*[\s]*<\/span>[\s]*//g;
> > print $text . "\n"
>
> $text =~ s#<span xxxx>.*?</span>\s*##g;
>
> > # perl pscript.pl
> > 2222 <span yyyy> 3333 </span> 2222 <span yyyy> 3333 </span>
> >
> > " <span xxx> ..... </span> " is removed... but I don't seem to be able
> > to do it with sed... : (
>
> regexps in sed are greedy and, sadly, you can't use *? as quantifier.
> try the following (adding characters that can be inside your 'xxxx'
> tags, of course):
> sed 's#<span xxxx>[ a-zA-Z0-9]*</span>[ ]*##g'
>
> > I'm on fedora c9, maybe that's the problem ?
> >
> > siran
> >
> >
> > On Mon, Aug 25, 2008 at 8:35 PM, Paul A. Procacci <pprocacci@datapipe.com> wrote:
> >
> >> siran wrote:
> >>
> >>> Hi, I have the string
> >>>
> >>> <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>
> >>>
> >>> And i wish to use sed to strip *only* the "<span xxxx>" tag and its
> >>> contents... is this possible ? I'm trying this expression, but it
> >>> doesn't work...
> >>>
> >>> sed 's/<span xxxx[^\(</span>\)]+<\/span>//g' file
> >>>
> >>> is there anything like it ?
> >>>
> >>> I would like to obtain
> >>>
> >>> 2222
> >>>
> >>>
> >>>
> >>> I hope someone can help,
> >>>
> >>> thank you,
> >>>
> >>> siran
> >>> _______________________________________________
> >>> freebsd-questions@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> >>> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
> >>>
> >>>
> >> sed -E 's/<\/?span[^>]*>//g'
> >>
> >> Maybe that's what you want?
> >>
> >
>
> HTH,
> Yuri
>
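
For reference, the two open points in this thread can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not something posted by anyone in the thread: the accented characters in the sample line are made up, and the second command assumes the span contents never contain a "<" (that is, no nested tags). In Perl, a "?" placed after "*" is indeed what makes the quantifier non-greedy; sed has no such quantifier, but a negated bracket expression gives the same "stop at the first </span>" effect here.

```shell
#!/bin/sh
# Sample line from the thread, with accented characters added for illustration.
input='<span xxxx> 111 ñ é </span> 2222 <span yyyy> 3333 </span>'

# Without -E, "?" is an ordinary character in a basic regular expression,
# so this pattern only matches a literal "/?" and the line comes back unchanged.
unchanged=$(printf '%s\n' "$input" | sed -e 's/<\/?span[^>]*>//g')

# [^<]* cannot run past the next "<", so the match ends at the first </span>
# without needing a non-greedy quantifier. LC_ALL=C makes sed treat the input
# as plain bytes, so it does not matter which encoding the accents use.
stripped=$(printf '%s\n' "$input" | LC_ALL=C sed 's#<span xxxx>[^<]*</span> *##g')

printf '%s\n' "$unchanged"
printf '%s\n' "$stripped"
```

The first line printed is the untouched input, which matches the "nothing happens" result quoted above: the command was run with -e, while the suggestion used -E (GNU sed also accepts -r for extended regular expressions). The second line printed is "2222 <span yyyy> 3333 </span>". If the span contents can contain other tags, the [^<]* approach no longer applies and the Perl non-greedy form is the simpler choice.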