Date:      Tue, 26 Aug 2008 07:34:57 -0400
From:      An <anmichel@gmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: sed html tags
Message-ID:  <db2611860808260434l2a3fe744y5e94c46d581bc25a@mail.gmail.com>
In-Reply-To: <48B39A4E.1@gmail.com>
References:  <41baaeae-0c1d-4a73-9540-8049b837261c@l64g2000hse.googlegroups.com> <48B356BE.3080501@datapipe.com> <db2611860808252119g25adf379wf7b5825bbd4cd694@mail.gmail.com> <48B39A4E.1@gmail.com>

Well, thanks, Yuri !

That worked much better than everything I had tried! But my problem is
that I don't know what characters to expect: accents, ñ, etc. So I
really need a "get everything between the <span xxxx> and the first
</span>"...
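
One way to get "everything up to the first </span>" out of sed, whatever
characters sit in between, is to swap each </span> for a single character
that cannot already be in the line (a newline, since sed reads one line at
a time), match with a negated class, and then swap it back. A minimal
sketch, assuming GNU sed (it relies on \n being accepted in the replacement
and inside a bracket expression); the function name strip_span_xxxx is made
up for illustration:

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. strip_span_xxxx is a hypothetical helper name.
strip_span_xxxx() {
    # 1. turn every </span> into a newline (one unambiguous character)
    # 2. delete "<span xxxx>", everything up to the first newline, and
    #    any spaces that follow it
    # 3. turn the remaining newlines back into </span>
    sed -e 's#</span>#\n#g' \
        -e 's#<span xxxx>[^\n]*\n *##g' \
        -e 's#\n#</span>#g'
}

printf '%s\n' '<span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>' | strip_span_xxxx
# prints: 2222 <span yyyy> 3333 </span>
```

Because the negated class only excludes the stand-in newline, the text
inside the tag does not have to be listed character by character. Other
sed implementations may treat \n differently, so this may need adjusting
there.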

As for the Perl version, it is perfect! Thanks!

The ? is critical! Is that what makes the .* non-greedy?


Thanks,

An M


On Tue, Aug 26, 2008 at 1:53 AM, Yuri Pankov <yuri.pankov@gmail.com> wrote:

> An wrote:
> > unfortunately not... see:
> >
> > # cat file
> > <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>
> >
> > # sed -e 's/<\/?span[^>]*>//g' file
> > <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>
> >
> > (...nothing happens, the file is returned with no substitutions done)
> >
> >
> > I could do it with a perl script, which basically does what I would
> > expect sed would do:
> >
> > # cat pscript.pl
> > #!/usr/bin/perl -w
> > $text = "<span xxxx> 111 </span>   2222 <span yyyy> 3333 </span> <span xxxx>
> > 111 </span>    2222    <span yyyy> 3333 </span>";
> > $text =~ s/<span x[^>]*>[^\(<\/span>\)]*[\s]*<\/span>[\s]*//g;
> > print $text . "\n"
>
> $text =~ s#<span xxxx>.*?</span>\s*##g;
>
> > # perl pscript.pl
> > 2222 <span yyyy> 3333 </span> 2222    <span yyyy> 3333 </span>
> >
> > " <span xxx> ..... </span> " is removed... but I don't seem to be able
> > to do it with sed... : (
>
> Regexps in sed are greedy and, sadly, you can't use *? as a quantifier.
> Try the following (adding the characters that can appear inside your
> 'xxxx' tags, of course):
> sed 's#<span xxxx>[ a-zA-Z0-9]*</span>[ ]*##g'
>
> > I'm on Fedora c9, maybe that's the problem?
> >
> > siran
> >
> >
> > On Mon, Aug 25, 2008 at 8:35 PM, Paul A. Procacci <pprocacci@datapipe.com> wrote:
> >
> >> siran wrote:
> >>
> >>> Hi, I have the string
> >>>
> >>> <span xxxx> 111 </span> 2222 <span yyyy> 3333 </span>
> >>>
> >>> And I wish to use sed to strip *only* the "<span xxxx>" tag and its
> >>> contents... is this possible? I'm trying this expression, but it
> >>> doesn't work...
> >>>
> >>> sed 's/<span xxxx[^\(</span>\)]+<\/span>//g' file
> >>>
> >>> is there anything like it ?
> >>>
> >>> I would like to obtain
> >>>
> >>> 2222
> >>>
> >>>
> >>>
> >>> I hope someone can help,
> >>>
> >>> thank you,
> >>>
> >>> siran
> >>>
> >>>
> >> sed -E 's/<\/?span[^>]*>//g'
> >>
> >> Maybe that's what you want?
> >>
>
>
> HTH,
> Yuri
>


