Date: Tue, 27 Apr 2004 19:48:26 +0200
Message-ID: <4789E43478F3994BB8D967C73FD9C68850BA@exchsrv1>
From: "mark rowlands"
To: "freebsd-questions@FreeBSD.ORG"
Cc: Drew Tomlinson, Christopher Nehren
Subject: RE: Perl Help For Newbie

> -----Original Message-----
> From: owner-freebsd-questions@freebsd.org
> [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of
> Christopher Nehren
> Sent: Tuesday, April 27, 2004 2:53 AM
> To: FreeBSD Questions List
> Subject: Re: Perl Help For Newbie
>
> Can someone explain to me why people are suggesting to parse
> markup languages manually? There's modules -- dozens -- for
> this. Use CPAN.

Because he is a Perl beginner and doesn't know about CPAN and modules
and stuff. How about being a bit more specific? Try:

    cd /usr/ports/www/p5-HTML-Parser && make install clean
    perldoc HTML::Parser

(see the examples sections), or as a starter:

    # Loading HTML::TokeParser::Simple also loads its parent class,
    # HTML::TokeParser, which is the class instantiated below.
    use HTML::TokeParser::Simple;

    my $p = HTML::TokeParser->new(shift || "index.html");

    # Print the href and the link text of every <a> tag, tab-separated.
    while (my $token = $p->get_tag("a")) {
        my $url  = $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
        print "$url\t$text\n";
    }

HTML::TokeParser::Simple is not in the ports tree yet, but will be once
the current port freeze is over, but

    perl -MCPAN -e shell
    cpan> install HTML::TokeParser::Simple
    Running install for module HTML::TokeParser::

will perform the necessary magic :-
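For trying the starter out without an index.html on disk, here is a minimal sketch of the same loop reading from a string. It assumes only the plain HTML::TokeParser interface (a scalar reference passed to new() is parsed as the document itself). The extract_links helper and the sample HTML are made up for illustration and are not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return a list of [href, link text] pairs,
# one for every <a> tag found in the HTML string $html.
sub extract_links {
    my ($html) = @_;

    # A scalar reference tells the parser to read from the string
    # instead of treating the argument as a file name.
    my $p = HTML::TokeParser->new(\$html)
        or die "cannot create parser";

    my @links;
    while (my $tag = $p->get_tag("a")) {
        # get_tag returns [$tagname, \%attr, \@attrseq, $text]
        my $url  = $tag->[1]{href} || "-";       # "-" when there is no href
        my $text = $p->get_trimmed_text("/a");   # text up to the closing </a>
        push @links, [$url, $text];
    }
    return @links;
}

# Made-up sample document, for illustration only.
my $sample = '<p><a href="http://www.FreeBSD.org/">The FreeBSD Project</a>'
           . ' and <a name="top">an anchor without href</a></p>';

for my $link (extract_links($sample)) {
    print "$link->[0]\t$link->[1]\n";
}
```

Collecting the pairs in a list rather than printing inside the loop is just a convenience, so the same sub can feed a report, a link checker, or whatever comes next.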