From owner-freebsd-vuxml@FreeBSD.ORG Sun Aug 29 21:44:07 2004
Delivered-To: freebsd-vuxml@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id F025316A4CE;
    Sun, 29 Aug 2004 21:44:07 +0000 (GMT)
Received: from bast.unixathome.org (bast.unixathome.org [66.11.174.150])
    by mx1.FreeBSD.org (Postfix) with ESMTP id C565243D3F;
    Sun, 29 Aug 2004 21:44:07 +0000 (GMT)
    (envelope-from dan@langille.org)
Received: from xeon (xeon.unixathome.org [192.168.0.18])
    by bast.unixathome.org (Postfix) with ESMTP id C287E3D40;
    Sun, 29 Aug 2004 17:44:06 -0400 (EDT)
Date: Sun, 29 Aug 2004 17:44:06 -0400 (EDT)
From: Dan Langille
X-X-Sender: dan@xeon.unixathome.org
To: freebsd-vuxml@freebsd.org
Message-ID: <20040829173317.U9281@xeon.unixathome.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: parsing vuln.xml with XML::Node
X-BeenThere: freebsd-vuxml@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Documenting security issues in VuXML
X-List-Received-Date: Sun, 29 Aug 2004 21:44:08 -0000

Hi folks,

I've run into a problem parsing the vuln.xml file. I'm using the Perl
XML::Node module.

The issue is the body field. This field contains XHTML tags (for example,

, ,

). I have been unable to extract the contents of .... The only solution I've found is to explicitly specify tags such as

, , and

. That is by no means an ideal solution. There must be something I'm
missing. It has been suggested I use XML::Parser instead (XML::Node is
based upon XML::Parser).

See a working example at http://beta.freshports.org/tmp/testing.tgz

It runs like this:

    perl load_vuxml_into_db.pl vuln.xml

If someone can figure out how I can do this, it will be appreciated.

Thanks.

--
Dan Langille - http://www.langille.org/

From owner-freebsd-vuxml@FreeBSD.ORG Mon Aug 30 17:39:55 2004
Date: Mon, 30 Aug 2004 13:39:54 -0400 (EDT)
From: Dan Langille
To: freebsd-vuxml@freebsd.org
Message-ID: <20040830133416.X35009@xeon.unixathome.org>
Subject: vuln.xml is not XML

I refer to my previous message regarding the difficulties in parsing
vuln.xml. I have since learned that any markup (e.g.

) should be in a CDATA section.

See http://www.w3.org/TR/REC-xml/ and look at section 2.7.

CDATA sections begin with the string "<![CDATA[" and end with the
string "]]>":]

I propose that markup be enclosed with a CDATA section.

--
Dan Langille - http://www.langille.org/

From owner-freebsd-vuxml@FreeBSD.ORG Tue Aug 31 00:15:24 2004
In-Reply-To: <20040830133416.X35009@xeon.unixathome.org>
References: <20040830133416.X35009@xeon.unixathome.org>
From: Jacques Vidrine
Date: Mon, 30 Aug 2004 19:15:02 -0500
To: Dan Langille
cc: freebsd-vuxml@freebsd.org
Subject: vuln.xml *is* XML (was Re: vuln.xml is not XML)

On Aug 30, 2004, at 12:39 PM, Dan Langille wrote:

> I refer to my previous message regarding the difficulties in parsing
> vuln.xml. I have since learned that any markup (e.g.

) should be in a CDATA section.
>
> See http://www.w3.org/TR/REC-xml/ and look at section 2.7.
>
> CDATA sections begin with the string "<![CDATA[" and end with the
> string "]]>":]
>
> I propose that markup be enclosed with a CDATA section.

No this is absolutely wrong :-) The XHTML is embedded with VuXML...
the whole document is one XML document. Some elements are in the VuXML
namespace, while others are in the XHTML namespace. Markup cannot
exist in a CDATA section--- if it is in a CDATA section, it is *not*
markup but *text content*.

I saw your earlier message about XML::Node, but since I am not familiar
with that (or XML::Parser), I did not understand what problem you were
having. Could you try to describe it differently?

Cheers,
--
Jacques A Vidrine / NTT/Verio
nectar@celabo.org / jvidrine@verio.net / nectar@freebsd.org

From owner-freebsd-vuxml@FreeBSD.ORG Tue Aug 31 00:23:21 2004
Date: Tue, 31 Aug 2004 09:23:18 +0900
Message-ID: <7mk6vg2m15.wl@black.imgsrc.co.jp>
From: Jun Kuriyama
To: freebsd-vuxml@freebsd.org
References: <20040830133416.X35009@xeon.unixathome.org>
Subject: Re: vuln.xml *is* XML (was Re: vuln.xml is not XML)

At Mon, 30 Aug 2004 19:15:02 -0500,
Jacques Vidrine wrote:
> > I refer to my previous message regarding the difficulties in parsing
> > vuln.xml. I have since learned that any markup (e.g.

) should be in a CDATA section.
> >
> > See http://www.w3.org/TR/REC-xml/ and look at section 2.7.
> >
> > CDATA sections begin with the string "<![CDATA[" and end with the
> > string "]]>":]
> >
> > I propose that markup be enclosed with a CDATA section.
>
> No this is absolutely wrong :-) The XHTML is embedded with VuXML...
> the whole document is one XML document. Some elements are in the VuXML
> namespace, while others are in the XHTML namespace. Markup cannot
> exist in a CDATA section--- if it is in a CDATA section, it is *not*
> markup but *text content*.

Both are correct. In the good old XML world, we should use a CDATA
section to quote external markup. On the other hand, VuXML lives in the
XML + Namespaces world (see related recommendations).

> I saw your earlier message about XML::Node, but since I am not familiar
> with that (or XML::Parser), I did not understand what problem you were
> having. Could you try to describe it differently?

I'm not sure XML::Parser can handle namespaces correctly. If it cannot,
the parser will get confused when it reads markup with namespaces.

--
Jun Kuriyama // IMG SRC, Inc.
             // FreeBSD Project

From owner-freebsd-vuxml@FreeBSD.ORG Tue Aug 31 00:34:19 2004
Date: Mon, 30 Aug 2004 20:34:17 -0400 (EDT)
From: Dan Langille
To: Jun Kuriyama
In-Reply-To: <7mk6vg2m15.wl@black.imgsrc.co.jp>
Message-ID: <20040830203241.V35009@xeon.unixathome.org>
References: <20040830133416.X35009@xeon.unixathome.org> <7mk6vg2m15.wl@black.imgsrc.co.jp>
cc: freebsd-vuxml@freebsd.org
Subject: Re: vuln.xml *is* XML (was Re: vuln.xml is not XML)

On Tue, 31 Aug 2004, Jun Kuriyama wrote:

> At Mon, 30 Aug 2004 19:15:02 -0500,
> Jacques Vidrine wrote:
> > > I refer to my previous message regarding the difficulties in parsing
> > > vuln.xml. I have since learned that any markup (e.g.

) should be in a CDATA section.
> > >
> > > See http://www.w3.org/TR/REC-xml/ and look at section 2.7.
> > >
> > > CDATA sections begin with the string "<![CDATA[" and end with the
> > > string "]]>":]
> > >
> > > I propose that markup be enclosed with a CDATA section.
> >
> > No this is absolutely wrong :-) The XHTML is embedded with VuXML...
> > the whole document is one XML document. Some elements are in the VuXML
> > namespace, while others are in the XHTML namespace. Markup cannot
> > exist in a CDATA section--- if it is in a CDATA section, it is *not*
> > markup but *text content*.
>
> Both are correct. In the good old XML world, we should use a CDATA
> section to quote external markup. On the other hand, VuXML lives in the
> XML + Namespaces world (see related recommendations).
>
> > I saw your earlier message about XML::Node, but since I am not familiar
> > with that (or XML::Parser), I did not understand what problem you were
> > having. Could you try to describe it differently?
>
> I'm not sure XML::Parser can handle namespaces correctly. If it cannot,
> the parser will get confused when it reads markup with namespaces.

With CDATA, it works; without, it fails and I have to treat every

, and

as a node, not markup.

--
Dan Langille - http://www.langille.org/

From owner-freebsd-vuxml@FreeBSD.ORG Tue Aug 31 01:25:41 2004
In-Reply-To: <7mk6vg2m15.wl@black.imgsrc.co.jp>
References: <20040830133416.X35009@xeon.unixathome.org> <7mk6vg2m15.wl@black.imgsrc.co.jp>
Message-Id: <9E499E76-FAEC-11D8-84D2-000A95BC6FAE@FreeBSD.org>
From: Jacques Vidrine
Date: Mon, 30 Aug 2004 20:25:18 -0500
To: Jun Kuriyama
cc: freebsd-vuxml@freebsd.org
Subject: Re: vuln.xml *is* XML (was Re: vuln.xml is not XML)

On Aug 30, 2004, at 7:23 PM, Jun Kuriyama wrote:

> Both are correct. In the good old XML world, we should use a CDATA
> section to quote external markup. On the other hand, VuXML lives in the
> XML + Namespaces world (see related recommendations).

If you want to quote external markup as *text*, then sure: CDATA is one
way of doing that (character and entity references are another). In
this case, it is just *text*, not markup--- it looks like markup but it
isn't as far as XML processors are concerned. But if you want to do
something with that markup (e.g. validation, XSLT) then you really must
use real XML and namespaces.

I guess you are probably bringing this up from the perspective of
DocBook, but it just happens that DocBook--- and some other XML
applications such as XML-RPC and RSS--- was born before namespaces and
has not adopted support (yet). So we're left with the CDATA workaround
that we had to use with SGML. This should never be done in new XML
applications. This is finally being addressed in some versions of
DocBook (e.g. DocBook 4.3 + SVG).

>> I saw your earlier message about XML::Node, but since I am not
>> familiar with that (or XML::Parser), I did not understand what
>> problem you were having. Could you try to describe it differently?
>
> I'm not sure XML::Parser can handle namespaces correctly. If it cannot,
> the parser will get confused when it reads markup with namespaces.

I don't believe that is correct. Tools that do not grok namespaces will
just not see the namespaces. They will still parse the content just
fine. Since we use default namespace declarations by convention in
vuln.xml, it is particularly unobtrusive: a parser will just see
"xmlns" attribute nodes, but otherwise continue just fine.

Basically, a namespace-aware processor will see events like these:

    start element (http://www.vuxml.org/app/vuxml-1/, description) attributes []
    start element (http://www.w3.org/1999/xhtml, body) attributes []
    start element (http://www.w3.org/1999/xhtml, blockquote) attributes [(cite, "http://...")]
    ...
    end element (http://www.w3.org/1999/xhtml, blockquote)
    end element (http://www.w3.org/1999/xhtml, body)
    end element (http://www.vuxml.org/app/vuxml-1/, description)

while an old XML processor with no support for namespaces will see
events like these:

    start element description attributes []
    start element body attributes [(xmlns, "http://www.w3.org/1999/xhtml")]
    start element blockquote attributes [(cite, "http://...")]
    ...
    end element blockquote
    end element body
    end element description

You can even ignore the namespaces if you like. You just need to
"remember" when you are processing stuff inside a element versus not.

AFAIK, XML::Node is based on XML::Parser which is based on expat.
expat supports namespaces perfectly well, so it is surprising if the
Perl modules built on top of it do not.

Cheers,
--
Jacques A Vidrine / NTT/Verio
nectar@celabo.org / jvidrine@verio.net / nectar@freebsd.org

From owner-freebsd-vuxml@FreeBSD.ORG Tue Aug 31 01:30:31 2004
In-Reply-To: <20040830203241.V35009@xeon.unixathome.org>
References: <20040830133416.X35009@xeon.unixathome.org> <7mk6vg2m15.wl@black.imgsrc.co.jp> <20040830203241.V35009@xeon.unixathome.org>
Message-Id: <4C792F15-FAED-11D8-84D2-000A95BC6FAE@FreeBSD.org>
From: Jacques Vidrine
Date: Mon, 30 Aug 2004 20:30:10 -0500
To: Dan Langille
cc: freebsd-vuxml@freebsd.org
Subject: Re: vuln.xml *is* XML (was Re: vuln.xml is not XML)

On Aug 30, 2004, at 7:34 PM, Dan Langille wrote:

> With CDATA, it works,

With CDATA, it breaks: some markup is incorrectly treated as text content.

> without, it fails and I have to treat every

, > and

as a node, not markup.

without, it works perfectly. That is exactly how it is supposed to work,
Dan :-)

I guess I understand now what you are trying to accomplish. If you want
to stuff processed XML (i.e. an XML nodeset) into a database, you must
first convert the nodeset into a character stream in the usual fashion.

Cheers,
--
Jacques A Vidrine / NTT/Verio
nectar@celabo.org / jvidrine@verio.net / nectar@freebsd.org
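
The last suggestion in the thread, turning the parsed XHTML nodes back
into a character stream before storing them, can be sketched roughly as
follows. This is only an illustration and not the code from the thread:
it uses Python's standard xml.etree.ElementTree (itself built on expat)
as a stand-in for the Perl modules under discussion, the sample entry is
hypothetical, and the VuXML namespace URI is copied from the event
listing quoted above rather than checked against the real vuln.xml. The
helper names body_as_text and child_element_count are made up for this
sketch.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as written in the event listing quoted in the thread.
VUXML_NS = "http://www.vuxml.org/app/vuxml-1/"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize XHTML elements without a prefix (<p>, not <ns0:p>).
# Note: this registration is process-wide state inside ElementTree.
ET.register_namespace("", XHTML_NS)

# Hypothetical, heavily trimmed entry; not taken from the real vuln.xml.
SAMPLE = """\
<vuxml xmlns="http://www.vuxml.org/app/vuxml-1/">
  <vuln vid="hypothetical-0001">
    <topic>example topic</topic>
    <description>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>A vendor reports:</p>
        <blockquote cite="http://example.org/advisory">
          <p>Input is not validated.</p>
        </blockquote>
      </body>
    </description>
  </vuln>
</vuxml>
"""


def body_as_text(description):
    """Return the XHTML <body> subtree of a <description> as one string."""
    body = description.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return ""
    # tostring() walks the subtree and writes it back out as markup text;
    # strip() drops the whitespace that followed </body> in the source.
    return ET.tostring(body, encoding="unicode").strip()


def child_element_count(xml_text):
    """Count the direct child elements of the root element of xml_text."""
    return len(ET.fromstring(xml_text))


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for description in root.iter(f"{{{VUXML_NS}}}description"):
        # This string is what would be handed to the database layer.
        print(body_as_text(description))

    # Real markup produces child nodes; the same characters inside a
    # CDATA section are plain text content and produce none.
    print(child_element_count("<body><p>text</p></body>"))
    print(child_element_count("<body><![CDATA[<p>text</p>]]></body>"))
```

With this approach the parser is free to treat every XHTML tag as a
node, as it is meant to, and the markup is only flattened back to text
at the point where it is stored. The two counts at the end illustrate
the CDATA point made earlier in the thread.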