Date: Tue, 24 Aug 2004 17:38:49 -0400 From: "Dan Langille" <dan@langille.org> To: "Jacques A. Vidrine" <nectar@FreeBSD.org> Cc: freebsd-vuxml@freebsd.org Subject: Re: database tables for VuXML Message-ID: <412B7D29.31547.1472F80@localhost> In-Reply-To: <20040822192915.GA17478@madman.celabo.org> References: <41279E59.13631.76AD2AB9@localhost>
next in thread | previous in thread | raw e-mail | index | archive | help
On 22 Aug 2004 at 14:29, Jacques A. Vidrine wrote: > [Added freebsd-vuxml@] > > On Sat, Aug 21, 2004 at 07:11:21PM -0400, Dan Langille wrote: > > Hi folks, > > > > I'm in the early stage of incorporating VuXML information into > > FreshPorts. > > Alright! That'll be cool. > > > I have a preliminary database table design which appears > > below. If you think I've missed any essential information out of > > this design, please let me know. > > > > First: has anyone already worked on creating database tables for the > > VuXML data? > > I have an old prototype application which uses SQLite. FWIW, here is > the schema I used then. (the formatting is a little funny because this > was embedded in some other code) > > CREATE TABLE vuxml ( > vid VARCHAR PRIMARY KEY UNIQUE > , topic VARCHAR > , description VARCHAR > , discovery DATE > , entry DATE > , modified DATE > -- computed max(entry, modified) for convenience in > -- other expressions > , last DATE > ); This is pretty much identical to what I have. > > CREATE TABLE refs ( > vid VARCHAR > , type VARCHAR > , text VARCHAR > ); Ahh, yes, that repeating group. I'm not storing that. I should be > CREATE TABLE affected ( > vid VARCHAR > , type VARCHAR > ); What is this type field? I didn't follow what the affected table contains. > CREATE TABLE names ( > affected INTEGER > , name VARCHAR > ); >From your example below, the names table would have entries for mutt and ja-mutt. I think I'll do that as well. See the end of my message for how I plan to relate VuXML entries to specific ports. > CREATE TABLE ranges ( > affected INTEGER > , lowop VARCHAR > , low VARCHAR > , glow0 INTEGER > , glow1 INTEGER > , glow2 INTEGER > , glow3 INTEGER > , highop VARCHAR > , high VARCHAR > , ghigh0 INTEGER > , ghigh1 INTEGER > , ghigh2 INTEGER > , ghigh3 INTEGER > ); > > The `ranges' table perhaps looks erm interesting. In this case, > I encoded each package version number into four 32-bit integers. > This allowed one to lookup an affected package completely in SQL. > However, the encoding I used may now be incorrect in some ways, since > pkg_install et al were recently changed to treat version numbers > differently. FreshPorts stores PORTVERSION and PORTREVISION as separate text fields. I take it that the ranges equations should compare only PORTVERSION and ignore PORTREVISION. > Also, SQLite could not really optimize the complex SQL > statement that resulted. So I think it is interesting for further > study, but I doubt I'd do it that way again unless I had a very > particular need to use pure SQL or the database was huge. SQLite is a great little database. I like what I've seen of it (mostly via my work on Bacula). I'll be using PostgreSQL for this. > > > create table vuxml > > ( > > id serial not null, > > -- internal FreshPorts ID > > vid text not null, > > -- the vuln vid. e.g. c4b025bb-f05d-11d8-9837-000c41e2cdad > > topic text not null, > > description text not null, > > date_discovery date , > > date_entry date , > > date_modified date , > > status date not null, > > primary key (id) > > ); > > > > create table vuxml_port_xref > > ( > > id serial not null, > > vuxml_id integer not null, > > port_id integer not null, > > primary key (id) > > ); > > > > create table vuxml_port_xref_range > > ( > > id serial not null, > > vuxml_port_xref_id text , > > range_operator_start text , > > range_operator_end text , > > range_version_start text , > > range_version_end text , > > primary key (id) > > ); > > > > If you consider the output from: vxquery -t vuxml ~/VuXML/vuln.xml > > tnftpd, the entry in vuxml_port_xref_range for tnftpd might be: > > > > (1, 'lt', 20040810, NULL, NULL) > > Seem reasonable. For what its worth, storing the range operators as > `>', `=>', etc lets you get something human readable as the output of a > SELECT, e.g. > > SELECT range_version_start, range_operator_start, port_name, > range_operator_end, range_version_end FROM ... WHERE ...; > > ('20040810', '<', 'tnftpd', '', '') > > I found that handy for manual inspection of the database. Whatever I store, need to be able to run queries upon it. I was thinking of keeping the existing values as that might be easier with perl. Not sure yetp. > You may notice from my example that I used separate `name', `ranges', > and `affected' tables. I found this easier when filling the database, > since the <package>/<system> elements are kind of a cross-product > operator. For example, when processing something like > > <package> > <name>mutt</name> > <name>ja-mutt</name> > <range><ge>1.4</ge><lt>1.4.2</lt></range> > </package> This is what I do with my vuxml_port_xref table. That table relates a vid to a port. > I first created an `affected' entry to contain the other references. > Then, as I hit each `name' I just associated it with the `affected' > entry, and as I hit each `range' I did the same. (Otherwise one > needs to construct a few lists, and then make all entries once > the </package> end tag is seen.) I think this is also easier to > modify if we get new children of <package>, e.g. the <category> and > <architecture> tags that are currently in the VuXML pre-1.2 DTD. My plan is to populate empty the vuxml_* tables each time there is a commit to the data file. This keeps the vuxml system totally separate from FreshPorts. Each row under Commit History (e.g. http://beta.freshports.org/sysutils/bacula/) relates to a row from the commit_log_ports table. That looks something like this: create table commit_log_ports ( commit_log_id integer not null, port_id integer not null, needs_refresh smallint not null, port_version text , port_revision text , primary key (commit_log_id, port_id) ); I'll probably create another table commit_log_ports_vuxml: create table commit_log_ports ( commit_log_id integer not null, port_id integer not null, vuxml_id integer not null , primary key (commit_log_id, port_id) ); Looking at the current data, there's about 140 affected ports, but I haven't broken that with respect to ranges, which is what the above will do. I'll either do that, or add the vuxml_id column to the commit_log_ports table, but I'd rather keep it separate. Time will tell. -- Dan Langille : http://www.langille.org/
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?412B7D29.31547.1472F80>