FreeBSD Mail Archives

Date:      Mon, 14 May 2007 23:34:52 -0700
From:      Bert JW Regeer <xistence@0x58.com>
To:        Garrett Cooper <youshi10@u.washington.edu>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: New FreeBSD package system (a.k.a. Daemon Package System (dps))
Message-ID:  <96C6AAEA-B70A-400F-8614-8DFDE5930D19@0x58.com>
In-Reply-To: <46493F0A.9050303@u.washington.edu>
References:  <200705102105.27271.blackdragon@highveldmail.co.za>	<f20c8u$htp$1@sea.gmane.org>	<20070512155059.92011d54.stas@FreeBSD.org>	<4645AFAF.7010704@free.fr> <8916C4D5-4DB5-49C0-AF8D-07F9FFA0A6E0@0x58.com> <46493F0A.9050303@u.washington.edu>



On May 14, 2007, at 10:03 PM, Garrett Cooper wrote:

> Bert JW Regeer wrote:
>> On May 12, 2007, at 5:14 AM, Philippe Laquet wrote:
>>> Stanislav Sedov wrote:
>>>> On Fri, 11 May 2007 02:10:05 +0200
>>>> Ivan Voras <ivoras@fer.hr> mentioned:
>>>>
>>>>
>>>>> - I think it's time to give up on using BDB+directory tree full of text
>>>>> files for storing the installed packages database, and I propose all of
>>>>> this be replaced by a single SQLite database. SQLite is public domain
>>>>> (can be slurped into base system), embeddable, stores all data in a
>>>>> single file, lightweight, fast, and can be used to do fancy things such
>>>>> as reporting.
>>>>>
>>>>
>>>> What is the reason to use an SQL-based database? Will you perform direct
>>>> queries on the database? The packaging system is for ordinary users, not
>>>> SQL geeks, so they should not have to use SQL for managing packages. So a
>>>> simple set of hashes will suffice for our needs. I agree with Julian that
>>>> we should have a backup of the packaging database in plain text format,
>>>> and a utility to rebuild it. This way we can always restore the database
>>>> if something goes wrong. Furthermore, that should not have a great impact
>>>> on performance, since we don't have to rebuild it every day.
>>>>
>>> I agree with Stan ;)
>>>
>>> "fast and improved" package utilities mainly use some indexed Berkeley DB
>>> combined with flat files, don't they? I, and maybe many other FreeBSD
>>> users, use light systems for efficiency and easier management. If we use
>>> some database system it will require disk space, resources for the DB to
>>> run, dependencies and so on... And we may also be exposed to a "that DB is
>>> better" war ;)
>>>
>> SQLite is compiled into the program, and as such does not require any
>> resources other than one file handle and some CPU time when querying. The
>> file is stored on disk, and no separate process needs to be running to
>> query it. Maybe I misunderstood what you were trying to say. SQLite will
>> require fewer resources than flat text files, since SQLite opens one file
>> once and then processes it, instead of what currently happens: having to
>> open and close hundreds of files depending on how many ports are
>> installed. In this regard, SQLite is like BDB, except that SQLite uses
>> standards-compliant SQL statements to get data.
>
> Correct. From what I was reading, shared memory read access and locking
> are two available features of BDB databases.
>
> The only thing is that I do agree that there should be a dumping and
> importing mechanism of some kind for semi-formatted text files, for
> backup, debugging, and modification purposes. That's just my personal idea
> on the topic though :).
>
>>>> --
>>>> Stanislav Sedov
>>>> ST4096-RIPE
>>>>
>>>
>> I am able to understand many of the gripes with using a database, and
>> having to import yet another code base into the FreeBSD base. However, as
>> one of the young ones, and knowing sed/awk/grep and SQL, I prefer SQL over
>> having to process hundreds of text files using text processing tools. It
>> saddens me each time I run one of the pkg_* tools that needs to parse the
>> flat file structure, since it takes so long. I have friends running Ubuntu
>> and their apt-get returns results much faster.
>> In a world where hard drives are becoming more reliable, and are
>> automatically relocating sectors that go bad, do we really have to worry
>> about database corruption as much? I feel that many of the fears that are
>> being put forward would harm a text-based "storage" system as well. If one
>> block drops out, it can leave tools unable to parse the files. Create a
>> backup copy of the database after each successful transaction? There are
>> ways to battle data corruption.
>
> True. I was thinking of backup, and recreation from scratch, considering
> that the database wouldn't be more than a few megs. In-place replacement
> just seems like a hairy situation sometimes.
>
>> Using BDB is not a real option either. I cannot even count the number of
>> times that the BDB database that portupgrade created has become corrupt
>> because I accidentally ran two portupgrades at the same time, or even
>> remembered that I did not want to upgrade something and hit Ctrl+C.
>
> I'm sorry but nothing's completely solid in that respect, AFAIK. In terms
> of the first problem you mentioned, Wade is working on the locking
> <http://wiki.freebsd.org/WadeWesolowsky>.
>
> In terms of transactions, maybe we should take a look at Subversion for
> inspiration: <http://svn.haxx.se/dev/archive-2005-03/0301.shtml>. I'm a
> firm believer that it's easier to incorporate code than it is to remove it.

I am unable to see any references to transaction support for BDB
databases; maybe I am missing something. Subversion in that thread is
suggesting SQL for a totally different reason. fsfs is what most people
are using as a Subversion backend to help avoid BDB corruption. Many of
the people I have talked to who used Subversion with BDB have had major
issues, whereas fsfs has not had any issues at all.

This is just what I have experienced myself as a Subversion repository
administrator.

>
>> From the experience I got running SVN with BDB as the back-end database
>> to store my data, I say no thanks. In that case I would much rather stick
>> with the flat text files than go with a database.
>
> Well, a few comments:
>
> -Text files are bloated. Although many people are for XML, it takes much
> longer to parse than binary databases.

The files under /var/db/pkg/ are all plain flat text files. I am not a
supporter of XML at all.

> -Custom text files require custom format-capable parsers, no matter what
> the format, and the less coverage a parser has, the more likely it is to
> have bugs, IMO.

We already have these in the pkg_* functions, so I'd hope they are fairly
solid!

> -In the event that features changed or were added, the required
> modifications to the parser could range from trivial to major. With
> databases you can get away from that mentality to some degree IMHO.

Changing an SQL query versus rewriting a parser for text files is a huge
difference.
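
As a rough illustration of that difference, here is a minimal sketch using
Python's standard sqlite3 module. The table layout, the column names, and the
helper function names are all hypothetical; nothing here is taken from the
pkg_* tools or from any proposed dps design. It only shows that a new field
and a new report are each one SQL statement rather than a parser change.

```python
import sqlite3


def open_db(path=":memory:"):
    # Hypothetical schema: table and column names are illustrative only.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        " name TEXT PRIMARY KEY,"
        " version TEXT NOT NULL,"
        " origin TEXT NOT NULL)"
    )
    return conn


def add_package(conn, name, version, origin):
    # The connection context manager commits on success and rolls back
    # if the block raises an exception.
    with conn:
        conn.execute(
            "INSERT INTO packages (name, version, origin) VALUES (?, ?, ?)",
            (name, version, origin),
        )


def add_maintainer_column(conn):
    # A new feature becomes one schema statement; existing rows get NULL.
    with conn:
        conn.execute("ALTER TABLE packages ADD COLUMN maintainer TEXT")


def packages_by_origin_prefix(conn, prefix):
    # Escaping of the LIKE wildcards % and _ in prefix is omitted here.
    rows = conn.execute(
        "SELECT name, version FROM packages WHERE origin LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return rows.fetchall()
```

With a flat-file layout, both the extra field and the new report would mean
touching whatever code reads the files.
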
>
> -Garrett

I am not opposed to text files, other than that they can be slow. I am
against BDB because over the years, in my experience, BDB databases have
proven to be extremely unreliable and easily corrupted. If we are going to
be making changes to the way the ports/packages store the information
about what exists, it should be done in such a way that it is scalable and
at the same time extensible (is this a word?).
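
On the plain-text backup and rebuild utility suggested earlier in the thread,
a minimal sketch of one possible approach, assuming Python's standard sqlite3
module: the database is dumped as SQL text, which can be read, edited, and
replayed into a fresh file. The function names are hypothetical and this is
not a description of any existing tool.

```python
import sqlite3


def dump_to_text(conn):
    # iterdump() yields the SQL statements that recreate the database,
    # wrapped in BEGIN TRANSACTION / COMMIT.
    return "\n".join(conn.iterdump())


def rebuild_from_text(sql_text, path=":memory:"):
    # Replay a text dump into a new database at the given path.
    conn = sqlite3.connect(path)
    conn.executescript(sql_text)
    return conn
```

The dump is ordinary text, so it could be kept next to the database and
inspected with the usual tools if the binary file ever became unreadable.
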

Bert JW Regeer



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?96C6AAEA-B70A-400F-8614-8DFDE5930D19>
