Date:      Sun, 16 Dec 2007 03:42:59 +0100
From:      Bernd Walter <ticso@cicely12.cicely.de>
To:        Ivan Voras <ivoras@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: ZFS melting under postgres...
Message-ID:  <20071216024259.GI48684@cicely12.cicely.de>
In-Reply-To: <fk1j0l$o4l$1@ger.gmane.org>
References:  <06CAC7FC-DB58-441D-A6E0-76D1D8133393@tamu.edu> <86ir31xwlu.fsf@ds4.des.no> <ADCCD5E6-A792-49B9-A346-753176C12F2E@tamu.edu> <fjuljp$cvb$1@ger.gmane.org> <476343B4.8080208@FreeBSD.org> <fk09p8$b16$1@ger.gmane.org> <86tzmk54tt.fsf@ds4.des.no> <fk0ue7$bp$1@ger.gmane.org> <476419CD.9070401@terranova.net> <fk1j0l$o4l$1@ger.gmane.org>

On Sat, Dec 15, 2007 at 11:04:04PM +0100, Ivan Voras wrote:
> Travis Mikalson wrote:
> 
> > If you're using compact flash for something that's constantly updated
> > like a ZIL, wouldn't your CF card die real quick?
> 
> Probably, for constant updates to the same areas. But as you say:

CF cards and flash-based SSDs rotate the flash cells anyway (wear
leveling), so it doesn't matter much whether you keep writing the same
block or not.
I wouldn't worry about wearing out those devices, since today's media
survive many write cycles.
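
A toy model of that rotation, as a minimal Python sketch: every write to a
logical block lands on the least-worn free physical block, so hammering a
single logical block spreads the erases across the whole device. The
`ToyFlash` class and its sizes are hypothetical; real controllers use far
more elaborate schemes that vendors don't document.

```python
class ToyFlash:
    """Hypothetical flash device with naive dynamic wear leveling."""

    def __init__(self, physical_blocks: int) -> None:
        self.erase_counts = [0] * physical_blocks  # erases per physical block
        self.free = set(range(physical_blocks))     # blocks holding no live data
        self.mapping = {}                           # logical block -> physical block

    def write(self, logical: int) -> int:
        # Place the new copy on the least-worn free block (ties broken by index).
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            # The previous copy is now stale: erase it and return it to the pool.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical] = target
        return target


flash = ToyFlash(physical_blocks=8)
for _ in range(1000):
    flash.write(0)  # rewrite the same logical block over and over

# Wear ends up spread evenly instead of hitting one physical block.
print(flash.erase_counts)  # [125, 125, 125, 125, 125, 125, 125, 124]
```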

> > Since a ZIL is not really seek-intensive, why not just offload it to its
> > own standard hard disk that has its write caching and all other similar
> > data-corrupting technologies disabled?
> 
> Yes. I don't see a point writing a log that's mostly sequentially
> accessed on an SSD, and which probably wears the same areas on the drive.
> I'm more interested in loads like databases.

I wouldn't use them for either purpose unless there is a specific reason
to. The problem is how they work internally.
They contain NAND flash chips whose storage is split into two areas of
data blocks, typically slightly more than 4 or 8 kB each these days.
One area is small but 100% error-free, with a high write rate; the
other is large but of much lower quality.
Devices use the latter for the data blocks they present to the host,
and the good cells for maintaining the allocation of those blocks.

One problem is that the data blocks are that big: when you write
512 bytes, you effectively do a read-modify-write of a much larger
physical block.
This can be handled quite well with a larger file system block size.
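
A rough sketch of why a larger FS block helps, under simplifying
assumptions: writes start on a physical block boundary, the flash data
block is exactly 4096 or 8192 bytes (ignoring the "slightly more" spare
part), and the device rewrites every physical block a write touches.
`rmw_amplification` is a hypothetical helper, not a vendor formula.

```python
def rmw_amplification(write_bytes: int, physical_block_bytes: int) -> float:
    """Bytes physically rewritten per byte written, assuming the write is
    aligned and every touched physical block is rewritten whole."""
    blocks_touched = -(-write_bytes // physical_block_bytes)  # ceiling division
    return blocks_touched * physical_block_bytes / write_bytes


for fs_block in (512, 4096, 8192):
    for flash_block in (4096, 8192):
        factor = rmw_amplification(fs_block, flash_block)
        print(f"{fs_block:>5} B writes on {flash_block} B flash blocks: x{factor:g}")
```

Under these assumptions a 512-byte write costs 8 to 16 times its size,
while an FS block at least as large as the flash block brings the factor
down to 1.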

The much bigger problem is power loss while such a maintenance block
is being written.
If that write fails, you lose a very large range of logical blocks,
since a 4 kB maintenance block holds the allocation for several hundred
kB of logical data blocks.
In other words, you may lose data blocks that were not written for a
long time, and the database wouldn't expect any problem with that data.
Even for the ZIL, losing a large data area is very questionable, since
its purpose is to keep data that was already synced readable after a
power loss.
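
To put "several hundred kB" into numbers, here is a back-of-the-envelope
sketch with an assumed layout, since vendors don't publish theirs: a 4 KiB
maintenance block made of 4-byte entries, each locating one 512-byte
logical sector. The constants and the `sectors_lost` helper are
hypothetical.

```python
# Assumed, illustrative layout; real devices keep theirs secret.
MAPPING_BLOCK_BYTES = 4096  # one maintenance (allocation) block
ENTRY_BYTES = 4             # size of one mapping entry (assumption)
SECTOR_BYTES = 512          # logical sector located by one entry (assumption)

ENTRIES_PER_BLOCK = MAPPING_BLOCK_BYTES // ENTRY_BYTES  # 1024 entries
COVERAGE_BYTES = ENTRIES_PER_BLOCK * SECTOR_BYTES       # 524288 B = 512 KiB


def sectors_lost(written_sector: int) -> range:
    """Logical sectors that become unreadable if power fails while the
    mapping block covering `written_sector` is being rewritten."""
    first = (written_sector // ENTRIES_PER_BLOCK) * ENTRIES_PER_BLOCK
    return range(first, first + ENTRIES_PER_BLOCK)


print(f"one mapping block covers {COVERAGE_BYTES // 1024} KiB")
lost = sectors_lost(5000)
print(f"a torn update while writing sector 5000 loses sectors {lost.start}..{lost.stop - 1}")
```

With those assumptions, a torn update triggered by writing sector 5000 can
make sectors 4096 through 5119 unreadable, most of which the host may not
have touched in a long time.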

I'm not sure what happens on a device reset at the wrong moment; this
may depend on the specific media, but I wouldn't be surprised to see
read errors after a reset without power loss as well.
This applies to all NAND-based flash media: SD, MMC, SM, CF, ...
Some media are less critical because of the way they use the
maintenance blocks, but those details are usually a vendor secret.

I do run PostgreSQL on SD media on ARM-based FreeBSD systems, but
I'm prepared to lose the whole database and restore it from backup
if things go wrong.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de


