Date:        Thu, 10 Aug 2006 19:49:21 -0700
From:        Paul Allen <nospam@ugcs.caltech.edu>
To:          Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:          freebsd-fs@freebsd.org, Craig Boston <craig@xfoil.gank.org>,
             freebsd-geom@freebsd.org, freebsd-arch@freebsd.org
Subject:     Re: GJournal (hopefully) final patches.
Message-ID:  <20060811024921.GF308@mark.ugcs.caltech.edu>
In-Reply-To: <20060810192841.GA1345@garage.freebsd.pl>
References:  <20060808195202.GA1564@garage.freebsd.pl>
             <20060810184702.GA8567@nowhere>
             <20060810192841.GA1345@garage.freebsd.pl>
It's a bit disturbing that a GEOM class quite far away from the storage
drivers presumes that the proper action here is a cache flush.  The
underlying hardware may support tagged command queuing (i.e., SCSI's
ability to receive not only transaction-completion notifications but
also to permit partial orderings to be dictated to the controller) or
native command queuing, its SATA counterpart.

It's true that this functionality may not always work as advertised,
but that's a problem to be solved with device sysctls, not by taking a
lowest-common-denominator approach in a high-level GEOM class.

This really needs broader architectural consideration, not just
whatever it takes to make it work.

Paul

From Pawel Jakub Dawidek <pjd@freebsd.org>, Thu, Aug 10, 2006 at 09:28:41PM +0200:
> On Thu, Aug 10, 2006 at 01:47:23PM -0500, Craig Boston wrote:
> > Hi,
> >
> > It's great to see this project so close to completion!  I'm trying it
> > out on a couple of machines to see how it goes.
> >
> > A few comments and questions:
> >
> > * It caught me a little by surprise that it carves 1G out of the device
> >   for the journal.  Depending on the size of the device that can be a
> >   pretty hefty price to pay (and I didn't see any mention of it in the
> >   setup notes).  For a couple of my smaller filesystems I reduced it to
> >   512MB.  Perhaps some algorithm for auto-sizing the journal based on
> >   the size / expected workload of the device would be in order?
>
> It will be pointed out in the documentation when I finally prepare it.
> I currently have no plans for autosizing.
>
> > * Attached is a quick patch for geom_eli to allow it to pass BIO_FLUSH
> >   down to its backing device.  It seems like the right thing to do and
> >   fixes the "BIO_FLUSH not supported" warning on my laptop that uses a
> >   geli-encrypted disk.
>
> I have this already in my perforce tree.  I also implemented BIO_FLUSH
> passing in gmirror and graid3.
>
> I also added a flag for gmirror and graid3 which says "don't
> resynchronize components after a power failure - trust that they are
> consistent".  And they are always consistent when placed below gjournal.
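As an aside, what the geli patch amounts to in GEOM terms is small:
passing BIO_FLUSH down is a matter of cloning the bio in the class's
start routine and handing the clone to the consumer.  The fragment
below is a minimal sketch of that pattern, not the actual geom_eli
patch; the function name is hypothetical and a single consumer is
assumed.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bio.h>
    #include <geom/geom.h>

    /* Minimal sketch of BIO_FLUSH pass-through in a GEOM start routine. */
    static void
    g_example_start(struct bio *bp)
    {
            struct g_consumer *cp;
            struct bio *cbp;

            switch (bp->bio_cmd) {
            case BIO_FLUSH:
                    /* Duplicate the request for the layer below us. */
                    cbp = g_clone_bio(bp);
                    if (cbp == NULL) {
                            g_io_deliver(bp, ENOMEM);
                            return;
                    }
                    /* g_std_done() completes the parent when the clone finishes. */
                    cbp->bio_done = g_std_done;
                    /* Send the clone to our (single) consumer. */
                    cp = LIST_FIRST(&bp->bio_to->geom->consumer);
                    g_io_request(cbp, cp);
                    return;
            default:
                    /* A real class handles BIO_READ, BIO_WRITE, etc. here. */
                    g_io_deliver(bp, EOPNOTSUPP);
                    return;
            }
    }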
> > * On a different system, however, it complains about it even on a raw
> >   ATA slice:
> >
> >   atapci1: <Intel ICH4 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> >   ata0: <ATA channel 0> on atapci1
> >   ad0: 114473MB <WDC WD1200JB-00CRA1 17.07W17> at ata0-master UDMA100
> >   GEOM_JOURNAL: BIO_FLUSH not supported by ad0s1e.
> >
> >   It seems like a reasonably modern controller and disk; at the least it
> >   should be capable of issuing a cache flush command.  Not sure why it
> >   doesn't like it :/
>
> We would probably need to add some printfs to diagnose this - you can
> try adding some lines to ad_init() to get this:
>
> 	if (atadev->param.support.command1 & ATA_SUPPORT_WRITECACHE) {
> 		if (ata_wc)
> 			ata_controlcmd(dev, ATA_SETFEATURES, ATA_SF_ENAB_WCACHE, 0, 0);
> 		else
> 			ata_controlcmd(dev, ATA_SETFEATURES, ATA_SF_DIS_WCACHE, 0, 0);
> 	} else {
> 		printf("ad_init: WRITE CACHE not supported by ad%d.\n",
> 		    device_get_unit(dev));
> 	}
>
> > * How "close" does the filesystem need to be to the gjournal device in
> >   order for the UFS hooks to work?  Directly on it?
> >
> > The geom stack on my laptop currently looks something like this:
> >
> >   [geom_disk] ad0 <- [geom_eli] ad0.eli <- [geom_gpt] ad0.elip6 <-
> >   [geom_label] gjtest <- [geom_journal] gjtest.journal <- UFS
> >
> > I was wondering if an arrangement like this would work:
> >
> >   [geom_journal] ad0p6.journal <- [geom_eli] ad0p6.journaleli <- UFS
> >
> > and if it would be any more efficient (journal the encrypted data
> > rather than encrypt the journal).  Or even gjournal the whole disk at
> > once?
>
> When you mount the file system, it sends BIO_GETATTR "GJOURNAL::provider"
> requests.  So as long as the classes between the file system and the
> gjournal provider pass BIO_GETATTR down, it will work.
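For illustration, the pass-down of BIO_GETATTR in an intermediate class
is the same clone-and-forward shown above for BIO_FLUSH.  On the
answering side, a sketch of how a journalling provider could satisfy
the query using the stock g_handleattr_str() helper follows; this is a
hypothetical fragment, not the actual gjournal source.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bio.h>
    #include <geom/geom.h>

    /*
     * Answer the mount-time attribute query.  g_handleattr_str() copies
     * the string into the bio, delivers it, and returns non-zero when
     * the attribute name matches; "pp" is the class's provider.
     */
    static int
    g_example_getattr(struct bio *bp, struct g_provider *pp)
    {
            if (g_handleattr_str(bp, "GJOURNAL::provider", pp->name))
                    return (1);     /* Answered and delivered. */
            /*
             * Not ours: an intermediate class would instead clone the
             * bio and g_io_request() it down, exactly as with BIO_FLUSH.
             */
            return (0);
    }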
> On my home machine I have the following configuration:
>
> 	raid3/DATA1.elid.journal
>
> So it's UFS over gjournal over bsdlabel over geli over raid3 over ata.
>
> I prefer to put gjournal on top, because it gives consistency to the
> layers below it.  For example, I can use geli with a bigger sector size
> (a sector size greater than the disk's sector size in encryption-only
> mode can be unreliable across power failures, which is not the case
> when gjournal is above geli), I can turn off post-power-failure
> synchronization of gmirror/graid3, etc.
>
> On the other hand, configuring geli on top of gjournal can be more
> efficient for large files - geli will not encrypt the data twice.
>
> Fortunately, with GEOM you can freely mix the puzzle pieces.
>
> > Haven't been brave enough to try gjournal on root yet, but my /usr and
> > /compile (src, obj, ports) partitions are already on it, so I'm sure
> > I'll try it soon ;)
>
> Markus Trippelsdorf reported that it doesn't work out of the box, but
> he managed to make it work with some small changes to fsck_ffs(8).
>
> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd@FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!

Want to link to this message? Use this URL:
<https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060811024921.GF308>