Subject: Re: gjournal +UFS - anyone actually use it?
From: Mark Saad
Date: Wed, 6 Mar 2013 22:24:48 -0500
To: "freebsd-hackers@freebsd.org"

On Mar 6, 2013, at 5:55 AM, Wojciech Puchar wrote:

>> I was using it to store large MySQL myisam tables, speed was acceptable at the time. I never had any fs corruption and it worked as expected.
>>
>> At the time I set it up I remember there was some chatter about how slow gjournal was compared to ufs with softupdate.
>
> did some tests yesterday on 25GB partition.
>
> i simulated journal on SSD using 2GB malloc backed ramdisk (md0). UFS partition mounted async as gjournal recommends.
>
> test: unpacking kernel sources to multiple directories until disk gets full
>
> simulates write heavy I/O.
>
> gjournal+SATA drive (seagate constellation ES 500GB) with write cache disabled - 14 minutes 20 seconds.
>
> write cache enabled - 14 minutes 5 seconds (nearly no difference).
>
> UFS+journalled softupdates, no gjournal, disk write cache on - 26 minutes 44 seconds. disk write cache off - was too lazy to wait after an hour.
>
> With gjournal it is not only faster, but it doesn't make other I/O activity crawling.
>

Interesting, I will have to try this. Can you post the exact test steps? Also, what type of controller were you using, and what kernel / version?
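
For anyone who wants to try something similar in the meantime, the quoted setup (malloc-backed md0 as the journal, UFS mounted async) could look roughly like the sketch below. This is only a guess at the steps, not the exact procedure used in the test above. The data partition name and mount point are made-up placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a gjournal setup with the journal on a malloc-backed ramdisk.
# DATA and MNT are hypothetical placeholders, not values from this thread.
DATA=${DATA:-/dev/ada1p1}        # assumed 25GB test partition
JOURNAL_UNIT=${JOURNAL_UNIT:-0}  # md unit number, giving /dev/md0
MNT=${MNT:-/mnt/test}            # assumed mount point
DRY_RUN=${DRY_RUN:-1}            # 1 = print commands only, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_gjournal() {
    run mdconfig -a -t malloc -s 2g -u "$JOURNAL_UNIT"   # 2GB malloc-backed ramdisk
    run gjournal load                                     # load the GEOM journal class
    run gjournal label "$DATA" "/dev/md$JOURNAL_UNIT"     # data provider, then journal provider
    run newfs -J "$DATA.journal"                          # new UFS with the gjournal flag set
    run mount -o async "$DATA.journal" "$MNT"             # async mount
}

setup_gjournal
```

Running it with the defaults prints the five commands so they can be checked before anything touches a disk. The unpack-until-full workload and the write cache toggling are left out because the thread does not give enough detail to reproduce them.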
>
>> Fast forward to today I almost always use ufs with softupdate journal, new in FreeBSD 9.0 and available as a patch to 6.x, 7.x, and 8.x
>
> The problem is as follows:
>
> SU+J makes sure that metadata will get consistent. NOT DATA. And this is quite a mess if you get UPS failure under high load.
>
> gjournal does journal everything.
>

Not exactly. UFS mounted with default options ensures metadata is written synchronously and data asynchronously. Standard softupdates (no journal) improves upon this by limiting which operations need to write to the disk. It had some shortfalls for edge-case operations, which the softupdates journal resolved by journaling the metadata operations that were not covered by standard softupdates.

See http://jeffr-tech.livejournal.com/24357.html

>> This is better supported now, as more people use it in new 9.x builds.
>>
>>> i think about journal on SSD.
>>
>> I believe this is only an option in geom journal,
>
> SSDs are not expensive today. i can get 128GB SSD and create 20GB journal just to limit wear. and possibly use the rest of SSD to store read-intensive data.
>

I wonder how trim / no trim affects the journal wear.

> the way gjournal writes to journal device (sequential) is very friendly for SSD too.
>
> Small SLC-based SSD would be best but i don't see anything on the market with acceptable price for now.
>
>> I am not sure if you can relocate a suj journal to an alternate disk.
> no you can't. But still you will end with consistent metadata but not consistent data. I recovered it but still it took a time and lots of checking.
>
> gjournal doesn't seem to be elegant in case of journal failure (i simulated it with forced removal of ramdisk with mdconfig).
>
> TONS of messages in logs, but still - no data loss, just you have to shutdown system, boot from pendrive, remove journal, fsck (just for sure), and then add journal again

I would be careful of using the md for the journal. Something makes me think it will play nicer when you remove that than a real failure would. Try a USB stick for the journal, and pull it out while running your test case. That to me seems evil enough to break this setup.

Let me know what happens. Also, when testing su+j I ran the following test case: extract ports via portsnap extract, build world with -j4, let the box warm up, then yank the power, boot the box back up, and see what happens.

--
Mark Saad | mark.saad@longcount.org