From: Garrett Cooper
Date: Fri, 06 Jul 2007 09:23:29 -0700
To: Tim Kientzle
Cc: ports@freebsd.org, hackers@freebsd.org
Subject: Re: Finding slowdowns in pkg_install (continuations of previous threads)
Tim Kientzle wrote:
>>> I'm currently running a gamut of tests (500 tests per package;
>>> 128 total on my server), and outputting all data to CSV files to
>>> interpret later, using another Perl script to calculate and
>>> interpret averages and standard deviations.
>
> Excellent! Much-needed work.
>
>>> Using basic printf(3) calls with clock_gettime(2), I have determined
>>> that the majority of the issues are disk-bound (as Tim Kientzle put
>>> it).
>
> Next question: What are those disk operations, and are any
> of them redundant?
>
>>> The scope of my problem is not to analyze tar,...
>
> I've spent the last three-plus years doing exactly that.
> Make sure you're using the newest bsdtar/libarchive,
> which has some very noticeable performance improvements.
>
>>> but I've discovered that a lot of time is spent in reading and
>>> interpreting the +CONTENTS and related files (most notably in
>>> parsing commands, to be honest).
>
> Oh? That's interesting. Is data being re-parsed (in which case
> some structural changes to parse it once and store the results
> may help)? Or is the parser just slow?
>
>>> Will post more conclusive results tomorrow once all of my results
>>> are available.
>
> I don't follow ports@ so didn't see these "conclusive results"
> of yours. I'm very interested, though.
>
> Tim Kientzle

Some extra notes:

- My tests are still running, but they are almost done (unfortunately I won't be able to post any results before tonight since I'm going to work now). It's taking a lot longer than I originally thought it would (I've produced several gigabytes of log files and CSV files... eep).
- I placed the timing calls around what I considered pkg_install-specific sensitive areas, i.e. locations where tar was run or where the meta files were processed.
- I tried implementing a small buffering technique around the +CONTENTS file parsing function (read in 10 lines at once, parse those 10 lines, and repeat, instead of reading 1 line, parsing it, and repeating). Most of the time it yielded good results: the buffering technique beat the non-buffering technique 9 out of 10 times. Given that success, I'm going to try implementing the file reading in terms of fgetc(3) to properly read in a number of lines all at once, and see what happens (my hunch is that those results may be more favorable).

Thanks,
-Garrett