From: Garrett Cooper
Date: Fri, 13 Jul 2007 09:02:24 -0700
To: Tim Kientzle
Cc: ports@freebsd.org, hackers@freebsd.org, krion@freebsd.org
Subject: Re: Finding slowdowns in pkg_install (continuations of previous threads)

Garrett Cooper wrote:
> Tim Kientzle wrote:
>>> -I tried ... buffering ... the +CONTENTS file parsing function,
>>> and the majority of the time it yielded good results ....
>>
>> One approach I prototyped sometime back was to use
>> libarchive in pkg_add as follows:
>> * Open the archive
>> * Read +CONTENTS directly into memory (it's
>> guaranteed to always be first in the archive)
>> * Parse all of +CONTENTS at once
>> * Continue scanning the archive, disposing
>> of each file as it appears in the archive.
>>
>> Based on my experience with this, I would
>> suggest you just read all of +CONTENTS
>> directly into memory at once and parse
>> the whole thing in a single shot.
>> fopen(), then fstat() to get the size,
>> then allocate a buffer and read the whole
>> thing, then fclose(). You can then
>> parse it all at once.
>>
>> As a bonus, your parser then becomes a nice
>> little bit of reusable code that reads
>> a block of memory and returns a structure describing
>> the package metadata.
>>
>> Tim Kientzle
> I'm not 100% sure because I'm not comparing apples (virtual disk on
> desktop via VMware) to apples (real disk on server), but I'm showing a
> 2.5-fold speedup after adding the simple parser:
>
> Virtual disk:
> 4.42 real 1.37 user 1.47 sys
>
> Real disk:
> 10.26 real 5.36 user 0.99 sys
>
> I'll run a battery of tests just to ensure whether or not that's the
> case.
>
> Be back with results in a few more days.
>
> -Garrett

Hello,

As promised, here are some results from my work.

By modifying the parser and the heuristics in plist_cmd(..), I appear to have reduced every measured figure well below its original value, except for plist_cmd(..) itself, which I explain below. The one drawback is that I appear to have triggered a bug, in either malloc'ing memory, printf/varargs, or passing large amounts of data through pipes, whereby some of my debug messages from obtainbymatch(..) are making it into plist_cmd(..). That accounts for the 3-fold increase in the reported number of plist_cmd(..) iterations. I'm going to try replacing the debug commands with standard print statements wherever possible, then replace all tar commands with libarchive APIs, and see whether the problem goes away.

Notes:
1. This sample is based on x11-libs/atk.
2. This isn't the final set of results.
3. Graphs are coming soon. I need to chart the values in Excel on my work machine and convert them to screenshots when I have a break (probably around noon). I'll repost when they're available.
4. CSV files are available at: http://students.washington.edu/youshi10/posted/atk-results.tgz
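Tim's suggestion of reading +CONTENTS into memory in one go (fopen(), fstat() for the size, one buffer, one read, fclose()) might look roughly like the sketch below. It is a minimal illustration, not code from pkg_install: the function name `slurp_file` is hypothetical, and it assumes a POSIX system (for fileno() and fstat()) and a regular file whose size does not change while it is being read.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: read a whole file (e.g. +CONTENTS) into one
 * heap buffer using fopen(), fstat() for the size, a single malloc()
 * and a single fread(), then fclose().
 * Returns a NUL-terminated buffer the caller must free(), or NULL on
 * any failure. If lenp is non-NULL it receives the byte count.
 */
char *
slurp_file(const char *path, size_t *lenp)
{
	FILE *fp;
	struct stat sb;
	char *buf;
	size_t len;

	if ((fp = fopen(path, "r")) == NULL)
		return (NULL);
	/* Ask the kernel for the size instead of growing a buffer. */
	if (fstat(fileno(fp), &sb) == -1 || sb.st_size < 0) {
		fclose(fp);
		return (NULL);
	}
	len = (size_t)sb.st_size;
	/* One extra byte so the result can also be used as a C string. */
	if ((buf = malloc(len + 1)) == NULL) {
		fclose(fp);
		return (NULL);
	}
	/* Assumes the file did not shrink between fstat() and fread(). */
	if (fread(buf, 1, len, fp) != len) {
		free(buf);
		fclose(fp);
		return (NULL);
	}
	fclose(fp);
	buf[len] = '\0';
	if (lenp != NULL)
		*lenp = len;
	return (buf);
}
```

Returning a single NUL-terminated buffer lets the caller hand the whole file to a parser in one pass rather than issuing many small reads, which is the point of Tim's suggestion.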
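The reusable parser Tim describes, one that takes a block of memory and returns a structure describing the package metadata, could be sketched as follows. This is a hypothetical illustration: `struct pkg_meta` and `parse_contents()` are invented names, not pkg_install's own types, and the sketch only extracts the @name and @cwd arguments, counts every other @-directive, and treats each remaining non-empty line as a file entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical package summary; not pkg_install's own Package type. */
struct pkg_meta {
	char	name[128];	/* argument of the @name directive */
	char	cwd[256];	/* argument of the last @cwd directive */
	size_t	ncmds;		/* lines starting with '@' */
	size_t	nfiles;		/* other non-empty lines (file entries) */
};

/* Copy n bytes of src into dst, truncating to fit and NUL-terminating. */
static void
copy_field(char *dst, size_t dstsz, const char *src, size_t n)
{
	if (n >= dstsz)
		n = dstsz - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

/*
 * Parse a whole +CONTENTS image held in memory in a single pass.
 * buf need not be NUL-terminated; len is its length in bytes.
 */
void
parse_contents(const char *buf, size_t len, struct pkg_meta *meta)
{
	const char *p = buf, *end = buf + len;

	memset(meta, 0, sizeof(*meta));
	while (p < end) {
		/* Find the end of the current line (or of the buffer). */
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		if (eol == NULL)
			eol = end;
		size_t linelen = (size_t)(eol - p);

		if (linelen > 0 && p[0] == '@') {
			/* Directive: split "@keyword argument" at the first space. */
			const char *sp = memchr(p, ' ', linelen);
			size_t kwlen = sp != NULL ? (size_t)(sp - p) : linelen;
			const char *arg = sp != NULL ? sp + 1 : eol;

			meta->ncmds++;
			if (kwlen == 5 && strncmp(p, "@name", 5) == 0)
				copy_field(meta->name, sizeof(meta->name),
				    arg, (size_t)(eol - arg));
			else if (kwlen == 4 && strncmp(p, "@cwd", 4) == 0)
				copy_field(meta->cwd, sizeof(meta->cwd),
				    arg, (size_t)(eol - arg));
		} else if (linelen > 0) {
			meta->nfiles++;
		}
		/* Step past the newline without running off the buffer. */
		p = eol < end ? eol + 1 : end;
	}
}
```

Because the whole file is already in memory, the parser walks it with memchr() and copies only the fields it keeps, so it does no I/O of its own and can be reused anywhere a +CONTENTS image is available as a buffer.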