From: Tim Kientzle
Date: Sat, 24 Apr 2004 13:49:08 -0700
To: Alfred Perlstein, current@freebsd.org, danfe@nsu.ru, richardcoleman@mindspring.com
Subject: Speeding up bsdtar
Message-ID: <408AD2C4.3030501@freebsd.org>
References: <200404231627.i3NGRcVA096244@repoman.freebsd.org> <20040424085913.GA78817@elvis.mu.org>
In-Reply-To: <20040424085913.GA78817@elvis.mu.org>
List-Id: Discussions about the use of FreeBSD-current

Alfred Perlstein wrote:
> Have you guys thought of using aio or at least another process
> to parallelize IO?

So far, experiments using separate processes have not been
encouraging. Asynchronous I/O, mmap, or threads are all
possibilities that haven't been tried yet.

Alexey Dokuchaev suggested:
> ... non-blocking/async IO would be faster ...

Alfred Perlstein wrote:
> Threads are pretty portable these days, ...

I've considered all of the above, but haven't had time to actually
implement them. Ultimately, it will require implementing and testing
each one to see which approach works best.

If someone has time to give it a try, the coding should be pretty
simple. Here's an outline of what to do:

 * The read/extract side is much easier. Start there. ;-)
 * usr.bin/tar/read.c currently calls archive_read_open_file
   to open the file.
 * archive_read_open_file is just a fairly thin wrapper around
   archive_read_open.

The basic strategy, then, is to use archive_read_open directly,
providing your own open/read/close callback functions instead of
using the simple canned versions that archive_read_open_file
provides.

So, start by copying libarchive/archive_read_open_file.c into
usr.bin/tar/read.c. Rename things and make them static to avoid
clashes with the functions in the library, of course.

Now, try alternatives to open/read/close. Each call to the read
callback has to return a pointer to a "block" and the size of that
block. Note that there are no restrictions on the size of that
block. Among other things, you could try:

 * Setting up a list of block buffers and using async I/O or a
   separate thread to pre-fill them.
 * Playing with block sizes.
 * Using mmap() to return the entire file as one single block.

The hard part is doing all the testing. You need to test performance
under a variety of different circumstances:

 * Reading an archive from a regular file on the same disk that
   you're extracting to.
 * Reading an archive from a regular file on a different disk.
 * Reading from stdin.
 * Reading from tape/floppy/other device.
 * Using no compression, gzip, or bzip2 compression.

Ultimately, we may need different handling for devices (many of
which require using read(2) with fixed block sizes for proper
operation), regular files (where many different strategies could be
tried), and maybe even stdin.

Tim
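
P.S. To make the mmap() idea a bit more concrete, here is a rough,
untuned sketch of what a set of replacement open/read/close callbacks
might look like. Everything named mmap_* below (the struct and the
three functions) is hypothetical and made up for illustration. The
sketch forward-declares struct archive so that it compiles without the
libarchive headers, and it returns plain 0 / -1 where a real version
would include <archive.h> and return ARCHIVE_OK / ARCHIVE_FATAL. It
only makes sense for regular files; devices and stdin would still
need a read(2)-based path.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

struct archive;	/* opaque here; really declared in <archive.h> */

/* Hypothetical state passed to the callbacks as client_data. */
struct mmap_source {
	const char *path;	/* archive file to read */
	int	    fd;		/* descriptor, -1 when closed */
	void	   *map;	/* base of the mapping, NULL if none */
	size_t	    len;	/* length of the mapping in bytes */
	int	    returned;	/* nonzero once the block was handed out */
};

/* Open callback: open the file and map all of it read-only. */
static int
mmap_open_cb(struct archive *a, void *client_data)
{
	struct mmap_source *src = client_data;
	struct stat st;

	(void)a;
	src->map = NULL;
	src->len = 0;
	src->returned = 0;
	src->fd = open(src->path, O_RDONLY);
	if (src->fd < 0 || fstat(src->fd, &st) != 0) {
		perror(src->path);
		return (-1);	/* assumption: ARCHIVE_FATAL in real code */
	}
	if (st.st_size > 0) {	/* mmap() rejects zero-length mappings */
		src->map = mmap(NULL, (size_t)st.st_size, PROT_READ,
		    MAP_PRIVATE, src->fd, 0);
		if (src->map == MAP_FAILED) {
			perror("mmap");
			src->map = NULL;
			return (-1);
		}
		src->len = (size_t)st.st_size;
	}
	return (0);		/* assumption: ARCHIVE_OK in real code */
}

/* Read callback: first call returns the whole file, later calls EOF. */
static ssize_t
mmap_read_cb(struct archive *a, void *client_data, const void **buff)
{
	struct mmap_source *src = client_data;

	(void)a;
	assert(buff != NULL);
	if (src->returned || src->len == 0) {
		*buff = NULL;
		return (0);	/* zero bytes signals end of input */
	}
	src->returned = 1;
	*buff = src->map;
	return ((ssize_t)src->len);
}

/* Close callback: drop the mapping and the descriptor. */
static int
mmap_close_cb(struct archive *a, void *client_data)
{
	struct mmap_source *src = client_data;

	(void)a;
	if (src->map != NULL)
		munmap(src->map, src->len);
	if (src->fd >= 0)
		close(src->fd);
	src->map = NULL;
	src->fd = -1;
	return (0);
}
```

With libarchive linked in, the three functions would be handed to
archive_read_open(a, &src, mmap_open_cb, mmap_read_cb, mmap_close_cb)
in place of the archive_read_open_file call. Whether this actually
beats plain read(2) is exactly the kind of thing that needs measuring.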