Date: Wed, 18 Jan 2006 21:25:01 -0800
From: Maxim Sobolev
Organization: Porta Software Ltd
Reply-To: Maxim.Sobolev@portaone.com
To: "Wojciech A. Koszek"
Cc: freebsd-current@freebsd.org
Subject: Re: [PATCH] Support for large number of md(4) disks
Message-ID: <43CF22AD.3080702@portaone.com>
In-Reply-To: <20060118082804.GC4846@FreeBSD.czest.pl>
References: <20060114223019.GA99634@FreeBSD.czest.pl> <43CC168D.9080708@portaone.com> <20060118082804.GC4846@FreeBSD.czest.pl>
List-Id: Discussions about the use of FreeBSD-current

Wojciech A. Koszek wrote:
> On Mon, Jan 16, 2006 at 01:56:29PM -0800, Maxim Sobolev wrote:
>> Hi,
>
> Hi Maxim,
>
>> IMHO there is a better approach to fetching an unknown amount of data
>> from the kernel using the ioctl(2) facility. The main idea is that you
>> allocate a buffer of a size sufficient in 95% of cases (for md(4) I
>> think 8-16 entries are enough), attach it to a structure which has the
>> size of the buffer as one of its members, and send a pointer to that
>> structure as an argument to ioctl(2).
>>
>> Upon receiving this structure the kernel compares the size of the
>> buffer with the amount of information that it needs to send back. If
>> the buffer is large enough to hold this information, the kernel copies
>> it out and returns the number of entries in the buffer as one of the
>> members of this structure.
>
> I don't like using an array member for holding additional data. We have
> something similar right now with md_pad[0]. I wanted to prevent us from
> doing it once again.
> To do it right, we'd have to add yet another
> structure describing the size of the list with a pointer to the list of
> disks, and another one describing the separate disks... but [1]
>
>> If the buffer size is insufficient, the kernel fills in the desired
>> size of the buffer in the structure members and returns an error code
>> indicating that the provided buffer is insufficient. Upon receiving
>> this error userland increases the buffer size to the size suggested by
>> the kernel (perhaps adding some extra space) and repeats the ioctl(2)
>> call.
>>
>
> I believe both methods are acceptable since we always end up with a
> sysctl(3)-like problem. The solution you've described will give us one
> ioctl() call in the positive case, but are there any other advantages?

Yes, there is a difference. I don't like an approach that tries to win
the race a fixed number of times (5) and then just bails out; an
asymptotic approach is better IMHO. Memory is cheap nowadays, so you
won't have any problem allocating space for many thousands of
configuration entries, even if you really end up using only a few of
them.

As for your assumption that the total number of devices is unlikely to
change quickly, I don't quite agree. A simple script can easily make the
number of md(4) devices go up or down by a few hundred per second, and
your approach will behave erratically in such a case.

-Maxim

> [1] cases in which the total device number will change are as probable
> as using more than 100 md(4) disks ;-) This is why I decided to use a
> simple request for a size followed by a request for the md(4) list.
>
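
A rough sketch of the retry scheme discussed above may make the comparison
easier. It is only an illustration under stated assumptions: struct
md_list_req, fake_md_list_ioctl() (a plain function standing in for the
kernel side of the ioctl(2)), md_list_fetch(), the starting size of 16
entries and the use of ENOSPC as the "buffer too small" error are all
hypothetical and are not taken from the md(4) patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request: a caller-owned buffer plus its size. */
struct md_list_req {
	int	*units;		/* buffer for md(4) unit numbers */
	size_t	 capacity;	/* in: entries the buffer can hold */
	size_t	 count;		/* out: entries written, or entries needed */
};

/* Simulated kernel state; a real handler would walk the md(4) list. */
static int	kern_units[1024];
static size_t	kern_nunits;

/* Stand-in for the kernel side of the ioctl(2); returns 0 or an errno. */
static int
fake_md_list_ioctl(struct md_list_req *req)
{
	if (req->capacity < kern_nunits) {
		/* Too small: report the size needed and fail. */
		req->count = kern_nunits;
		return (ENOSPC);
	}
	if (kern_nunits > 0)
		memcpy(req->units, kern_units,
		    kern_nunits * sizeof(kern_units[0]));
	req->count = kern_nunits;
	return (0);
}

/*
 * Userland side: start with a small buffer and, on ENOSPC, grow it to
 * the size the kernel suggested plus some slack, then retry.
 * Returns a malloc'ed array (length stored in *countp) or NULL.
 */
static int *
md_list_fetch(size_t *countp)
{
	struct md_list_req req = { NULL, 16, 0 };
	int error, *p;

	for (;;) {
		p = realloc(req.units, req.capacity * sizeof(*p));
		if (p == NULL) {
			free(req.units);
			return (NULL);
		}
		req.units = p;
		error = fake_md_list_ioctl(&req);
		if (error == 0)
			break;
		if (error != ENOSPC) {
			free(req.units);
			return (NULL);
		}
		/* Suggested size plus 25% slack for devices added meanwhile. */
		req.capacity = req.count + req.count / 4 + 1;
	}
	assert(req.count <= req.capacity);
	*countp = req.count;
	return (req.units);
}
```

In this sketch the common case costs a single call, and the loop has no
fixed retry limit: each retry uses the size just reported plus slack, so
it only repeats while the list keeps outgrowing the buffer.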