Date:      Wed, 3 Sep 2008 11:56:09 -0600 (MDT)
From:      Scott Long <scottl@samsco.org>
To:        Igor Sysoev <is@rambler-co.ru>
Cc:        Kostik Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org, Tor Egge <tegge@freebsd.org>
Subject:   Re: vfs.ffs.rawreadahead
Message-ID:  <20080903114853.Q39726@pooker.samsco.org>
In-Reply-To: <20080903174452.GB73831@rambler-co.ru>
References:  <20080903095352.GA62541@rambler-co.ru> <20080903123955.GE2038@deviant.kiev.zoral.com.ua> <20080903124733.GH62541@rambler-co.ru> <20080903103846.T39726@pooker.samsco.org> <20080903174452.GB73831@rambler-co.ru>

On Wed, 3 Sep 2008, Igor Sysoev wrote:

> On Wed, Sep 03, 2008 at 10:44:46AM -0600, Scott Long wrote:
>
>> On Wed, 3 Sep 2008, Igor Sysoev wrote:
>>> On Wed, Sep 03, 2008 at 03:39:55PM +0300, Kostik Belousov wrote:
>>>
>>>> On Wed, Sep 03, 2008 at 01:53:52PM +0400, Igor Sysoev wrote:
>>>>> Hi,
>>>>>
>>>>> could anyone tell me what vfs.ffs.rawreadahead enables?
>>>>> As I understand it, it's used in the DIRECTIO code that allows data to
>>>>> be read directly into a userland buffer, bypassing the buffer cache.
>>>>> What I cannot understand is where the read-ahead data can be placed.
>>>>
>>>> The operation of the ffs_rawread is more accurately described as
>>>> bypassing the page cache. It creates the physical buffer that maps
>>>> the user pages.
>>>>
>>>> The readahead is performed only when the supplied user memory region
>>>> is bigger than blocksize. In this case, two reads are performed
>>>> simultaneously, with both buffers mapping consecutive blocks from
>>>> user-supplied buffers. The read operation looks like footsteps.
>>>
>>> Nice!
>>>
>>> As I understand, the size limit of one read operation is MAXPHYS, which
>>> is equal to 128K due to the LBA28 ATA limit. On SCSI, SATA, and LBA48 ATA
>>> this limit can be increased. Is it safe?
>>
>> The value of MAXPHYS is unrelated to capabilities or limitations of ATA.
>> It was chosen based on the need to prevent an excessive amount of
>> parallel I/O from exhausting the kernel address space and system memory.
>> In fact, the concern was with SCSI, not with ATA.
>>
>> MAXPHYS can be raised, especially on 64bit platforms, but doing so also
>> bloats the sizes of a few key data structures.  I've been looking at a
>> solution for this, and I'd rather that people keep their MAXPHYS changes
>> confined to their local trees rather than changing FreeBSD unless they
>> also solve the associated side effects.
>
> As I understand, MAXPHYS affects at least the pager_map size: on modern
> machines it's usually 256 * MAXPHYS = 32M, therefore increasing MAXPHYS
> will increase the map too.

This is intended and desirable.
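
The arithmetic quoted above can be sketched in a few lines of C. This is
only an illustration, not code from the FreeBSD tree: the multiplier 256
is taken from the quoted message, and the names PAGER_MAP_BUFS and
pager_map_bytes() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 256 is the multiplier quoted in the message above. */
#define PAGER_MAP_BUFS 256u

/*
 * Bytes of address space needed for PAGER_MAP_BUFS transfers of
 * maxphys bytes each (hypothetical helper, for illustration only).
 */
static size_t
pager_map_bytes(size_t maxphys)
{
	assert(maxphys > 0);
	return ((size_t)PAGER_MAP_BUFS * maxphys);
}
```

With maxphys = 128K this gives 32M, matching the figure above; with
maxphys = 1M the same multiplier gives 256M.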

>
> 128K is probably a good value and I do not suggest increasing it by
> default; I just want to increase MAXPHYS to improve disk throughput
> on some hosts where nginx serves large files (1G+) using DIRECTIO.

I've tested increases up to 1M, and they are all very beneficial, not
only for silly sequential-style benchmarks but also for clustered i/o.
256-512k is the sweet spot, but Windows has set the standard at 1M and
I'd like to have FreeBSD follow suit eventually.

>
> BTW, is it possible to make MAXPHYS a loader tunable?
>

No.  Struct buf is sized based on MAXPHYS, and there's no convenient way
yet to dynamically size that at runtime.
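
To illustrate why a compile-time constant is hard to turn into a tunable,
here is a minimal sketch. It is not the real struct buf: the struct, its
fields, and every SKETCH_* name are hypothetical, and a 4K page size is
assumed. The point is only that an array embedded in a struct must have
its dimension fixed when the kernel is compiled.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u           /* assumed page size */
#define SKETCH_MAXPHYS   (128u * 1024u)  /* the 128K default discussed above */
#define SKETCH_NPAGES    (SKETCH_MAXPHYS / SKETCH_PAGE_SIZE)

static_assert(SKETCH_MAXPHYS % SKETCH_PAGE_SIZE == 0,
    "transfer size must be a whole number of pages");

/* Hypothetical stand-in for a buffer header, not FreeBSD's struct buf. */
struct sketch_buf {
	size_t	resid;                  /* bytes left to transfer */
	void	*pages[SKETCH_NPAGES];  /* dimension fixed at compile time */
};

/* Number of page slots baked into every sketch_buf. */
static size_t
sketch_buf_page_slots(void)
{
	return (sizeof(((struct sketch_buf *)0)->pages) / sizeof(void *));
}
```

Raising SKETCH_MAXPHYS grows every such structure, and a value read at
boot could not change the array size without allocating it separately.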

Scott
