From: krad
To: Stefan Esser
Cc: FreeBSD FS <freebsd-fs@freebsd.org>
Date: Fri, 23 May 2014 08:39:19 +0100
Subject: Re: Turn off RAID read and write caching with ZFS?

I think the general rule is correct in that you should turn off caching,
simply because you have two algorithms trying to be clever and most
probably undermining each other. If you are worried about synchronous
writes, then put some SSDs in and put the ZIL on them; even if they sit
behind the RAID card, just make sure they form a separate volume that is
exported as a LUN. There may be some gains to be made in certain
scenarios with this hardware caching, but that means you are going to
have to test extensively to make sure it works in your case, which may
or may not be worthwhile to you.
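For the testing part, even a crude timing loop tells you a lot. Something
like the following sketch measures average synchronous write latency; the
path, block size and count are all made up, so point it at a file on the
pool under test and run it once per configuration (controller cache on or
off, with and without a separate log device) and compare:

#!/usr/bin/env python3
# Rough sketch for comparing sync-write latency across cache settings.
# Everything here is illustrative: the path is hypothetical, and 4 KiB
# O_SYNC writes are just one convenient stand-in for a sync-heavy load.
import os
import time

PATH = "/tank/synctest.dat"  # hypothetical file on the pool under test
BLOCK = b"\0" * 4096         # one 4 KiB block per synchronous write
COUNT = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
start = time.monotonic()
for _ in range(COUNT):
    os.write(fd, BLOCK)      # O_SYNC: returns only once data is stable
elapsed = time.monotonic() - start
os.close(fd)
os.unlink(PATH)

print("avg sync write latency: %.3f ms" % (elapsed * 1000 / COUNT))

The differences you care about here (battery-backed cache vs. raw disk
vs. SSD log) are usually large enough that even this crude loop will
show them clearly.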
On 22 May 2014 14:36, Stefan Esser wrote:

> On 22.05.2014 14:52, Karl Denninger wrote:
> [...]
> > Modern drives typically try to compensate for their variable
> > geometry through their own read-ahead cache, but the exact details
> > of their algorithms are typically not exposed.
> >
> > What I would love to find is a "buffered" controller that
> > recognizes all of this and works as follows:
> >
> > 1. Writes, when committed, are committed: no return is made until
> > the storage has written the data and claims it is on the disk. If
> > the sector(s) written are in the buffer memory (from a previous
> > read in 2 below), then the write physically alters both the disk
> > AND the buffer.
> >
> > 2. Reads are always one full track in size and go into the buffer
> > memory on an LRU basis. A read for a sector already in the buffer
> > memory results in no physical I/O taking place. The controller
> > does not store sectors per se in the buffer, it stores tracks.
> > This requires that the adapter be able to discern the *actual*
> > underlying geometry of the drive so it knows where track
> > boundaries are. Yes, I know drive caches themselves try to do
> > this, but how well do they manage? Evidence suggests that it's
> > not particularly effective.
>
> In the old days, controllers implemented read-ahead, either under
> control of the host adapter or of the host OS (e.g. based on the
> detection of sequential access patterns).
>
> This changed when large on-drive caches became practical. Drives
> now do aggressive read-ahead caching, but without the penalty this
> had in the old days. I do not know whether this applies to all
> current drives, but since it is old technology, I assume so:
>
> The sector layout is reversed on each track: higher-numbered
> sectors come first. The drive starts reading data into its cache
> as soon as the head receives stable data, and it stops only when
> the whole requested range of sectors has been read.
>
> E.g. if you request sectors 10 to 20, the drive may have the read
> head positioned when sector 30 comes along. Starting at that
> sector, data is read from sectors 30, 29, ..., 10 and stored in
> the drive's cache. Only after sector 10 has been read is the data
> transferred to the requesting host adapter, while the drive seeks
> to the next track to operate on. This scheme offers opportunistic
> read-ahead which does not increase random-access seek times.
>
> The old method required the head to stay on the track for some
> milliseconds to read sectors following the requested block, on the
> vague chance that this data might later be requested.
>
> The new method just starts reading as soon as there is data under
> the read head. This needs more cache on the drive, but it does not
> add latency for read-ahead. The disadvantage is that you never
> know how much read-ahead there will be; it depends on the
> rotational position of the disk when the seek ends. And if the
> first sector read from the track is in the middle of the requested
> range, the drive needs to read the whole track to fulfil the
> request, but that would happen with equal probability with the old
> sector layout as well.
>
> > Without this, read cache is a crapshoot that gets difficult to
> > tune and is very workload-dependent in terms of what delivers the
> > best performance. All you can do is tune (if you're able with a
> > given controller) and test.
>
> The read-ahead of reversed sectors as described above does not have
> any negative side effects. On average, you'll read half a track
> into the drive's cache whenever you request a single sector.
>
> A controller that implements read-ahead does this by increasing
> the amount of data requested from the drive. This leads to a
> higher probability that a full track must be read to satisfy the
> request and will thus increase the latencies observed by the
> application.
>
> Regards, Stefan
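Stefan's "half a track on average" figure is easy to sanity-check with a
toy simulation. The sketch below is purely illustrative: a single made-up
400-sector track, a uniformly random rotational position after the seek,
and exactly the reading behaviour he describes, including his point that
a head landing inside the requested range forces a whole-track read.

#!/usr/bin/env python3
# Toy Monte Carlo of the reverse-sector read-ahead described above. All
# parameters are made up for illustration: one track of SECTORS sectors,
# logical sector numbers laid out in descending physical order, the head
# landing at a uniformly random rotational position after the seek, and
# the drive reading physically forward until every requested sector is
# in its cache.
import random

SECTORS = 400            # sectors per track (hypothetical)
REQ_LO, REQ_HI = 10, 20  # requested logical sectors, as in the example
TRIALS = 100_000

def physical(logical):
    """Physical slot of a logical sector on the reversed track."""
    return SECTORS - 1 - logical

phys_first = physical(REQ_HI)  # physically first sector of the request
phys_last = physical(REQ_LO)   # physically last sector of the request

def sectors_read(head):
    off = (head - phys_first) % SECTORS
    if 0 < off <= phys_last - phys_first:
        # Head landed inside the requested span: the sectors behind it
        # are only reached after a full wrap, so the whole track is read
        # (Stefan's "middle of the requested range" case).
        return SECTORS
    # Otherwise the drive reads forward from 'head' (with wrap-around)
    # through the physically last requested sector.
    return (phys_last - head) % SECTORS + 1

req_len = REQ_HI - REQ_LO + 1
avg = sum(sectors_read(random.randrange(SECTORS))
          for _ in range(TRIALS)) / TRIALS
print(f"request: {req_len} sectors; avg read per request: {avg:.1f} "
      f"({avg - req_len:.1f} sectors of free read-ahead)")

With these assumptions the average comes out at roughly 200 sectors of
read-ahead per request, i.e. about half the track, matching Stefan's
description.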