Date:      Thu, 27 Mar 2014 19:51:14 -0700 (PDT)
From:      Victor Sneider <v.sneider@yahoo.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Issue with vn_open(), help me please
Message-ID:  <1395975074.6795.YahooMailNeo@web122106.mail.ne1.yahoo.com>
In-Reply-To: <20140328005604.GD4730@dft-labs.eu>
References:  <1395966793.20688.YahooMailNeo@web122101.mail.ne1.yahoo.com> <20140328005604.GD4730@dft-labs.eu>

Hi Mateusz,

Thanks for your fast response.

I have recorded what was printed on the console when it crashes, but I have not done a backtrace and dump. I'll do that and post it soon.

I did not have INVARIANTS and WITNESS enabled. Should I enable them?

I have to correct myself: when I load the module, callout() is not used. The file opening and reading are performed without issue. But when I use callout() in the module, I experience the same crash.

Here is the console output when the system crashes:

Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
panic: trap
Uptime: 5m1s
Automatic reboot in 15 seconds - press a key on the console to abort

Thanks.

V.Sneider.


On Thursday, March 27, 2014 8:56:09 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:

On Thu, Mar 27, 2014 at 05:33:13PM -0700, Victor Sneider wrote:
> Hi all,
>
> I used kern_openat()/fget()/fo_read() to open and read a text file inside the kernel.
>
> When I load it as a kernel module, the module works fine and does its job.
>
> When I compile it into the kernel, it crashes in kern_openat(), more precisely in vn_open(). I used callout() to defer reading the file and wait for the rootfs mount to complete. I set the timeout long enough (5 min, for example) but it still crashes.
>
> I googled a lot but have not found any report about this issue. I am not an expert on file reading/writing inside the kernel, but I feel this could be a bug in vn_open().

Can you elaborate on the crash? Backtrace, crashing instruction, dump,
pointer involved in the crash, etc.

Are you running the kernel with INVARIANTS and WITNESS enabled? Does the module work with
these options?
-- 
Mateusz Guzik <mjguzik gmail.com>
Date:      Fri, 28 Mar 2014 10:23:35 +0100
From:      Joar Jegleim <joar.jegleim@gmail.com>
To:        kpneal@pobox.com
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: zfs l2arc warmup
Message-ID:  <CAFfb-hr=wR6nxqL+4tn-y2eQEw4n_g7rZoK9rRLnm_Ldcm1TZQ@mail.gmail.com>
In-Reply-To: <20140328005911.GA30665@neutralgood.org>
References:  <CAFfb-hpi20062+HCrSVhey1hVk9TAcOZAWgHSAP93RSov3sx4A@mail.gmail.com> <CALfReydi_29L5tVe1P-aiFnm_0T4JJt72Z1zKouuj8cjHLKhnw@mail.gmail.com> <CAFfb-hpZos5-d3xo8snU1aVER5u=dSFRx-B-oqjFRTkT83w0Kg@mail.gmail.com> <20140328005911.GA30665@neutralgood.org>

On 28 March 2014 01:59,  <kpneal@pobox.com> wrote:
> On Thu, Mar 27, 2014 at 11:10:48AM +0100, Joar Jegleim wrote:
>> But it's really not a problem for me how long it takes to warm up the
>> l2arc; if it takes a week, that's OK. After all, I don't plan on
>> rebooting this setup very often, plus I have 2 servers, so I have the
>> option of letting a server warm up before I hook it into production
>> again after maintenance / patch upgrades and so on.
>>
>> I'm just curious whether the l2arc will warm up by itself, or if I
>> would have to do a manual rsync to force l2arc warmup.
>
> Have you measured the difference in performance between a cold L2ARC and
> a warm one? Even better, have you measured the performance with a cold
> L2ARC to see if it meets your performance needs?
No I haven't.
I actually started using those 2 SSDs for l2arc the day before I sent
out this mail to the list.
I haven't done this the 'right' way by producing some numbers for
measurement, but I do know that the way this application works today,
it pulls random jpegs from a dataset of about 1.6TB, consisting of
many millions of files (more than 20 million). Today this pool is
served from 20 SATA 7.2K disks, which is about the slowest solution
for random read access.
Based on the huge performance gain SSDs show on paper, and on graphs
published by other people (who have done this more thoroughly than
me), I'm pretty confident that whenever the application requests a
jpeg, serving it from either RAM or SSD would be a substantial
performance gain compared to serving it from the array of 7.2k disks.

>
> If you really do need a huge L2ARC holding most of your data have you
> considered that maybe you are taking the wrong approach to getting
> performance? Consider load balancing across multiple servers, or having
> your application itself spread the loads of pictures across multiple
> servers.
Yes I have :p But again, that would mean I'd have to rewrite the
application, or I would have to have several servers mirrored. There
are application-related problems with mirroring several servers (I'll
skip those details here), but I have thought about serving those
jpegs from, say, 4 backend servers. I really don't think it would
help compared to serving from SSDs; I would need at least 20 disks
per server for there to be any performance gain, and I'd still have
the latency and all the other disadvantages of 7.2k disks.

The next release of the application has actually taken this into
account, and in the future I will be able to spread this over 4
servers, or possibly more backends.
At the moment the cheapest and easiest option would be to simply buy
2 more 480GB SSDs, put them in the server, and make sure as much as
possible of the dataset resides in l2arc.
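For reference, growing the l2arc that way is a one-liner; the pool and device names below are placeholders, and this obviously needs a live ZFS system:

```shell
# Attach two more SSDs as L2ARC cache devices to the pool.
# "tank", "ada2" and "ada3" are placeholder names for illustration.
zpool add tank cache ada2 ada3

# Watch the cache devices fill as the l2arc warms up.
zpool iostat -v tank 5
```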

>
> If a super-huge L2ARC is really needed for the traffic _today_, what about
> when you have more traffic in 3-6-12 months? What about if you increase
> the number of pictures you are randomly choosing from? If your server is
> at the limit of its performance today then pretty soon you will outgrow
> it. Then what?
The server is actually far from any limit; in fact it has so 'little'
to do that I've been a bit stumped trying to figure out why our
frontpage isn't snappier.
And these things will probably be taken care of, again, in the next
release of the application, which will give me control over today's
frontpage mosaic pictures: I can either make sure the frontpage jpegs
stay in arc, or simply serve the frontpage jpegs from varnish.


>
> What happens if your production server fails and your backup server has
> a cold L2ARC? Then what?
Performance would drop, but nothing really serious. Plus I have 2 of
them, and my plan is to make sure the l2arc on the second server is
kept warm.


>
> Having more and more parts in a server also means you have more opportunities
> for a failure, and that means a higher chance of something bringing down
> the entire server. What if one of the SSD in your huge L2ARC fails in a
> way that locks the bus? This is especially important since you indicated
> you are using cheaper SSD for the L2ARC. Fewer parts -> more robust server.
Good point. Again, I have a failover server and a proxy with health
check in front, and actually I have a third 'fall-back' server too for
worst case scenarios.

>
> On the ZIL: the ZIL holds data on synchronous writes. That's it. This is
> usually a fraction of the writes being done except in some circumstances.
> Have you measured to see if, or do you otherwise know for sure, that you
> really do need a ZIL? I suggest not adding a ZIL unless you are certain
> you need it.
Yes, I only recently realized that too, and I'm really not sure if a
ZIL is required.
A small portion of the files (some hundred MBs) is served over NFS
from the same server; if I understand it right, a ZIL will help for
NFS traffic (?), but I'm not sure there's any gain in having a ZIL
today.
On the other hand, a ZIL doesn't have to be big; I could simply buy a
128GB SSD, which is cheap these days.

>
> Oh, and when I need to pull files into memory I usually use something
> like 'find . -type f -exec cat {} \; >/dev/null'. Well, actually, I
> know I have no spaces or special characters in filenames so I really
> do 'find . -type f -print | xargs cat > /dev/null'. This method is
> probably best if you use '-print0' instead plus the correct argument to
> xargs.
Thanks, this really makes sense, and I reckon it would be faster than
an rsync from another server.
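Written out in full, the null-safe variant Kevin mentions looks like this; the dataset path is a placeholder:

```shell
# Warm the cache by reading every regular file in the dataset once.
# -print0 / -0 make the pipeline safe for spaces and special
# characters in filenames; the data is discarded, we only want the
# read traffic.
DATASET=/path/to/dataset   # placeholder path
find "$DATASET" -type f -print0 | xargs -0 cat > /dev/null
```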

>
> --
> Kevin P. Neal                                http://www.pobox.com/~kpn/
>
> "Nonbelievers found it difficult to defend their position in \
>     the presence of a working computer." -- a DEC Jensen paper



-- 
----------------------
Joar Jegleim
Homepage: http://cosmicb.no
Linkedin: http://no.linkedin.com/in/joarjegleim
fb: http://www.facebook.com/joar.jegleim
AKA: CosmicB @Freenode

----------------------


