Date: Thu, 11 Apr 2013 08:40:15 -0600
From: Josh Beard <josh@signalboxes.net>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS + NFS poor performance after restarting from 100 day uptime
Message-ID: <CAHDrHSvA+782vNfPt4iERfCh_C_HnRBpBKGh1rhND9Ea=Lsh2g@mail.gmail.com>
In-Reply-To: <CAHDrHSvpcnnEf5_ys67rF4md7fDdKZ4f+3bDsndy7_hnmofrWg@mail.gmail.com>
References: <CAHDrHSsCunt9eQKjMy9epPBYTmaGs5HNgKV2+UKuW0RQZPpw+A@mail.gmail.com>
 <D763F64A24B54755BBF716E91D646F6A@multiplay.co.uk>
 <CAHDrHSvXCu+v+ps3ctg=T0qtHjKGkXxvnn_EaNrt_eenkJ9dbQ@mail.gmail.com>
 <12CCA57CCC7E4F16A1147F8422F5F151@multiplay.co.uk>
 <CAHDrHSvpcnnEf5_ys67rF4md7fDdKZ4f+3bDsndy7_hnmofrWg@mail.gmail.com>
I wanted to give a follow-up to this in case someone else stumbles upon
this thread through a search. I was wrong about the original (9.1-RC3)
kernel performing better: it exhibited the same behavior under "real
world" conditions. Real world for this server is 100-200 Mac clients
connecting to network homes via NFS.

I haven't completely confirmed anything, but disabling Spotlight
indexing (a Mac client feature) helped *significantly*. It's still
curious that Spotlight indexing was never an issue before the reboot I
mentioned. I'm also unsure why the RAID controller's verifications have
been intermittently slow since that reboot. In any event, based on
various benchmarks, which show the expected performance, I don't think
this is a ZFS or FreeBSD issue.

Thanks.
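For anyone finding this in the archives: we disabled indexing per
volume on the clients with mdutil. This is only a sketch -- the mount
path below is an example, and yours will differ:

    sudo mdutil -i off /Volumes/network_home

Running "mdutil -s /Volumes/network_home" afterward reports the
indexing status for that volume, if you want to confirm it stuck.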
On Fri, Mar 22, 2013 at 2:24 PM, Josh Beard <josh@signalboxes.net> wrote:
>
> On Fri, Mar 22, 2013 at 1:07 PM, Steven Hartland
> <killing@multiplay.co.uk> wrote:
>
>> ----- Original Message ----- From: Josh Beard
>>
>>>>> A snip of gstat:
>>>>>
>>>>> dT: 1.002s  w: 1.000s
>>>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>>> ...
>>>>>     4    160    126   1319   31.3     34    100    0.1  100.3| da1
>>>>>     4    146    110   1289   33.6     36     98    0.1   97.8| da2
>>>>>     4    142    107   1370   36.1     35    101    0.2  101.9| da3
>>>>>     4    121     95   1360   35.6     26     19    0.1   95.9| da4
>>>>>     4    151    117   1409   34.0     34    102    0.1  100.1| da5
>>>>>     4    141    109   1366   35.9     32    101    0.1   97.9| da6
>>>>>     4    136    118   1207   24.6     18     13    0.1   87.0| da7
>>>>>     4    118    102   1278   32.2     16     12    0.1   89.8| da8
>>>>>     4    138    116   1240   33.4     22     55    0.1  100.0| da9
>>>>>     4    133    117   1269   27.8     16     13    0.1   86.5| da10
>>>>>     4    121    102   1302   53.1     19     51    0.1  100.0| da11
>>>>>     4    120     99   1242   40.7     21     51    0.1   99.7| da12
>>>>
>>>> Your ops/s are maxing your disks. You say "only", but ~190 ops/s is
>>>> what HDs will peak at, so whatever your machine is doing is causing
>>>> it to max the available IO for your disks.
>>>>
>>>> If you boot back to your previous kernel, does the problem go away?
>>>>
>>>> If so, you could look at the changes between the two kernel
>>>> revisions for possible causes and, if needed, do a binary chop with
>>>> kernel builds to narrow down the cause.
>>>
>>> Thanks for your response. I booted with the old kernel (9.1-RC3) and
>>> the problem disappeared! We're getting 3x the performance with the
>>> previous kernel compared to the 9.1-RELEASE-p1 kernel:
>>>
>>> Output from gstat:
>>>
>>>     1    362      0      0    0.0    345  20894    9.4   52.9| da1
>>>     1    365      0      0    0.0    348  20893    9.4   54.1| da2
>>>     1    367      0      0    0.0    350  20920    9.3   52.6| da3
>>>     1    362      0      0    0.0    345  21275    9.5   54.1| da4
>>>     1    363      0      0    0.0    346  21250    9.6   54.2| da5
>>>     1    359      0      0    0.0    342  21352    9.5   53.8| da6
>>>     1    347      0      0    0.0    330  20486    9.4   52.3| da7
>>>     1    353      0      0    0.0    336  20689    9.6   52.9| da8
>>>     1    355      0      0    0.0    338  20669    9.5   53.0| da9
>>>     1    357      0      0    0.0    340  20770    9.5   52.5| da10
>>>     1    351      0      0    0.0    334  20641    9.4   53.1| da11
>>>     1    362      0      0    0.0    345  21155    9.6   54.1| da12
>>>
>>> The kernels were compiled identically using GENERIC with no
>>> modifications. I'm no expert, but none of the changes I've seen
>>> looking at the svn commits looks like it would have any impact on
>>> this. Any clues?
>>
>> You're seeing a totally different profile there, Josh: all writes and
>> no reads, whereas before you were seeing mainly reads and some writes.
>>
>> So I would ask whether you're sure you're seeing the same workload, or
>> has something external changed too?
>>
>> Might be worth rebooting back to the new kernel and seeing if you
>> still see the issue ;-)
>>
>> Regards
>> Steve
>
> Steve,
>
> You're absolutely right. I didn't catch that, but the total ops/s is
> reaching quite a bit higher. Things are certainly more responsive than
> they have been, for what it's worth, so it "feels right." I'm also not
> seeing the disks consistently railed at 100% busy like I was before
> under similar testing (that was 50 machines just pushing data with dd).
> I won't be able to get a good comparison until Monday, when our
> students come back (this is a file server for a public school district,
> used for network homes).
>
> Josh
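P.S. For the record, the dd test mentioned above was each lab machine
writing into its NFS home, roughly along these lines (the path and
sizes are only examples):

    dd if=/dev/zero of=/net/homes/testuser/ddtest bs=1m count=1024

while we watched the disks on the server with gstat.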