Date:      Thu, 22 Jan 2015 06:26:53 -0500
From:      Randy Stewart <randall@lakerest.net>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>, "M. Warner Losh" <imp@bsdimp.com>
Subject:   Re: svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys
Message-ID:  <04866FE0-43BF-4569-9B67-7ED5F6F4F736@lakerest.net>
In-Reply-To: <54C0B75B.9070305@selasky.org>
References:  <201501151532.t0FFWV2Y037455@svn.freebsd.org> <CAJ-Vmok0GXZoojyi=jE=b5D-d338APztaf3Pw0_AAQ-173XSWw@mail.gmail.com> <54BDD9E1.6090505@selasky.org> <20150120075126.GA42409@kib.kiev.ua> <54BE0AAA.4050104@selasky.org> <20150120090057.GD42409@kib.kiev.ua> <54BE21F0.6010602@selasky.org> <7C692107-51CF-4DFA-BD6C-623D56893150@bsdimp.com> <54C0A352.8090701@selasky.org> <20150122081023.GT42409@kib.kiev.ua> <54C0B75B.9070305@selasky.org>

Hans:

We (Netflix) run 35% of the internet in production with the very
things you identify as having no lock at all. We *do* have some issues we
are looking at, but so far I have *never* connected the dots the way you
were claiming in a way that would cause a crash. I can see where TCP would
do incorrect retransmissions, but I did *not* see a crash. Now granted, my
look at this was quick, but that was due to time constraints and the
holidays. I am going to put myself full-time on this to see if I can
understand how you got both to "there is a panic in tcp" and to "it must
fully be the callout subsystem, thus we need to re-write large parts of it."

You *may* be correct that a re-write is needed; you *may* be completely incorrect.
In either case I plan to dig into this and find out.

R
> On Jan 22, 2015, at 3:39 AM, Hans Petter Selasky <hps@selasky.org> wrote:
>
> On 01/22/15 09:10, Konstantin Belousov wrote:
>> On Thu, Jan 22, 2015 at 08:14:26AM +0100, Hans Petter Selasky wrote:
>>> On 01/22/15 06:26, Warner Losh wrote:
>>>  >
>>>>> The code simply needs an update. It is not broken in any ways - right? If it is not broken, fixing it is not that urgent.
>>>>
>>>> Radically changing the performance characteristics is breaking the code. Performance regression in the TCP stack is urgent to fix.
>>
>>> Not being able to enumerate what all the consumers are that use this and
>>> provide an analysis about why they aren't important to fix is a bug in
>>> your process, and in your interaction with the project. We simply do not
>>> operate that way.
>> Right, I completely agree with this statement.
>>
>>
>>> Hi,
>>>
>>> My plan is to work out a patch for the TCP stack today, which only
>>> change the callout_init() call or its function. This should not need any
>>> particular review. I'll let adrian test and review, because I think he
>>> is closer to me timezone wise and you're standing on my head saying its
>>> urgent. If he is still not happy, I can back my change out. Else it
>>> remains in -current AS-IS.
>> TCP regression was noted, so it is brought in front.  There is nothing
>> else which makes TCP issue different from other (hidden) issues.
>>
>> ===========================
>>> MFC to 10-stable I can delay for sure until
>>> all issues you report to me are fixed.
>> ===========================
>>
>> Sigh, you still do not understand.  It is your duty to identify all pieces
>> which break after your change.  After that, we can argue whether each of
>> them is critical or not to allow the migration. But this must have been
>> done before the KPI change hit the tree.
>>
>
> Hi,
>
> Are you saying that pieces of code that runs completely unlocked using "volatile" as only synchronization mechanism is better than what I would call a temporary and hopefully short TCP stack performance loss?
>
> I don't understand? How frequently do you reboot your boxes? Maybe one every day? And you don't care?
>
> --HPS
>
>=20
>=20

-----
Randall Stewart
randall@lakerest.net






