Date:      Mon, 18 Sep 2006 15:29:24 +0200
From:      Thomas Herrlin <junics-fbsdcurrent@atlantis.maniacs.se>
To:        current@freebsd.org
Subject:   Re: zonelimit livelock, some possable workarounds
Message-ID:  <450E9F34.8090202@atlantis.maniacs.se>
In-Reply-To: <20060828165542.GA78024@peter.osted.lan>
References:  <20060828165542.GA78024@peter.osted.lan>

Peter Holm wrote:
> While stress testing GENERIC HEAD from Aug 26 13:08 UTC I ran into
> this livelock:
>
> http://people.freebsd.org/~pho/stress/log/cons206.html
>   
This has sadly been a common occurrence for some of us on live production
systems, although for us it appeared more as an application deadlock, since
the CPU usage didn't go crazy (IIRC).
On a network I help administer, our previous custom daemon, which lacked
good SYN flood protection, triggered this both under our own stress test
suite and once during a DDoS attack.
We are currently running releng_6_1, but it has happened on 5.x as well.

Essentially, our test suite connected a huge number of clients
simultaneously and did not respond to the server's replies until all
clients were connected. According to netstat, this caused a lot of
Recv-Q/Send-Q allocation (especially when running the test locally); the
box probably ran out of network memory, and most network applications went
into the "zoneli" (zonelimit) state according to top.
We won't publish our test suite, as it is essentially a DoS client, but we
can send it to a few officially listed FreeBSD developers and help with
reproducing the problem on request.

The workarounds we are currently trying are as follows:
* Replacing the daemons that don't handle attacks well.
    (Tests with PF connection rate limiting have had some success on a test
system, but it has not been deployed live yet.)
* Raising the mbuf_cluster limit (as shown in vm.zone) indirectly, by
setting maxusers to a very high number.
    (maxusers does not scale automatically beyond a certain value, so we had
to calculate it manually from sysctl vm.zone and the amount of RAM.)
    (We used to change this limit separately, but would rather have all the
limits scale together with maxusers.)
* Taking a fraction of the new mbuf_cluster size in bytes and setting it as
an sbsize limit in login.conf, so that the limit can't be reached easily (or
so we hope).
* Writing a small executable that attempts a local network connection and
can be used with watchdogd's -e option to reboot the box when it fails.


/Thomas Herrlin


