From: Attila Nagy
Date: Sat, 03 Sep 2005 23:12:11 +0200
To: hackers@FreeBSD.org
Subject: Bind DoS?
Message-ID: <431A11AB.2060008@fsn.hu>

Hello,

I am currently trying to set up two caching nameservers and noticed some interesting behaviour.

The configuration is the following: two FreeBSD/amd64 6-CURRENT machines, each with a single Opteron processor. Bind was compiled from ports, without threading, with gcc34 (also from ports) and -O2 -static.
It runs in a jail, with nothing more than the config and a nearly empty devfs mount.

Machine A has the following simple config:

    options {
        directory "/etc/bind";
        tcp-clients 256;
        recursive-clients 8192;
        max-cache-size 600M;
        minimal-responses yes;
        pid-file "/tmp/named.pid";
        forwarders { MACHINE_B_IP; };
    };

Machine B has the same bind, but runs as an authoritative NS with a wildcard ("joker") record in the . zone:

    * IN TXT "256xA"

(so it answers 256 "A"s for everything).

The test: on machine B I start queryperf this way:

    queryperf -d list -s MACHINE_A_IP

where list contains 9000000 lines of the following form:

    www.RANDOMNUMBER.hu TXT
    [...]

During the test, machine A fills its cache up to about 860 MB. Until then I see this in top:

    CPU states: 27.7% user, 0.0% nice, 58.1% system, 14.2% interrupt, 0.0% idle

and queryperf on machine B receives its answers within the default timeout (5 seconds).

After bind reaches about 860 MB, it starts to eat CPU: there is 100% user and nearly 0% system and interrupt usage. queryperf starts to time out with the following:

    [Timeout] Query timed out: msg id 64837
    Warning: Received a response with an unexpected (maybe timed out) id: 64837

The server effectively dies; it can answer only a very small number of queries, and with very low performance. If I stop queryperf, bind remains in the CPU-eating state:

    76423 bind 1 129 0 861M 862M RUN 8:30 97.71% named

Because the machine has much more RAM, I first tried with 1200M in the config. The server reached its "zombie" state at around 1600 MB of usage and was very unresponsive.

On another (real) server, I noticed similar behaviour this week. Bind started to eat all CPU resources; there were only "recursive quota reached" messages in the logs, but rndc status reported only very low usage (for example 60/1024 on that server).

I can repeat this with and without patch-lib_dns_resolver.c.
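
For anyone who wants to reproduce the test, the query list described above (lines of the form "www.RANDOMNUMBER.hu TXT") could be generated roughly like this. It is only a sketch: the function names, the seed parameter and the range of the random number are assumptions for illustration, not the script actually used.

```python
import random


def query_lines(count, seed=None, max_label=10**9):
    """Yield `count` queryperf input lines of the form 'www.<number>.hu TXT'.

    max_label (the exclusive upper bound of the random number) is an
    assumed value; the original message does not say which range was used.
    """
    rng = random.Random(seed)
    for _ in range(count):
        yield f"www.{rng.randrange(max_label)}.hu TXT"


def write_list(path, count, seed=None):
    """Write `count` query lines to `path`, one query per line."""
    with open(path, "w") as out:
        for line in query_lines(count, seed):
            out.write(line + "\n")
```

Calling write_list("list", 9_000_000) would produce a file of the size used in the test, which can then be passed to queryperf with -d list.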

If I stop the queries, the server starts answering queries again after a few minutes, once it has finished its strange "CPU eating" loop. ktrace says it is doing this many, many times between two successful queries:

    76423 named CALL gettimeofday(0x7fffffffe450,0)
    76423 named RET gettimeofday 0

Any ideas?

Thanks,
-- 
Attila Nagy                       e-mail: Attila.Nagy@fsn.hu
Free Software Network (FSN.HU)    phone @work: +361 371 3536
ISOs: http://www.fsn.hu/?f=download    cell.: +3630 306 6758