From: "Jin Guojun [VFFS]"
Date: Thu, 08 Dec 2005 11:04:52 -0800
To: Imri Zvik
Cc: freebsd-performance@freebsd.org
Subject: Re: very busy syslog server
Message-ID: <439883D4.3090503@lbl.gov>

Clear enough. em(4) should be able to handle this amount of traffic
without polling, unless all the syslog traffic arrives at the same time
and causes resource congestion. That is why I want you to run the script
and watch the CPU utilization when the drops happen. The average CPU
usage does NOT reflect short spikes. If CPU utilization is lower than
60%, there is no need to worry about interrupts, recvspace, etc.,
because the CPU will have enough time to move data in and out.
If you see CPU utilization over 60% and interrupt time is also over
60%-80%, then interrupt coalescing or polling needs to be considered.

At the moment, only one place, with three conditions, can cause such a
drop (6.0-RELEASE): see the function sbappendaddr_locked() in
kern/uipc_socket2.c, lines 934-942. The three conditions involve
recvspace, asa->sa_len, and the number of mbufs. I doubt recvspace is
the problem, since the sending size (maxdgram) is much smaller than
recvspace. sa_len should not be the cause unless there is a bug in 6.0.
The last thing you can check is the mbufs: run "netstat -m" to see the
mbuf statistics when the drops happen.

Since you have plenty of spare CPU time, try running the script I
mentioned and add "netstat -m" to the action taken when the drop count
increases. This should be a few minutes of programming work; then run it
for hours or a day. If you can get that information, we may learn what
is going on.

We may have a bug, since I just rebooted my 6.0 box and already see a
number of UDP drops; see below.

Belkin: netstat -p udp -s
udp:
        148 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        0 with no checksum
        63 dropped due to no socket
        20 broadcast/multicast datagrams dropped due to no socket
        0 dropped due to full socket buffers
        0 not for hashed pcb
        65 delivered
        68 datagrams output

Belkin: netstat -p udp -s
udp:
        175 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        0 with no checksum
        69 dropped due to no socket
        35 broadcast/multicast datagrams dropped due to no socket
        0 dropped due to full socket buffers
        0 not for hashed pcb
        71 delivered
        74 datagrams output

Imri Zvik wrote:

>Hi,
>
>1. The NIC being used is "Intel(R) PRO/1000" (the em(4) driver).
>2. The CPU utilization on average is between 15% and 20%.
>3. This machine is being used _only_ for the syslogging - the database resides on another server.
>
>Meanwhile, I have added some more memory to the machine, and it now has 3GB of RAM, but I am still seeing packets being dropped due to full socket buffers.
>
>Thanks,
>
>--
>Imri Zvik
>PGP (2.6.3ia) Public Key: http://mariska.inter.net.il/~imriz/imriz.pgp
>
>________________________________________
>From: Jin Guojun [mailto:jinmtb@sbcglobal.net]
>Sent: Wednesday, December 07, 2005 9:56 PM
>To: Sean Chittenden
>Cc: Imri Zvik; freebsd-performance@freebsd.org
>Subject: Re: very busy syslog server
>
>Sean Chittenden wrote:
>I'm trying to set up a syslog server to serve a large group of
>servers. For the syslog daemon, I have chosen rsyslogd, and the
>backend is mysql (on a different machine).
>
>The machine has 2 Intel Xeon 2.80GHz CPUs and 1GB of RAM, and it is
>running FreeBSD 6 (6.0-STABLE).
>
>The problem is that I see a lot of UDP packets being dropped:
>
>udp:
>        390202 datagrams received
>        0 with incomplete header
>        0 with bad data length field
>        0 with bad checksum
>        6 with no checksum
>        0 dropped due to no socket
>        0 broadcast/multicast datagrams dropped due to no socket
>->>>    123677 dropped due to full socket buffers
>        0 not for hashed pcb
>        266525 delivered
>        133260 datagrams output
>
>I have tried to increase net.inet.udp.recvspace, but it didn't solve
>the problem.
>
>I would appreciate any hints or tips.
>
>
>When you're doing a large number of packets per second, you may want
>to look into enabling device polling(4). Right now, every packet
>results in an interrupt. With device polling, you can handle more
>than one packet per interrupt. See the man page for details.
>
>Not quite; the interrupt interval depends on the device driver, that is, on which NIC is used.
>A number of NICs are able to do interrupt coalescing, which requires increasing the buffer
>descriptor ring size (just for the receive buffer descriptors). Of course, polling is a simple thing
>to try.
>
>Before we can come up with a better solution for this case, you also need to
>monitor a few things:
>
>What is the NIC on this machine?
>
>What is the CPU utilization on average, and at the moments when packets are dropped? You can simply write a
>script to do this instead of instrumenting the kernel (since this does not need to be super accurate):
>
>Run vmstat to record CPU utilization every 1 to 3 seconds, for use when the following event happens:
>use netstat to watch UDP and pipe it to awk, "netstat -p udp -s | awk '/full socket buffers/ {print $1; exit}'",
>every 3-5 seconds, and compare the result with the previous one to see if it has changed. If so,
>grep the last couple of lines from the recorded vmstat output.
>
>From your information, it seems that this machine has enough memory bandwidth for syslog needs,
>but it is not clear whether this machine is for the rsyslog daemon, the SQL server, or both on the same machine.
>If the third case is true, then you may run out of memory bandwidth. Under this circumstance,
>you need to obtain the packet rate and the average packet size in order to determine the I/O
>and memory bandwidth requirements.
>
>        -Jin Guojun
>
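
The monitoring loop discussed in this thread (poll the UDP drop counter
and, when it rises, save the recent vmstat samples plus "netstat -m")
could be sketched roughly as follows. This is only an illustration, not
the script referred to above: it assumes Python 3.7 or later is
available, and the names full_buffer_drops, run, monitor, and the log
file udp_drops.log are made up for this example. The vmstat flags
(-c count, -w wait) are assumed to behave as on FreeBSD.

```python
import re
import subprocess
import time
from collections import deque

# Matches the counter line printed by "netstat -p udp -s".
FULL_BUF_RE = re.compile(
    r"^\s*(\d+) dropped due to full socket buffers", re.MULTILINE
)


def full_buffer_drops(netstat_text):
    """Return the 'dropped due to full socket buffers' count, or None if absent."""
    match = FULL_BUF_RE.search(netstat_text)
    return int(match.group(1)) if match else None


def run(cmd):
    """Run a command and return its stdout as text (empty string if it cannot start)."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except OSError:
        return ""


def monitor(interval=3.0, keep=5, log_path="udp_drops.log"):
    """Poll the drop counter; on an increase, log recent vmstat lines and netstat -m.

    log_path is a hypothetical file name chosen for this sketch.
    """
    recent_vmstat = deque(maxlen=keep)  # last few CPU samples
    previous = full_buffer_drops(run(["netstat", "-p", "udp", "-s"]))
    while True:
        # Two samples one second apart; the second reflects current load.
        lines = run(["vmstat", "-c", "2", "-w", "1"]).splitlines()
        if lines:
            recent_vmstat.append(lines[-1])
        current = full_buffer_drops(run(["netstat", "-p", "udp", "-s"]))
        if previous is not None and current is not None and current > previous:
            with open(log_path, "a") as log:
                log.write(f"{time.ctime()}: drops {previous} -> {current}\n")
                log.write("\n".join(recent_vmstat) + "\n")
                log.write(run(["netstat", "-m"]) + "\n")
        if current is not None:
            previous = current
        time.sleep(interval)
```

Calling monitor() starts the loop; it runs until interrupted. Each pass
takes roughly one second longer than the interval because of the vmstat
sample, which is accurate enough for correlating drops with CPU load.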