From: Robert Watson
Date: Thu, 3 Aug 2006 10:43:21 +0100 (BST)
To: Helge Oldach
Cc: scottl@samsco.org, src-committers@FreeBSD.org, Hajimu UMEMOTO,
    cvs-src@FreeBSD.org, cvs-all@FreeBSD.org, bz@FreeBSD.org,
    kensmith@cse.Buffalo.EDU
Subject: Re: cvs commit: src/sys/sys param.h src/include Makefile netdb.h
    res_update.h resolv.h src/include/arpa inet.h nameser.h nameser
In-Reply-To: <200608030536.k735aIT3081092@sep.oldach.net>
Message-ID: <20060803104026.A45647@fledge.watson.org>

On Thu, 3 Aug 2006, Helge Oldach wrote:

> Well... I've spotted a regression not with the ports tree but with 6-STABLE.
> On several boxes with this change applied I see lots of sendmails stacking
> up over time, for example:
>
>   713  ??  Ss   0:01.05 sendmail: accepting connections (sendmail)
>   717  ??  Is   0:00.02 sendmail: Queue runner@00:30:00 for /var/spool/client
> 31747  ??  I    0:00.00 sendmail: startup with 71.119.31.81 (sendmail)
> 32834  ??  I    0:00.00 sendmail: startup with 83.36.190.38 (sendmail)
> 33569  ??  I    0:00.00 sendmail: startup with 221.206.76.60 (sendmail)
> 34023  ??  I    0:00.00 sendmail: startup with 49.195.192.61.tokyo.flets.alph
> 34459  ??  I    0:00.00 sendmail: startup with 221.165.35.46 (sendmail)
> 36517  ??  I    0:00.00 sendmail: startup with 61.192.180.137 (sendmail)
> 38722  ??  I    0:00.00 sendmail: startup with 203.177.238.78 (sendmail)
> 39126  ??  I    0:00.00 sendmail: startup with 222.90.251.185 (sendmail)
> 39203  ??  I    0:00.00 sendmail: startup with 221.9.214.183 (sendmail)
> 39859  ??  I    0:00.00 sendmail: startup with 59.20.101.111 (sendmail)
> 41090  ??  I    0:00.00 sendmail: startup with 61.192.166.235 (sendmail)
> 41766  ??  I    0:00.00 sendmail: startup with 68.118.52.132 (sendmail)
> 42482  ??  I    0:00.00 sendmail: startup with 219.249.201.36 (sendmail)
> 42483  ??  I    0:00.00 sendmail: startup with 219.249.201.36 (sendmail)
> 43467  ??  I    0:00.00 sendmail: startup with 210.213.191.70 (sendmail)
> 43757  ??  I    0:00.00 sendmail: startup with 220.189.144.7 (sendmail)
> 44176  ??  I    0:00.00 sendmail: startup with 71.205.226.98 (sendmail)
> 44850  ??  I    0:00.00 sendmail: startup with 72.89.135.133 (sendmail)
> 44943  ??  I    0:00.00 sendmail: startup with 220.167.134.212 (sendmail)
> 48031  ??  I    0:00.00 sendmail: startup with 60.22.198.23 (sendmail)
>
> On one busy sendmail box I've seen literally thousands of such processes.
> Note that these processes don't disappear, so it is not related to
> sendmail.cf's timeouts.
>
> Browsing through the recent STABLE commits, I firstly thought it was related
> to the recent socket code changes, but no, it's not. It is definitely this
> introduction of BIND9's resolver. If I back out this change, all is fine
> again.
>
> As said, this is a very recent 6-STABLE. I'm tracking CTM, not cvs.
>
> I would seriously suggest to more thoroughly test this.
> I'm not asking to back it out right now, but this is definitely a breakage
> in 6-STABLE that should be fixed before 6.2.

I've had a similar report from Bjoern Zeeb. At first we thought the TCP
connections stacking up on his machine were caused by a bug I introduced in
7.x, but it turns out that his sshd is wedging in name resolution and not
closing its TCP sockets (which are now visible in netstat in a way they
weren't before). We only concluded a day or so ago that it was not a kernel
socket bug, so I'm not sure he's had a chance to file a resolver bug report
yet.

He reported that the application appeared to have two connected UDP sockets
for name resolution and one bad name server entry, but that the resolver
appeared to be blocked in a read on the UDP socket that had no data queued,
rather than on the one that did. This was all inferred from netstat output,
and as far as I know he hasn't yet dug into the resolver to see what might be
happening.

I've CC'd Bjoern in case he has further insight or can suggest what might be
going on.

Robert N M Watson
Computer Laboratory
University of Cambridge
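The symptom Bjoern saw (a process blocked reading the one UDP socket with nothing queued while a reply sat on the other) is consistent with a blocking recv() on a single descriptor. As a rough illustration only, and not the BIND 9 or libc resolver code, a resolver juggling one socket per name server would normally wait on all of them at once with a bounded timeout. The names wait_any_readable(), open_loopback_udp() and MAX_SERVERS are hypothetical.

```c
/*
 * Hypothetical sketch (not the actual resolver code): wait for a reply on
 * ANY of several UDP sockets, one per name server, with a bounded timeout,
 * instead of blocking in recv() on a socket that may never get data.
 */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_SERVERS 8 /* assumption: small fixed upper bound on servers */

/*
 * Returns the index of the first socket with a datagram queued,
 * -1 if timeout_ms elapses with nothing readable, or -2 on error.
 */
static int wait_any_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfd[MAX_SERVERS];
    int i, n;

    /* A negative timeout means "wait forever" to poll(); refuse it. */
    if (nfds <= 0 || nfds > MAX_SERVERS || timeout_ms < 0)
        return -2;
    for (i = 0; i < nfds; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    n = poll(pfd, (nfds_t)nfds, timeout_ms);
    if (n < 0)
        return -2; /* e.g. EINTR; the caller decides whether to retry */
    if (n == 0)
        return -1; /* timed out: try the next server or give up */
    for (i = 0; i < nfds; i++)
        if (pfd[i].revents & POLLIN)
            return i;
    return -2;     /* only error or hangup conditions were reported */
}

/*
 * Hypothetical test helper: open a UDP socket bound to 127.0.0.1 on an
 * ephemeral port and store the bound address in *addr.
 */
static int open_loopback_udp(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* port 0: let the kernel pick one */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The negative timeout is rejected on purpose, because poll() would otherwise wait indefinitely, which is the same kind of wedge seen in the sendmail and sshd reports. Whether the new resolver actually behaves differently from this pattern still needs to be checked against its source.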