From: "Joshua Coombs"
Delivered-To: freebsd-questions@freebsd.org
Date: Fri, 18 Mar 2005 13:43:49 -0500
Subject: Bind Wierdness

Hello.

While trying to track down periodic RADIUS failures, I discovered that BIND was periodically timing out, and occasionally even responding incorrectly with a failure. We were originally running 9.2.3 built from ports on FreeBSD 4.9p11, with a memory limit of 900M and a maintenance interval of 60 minutes. The failures were 61 minutes apart, like clockwork.

We moved up to 9.3.0, again built from ports, and continued to observe the same problem. I then built from source with threading enabled, with no luck. A quick discussion with the port maintainer pointed out that 9.3.1 would have 'major threading fixes' for FreeBSD, so I waited for it to come out.
Now that it's out, I've built it with threading enabled, and I still have the periodic outages. I currently have the maintenance interval set to 15 minutes, and the problems track that period like clockwork.

At the moment, my primary source of data is my RADIUS server monitoring, as I don't have a direct long-term DNS monitor running yet. I've been testing by running nslookup requests inside while loops from the command line and watching the output. The host system for BIND runs at 9 to 14% CPU load, even during the maintenance windows, so I don't believe the host system is overloaded.

How should I proceed to diagnose and correct this? I've posted to the bind-users list; a few others seem to have noticed similar problems, but no one wants to provide any diagnostic hints there.

Joshua Coombs
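
For reference, the ad hoc nslookup loop mentioned above could be turned into a small timestamped probe, so that the spacing between outages can be compared against the configured maintenance interval. What follows is only a minimal sketch in Python, not a tested monitoring tool. The function names (probe, failure_gaps, monitor) are made up for illustration, and it assumes the machine running it has its system resolver pointed at the BIND server under test, since socket.gethostbyname goes through the system resolver and takes no timeout of its own.

```python
import socket
import time
from typing import Callable, List, Tuple

# One observation: (wall-clock timestamp, lookup succeeded, seconds taken)
Sample = Tuple[float, bool, float]


def probe(name: str,
          resolver: Callable[[str], object] = socket.gethostbyname) -> Tuple[bool, float]:
    """Resolve `name` once and return (succeeded, seconds taken)."""
    start = time.monotonic()
    try:
        resolver(name)
        ok = True
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        ok = False
    return ok, time.monotonic() - start


def failure_gaps(samples: List[Sample]) -> List[float]:
    """Return the seconds between the starts of consecutive failure episodes.

    A failure episode starts at the first failed sample after a success
    (or at the very first sample, if that one failed).
    """
    starts = []
    prev_ok = True
    for ts, ok, _elapsed in samples:
        if not ok and prev_ok:
            starts.append(ts)
        prev_ok = ok
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


def monitor(name: str,
            interval: float,
            count: int,
            resolver: Callable[[str], object] = socket.gethostbyname,
            sleep: Callable[[float], None] = time.sleep,
            clock: Callable[[], float] = time.time) -> List[Sample]:
    """Probe `name` `count` times, `interval` seconds apart, logging each result."""
    samples: List[Sample] = []
    for _ in range(count):
        ts = clock()
        ok, elapsed = probe(name, resolver)
        samples.append((ts, ok, elapsed))
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
        print(f"{stamp} {name} {'ok' if ok else 'FAIL'} {elapsed:.3f}s")
        sleep(interval)
    return samples
```

Something like failure_gaps(monitor("www.example.com", interval=5.0, count=2000)) would then list the gaps between outages in seconds; the hostname and the numbers here are placeholders. Gaps clustering near 900 seconds with a 15 minute maintenance interval would support the correlation described above. Lookups that succeed but are slow show up only in the logged elapsed time, not in the gaps.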