From: William Bloom
To: freebsd-stable@freebsd.org
Cc: john.beckner@pegs.com, chad@larsons.org
Date: Sun, 9 Dec 2001 15:55:43 -0700
Subject: SMP and Process Priority: named-xfer problem
Message-ID: <20011209155543.G12454@wbloom.pegs.com>

I've just reached a checkpoint on a problem-solving effort with BIND 8.2.5 on a dual-processor Dell PowerEdge 4200 running an SMP FreeBSD kernel built from STABLE sources CVSup'd on 11/7. I'd like to compare notes with folks on the list, and perhaps get some enlightenment on a few unanswered questions about what happened.

As background, this machine runs a secondary nameserver with several hundred zones. The nameserver is not chroot'd, but it is configured to run in a sandbox (a non-superuser account and a non-wheel group). As usual for a sandbox, the entire runtime directory for the nameserver database is owned by the sandbox user/group, as is the directory where the PID file is maintained.

The symptom, briefly, was that whenever the nameserver attempted to transfer a zone from the master, the named-xfer process would become suspended and would eventually time out. The problem was 100% reproducible. None of the non-SMP machines on which we run the -same- FreeBSD installation, with the -same- BIND (built from the ports collection in all cases) and the -same- BIND configuration, had this symptom. The symptom only occurred on the one machine.

After some investigation, we found that a trial transfer of a particular zone could be done successfully from the command line using named-xfer -only- if the caller was the superuser. We were using something like...

    /usr/local/libexec/named-xfer -z abc.com -f db.abc -s 0

...where the final argument, not shown, would be the IP address of a master nameserver. Any attempt to use named-xfer interactively by any user other than the superuser caused the transfer to suspend in precisely the same fashion as seen during nameserver operation on the problem machine. The non-SMP nameserver hosts did not exhibit this behavior.

Even more interesting, we discovered that once a named-xfer process had hung, it would resume and complete the zone transfer normally if it were sent one of a certain set of signals (SIGALRM or SIGHUP) using 'kill'.

Using 'truss' to see how the named-xfer process was hanging, we found that the hang always happened during a system call, but not always the same one. Sometimes it would suspend during a 'connect' call for a socket, sometimes during a 'getpid' call. No matter where it suspended, it would shake loose and complete normally if it were sent an ALRM signal.

The clue was noticing an EPERM error returned by a 'setpriority' call near the top of the truss output. This is a 'silent' error that does not appear on stderr or in the nameserver debug log; it is only seen in a truss session log.
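
As a rough illustration of how such a silent failure could be made visible, here is a minimal sketch of a priority call whose error is reported instead of discarded. The helper name set_priority_checked() is hypothetical and is not part of BIND. The errno reported for an unprivileged attempt to raise priority can differ by system: the truss log above showed EPERM, while Linux, for example, documents EACCES for setpriority().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not from BIND): set the calling process's nice
 * value and report any failure on stderr rather than ignoring it.
 * Returns 0 on success, or the errno value on failure.
 */
int set_priority_checked(int value)
{
    /* who == 0 with PRIO_PROCESS means "the calling process". */
    if (setpriority(PRIO_PROCESS, 0, value) == -1) {
        int saved = errno;  /* save before fprintf can disturb errno */
        fprintf(stderr, "setpriority(%d) failed: %s\n",
                value, strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling set_priority_checked(-20) from a non-superuser process would be expected to print a diagnostic and return a nonzero errno value, which is the kind of message that was missing from stderr and the debug log here.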

This particular 'setpriority' error was absent (meaning that the 'setpriority' completes without error) in the truss output which we captured from a comparative superuser named-xfer session. Checking the named-xfer.c sources, the following is present near the beginning of the named-xfer code path...

    #ifdef RENICE
        nice(-40);  /* this is the recommended procedure to */
        nice(20);   /* reset the priority of the current process */
        nice(0);    /* to "normal" (== 0) - see nice(3) */
    #endif

This code looks quite suspicious in a program that is allowed to run as non-superuser, since BSD -only- permits a negative 'nice' value as an argument if the calling process is owned by the superuser.

The point of the above code is that the current process's 'nice' value is first reduced to the lowest value permitted by 'nice()' (which should be -20), then immediately bumped back to an absolute value of 0 (lowered to -20 and then raised by 20). The final 'nice(0)' call is for upwards compatibility with older versions of 'nice()'. This establishes a nominal baseline scheduling priority for a process that was forked from another process whose priority is unknown. As a footnote, it seems that in the case of FreeBSD it would be much simpler to just make one call to 'setpriority()'.

But executing such code as a non-superuser has an entirely different effect for BSD processes. The first call (nice(-40)) fails with an EPERM error and changes nothing, since only the superuser can lower the 'nice' value. That means that the second call (nice(20)) now has the effect of raising the process's 'nice' to 20 instead of to 0, hence greatly lowering the process's scheduling priority. That's why the problem was only in evidence when we ran named-xfer as a non-superuser.

Experimentally, we inserted an '#undef RENICE' in front of the above code in named-xfer.c, rebuilt/reinstalled the one binary, and now all seems well. But there are unanswered questions that are relevant to STABLE.
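
The arithmetic of the three nice() calls can be checked with a small model. This is only a sketch of the behavior as described above, not the kernel's or libc's code: the names model_nice() and model_renice_sequence() are hypothetical, and the range of -20 to 20 is an assumption taken from the values quoted in this message.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed nice range, taken from the values quoted in the message. */
#define NICE_MIN (-20)
#define NICE_MAX 20

/*
 * Hypothetical model of nice(incr): a negative increment from a
 * non-superuser is refused with EPERM and leaves the value unchanged;
 * otherwise the new value is clamped to [NICE_MIN, NICE_MAX].
 */
int model_nice(int *nice_value, int incr, bool is_superuser)
{
    if (incr < 0 && !is_superuser)
        return EPERM;

    int next = *nice_value + incr;
    if (next < NICE_MIN)
        next = NICE_MIN;
    if (next > NICE_MAX)
        next = NICE_MAX;
    *nice_value = next;
    return 0;
}

/*
 * Replays the RENICE sequence from named-xfer.c against the model,
 * ignoring return values just as the original code does, and returns
 * the resulting nice value.
 */
int model_renice_sequence(int start, bool is_superuser)
{
    int value = start;
    (void)model_nice(&value, -40, is_superuser);
    (void)model_nice(&value, 20, is_superuser);
    (void)model_nice(&value, 0, is_superuser);
    return value;
}
```

Under this model a superuser process ends at 0 whatever its starting value, while a non-superuser process that starts at 0 ends at 20, which matches the difference seen between the superuser and sandbox runs.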

The only impact was on a dual-processor SMP machine, and one that wasn't particularly busy at the time. Non-SMP machines running the same named-xfer binary on the same FreeBSD build never saw a named-xfer suspend.

I'm thinking that the above code is indeed incorrect for a program that may be run by a non-superuser process (as would be the case for a chroot'd or sandbox'd nameserver), and we certainly aren't keen on the idea of making named-xfer root SUID (nor, I suppose, would anyone who bothers to chroot or sandbox a nameserver). However, even though the named-xfer process ends up being deprioritized when executed from a sandbox'd named as described above, we notice that the zone transfer -still- easily completes within the 2-minute timeout period on non-SMP FreeBSD nameservers. Only on an SMP machine does the 'endless suspend' seem to occur.

So the first question that puzzles me: why does the deprioritized named-xfer suspend forever on an SMP FreeBSD host? As a second question, why does such a suspended process then resume without any further hangs after it gets a SIGALRM (does signal response include a priority boost)?

It seems that I've got a set of circumstances that must not be very common, or else more people would have been impacted: a multiprocessor machine running a FreeBSD SMP kernel built from post-4.4 sources on a Dell PowerEdge and executing a nameserver with a lot of slave zones. Are there really not many people doing this? Is there some nuance of kernel configuration that I've overlooked that accounts for this odd SMP behavior? I'm only using the SMP and APIC_IO options on this PowerEdge so far, and we've seen no instability or oddness in the machine in about 4 weeks of operation apart from this one process priority issue.

I've perused the freebsd-stable and freebsd-smp lists and not seen anything that quite sheds light on this.

Bill

--
William Bloom   (602) 906-7525
Pegasus Solutions, Inc.
7500 North Dreamy Draw Drive, Suite 120
Phoenix, Az 85020
http://www.pegs.com