Date:      Sun, 9 Dec 2001 22:12:41 +0000
From:      Josh Paetzel <friar_josh@webwarrior.net>
To:        William Bloom <william.bloom@pegs.com>
Cc:        freebsd-stable@FreeBSD.ORG, john.beckner@pegs.com, chad@larsons.org
Subject:   Re: SMP and Process Priority: named-xfer problem
Message-ID:  <20011209221241.E562@twincat.vladsempire.net>
In-Reply-To: <20011209155543.G12454@wbloom.pegs.com>; from william.bloom@pegs.com on Sun, Dec 09, 2001 at 03:55:43PM -0700
References:  <20011209155543.G12454@wbloom.pegs.com>

On Sun, Dec 09, 2001 at 03:55:43PM -0700, William Bloom wrote:
> I've just reached a checkpoint on a problem-solving effort with BIND
> 8.2.5 on a dual-processor Dell PowerEdge 4200 running an SMP FreeBSD
> kernel built from STABLE sources CVSup'd on 11/7, and I'd like to
> compare notes with folks on the list and also perhaps get some
> enlightenment over a few unanswered questions about what happened.  As
> background, this machine runs a secondary nameserver with several
> hundred zones.  The nameserver is not chroot'd, but it is configured
> to run in a sandbox (a non-superuser account and a non-wheel group).
> As usual for a sandbox, the entire runtime directory for the
> nameserver database is of course owned by the sandbox user/group, as
> is the directory where the PID file is maintained.
> 
> The symptom, briefly, was that whenever the nameserver attempted to
> transfer a zone from the master, then the named-xfer process would
> become suspended and would eventually timeout.  The problem was 100%
> reproducible.  None of the other non-SMP machines on which we run the
> -same- FreeBSD installation, with the -same- BIND (built from the
> ports collection in all cases), and the -same- BIND configuration had
> this symptom.  The symptom only occurred on the one machine.
> 
> After some investigation, it was found that a trial transfer of a
> particular zone could be done from the command line using named-xfer
> successfully -only- if the caller was the superuser.  We were using
> something like...
> 
>     /usr/local/libexec/named-xfer -z abc.com -f db.abc -s 0 <master>
> 
> ...where '<master>' would be the IP address of a master nameserver.
> Any attempt to use named-xfer interactively by any user other than the
> superuser caused the transfer to suspend in precisely the same fashion
> as seen during nameserver operation on the problem machine.  The
> non-SMP nameserver hosts did not exhibit this behavior.  Even more
> interesting, we discovered that once a named-xfer process had hung,
> then it would resume and complete the zone transfer normally if it
> were sent one of a set of certain signals (SIGALRM or SIGHUP) using
> 'kill'.
> 
> Using 'truss' to see how the named-xfer process was hanging, we found
> that the hang always happened during a system call, but not always the
> same one.  Sometimes it would suspend during a 'connect' call for a
> socket, sometimes it would hang during a 'getpid' call.  No matter
> where it suspended, it would shake loose and complete normally if it
> were sent an ALRM signal.
> 
> The clue was noticing an EPERM error returned by a 'setpriority' call
> near the top of the truss output.  This is a 'silent' error that does
> not appear on stderr or in the nameserver debug log; it is only seen
> in a truss session log.  This particular 'setpriority' error was
> absent (meaning that the 'setpriority' completes without error) in the
> truss output which we captured from a comparative superuser
> named-xfer session.  Checking the named-xfer.c sources, the following
> is present near the beginning of the named-xfer code path...
> 
>     #ifdef RENICE
>     nice(-40);  /* this is the recommended procedure to        */
>     nice(20);   /*   reset the priority of the current process */
>     nice(0);    /*   to "normal" (== 0) - see nice(3)          */   
>     #endif
> 
> This code looks quite suspicious in a program that is allowed to run
> as non-superuser, since BSD -only- permits a negative 'nice' value as
> an argument if the calling process is owned by the superuser.  The
> point of the above code is that the current process's 'nice' value
> will be first reduced to the lowest possible value permitted by
> 'nice()' (which should be -20), then immediately bumped back to an
> absolute value of 0 (lowered to -20 and then raised by 20), and the
> final 'nice(0)' call is for upwards compatibility with older versions
> of 'nice()'.  This establishes a nominal baseline scheduling priority
> for a process that was forked from another process whose priority is
> unknown.  As a footnote, it seems that in the case of FreeBSD it would
> be much simpler to just make one call to 'setpriority()'.
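A minimal sketch of the single-call alternative, assuming the standard setpriority(2) interface and the PRIO_MIN/PRIO_MAX constants from <sys/resource.h>. The helper name reset_nice() is hypothetical and not part of BIND. It does not remove the privilege problem: an unprivileged caller still cannot go below the value it inherited, but a refused call leaves the priority unchanged instead of compounding into nice 20.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: set the calling process's nice value to `target`
 * with one setpriority(2) call, instead of nice(-40); nice(20); nice(0).
 * Returns 0 on success or the errno value on failure.  An unprivileged
 * caller asking for a value below its current one is refused, and its
 * priority is left exactly as it was.
 */
int reset_nice(int target)
{
    assert(target >= PRIO_MIN && target <= PRIO_MAX);
    if (setpriority(PRIO_PROCESS, 0, target) == -1)
        return errno;
    return 0;
}
```

In named-xfer this would amount to calling reset_nice(0) in place of the RENICE block; whether to log a non-zero return is left open here.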
> 
> But executing such code as a non-superuser has an entirely different
> effect for BSD processes.  The first call (nice(-40)) is ignored and
> returns an EPERM error, since only the superuser can lower the 'nice'
> value.  That means that the 2nd 'nice(20)' call now has the effect of
> raising the process's 'nice' to 20 instead of to 0, hence greatly
> lowering the process's scheduling priority.  That's why the problem
> never showed up when we ran named-xfer as the superuser.
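The arithmetic in the two paragraphs above can be checked with a small model. This is a simulation, not the real nice(3): it assumes the BSD range of -20..20 described above, clamps each step to that range, and treats any step that would lower the value as refused (value unchanged) for an unprivileged caller. The names model_nice() and renice_sequence() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NICE_MIN (-20)   /* assumed BSD lower bound */
#define NICE_MAX 20      /* assumed BSD upper bound */

/* Model of one nice(incr) call starting from nice value `cur`. */
static int model_nice(int cur, int incr, bool privileged)
{
    int next;

    assert(cur >= NICE_MIN && cur <= NICE_MAX);
    next = cur + incr;
    if (next < NICE_MIN)
        next = NICE_MIN;
    if (next > NICE_MAX)
        next = NICE_MAX;
    if (next < cur && !privileged)
        return cur;      /* refused: only the superuser may lower it */
    return next;
}

/* The RENICE sequence from named-xfer.c: nice(-40); nice(20); nice(0). */
int renice_sequence(int start, bool privileged)
{
    int v = model_nice(start, -40, privileged);
    v = model_nice(v, 20, privileged);
    v = model_nice(v, 0, privileged);
    return v;
}
```

With this model, renice_sequence(0, true) yields 0, while renice_sequence(0, false) yields 20, which matches the deprioritized named-xfer described above.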
> 
> Experimentally, we inserted an '#undef RENICE' in front of the above
> code in named-xfer.c, rebuilt/reinstalled the one binary, and now all
> seems well.
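Rather than compiling the block out with '#undef RENICE', one could keep the reset for the superuser and simply stop when the first call is refused. A sketch, assuming the POSIX convention that nice(3) may legitimately return -1, so errno has to be cleared before each call to tell a real failure apart. The function name checked_renice() is hypothetical, and the trailing nice(0) compatibility call is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical variant of the RENICE block in named-xfer.c.
 * Returns 0 if the priority was reset to 0, or the errno value of the
 * refused call.  If nice(-40) fails (an unprivileged caller), the
 * sequence stops there, so the process keeps its inherited priority
 * instead of being pushed up to nice 20.
 */
int checked_renice(void)
{
    int v;

    errno = 0;
    if (nice(-40) == -1 && errno != 0)
        return errno;               /* not allowed to lower: leave it alone */

    errno = 0;
    v = nice(20);
    if (v == -1 && errno != 0)
        return errno;

    /* The -40 step is clamped to -20, so -20 + 20 leaves the value at 0;
       old BSD nice() also returned 0 on success, so v is 0 either way. */
    assert(v == 0);
    return 0;
}
```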
> 
> But there are unanswered questions that are relevant to STABLE.  The
> only impact was on a dual-processor SMP machine, and one that wasn't
> particularly busy at the time.  Non-SMP machines running the same
> named-xfer binary on the same FreeBSD build never saw a named-xfer
> suspend. I'm thinking that the above code is indeed incorrect for a
> program that may be run by a non-superuser process (as would be the
> case for a chroot'd or sandbox'd nameserver), and we certainly aren't
> keen on the idea of making named-xfer setuid root (nor, I suppose,
> would anyone who bothers to chroot or sandbox a nameserver).  However,
> even though the named-xfer process ends up being deprioritized when
> executed from a sandbox'd named as described above, we notice that the
> zone transfer -still- easily completes within the 2 minute timeout
> period on non-SMP FreeBSD nameservers.  Only on an SMP machine does
> the 'endless suspend' seem to occur.
> 
> So the first question that puzzles me is: why does the deprioritized
> named-xfer suspend forever on an SMP FreeBSD host?  As a second
> question, why does such a suspended process then resume without any
> further hangs after it gets a SIGALRM (does signal response include a
> priority boost)?
> 
> It seems that I've got a set of circumstances that must not be very
> common or else there would have been more people impacted: a
> multiprocessor machine running a FreeBSD SMP kernel built from
> post-4.4 sources on a Dell PowerEdge and executing a nameserver with a
> lot of slave zones.  Are there really not many people doing this?  Is
> there some nuance of kernel configuration that I've overlooked that
> accounts for this odd SMP behavior?  I'm only using the SMP and APIC_IO
> options on this PowerEdge so far, and we've seen no instability or
> oddness in the machine in about 4 weeks of operation apart from this
> one process priority issue.
> 
> I've perused the freebsd-stable and freebsd-smp lists and not seen
> anything that quite sheds light on this.

At one point a friend of mine had a twin-processor Dell PowerEdge
2450; I believe it was shortly after 4.0-RELEASE came out.  There was
no SMP support for that machine at the time, and it was introduced
shortly thereafter.  My point is that SMP support for those machines
hasn't existed for all that long.

Your analysis of the code does seem correct; I can't see a reason to
reset the nice value 3 times on a BSD system.  I have several SMP
machines available, and while some of them run BIND, none of them has
a large number of zones (none of them are Dell PowerEdge boxes,
either).  I'll load a ton of zones and see what happens on transfers.
If I do get the hangs, then maybe I can test your patch as well.  I'll
probably take a shot at this Monday or Tuesday.

Josh
 




