Date: Sun, 14 Sep 2008 15:56:12 +0300
From: Giorgos Keramidas <keramida@freebsd.org>
To: Julian Elischer <julian@elischer.org>
Cc: freebsd-current@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject: Re: panic in rt_check_fib()
Message-ID: <87tzcij383.fsf@kobe.laptop>
References: <87prnjh80z.fsf@kobe.laptop> <alpine.BSF.1.10.0809131105280.55411@fledge.watson.org> <48CC14AD.4090708@elischer.org> <874p4ju8t3.fsf@kobe.laptop> <87zlmbstv1.fsf@kobe.laptop> <48CCAF23.1010605@elischer.org>
On Sat, 13 Sep 2008 23:28:51 -0700, Julian Elischer <julian@elischer.org> wrote:
> To recap on this, I rewrote this function a couple of weeks ago because I
> couldn't keep track of what was going on, and I thought it might
> have some bad edge cases. A couple of days later Giorgos contacted me
> saying that he had a fairly reproducible situation
> where this was triggered and it appeared to be an edge case in
> this function that allowed it to try to lock the same lock twice.
>
> I immediately thought "ah-hah!" I may have a solution to this,
> and gave him a copy of my new function and indeed it DOES fix that
> panic. However, after deleting and recreating interfaces a few hundred
> times without crashing in rt_check_fib() it then fails somewhere else
> (actually it leaks some resources and eventually networking stops).
>
> I'm not convinced that is a problem with the new or old rt_check() but
> it did stop me from just committing the new code.
>
> In rereading the way the function (did and still does) work it
> occurred to me that there was a large flaw in the way it worked.
>
> It dropped the lock on one route while it went off and did something
> else that might block. On returning it blindly re-grabbed that lock,
> completely ignoring the fact that the route might not even be valid any
> more (or any of several other things that may have changed while
> it was away, maybe sleeping).
>
> The code Giorgos is referring to is a patch I suggested to him to
> fix this oversight and not the one that I originally tested and
> had suggested to fix the edge case.
>
> I do however ask that some other people look at this patch!
Exactly. Thanks for summarizing this so well :)
I have started a kernel with your latest patch (from the quoted message
above), and I can't panic my kernel with the script that did it in a
semi-reliable manner before:
% root@kobe:/root# while true ; do \
% sh home.sh > /dev/null 2>&1 ; \
% vmstat -z | sed -n -e 1p -e /rt/p ; \
% sleep 1 ; \
% done
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 19, 77, 43, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 20, 76, 47, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 21, 75, 51, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 23, 73, 55, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 24, 72, 59, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 25, 71, 62, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 26, 70, 65, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 27, 69, 69, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 29, 67, 73, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 30, 66, 76, 0
% ^C
% root@kobe:/root# sh home.sh
rtentries seem to be going up every time I cycle through the script,
which essentially brings down both wireless and wired interfaces and
then brings up the wired interface of my laptop. The core of the script
is currently:
# network interface options
export ifconfig_re0="inet 192.168.1.10/24"
export defaultrouter='192.168.1.1'
echo '## Stopping network interfaces.'
/etc/rc.d/netif stop re0 && ifconfig re0 delete
/etc/rc.d/netif stop iwn0 && ifconfig iwn0 delete
echo '## Bringing up network interface.'
/etc/rc.d/netif start re0
echo "## Reloading firewall rules."
/etc/rc.d/pf reload
# The default route may be pointing to another interface. Find out
# the IP address of the default gateway, delete it and point to the
# default gateway configured as ${defaultrouter}.
if [ -n "${defaultrouter}" ]; then
    echo '## Setting default router.'
    _oldrouter=`netstat -rn | grep default | awk '{print $2}'`
    if [ -n "${_oldrouter}" ]; then
        route delete default "${_oldrouter}"
        unset _oldrouter
    fi
    route add default "$defaultrouter"
fi
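A side note on the default-route lookup above: "grep default" matches every line that contains the word, so it can return more than one gateway when several default routes are listed (for example an IPv4 and an IPv6 one). A minimal sketch of a stricter lookup follows. The helper name first_default_gw is made up for illustration, and it assumes that the destination is in column 1 and the gateway in column 2 of the `netstat -rn` output:

```sh
# Hypothetical helper (not part of home.sh): print the gateway of the
# first route whose destination column is exactly "default".
# Reads `netstat -rn`-style text on stdin.
first_default_gw() {
    awk '$1 == "default" { print $2; exit }'
}

# Possible use on a live system (assumption: column 1 is the
# destination and column 2 is the gateway):
#   _oldrouter=$(netstat -rn -f inet | first_default_gw)
```

Restricting the listing to one address family with "-f inet" keeps IPv6 entries out of the result; whether that is wanted depends on the setup.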
With your version of rt_check_fib() I have no panics so far. This
doesn't mean we don't have a bug elsewhere, or that it will not panic
tomorrow, but it's nice that things seem a bit more stable now. The old
version of rt_check_fib() used to panic about one third of the time I
ran my 'home.sh' script...
Now an interesting question is: Is it `normal' that the USED rtentry
objects keep going up at every interface restart and are (at least at
first glance) not reclaimed as fast as they are acquired?
