Date:      Fri, 29 Oct 1999 10:53:29 +0100 (BST)
From:      Doug Rabson <dfr@nlsystems.com>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        freebsd-hackers@freebsd.org, freebsd-alpha@freebsd.org
Subject:   Re: ip forwarding broken on alpha
Message-ID:  <Pine.BSF.4.10.9910291027070.331-100000@salmon.nlsystems.com>
In-Reply-To: <14360.62787.116526.830259@grasshopper.cs.duke.edu>

On Thu, 28 Oct 1999, Andrew Gallatin wrote:

> 
> Andrew Gallatin writes:
>  > 
>  > I have an older AlphaStation 600 5/266 running -current (cvsupped
>  > last week) which is set up as a router between two 100Mb networks.  When
>  > the machine is pushed fairly hard (like running a netperf -tUDP_STREAM
>  > -- -m 100 across the router, eg about 10-20k 100byte packets/sec ) the
>  > alpha falls over almost instantly.  I have not enabled any NAT or
>  > firewall functionality, just ip forwarding.
> 
> <...>
> 
>  > 
>  > This might be a red herring, but I've found that if I run the entire
>  > ip_input path under splnet() (added splnet() around the call to
>  > ip_input() in ipintr().), things get a hell of a lot more stable.
>  > Rather than crashing in a few seconds, it sometimes takes minutes.
>  > And rather than an illegal access, I tend to run out of kernel stack
>  > space ( either a panic("possible stack overflow\n"); in
>  > alpha/alpha/interrupt.c, or I end up in the SRM console after calling
>  > halt from a PC which isn't in the kernel, which smells like an overrun
>  > stack to me).  I'm not sure if this is related, or if it is a separate
>  > problem entirely.
> 
> That was it.
> 
> The problem is that the interrupt handler returns through
> exception_return, like the trap handler does.  Exception_return checks
> to see if the last ipl the system was at was 0.  If it was, it
> eventually lowers the ipl to zero and checks for a pending ast.  This
> was the problem.  If you're getting interrupts quickly enough, there's
> a large window when you're still running on the interrupt stack where
> you're sitting at ipl0 and you can get another interrupt & build onto
> that stack.  If you're getting 40,000 interrupts per second
> (forwarding 20,000 packets/sec), this can build up & rapidly run you
> out of stack space.
> 
> I've found the system can forward 70,000 packets per second & remain
> perfectly stable with the appended patch.  I'm not terribly good at
> assembler, so rather than try to be tricky & check to see if the
> current ipl is >= 4 (handling a device interrupt), I simply copied 
> exception_return & skipped the ipl lowering & the check for an ast
> since I don't think you're ever going to need to check for an ast
> after an interrupt.  
> 
> I have NFC why mclfree was getting trashed, but it must have been
> caused by running out of stack space as the appended patch seems to
> take care of everything.
> 
> Doug -- should I commit this as-is, or do you want to take a more
> refined approach?

I think the intention of ASTs is that they are generated whenever you are
returning to user mode. This patch will essentially defer the AST until
the next system call, which might be unacceptable.
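
To make the deferral concrete, here is a toy model in C. It is only a
sketch, not kernel code: the function name ast_delivery_index and the
event-string encoding are made up for illustration. Each character is one
return to user mode, 'I' for a return from an interrupt and 'S' for a
return from a system call or trap, and the AST is assumed to be posted
during event number posted_at.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not FreeBSD code).
 * events: string of 'I' (return from interrupt) and 'S' (return from
 *         system call or trap), in the order they happen.
 * posted_at: index of the event during which the AST is posted.
 * check_on_interrupt_return: nonzero if the interrupt return path
 *         still checks for a pending AST.
 * Returns the index of the event at which the AST is delivered,
 * or -1 if it is never delivered within the given events.
 */
int
ast_delivery_index(const char *events, int posted_at,
    int check_on_interrupt_return)
{
	assert(events != NULL && posted_at >= 0);

	for (int i = 0; events[i] != '\0'; i++) {
		if (i < posted_at)
			continue;	/* AST not posted yet */
		if (events[i] == 'S')
			return (i);	/* syscall/trap return always checks */
		if (events[i] == 'I' && check_on_interrupt_return)
			return (i);	/* interrupt return checks too */
	}
	return (-1);
}
```

With the events "IIISII" and an AST posted during event 0, the model
delivers it at event 0 when the interrupt return path checks for ASTs and
at event 3 when it does not. A process that takes interrupts but never
enters the kernel voluntarily ("III") never sees the AST in this model.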

I can see the window and it's a serious problem, but I'm worried about
fixing it in this way. What I really want is some way to generate a 'real'
AST after the PALcode has dropped the exception frame for the interrupt.
Without changing to use the VMS PALcode, we aren't going to get that
though :-). (ASTs and SWIs are derived from the way VAXen work, and the VMS
PALcode emulates the old VAX behaviour.)

The main problem as I see it is that we are dropping the IPL to zero
before calling the AST. I don't see why we are doing this at all. We
should be able to just call the AST without changing the IPL at all. This
still leaves a window in do_sir (which lowers the IPL to 1) though.
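
The window can be illustrated with a small simulation in C. This is a
sketch under stated assumptions, not a model of the real exception_return
code: time is counted in abstract ticks, the device raises one interrupt
every period ticks, each handler runs for handler_ticks at raised IPL and
then spends window_ticks at IPL 0 while still on the interrupt stack. The
names max_nesting_depth, MAX_DEPTH and struct frame are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 1024		/* arbitrary cap on simulated nesting */

struct frame {
	int handler_left;	/* ticks left at raised IPL */
	int window_left;	/* ticks left at IPL 0 on the interrupt stack */
};

/* Returns the deepest nesting of interrupt frames seen in the run. */
int
max_nesting_depth(int n_interrupts, int period, int handler_ticks,
    int window_ticks)
{
	struct frame stack[MAX_DEPTH];
	int depth = 0, max_depth = 0;
	int raised = 0, pending = 0, delivered = 0;

	assert(period > 0 && handler_ticks > 0 && window_ticks >= 0);

	for (long t = 0; delivered < n_interrupts || depth > 0; t++) {
		/* The device raises an interrupt every 'period' ticks. */
		if (t % period == 0 && raised < n_interrupts) {
			raised++;
			pending++;
		}

		/* Delivery needs IPL 0: idle, or top frame in its window. */
		int at_ipl0 = depth == 0 ||
		    stack[depth - 1].handler_left == 0;
		if (pending > 0 && at_ipl0 && depth < MAX_DEPTH) {
			stack[depth].handler_left = handler_ticks;
			stack[depth].window_left = window_ticks;
			depth++;
			pending--;
			delivered++;
			if (depth > max_depth)
				max_depth = depth;
		}

		/* The top frame does one tick of work. */
		if (depth > 0) {
			struct frame *f = &stack[depth - 1];
			if (f->handler_left > 0)
				f->handler_left--;
			else if (f->window_left > 0)
				f->window_left--;
			/* Pop every frame that has nothing left to do. */
			while (depth > 0 &&
			    stack[depth - 1].handler_left == 0 &&
			    stack[depth - 1].window_left == 0)
				depth--;
		}
	}
	return (max_depth);
}
```

With 100 interrupts, period 3, handler_ticks 2 and window_ticks 3, every
new interrupt lands inside the previous frame's IPL 0 window and the
nesting reaches 100 frames. With window_ticks 0 (the IPL is never lowered
on the interrupt stack) the same load never nests deeper than 1. The tick
values are arbitrary; only the ratio of the window to the arrival period
matters in this model.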

Perhaps SWIs should be handled by using another kernel thread which can
be switched to instead of calling do_sir. I have to think about that some
more. Could you test just removing the swpipl(0) code and see if it
improves things? Thanks.

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 442 9037



