Date:      Sat, 2 Nov 2002 11:27:12 +0100 (CET)
From:      Michal Mertl <mime@traveller.cz>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Bill Fenner <fenner@research.att.com>, <current@freebsd.org>
Subject:   Re: crash with network load (in tcp syncache ?)
Message-ID:  <Pine.BSF.4.41.0211020937210.87031-100000@prg.traveller.cz>
In-Reply-To: <3DC32598.A0D0909A@mindspring.com>

On Fri, 1 Nov 2002, Terry Lambert wrote:

> Bill Fenner wrote:
> > >I think this can still crash (just like my patch); the problem is in
> > >what happens when it fails to allocate memory.  Unless you set one of
> > >the flags, it's still going to panic in the same place, I think, when
> > >you run out of memory.
> >
> > No.  The flags are only checked when so_head is not NULL.  sonewconn()
> > was handing sofree() an inconsistent struct so (so_head was set without
> > being on either queue), i.e. sonewconn() was creating an invalid data
> > structure.
>
> You're right... I missed that; I was thinking too hard on the other
> situations (e.g. soabort()) that could trigger that code, and not
> enough on the code itself.
>
> > The call in sonewconn() used to be to sodealloc(), which didn't care
> > about whether or not the data structure was self-consistent.  The code
> > was refactored to do reference counting, but the fact that the socket
> > was inconsistent at that point wasn't noticed until now.
>
> Yeah; I looked at doing a ref() of the thing as a partial fix,
> but the unref() did the sotryfree() anyway.
>
>
> > The problem is not at all based on what happens in the allocation or
> > protocol attach failure cases.  The SYN cache is not involved, this is
> > a bug in sonewconn(), plain and simple.
>
> I still think there is a potential failure case, but the amount of
> code you'd have to read through to follow it is immense.  It has to
> do with the connection completing at NETISR, instead of in a process
> context, in the allocation failure case.  I ran into the same issue
> when trying to run connections to completion up to the accept() at
> interrupt, in the LRP case.  The SYN cache case is very similar, in
> the case of a cookie that hits when there are no resources remaining.
> He might be able to trigger it with his setup, by setting the cache
> size way, way down, and thus relying on cookies, and then flooding it
> with connection requests until he runs it out of resources.

Do I read you correctly that Bill's patch is probably better than yours
(I tested both, both fix the problem)?
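
For anyone following along, here is a toy model of the invariant Bill
describes above. It is only a sketch, not the kernel code: toy_socket,
TOY_INCOMP, TOY_COMP and the toy_* functions are made-up names, and the
"panic" is modelled as a -1 return. The point it illustrates is that the
queue flags only matter once the head pointer is set, so a socket with a
head but on neither queue is the one state the free path cannot handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue-membership flags (illustrative names only). */
enum { TOY_INCOMP = 0x1, TOY_COMP = 0x2 };

struct toy_socket {
    struct toy_socket *head;  /* listening socket, standing in for so_head */
    int state;                /* queue-membership flags */
};

/* Consistent means: head is set exactly when the socket is on a queue. */
static bool toy_consistent(const struct toy_socket *so)
{
    bool queued = (so->state & (TOY_INCOMP | TOY_COMP)) != 0;
    return (so->head != NULL) == queued;
}

/* Put a new socket on the listener's incomplete queue: head and flag
 * are set together, so the socket is never half-linked. */
static void toy_enqueue_incomplete(struct toy_socket *so,
                                   struct toy_socket *listener)
{
    so->head = listener;
    so->state |= TOY_INCOMP;
}

/* Free path: the flags are looked at only when head is non-NULL.
 * Returns 0 on success, -1 where this model would "panic". */
static int toy_sofree(const struct toy_socket *so)
{
    if (so->head != NULL &&
        (so->state & (TOY_INCOMP | TOY_COMP)) == 0)
        return -1;  /* head set, but on neither queue */
    return 0;
}
```

In this model a socket with head == NULL and no flags frees cleanly, which
is why an allocation failure alone does not trip the check; only a socket
whose head was set before it was queued does.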

If you still believe there's a bug I could trigger with some setting,
please tell me. I don't know how to make syncookies kick in. I set
net.inet.tcp.cachelimit to 100, but it doesn't seem to make a difference;
then again, I don't really know what I'm doing :-). I imagine the syncache
doesn't grow much when I'm connecting from a single IP and connections are
established quickly. I'll be able to run some tests on Monday, since this
is a computer at work.

FWIW, netstat -m during the benchmark run shows the following (I read it
as showing no problem, even just before the crash):

mbuf usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    71/160 (in use/in pool)
        CPU #1 list:    79/160 (in use/in pool)
        Total:          150/320 (in use/in pool)
        Maximum number allowed on each CPU list: 512
        Maximum possible: 34560
        Allocated mbuf types:
          80 mbufs allocated to data
          70 mbufs allocated to packet headers
        0% of mbuf map consumed
mbuf cluster usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    38/114 (in use/in pool)
        CPU #1 list:    41/104 (in use/in pool)
        Total:          79/218 (in use/in pool)
        Maximum number allowed on each CPU list: 128
        Maximum possible: 17280
        1% of cluster map consumed
516 KBytes of wired memory reserved (37% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines


-- 
Michal Mertl
mime@traveller.cz








