From owner-freebsd-current  Sat Nov  2 02:27:16 2002
Date: Sat, 2 Nov 2002 11:27:12 +0100 (CET)
From: Michal Mertl <mime@traveller.cz>
To: Terry Lambert
Cc: Bill Fenner, freebsd-current@FreeBSD.ORG
Subject: Re: crash with network load (in tcp syncache ?)
In-Reply-To: <3DC32598.A0D0909A@mindspring.com>

On Fri, 1 Nov 2002, Terry Lambert wrote:

> Bill Fenner wrote:
> > >I think this can still crash (just like my patch); the problem is in
> > >what happens when it fails to allocate memory.  Unless you set one of
> > >the flags, it's still going to panic in the same place, I think, when
> > >you run out of memory.
> >
> > No.  The flags are only checked when so_head is not NULL.  sonewconn()
> > was handing sofree() an inconsistent struct so (so_head was set without
> > being on either queue), i.e. sonewconn() was creating an invalid data
> > structure.
>
> You're right...  I missed that; I was thinking too hard about the other
> situations (e.g. soabort()) that could trigger that code, and not
> enough about the code itself.
>
> > The call in sonewconn() used to be to sodealloc(), which didn't care
> > about whether or not the data structure was self-consistent.  The code
> > was refactored to do reference counting, but the fact that the socket
> > was inconsistent at that point wasn't noticed until now.
>
> Yeah; I looked at doing a ref() of the thing as a partial fix,
> but the unref() did the sotryfree() anyway.
>
> > The problem is not at all based on what happens in the allocation or
> > protocol attach failure cases.  The SYN cache is not involved; this is
> > a bug in sonewconn(), plain and simple.
>
> I still think there is a potential failure case, but the amount of
> code you'd have to read through to follow it is immense.  It has to
> do with the connection completing at NETISR, instead of in a process
> context, in the allocation failure case.  I ran into the same issue
> when trying to run connections to completion up to the accept() at
> interrupt, in the LRP case.  The SYN cache case is very similar, in
> the case of a cookie that hits when there are no resources remaining.
> He might be able to trigger it with his setup, by setting the cache
> size way, way down, and thus relying on cookies, and then flooding it
> with connection requests until he runs it out of resources.

Do I read you correctly that Bill's patch is probably better than yours
(I tested both; both fix the problem)?  If you still believe there's a
problem (a bug) I might be able to trigger with some setting, please
tell me.
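For reference, here is a rough stand-alone model of the invariant Bill
describes being violated: so_head gets set while the socket is on neither
the incomplete nor the complete queue, and a sofree()-style release then
trips over it.  The names mirror struct socket fields, but this is only an
illustration I put together, not the actual kernel code:

    /*
     * Stand-alone model of the sonewconn()/sofree() inconsistency.
     * so_head, SQ_INCOMP and SQ_COMP mirror the kernel names; the
     * logic is simplified for illustration only.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define SQ_INCOMP 0x1   /* on the listen socket's incomplete queue */
    #define SQ_COMP   0x2   /* on the listen socket's complete queue   */

    struct sock {
            struct sock *so_head;   /* listening socket, if any */
            int          so_qstate; /* SQ_INCOMP / SQ_COMP      */
    };

    /* Models sofree(): only legal if a child socket is actually queued. */
    static void
    sofree_model(struct sock *so)
    {
            if (so->so_head != NULL &&
                (so->so_qstate & (SQ_INCOMP | SQ_COMP)) == 0) {
                    /* This is the panic I was hitting. */
                    fprintf(stderr,
                        "panic: so_head set but socket not queued\n");
                    abort();
            }
            free(so);
    }

    int
    main(void)
    {
            struct sock head = { NULL, 0 };
            struct sock *so = calloc(1, sizeof(*so));

            so->so_head = &head;    /* sonewconn() sets so_head first... */
            /*
             * ...then pretend protocol attach failed before the socket
             * was ever put on head's queue, and cleanup calls sofree().
             */
            sofree_model(so);       /* aborts: inconsistent socket */
            return (0);
    }

The old sodealloc() path freed the memory unconditionally, which is why the
inconsistency went unnoticed until the reference-counting rework.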
I don't know how to make syncookies kick in.  I set net.inet.tcp.cachelimit
to 100, but it doesn't seem to make a difference; then again, I don't really
know what I'm doing :-).  I imagine the syncache doesn't grow much when I'm
connecting from a single IP and the connections are established quickly.
I'll be able to do some more tests on Monday - this is a computer at work.

FWIW, netstat -m during the benchmark run shows the following (I read it as
not indicating a problem, even just before the crash):

mbuf usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    71/160 (in use/in pool)
        CPU #1 list:    79/160 (in use/in pool)
        Total:          150/320 (in use/in pool)
        Maximum number allowed on each CPU list: 512
        Maximum possible: 34560
        Allocated mbuf types:
          80 mbufs allocated to data
          70 mbufs allocated to packet headers
        0% of mbuf map consumed
mbuf cluster usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    38/114 (in use/in pool)
        CPU #1 list:    41/104 (in use/in pool)
        Total:          79/218 (in use/in pool)
        Maximum number allowed on each CPU list: 128
        Maximum possible: 17280
        1% of cluster map consumed
516 KBytes of wired memory reserved (37% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

--
Michal Mertl <mime@traveller.cz>
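P.S. If I remember the syncache code correctly (worth verifying against
sys/netinet/tcp_syncache.c or "sysctl net.inet.tcp.syncache"), the limits
live under net.inet.tcp.syncache.* and cachelimit/bucketlimit are read-only
loader tunables, which could be why setting a sysctl at runtime seemed to
make no difference.  A minimal sketch of reading them from userland with
sysctlbyname(3) - the OID names here are my assumption, not confirmed:

    /*
     * Dump the (assumed) syncache limits and the syncookie switch.
     * Verify the OID names with "sysctl net.inet.tcp" before relying
     * on them.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    static void
    show(const char *oid)
    {
            int val;
            size_t len = sizeof(val);

            if (sysctlbyname(oid, &val, &len, NULL, 0) == -1)
                    perror(oid);
            else
                    printf("%s = %d\n", oid, val);
    }

    int
    main(void)
    {
            show("net.inet.tcp.syncache.cachelimit");  /* loader tunable */
            show("net.inet.tcp.syncache.bucketlimit"); /* loader tunable */
            show("net.inet.tcp.syncookies");           /* runtime switch */
            return (0);
    }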