Date:      Fri, 15 Oct 2004 14:18:09 +0200
From:      Marc "UBM" Bocklet <ubm@u-boot-man.de>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        current@freebsd.org
Subject:   Re: [BETA7-panic] sodealloc(): so_count 1
Message-ID:  <20041015141809.3f1e1062.ubm@u-boot-man.de>
In-Reply-To: <Pine.NEB.3.96L.1041015062126.84384k-100000@fledge.watson.org>
References:  <20041015113321.126a6c4d.ubm@u-boot-man.de> <Pine.NEB.3.96L.1041015062126.84384k-100000@fledge.watson.org>

On Fri, 15 Oct 2004 06:24:47 -0400 (EDT)
Robert Watson <rwatson@freebsd.org> wrote:

> On Fri, 15 Oct 2004, Marc UBM Bocklet wrote:
> 
> > > Sounds good.  I know that the problem Brian identified is a real
> > > race and a potential source of precisely the panic you were
> > > seeing.  One reason I was interested in getting access to a dump
> > > from the panic, though, was to (if possible) confirm that it was
> > > *the* race causing the problem.  It's a very likely candidate, but
> > > it would be good to know if we should be looking for another
> > > related race.  If the code now in HEAD fixes it for you, please
> > > let me know (or if not, also :-).  If it doesn't, the core would
> > > be very helpful.
> > 
> > Ok, bad news first: 
> > 
> > I just got exactly the same panic with Brian's
> > tcp_accept_race_crash.patch applied. 
> > 
> > Debug output is attached, but it looks just like the last time. 
> > 
> > The good news: 
> > 
> > I got a coredump that I can poke. :-) 
> > 
> > So now I just need to know what info to extract from the dump :-) 
> 
> It would be interesting to have you try with the current head of
> RELENG_5, which now includes my fix, which is a little different from
> Brian's fix in the sense that it tries to rewrite things less (since
> that code is very sensitive to change).
> 
> Regarding the dump -- wonderful.  Here's what I'd like you to do.  In
> one of the sofree/sodealloc frames, I'd like to see the contents of
> *so, to see what state the socket is in. 

Ok, I did 

frame 23
list
print *so

and got:

http://www.u-boot-man.de/~mbocklet/content_so.txt

Content of so in frame 24 is the same.


> If you move up a few frames to
> in_pcbdetach(), the contents of *inp would be very useful, and up

Ok, here they are:

http://www.u-boot-man.de/~mbocklet/content_inp.txt

> another frame or so to the tcp_close() frame, *tp.  I don't know how

Hmm, something seems to be wrong there, since:

(kgdb) frame 26
#26 0xc065532a in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:785
785                     in_pcbdetach(inp);
(kgdb) list
780     #ifdef INET6
781             if (INP_CHECK_SOCKAF(so, AF_INET6))
782                     in6_pcbdetach(inp);
783             else
784     #endif
785                     in_pcbdetach(inp);
786             tcpstat.tcps_closed++;
787             return (NULL);
788     }
789
(kgdb) print *tp
Cannot access memory at address 0x0
(kgdb)

But if I try to get the contents of tp in tcp_input, it works:

http://www.u-boot-man.de/~mbocklet/content_tp.txt


> familiar you are with our kernel debugging suite, but if you're not the
> documentation in the handbook is fairly decent.  The one caveat I'd
> give is that that documentation might still reference "gdb -k" instead
> of "kgdb" to work with the core dump.

Well, let's say the documentation pointed me in the right direction ;-)

 
> Thanks for your help on this one -- I'm still unable to reproduce the
> problem in my testbeds, so having someone who's willing to keep
> following through on the bug is really invaluable!
> 
> Thanks

You're welcome :-)

Bye
Marc


-- 
"And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?"

W.B. Yeats, The Second Coming

