Date:      Fri, 15 Oct 2004 14:18:09 +0200
From:      Marc "UBM" Bocklet <ubm@u-boot-man.de>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        current@freebsd.org
Subject:   Re: [BETA7-panic] sodealloc(): so_count 1
Message-ID:  <20041015141809.3f1e1062.ubm@u-boot-man.de>
In-Reply-To: <Pine.NEB.3.96L.1041015062126.84384k-100000@fledge.watson.org>
References:  <20041015113321.126a6c4d.ubm@u-boot-man.de> <Pine.NEB.3.96L.1041015062126.84384k-100000@fledge.watson.org>

On Fri, 15 Oct 2004 06:24:47 -0400 (EDT)
Robert Watson <rwatson@freebsd.org> wrote:

> On Fri, 15 Oct 2004, Marc UBM Bocklet wrote:
> 
> > > Sounds good.  I know that the problem Brian identified is a real
> > > race and a potential source of precisely the panic you were
> > > seeing.  One reason I was interested in getting access to a dump
> > > from the panic, though, was to (if possible) confirm that it was
> > > *the* race causing the problem.  It's a very likely candidate, but
> > > it would be good to know if we should be looking for another
> > > related race.  If the code now in HEAD fixes it for you, please
> > > let me know (or if not, also :-).  If it doesn't, the core would
> > > be very helpful.
> > 
> > Ok, bad news first: 
> > 
> > I just got exactly the same panic with Brian's
> > tcp_accept_race_crash.patch applied. 
> > 
> > Debug output is attached, but it looks just like the last time. 
> > 
> > The good news: 
> > 
> > I got a coredump that I can poke. :-) 
> > 
> > So now I just need to know what info to extract from the dump :-) 
> 
> It would be interesting to have you try with the current head of
> RELENG_5, which now includes my fix, which is a little different from
> Brian's fix in the sense that it tries to rewrite things less (since
> that code is very sensitive to change).
> 
> Regarding the dump -- wonderful.  Here's what I'd like you to do.  In
> one of the sofree/sodealloc frames, I'd like to see the contents of
> *so, to see what state the socket is in. 

Ok, I did 

frame 23
list
print *so

and got:

http://www.u-boot-man.de/~mbocklet/content_so.txt

Content of so in frame 24 is the same.
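
For anyone following along: the panic string in the subject suggests
that sodealloc() found a socket whose reference count (so_count) was
still 1 when it was about to be freed. The following is only a
user-space sketch of that kind of invariant, not the kernel code.
struct toy_socket and the toy_* functions are hypothetical names made
up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted socket (not struct socket). */
struct toy_socket {
    int  so_count;  /* outstanding references */
    bool freed;     /* set once the object has been released */
};

/* Take one reference. */
void toy_soref(struct toy_socket *so)
{
    assert(so != NULL);
    so->so_count++;
}

/*
 * Release the object. Returns false when references are still
 * outstanding; that is the condition the kernel check would panic on.
 */
bool toy_sodealloc(struct toy_socket *so)
{
    assert(so != NULL);
    if (so->so_count != 0)
        return false;
    so->freed = true;
    return true;
}

/* Drop one reference and release the object if it was the last one. */
bool toy_sorele(struct toy_socket *so)
{
    assert(so != NULL && so->so_count > 0);
    so->so_count--;
    if (so->so_count == 0)
        return toy_sodealloc(so);
    return false;
}
```

In this sketch, calling toy_sodealloc() while another path still holds
a reference just returns false. A race between two paths tearing down
the same socket is one way such a state could come about, which is why
the contents of *so are of interest.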


> If you move up a few frames to
> in_pcbdetach(), the contents of *inp would be very useful, and up

Ok, here they are:

http://www.u-boot-man.de/~mbocklet/content_inp.txt

> another frame or so to the tcp_close() frame, *tp.  I don't know how

Hmm, something seems to be wrong there, since:

(kgdb) frame 26
#26 0xc065532a in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:785
785                     in_pcbdetach(inp);
(kgdb) list
780     #ifdef INET6
781             if (INP_CHECK_SOCKAF(so, AF_INET6))
782                     in6_pcbdetach(inp);
783             else
784     #endif
785                     in_pcbdetach(inp);
786             tcpstat.tcps_closed++;
787             return (NULL);
788     }
789
(kgdb) print *tp
Cannot access memory at address 0x0
(kgdb)

But if I try to get the contents of tp in tcp_input, it works:

http://www.u-boot-man.de/~mbocklet/content_tp.txt


> familiar you are with our kernel debugging suite, but if you're not the
> documentation in the handbook is fairly decent.  The one caveat I'd
> give is that that documentation might still reference "gdb -k" instead
> of "kgdb" to work with the core dump.

Well, let's say the documentation pointed me in the right direction ;-)

 
> Thanks for your help on this one -- I'm still unable to reproduce the
> problem in my testbeds, so having someone who's willing to keep
> following through on the bug is really invaluable!
> 
> Thanks

You're welcome :-)

Bye
Marc


-- 
"And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?"

W.B. Yeats, The Second Coming


