Date: Fri, 15 Oct 2004 14:18:09 +0200
From: Marc "UBM" Bocklet <ubm@u-boot-man.de>
To: Robert Watson <rwatson@freebsd.org>
Cc: current@freebsd.org
Subject: Re: [BETA7-panic] sodealloc(): so_count 1
Message-ID: <20041015141809.3f1e1062.ubm@u-boot-man.de>
In-Reply-To: <Pine.NEB.3.96L.1041015062126.84384k-100000@fledge.watson.org>
References: <20041015113321.126a6c4d.ubm@u-boot-man.de> <Pine.NEB.3.96L.1041015062126.84384k-100000@fledge.watson.org>
On Fri, 15 Oct 2004 06:24:47 -0400 (EDT)
Robert Watson <rwatson@freebsd.org> wrote:

> On Fri, 15 Oct 2004, Marc UBM Bocklet wrote:
>
> > > Sounds good.  I know that the problem Brian identified is a real
> > > race and a potential source of precisely the panic you were
> > > seeing.  One reason I was interested in getting access to a dump
> > > from the panic, though, was to (if possible) confirm that it was
> > > *the* race causing the problem.  It's a very likely candidate, but
> > > it would be good to know if we should be looking for another
> > > related race.  If the code now in HEAD fixes it for you, please
> > > let me know (or if not, also :-).  If it doesn't, the core would
> > > be very helpful.
> >
> > Ok, bad news first:
> >
> > I just got exactly the same panic with Brian's
> > tcp_accept_race_crash.patch applied.
> >
> > Debug output is attached, but it looks just like the last time.
> >
> > The good news:
> >
> > I got a coredump that I can poke. :-)
> >
> > So now I just need to know what info to extract from the dump :-)
>
> It would be interesting to have you try with the current head of
> RELENG_5, which now includes my fix, which is a little different from
> Brian's fix in the sense that it tries to rewrite things less (since
> that code is very sensitive to change).
>
> Regarding the dump -- wonderful.  Here's what I'd like you to do.  In
> one of the sofree/sodealloc frames, I'd like to see the contents of
> *so, to see what state the socket is in.

Ok, I did

frame 23
list
print *so

and got:

http://www.u-boot-man.de/~mbocklet/content_so.txt

Content of so in frame 24 is the same.

> If you move up a few frames to
> in_pcbdetach(), the contents of *inp would be very useful, and up

Ok, here they are:

http://www.u-boot-man.de/~mbocklet/content_inp.txt

> another frame or so to the tcp_close() frame, *tp.  I don't know how

Hmm, something seems to be wrong there, since:

(kgdb) frame 26
#26 0xc065532a in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:785
785                     in_pcbdetach(inp);
(kgdb) list
780     #ifdef INET6
781             if (INP_CHECK_SOCKAF(so, AF_INET6))
782                     in6_pcbdetach(inp);
783             else
784     #endif
785                     in_pcbdetach(inp);
786             tcpstat.tcps_closed++;
787             return (NULL);
788     }
789
(kgdb) print *tp
Cannot access memory at address 0x0
(kgdb)

But if I try to get the contents of tp in tcp_input, it works:

http://www.u-boot-man.de/~mbocklet/content_tp.txt

> familiar you are with our kernel debugging suite, but if you're not the
> documentation in the handbook is fairly decent.  The one caveat I'd
> give is that that documentation might still reference "gdb -k" instead
> of "kgdb" to work with the core dump.

Well, let's say the documentation pointed me in the right direction ;-)

> Thanks for your help on this one -- I'm still unable to reproduce the
> problem in my testbeds, so having someone who's willing to keep
> following through on the bug is really invaluable!
>
> Thanks

You're welcome :-)

Bye
Marc

-- 
"And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?"

W.B. Yeats, The Second Coming