Date:      Fri, 19 Sep 2008 14:36:36 +0300
From:      "Oleg V. Nauman" <oleg@opentransfer.com>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: RELENG_7: something is very wrong with UDP?
Message-ID:  <20080919143636.p661cjfopw44osco@webmail.opentransfer.com>
In-Reply-To: <alpine.BSF.1.10.0809182005570.16464@fledge.watson.org>
References:  <20080918180543.pt7s2zmaio48ww8g@webmail.opentransfer.com> <alpine.BSF.1.10.0809182005570.16464@fledge.watson.org>

Quoting Robert Watson <rwatson@FreeBSD.org>:

> On Thu, 18 Sep 2008, Oleg V. Nauman wrote:
>
>> It seems that something is very wrong with UDP on the latest RELENG_7.
>>
>> Well, some symptoms I have seen today when I was trying to boot a
>> newly compiled RELENG_7 on my laptop:
>>
>> a) rc scripts wait indefinitely for logger to complete during boot
>>    (devd and ifconfig are good examples)
>
> If you hit "ctrl-t" while these are waiting, what is the output?

load: 0.00 cmd: logger [nanslp] 0.00u 0.07s 0% 832k

>
>> b) Sporadic DNS request failures
>
> I don't know what your comfort level with debugging tools is, but
> if you're happy using tcpdump, etc, I think I'd recommend diagnosing
> this directly that way.  I'd probably do something like this:
>
> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
>     Confirm that you can still reproduce the problem.

  For various reasons my laptop runs a local caching DNS server
(named) without any forwarders assigned. My /etc/resolv.conf contains
nameserver 127.0.0.1


>
> (2) Use dig(1) and tcpdump(1) to watch wire-level DNS behavior -- do you see
>     queries go out?  Do you see replies come back?  Is dig "waking up" and
>     seeing the replies when they arrive, or is there a delay or hang in dig?
>     If dig hangs, what does ctrl-t show the sleep state (wmesg) is?

  Will try to dig into it when it occurs again.
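
For the wire-level check in step (2), a minimal Python sketch of a UDP probe
may help tell "no reply ever arrived" apart from "a reply arrived late". It
assumes the local named on 127.0.0.1 port 53 mentioned above. The names
build_query and probe, the query name example.org, and the 2 second timeout
are hypothetical choices for illustration, not part of dig or any FreeBSD
tool. Running something like "tcpdump -n -i lo0 udp port 53" alongside it
should show whether the query and the reply actually cross the loopback
interface.

```python
import socket
import struct
import time


def build_query(name, qid):
    """Build a minimal DNS query for an A record, recursion desired."""
    # Header: ID, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Question name as length-prefixed labels, terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server="127.0.0.1", name="example.org", timeout=2.0, qid=0x1234):
    """Send one query over UDP.

    Returns the seconds elapsed until a reply with a matching ID arrived,
    or None if nothing usable came back (timeout or socket error).
    The defaults are assumptions for this sketch.
    """
    query = build_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server, 53))
        try:
            while True:
                reply, _ = sock.recvfrom(4096)
                if reply[:2] == query[:2]:  # same transaction ID
                    return time.monotonic() - start
        except OSError:  # includes socket.timeout
            return None
```

Calling probe() in a loop and logging the None results gives a rough
failure rate to compare between the working and the broken kernel.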

> Could you
>     also use procstat -k on the dig process to generate a kernel stack trace
>     for it?
>
>> c) traceroute prints response times like 0.00 for every host
>>
>> d) I was unable to reboot my laptop with shutdown -r (due to
>>    logger/syslog related issues, I think)
>
> Could you try killing syslogd by hand and see if it dies?  If not, can
> you use procstat -kk to generate a stack trace for it?

  Killing syslogd does not help.
Here is the procstat -kk output for the "shutdown -r now" process, which is
waiting on something:

   PID    TID COMM             TDNAME           KSTACK
  1447 100098 shutdown         -                mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239
sleepq_timedwait_sig+0x17 _sleep+0x339 kern_nanosleep+0xc1
nanosleep+0x6f syscall+0x2b3 Xint0x80_syscall+0x20

And the procstat -kk output for the waiting logger process:

   PID    TID COMM             TDNAME           KSTACK
  1421 100095 logger           -                mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
_sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
syscall+0x2b3 Xint0x80_syscall+0x20
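
To read traces like the two above more quickly, here is a rough Python
sketch that strips the offsets and skips the generic sleep-queue frames, so
the first frame that names the actual wait stands out. It assumes each
record has already been joined onto one line and that the COMM and TDNAME
columns contain no spaces. parse_kstack, blocking_frame and the SLEEP_FRAMES
set are hypothetical helpers built only from the frames seen in this
message, not a complete list.

```python
import re

# Frames that only say "the thread is asleep"; taken from the traces in
# this message, so an assumption rather than an exhaustive list.
SLEEP_FRAMES = {
    "mi_switch",
    "sleepq_switch",
    "sleepq_catch_signals",
    "sleepq_wait_sig",
    "sleepq_timedwait_sig",
    "_sleep",
}


def parse_kstack(record):
    """Split one joined procstat -kk record into its columns and frames."""
    fields = record.split()
    pid, tid = int(fields[0]), int(fields[1])
    comm, tdname = fields[2], fields[3]
    # Drop the trailing "+0x..." offset from every frame.
    frames = [re.sub(r"\+0x[0-9a-f]+$", "", f) for f in fields[4:]]
    return pid, tid, comm, tdname, frames


def blocking_frame(frames):
    """Return the first frame that is not a generic sleep-queue frame."""
    for name in frames:
        if name not in SLEEP_FRAMES:
            return name
    return None


LOGGER = (
    "1421 100095 logger - mi_switch+0x2c8 sleepq_switch+0xd9 "
    "sleepq_catch_signals+0x239 sleepq_wait_sig+0x14 _sleep+0x35f "
    "pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f "
    "syscall+0x2b3 Xint0x80_syscall+0x20"
)
SHUTDOWN = (
    "1447 100098 shutdown - mi_switch+0x2c8 sleepq_switch+0xd9 "
    "sleepq_catch_signals+0x239 sleepq_timedwait_sig+0x17 _sleep+0x339 "
    "kern_nanosleep+0xc1 nanosleep+0x6f syscall+0x2b3 "
    "Xint0x80_syscall+0x20"
)
```

With these inputs, blocking_frame(parse_kstack(LOGGER)[4]) gives
"pipe_read" and the same call on SHUTDOWN gives "kern_nanosleep", so
logger is sleeping in a read on a pipe and shutdown in nanosleep.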

>
>> e) I was unable to start an X session (it seems to freeze the laptop,
>>    because I was unable even to switch to another virtual console)
>>
>> csup "backout" to date=2008.09.15.12.00.00 and recompiling the
>> kernel fixes this issue for me.
>
> This is approximately the date of my last UDP MFC.  Could you try
> backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and
> see if that helps? (specifically, restore the use of sosend_generic
> instead of sosend_dgram)
>
> Could you confirm that either you're not using any kernel modules from
> ports, or that if you are, you have recompiled them with your most
> recent update?

  I'm not using any third party kernel modules at this moment.

>
> Could you try compiling your kernel with WITNESS to see if we get any
> extended debugging information?

  I have added the WITNESS (and STACK, required by procstat) options, but
WITNESS is not producing any output (so no LORs or anything like that).

>
>> Is anybody experiencing the same issues with a fresh RELENG_7? I am
>> unsure whether it is a local issue, though.
>
> I'm not experiencing them, but these sorts of things can be quite
> subtle and workload-dependent.

  Well, I am experiencing this issue even during system boot.

>
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge




