Date:      Fri, 10 Aug 2018 19:01:36 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm@FreeBSD.org
Subject:   Re: sx_sleep not waking up when timo expires
Message-ID:  <20180811020136.GD97145@funkthat.com>
In-Reply-To: <1532874944.61594.110.camel@freebsd.org>
References:  <20180729010157.GC2884@funkthat.com> <1532874944.61594.110.camel@freebsd.org>

Ian Lepore wrote this message on Sun, Jul 29, 2018 at 08:35 -0600:
> On Sat, 2018-07-28 at 18:01 -0700, John-Mark Gurney wrote:
> > I recently upgraded my router to a Pine A64-LTS board, and have hit
> > the same issue as PR 222126[1].  The solution at the end does not work
> > for me, as I do not have that line in my loader.conf:
> > kern.timecounter.smp_tsc_adjust=1
> > 
> > I have verified that the wake up does not happen, as I used a dtrace
> > script to verify that pf_purge_expired_states is called or not called..
> > When I change the timeout, pf will kick the thread and get things
> > running again, but it has stopped a couple times later...
> > 
> > I'm running a recent SNAPSHOT:
> > FreeBSD gate2.funkthat.com 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r336134: Mon Jul  9 19:20:11 UTC 2018     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64
> > 
> > This is likely reproducible by just starting pf, even in a pass all
> > mode, and watching for when the function stops getting called...  I'll
> > see if I can't get an extremely minimal config to reproduce it.
> > 
> > Any suggestions?
> > 
> > [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222126
> > 
> 
> Sounds like
> 
>  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229644 
> 
> which has some patches attached which reduce but don't quite eliminate
> the occurrences, so nothing has been committed yet. I just ordered a
> SOPINE board so I can do some hands-on debugging.

That patch does not fix the problem.  It took less than 2 days w/ the
patch applied before I hit the failure again: the pf purge thread
stopped.  Two samples taken 5 seconds apart show the same CPU time:

# ps laxwww | grep 'pf purge' | grep -v grep; sleep 5; ps laxwww | grep 'pf purge' | grep -v grep
  0   614     0   0 -16  0     0    16 pftm     DL    -    15:15.23 [pf purge]
  0   614     0   0 -16  0     0    16 pftm     DL    -    15:15.23 [pf purge]

The pf purge thread uses CPU time even w/o traffic running, so simply
loading pf and then waiting until its CPU time stops incrementing is
another easy way to test for it.

Also, I've had the shell command sleep hang as well.  I figure that's
expected, but it made me realize that a good test program could be to
fire up a bunch of threads and sleep in them, to find the problem
more quickly.
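
A rough sketch of what such a test could look like is below.  It is
only an illustration, not something that has been run on the affected
board: the names run_sleepers, sleeper_main and struct sleeper, and the
thread count, sleep length and slack, are all made up for the example.
Each thread sleeps repeatedly with nanosleep() and counts the sleeps
that return much later than requested, measured with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Per-thread state; every name here is hypothetical. */
struct sleeper {
	pthread_t tid;
	int id;
	int rounds;	/* how many sleeps to perform */
	long sleep_ms;	/* requested sleep length */
	long slack_ms;	/* allowed overshoot before a sleep counts as late */
	int late;	/* sleeps that overshot sleep_ms + slack_ms */
};

static long
elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return ((long)(b->tv_sec - a->tv_sec) * 1000L +
	    (b->tv_nsec - a->tv_nsec) / 1000000L);
}

static void *
sleeper_main(void *arg)
{
	struct sleeper *s = arg;

	for (int i = 0; i < s->rounds; i++) {
		struct timespec t0, t1, rem;

		rem.tv_sec = s->sleep_ms / 1000;
		rem.tv_nsec = (s->sleep_ms % 1000) * 1000000L;
		clock_gettime(CLOCK_MONOTONIC, &t0);
		/* Restart the sleep if a signal interrupts it. */
		while (nanosleep(&rem, &rem) == -1 && errno == EINTR)
			;
		clock_gettime(CLOCK_MONOTONIC, &t1);

		long took = elapsed_ms(&t0, &t1);
		if (took > s->sleep_ms + s->slack_ms) {
			s->late++;
			fprintf(stderr, "thread %d: asked for %ld ms, slept %ld ms\n",
			    s->id, s->sleep_ms, took);
		}
	}
	return (NULL);
}

/*
 * Start nthreads sleepers and wait for all of them.  Returns the total
 * number of late sleeps, or -1 if the threads could not all be started.
 */
int
run_sleepers(int nthreads, int rounds, long sleep_ms, long slack_ms)
{
	assert(nthreads > 0 && rounds > 0);

	struct sleeper *s = calloc((size_t)nthreads, sizeof(*s));
	int started = 0, late = 0;

	if (s == NULL)
		return (-1);
	for (int i = 0; i < nthreads; i++) {
		s[i].id = i;
		s[i].rounds = rounds;
		s[i].sleep_ms = sleep_ms;
		s[i].slack_ms = slack_ms;
		if (pthread_create(&s[i].tid, NULL, sleeper_main, &s[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(s[i].tid, NULL);
		late += s[i].late;
	}
	free(s);
	return (started == nthreads ? late : -1);
}
```

A small main() could call something like run_sleepers(64, 60, 1000, 500)
and exit non-zero if the result is not 0; build with cc -pthread.  Note
that a sleep whose wakeup is lost entirely never returns, so in that
case the program simply hangs in pthread_join(), which is itself the
symptom being looked for.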

Anything I can do to help debug/fix it?

I have a couple spare LTS boards specifically to do stuff like this.

Thanks.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


