Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 17 Feb 2018 18:34:34 +0100
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Peter Holm <peter@holm.cc>
Cc:        Konstantin Belousov <kib@freebsd.org>, Mateusz Guzik <mjg@freebsd.org>,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r329448 - head/sys/kern
Message-ID:  <CAGudoHGUTP28KOvzbPaeHu1j%2Bw7zhtL4H2dMpM58vhi64_=5jg@mail.gmail.com>
In-Reply-To: <20180217163822.GA81555@x2.osted.lan>
References:  <201802170848.w1H8mkfb081764@repo.freebsd.org> <20180217112738.GO94212@kib.kiev.ua> <CAGudoHG%2BXxobj9ziASdW3ugQoxd843K%2BncDSyLbEEGQ20QzAQg@mail.gmail.com> <20180217162632.GQ94212@kib.kiev.ua> <20180217163822.GA81555@x2.osted.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Feb 17, 2018 at 5:38 PM, Peter Holm <peter@holm.cc> wrote:

> On Sat, Feb 17, 2018 at 06:26:32PM +0200, Konstantin Belousov wrote:
> > On Sat, Feb 17, 2018 at 05:07:07PM +0100, Mateusz Guzik wrote:
> > > On Sat, Feb 17, 2018 at 01:27:38PM +0200, Konstantin Belousov wrote:
> > > > On Sat, Feb 17, 2018 at 08:48:46AM +0000, Mateusz Guzik wrote:
> > > > > Author: mjg
> > > > > Date: Sat Feb 17 08:48:45 2018
> > > > > New Revision: 329448
> > > > > URL: https://svnweb.freebsd.org/changeset/base/329448
> > > > >
> > > > > Log:
> > > > >   exit: get rid of PROC_SLOCK when checking a process to report
> > > > Was this tested ?
> > > >
> > >
> > > I was trussing multithreaded microbenchmarks, no issues.
> > >
> > > > In particular, are you aware of r309539 ?
> > > >
> > >
> > > So it looks like I misread the code - I have grepped
> > > thread_suspend_switch operating with the proc locked and misread
> > > thread_suspend_one's assert as PROC_LOCK_ASSERT.
> > >
> > > That said, I think this is harmless. Regardless of the lock the
> > > inspecting thread can race and check "too soon". Even for a case where
> > > it decides to report, I don't see anything which would depend on the
> > > suspending thread to finish.
> > It was definitely not harmless when I tried to avoid the spin lock there,
> > but I do not remember exact failure mode.  Most likely, it was a missed
> > report of the traced child indeed, but I am not sure that truss triggered
> > it.  Most likely, Peter Holm was the reporter, since he is listed in
> > the commit.
> >
>
> I ran a truss(1) test on r329456 and it fails. I have not had a
> chance to look closer at this, but this is what I see:
>
> [root@mercat1 /home/pho]# pgrep truss | xargs ps -Hlp
> UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT TT     TIME COMMAND
>   0 41149 41118   0  52  0 11532 2588 wait   I     0  0:01.38 truss
> /tmp/ttruss 10
>   0 41151 41149   0  52  0 13156 2300 -      TX    0  0:00.98 /tmp/ttruss
> 10
>   0 41151 41149   0  52  0 13156 2300 -      TX    0  0:00.00 /tmp/ttruss
> 10
> [root@mercat1 /home/pho]# procstat -k 41151
>   PID    TID COMM                TDNAME              KSTACK
> 41151 100211 ttruss              -                   mi_switch
> thread_suspend_switch ptracestop amd64_syscall fast_syscall_common
> 41151 100765 ttruss              -                   mi_switch
> thread_suspend_check ast doreti_ast
> [root@mercat1 /home/pho]#
>
>
Ok, I reproduced the bug with your script. I reverted the change.

The patch I mailed in this thread fixes it for me. Below is a variant
which can be applied on top of fresh head:

https://people.freebsd.org/~mjg/wait6_slock.diff

Now that the bug got reported it is rather obvious:
the suspending thread does lock -> wakeup -> slock -> unlock -> sunlock ->
sleep

Only locking the proc puts as in a spot where we are past the wakeup,
but before it gets the chance to bump the counter.
On the other hand if we slock, we effectively wait for it to bump and go
to sleep after which we see what's going on.

-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGudoHGUTP28KOvzbPaeHu1j%2Bw7zhtL4H2dMpM58vhi64_=5jg>