From: Mateusz Guzik
Date: Sat, 17 Feb 2018 18:34:34 +0100
Subject: Re: svn commit: r329448 - head/sys/kern
To: Peter Holm
Cc: Konstantin Belousov, Mateusz Guzik, src-committers, svn-src-all@freebsd.org, svn-src-head@freebsd.org
List-Id: SVN commit messages for the src tree for head/-current

On Sat, Feb 17, 2018 at 5:38 PM, Peter Holm wrote:

> On Sat, Feb 17, 2018 at 06:26:32PM +0200, Konstantin Belousov wrote:
> > On Sat, Feb 17, 2018 at 05:07:07PM +0100, Mateusz Guzik wrote:
> > > On Sat, Feb 17, 2018 at 01:27:38PM +0200, Konstantin Belousov wrote:
> > > > On Sat, Feb 17, 2018 at 08:48:46AM +0000, Mateusz Guzik wrote:
> > > > > Author: mjg
> > > > > Date: Sat Feb 17 08:48:45 2018
> > > > > New Revision: 329448
> > > > > URL: https://svnweb.freebsd.org/changeset/base/329448
> > > > >
> > > > > Log:
> > > > >   exit: get rid of PROC_SLOCK when checking a process to report
> > > > Was this tested ?
> > > >
> > > I was trussing multithreaded microbenchmarks, no issues.
> > >
> > > > In particular, are you aware of r309539 ?
> > > >
> > >
> > > So it looks like I misread the code - I have grepped
> > > thread_suspend_switch operating with the proc locked and misread
> > > thread_suspend_one's assert as PROC_LOCK_ASSERT.
> > >
> > > That said, I think this is harmless. Regardless of the lock the
> > > inspecting thread can race and check "too soon". Even for a case where
> > > it decides to report, I don't see anything which would depend on the
> > > suspending thread to finish.
> > It was definitely not harmless when I tried to avoid the spin lock there,
> > but I do not remember exact failure mode. Most likely, it was a missed
> > report of the traced child indeed, but I am not sure that truss triggered
> > it. Most likely, Peter Holm was the reporter, since he is listed in
> > the commit.
> >
>
> I ran a truss(1) test on r329456 and it fails. I have not had a
> chance to look closer at this, but this is what I see:
>
> [root@mercat1 /home/pho]# pgrep truss | xargs ps -Hlp
> UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT TT     TIME COMMAND
>   0 41149 41118   0  52  0 11532 2588 wait   I     0  0:01.38 truss /tmp/ttruss 10
>   0 41151 41149   0  52  0 13156 2300 -      TX    0  0:00.98 /tmp/ttruss 10
>   0 41151 41149   0  52  0 13156 2300 -      TX    0  0:00.00 /tmp/ttruss 10
> [root@mercat1 /home/pho]# procstat -k 41151
>   PID    TID COMM    TDNAME  KSTACK
> 41151 100211 ttruss  -       mi_switch thread_suspend_switch ptracestop amd64_syscall fast_syscall_common
> 41151 100765 ttruss  -       mi_switch thread_suspend_check ast doreti_ast
> [root@mercat1 /home/pho]#
>

Ok, I reproduced the bug with your script. I reverted the change.

The patch I mailed in this thread fixes it for me. Below is a variant
which can be applied on top of fresh head:

https://people.freebsd.org/~mjg/wait6_slock.diff

Now that the bug got reported it is rather obvious: the suspending
thread does

lock -> wakeup -> slock -> unlock -> sunlock -> sleep

Only locking the proc puts us in a spot where we are past the wakeup,
but before it gets the chance to bump the counter.

On the other hand, if we slock, we effectively wait for it to bump the
counter and go to sleep, after which we see what's going on.

--
Mateusz Guzik
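
The ordering described above can be sketched as a small userland model. This is only an illustration, not the kernel code: the struct fields stand in for PROC_LOCK, PROC_SLOCK and the counter the waiter inspects, and the names `model`, `apply` and `stale_view_possible` are made up for this sketch. Placing the counter bump between unlock and sunlock is an assumption drawn from the explanation above. The model walks the suspending thread through its steps and, after each one, asks whether a woken inspector could take its locks and still see the counter at zero.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only: these fields stand in for PROC_LOCK, PROC_SLOCK and the
 * counter the waiter inspects.  None of this is the kernel implementation.
 */
struct model {
	bool proc_locked;	/* stands in for PROC_LOCK being held */
	bool slocked;		/* stands in for PROC_SLOCK being held */
	bool woken;		/* the waiter has been woken up */
	int  suspcount;		/* counter the waiter checks */
};

/*
 * Steps of the suspending thread, in the order given in the mail.  Placing
 * BUMP between UNLOCK and SUNLOCK is an assumption of this sketch.
 */
enum step { LOCK, WAKEUP, SLOCK, UNLOCK, BUMP, SUNLOCK, NSTEPS };

static void
apply(struct model *m, enum step s)
{
	switch (s) {
	case LOCK:
		assert(!m->proc_locked);
		m->proc_locked = true;
		break;
	case WAKEUP:
		m->woken = true;
		break;
	case SLOCK:
		assert(!m->slocked);
		m->slocked = true;
		break;
	case UNLOCK:
		m->proc_locked = false;
		break;
	case BUMP:
		assert(m->slocked);	/* the bump happens under the spin lock */
		m->suspcount++;
		break;
	case SUNLOCK:
		m->slocked = false;
		break;
	default:
		break;
	}
}

/*
 * Return true if a woken inspector could acquire its locks after some step
 * and still observe suspcount == 0, i.e. check "too soon".
 */
bool
stale_view_possible(bool inspector_takes_slock)
{
	struct model m = { false, false, false, 0 };

	for (int s = LOCK; s < NSTEPS; s++) {
		apply(&m, (enum step)s);
		bool can_run = m.woken && !m.proc_locked &&
		    (!inspector_takes_slock || !m.slocked);
		if (can_run && m.suspcount == 0)
			return (true);
	}
	return (false);
}
```

In this model, stale_view_possible(false) finds the window right after the unlock step, while stale_view_possible(true) finds none, because the inspector cannot proceed until sunlock, by which point the counter has been bumped.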