From owner-freebsd-ppc@freebsd.org Mon Feb 25 23:50:02 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C720A1519791 for ; Mon, 25 Feb 2019 23:50:02 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic314-21.consmr.mail.ne1.yahoo.com (sonic314-21.consmr.mail.ne1.yahoo.com [66.163.189.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8B49C87F53 for ; Mon, 25 Feb 2019 23:50:01 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: RDhR2xMVM1m_1kIA6YwjyxT9mLcW5_nV6QlagfYGoGJ.LmP11FsmtaELcHMZXMy dUwZctqrTrIlegcPdMC4e0cHMWGGf6uxNks1.C_xjBOsEkcbQk4gYLcCld8nHCFXtEk2xSFYvxPq gtI3FEwnrlG5aDV8MCjT2kobY7grjwRbNhrWzA6ib4yHiOPtupjQKbpibjZVIDcfMz2ilo5GQ2YJ LGZTeH.QVSOBXlwQeKIvsqwGP_6nOVdOsywcM3O26uplw6ZfQkvT8L89dHqrpig4RQAuNaP8OBTh LfVwoJLEAADcpFf6AKgN9nNo5YlepbacvvxQ1yHMU8uyYFUhyPTP9owqA7.eJnfYzJUtJEriW.EH lRgRnbxgNr1tM4fqInE.cVcNWRYlN5NaJvKBOURzIDW4_a21q2cVuKesg_YZ5sH1F.jWQLXu0xMD uDAWWRRJHL2h1QN5PQDpgdUnMy2G7P104N6ocpjR9eZYhgywpiWE6Vw3P0lcBssrJ5hckwXFHa7N P.sYqqjprRDDy1us0kIWGx509gTENteCmuq9yT4Dxkyi0hCM.NsjWVsSzwteUlrCGH4BiLHlPJBQ FR1J_RzvvE69tE86jKU2oKVmsgk44ESltRYlV57d7eoZkE5jLshRAqkyt44tQDsHd1DgY1epJWRy t8Cl7O0ksf7.Xf.rupHhdM_zTmaeCIxbdCVNIRzO..SYXfA4HrT1hVzxYtdApdSsywxqID5pcQkx sp5woe68vjM8du6PG1tvbCAygq1IOY3jxNMGZAQLzh.XTE3_QfXLVMbxnkjR5rLP6LchCAKgI._w YujikXZ5YJm6aoeZC1cs0_YWDSFD05dSWuKNYWGVdE9c31_jwwjrKEKIWc5za_Qq2tbDJj2mlV__ 48qkxWUEYSpebyMBKUMx20yaDI3fBJYAkutt8fmCZx2fb09XkqcfUVxde86t7HkgjSWe1LXnT14. DsW6_uC6wqxachY1TkGg03BnpOsQRMxveP5Lo_95Ij67RVzrZ2HZauDAJ2dRMeVQiIMiPJdsiyQm 9jEuWmSdyLmsv7yV7tkft5Hd12duHd1AlUR9LlekJ5UcKFUjs8W4.BaTScrXQqx8- Received: from sonic.gate.mail.ne1.yahoo.com by sonic314.consmr.mail.ne1.yahoo.com with HTTP; Mon, 25 Feb 2019 23:50:00 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp405.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID fe7a88af288ff3baaf0e2c3a26d6e422; Mon, 25 Feb 2019 23:49:57 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence [not as uniform as I initially saw] From: Mark Millard In-Reply-To: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com> Date: Mon, 25 Feb 2019 15:49:56 -0800 Cc: freebsd-hackers Hackers Content-Transfer-Encoding: quoted-printable Message-Id: References: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com> To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 8B49C87F53 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.59 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.93)[0.927,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.34)[ip: (4.48), ipnet: 66.163.184.0/21(1.27), asn: 36646(1.02), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.71)[0.711,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.12)[0.118,0]; RCVD_IN_DNSWL_NONE(0.00)[147.189.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Feb 2019 23:50:03 -0000 [I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to a hangup.] On 2019-Feb-25, at 14:46, Mark Millard wrote: > I adjusted what KTR_PROC does to just show just some of its pid 26 > (pmac_thermal) messages and adding some extra output as well. I'll > list some of that output later. I'll note that beyond pmac_thermal > the buf*daemon* threads also seem to be subject to being stuck > sleeping in (offsets are for a specific build of mine): >=20 > mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c >=20 >=20 > So far for pmac_thermal I've seen that until the failing case: >=20 > sleepq_set_timeout_sbt was being given: sbt=3D=3D0xcccccbe0 pr=3D=3D0x0 = flags=3D=3D0x100 > and in turn was using: prec=3D=3D0xcccccbe flags=3D=3D0x501 (of course = the used > td_sleeptimo varies). >=20 > [I note that 16*0xcccccbe =3D=3D 0xcccccbe0, the original sbt value, > not that I know yet if this matters.] >=20 > But the sequence leading to failures is different: I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to a hangup. So it is not a reliable predictor of the hang-up in sleep. I'm trying to see if I can observe a failure with different value. > sleepq_set_timeout_sbt was given: sbt=3D=3D0xfffffed8 pr=3D=3D0x0 = flags=3D=3D0x100 > and in turn was using: prec=3D=3D0xfffffed flags=3D=3D0x501 >=20 > [I note that 16*0xfffffed =3D=3D 0xfffffed0, so less than the original > sbt value, not that I know this matters at this point.] >=20 > For sbt=3D=3D0xfffffed8, the callout to sleepq_timeout ends up with = values > like (a particular example): >=20 > td_sleeptimo=3D0x470d360fe5 sbinuptime=3D0x46c869f6aa >=20 > where the reporting code looks like: >=20 > static void > sleepq_timeout(void *arg) > { > struct sleepqueue_chain *sc __unused; > struct sleepqueue *sq; > struct thread *td; > void *wchan; > int wakeup_swapper; > sbintime_t sbut; // HACK!!! >=20 > td =3D arg; > wakeup_swapper =3D 0; > if (26 =3D=3D td->td_proc->p_pid) // HACK!!! > CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)", > (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name); >=20 > thread_lock(td); >=20 > sbut=3D sbinuptime(); // HACK!!! > if (td->td_sleeptimo > sbut || td->td_sleeptimo =3D=3D 0) { > /* > * The thread does not want a timeout (yet). > */ > if (26 =3D=3D td->td_proc->p_pid) // HACK!!! > CTR5(KTR_PROC, "sleepq_timeout thread not want = timeout yet: thread %p (pid %ld, %s) td_sleeptimo=3D%jx sbinuptime=3D%jx",= > (void *)td, (long)td->td_proc->p_pid, (void = *)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut); > . . . >=20 > So far sleepq_set_timeout_sbt being given sbt=3D=3D0xfffffed8 instead = of > sbt=3D=3D0xcccccbe0 seems to be an accurate indicator of if the = problem will > happen in sleepq_timeout. (But I've only a few examples so far.) >=20 I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to a hangup. So it is not a reliable predictor of the hang-up in sleep. It looks like the values are sometimes more varied than I'd seen before as well. > I'll note that the sleepq_timeout code for this case does not set up > another callout to itself for later and the sleep then continues > indefinitely. >=20 > I've not yet gotten into finding evidence for why the callout to > sleepq_timeout itself happens. Hopefully I can find some. >=20 >=20 > An example of some modified KTR_PROC output is: >=20 > . . . =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)