From owner-freebsd-ppc@freebsd.org Mon Feb 25 22:47:01 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ACC881518196 for ; Mon, 25 Feb 2019 22:47:01 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic317-35.consmr.mail.ne1.yahoo.com (sonic317-35.consmr.mail.ne1.yahoo.com [66.163.184.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 03A5685834 for ; Mon, 25 Feb 2019 22:46:59 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: 6.engXcVM1kf7n9XG_zhA8TkVeIFhJ2iuZIC3AP.LZ03dcjlen2zkTNIutyYwBe y.r2l35N.ongc7pcV2mOKex6EqFe6ti3tKL7zNdX9fQwAYgvQVxzhoUfBVD4JuQt2uKKUW3qvExm XGhvsDUvfMDyr8.iLhZx1UGDQMc0Fi..3b5ulHtTu645XX_.aEUjfY11wypp_sicsHCFNQEvqeF1 G342L_8uEC6mu7Y.ZeFtUgJ0d3rey.Mq6RsHfWMPbsQPSKMcvGGA4IMvBaiz0ffbfMX_xZRGscyQ gLSMNvL23.TwC8wrqZWfJAuQt2Jd7dDGc287tk9pmG1HrtK1ZsH5JAbc2BsrI4tldybxCoaKG1Xi ZQfV0lFAy.mDPHJGsELr4grqm5OLy3Bpo8wwqWNwueGtTOONnzr3pp6ErsCQ7f9MEVs2WDzdZHJz xF6nBzhommZcSJyqutJKHuBf5_VfpccLmZTST_cMbPDZOLAquun5bkzK.7V4AqNvPq1.L3jk1XgS 22oow8OHftvL.oNq3IE0Xw.7mF0T5UMaCd48IqqJuaY0sQKBsCqZsdC9rkzPpmbGCaK0NG.dMt2V om7SW47J3aUiU0T4S0IN33.QIrv2o4X9V7cyWSIn21Jh70d0aFhnU4OPR8HQ7dO1fTSG6_3JDV1q T80oGNg5wjUjaiob7T_MBz5esYKWUeluclEEy9220k85njTBE5LnW4UrIS0g7q96_6TDs3H5Rn.p KMq7Ic53sPyjFHmJ9t4B0aWkvUhoNuYDQG7dra77f9pBIDZaSQUH8EQueH6cAL.cUJTuOYXrt18B 3xNCt7FT53KQRxJrUND5Fqw4jMh0XqI8vejNnTDeYiLBWN0e9V3Zti7Q_FZl81KvuvqNgZH1.2RM e_2d3IIfoNV784o7HSaYj4kVUL2j7kQ6MAs1llI2VFevAELafbTCMB2B4owgy.UmZygrZ.7r3MIm INvBLH0QmN0o.tYIhIYJcKUDfbRMQkrT.s9w6n9CanOvx.0.B9N0WHtWAe1qY99u_cIyp2cuJMLV aUgailDgkpxQR2qlrMJEbofmRzm9r7rgt2Dz7n9JtzqEwDbi.mbffhEEae0w5qis- Received: from sonic.gate.mail.ne1.yahoo.com by sonic317.consmr.mail.ne1.yahoo.com with HTTP; Mon, 25 Feb 2019 22:46:58 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp427.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 764dc0a887305ebf83c8ac32aaf510eb; Mon, 25 Feb 2019 22:46:56 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence Message-Id: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com> Date: Mon, 25 Feb 2019 14:46:55 -0800 Cc: freebsd-hackers Hackers To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 03A5685834 X-Spamd-Bar: ++++ X-Spamd-Result: default: False [4.15 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; SUBJECT_ENDS_SPACES(0.50)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.93)[0.934,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.74)[ip: (6.47), ipnet: 66.163.184.0/21(1.28), asn: 36646(1.02), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.76)[0.762,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.72)[0.721,0]; RCVD_IN_DNSWL_NONE(0.00)[46.184.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Feb 2019 22:47:02 -0000 I adjusted what KTR_PROC does to just show just some of its pid 26 (pmac_thermal) messages and adding some extra output as well. I'll list some of that output later. I'll note that beyond pmac_thermal the buf*daemon* threads also seem to be subject to being stuck sleeping in (offsets are for a specific build of mine): mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c So far for pmac_thermal I've seen that until the failing case: sleepq_set_timeout_sbt was being given: sbt=3D=3D0xcccccbe0 pr=3D=3D0x0 = flags=3D=3D0x100 and in turn was using: prec=3D=3D0xcccccbe flags=3D=3D0x501 (of course = the used td_sleeptimo varies). [I note that 16*0xcccccbe =3D=3D 0xcccccbe0, the original sbt value, not that I know yet if this matters.] But the sequence leading to failures is different: sleepq_set_timeout_sbt was given: sbt=3D=3D0xfffffed8 pr=3D=3D0x0 = flags=3D=3D0x100 and in turn was using: prec=3D=3D0xfffffed flags=3D=3D0x501 [I note that 16*0xfffffed =3D=3D 0xfffffed0, so less than the original sbt value, not that I know this matters at this point.] For sbt=3D=3D0xfffffed8, the callout to sleepq_timeout ends up with = values like (a particular example): td_sleeptimo=3D0x470d360fe5 sbinuptime=3D0x46c869f6aa where the reporting code looks like: static void sleepq_timeout(void *arg) { struct sleepqueue_chain *sc __unused; struct sleepqueue *sq; struct thread *td; void *wchan; int wakeup_swapper; sbintime_t sbut; // HACK!!! td =3D arg; wakeup_swapper =3D 0; if (26 =3D=3D td->td_proc->p_pid) // HACK!!! CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)", (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name); =20 thread_lock(td); =20 sbut=3D sbinuptime(); // HACK!!! if (td->td_sleeptimo > sbut || td->td_sleeptimo =3D=3D 0) { /* * The thread does not want a timeout (yet). */ if (26 =3D=3D td->td_proc->p_pid) // HACK!!! CTR5(KTR_PROC, "sleepq_timeout thread not want = timeout yet: thread %p (pid %ld, %s) td_sleeptimo=3D%jx sbinuptime=3D%jx",= (void *)td, (long)td->td_proc->p_pid, (void = *)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut); . . . So far sleepq_set_timeout_sbt being given sbt=3D=3D0xfffffed8 instead of sbt=3D=3D0xcccccbe0 seems to be an accurate indicator of if the problem = will happen in sleepq_timeout. (But I've only a few examples so far.) I'll note that the sleepq_timeout code for this case does not set up another callout to itself for later and the sleep then continues indefinitely. I've not yet gotten into finding evidence for why the callout to sleepq_timeout itself happens. Hopefully I can find some. An example of some modified KTR_PROC output is: 493 (0xc000000001eb45a0:cpu3) 3278339934 = /usr/src/sys/kern/subr_sleepqueue.c.1026: sleepq_timeout thread not want = timeout yet: thread 0xc00000000a8315a0 (pid 26, pmac_thermal) = td_sleeptimo=3D470d360fe5 sbinuptime=3D46c869f6aa 492 (0xc000000001eb45a0:cpu3) 3278339919 = /usr/src/sys/kern/subr_sleepqueue.c.1015: sleepq_timeout: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 491 (0xc00000000a8315a0:cpu3) 3253979006 = /usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 = (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal) 490 (0xc00000000a8315a0:cpu3) 3253978972 = /usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: = sbt=3Dfffffed8 pr=3D0 flags=3D100 td_sleeptimo=3D470d360fe5 prec=3Dfffffed= flags=3D501 489 (0xc00000000a8315a0:cpu3) 3253978965 = /usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt = installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, = pmac_thermal) 488 (0xc00000000a8315a0:cpu3) 3253978940 = /usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, = pmac_thermal) on pmac_therm (0xc0000000015082c3) 487 (0xc00000000a8315a0:cpu3) 3253978890 = /usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 486 (0xc00000000a8315a0:cpu3) 3253978883 = /usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 = (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal) 485 (0xc000000001f6e000:cpu2) 3253978598 = /usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 484 (0xc000000001f6e000:cpu2) 3253978580 = /usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling = sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal) 483 (0xc00000000a8315a0:cpu3) 3252269011 = /usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 = (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal) 482 (0xc00000000a8315a0:cpu3) 3252268968 = /usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: = sbt=3Dcccccbe0 pr=3D0 flags=3D100 td_sleeptimo=3D46ccf486f8 prec=3Dcccccbe= flags=3D501 481 (0xc00000000a8315a0:cpu3) 3252268962 = /usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt = installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, = pmac_thermal) 480 (0xc00000000a8315a0:cpu3) 3252268935 = /usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, = pmac_thermal) on smu (0xe000000087fd1670) 479 (0xc00000000a8315a0:cpu3) 3252268793 = /usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 478 (0xc00000000a8315a0:cpu3) 3252268778 = /usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 = (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal) 477 (0xc000000001f6e000:cpu2) 3252268391 = /usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 476 (0xc000000001f6e000:cpu2) 3252268385 = /usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling = sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal) 475 (0xc00000000a8315a0:cpu2) 3250546514 = /usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 = (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal) 474 (0xc00000000a8315a0:cpu2) 3250546468 = /usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: = sbt=3Dcccccbe0 pr=3D0 flags=3D100 td_sleeptimo=3D46bfa4a7cc prec=3Dcccccbe= flags=3D501 473 (0xc00000000a8315a0:cpu2) 3250546462 = /usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt = installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, = pmac_thermal) 472 (0xc00000000a8315a0:cpu2) 3250546436 = /usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, = pmac_thermal) on smu (0xe000000087fd1670) 471 (0xc00000000a8315a0:cpu2) 3250546295 = /usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 470 (0xc00000000a8315a0:cpu2) 3250546286 = /usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 = (td_sched 0xc00000000a831ad8, pid 26, pmac_thermal) 469 (0xc000000001f6e000:cpu0) 3250545941 = /usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread = 0xc00000000a8315a0 (pid 26, pmac_thermal) 468 (0xc000000001f6e000:cpu0) 3250545934 = /usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling = sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal) . . . =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)