From: Mark Millard
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed]
Date: Mon, 4 Mar 2019 01:40:18 -0800
In-Reply-To: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
Message-Id: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>

[I did some testing of figures other than < 0x10.]

On 2019-Mar-3, at 13:23, Mark Millard wrote:

> [So far the hack has been successful. Details given later
> below.]
>
> On 2019-Mar-2, at 21:20, Mark Millard wrote:
>
>> [This note goes in a different direction compared to my
>> prior evidence report for overflows and the later activity
>> that has been happening for it. This does *not* involve
>> the patches associated with that report.]
>>
>> I view the following as an evidence-gathering hack:
>> showing the change in behavior with the code changes,
>> not as directly what FreeBSD should do for powerpc64.
>> In code for defined(__powerpc64__) && defined(AIM)
>> I freely use knowledge of the PowerMac G5 context
>> instead of attempting general code.
>>
>> Also: the code is set up to record some information
>> that I've been looking at via ddb. The recording is
>> not part of what changes the behavior but I decided
>> to show that code too.
>>
>> It is preliminary, but, so far, the hack has avoided
>> buf*daemon* threads and pmac_thermal getting stuck
>> sleeping (or, at least, far less frequently).
>>
>>
>> The tbr-value hack:
>>
>> From what I see the G5's various cores each have tbr running at the
>> same rate but have some offsets as far as the base time
>> goes. cpu_mp_unleash does:
>>
>>         ap_awake = 1;
>>
>>         /* Provide our current DEC and TB values for APs */
>>         ap_timebase = mftb() + 10;
>>         __asm __volatile("msync; isync");
>>
>>         /* Let APs continue */
>>         atomic_store_rel_int(&ap_letgo, 1);
>>
>>         platform_smp_timebase_sync(ap_timebase, 0);
>>
>> and machdep_ap_bootstrap does:
>>
>>         /*
>>          * Set timebase as soon as possible to meet an implicit rendezvous
>>          * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>>          * sets timebase.
>>          *
>>          * Note that this is intrinsically racy and is only relevant on
>>          * platforms that do not support better mechanisms.
>>          */
>>         platform_smp_timebase_sync(ap_timebase, 1);
>>
>>
>> which attempts to set the tbrs appropriately.
>>
>> But on small scales of differences the various tbr
>> values from different cpus end up not well ordered
>> relative to time, "synchronizes with" relations, and the like.
>> Only large enough differences can well indicate an
>> ordering of interest.
>>
>> Note: tc->tc_get_timecount(tc) only provides the
>> least significant 32 bits of the tbr value.
>> th->th_offset_count is also 32 bits and based on
>> truncated tbr values.
>>
>> So I made binuptime avoid finishing when it sees
>> a small (<0x10) step backwards for a new
>> tc->tc_get_timecount(tc) value vs. the existing
>> th->th_offset_count value (values strongly tied
>> to powerpc64 tbr values):
>>
>> void
>> binuptime(struct bintime *bt)
>> {
>>         struct timehands *th;
>>         u_int gen;
>>
>>         struct bintime old_bt= *bt; // HACK!!!
>>         struct timecounter *tc; // HACK!!!
>>         u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>>         uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>>
>>         u_int try_cnt= 0ull; // HACK!!!
>>
>>         do {
>>                 do { // HACK!!!
>>                         th = timehands;
>>                         tc = th->th_counter;
>>                         gen = atomic_load_acq_int(&th->th_generation);
>>                         tim_cnt= tc->tc_get_timecount(tc);
>>                         tim_offset= th->th_offset_count;
>>                 } while (tim_cnt<tim_offset && tim_offset-tim_cnt<0x10);
>>                 *bt = th->th_offset;
>>                 tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>>                 scale_factor= th->th_scale;
>>                 diff_scaled= scale_factor * tim_diff;
>>                 bintime_addx(bt, diff_scaled);
>>                 freq= tc->tc_frequency;
>>                 atomic_thread_fence_acq();
>>                 try_cnt++;
>>         } while (gen == 0 || gen != th->th_generation);
>>
>>         if (*(volatile uint64_t*)0xc000000000000020==0u &&
>>             (0xffffffffffffffffull/scale_factor)< /* rest of condition lost in the archived copy */) {
>>                 *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>>                 *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>>                 *(volatile uint64_t*)0xc000000000000030= freq;
>>                 *(volatile uint64_t*)0xc000000000000038= scale_factor;
>>                 *(volatile uint64_t*)0xc000000000000040= tim_offset;
>>                 *(volatile uint64_t*)0xc000000000000048= tim_cnt;
>>                 *(volatile uint64_t*)0xc000000000000050= tim_diff;
>>                 *(volatile uint64_t*)0xc000000000000058= try_cnt;
>>                 *(volatile uint64_t*)0xc000000000000060= diff_scaled;
>>                 *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>>                 __asm__ ("sync");
>>         } else if (*(volatile uint64_t*)0xc0000000000000a0==0u &&
>>             (0xffffffffffffffffull/scale_factor)< /* rest of condition lost in the archived copy */) {
>>                 *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>>                 *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>>                 *(volatile uint64_t*)0xc0000000000000b0= freq;
>>                 *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>>                 *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>>                 *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>>                 *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>>                 *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>>                 *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>>                 *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>>                 __asm__ ("sync");
>>         }
>> }
>> #else
>> . . .
>> #endif
>>
>> So far as I can tell, the FreeBSD code is not designed to deal
>> with small differences in tc->tc_get_timecount(tc) not actually
>> indicating a useful < vs. == vs. > ordering relation uniquely.
>>
>> (I make no claim that the hack is a proper way to deal with
>> such.)
>
> I did a buildworld buildkernel that took somewhat over 7 hours
> on the PowerMac G5. Overall the G5 has been up over 13 hours and
> none of the buf*daemon* threads have gotten stuck sleeping.
> Nor has pmac_thermal gotten stuck. Similarly for vnlru
> and syncer: "top -HIStopid" still shows them all as
> periodically active.
>
> Previously for this usefdt=1 context (with the modern
> VM_MAX_KERNEL_ADDRESS), going more than a few minutes
> without at least one of those threads getting stuck
> sleeping was rare on the G5 (powerpc64 example).
>
> So this hack has managed to avoid finding sbinuptime()
> in sleepq_timeout being less than the earlier (by call
> structure/code sequencing) sbinuptime() in timercb that
> led to the sleepq_timeout callout being called in the
> first place.
>
> So in the sleepq_timeout callout's:
>
>         if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
>                 /*
>                  * The thread does not want a timeout (yet).
>                  */
>         } else . . .
>
> td->td_sleeptimo > sbinuptime() ends up false now for small
> enough original differences.
>
> When that test is true, the code does not set up another
> timeout; it just leaves the thread stuck sleeping, no longer
> doing periodic activities.
>
> As it stands, what I did (presuming an appropriate definition
> of "small differences in the problematical direction") should
> leave this and other sbinuptime-using code with:
>
> td->td_sleeptimo <= sbinuptime()
>
> for what were originally "small" tbr value differences in the
> problematical direction (in case other places require it in
> some way).
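
To make the quoted sequence concrete, here is a minimal user-space
sketch, not kernel code. The names sleeptimo_check, timeout_result, and
check_scenario are hypothetical, int64_t stands in for sbintime_t, and
the figures 1000 and 997 are invented. It only mirrors the shape of the
sleepq_timeout test quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two outcomes of the quoted test. */
enum timeout_result { STAY_ASLEEP, WAKE_UP };

/*
 * Same shape as the quoted sleepq_timeout test: if the deadline still
 * looks like it is in the future (or is unset), nothing is done and
 * no new timeout is armed.
 */
static enum timeout_result
sleeptimo_check(int64_t sleeptimo, int64_t now)
{
	if (sleeptimo > now || sleeptimo == 0)
		return (STAY_ASLEEP);
	return (WAKE_UP);
}

/* Invented figures: two clock reads that disagree by 3 units. */
static void
check_scenario(void)
{
	/* First read: the deadline has been reached, so the callout runs. */
	assert(sleeptimo_check(1000, 1000) == WAKE_UP);
	/* Later read that comes out slightly smaller: thread left asleep. */
	assert(sleeptimo_check(1000, 997) == STAY_ASLEEP);
}
```

Calling check_scenario() passes silently: the same deadline is "reached"
for the first read and "not yet reached" for the second, which is the
ordering the hack is trying to keep from being observed.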
>
> If, instead, just sleepq_timeout's test could allow for
> some slop in the ordering, it could be a cheaper hack than
> looping in binuptime.
>
> At this point I've no clue what a correct/efficient FreeBSD
> design for allowing the sloppy match across tbr's for different
> CPUs would be.

Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried each
of the following and they all failed:

&& tim_offset-tim_cnt<0x2
&& tim_offset-tim_cnt<0x4
&& tim_offset-tim_cnt<0x8
&& tim_offset-tim_cnt<0xc

0x2, 0x4, and 0x8 failed on the first boot attempt, almost
immediately having stuck-in-sleep threads.

0xc seemed to be working for the first boot (including a
buildworld buildkernel that did not have to rebuild much).
But the 2nd boot attempt had a stuck-in-sleep thread by the
time I logged in.

By contrast, for:

&& tim_offset-tim_cnt<0x10

I've not seen it fail so far, after many reboots, a full
buildworld buildkernel, and running over 24 hours (that
included the somewhat over 7 hours for buildworld
buildkernel).

But it might be that some boots would need a bigger figure.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
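
For anyone wanting to check the arithmetic of the
"&& tim_offset-tim_cnt<N" test outside the kernel, here is a minimal
user-space sketch. small_step_back and masked_diff are names made up
for illustration; they only mirror the loop condition and the tim_diff
line from the hack, with uint32_t assumed in place of u_int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when cnt is behind offset by fewer than
 * threshold ticks, which is the case the hack retries on.
 */
static bool
small_step_back(uint32_t cnt, uint32_t offset, uint32_t threshold)
{
	assert(threshold > 0);
	return (cnt < offset && offset - cnt < threshold);
}

/* Tick difference computed the way the tim_diff line does it. */
static uint32_t
masked_diff(uint32_t cnt, uint32_t offset, uint32_t mask)
{
	return ((cnt - offset) & mask);
}
```

With N=0x10, small_step_back(997, 1000, 0x10) is true, so a 3-tick
backward step is retried. With N=0x2 it is false, and
masked_diff(997, 1000, 0xffffffff) comes out as 0xfffffffd, the sort of
value the retry loop keeps from being used. A real 32-bit wrap (offset
0xfffffff0, count 0x5) is not treated as a step back, and its
masked_diff is 0x15.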