From owner-freebsd-hackers@freebsd.org Wed Mar 6 21:03:53 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ADDC415273FF for ; Wed, 6 Mar 2019 21:03:53 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic305-22.consmr.mail.ne1.yahoo.com (sonic305-22.consmr.mail.ne1.yahoo.com [66.163.185.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A21DA70560 for ; Wed, 6 Mar 2019 21:03:52 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: ok9IFkgVM1mDn1NyROc6pAsNPoJjWLIQ1eZslNsDEbrtjuz3QthYjLnuT9m_imE x8kHcG0P3LT0gM_jJzyZfpMf_hLJKNikukaSwy1_JgGOmitIXNwkKV9MUYSJibh7_zNic79Ik_wH AsIejzs2Qvx17ShWTM43j5R3Z48XajS8.WZ4BR3rrnhu86bxqHfH89ssV3gXQcoqFUUMse4BOoEy 2wnWtLflah9DxhzKkYynVsa8Hyc7zaRLti3OTgvI5D.oW8_flNX7gRWXITj6crJb3_yTadMYYHP1 _YnQLgHpLTePtvbgrJmgAWzyQhtVctI1HHWtTZBlmiyq795aohDShVi0WML9z1aUnDczI2BhgIeJ WR7n1iVU8Pu1CJ7LVAYgpe5CNtuR1BQxXfwwCEhePGqMDxdqUSZ2rDXYynvNILC5M.QMrZQl4eJK FJoUwKdHfIefHQ2LpRxauxlI6wW9TSsN5eASnZsQGFJMNArscbxaZaBnMZBBcbNNec7o1nNhzi71 g0HmscP41eaFZLmAUMqSZWC0tkmhomHm1ej30..gaVcO6lz.5AYV9ix7UOMNoH1XCiG_0Ksn6MFg y2TrGQRA5QgzfU6B7vAicY4KKN2ojD27hf19gDvhzp8niI8.uusz.bPknnt1ID6WBEQTY_cdUzck Jbt0D6FmbTPZEKo5_5Se1d5F3I4ZwLUhex38sw.h75AK.URMKakP5M36z5rPERZq6oKtadg_Yfh2 sQq6.oV97g3HY6s3u2MQcRTlX6B83ztQ4.mwNeV8qOHhE_HQ3r.h_2CegvO9JwKbTAN9NmY8kqaR Qa9CLX5.hpuVQbmTFIA71rve6anB7DHQROaYsuAwPg8HjVfxN2lsI.qAZkJdtk7O1h8OjpNk6GI_ 0JHiOfJLR0UkGVuYRLSh51WGEaTsa4iylXSDbvFFAof7AlYUHAXjSO7.p4GRAunkU6V55oeap0S0 vTVHZ87aN36JLhrIJyWArJVSjrcvF.mn6JCS4TeGnZpNfXMKBXRnJ6oANhyBmk1mR6vkg68b2WhZ 0YwHYLpzFNh3C8Ean2uWb4G40WYMjFMa7xfQHO9K1mdd2_9vIeLgb99ckKkkOzCQl.Q-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic305.consmr.mail.ne1.yahoo.com with HTTP; Wed, 6 Mar 2019 21:03:46 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp407.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 989e6db6d298809b3619be81032a35c7; Wed, 06 Mar 2019 21:03:43 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed] Date: Wed, 6 Mar 2019 13:03:42 -0800 References: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com> <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com> To: FreeBSD PowerPC ML , Mark Millard via freebsd-hackers In-Reply-To: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com> Message-Id: <23683875-418E-4E48-BE26-01221EABC906@yahoo.com> X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: A21DA70560 X-Spamd-Bar: +++ X-Spamd-Result: default: False [3.11 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.79)[0.791,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.31)[ip: (4.36), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.81)[0.808,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.71)[0.709,0]; RCVD_IN_DNSWL_NONE(0.00)[148.185.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 21:03:53 -0000 [I have a new observed maximum difference, having changed the code record such.] On 2019-Mar-4, at 01:40, Mark Millard wrote: > [I did some testing of other figures than testing for < 0x10.] >=20 > On 2019-Mar-3, at 13:23, Mark Millard wrote: >=20 >> [So far the hack has been successful. Details given later >> below.] >>=20 >> On 2019-Mar-2, at 21:20, Mark Millard wrote: >>=20 >>> [This note goes in a different direction compared to my >>> prior evidence report for overflows and the later activity >>> that has been happening for it. This does *not* involve >>> the patches associated with that report.] >>>=20 >>> I view the following as an evidence-gathering hack: >>> showing the change in behavior with the code changes, >>> not as directly what FreeBSD should do for powerpc64. >>> In code for defined(__powerpc64__) && defined(AIM) >>> I freely use knowledge of the PowerMac G5 context >>> instead of attempting general code. >>>=20 >>> Also: the code is set up to record some information >>> that I've been looking at via ddb. The recording is >>> not part of what changes the behavior but I decided >>> to show that code too. >>>=20 >>> It is preliminary, but, so far, the hack has avoided >>> buf*daemon* threads and pmac_thermal getting stuck >>> sleeping (or, at least, far less frequently). >>>=20 >>>=20 >>> The tbr-value hack: >>>=20 >>> =46rom what I see the G5 various cores have each tbr running at the >>> same rate but have some some offsets as far as the base time >>> goes. cpu_mp_unleash does: >>>=20 >>> ap_awake =3D 1; >>>=20 >>> /* Provide our current DEC and TB values for APs */ >>> ap_timebase =3D mftb() + 10; >>> __asm __volatile("msync; isync"); >>>=20 >>> /* Let APs continue */ >>> atomic_store_rel_int(&ap_letgo, 1); >>>=20 >>> platform_smp_timebase_sync(ap_timebase, 0); >>>=20 >>> and machdep_ap_bootstrap does: >>>=20 >>> /* >>> * Set timebase as soon as possible to meet an implicit = rendezvous >>> * from cpu_mp_unleash(), which sets ap_letgo and then = immediately >>> * sets timebase. >>> * >>> * Note that this is instrinsically racy and is only relevant = on >>> * platforms that do not support better mechanisms. >>> */ >>> platform_smp_timebase_sync(ap_timebase, 1); >>>=20 >>>=20 >>> which attempts to set the tbrs appropriately. >>>=20 >>> But on small scales of differences the various tbr >>> values from different cpus end up not well ordered >>> relative to time, synchronizes with, and the like. >>> Only large enough differences can well indicate an >>> ordering of interest. >>>=20 >>> Note: tc->tc_get_timecount(tc) only provides the >>> least signficant 32 bits of the tbr value. >>> th->th_offset_count is also 32 bits and based on >>> truncated tbr values. >>>=20 >>> So I made binuptime avoid finishing when it sees >>> a small (<0x10) step backwards for a new >>> tc->tc_get_timecount(tc) value vs. the existing >>> th->th_offset_count value (values strongly tied >>> to powerpc64 tbr values): >>>=20 >>> . . . [old code omitted] . . . >>>=20 >>> So far as I can tell, the FreeBSD code is not designed to deal >>> with small differences in tc->tc_get_timecount(tc) not actually >>> indicating a useful < vs. =3D=3D vs. > ordering relation uniquely. >>>=20 >>> (I make no claim that the hack is a proper way to deal with >>> such.) >>=20 >> I did a somewhat over 7 hours buildworld buildkernel on the >> PowerMac G5. Overall the G5 has been up over 13 hours and >> none of the buf*daemon* threads have gotten stuck sleeping. >> Nor has pmac_thermal gotten stuck. Similarly for vnlru >> and syncer: "top -HIStopid" still shows them all as >> periodically active. >>=20 >> Previously for this usefdt=3D1 context (with the modern >> VM_MAX_KERNEL_ADDRESS), going more than a few minutes >> without at least one of those threads getting stuck >> sleeping was rare on the G5 (powerpc64 example). >>=20 >> So this hack has managed to avoid finding sbinuptime() >> in sleepq_timeout being less than the earlier (by call >> structure/code sequencing) sbinuptime() in timercb that >> lead to the sleepq_timeout callout being called in the >> first place. >>=20 >> So in the sleepq_timeout callout's: >>=20 >> if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo =3D=3D = 0) { >> /* >> * The thread does not want a timeout (yet). >> */ >> } else . . . >>=20 >> td->td_sleeptimo > sbinuptime() ends up false now for small >> enough original differences. >>=20 >> This case does not set up another timeout, it just leaves the >> thread stuck sleeping, no longer doing periodic activities. >>=20 >> As stands what I did (presuming an appropriate definition >> of "small differences in the problematical direction") should >> leave this and other sbinuptime-using code with: >>=20 >> td->td_sleeptimo <=3D sbinuptime() >>=20 >> for what were originally "small" tbr value differences in the >> problematical direction (in case other places require it in >> some way). >>=20 >> If, instead, just sleepq_timeout's test could allow for >> some slop in the ordering, it could be a cheaper hack then >> looping in binuptime . >>=20 >> At this point I've no clue what a correct/efficient FreeBSD >> design for allowing the sloppy match across tbr's for different >> CPUs would be. >=20 > Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried > the each of following and they all failed: >=20 > && tim_offset-tim_cnt<0x2 > && tim_offset-tim_cnt<0x4 > && tim_offset-tim_cnt<0x8 > && tim_offset-tim_cnt<0xc I've now seen a difference of 0x11 that lead to hung up threads, hung waiting for sleep. > 0x2, 0x4, and 0x8 failed for the first boot attempt, > almost mediately having stuck-in-sleep threads. >=20 > 0xc seemed to be working for the first boot (including > a buildworld buildkernel that did not have to rebuild > much). But the 2nd boot attempt had a stuck-in-sleep > thread by the time I logged in. >=20 > By contrast, for: >=20 > && tim_offset-tim_cnt<0x10 >=20 > I've not it fail so far, after many reboots, a full > buildworld buildkernel, and running over 24 hours > (that included the somewhat over 7 hours for build > world buildkernel). But it might be that some boots > would need a bigger figure. >=20 During a ports-mgmt/poudriere-devel run I had some threads hang in sleep when the code was based on less than 0x10 differences. But I'd changed to be recording the maximum "small difference in the problematical direction" observed and so was able to see that it got a: 0x11 difference. The below is the newer code structure as far as what is recorded. It already has 0x14 instead of 0x10 for the bound it uses to control the loop. I omitted #if 0 . . . #endif code that I'm not currently using. #if defined(__powerpc64__) && defined(AIM) void binuptime(struct bintime *bt) { struct timehands *th; u_int gen; u_int tim_cnt, tim_offset; // HACK!!! (for "small difference is = problem direction loop") struct timecounter *tc; // HACK!!! (for recording other data for = inspection via ddb) u_int tim_diff; // HACK!!! uint64_t scale_factor, diff_scaled; // HACK!!! #if 1 u_int tim_wrong_order_diff=3D 0u; // HACK!!! u_int max_wrong_order_diff=3D 0u; // HACK!!! u_int wrong_order_cnt=3D 0u; // HACK!!! u_int wrong_order_offset=3D 0u; // HACK!!! #endif do { do { // HACK!!! th=3D timehands; tc=3D th->th_counter; gen=3D atomic_load_acq_int(&th->th_generation); tim_cnt=3D tc->tc_get_timecount(tc); tim_offset=3D th->th_offset_count; #if 1 tim_wrong_order_diff=3D tim_offset-tim_cnt; if ( tim_cntth_offset; tim_diff=3D (tim_cnt - tim_offset) & = tc->tc_counter_mask; scale_factor=3D th->th_scale; diff_scaled=3D scale_factor * tim_diff; bintime_addx(bt, diff_scaled); atomic_thread_fence_acq(); } while (gen =3D=3D 0 || gen !=3D th->th_generation); #if 1 // Uses direct-map addresses (mapping to the most signficant c = being masked off). // Justin H. reported that some of the 0x0..0xff addresses were = unused // and available. The 2 larger ranges that I observed to stay at = zero // were 0x20..0x7f and 0xa..0xff --so that is what I limited = myself to. if (*(volatile = uint64_t*)0xc0000000000000b0