From owner-freebsd-ppc@freebsd.org Wed May 6 22:46:14 2020
Subject: Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311
From: Mark Millard
In-Reply-To: <20200506120215.2615b439@titan.knownspace>
Date: Wed, 6 May 2020 15:46:04 -0700
Cc: Brandon Bergren, FreeBSD PowerPC ML
Message-Id: <012EC6DD-AF2F-40EE-A9E2-A74ACE28E7A3@yahoo.com>
References: <1588493689.54538000.et1xl2l8@frv55.fwdcdn.com> <922FBA7C-039D-4852-AC8F-E85A221C2559@yahoo.com> <20200506120215.2615b439@titan.knownspace>
To: Justin Hibbits
List-Id: Porting FreeBSD to the PowerPC

On 2020-May-6, at 10:02, Justin Hibbits wrote:

> On Sun, 03 May 2020 09:56:02 -0500
> "Brandon Bergren" wrote:
>
>> On Sun, May 3, 2020, at 9:38 AM, Mark Millard via freebsd-ppc wrote:
>>>
>>> Observing and reporting the reverting result is an initial
>>> part of problem isolation. I made no request for FreeBSD
>>> to give up on using the updated jemalloc. (Unfortunately,
>>> I'm not sure what a good next step of problem isolation
>>> might be for the dual-socket PowerMac G4 context.)
>>
>> I appreciate this testing btw. The only dual-socket G4 I have (my
>> xserve g4) does not have the second socket populated, so I am
>> currently unable to test two-socket ppc32.
>>
>>> Other than reverting, no patch is known for the issue at
>>> this point. More problem isolation is needed first.
>>>
>>> While I do not have access, https://wiki.freebsd.org/powerpc
>>> lists more modern 32-bit powerpc hardware as supported:
>>> MPC85XX evaluation boards and AmigaOne A1222 (powerpcspe).
>>> (The AmigaOne A1222 seems to be dual-core/single-socket.)
>>
>> jhibbits has an A1222 that is used as an actual primary desktop, and
>> I will hopefully have one at the end of the year. And I have an RB800
>> that I use for testing.
>>
>> powerpcspe is really a different beast than aim32 though. I have been
>> mainly working on aim32 on g4 laptops, although I do have an xserve.
>>
>>>
>>> So folks with access to one of those may want to see
>>> if they also see the problem(s) with head -r360233 or
>>> later.
>>
>> Frankly, I wouldn't be surprised if this continues to be down to the
>> timebase skew somehow. I know that jemalloc tends to be sensitive to
>> time problems.
>>
>>>
>>> Another interesting context to test could be single-socket
>>> with just one core. (I might be able to do that on another
>>> old PowerMac, booting the same media after moving the
>>> media.)
>>
>> That's my primary aim32 testing platform. I have a stack of g4
>> laptops that I test on, and a magically working usb stick (ADATA C008
>> / 8GB model. For some reason it just works, I've never seen another
>> stick actually work)
>>
>>>
>>> If I understand right, the most common 32-bit powerpc
>>> tier 2 hardware platforms may still be old PowerMacs.
>>> They are considered supported and "mature", instead of
>>> just "stable". See https://wiki.freebsd.org/powerpc .
>>> However, the reality is that there are various problems
>>> for old PowerMacs (32-bit and 64-bit, at least when
>>> there is more than one socket present). The wiki page
>>> does not hint at such. (I'm not sure about
>>> single socket/multi-core PowerMacs: no access to
>>> such.)
>>
>> Yes, neither I nor jhibbits have multiple socket g4 hardware at the
>> moment, and I additionally don't have multiple socket g5 either.
>>
>>>
>>> It is certainly possible for some problem to happen
>>> that would lead to dropping the supported-status
>>> for some or all old 32-bit PowerMacs, even as tier 2.
>>> But that has not happened yet and I'd have no say in
>>> such a choice.
>>
>> From a kernel standpoint, I for one have no intention of dropping 32
>> bit support in the foreseeable future. I've actually been putting more
>> work into 32 bit than 64 bit recently in fact.
>>
>
> I currently have FreeBSD HEAD from late last week running on a dual G4
> MDD (WITNESS kernel), and no segmentation faults from dhclient.
> I'm using the following patch against jemalloc. Brandon has reported other
> results with that patch to me, so I'm not sure it's a correct patch.
>
> - Justin

Thanks.

The status of trying to track this down . . .

I normally use MALLOC_PRODUCTION= in my normally non-debug builds.
So: no jemalloc asserts. So I tried a "debug" build without
MALLOC_PRODUCTION= and so far I've not had any failures after
booting with that world-build. Nor have any asserts failed.
It has been running longer than usual without a failure, but it
would probably be a few days before I concluded much. (At some
point I'll reboot just to change the conditions some and then
give it more time as well.)

I had hoped this type of build would detect a problem earlier,
soon after things start going bad internally.

I've still no means of directly causing the problem.

I've still only seen the odd SIGSEGVs in dhclient, rpcbind,
mountd, nfsd, and sendmail.

I've really only learned:

A) Either messed-up memory contents are involved or addresses
   in registers were pointing to the wrong place. (I do not
   know which for sure.)

B) When it starts seems to be probabilistic in each of the
   5 types of context. (Possibly some data race involved?)

C) The programs do not all fail together, but over time more
   than one type can get failures.

D) Once sendmail's quickly executing subprocess starts having
   the problem during its exit, later instances seem to have it
   as well. (Inheriting bad memory content via a fork-ish
   operation that creates the subprocess?)

E) I do have an example failure in one of the contexts with
   the prior jemalloc code. (It was a MALLOC_PRODUCTION= style
   build.) (I have since gone back to the modern jemalloc, which
   seemed to expose the problem more.)

So far I've made no progress in isolating the context where
the problem starts. I've no clue how much is messed up or for
how long it has been messed up by the time a failure is
reported.
I still do not blame jemalloc: as far as I know it could be just
contributing to exposing problem(s) from other code instead of
having problems of its own. Some of the SIGSEGVs are not in
jemalloc code at the time of the SIGSEGV.

> diff --git a/contrib/jemalloc/include/jemalloc/internal/cache_bin.h b/contrib/jemalloc/include/jemalloc/internal/cache_bin.h
> index d14556a3da8..728959a448e 100644
> --- a/contrib/jemalloc/include/jemalloc/internal/cache_bin.h
> +++ b/contrib/jemalloc/include/jemalloc/internal/cache_bin.h
> @@ -88,7 +88,7 @@ JEMALLOC_ALWAYS_INLINE void *
>  cache_bin_alloc_easy(cache_bin_t *bin, bool *success) {
>  	void *ret;
>  
> -	bin->ncached--;
> +	cache_bin_sz_t cached = --bin->ncached;
>  
>  	/*
>  	 * Check for both bin->ncached == 0 and ncached < low_water
> @@ -111,7 +111,7 @@ cache_bin_alloc_easy(cache_bin_t *bin, bool *success) {
>  	 * cacheline).
>  	 */
>  	*success = true;
> -	ret = *(bin->avail - (bin->ncached + 1));
> +	ret = *(bin->avail - (cached + 1));
>  
>  	return ret;
>  }

As it stands, it is messy trying to conclude if something helps
vs. hurts vs. makes-little-difference. So I'm not sure how or
when I'll try the above. So far I've focused on reproducing the
problem, possibly in a way that gives better (earlier)
information.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
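
For readers following the quoted patch, here is a minimal sketch of
the shape it gives cache_bin_alloc_easy(). It is not jemalloc's code:
sketch_bin_t, sketch_sz_t and sketch_alloc_easy() are hypothetical
stand-ins, the 32-bit signed counter type is an assumption, and the
empty-bin branch in the middle is an assumed reconstruction of the part
the diff elides. The only point illustrated is that the decremented
count is held in a local and reused for the final index, rather than
bin->ncached being read a second time. Whether that difference matters
on the dual-socket G4 is exactly what is unresolved in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT jemalloc's real definitions. */
typedef int32_t sketch_sz_t;

typedef struct {
	sketch_sz_t low_water;	/* lowest ncached value seen so far */
	sketch_sz_t ncached;	/* number of cached pointers */
	void **avail;		/* one past the end of the pointer stack */
} sketch_bin_t;

/*
 * Same shape as the patched function: decrement once, keep the result
 * in a local, and use that local for the final index instead of
 * reading bin->ncached again.
 */
static void *
sketch_alloc_easy(sketch_bin_t *bin, bool *success)
{
	assert(bin != NULL && success != NULL);

	sketch_sz_t cached = --bin->ncached;

	/* Assumed middle section (elided by the diff): empty-bin handling. */
	if (cached <= bin->low_water) {
		bin->low_water = cached;
		if (cached == -1) {
			bin->ncached++;	/* undo the decrement */
			*success = false;
			return NULL;
		}
	}

	*success = true;
	return *(bin->avail - (cached + 1));
}
```

With three pointers cached (ncached == 3), successive calls return the
entries at avail[-3], avail[-2] and avail[-1]; a fourth call sets
*success to false and leaves ncached at 0.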