From owner-freebsd-hackers@freebsd.org Sat Mar 2 17:14:31 2019
Date: Sun, 3 Mar 2019 04:14:23 +1100 (EST)
From: Bruce Evans
To: Poul-Henning Kamp
cc: Konstantin Belousov, Ian Lepore, Mark Millard, Mark Millard via freebsd-hackers, Konstantin Belousov, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <9993.1551536230@critter.freebsd.dk>
Message-ID: <20190303032006.T4781@besplex.bde.org>

On Sat, 2 Mar 2019, Poul-Henning Kamp wrote:

> --------
> In message <20190302105652.GD68879@kib.kiev.ua>, Konstantin Belousov writes:
>
>> Using more than two timehands increases a chance of reader to try to
>> use outdated timehands.
>
> No, using only two timehands increase the chance that the reader tries
> to use the timehand which is being updated.

Then it sees the generation change and retries. We fixed the ordering of accesses to the generation count so that this is robust.
One of the timehands is always valid, so with 2 timehands there is no wait for the retry except in the very unlikely event that the generation changes for the new timehands too. A single timehands would work too, but the retries would have to wait while it is being updated.

> As long as the reader does not use the timehand being updated, using
> a one or two generations old timehand is OK.

In old versions, there were races checking the generation count. Having multiple timehands made these races less likely to matter.

> The target-value for delta-t was "a few milliseconds" when I wrote
> timecounters, if somebody has changed that since, I hope they did
> their math first.

Tickless kernels complicate things. It's surprising that tc_ticktock() works so well with them. Calls to hardclock() are not periodic, so calls to tc_ticktock() are not periodic either. It has to handle coalesced and 1/hz ticks. Too much coalescing would break it.

With my normal hz = 100, cpu0:timer interrupts still occur at at least 100 Hz. These presumably go to hardclock(), so the timing is satisfied. With hz = 1000, cpu0:timer interrupts only occur at at least 200 Hz. This is less than tc_ticktock() expects, but it still works.
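The generation-count retry described above can be sketched in user space with C11 <stdatomic.h>. This is a minimal, single-threaded illustration under stated assumptions, not the kernel code: struct th_sketch, th_update() and th_read() are hypothetical names, loosely modeled on the writer side of tc_windup() (zero the generation, update, publish a new nonzero generation, then switch the global pointer) and on the reader loop in binuptime(). The fences follow the usual C11 sequence-lock pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, much-reduced stand-in for struct timehands. */
struct th_sketch {
	atomic_uint gen;          /* generation; 0 means "update in progress" */
	_Atomic uint64_t offset;  /* stands in for th_offset */
	struct th_sketch *next;   /* the timehands form a ring */
};

static struct th_sketch th_ring[2];
static struct th_sketch *_Atomic th_current;  /* like the global timehands */

static void
th_init(uint64_t offset)
{
	for (int i = 0; i < 2; i++) {
		th_ring[i].next = &th_ring[(i + 1) % 2];
		atomic_store(&th_ring[i].offset, offset);
		atomic_store(&th_ring[i].gen, 1u);
	}
	atomic_store(&th_current, &th_ring[0]);
}

/* Writer: rewrite the timehands readers are not pointed at, then publish it. */
static void
th_update(uint64_t new_offset)
{
	struct th_sketch *th = atomic_load(&th_current)->next;
	unsigned int ogen;

	assert(th != atomic_load(&th_current));
	ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);
	atomic_store_explicit(&th->gen, 0u, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset, new_offset, memory_order_relaxed);
	if (++ogen == 0)          /* never publish generation 0 */
		ogen = 1;
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
	atomic_store_explicit(&th_current, th, memory_order_release);
}

/* Reader: retry if the generation was 0 or changed during the read. */
static uint64_t
th_read(void)
{
	struct th_sketch *th;
	unsigned int gen;
	uint64_t v;

	do {
		th = atomic_load_explicit(&th_current, memory_order_acquire);
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		v = atomic_load_explicit(&th->offset, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	return (v);
}
```

With two entries in the ring, the writer always rewrites the timehands that was current before the last update. A reader still holding that pointer sees a generation of 0 or a new generation and loops, which is the retry case described above; readers using the current timehands are not disturbed.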
Bruce

From owner-freebsd-hackers@freebsd.org Sun Mar 3 01:02:26 2019
Date: Sat, 2 Mar 2019 18:02:06 -0700
From: Alan Somers
To: FreeBSD Hackers
Subject: Adding namecache entries outside of vfs_lookup and vn_open ?

It looks like lookup and open are the only common vops that create new namecache entries. At least, those are the only ones that set MAKEENTRY in the cn_flags field. However, fuse(4)'s create-like operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough information to create a namecache entry for the newly created file.
As-is, an operation like FUSE_CREATE will almost always be followed up by a FUSE_LOOKUP, necessitating an extra round-trip to userland. Would it be possible and wise to add these newly created entries to the namecache automatically?

-Alan

From owner-freebsd-hackers@freebsd.org Sat Mar 2 17:43:24 2019
Date: Sun, 3 Mar 2019 04:43:20 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190302142521.GE68879@kib.kiev.ua>
Message-ID: <20190303041441.V4781@besplex.bde.org>

On Sat, 2 Mar 2019, Konstantin Belousov wrote:

> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>> ...
>>> So I am able to reproduce it with some surprising ease on HPET running
>>> on Haswell.
>>
>> So what is the cause of it? Maybe the tickless code doesn't generate
>> fake clock ticks right. Or it is just a library bug. The kernel has
>> to be slightly real-time to satisfy the requirement of 1 update per.
>> Applications are further from being real-time. But isn't it enough
>> for the kernel to ensure that the timehands cycle more than once per
>> second?
> No, I entered ddb as you suggested.

But using ddb is not normal. It is convenient that this fixes HPET and ACPI timecounters after using ddb, but this method doesn't help for timecounters that wrap fast. TSC-low at 2GHz wraps in 2 seconds, and i8254 wraps in a few milliseconds.

>> I don't changing this at all this. binuptime() was carefully written
>> to not need so much 64-bit arithmetic.
>>
>> If this pessimization is allowed, then it can also handle 64-bit
>> deltas.
>> Using the better kernel method:
>>
>> 	if (__predict_false(delta >= th->th_large_delta)) {
>> 		bt->sec += (scale >> 32) * (delta >> 32);
>> 		x = (scale >> 32) * (delta & 0xffffffff);
>> 		bt->sec += x >> 32;
>> 		bintime_addx(bt, x << 32);
>> 		x = (scale & 0xffffffff) * (delta >> 32);
>> 		bt->sec += x >> 32;
>> 		bintime_addx(bt, x << 32);
>> 		bintime_addx(bt, (scale & 0xffffffff) *
>> 		    (delta & 0xffffffff));
>> 	} else
>> 		bintime_addx(bt, scale * (delta & 0xffffffff));
> This only makes sense if delta is extended to uint64_t, which requires
> the pass over timecounters.

Yes, that was its point. It is a bit annoying to have a hardware timecounter like the TSC that doesn't wrap naturally, but then make it wrap by masking high bits.

The masking step is also a bit wasteful. For the TSC, it is 1 step to discard high bits at the register level, then another step to apply the mask to discard the high bits again.

>> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the
>> current method. This can be changed to do explicit 32 x 32 -> 64 bit
>> multiplications and fix the overflow problem at small extra cost on
>> 32-bit arches:
>>
>> 	/* 32-bit arches did the next multiplication implicitly. */
>> 	x = (scale >> 32) * delta;
>> 	/*
>> 	 * And they did the following shifts and most of the adds
>> 	 * implicitly too. Except shifting x left by 32 lost the
>> 	 * seconds part that the next line handles. The next line
>> 	 * is the only extra cost for them.
>> 	 */
>> 	bt->sec += x >> 32;
>> 	bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
>
> Ok, what about the following.

I'm not sure that I really want this, even if the pessimization is done. But it avoids using fls*(), so it is especially good for 32-bit systems and OK for 64-bit systems too, especially in userland where fls*() is in the fast path.
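The split into 32 x 32 -> 64 bit products discussed above can be checked in user space. A minimal sketch, not the kernel code: struct bt64 and bt_addx() are local stand-ins for struct bintime and bintime_addx(), and the tests assume a compiler with unsigned __int128 (GCC or Clang on a 64-bit target) for the reference value. The two low partial products are added to frac in separate calls because their sum can itself exceed 64 bits; separate calls let each carry reach sec.

```c
#include <stdint.h>

/* Local stand-in for struct bintime: sec plus frac in units of 2^-64 s. */
struct bt64 {
	int64_t  sec;
	uint64_t frac;
};

/* Stand-in for bintime_addx(): add to frac, carrying into sec on wrap. */
static void
bt_addx(struct bt64 *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/*
 * Add scale * delta to bt exactly, using only 32 x 32 -> 64 bit products:
 *   scale * delta = (scale_hi * delta) * 2^32 + scale_lo * delta
 */
static void
bt_add_scaled(struct bt64 *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x = (scale >> 32) * delta;          /* < 2^64, cannot overflow */

	bt->sec += x >> 32;                          /* (x >> 32) * 2^64 frac units = whole seconds */
	bt_addx(bt, x << 32);                        /* rest of the high partial product */
	bt_addx(bt, (scale & 0xffffffff) * delta);   /* low partial product */
}
```

The sketch only shows that the split is exact; whether to pay for it on the fast path is the trade-off being argued in this thread.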
>
> diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> index 3749e0473af..cfe3d96d001 100644
> --- a/lib/libc/sys/__vdso_gettimeofday.c
> +++ b/lib/libc/sys/__vdso_gettimeofday.c
> @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
>  #include
>  #include
>  #include
> +#include

Not needed with 0xffffffff instead of UINT_MAX.

The userland part is otherwise little changed.

> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..2e28f872229 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> ...
> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
>  	} while (gen == 0 || gen != th->th_generation);
>  }
>  #else /* !FFCLOCK */
> +
> +static void
> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> +{
> +	uint64_t x;
> +
> +	x = (*scale >> 32) * delta;
> +	*scale &= 0xffffffff;
> +	bt->sec += x >> 32;
> +	bintime_addx(bt, x << 32);
> +}

It is probably best to not inline the slow path, but clang tends to inline everything anyway.

I prefer my way of writing this in 3 lines. Modifying 'scale' for the next step is especially ugly and pessimal when the next step is in the caller and this function is not inlined.

> +
>  void
>  binuptime(struct bintime *bt)
>  {
>  	struct timehands *th;
> -	u_int gen;
> +	uint64_t scale;
> +	u_int delta, gen;
>
>  	do {
>  		th = timehands;
>  		gen = atomic_load_acq_int(&th->th_generation);
>  		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		/* Avoid overflow for scale * delta. */
> +		if (__predict_false(th->th_large_delta <= delta))
> +			bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, scale * delta);
> +#else
> +		/*
> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
> +		 * multiplication on 32bit arches.
> +		 */

"Also avoid overflow for ..."
> +		bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, (u_int)scale * delta);

The cast should be to uint32_t, but it is better to write it as '& 0xffffffff' as elsewhere. bintime_helper() already reduced 'scale' to 32 bits. The cast might be needed to tell the compiler this, especially when the function is not inlined. Better not to do it in the function. The function doesn't even use the reduced value. bintime_helper() is in the fast path in this case, so it should be inlined.

> +#endif
>  		atomic_thread_fence_acq();
>  	} while (gen == 0 || gen != th->th_generation);
>  }

This needs lots of testing of course.

Bruce

From owner-freebsd-hackers@freebsd.org Sun Mar 3 05:21:06 2019
Date: Sat, 2 Mar 2019 21:20:58 -0800
From: Mark Millard
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
Subject: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue

[This note goes in a different direction compared to my prior evidence report for overflows and the later activity that has been happening for it. This does *not* involve the patches associated with that report.]

I view the following as an evidence-gathering hack: showing the change in behavior with the code changes, not as directly what FreeBSD should do for powerpc64. In code for defined(__powerpc64__) && defined(AIM) I freely use knowledge of the PowerMac G5 context instead of attempting general code.

Also: the code is set up to record some information that I've been looking at via ddb. The recording is not part of what changes the behavior but I decided to show that code too.

It is preliminary, but, so far, the hack has avoided buf*daemon* threads and pmac_thermal getting stuck sleeping (or, at least, made it far less frequent).

The tbr-value hack:

From what I see, the various G5 cores each have a tbr running at the same rate, but with some offsets as far as the base time goes.
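Why a small backward step is harmful can be shown with plain unsigned arithmetic. In this sketch, masked_delta() repeats the (count - offset) & mask computation used by tc_delta() and by the hack's tim_diff below; the 2^25 Hz counter with a scale of 2^39 (roughly 2^64 / frequency) is a hypothetical round number, not the G5 timebase rate; product_overflows() and looks_like_backward_step() are illustrative helpers, the latter using the < 0x10 window described in this message.

```c
#include <stdint.h>

/* Elapsed ticks as tc_delta() computes them: (count - offset) & mask. */
static uint32_t
masked_delta(uint32_t count, uint32_t offset, uint32_t mask)
{
	return (count - offset) & mask;
}

/* Would scale * delta overflow 64 bits?  Checked without a wider type. */
static int
product_overflows(uint64_t scale, uint32_t delta)
{
	return delta != 0 && scale > UINT64_MAX / delta;
}

/*
 * Hypothetical check in the spirit of the hack: a reading that is
 * 1..slop-1 ticks *behind* offset is treated as timebase skew between
 * CPUs, not as an almost-complete wrap of the counter.
 */
static int
looks_like_backward_step(uint32_t count, uint32_t offset, uint32_t slop)
{
	uint32_t behind = offset - count;   /* wraps to a huge value if count > offset */

	return behind != 0 && behind < slop;
}
```

With that hypothetical scale, scale * delta stays below 2^64 only for deltas below 2^25 ticks, about one second, so a reading three ticks behind th_offset_count becomes a delta near 2^32 and overflows at once. A masked 32-bit value cannot distinguish "slightly behind" from "almost one full wrap ahead"; a window like 0x10 resolves the ambiguity by assumption.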
cpu_mp_unleash does:

	ap_awake = 1;

	/* Provide our current DEC and TB values for APs */
	ap_timebase = mftb() + 10;
	__asm __volatile("msync; isync");

	/* Let APs continue */
	atomic_store_rel_int(&ap_letgo, 1);

	platform_smp_timebase_sync(ap_timebase, 0);

and machdep_ap_bootstrap does:

	/*
	 * Set timebase as soon as possible to meet an implicit rendezvous
	 * from cpu_mp_unleash(), which sets ap_letgo and then immediately
	 * sets timebase.
	 *
	 * Note that this is intrinsically racy and is only relevant on
	 * platforms that do not support better mechanisms.
	 */
	platform_smp_timebase_sync(ap_timebase, 1);

which attempts to set the tbrs appropriately.

But for small differences, the tbr values from different cpus end up not well ordered relative to time, synchronizes-with, and the like. Only large enough differences can reliably indicate an ordering of interest.

Note: tc->tc_get_timecount(tc) only provides the least significant 32 bits of the tbr value. th->th_offset_count is also 32 bits and based on truncated tbr values.

So I made binuptime avoid finishing when it sees a small (<0x10) step backwards for a new tc->tc_get_timecount(tc) value vs. the existing th->th_offset_count value (values strongly tied to powerpc64 tbr values):

void
binuptime(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	struct bintime old_bt= *bt; // HACK!!!
	struct timecounter *tc; // HACK!!!
	u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
	uint64_t freq, scale_factor, diff_scaled; // HACK!!!

	u_int try_cnt= 0ull; // HACK!!!

	do {
		do { // HACK!!!
			th = timehands;
			tc = th->th_counter;
			gen = atomic_load_acq_int(&th->th_generation);
			tim_cnt= tc->tc_get_timecount(tc);
			tim_offset= th->th_offset_count;
		} while (tim_cntth_offset;
		tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
		scale_factor= th->th_scale;
		diff_scaled= scale_factor * tim_diff;
		bintime_addx(bt, diff_scaled);
		freq= tc->tc_frequency;
		atomic_thread_fence_acq();
		try_cnt++;
	} while (gen == 0 || gen != th->th_generation);

	if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor)tc_get_timecount(tc) not actually indicating a useful < vs. == vs. > ordering relation uniquely.

(I make no claim that the hack is a proper way to deal with such.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

From owner-freebsd-hackers@freebsd.org Sun Mar 3 11:03:55 2019
Date: Sun, 3 Mar 2019 13:03:46 +0200
From: Konstantin Belousov
To: Alan Somers
Cc: FreeBSD Hackers
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
Message-ID: <20190303110346.GH68879@kib.kiev.ua>

On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> It looks like lookup and open are the only common vops that create new
> namecache entries. At least, those are the only ones that set
> MAKEENTRY in the cn_flags field. However, fuse(4)'s create-like
> operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
> information to create a namecache entry for the newly created file.
> As-is, an operation like FUSE_CREATE will almost always be followed up
> by a FUSE_LOOKUP, necessitating an extra round-trip to userland.

In VFS, creation of the new file is done by VOP_CREATE() after a negative VOP_LOOKUP(). VOP_CREATE() returns the new vnode that is installed into the file.

[A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results in insertion of the created name entry into the namecache. It was done to handle a very specific situation in the core dump code, which is no longer relevant. The flag is still there.]

Similar discussion occurred some time ago.
I think that the current selection of the cases where a namecache entry is created is optimized for the scenario where extracting a large tarball does not largely affect the non-directory elements of the cache. If you do such an extraction, it is unlikely that you will access most of the files shortly.

> Would it be possible and wise to add these newly created entries to
> the namecache automatically?

Not from VFS, but the policy can be overridden by the filesystem by inserting the elements into the cache from VOPs as it finds suitable.

Does FUSE cache vnodes? I would find aggressive caching on the kernel side somewhat unexpected for it.

From owner-freebsd-hackers@freebsd.org Sun Mar 3 16:16:47 2019
Date: Sun, 3 Mar 2019 18:16:35 +0200
From: Konstantin Belousov
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303223100.B3572@besplex.bde.org>
Message-ID: <20190303161635.GJ68879@kib.kiev.ua>

On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>
> > On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >* ...
> >>>> I don't changing this at all this. binuptime() was carefully written
> >>>> to not need so much 64-bit arithmetic.
> >>>>
> >>>> If this pessimization is allowed, then it can also handle 64-bit
> >>>> deltas.
> >>>> Using the better kernel method:
> >>>>
> >>>> 	if (__predict_false(delta >= th->th_large_delta)) {
> >>>> 		bt->sec += (scale >> 32) * (delta >> 32);
> >>>> 		x = (scale >> 32) * (delta & 0xffffffff);
> >>>> 		bt->sec += x >> 32;
> >>>> 		bintime_addx(bt, x << 32);
> >>>> 		x = (scale & 0xffffffff) * (delta >> 32);
> >>>> 		bt->sec += x >> 32;
> >>>> 		bintime_addx(bt, x << 32);
> >>>> 		bintime_addx(bt, (scale & 0xffffffff) *
> >>>> 		    (delta & 0xffffffff));
> >>>> 	} else
> >>>> 		bintime_addx(bt, scale * (delta & 0xffffffff));
> >>> This only makes sense if delta is extended to uint64_t, which requires
> >>> the pass over timecounters.
> >>
> >> Yes, that was its point. It is a bit annoying to have a hardware
> >> timecounter like the TSC that doesn't wrap naturally, but then make it
> >> wrap by masking high bits.
> >>
> >> The masking step is also a bit wasteful. For the TSC, it is 1 step to
> >> discard high bits at the register level, then another step to apply the
> >> mask to discard the high bits again.
> > rdtsc-low is implemented in the natural way, after RDTSC, no register
> > combining into 64bit value is done, instead shrd operates on %edx:%eax
> > to get the final result into %eax. I am not sure what you refer to.
>
> I was referring mostly to the masking step '& tc->tc_counter_mask' and
> the lack of register combining in rdtsc().
>
> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> step. i386 used to be faster here -- the first masking step of discarding
> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> has to do a not so slow shr.

i386 cannot discard %edx after RDTSC since some bits from %edx come into the timecounter value.
amd64 cannot either, but amd64 does not need to mask out the top bits in %rax, since the whole shrdl calculation occurs in 32-bit registers, and the result is in %rax, whose top word is cleared automatically by the shrdl instruction. But the clearing is not required, since the result is unsigned int anyway. The disassembly of tsc_get_timecount_low() is very clear:

0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
0xffffffff806767e7 <+7>: rdtsc
0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
...
0xffffffff806767ed <+13>: retq

(I removed frame manipulations). > > Then the '& tc->tc_counter_mask' step has no effect. This is true. > > All this is wrapped in many layers of function calls which are quite slow > but this lets the other operations run in parallel on some CPUs. > > >>>> /* 32-bit arches did the next multiplication implicitly. */ > >>>> x = (scale >> 32) * delta; > >>>> /* > >>>> * And they did the following shifts and most of the adds > >>>> * implicitly too. Except shifting x left by 32 lost the > >>>> * seconds part that the next line handles. The next line > >>>> * is the only extra cost for them. > >>>> */ > >>>> bt->sec += x >> 32; > >>>> bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta); > >>> > >>> Ok, what about the following. > >> > >> I'm not sure that I really want this, even if the pessimization is done. > >> But it avoids using fls*(), so is especially good for 32-bit systems and > >> OK for 64-bit systems too, especially in userland where fls*() is in the > >> fast path. > > For userland I looked at the generated code, and BSR usage seems to be > > good enough, for default compilation settings with clang. > > I use gcc-4.2.1, and it doesn't do this optimization. > > I already reported this in connection with fixing calcru1(). calcru1() > is unnecessarily several times slower on i386 than on amd64 even after > avoiding using flsll() on it.
The main slowness is in converting 'usec' > to tv_sec and tv_usec, due to the bad design and implementation of the > __udivdi3 and __umoddi3 libcalls. The bad design is having to make 2 > libcalls to get the quotient and remainder. The bad implementation is > the portable C version in libkern. libgcc provides a better implementation, > but this is not available in the kernel. > > >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > >>> index 2656fb4d22f..2e28f872229 100644 > >>> --- a/sys/kern/kern_tc.c > >>> +++ b/sys/kern/kern_tc.c > >>> ... > >>> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp) > >>> } while (gen == 0 || gen != th->th_generation); > >>> } > >>> #else /* !FFCLOCK */ > >>> + > >>> +static void > >>> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta) > >>> +{ > >>> + uint64_t x; > >>> + > >>> + x = (*scale >> 32) * delta; > >>> + *scale &= 0xffffffff; > >>> + bt->sec += x >> 32; > >>> + bintime_addx(bt, x << 32); > >>> +} > >> > >> It is probably best to not inline the slow path, but clang tends to > >> inline everything anyway. > > It does not matter if it inlines it, as far as it is moved out of the > > linear sequence for the fast path. > >> > >> I prefer my way of writing this in 3 lines. Modifying 'scale' for > >> the next step is especially ugly and pessimal when the next step is > >> in the caller and this function is not inlined. > > Can you show exactly what do you want ? > > Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers, > and don't pass 'scale' indirectly to bintime_helper() and don't modify > it there. > > Oops, there is a problem. 'scale' must be reduced iff bintime_helper() > was used. Duplicate some source code so as to not need a fall-through > to the fast path. See below. Yes, this is the reason why it is passed by pointer (C has no references). 
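For readers following the by-value versus by-pointer question, here is a minimal user-space sketch of the variant Bruce describes: 'scale' is passed by value, and the slow branch repeats the low-half multiplication itself, so nothing has to fall through to the fast path. It assumes a 64-bit host, a simplified struct bintime, and local re-implementations of bintime_addx() and of the tc_windup() threshold with zero adjustment; the names scale_for_freq(), large_delta_for() and bintime_add_scaled() are hypothetical, and the kernel's __predict_false() hint is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct bintime. */
struct bintime {
	int64_t  sec;
	uint64_t frac;		/* units of 2^-64 seconds */
};

/* Add x to the fraction, carrying into sec when the fraction wraps. */
static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/* th_scale for a counter of frequency freq, as in tc_windup() with no adjustment. */
static uint64_t
scale_for_freq(uint64_t freq)
{
	assert(freq > 1);	/* freq == 1 would overflow the doubling */
	return ((((uint64_t)1 << 63) / freq) * 2);
}

/* th_large_delta: any delta below this cannot overflow th_scale * delta. */
static uint64_t
large_delta_for(uint64_t th_scale)
{
	return (((uint64_t)1 << 63) / (th_scale / 2));
}

/* Slow-path part: add (scale >> 32) * delta * 2^32, split between sec and frac. */
static void
bintime_helper(struct bintime *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x = (uint64_t)(uint32_t)(scale >> 32) * delta;

	bt->sec += x >> 32;		/* bits at or above 2^64 are whole seconds */
	bintime_addx(bt, x << 32);	/* the rest lands in the fraction */
}

/* bt += scale * delta without losing the bits above 2^64. */
static void
bintime_add_scaled(struct bintime *bt, uint64_t scale, uint64_t large_delta,
    uint32_t delta)
{
	if (delta >= large_delta) {
		bintime_helper(bt, scale, delta);
		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
	} else {
		bintime_addx(bt, scale * delta);
	}
}
```

With zero adjustment the threshold works out to exactly the counter frequency for realistic frequencies (for example 14318180 for a 14.31818 MHz HPET), so the slow branch is only taken once about a second's worth of ticks has accumulated since the last windup.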
> > >>> void > >>> binuptime(struct bintime *bt) > >>> { > >>> struct timehands *th; > >>> - u_int gen; > >>> + uint64_t scale; > >>> + u_int delta, gen; > >>> > >>> do { > >>> th = timehands; > >>> gen = atomic_load_acq_int(&th->th_generation); > >>> *bt = th->th_offset; > >>> - bintime_addx(bt, th->th_scale * tc_delta(th)); > >>> + scale = th->th_scale; > >>> + delta = tc_delta(th); > >>> +#ifdef _LP64 > >>> + /* Avoid overflow for scale * delta. */ > >>> + if (__predict_false(th->th_large_delta <= delta)) > >>> + bintime_helper(bt, &scale, delta); > >>> + bintime_addx(bt, scale * delta); > >>> +#else > >>> + /* > >>> + * Also avoid (uint64_t, uint32_t) -> uint64_t > >>> + * multiplication on 32bit arches. > >>> + */ > >> > >> "Also avoid overflow for ..." > >> > >>> + bintime_helper(bt, &scale, delta); > >>> + bintime_addx(bt, (u_int)scale * delta); > >> > >> The cast should be to uint32_t, but better write it as & 0xffffffff as > >> elsewhere. > > This is actually very broken. The cast gives a 32 x 32 -> 32 bit > multiplication, but all 64 bits of the result are needed. Yes, fixed in the updated version. > > >> > >> bintime_helper() already reduced 'scale' to 32 bits. The cast might be > >> needed to tell the compiler this, especially when the function is not > >> inlined. Better not do it in the function. The function doesn't even > >> use the reduced value. > > I used cast to use 32x32 multiplication. I am not sure that all (or any) > > compilers are smart enough to deduce that they can use 32 bit mul. > > Writing the reduction to 32 bits using a mask instead of a cast automatically > avoids the bug, but might not give the optimization. > > They do do this optimization, but might need the cast as well as the mask. > At worst, '(uint64_t)(uint32_t)(scale & 0xffffffff)', where the mask is > now redundant but the cast back to 64 bits is needed if the cast to 32 > bits is used. 
> > You already depended on them not needing the cast for the expression > '(*scale >> 32) * delta'. Here delta is 32 bits and the other operand > must remain 64 bits so that after default promotions the multiplication > is 64 x 64 -> 64 bits, but the compiler should optimize this to > 32 x 32 -> 64 bits. (*scale >> 32) would need to be cast to 32 bits > and then back to 64 bits if the compiler can't do this automatically. > > I checked what some compilers do. Both gcc-3.3.3 and gcc-4.2.1 > optimize only (uint64_t)x * y (where x and y have type uint32_t), so they > need to be helped by casts if x and y have a larger type even if > their values obviously fit in 32 bits. So the expressions should be > written as: > > (uint64_t)(uint32_t)(scale >> 32) * delta; > > and > > (uint64_t)(uint32_t)scale * delta; > > The 2 casts are always needed, but the '& 0xffffffff' operation doesn't > need to be explicit because the cast does. This is what I do now. > > >> This needs lots of testing of course. > > > > Current kernel-only part of the change is below, see the question about > > your preference for binuptime_helper(). > > > > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > > index 2656fb4d22f..6c41ab22288 100644 > > --- a/sys/kern/kern_tc.c > > +++ b/sys/kern/kern_tc.c > > @@ -72,6 +71,7 @@ struct timehands { > > struct timecounter *th_counter; > > int64_t th_adjustment; > > uint64_t th_scale; > > + uint64_t th_large_delta; > > u_int th_offset_count; > > struct bintime th_offset; > > struct bintime th_bintime; > > @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp) > > } while (gen == 0 || gen != th->th_generation); > > } > > #else /* !FFCLOCK */ > > + > > +static void > > Add __inline. This is in the fast path for 32-bit systems. Compilers do not need this hand-holding, and I prefer to avoid __inline unless really necessary. I checked with both clang 7.0 and gcc 8.3 that autoinlining did occur.
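The cast forms discussed above can be checked with a small sketch. It assumes uint32_t is unsigned int, as on the usual ILP32 and LP64 targets; the function names are hypothetical. The first function reproduces the 32 x 32 -> 32 bit truncation that made '(u_int)scale * delta' broken; the other two use the double casts, so each is a 32 x 32 -> 64 bit multiplication, and the two halves recombine to the low 64 bits of scale * delta.

```c
#include <assert.h>
#include <stdint.h>

/* Broken form: both operands are 32 bits, so the product wraps at 2^32. */
static uint64_t
low_mul_truncated(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}

/* Intended form: 32 x 32 -> 64 bit product of the low half of scale. */
static uint64_t
low_mul(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)scale * delta);
}

/* 32 x 32 -> 64 bit product of the high half of scale. */
static uint64_t
high_mul(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)(scale >> 32) * delta);
}

/* Recombined result; equals scale * delta modulo 2^64. */
static uint64_t
mul_mod_2_64(uint64_t scale, uint32_t delta)
{
	return ((high_mul(scale, delta) << 32) + low_mul(scale, delta));
}
```

For scale = 0x1ffffffff and delta = 0xffffffff, the truncated form yields 1 while the intended form yields 0xfffffffe00000001, which is the kind of silent loss the review caught.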
> > > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta) > > +{ > > + uint64_t x; > > + > > + x = (*scale >> 32) * delta; > > + *scale &= 0xffffffff; > > Remove the '*' on scale, cast (scale >> 32) to > (uint64_t)(uint32_t)(scale >> 32), and remove the change to *scale. > > > + bt->sec += x >> 32; > > + bintime_addx(bt, x << 32); > > +} > > + > > void > > binuptime(struct bintime *bt) > > { > > struct timehands *th; > > - u_int gen; > > + uint64_t scale; > > + u_int delta, gen; > > > > do { > > th = timehands; > > gen = atomic_load_acq_int(&th->th_generation); > > *bt = th->th_offset; > > - bintime_addx(bt, th->th_scale * tc_delta(th)); > > + scale = th->th_scale; > > + delta = tc_delta(th); > > +#ifdef _LP64 > > + /* Avoid overflow for scale * delta. */ > > + if (__predict_false(th->th_large_delta <= delta)) > > + bintime_helper(bt, &scale, delta); > > + bintime_addx(bt, scale * delta); > > Change to: > > if (__predict_false(th->th_large_delta <= delta)) { > bintime_helper(bt, scale, delta); > bintime_addx(bt, (scale & 0xffffffff) * delta); > } else > bintime_addx(bt, scale * delta); I do not like it, but ok. > > > +#else > > + /* > > + * Avoid both overflow as above and > > + * (uint64_t, uint32_t) -> uint64_t > > + * multiplication on 32bit arches. > > + */ > > This is a bit unclear. Better emphasize avoidance of the 64 x 32 -> 64 bit > multiplication. Something like: > > /* > * Use bintime_helper() unconditionally, since the fast > * path in the above method is not so fast here, since > * the 64 x 32 -> 64 bit multiplication is usually not > * available in hardware and emulating it using 2 > * 32 x 32 -> 64 bit multiplications uses code much > * like that in bintime_helper(). > */ > > > + bintime_helper(bt, &scale, delta); > > + bintime_addx(bt, (uint32_t)scale * delta); > > +#endif > > Remove '&' as usual, and fix this by casting the reduced scale back to > 64 bits. > > Similarly in bintime(). I merged two functions, finally. 
Having to copy the same code is too annoying for this change. So I verified that: - there is no 64bit multiplication in the generated code, for i386 both for clang 7.0 and gcc 8.3; - that everything is inlined, the only call from bintime/binuptime is the indirect call to get the timecounter value. > > Similarly in libc -- don't use the slow flsll() method in the 32-bit > case where it is especially slow. Don't use it in the 64-bit case either, > since this would need to be changed when th_large_delta is added to the > API. > > Now I don't like my method in the kernel. It is unnecessarily > complicated to have a special case, and not faster either. diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 2656fb4d22f..0fd39e25058 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -72,6 +72,7 @@ struct timehands { struct timecounter *th_counter; int64_t th_adjustment; uint64_t th_scale; + uint64_t th_large_delta; u_int th_offset_count; struct bintime th_offset; struct bintime th_bintime; @@ -351,21 +352,63 @@ fbclock_getmicrotime(struct timeval *tvp) } while (gen == 0 || gen != th->th_generation); } #else /* !FFCLOCK */ -void -binuptime(struct bintime *bt) + +static void +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta) +{ + uint64_t x; + + x = (scale >> 32) * delta; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); +} + +static void +binnouptime(struct bintime *bt, u_int off) { struct timehands *th; - u_int gen; + struct bintime *bts; + uint64_t scale; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); + bts = (struct bintime *)((vm_offset_t)th + off); + *bt = *bts; + scale = th->th_scale; + delta = tc_delta(th); +#ifdef _LP64 + if (__predict_false(th->th_large_delta <= delta)) { + /* Avoid overflow for scale * delta.
*/ + bintime_helper(bt, scale, delta); + bintime_addx(bt, (scale & 0xffffffff) * delta); + } else { + bintime_addx(bt, scale * delta); + } +#else + /* + * Use bintime_helper() unconditionally, since the fast + * path in the above method is not so fast here, since + * the 64 x 32 -> 64 bit multiplication is usually not + * available in hardware and emulating it using 2 + * 32 x 32 -> 64 bit multiplications uses code much + * like that in bintime_helper(). + */ + bintime_helper(bt, scale, delta); + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); +#endif atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } +void +binuptime(struct bintime *bt) +{ + + binnouptime(bt, __offsetof(struct timehands, th_offset)); +} + void nanouptime(struct timespec *tsp) { @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp) void bintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + binnouptime(bt, __offsetof(struct timehands, th_bintime)); } void @@ -1464,6 +1499,7 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; + th->th_large_delta = ((uint64_t)1 << 63) / scale; /* * Now that the struct timehands is again consistent, set the new From owner-freebsd-hackers@freebsd.org Sun Mar 3 16:25:26 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 07C73151015F for ; Sun, 3 Mar 2019 16:25:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client 
did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 522448A6C6; Sun, 3 Mar 2019 16:25:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x23GPIna080120 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 3 Mar 2019 18:25:21 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x23GPIna080120 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x23GPIVo080118; Sun, 3 Mar 2019 18:25:18 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 3 Mar 2019 18:25:18 +0200 From: Konstantin Belousov To: Alan Somers Cc: FreeBSD Hackers Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ? Message-ID: <20190303162518.GK68879@kib.kiev.ua> References: <20190303110346.GH68879@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2019 16:25:26 -0000 On Sun, Mar 03, 2019 at 09:02:07AM -0700, Alan Somers wrote: > On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov wrote: > > > > On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote: > > > It looks like lookup and open are the only common vops that create new > > > namecache entries. 
At least, those are the only ones that set > > > MAKEENTRY in the cn_flags field. However, fuse(4)'s create-like > > > operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough > > > information to create a namecache entry for the newly created file. > > > As-is, an operation like FUSE_CREATE will almost always be followed up > > > by a FUSE_LOOKUP, necessitating an extra round-trip to userland. > > In VFS, creation of the new file is done by VOP_CREATE() after negative > > VOP_LOOKUP(). VOP_CREATE() returns the new vnode that is installed into > > file. [A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results > > in created name entry insertion into namecache. It was done to handle > > very specific situation in core dump code, which is no longer relevant. > > The flag is still there.] > > > > Similar discussion occurred some time ago. I think that the current > > selection of the cases where namecache entry is created, is optimized > > for the scenario where extracting large tarball does not largely affect > > the non-directory elements of the cache. If you do such extraction, > > it is unlikely that you will access most of the files shortly. > > > > > Would it be possible and wise to add these newly created entries to > > > the namecache automatically? > > Not from VFS, but the policy can be overridden by the filesystem by inserting > > the elements into cache from VOPs as it finds suitable. > > So MAKEENTRY is just advisory, and there shouldn't be a problem with > inserting cache entries from fuse_nop_create even if MAKEENTRY isn't > set? I might try that. The penalty for not doing so is an extra trip > to userland, which is greater than the penalty for other file systems > not doing it. There can be problems from too aggressive caching. See below. > > > > > Does FUSE cache vnodes ? I would find aggressive caching on the kernel > > side somewhat unexpected for it. > > No, it just uses the regular vnode cache.
The unique things that it > does is it caches file attributes within the vnode, and the daemon can > request a timeout period for either the attr cache or the entry cache. > When the timeout expires, the kernel is supposed to purge (or ignore) > its cached values. This is what I mean: e.g. one of the strategies there might be to reclaim the fuse vnode on inactivation. This is very harsh, of course, but it was done by nullfs not too long ago. For a less contrived example, on NFS, with its relatively well-defined semantics, caching on the client sometimes becomes problematic. AFAIR, the nfs client re-checks mtime in strategic places, and ensures close-to-open consistency by always flushing attributes on close, at least for NFS v3. I am somewhat surprised that for FUSE it is considered safe (and useful) to cache at all. From owner-freebsd-hackers@freebsd.org Sun Mar 3 11:19:45 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9A6E11506BBC; Sun, 3 Mar 2019 11:19:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 02A5180B42; Sun, 3 Mar 2019 11:19:44 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x23BJWMX054208 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 3 Mar 2019 13:19:36 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x23BJWMX054208 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x23BJVXN054206; Sun, 3 Mar 2019 13:19:31 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender
to kostikbel@gmail.com using -f Date: Sun, 3 Mar 2019 13:19:31 +0200 From: Konstantin Belousov To: Bruce Evans Cc: Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190303111931.GI68879@kib.kiev.ua> References: <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190303041441.V4781@besplex.bde.org> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2019 11:19:45 -0000 On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote: > On Sat, 2 Mar 2019, Konstantin Belousov wrote: > > > On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote: > >> On Sat, 2 Mar 2019, Konstantin Belousov wrote: > >>> ... > >>> So I am able to reproduce it with some surprising ease on HPET running > >>> on Haswell. > >> > >> So what is the cause of it? Maybe the tickless code doesn't generate > >> fake clock ticks right. Or it is just a library bug. 
The kernel has > >> to be slightly real-time to satisfy the requirement of 1 update per. > >> Applications are further from being real-time. But isn't it enough > >> for the kernel to ensure that the timehands cycle more than once per > >> second? > > No, I entered ddb as you suggested. > > But using ddb is not normal. It is convenient that this fixes HPET and > ACPI timecounters after using ddb, but this method doesn't help for > timecounters that wrap fast. TSC-low at 2GHz wraps in 2 seconds, and > i8254 wraps in a few milliseconds. > > >> I don't changing this at all this. binuptime() was carefully written > >> to not need so much 64-bit arithmetic. > >> > >> If this pessimization is allowed, then it can also handle a 64-bit > >> deltas. Using the better kernel method: > >> > >> if (__predict_false(delta >= th->th_large_delta)) { > >> bt->sec += (scale >> 32) * (delta >> 32); > >> x = (scale >> 32) * (delta & 0xffffffff); > >> bt->sec += x >> 32; > >> bintime_addx(bt, x << 32); > >> x = (scale & 0xffffffff) * (delta >> 32); > >> bt->sec += x >> 32; > >> bintime_addx(bt, x << 32); > >> bintime_addx(bt, (scale & 0xffffffff) * > >> (delta & 0xffffffff)); > >> } else > >> bintime_addx(bt, scale * (delta & 0xffffffff)); > > This only makes sense if delta is extended to uint64_t, which requires > > the pass over timecounters. > > Yes, that was its point. It is a bit annoying to have a hardware > timecounter like the TSC that doesn't wrap naturally, but then make it > wrap by masking high bits. > > The masking step is also a bit wasteful. For the TSC, it is 1 step to > discard high bids at the register level, then another step to apply the > nask to discard th high bits again. rdtsc-low is implemented in the natural way, after RDTSC, no register combining into 64bit value is done, instead shrd operates on %edx:%eax to get the final result into %eax. I am not sure what you refer to. 
> > >> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the > >> current method. This can be changed to do explicit 32 x 32 -> 64 bit > >> multiplications and fix the overflow problem at small extra cost on > >> 32-bit arches: > >> > >> /* 32-bit arches did the next multiplication implicitly. */ > >> x = (scale >> 32) * delta; > >> /* > >> * And they did the following shifts and most of the adds > >> * implicitly too. Except shifting x left by 32 lost the > >> * seconds part that the next line handles. The next line > >> * is the only extra cost for them. > >> */ > >> bt->sec += x >> 32; > >> bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta); > > > > Ok, what about the following. > > I'm not sure that I really want this, even if the pessimization is done. > But it avoids using fls*(), so is especially good for 32-bit systems and > OK for 64-bit systems too, especially in userland where fls*() is in the > fast path. For userland I looked at the generated code, and BSR usage seems to be good enough, for default compilation settings with clang. > > > > > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c > > index 3749e0473af..cfe3d96d001 100644 > > --- a/lib/libc/sys/__vdso_gettimeofday.c > > +++ b/lib/libc/sys/__vdso_gettimeofday.c > > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); > > #include > > #include > > #include > > +#include > > Not needed with 0xffffffff instead of UINT_MAX. > > The userland part is otherwise little changed. Yes, see above. If the ABI for the shared page is going to be changed in the future, I will export th_large_delta as well. > > > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > > index 2656fb4d22f..2e28f872229 100644 > > --- a/sys/kern/kern_tc.c > > +++ b/sys/kern/kern_tc.c > > ...
> > @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp) > > } while (gen == 0 || gen != th->th_generation); > > } > > #else /* !FFCLOCK */ > > + > > +static void > > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta) > > +{ > > + uint64_t x; > > + > > + x = (*scale >> 32) * delta; > > + *scale &= 0xffffffff; > > + bt->sec += x >> 32; > > + bintime_addx(bt, x << 32); > > +} > > It is probably best to not inline the slow path, but clang tends to > inline everything anyway. It does not matter if it inlines it, as long as it is moved out of the linear sequence for the fast path. > > I prefer my way of writing this in 3 lines. Modifying 'scale' for > the next step is especially ugly and pessimal when the next step is > in the caller and this function is not inlined. Can you show exactly what you want? > > > + > > void > > binuptime(struct bintime *bt) > > { > > struct timehands *th; > > - u_int gen; > > + uint64_t scale; > > + u_int delta, gen; > > > > do { > > th = timehands; > > gen = atomic_load_acq_int(&th->th_generation); > > *bt = th->th_offset; > > - bintime_addx(bt, th->th_scale * tc_delta(th)); > > + scale = th->th_scale; > > + delta = tc_delta(th); > > +#ifdef _LP64 > > + /* Avoid overflow for scale * delta. */ > > + if (__predict_false(th->th_large_delta <= delta)) > > + bintime_helper(bt, &scale, delta); > > + bintime_addx(bt, scale * delta); > > +#else > > + /* > > + * Also avoid (uint64_t, uint32_t) -> uint64_t > > + * multiplication on 32bit arches. > > + */ > > "Also avoid overflow for ..." > > > + bintime_helper(bt, &scale, delta); > > + bintime_addx(bt, (u_int)scale * delta); > > The cast should be to uint32_t, but better write it as & 0xffffffff as > elsewhere. > > bintime_helper() already reduced 'scale' to 32 bits. The cast might be > needed to tell the compiler this, especially when the function is not > inlined. Better not do it in the function. The function doesn't even > use the reduced value.
I used cast to use 32x32 multiplication. I am not sure that all (or any) compilers are smart enough to deduce that they can use 32 bit mul. > > bintime_helper() is in the fast path in this case, so should be inlined. > > > +#endif > > atomic_thread_fence_acq(); > > } while (gen == 0 || gen != th->th_generation); > > } > > This needs lots of testing of course. Current kernel-only part of the change is below, see the question about your preference for binuptime_helper(). diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 2656fb4d22f..6c41ab22288 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -72,6 +71,7 @@ struct timehands { struct timecounter *th_counter; int64_t th_adjustment; uint64_t th_scale; + uint64_t th_large_delta; u_int th_offset_count; struct bintime th_offset; struct bintime th_bintime; @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp) } while (gen == 0 || gen != th->th_generation); } #else /* !FFCLOCK */ + +static void +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta) +{ + uint64_t x; + + x = (*scale >> 32) * delta; + *scale &= 0xffffffff; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); +} + void binuptime(struct bintime *bt) { struct timehands *th; - u_int gen; + uint64_t scale; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); + scale = th->th_scale; + delta = tc_delta(th); +#ifdef _LP64 + /* Avoid overflow for scale * delta. */ + if (__predict_false(th->th_large_delta <= delta)) + bintime_helper(bt, &scale, delta); + bintime_addx(bt, scale * delta); +#else + /* + * Avoid both overflow as above and + * (uint64_t, uint32_t) -> uint64_t + * multiplication on 32bit arches. 
+ */ + bintime_helper(bt, &scale, delta); + bintime_addx(bt, (uint32_t)scale * delta); +#endif atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } @@ -388,13 +416,29 @@ void bintime(struct bintime *bt) { struct timehands *th; - u_int gen; + uint64_t scale; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); + scale = th->th_scale; + delta = tc_delta(th); +#ifdef _LP64 + /* Avoid overflow for scale * delta. */ + if (__predict_false(th->th_large_delta <= delta)) + bintime_helper(bt, &scale, delta); + bintime_addx(bt, scale * delta); +#else + /* + * Avoid both overflow as above and + * (uint64_t, uint32_t) -> uint64_t + * multiplication on 32bit arches. + */ + bintime_helper(bt, &scale, delta); + bintime_addx(bt, (uint32_t)scale * delta); +#endif atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } @@ -1464,6 +1508,7 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; + th->th_large_delta = ((uint64_t)1 << 63) / scale; /* * Now that the struct timehands is again consistent, set the new From owner-freebsd-hackers@freebsd.org Sun Mar 3 16:02:27 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76CA0150F4E2 for ; Sun, 3 Mar 2019 16:02:27 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-lj1-f194.google.com (mail-lj1-f194.google.com [209.85.208.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E644C89C18 for ; Sun, 3 Mar 2019 16:02:26 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-lj1-f194.google.com with 
SMTP id g80so2163275ljg.6 for ; Sun, 03 Mar 2019 08:02:26 -0800 (PST) MIME-Version: 1.0 References: <20190303110346.GH68879@kib.kiev.ua> In-Reply-To: <20190303110346.GH68879@kib.kiev.ua> From: Alan Somers Date: Sun, 3 Mar 2019 09:02:07 -0700 Message-ID: Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov
Cc: FreeBSD Hackers

On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov wrote:
>
> On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> > It looks like lookup and open are the only common vops that create new namecache entries. At least, those are the only ones that set MAKEENTRY in the cn_flags field. However, fuse(4)'s create-like operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough information to create a namecache entry for the newly created file. As-is, an operation like FUSE_CREATE will almost always be followed up by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
> In VFS, creation of the new file is done by VOP_CREATE() after negative VOP_LOOKUP(). VOP_CREATE() returns the new vnode that is installed into file. [A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results in created name entry insertion into namecache. It was done to handle very specific situation in core dump code, which is no longer relevant. The flag is still there.]
>
> Similar discussion occurred some time ago. I think that the current selection of the cases where namecache entry is created, is optimized for the scenario where extracting large tarball does not largely affect the non-directory elements of the cache. If you do such extraction, it is unlikely that you will access most of the files shortly.
>
> > Would it be possible and wise to add these newly created entries to the namecache automatically?
> Not from VFS, but the policy can be overridden by the filesystem by inserting the elements into cache from VOPs as it finds suitable.

So MAKEENTRY is just advisory, and there shouldn't be a problem with inserting cache entries from fuse_nop_create even if MAKEENTRY isn't set? I might try that. The penalty for not doing so is an extra trip to userland, which is greater than the penalty for other file systems not doing it.

>
> Does FUSE cache vnodes ? I would find aggressive caching on the kernel side somewhat unexpected for it.

No, it just uses the regular vnode cache. The unique thing it does is cache file attributes within the vnode, and the daemon can request a timeout period for either the attr cache or the entry cache. When the timeout expires, the kernel is supposed to purge (or ignore) its cached values.

-Alan

From owner-freebsd-hackers@freebsd.org Sun Mar 3 16:41:06 2019
In-Reply-To: <20190303162518.GK68879@kib.kiev.ua>
From: Alan Somers
Date: Sun, 3 Mar 2019 09:40:46 -0700
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov
Cc: FreeBSD Hackers

On Sun, Mar 3, 2019 at 9:25 AM Konstantin Belousov wrote:
>
> On Sun, Mar 03, 2019 at 09:02:07AM -0700, Alan Somers wrote:
> > On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov wrote:
> > >
> > > On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> > > > It looks like lookup and open are the only common vops that create new namecache entries.
> > > > At least, those are the only ones that set MAKEENTRY in the cn_flags field. However, fuse(4)'s create-like operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough information to create a namecache entry for the newly created file. As-is, an operation like FUSE_CREATE will almost always be followed up by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
> > > In VFS, creation of the new file is done by VOP_CREATE() after negative VOP_LOOKUP(). VOP_CREATE() returns the new vnode that is installed into file. [A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results in created name entry insertion into namecache. It was done to handle very specific situation in core dump code, which is no longer relevant. The flag is still there.]
> > >
> > > Similar discussion occurred some time ago. I think that the current selection of the cases where namecache entry is created, is optimized for the scenario where extracting large tarball does not largely affect the non-directory elements of the cache. If you do such extraction, it is unlikely that you will access most of the files shortly.
> > >
> > > > Would it be possible and wise to add these newly created entries to the namecache automatically?
> > > Not from VFS, but the policy can be overridden by the filesystem by inserting the elements into cache from VOPs as it finds suitable.
> >
> > So MAKEENTRY is just advisory, and there shouldn't be a problem with inserting cache entries from fuse_nop_create even if MAKEENTRY isn't set? I might try that. The penalty for not doing so is an extra trip to userland, which is greater than the penalty for other file systems not doing it.
> There can be problems from the too aggressive caching. See below.
> >
> > > Does FUSE cache vnodes ? I would find aggressive caching on the kernel side somewhat unexpected for it.
> >
> > No, it just uses the regular vnode cache. The unique thing it does is cache file attributes within the vnode, and the daemon can request a timeout period for either the attr cache or the entry cache. When the timeout expires, the kernel is supposed to purge (or ignore) its cached values.
>
> This is what I mean, e.g. one of the strategy there might be to reclaim fuse vnode on inactivation. This is very harsh, of course, but was done by nullfs not too long time ago.

Currently fuse doesn't do anything special when the timeout expires. It only checks the timeout on lookup, and ignores the cached value if the timeout has already expired.

>
> For less contrived example, on NFS with its relatively defined semantic, caching on the client sometimes become problematic. AFAIR, nfs client re-checks mtime in strategic places, and ensures close-to-open consistency by always flushing attributes on close, at least for NFS v3.
>
> I am somewhat surprised that for FUSE it is considered safe (and useful) to cache at all.

The daemon can choose the timeout period. For local filesystems like fusefs-ext2 it might set the timeout to infinity. For simple network filesystems like fusefs-sshfs it might set the timeout to 0, disabling all kernel caching. And a more sophisticated network filesystem, like an NFSv4 client, might set the timeout to a finite non-zero time. Later versions of the fuse protocol also allow the daemon to tell the kernel to immediately expire its cache.
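The entry-cache policy described here (a daemon-chosen timeout, checked only on lookup, where 0 disables caching and an effectively infinite value suits local filesystems) can be sketched as follows. Everything in the sketch is hypothetical: struct entry_cache_sketch and the helper names are not fuse(4) code, times are plain nanosecond counts, and UINT64_MAX stands in for "infinity".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry state; all times are nanoseconds of uptime. */
struct entry_cache_sketch {
	uint64_t valid_until;	/* cached entry is usable while now < valid_until */
};

#define	ENTRY_CACHE_FOREVER	UINT64_MAX

/* Record the timeout the daemon returned with a reply received at 'now'. */
static void
entry_cache_set(struct entry_cache_sketch *ec, uint64_t now, uint64_t timeout)
{
	assert(now < ENTRY_CACHE_FOREVER);
	if (timeout >= ENTRY_CACHE_FOREVER - now)
		ec->valid_until = ENTRY_CACHE_FOREVER;	/* "infinite" timeout */
	else
		ec->valid_until = now + timeout;	/* timeout 0: never valid */
}

/*
 * Checked only at lookup time: an expired entry is simply ignored
 * (the caller would then ask the daemon again); nothing is purged.
 */
static bool
entry_cache_valid(const struct entry_cache_sketch *ec, uint64_t now)
{
	return (now < ec->valid_until);
}

/* Daemon-requested immediate expiry, as later protocol versions allow. */
static void
entry_cache_expire(struct entry_cache_sketch *ec)
{
	ec->valid_until = 0;
}
```

Checking lazily on lookup keeps expiry free of timers; the cost is that stale entries linger in memory until the next lookup touches them.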
-Alan

From owner-freebsd-hackers@freebsd.org Sun Mar 3 21:33:40 2019
From: Mark Millard
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed]
Date: Sun, 3 Mar 2019 13:23:04 -0800
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
Message-Id: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
[So far the hack has been successful. Details given later below.]

On 2019-Mar-2, at 21:20, Mark Millard wrote:

> [This note goes in a different direction compared to my
> prior evidence report for overflows and the later activity
> that has been happening for it. This does *not* involve
> the patches associated with that report.]
>
> I view the following as an evidence-gathering hack:
> showing the change in behavior with the code changes,
> not as directly what FreeBSD should do for powerpc64.
> In code for defined(__powerpc64__) && defined(AIM)
> I freely use knowledge of the PowerMac G5 context
> instead of attempting general code.
>
> Also: the code is set up to record some information
> that I've been looking at via ddb. The recording is
> not part of what changes the behavior but I decided
> to show that code too.
>
> It is preliminary, but, so far, the hack has avoided
> buf*daemon* threads and pmac_thermal getting stuck
> sleeping (or, at least, far less frequently).
>
>
> The tbr-value hack:
>
> From what I see the G5 various cores have each tbr running at the
> same rate but have some offsets as far as the base time
> goes. cpu_mp_unleash does:
>
>	ap_awake = 1;
>
>	/* Provide our current DEC and TB values for APs */
>	ap_timebase = mftb() + 10;
>	__asm __volatile("msync; isync");
>
>	/* Let APs continue */
>	atomic_store_rel_int(&ap_letgo, 1);
>
>	platform_smp_timebase_sync(ap_timebase, 0);
>
> and machdep_ap_bootstrap does:
>
>	/*
>	 * Set timebase as soon as possible to meet an implicit rendezvous
>	 * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>	 * sets timebase.
>	 *
>	 * Note that this is instrinsically racy and is only relevant on
>	 * platforms that do not support better mechanisms.
>	 */
>	platform_smp_timebase_sync(ap_timebase, 1);
>
>
> which attempts to set the tbrs appropriately.
>
> But on small scales of differences the various tbr
> values from different cpus end up not well ordered
> relative to time, synchronizes with, and the like.
> Only large enough differences can well indicate an
> ordering of interest.
>
> Note: tc->tc_get_timecount(tc) only provides the
> least significant 32 bits of the tbr value.
> th->th_offset_count is also 32 bits and based on
> truncated tbr values.
>
> So I made binuptime avoid finishing when it sees
> a small (<0x10) step backwards for a new
> tc->tc_get_timecount(tc) value vs. the existing
> th->th_offset_count value (values strongly tied
> to powerpc64 tbr values):
>
> void
> binuptime(struct bintime *bt)
> {
>	struct timehands *th;
>	u_int gen;
>
>	struct bintime old_bt= *bt; // HACK!!!
>	struct timecounter *tc; // HACK!!!
>	u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>	uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>
>	u_int try_cnt= 0ull; // HACK!!!
>
>	do {
>		do { // HACK!!!
>			th = timehands;
>			tc = th->th_counter;
>			gen = atomic_load_acq_int(&th->th_generation);
>			tim_cnt= tc->tc_get_timecount(tc);
>			tim_offset= th->th_offset_count;
>		} while (tim_cnt [...]
>		*bt = th->th_offset;
>		tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>		scale_factor= th->th_scale;
>		diff_scaled= scale_factor * tim_diff;
>		bintime_addx(bt, diff_scaled);
>		freq= tc->tc_frequency;
>		atomic_thread_fence_acq();
>		try_cnt++;
>	} while (gen == 0 || gen != th->th_generation);
>
>	if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor) [...]
>		*(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>		*(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>		*(volatile uint64_t*)0xc000000000000030= freq;
>		*(volatile uint64_t*)0xc000000000000038= scale_factor;
>		*(volatile uint64_t*)0xc000000000000040= tim_offset;
>		*(volatile uint64_t*)0xc000000000000048= tim_cnt;
>		*(volatile uint64_t*)0xc000000000000050= tim_diff;
>		*(volatile uint64_t*)0xc000000000000058= try_cnt;
>		*(volatile uint64_t*)0xc000000000000060= diff_scaled;
>		*(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>		__asm__ ("sync");
>	} else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor) [...]
>		*(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>		*(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>		*(volatile uint64_t*)0xc0000000000000b0= freq;
>		*(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>		*(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>		*(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>		*(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>		*(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>		*(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>		*(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>		__asm__ ("sync");
>	}
> }
> #else
> . . .
> #endif
>
> So far as I can tell, the FreeBSD code is not designed to deal
> with small differences in tc->tc_get_timecount(tc) not actually
> indicating a useful < vs. == vs. > ordering relation uniquely.
>
> (I make no claim that the hack is a proper way to deal with
> such.)

I did a somewhat over 7 hours buildworld buildkernel on the
PowerMac G5. Overall the G5 has been up over 13 hours and
none of the buf*daemon* threads have gotten stuck sleeping.
Nor has pmac_thermal gotten stuck. Similarly for vnlru
and syncer: "top -HIStopid" still shows them all as
periodically active.

Previously for this usefdt=1 context (with the modern
VM_MAX_KERNEL_ADDRESS), going more than a few minutes
without at least one of those threads getting stuck
sleeping was rare on the G5 (powerpc64 example).

So this hack has managed to avoid finding sbinuptime()
in sleepq_timeout being less than the earlier (by call
structure/code sequencing) sbinuptime() in timercb that
led to the sleepq_timeout callout being called in the
first place.

So in the sleepq_timeout callout's:

	if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
		/*
		 * The thread does not want a timeout (yet).
		 */
	} else . . .

td->td_sleeptimo > sbinuptime() ends up false now for small
enough original differences.

When that test is true, no further timeout is set up; the
thread is just left stuck sleeping, no longer doing periodic
activities.

As it stands, what I did (presuming an appropriate definition
of "small differences in the problematical direction") should
leave this and other sbinuptime-using code with:

td->td_sleeptimo <= sbinuptime()

for what were originally "small" tbr value differences in the
problematical direction (in case other places require it in
some way).

If, instead, just sleepq_timeout's test could allow for some
slop in the ordering, it could be a cheaper hack than looping
in binuptime.
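The failure mode being worked around can be made concrete. tc_delta() computes (tc_get_timecount(tc) - th_offset_count) & tc_counter_mask, just as tim_diff does in the hack, so a counter read that lands a few ticks behind th_offset_count turns into a delta of nearly 2^32 ticks. For a counter running well below 2^32 Hz that is far more than one second of ticks, which plausibly explains how th_scale * tc_delta(th) could overflow. The sketch below shows that wraparound and one wraparound-safe way to recognize "a few ticks behind". delta_sketch(), small_backward_step() and the slop parameter are hypothetical illustrations; they are not the hack's exact loop condition, part of which did not survive in this archive, and they are not FreeBSD code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as tc_delta(): elapsed ticks, modulo the counter width. */
static uint32_t
delta_sketch(uint32_t now, uint32_t offset_count, uint32_t counter_mask)
{
	return ((now - offset_count) & counter_mask);
}

/*
 * Hypothetical test in the spirit of the hack: is 'now' behind
 * 'offset_count' by fewer than 'slop' ticks, modulo 2^32?  Unsigned
 * subtraction keeps this correct across the 32-bit wrap.
 */
static bool
small_backward_step(uint32_t now, uint32_t offset_count, uint32_t slop)
{
	uint32_t behind = offset_count - now;

	assert(slop > 0);
	return (behind != 0 && behind < slop);
}
```

The 0x10 threshold used in the tests matches the hack's; what slop would be appropriate on other hardware is left open in the thread.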
At this point I've no clue what a correct/efficient FreeBSD
design for allowing the sloppy match across tbrs for different
CPUs would be.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

From owner-freebsd-hackers@freebsd.org Sun Mar 3 13:32:23 2019
Date: Mon, 4 Mar 2019 00:32:12 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303111931.GI68879@kib.kiev.ua>
Message-ID: <20190303223100.B3572@besplex.bde.org>
On Sun, 3 Mar 2019, Konstantin Belousov wrote:

> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>* ...
>>>> I don't changing this at all this. binuptime() was carefully written
>>>> to not need so much 64-bit arithmetic.
>>>>
>>>> If this pessimization is allowed, then it can also handle a 64-bit
>>>> deltas. Using the better kernel method:
>>>>
>>>>		if (__predict_false(delta >= th->th_large_delta)) {
>>>>			bt->sec += (scale >> 32) * (delta >> 32);
>>>>			x = (scale >> 32) * (delta & 0xffffffff);
>>>>			bt->sec += x >> 32;
>>>>			bintime_addx(bt, x << 32);
>>>>			x = (scale & 0xffffffff) * (delta >> 32);
>>>>			bt->sec += x >> 32;
>>>>			bintime_addx(bt, x << 32);
>>>>			bintime_addx(bt, (scale & 0xffffffff) *
>>>>			    (delta & 0xffffffff));
>>>>		} else
>>>>			bintime_addx(bt, scale * (delta & 0xffffffff));
>>> This only makes sense if delta is extended to uint64_t, which requires
>>> the pass over timecounters.
>>
>> Yes, that was its point. It is a bit annoying to have a hardware
>> timecounter like the TSC that doesn't wrap naturally, but then make it
>> wrap by masking high bits.
>>
>> The masking step is also a bit wasteful. For the TSC, it is 1 step to
>> discard high bits at the register level, then another step to apply the
>> mask to discard the high bits again.
> rdtsc-low is implemented in the natural way, after RDTSC, no register
> combining into 64bit value is done, instead shrd operates on %edx:%eax
> to get the final result into %eax. I am not sure what you refer to.

I was referring mostly to the masking step '& tc->tc_counter_mask' and
the lack of register combining in rdtsc().

However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
step. i386 used to be faster here -- the first masking step of discarding
%edx doesn't take any code. amd64 has to mask out the top bits in %rax.
Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
has to do a not so slow shr.

Then the '& tc->tc_counter_mask' step has no effect.

All this is wrapped in many layers of function calls which are quite slow
but this lets the other operations run in parallel on some CPUs.

>>>>		/* 32-bit arches did the next multiplication implicitly. */
>>>>		x = (scale >> 32) * delta;
>>>>		/*
>>>>		 * And they did the following shifts and most of the adds
>>>>		 * implicitly too. Except shifting x left by 32 lost the
>>>>		 * seconds part that the next line handles. The next line
>>>>		 * is the only extra cost for them.
>>>>		 */
>>>>		bt->sec += x >> 32;
>>>>		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
>>>
>>> Ok, what about the following.
>>
>> I'm not sure that I really want this, even if the pessimization is done.
>> But it avoids using fls*(), so is especially good for 32-bit systems and
>> OK for 64-bit systems too, especially in userland where fls*() is in the
>> fast path.
> For userland I looked at the generated code, and BSR usage seems to be
> good enough, for default compilation settings with clang.

I use gcc-4.2.1, and it doesn't do this optimization. I already reported
this in connection with fixing calcru1().
calcru1() is unnecessarily several times slower on i386 than on amd64 even
after avoiding using flsll() on it. The main slowness is in converting
'usec' to tv_sec and tv_usec, due to the bad design and implementation of
the __udivdi3 and __umoddi3 libcalls. The bad design is having to make
2 libcalls to get the quotient and remainder. The bad implementation is
the portable C version in libkern. libgcc provides a better
implementation, but this is not available in the kernel.

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..2e28f872229 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> ...
>>> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
>>>  	} while (gen == 0 || gen != th->th_generation);
>>>  }
>>>  #else /* !FFCLOCK */
>>> +
>>> +static void
>>> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
>>> +{
>>> +	uint64_t x;
>>> +
>>> +	x = (*scale >> 32) * delta;
>>> +	*scale &= 0xffffffff;
>>> +	bt->sec += x >> 32;
>>> +	bintime_addx(bt, x << 32);
>>> +}
>>
>> It is probably best to not inline the slow path, but clang tends to
>> inline everything anyway.
> It does not matter if it inlines it, as far as it is moved out of the
> linear sequence for the fast path.
>>
>> I prefer my way of writing this in 3 lines. Modifying 'scale' for
>> the next step is especially ugly and pessimal when the next step is
>> in the caller and this function is not inlined.
> Can you show exactly what do you want ?

Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
and don't pass 'scale' indirectly to bintime_helper() and don't modify
it there.

Oops, there is a problem. 'scale' must be reduced iff bintime_helper()
was used. Duplicate some source code so as to not need a fall-through
to the fast path. See below.
>>> void
>>> binuptime(struct bintime *bt)
>>> {
>>> 	struct timehands *th;
>>> -	u_int gen;
>>> +	uint64_t scale;
>>> +	u_int delta, gen;
>>>
>>> 	do {
>>> 		th = timehands;
>>> 		gen = atomic_load_acq_int(&th->th_generation);
>>> 		*bt = th->th_offset;
>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>> +		scale = th->th_scale;
>>> +		delta = tc_delta(th);
>>> +#ifdef _LP64
>>> +		/* Avoid overflow for scale * delta. */
>>> +		if (__predict_false(th->th_large_delta <= delta))
>>> +			bintime_helper(bt, &scale, delta);
>>> +		bintime_addx(bt, scale * delta);
>>> +#else
>>> +		/*
>>> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
>>> +		 * multiplication on 32bit arches.
>>> +		 */
>>
>> "Also avoid overflow for ..."
>>
>>> +		bintime_helper(bt, &scale, delta);
>>> +		bintime_addx(bt, (u_int)scale * delta);
>>
>> The cast should be to uint32_t, but better write it as & 0xffffffff as
>> elsewhere.

This is actually very broken. The cast gives a 32 x 32 -> 32 bit
multiplication, but all 64 bits of the result are needed.

>>
>> bintime_helper() already reduced 'scale' to 32 bits. The cast might be
>> needed to tell the compiler this, especially when the function is not
>> inlined. Better not do it in the function. The function doesn't even
>> use the reduced value.
> I used cast to use 32x32 multiplication. I am not sure that all (or any)
> compilers are smart enough to deduce that they can use 32 bit mul.

Writing the reduction to 32 bits using a mask instead of a cast
automatically avoids the bug, but might not give the optimization.
They do do this optimization, but might need the cast as well as the
mask. At worst, '(uint64_t)(uint32_t)(scale & 0xffffffff)', where the
mask is now redundant but the cast back to 64 bits is needed if the
cast to 32 bits is used. You already depended on them not needing the
cast for the expression '(*scale >> 32) * delta'.
Here delta is 32 bits and the other operand must remain 64 bits so that
after default promotions the multiplication is 64 x 64 -> 64 bits, but
the compiler should optimize this to 32 x 32 -> 64 bits. (*scale >> 32)
would need to be cast to 32 bits and then back to 64 bits if the
compiler can't do this automatically.

I checked what some compilers do. Both gcc-3.3.3 and gcc-4.2.1 optimize
only (uint64_t)x * y (where x and y have type uint32_t), so they need to
be helped by casts if x and y have a larger type even if their values
obviously fit in 32 bits. So the expressions should be written as:

	(uint64_t)(uint32_t)(scale >> 32) * delta;

and

	(uint64_t)(uint32_t)scale * delta;

The 2 casts are always needed, but the '& 0xffffffff' operation doesn't
need to be explicit because the cast does.

>> This needs lots of testing of course.
>
> Current kernel-only part of the change is below, see the question about
> your preference for binuptime_helper().
>
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..6c41ab22288 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -72,6 +71,7 @@ struct timehands {
>  	struct timecounter	*th_counter;
>  	int64_t			th_adjustment;
>  	uint64_t		th_scale;
> +	uint64_t		th_large_delta;
>  	u_int			th_offset_count;
>  	struct bintime		th_offset;
>  	struct bintime		th_bintime;
> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
>  	} while (gen == 0 || gen != th->th_generation);
>  }
>  #else /* !FFCLOCK */
> +
> +static void

Add __inline. This is in the fast path for 32-bit systems.

> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> +{
> +	uint64_t x;
> +
> +	x = (*scale >> 32) * delta;
> +	*scale &= 0xffffffff;

Remove the '*' on scale, cast (scale >> 32) to
(uint64_t)(uint32_t)(scale >> 32), and remove the change to *scale.
> +	bt->sec += x >> 32;
> +	bintime_addx(bt, x << 32);
> +}
> +
>  void
>  binuptime(struct bintime *bt)
>  {
>  	struct timehands *th;
> -	u_int gen;
> +	uint64_t scale;
> +	u_int delta, gen;
>
>  	do {
>  		th = timehands;
>  		gen = atomic_load_acq_int(&th->th_generation);
>  		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		/* Avoid overflow for scale * delta. */
> +		if (__predict_false(th->th_large_delta <= delta))
> +			bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, scale * delta);

Change to:

		if (__predict_false(th->th_large_delta <= delta)) {
			bintime_helper(bt, scale, delta);
			bintime_addx(bt, (scale & 0xffffffff) * delta);
		} else
			bintime_addx(bt, scale * delta);

> +#else
> +		/*
> +		 * Avoid both overflow as above and
> +		 * (uint64_t, uint32_t) -> uint64_t
> +		 * multiplication on 32bit arches.
> +		 */

This is a bit unclear. Better emphasize avoidance of the 64 x 32 -> 64
bit multiplication. Something like:

		/*
		 * Use bintime_helper() unconditionally, since the fast
		 * path in the above method is not so fast here, since
		 * the 64 x 32 -> 64 bit multiplication is usually not
		 * available in hardware and emulating it using 2
		 * 32 x 32 -> 64 bit multiplications uses code much
		 * like that in bintime_helper().
		 */

> +		bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, (uint32_t)scale * delta);
> +#endif

Remove '&' as usual, and fix this by casting the reduced scale back to
64 bits.

Similarly in bintime().

Similarly in libc -- don't use the slow flsll() method in the 32-bit
case where it is especially slow. Don't use it in the 64-bit case
either, since this would need to be changed when th_large_delta is
added to the API.

Now I don't like my method in the kernel. It is unnecessarily
complicated to have a special case, and not faster either.
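The _LP64 shape suggested above, combined with the double casts recommended earlier in this reply, can be exercised in userland. The sketch assumes bintime_addx() adds to the 64-bit fraction and carries into the seconds, as the <sys/time.h> inline does. struct bintime_sketch, the *_sketch helpers and add_scaled_delta() are hypothetical stand-ins; the low half uses the '(uint64_t)(uint32_t)scale * delta' form rather than '& 0xffffffff', and low_product_truncated() reproduces the broken '(uint32_t)scale * delta' form for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct bintime: seconds plus a 64-bit binary fraction. */
struct bintime_sketch {
	int64_t sec;
	uint64_t frac;
};

/* Same logic as bintime_addx(): add to the fraction, carry into sec. */
static void
bintime_addx_sketch(struct bintime_sketch *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/*
 * Adds (scale >> 32) * delta * 2^32, i.e. the high half of scale times
 * delta.  The casts request a 32 x 32 -> 64 bit multiplication.
 */
static inline void
bintime_helper_sketch(struct bintime_sketch *bt, uint64_t scale,
    uint32_t delta)
{
	uint64_t x = (uint64_t)(uint32_t)(scale >> 32) * delta;

	bt->sec += x >> 32;
	bintime_addx_sketch(bt, x << 32);
}

/* Slow path only for large deltas; scale itself is never modified. */
static void
add_scaled_delta(struct bintime_sketch *bt, uint64_t scale,
    uint64_t large_delta, uint32_t delta)
{
	if (large_delta <= delta) {
		bintime_helper_sketch(bt, scale, delta);
		bintime_addx_sketch(bt, (uint64_t)(uint32_t)scale * delta);
	} else
		bintime_addx_sketch(bt, scale * delta);
}

/* The broken form: both operands are 32 bits, so the product is too. */
static uint64_t
low_product_truncated(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}
```

With th_scale = 2^40 (a 2^24 Hz counter, so th_large_delta = 2^24), a delta of 3.5 seconds of ticks yields sec == 3 and frac == 2^63, whereas the single 64-bit product keeps only the 2^63 fraction and loses the 3 seconds.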
Bruce

From owner-freebsd-hackers@freebsd.org Sun Mar 3 18:29:54 2019
Date: Mon, 4 Mar 2019 05:29:48 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans, Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303161635.GJ68879@kib.kiev.ua>
Message-ID: <20190304043416.V5640@besplex.bde.org>

On Sun, 3 Mar 2019, Konstantin Belousov wrote:

> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> * ...
>>>> Yes, that was its point. It is a bit annoying to have a hardware timecounter like the TSC that doesn't wrap naturally, but then make it wrap by masking high bits.
>>>>
>>>> The masking step is also a bit wasteful. For the TSC, it is 1 step to discard high bits at the register level, then another step to apply the mask to discard the high bits again.
>>> rdtsc-low is implemented in the natural way, after RDTSC, no register combining into 64bit value is done, instead shrd operates on %edx:%eax to get the final result into %eax. I am not sure what you refer to.
>>
>> I was referring mostly to the masking step '& tc->tc_counter_mask' and the lack of register combining in rdtsc().
>>
>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining step. i386 used to be faster here -- the first masking step of discarding %edx doesn't take any code. amd64 has to mask out the top bits in %rax. Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64 has to do a not so slow shr.
> i386 cannot discard %edx after RDTSC since some bits from %edx come into the timecounter value.

These bits are part of the tsc-low pessimization. The shift count should always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX sometimes.

When tsc-low was new, the shift count was often larger (as much as 8), and it is still changeable by a read-only tunable, but now it is 1 in almost all cases. The code only limits the timecounter frequency to UINT_MAX, except the tunable defaults to 1 so average CPUs running at nearly 4 GHz are usually limited to about 2 GHz. The comment about this UINT_MAX doesn't match the code. The comment says int, but the code says UINT.

All that a shift count of 1 does is waste time to lose 1 bit of accuracy. This much accuracy is noise for most purposes.

The tunable is fairly undocumented. Its description is "Shift to apply for the maximum TSC frequency". Of course, it has no effect on the TSC frequency. It only affects the TSC timecounter frequency.

The cputicker normally uses the TSC without even an lfence. This use only has to be monotonic per-CPU, so this is OK. Also, any bugs hidden by discarding low bits shouldn't show up per-CPU. However, keeping the cputicker below 4G actually has some efficiency advantages. For timecounters, there are no multiplications or divisions by the frequency in the fast path, but cputicker use isn't so optimized and it does a slow 64-bit division in cputick2usec(). Keeping cpu_tick_frequency below UINT_MAX allows dividing by it in integer arithmetic in some cases. This optimization is not done.

> amd64 cannot either, but amd64 does not need to mask out top bits in %rax, since the whole shrdl calculation occurs in 32bit registers, and the result is in %rax where top word is cleared by shrdl instruction automatically. But the clearing is not required since result is unsigned int anyway.
>
> Disassembly of tsc_get_timecount_low() is very clear:
> 0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
> 0xffffffff806767e7 <+7>: rdtsc
> 0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
> ...
> 0xffffffff806767ed <+13>: retq
> (I removed frame manipulations).

It would need to mask out the top bits without the shift pessimization, since the function returns uint32_t but rdtsc() gives uint64_t. Removing the top bits is not needed since tc_delta() removes them again, but the API doesn't allow expressing this.

Without the shift pessimization, we just do rdtsc() in all cases and don't need this function call. I think this is about 5-10 cycles faster after some parallelism.

>>>> I prefer my way of writing this in 3 lines. Modifying 'scale' for the next step is especially ugly and pessimal when the next step is in the caller and this function is not inlined.
>>> Can you show exactly what you want?
>>
>> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers, and don't pass 'scale' indirectly to bintime_helper() and don't modify it there.
>>
>> Oops, there is a problem. 'scale' must be reduced iff bintime_helper() was used. Duplicate some source code so as to not need a fall-through to the fast path. See below.
> Yes, this is the reason why it is passed by pointer (C has no references).

The indirection is slow no matter how it is spelled, unless it is inlined away.

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..6c41ab22288 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> @@ -72,6 +71,7 @@ struct timehands {
>>> struct timecounter *th_counter;
>>> int64_t th_adjustment;
>>> uint64_t th_scale;
>>> + uint64_t th_large_delta;
>>> u_int th_offset_count;
>>> struct bintime th_offset;
>>> struct bintime th_bintime;
>>> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
>>> } while (gen == 0 || gen != th->th_generation);
>>> }
>>> #else /* !FFCLOCK */
>>> +
>>> +static void
>>
>> Add __inline. This is in the fast path for 32-bit systems.
> Compilers do not need this hand-holding, and I prefer to avoid __inline unless really necessary. I checked with both clang 7.0 and gcc 8.3 that autoinlining did occur.

But they do. I don't use either of these compilers, and turn off inlining as much as possible anyway using -fno-inline -fno-inline-functions-called-once (this is very broken in clang -- -fno-inline turns off inlining of even functions declared as __inline (like curthread), and clang doesn't support -fno-inline-functions-called-once).

>> ...
>> Similarly in bintime().
> I merged two functions, finally. Having to copy the same code is too annoying for this change.
>
> So I verified that:
> - there is no 64bit multiplication in the generated code, for i386 both for clang 7.0 and gcc 8.3;
> - that everything is inlined, the only call from bintime/binuptime is the indirect call to get the timecounter value.

I will have to fix it for compilers that I use.

> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..0fd39e25058 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
+ ...
> +static void
> +binnouptime(struct bintime *bt, u_int off)
> {
> struct timehands *th;
> - u_int gen;
> + struct bintime *bts;
> + uint64_t scale;
> + u_int delta, gen;
>
> do {
> th = timehands;
> gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_offset;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> + bts = (struct bintime *)(vm_offset_t)th + off;

I don't like the merging. It obscures the code with conversions like this.

> + *bt = *bts;
> + scale = th->th_scale;
> + delta = tc_delta(th);
> +#ifdef _LP64
> + if (__predict_false(th->th_large_delta <= delta)) {
> + /* Avoid overflow for scale * delta. */
> + bintime_helper(bt, scale, delta);
> + bintime_addx(bt, (scale & 0xffffffff) * delta);
> + } else {
> + bintime_addx(bt, scale * delta);
> + }
> +#else
> + /*
> + * Use bintime_helper() unconditionally, since the fast
> + * path in the above method is not so fast here, since
> + * the 64 x 32 -> 64 bit multiplication is usually not
> + * available in hardware and emulating it using 2
> + * 32 x 32 -> 64 bit multiplications uses code much
> + * like that in bintime_helper().
> + */
> + bintime_helper(bt, scale, delta);
> + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> +#endif

Check that this method is really better. Without this, the complicated part is about half as large and duplicating it is smaller than this version.

> @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
> void
> bintime(struct bintime *bt)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_bintime;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);

Duplicating this loop is much better than obfuscating it using inline functions. This loop was almost duplicated (except for the delta calculation) in no less than 17 functions in kern_tc.c (9 tc ones and 8 ffclock ones). Now it is only duplicated 16 times.
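To make the _LP64 branch in the patch above concrete, here is a standalone sketch of the threshold dispatch. It assumes th_large_delta is meant to be the smallest delta for which scale * delta can exceed 64 bits; how kern_tc.c actually computes th_large_delta is not shown in this thread, so large_delta_for() is a hypothetical stand-in, as are bintime_add_delta() and the simplified struct bintime.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct bintime (illustration only). */
struct bintime {
	int64_t  sec;
	uint64_t frac;		/* units of 2^-64 s */
};

static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)	/* frac wrapped: carry into sec */
		bt->sec++;
}

/*
 * Hypothetical: the smallest delta for which scale * delta can
 * overflow 64 bits.  Any delta <= UINT64_MAX / scale is safe.
 */
static uint64_t
large_delta_for(uint64_t scale)
{
	if (scale <= 1)
		return (UINT64_MAX);	/* no 32-bit delta can overflow */
	return (UINT64_MAX / scale + 1);
}

/* Same shape as the _LP64 branch: fast path unless delta is large. */
static void
bintime_add_delta(struct bintime *bt, uint64_t scale, uint64_t large_delta,
    uint32_t delta)
{
	uint64_t x;

	if (delta >= large_delta) {
		/* Rare: split scale so each product is 32 x 32 -> 64. */
		x = (uint64_t)(uint32_t)(scale >> 32) * delta;
		bt->sec += x >> 32;
		bintime_addx(bt, x << 32);
		bintime_addx(bt, (scale & 0xffffffff) * delta);
	} else {
		/* Common: one 64 x 64 -> 64 multiply that cannot overflow. */
		bintime_addx(bt, scale * delta);
	}
}
```

With scale = 2^34 the threshold is 2^30: delta = 2^30 - 1 stays on the fast path and gives frac = 2^64 - 2^34, while delta = 2^30 takes the split path and gives exactly 1 second. Whether the extra compare beats always using the split form is the trade-off debated in the thread; this sketch does not settle it.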
> + binnouptime(bt, __offsetof(struct timehands, th_bintime));
> }
>
> void

Bruce

From owner-freebsd-hackers@freebsd.org Mon Mar 4 09:40:32 2019
From: Mark Millard
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed]
Date: Mon, 4 Mar 2019 01:40:18 -0800
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
In-Reply-To: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
Message-Id: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>

[I did some testing of other figures than testing for < 0x10.]

On 2019-Mar-3, at 13:23, Mark Millard wrote:

> [So far the hack has been successful. Details given later below.]
>
> On 2019-Mar-2, at 21:20, Mark Millard wrote:
>
>> [This note goes in a different direction compared to my prior evidence report for overflows and the later activity that has been happening for it. This does *not* involve the patches associated with that report.]
>>
>> I view the following as an evidence-gathering hack: showing the change in behavior with the code changes, not as directly what FreeBSD should do for powerpc64. In code for defined(__powerpc64__) && defined(AIM) I freely use knowledge of the PowerMac G5 context instead of attempting general code.
>>
>> Also: the code is set up to record some information that I've been looking at via ddb. The recording is not part of what changes the behavior but I decided to show that code too.
>>
>> It is preliminary, but, so far, the hack has avoided buf*daemon* threads and pmac_thermal getting stuck sleeping (or, at least, far less frequently).
>>
>>
>> The tbr-value hack:
>>
>> From what I see, the G5's various cores each have a tbr running at the same rate but with some offsets as far as the base time goes. cpu_mp_unleash does:
>>
>> ap_awake = 1;
>>
>> /* Provide our current DEC and TB values for APs */
>> ap_timebase = mftb() + 10;
>> __asm __volatile("msync; isync");
>>
>> /* Let APs continue */
>> atomic_store_rel_int(&ap_letgo, 1);
>>
>> platform_smp_timebase_sync(ap_timebase, 0);
>>
>> and machdep_ap_bootstrap does:
>>
>> /*
>> * Set timebase as soon as possible to meet an implicit rendezvous
>> * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>> * sets timebase.
>> *
>> * Note that this is intrinsically racy and is only relevant on
>> * platforms that do not support better mechanisms.
>> */
>> platform_smp_timebase_sync(ap_timebase, 1);
>>
>>
>> which attempts to set the tbrs appropriately.
>>
>> But on small scales of differences the various tbr values from different cpus end up not well ordered relative to time, synchronizes with, and the like. Only large enough differences can well indicate an ordering of interest.
>>
>> Note: tc->tc_get_timecount(tc) only provides the least significant 32 bits of the tbr value. th->th_offset_count is also 32 bits and based on truncated tbr values.
>>
>> So I made binuptime avoid finishing when it sees a small (<0x10) step backwards for a new tc->tc_get_timecount(tc) value vs. the existing th->th_offset_count value (values strongly tied to powerpc64 tbr values):
>>
>> void
>> binuptime(struct bintime *bt)
>> {
>> struct timehands *th;
>> u_int gen;
>>
>> struct bintime old_bt= *bt; // HACK!!!
>> struct timecounter *tc; // HACK!!!
>> u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>> uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>>
>> u_int try_cnt= 0ull; // HACK!!!
>>
>> do {
>> do { // HACK!!!
>> th = timehands;
>> tc = th->th_counter;
>> gen = atomic_load_acq_int(&th->th_generation);
>> tim_cnt= tc->tc_get_timecount(tc);
>> tim_offset= th->th_offset_count;
>> } while (tim_cnt
>> *bt = th->th_offset;
>> tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>> scale_factor= th->th_scale;
>> diff_scaled= scale_factor * tim_diff;
>> bintime_addx(bt, diff_scaled);
>> freq= tc->tc_frequency;
>> atomic_thread_fence_acq();
>> try_cnt++;
>> } while (gen == 0 || gen != th->th_generation);
>>
>> if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor)
>> *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>> *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>> *(volatile uint64_t*)0xc000000000000030= freq;
>> *(volatile uint64_t*)0xc000000000000038= scale_factor;
>> *(volatile uint64_t*)0xc000000000000040= tim_offset;
>> *(volatile uint64_t*)0xc000000000000048= tim_cnt;
>> *(volatile uint64_t*)0xc000000000000050= tim_diff;
>> *(volatile uint64_t*)0xc000000000000058= try_cnt;
>> *(volatile uint64_t*)0xc000000000000060= diff_scaled;
>> *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>> __asm__ ("sync");
>> } else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor)
>> *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>> *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>> *(volatile uint64_t*)0xc0000000000000b0= freq;
>> *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>> *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>> *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>> *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>> *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>> *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>> *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>> __asm__ ("sync");
>> }
>> }
>> #else
>> . . .
>> #endif
>>
>> So far as I can tell, the FreeBSD code is not designed to deal with small differences in tc->tc_get_timecount(tc) not actually indicating a useful < vs. == vs. > ordering relation uniquely.
>>
>> (I make no claim that the hack is a proper way to deal with such.)
>
> I did a somewhat over 7 hours buildworld buildkernel on the PowerMac G5. Overall the G5 has been up over 13 hours and none of the buf*daemon* threads have gotten stuck sleeping. Nor has pmac_thermal gotten stuck. Similarly for vnlru and syncer: "top -HIStopid" still shows them all as periodically active.
>
> Previously for this usefdt=1 context (with the modern VM_MAX_KERNEL_ADDRESS), going more than a few minutes without at least one of those threads getting stuck sleeping was rare on the G5 (powerpc64 example).
>
> So this hack has managed to avoid finding sbinuptime() in sleepq_timeout being less than the earlier (by call structure/code sequencing) sbinuptime() in timercb that led to the sleepq_timeout callout being called in the first place.
>
> So in the sleepq_timeout callout's:
>
> if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
> /*
> * The thread does not want a timeout (yet).
> */
> } else . . .
>
> td->td_sleeptimo > sbinuptime() ends up false now for small enough original differences.
>
> This case does not set up another timeout; it just leaves the thread stuck sleeping, no longer doing periodic activities.
>
> As it stands, what I did (presuming an appropriate definition of "small differences in the problematical direction") should leave this and other sbinuptime-using code with:
>
> td->td_sleeptimo <= sbinuptime()
>
> for what were originally "small" tbr value differences in the problematical direction (in case other places require it in some way).
>
> If, instead, just sleepq_timeout's test could allow for some slop in the ordering, it could be a cheaper hack than looping in binuptime.
>
> At this point I've no clue what a correct/efficient FreeBSD design for allowing the sloppy match across tbr's for different CPUs would be.

Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried each of the following and they all failed:

&& tim_offset-tim_cnt<0x2
&& tim_offset-tim_cnt<0x4
&& tim_offset-tim_cnt<0x8
&& tim_offset-tim_cnt<0xc

0x2, 0x4, and 0x8 failed on the first boot attempt, almost immediately having stuck-in-sleep threads.

0xc seemed to be working for the first boot (including a buildworld buildkernel that did not have to rebuild much). But the 2nd boot attempt had a stuck-in-sleep thread by the time I logged in.
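As an aside on why small backward steps matter: tc_delta() subtracts counts modulo the counter width, so a count read on a CPU whose timebase is a few ticks behind th_offset_count looks like almost a full counter period of elapsed time. Since th_scale is roughly 2^64 / frequency, th_scale * delta overflows whenever delta exceeds about one second's worth of ticks, which such a wrapped delta does for any counter slower than about 4.3 GHz. A minimal sketch, assuming a 32-bit counter mask; counter_delta() and is_small_backstep() are hypothetical helpers, not Mark's hack and not FreeBSD code.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks as tc_delta() computes them: modulo the counter width. */
static uint32_t
counter_delta(uint32_t now, uint32_t offset_count, uint32_t mask)
{
	return ((now - offset_count) & mask);
}

/*
 * Hypothetical predicate: true when "now" is behind offset_count by
 * fewer than "slop" ticks (a small backward step, e.g. from another
 * CPU's slightly offset timebase), as opposed to a real forward step.
 * Modular subtraction keeps this correct across counter wraparound.
 */
static int
is_small_backstep(uint32_t now, uint32_t offset_count, uint32_t mask,
    uint32_t slop)
{
	uint32_t back = (offset_count - now) & mask;

	return (back != 0 && back < slop);
}
```

With mask 0xffffffff, a count 3 ticks behind th_offset_count gives counter_delta() == 0xfffffffd, while is_small_backstep() with a slop of 0x10 reports it as a backward step rather than elapsed time.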
=3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-hackers@freebsd.org Mon Mar 4 11:42:00 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 82D9F150DC4D; Mon, 4 Mar 2019 11:42:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C4A1C95862; Mon, 4 Mar 2019 11:41:59 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x24BfplY084864 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 4 Mar 2019 13:41:54 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x24BfplY084864 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x24BfopB084863; Mon, 4 Mar 2019 13:41:50 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 4 Mar 2019 13:41:50 +0200 From: Konstantin Belousov To: Bruce Evans Cc: Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190304114150.GM68879@kib.kiev.ua> References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> 
<20190304043416.V5640@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190304043416.V5640@besplex.bde.org> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 11:42:00 -0000 On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote: > On Sun, 3 Mar 2019, Konstantin Belousov wrote: > > > On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote: > >> On Sun, 3 Mar 2019, Konstantin Belousov wrote: > >> > >>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote: > >>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote: > >>>> > >>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote: > >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote: > > * ... > >>>> Yes, that was its point. It is a bit annoying to have a hardware > >>>> timecounter like the TSC that doesn't wrap naturally, but then make it > >>>> wrap by masking high bits. > >>>> > >>>> The masking step is also a bit wasteful. For the TSC, it is 1 step to > >>>> discard high bids at the register level, then another step to apply the > >>>> nask to discard th high bits again. > >>> rdtsc-low is implemented in the natural way, after RDTSC, no register > >>> combining into 64bit value is done, instead shrd operates on %edx:%eax > >>> to get the final result into %eax. I am not sure what you refer to. > >> > >> I was referring mostly to the masking step '& tc->tc_counter_mask' and > >> the lack of register combining in rdtsc(). 
> >>
> >> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining step. i386 used to be faster here -- the first masking step of discarding %edx doesn't take any code. amd64 has to mask out the top bits in %rax. Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64 has to do a not so slow shr.
> > i386 cannot discard %edx after RDTSC since some bits from %edx come into the timecounter value.
>
> These bits are part of the tsc-low pessimization. The shift count should always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX sometimes.
>
> When tsc-low was new, the shift count was often larger (as much as 8), and it is still changeable by a read-only tunable, but now it is 1 in almost all cases. The code only limits the timecounter frequency to UINT_MAX, except the tunable defaults to 1 so average CPUs running at nearly 4 GHz are usually limited to about 2 GHz. The comment about this UINT_MAX doesn't match the code. The comment says int, but the code says UINT.
>
> All that a shift count of 1 does is waste time to lose 1 bit of accuracy. This much accuracy is noise for most purposes.
>
> The tunable is fairly undocumented. Its description is "Shift to apply for the maximum TSC frequency". Of course, it has no effect on the TSC frequency. It only affects the TSC timecounter frequency.

I suspect that the shift of 1 (at least) hides cross-socket inaccuracy. Otherwise, I think, some multi-socket machines would start showing a detectably backward-counting bintime(). At frequencies of 4 GHz and above (Intel has 5 GHz part numbers), I do not think that the stability of a 100 MHz crystal and the on-board traces is enough to avoid that.

We can try to set the tsc-low shift count to 0 (but keep lfence) and see what is going on in HEAD, but I am afraid that the HEAD user population is not representative enough to catch the issue with certainty. Moreover, it is unclear to me how to diagnose the cause, e.g. I would expect the sleeps to hang on timeouts, as was reported from the very beginning of this thread. How would we root-cause it?

>
> The cputicker normally uses the TSC without even an lfence. This use only has to be monotonic per-CPU, so this is OK. Also, any bugs hidden by discarding low bits shouldn't show up per-CPU. However, keeping the cputicker below 4G actually has some efficiency advantages. For timecounters, there are no multiplications or divisions by the frequency in the fast path, but cputicker use isn't so optimized and it does a slow 64-bit division in cputick2usec(). Keeping cpu_tick_frequency below UINT_MAX allows dividing by it in integer arithmetic in some cases. This optimization is not done.
>
> > amd64 cannot either, but amd64 does not need to mask out top bits in %rax, since the whole shrdl calculation occurs in 32bit registers, and the result is in %rax where top word is cleared by shrdl instruction automatically. But the clearing is not required since result is unsigned int anyway.
> >
> > Disassembly of tsc_get_timecount_low() is very clear:
> > 0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
> > 0xffffffff806767e7 <+7>: rdtsc
> > 0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
> > ...
> > 0xffffffff806767ed <+13>: retq
> > (I removed frame manipulations).
>
> It would need to mask out the top bits without the shift pessimization, since the function returns uint32_t but rdtsc() gives uint64_t. Removing the top bits is not needed since tc_delta() removes them again, but the API doesn't allow expressing this.
>
> Without the shift pessimization, we just do rdtsc() in all cases and don't need this function call. I think this is about 5-10 cycles faster after some parallelism.
>
> >>>> I prefer my way of writing this in 3 lines. Modifying 'scale' for the next step is especially ugly and pessimal when the next step is in the caller and this function is not inlined.
> >>> Can you show exactly what you want?
> >>
> >> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers, and don't pass 'scale' indirectly to bintime_helper() and don't modify it there.
> >>
> >> Oops, there is a problem. 'scale' must be reduced iff bintime_helper() was used. Duplicate some source code so as to not need a fall-through to the fast path. See below.
> > Yes, this is the reason why it is passed by pointer (C has no references).
>
> The indirection is slow no matter how it is spelled, unless it is inlined away.
>
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 2656fb4d22f..6c41ab22288 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> @@ -72,6 +71,7 @@ struct timehands {
> >>> struct timecounter *th_counter;
> >>> int64_t th_adjustment;
> >>> uint64_t th_scale;
> >>> + uint64_t th_large_delta;
> >>> u_int th_offset_count;
> >>> struct bintime th_offset;
> >>> struct bintime th_bintime;
> >>> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
> >>> } while (gen == 0 || gen != th->th_generation);
> >>> }
> >>> #else /* !FFCLOCK */
> >>> +
> >>> +static void
> >>
> >> Add __inline. This is in the fast path for 32-bit systems.
> > Compilers do not need this hand-holding, and I prefer to avoid __inline unless really necessary. I checked with both clang 7.0 and gcc 8.3 that autoinlining did occur.
>
> But they do. I don't use either of these compilers, and turn off inlining as much as possible anyway using -fno-inline -fno-inline-functions-called-once (this is very broken in clang -- -fno-inline turns off inlining of even functions declared as __inline (like curthread), and clang doesn't support -fno-inline-functions-called-once).
>
> >> ...
> >> Similarly in bintime().
> > I merged two functions, finally. Having to copy the same code is too annoying for this change.
> > > > So I verified that: > > - there is no 64-bit multiplication in the generated code, for i386 both > > for clang 7.0 and gcc 8.3; > > - that everything is inlined, the only call from bintime/binuptime is > > the indirect call to get the timecounter value. > > I will have to fix it for compilers that I use. Ok, I will add __inline. > > > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > > index 2656fb4d22f..0fd39e25058 100644 > > --- a/sys/kern/kern_tc.c > > +++ b/sys/kern/kern_tc.c > + ... > > +static void > > +binnouptime(struct bintime *bt, u_int off) > > { > > struct timehands *th; > > - u_int gen; > > + struct bintime *bts; > > + uint64_t scale; > > + u_int delta, gen; > > > > do { > > th = timehands; > > gen = atomic_load_acq_int(&th->th_generation); > > - *bt = th->th_offset; > > - bintime_addx(bt, th->th_scale * tc_delta(th)); > > + bts = (struct bintime *)((vm_offset_t)th + off); > > I don't like the merging. It obscures the code with conversions like this. > > > + *bt = *bts; > > + scale = th->th_scale; > > + delta = tc_delta(th); > > +#ifdef _LP64 > > + if (__predict_false(th->th_large_delta <= delta)) { > > + /* Avoid overflow for scale * delta. */ > > + bintime_helper(bt, scale, delta); > > + bintime_addx(bt, (scale & 0xffffffff) * delta); > > + } else { > > + bintime_addx(bt, scale * delta); > > + } > > +#else > > + /* > > + * Use bintime_helper() unconditionally, since the fast > > + * path in the above method is not so fast here, since > > + * the 64 x 32 -> 64 bit multiplication is usually not > > + * available in hardware and emulating it using 2 > > + * 32 x 32 -> 64 bit multiplications uses code much > > + * like that in bintime_helper(). > > + */ > > + bintime_helper(bt, scale, delta); > > + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); > > +#endif > > Check that this method is really better. Without this, the complicated > part is about half as large and duplicating it is smaller than this > version. Better in what sense?
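Why the large-delta path exists: th_scale is approximately 2^64 / tc_frequency, so th_scale * delta wraps modulo 2^64 once delta exceeds roughly one second's worth of counter ticks. Splitting scale into its high and low 32-bit halves keeps every partial product within 64 bits. The sketch below is user-space C. It assumes bintime_helper() handles the (scale >> 32) * delta part, as the calls in the quoted patch suggest (the caller adds the low-half product itself). struct bt, bt_addx() and bt_add_scaled() are hypothetical names; bt_addx() is modelled on bintime_addx().

```c
#include <stdint.h>

/* Minimal stand-in for struct bintime: seconds plus a 2^-64 s fraction. */
struct bt {
	int64_t sec;
	uint64_t frac;
};

/* Add x to frac, carrying into sec on wraparound (like bintime_addx()). */
static void
bt_addx(struct bt *b, uint64_t x)
{
	uint64_t old = b->frac;

	b->frac += x;
	if (old > b->frac)
		b->sec++;
}

/*
 * Add scale * delta, a product of up to 96 bits, without overflow.
 * With scale = hi * 2^32 + lo:
 *   scale * delta = (hi * delta) * 2^32 + lo * delta,
 * and both hi * delta and lo * delta fit in 64 bits.
 */
static void
bt_add_scaled(struct bt *b, uint64_t scale, uint32_t delta)
{
	uint64_t x = (scale >> 32) * delta;

	b->sec += (int64_t)(x >> 32);	/* part at or above 2^64: whole seconds */
	bt_addx(b, x << 32);		/* rest of the high product, in frac units */
	bt_addx(b, (scale & 0xffffffff) * delta);	/* low-half product */
}
```

For scale = 2^63 (half a second per tick) and delta = 3, the naive 64-bit product wraps to 2^63 and silently loses a whole second, while bt_add_scaled() yields sec = 1 and frac = 2^63.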
I am fine with the C code, and asm code looks good. > > > @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp) > > void > > bintime(struct bintime *bt) > > { > > - struct timehands *th; > > - u_int gen; > > > > - do { > > - th = timehands; > > - gen = atomic_load_acq_int(&th->th_generation); > > - *bt = th->th_bintime; > > - bintime_addx(bt, th->th_scale * tc_delta(th)); > > - atomic_thread_fence_acq(); > > - } while (gen == 0 || gen != th->th_generation); > > Duplicating this loop is much better than obfuscating it using inline > functions. This loop was almost duplicated (except for the delta > calculation) in no less than 17 functions in kern_tc.c (9 tc ones and > 8 ffclock ones). Now it is only duplicated 16 times. How did you count the 16? I can see only 4 instances in the unpatched kern_tc.c, and 3 in the patched one, but it is 3 and not 1 only because I do not touch ffclock until the patch is finalized. After that, it would be 1 instance for the kernel and 1 for userspace. > > > + binnouptime(bt, __offsetof(struct timehands, th_bintime)); > > } > > > > void > > Bruce

From owner-freebsd-hackers@freebsd.org Mon Mar 4 15:33:05 2019
MIME-Version: 1.0
References: <20190303110346.GH68879@kib.kiev.ua>
In-Reply-To: <20190303110346.GH68879@kib.kiev.ua>
From: Alan Somers
Date: Mon, 4 Mar 2019 08:24:27 -0700
Message-ID:
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov
Cc: FreeBSD Hackers
Content-Type: text/plain; charset="UTF-8"

On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov wrote: > > On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote: > > It looks like lookup and open are the only common vops that create new > > namecache entries. At least, those are the only ones that set > > MAKEENTRY in the cn_flags field.
However, fuse(4)'s create-like > > operations (FUSE_CREATE, FUSE_SYMLINK, etc.) all return enough > > information to create a namecache entry for the newly created file. > > As-is, an operation like FUSE_CREATE will almost always be followed up > > by a FUSE_LOOKUP, necessitating an extra round-trip to userland. > In VFS, creation of the new file is done by VOP_CREATE() after a negative > VOP_LOOKUP(). VOP_CREATE() returns the new vnode that is installed into the > file. [A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results > in the created name entry being inserted into the namecache. It was done to handle > a very specific situation in the core dump code, which is no longer relevant. > The flag is still there.] > > A similar discussion occurred some time ago. I think that the current > selection of the cases where a namecache entry is created is optimized > for the scenario where extracting a large tarball does not largely affect > the non-directory elements of the cache. If you do such an extraction, > it is unlikely that you will access most of the files shortly. I don't understand this objection. When you extract a tarball full of non-empty files, don't you still need to open every file to write its contents, creating a namecache entry for each one? > > > Would it be possible and wise to add these newly created entries to > > the namecache automatically? > Not from VFS, but the policy can be overridden by the filesystem by inserting > the elements into the cache from VOPs as it finds suitable. > > Does FUSE cache vnodes? I would find aggressive caching on the kernel > side somewhat unexpected for it.

From owner-freebsd-hackers@freebsd.org Mon Mar 4 15:42:24 2019
Date: Mon, 4 Mar 2019 17:42:12 +0200
From: Konstantin Belousov
To: Alan Somers
Cc: FreeBSD Hackers
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
Message-ID: <20190304154212.GP68879@kib.kiev.ua>
References: <20190303110346.GH68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To:
User-Agent: Mutt/1.11.3 (2019-02-01)

On Mon, Mar 04, 2019 at 08:24:27AM -0700, Alan Somers wrote: > On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov wrote: > > A similar discussion occurred some time ago. I think that the current > > selection of the cases where a namecache entry is created is optimized > > for the scenario where extracting a large tarball does not largely affect > > the non-directory elements of the cache. If you do such an extraction, > > it is unlikely that you will access most of the files shortly. > > I don't understand this objection. When you extract a tarball full of > non-empty files, don't you still need to open every file to write its > contents, creating a namecache entry for each one?
No, you don't.

Typically, when the archiver has parsed the stream and noted that there is a file
to create with content, it
- opens the file, and gets the file descriptor returned to usermode.
  Internally, the kernel does (vn_open_cred())
    namei() <- this call returns no vnode because the file is non-existent,
               and does not create a negative cache entry, see the NOCACHE
               argument for cn_flags.
    VOP_CREATE() <- creating the file, again not caching
    assign the returned vnode to the file
- now the process has the descriptor for writes, but the namecache entry is
  still not installed.
- content is written, file is closed.

From owner-freebsd-hackers@freebsd.org Mon Mar 4 16:07:46 2019
MIME-Version: 1.0
References: <20190303110346.GH68879@kib.kiev.ua> <20190304154212.GP68879@kib.kiev.ua>
In-Reply-To: <20190304154212.GP68879@kib.kiev.ua>
From: Alan Somers
Date: Mon, 4 Mar 2019 08:59:43 -0700
Message-ID:
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov
Cc: FreeBSD Hackers
Content-Type: text/plain; charset="UTF-8"

On Mon, Mar 4, 2019 at 8:42 AM Konstantin Belousov wrote: > > On Mon, Mar 04, 2019 at 08:24:27AM -0700, Alan Somers wrote: > > On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov wrote: > > > A similar discussion occurred some time ago. I think that the current > > > selection of the cases where a namecache entry is created is optimized > > > for the scenario where extracting a large tarball does not largely affect > > > the non-directory elements of the cache. If you do such an extraction, > > > it is unlikely that you will access most of the files shortly. > > > > I don't understand this objection. When you extract a tarball full of > > non-empty files, don't you still need to open every file to write its > > contents, creating a namecache entry for each one? > No, you don't. > > Typically, when the archiver has parsed the stream and noted that there is a file > to create with content, it > - opens the file, and gets the file descriptor returned to usermode.
> Internally, the kernel does (vn_open_cred()) > namei() <- this call returns no vnode because the file is non-existent, > and does not create a negative cache entry, see the NOCACHE > argument for cn_flags. > VOP_CREATE() <- creating the file, again not caching > assign the returned vnode to the file > - now the process has the descriptor for writes, but the namecache entry is > still not installed. > - content is written, file is closed. OK, that makes sense. So I guess the problem only really applies to file types like symlinks that can't create-and-open. But in the tarball case, you wouldn't need to access the symlink again anyway. -Alan

From owner-freebsd-hackers@freebsd.org Mon Mar 4 16:07:48 2019
Date: Mon, 4 Mar 2019 19:07:32 +0300
From: Anthony Pankov
Message-ID: <434119194.20190304190732@mail.ru>
To: freebsd-hackers@freebsd.org
Subject: building with WITHOUT_SSP side effect
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Greetings, I've built 11-stable (11.2-STABLE r344696) from source with the option WITHOUT_SSP="yes" in src.conf. Installing kernel and world was OK. But when I tried to build from ports it gave me an error: configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5': configure: error: C compiler cannot create executables config.log: ... configure:3555: cc -v >&5 FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1) Target: x86_64-unknown-freebsd11.2 ... configure:3608: cc -O2 -pipe -Wno-error -fno-strict-aliasing conftest.c >&5 /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a cc: error: linker command failed with exit code 1 (use -v to see invocation) And yes, there is SSP_UNSAFE=yes in make.conf. Is this a bug or a feature? -- Best regards, Anthony Pankov mailto:ap00@mail.ru

From owner-freebsd-hackers@freebsd.org Mon Mar 4 16:56:18 2019
Date: Mon, 4 Mar 2019 19:56:02 +0300
From: Anthony Pankov
Message-ID: <1122478880.20190304195602@mail.ru>
To: Anthony Pankov via freebsd-hackers
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <434119194.20190304190732@mail.ru>
References: <434119194.20190304190732@mail.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

It seems that a world built with WITHOUT_SSP=yes loses the ability to build anything. # cc -v test.c FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1) Target: x86_64-unknown-freebsd11.2 Thread model: posix InstalledDir: /usr/bin "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/clang/7.0.1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-d853d1.o -x c test.c -faddrsig clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unknown-freebsd11.2 #include "..." search starts here: #include <...> search starts here: /usr/lib/clang/7.0.1/include /usr/include End of search list. "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a > Greetings, > I've built 11-stable (11.2-STABLE r344696) from source with the option > WITHOUT_SSP="yes" in src.conf. > Installing kernel and world was OK.
But when I tried to build from ports it gave me an error: > configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5': > configure: error: C compiler cannot create executables > config.log: > ... > configure:3555: cc -v >&5 > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1) > Target: x86_64-unknown-freebsd11.2 > ... > configure:3608: cc -O2 -pipe -Wno-error -fno-strict-aliasing conftest.c >&5 > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a > cc: error: linker command failed with exit code 1 (use -v to see invocation) > And yes, there is SSP_UNSAFE=yes in make.conf. > Is this a bug or a feature? -- Best regards, Anthony Pankov mailto:ap00@mail.ru

From owner-freebsd-hackers@freebsd.org Mon Mar 4 17:14:01 2019
Date: Mon, 4 Mar 2019 19:13:51 +0200
From: Konstantin Belousov
To: Anthony Pankov
Cc: Anthony Pankov via freebsd-hackers
Subject: Re: building with WITHOUT_SSP side effect
Message-ID: <20190304171351.GQ68879@kib.kiev.ua>
References: <434119194.20190304190732@mail.ru> <1122478880.20190304195602@mail.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <1122478880.20190304195602@mail.ru>
User-Agent: Mutt/1.11.3 (2019-02-01)

On Mon, Mar 04, 2019 at 07:56:02PM +0300, Anthony Pankov via freebsd-hackers wrote: > It seems that a world built with WITHOUT_SSP=yes loses the ability to > build anything. > > # cc -v test.c > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1) > Target: x86_64-unknown-freebsd11.2 > Thread model: posix > InstalledDir: /usr/bin > "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/clang/7.0.1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-d853d1.o -x c test.c -faddrsig > clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unknown-freebsd11.2 > #include "..."
search starts here: > #include <...> search starts here: > /usr/lib/clang/7.0.1/include > /usr/include > End of search list. > "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a It seems that you installed without specifying WITHOUT_SSP, which ended up installing the wrong linker script as libc.so. Either create a dummy libssp_nonshared.a, or reinstall libc.so (look at lib/libc/Makefile for SHLIB_LDSCRIPT), or reinstall the world. > > > > Greetings, > > > I've built 11-stable (11.2-STABLE r344696) from source with the option > > WITHOUT_SSP="yes" in src.conf. > > > Installing kernel and world was OK. But when I tried to build from ports it gave me an error: > > configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5': > > configure: error: C compiler cannot create executables > > > config.log: > > ... > > configure:3555: cc -v >&5 > > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1) > > Target: x86_64-unknown-freebsd11.2 > > ... > > configure:3608: cc -O2 -pipe -Wno-error -fno-strict-aliasing conftest.c >&5 > > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a > > cc: error: linker command failed with exit code 1 (use -v to see invocation) > > > And yes, there is SSP_UNSAFE=yes in make.conf. > > > Is this a bug or a feature?
> > > > > -- > Best regards, > Anthony Pankov mailto:ap00@mail.ru > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@freebsd.org Mon Mar 4 17:31:47 2019
Date: Mon, 4 Mar 2019 20:31:33 +0300
From: Anthony Pankov
Message-ID: <1032136115.20190304203133@mail.ru>
To: Konstantin Belousov
CC: Anthony Pankov via freebsd-hackers
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <20190304171351.GQ68879@kib.kiev.ua>
References: <434119194.20190304190732@mail.ru> <1122478880.20190304195602@mail.ru> <20190304171351.GQ68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable

Thank you for the reply. Do you mean that I must install world explicitly as make installworld WITHOUT_SSP=yes, and the same string in src.conf is not enough? I'm sure that I didn't touch src.conf between 'buildworld' and 'installworld'. > On Mon, Mar 04, 2019 at 07:56:02PM +0300, Anthony Pankov via freebsd-hackers wrote: >> It seems that a world built with WITHOUT_SSP=yes loses the ability to >> build anything.
>>=20 >> # cc -v test.c >> FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LL= VM 7.0.1) >> Target: x86_64-unknown-freebsd11.2 >> Thread model: posix >> InstalledDir: /usr/bin >> "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax= -all -disable-free -disable-llvm-verifier -discard-value-names -main-file-n= ame test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim= -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dw= arf-column-info -debugger-tuning=3Dgdb -v -resource-dir /usr/lib/clang/7.0.= 1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -= fobjc-runtime=3Dgnustep -fdiagnostics-show-option -fcolor-diagnostics -o /t= mp/test-d853d1.o -x c test.c -faddrsig >> clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unk= nown-freebsd11.2 >> #include "..." search starts here: >> #include <...> search starts here: >> /usr/lib/clang/7.0.1/include >> /usr/include >> End of search list. >> "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --has= h-style=3Dboth --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o = /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s= --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crten= d.o /usr/lib/crtn.o >> /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a > It seems that you installed without specifying WITHOUT_SSP, which > ended up installing wrong linker script as libc.a. Either create dummy > libssp_nonshared.a, or reinstall libc.a (look at lib/libc/Makefile for > SHLIB_LDSCRIPT), or reinstall the world. >>=20 >>=20 >> > Greetings, >>=20 >> > I've builded 11-stable ( 11.2-STABLE r344696) from source with option >> > WITHOUT_SSP=3D"yes" in src.conf. >>=20 >> > Installing kernel and world was OK. 
But when I tried to build from po= rt it give me an error: >> > configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5': >> > configure: error: C compiler cannot create executables >>=20 >> > config.log: >> > ... >> > configure:3555: cc -v >&5 >> > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on = LLVM 7.0.1) >> > Target: x86_64-unknown-freebsd11.2 >> > ... >> > configure:3608: cc -O2 -pipe -Wno-error -fno-strict-aliasing conf= test.c >&5 >> > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a >> > cc: error: linker command failed with exit code 1 (use -v to see invoc= ation) >>=20 >> > And yes, there is SSP_UNSAFE=3Dyes in make.conf >>=20 >> > Is this a bug or feature? >>=20 >>=20 >>=20 >>=20 >> --=20 >> Best regards, >> Anthony Pankov mailto:ap00@mail.ru >>=20 >> _______________________________________________ >> freebsd-hackers@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.or= g" > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" --=20 =D1 =F3=E2=E0=E6=E5=ED=E8=E5=EC, Anthony mailto:ap00@mail.ru From owner-freebsd-hackers@freebsd.org Mon Mar 4 17:39:45 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 649DA151BAB3 for ; Mon, 4 Mar 2019 17:39:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AD47D74A09 for ; Mon, 4 Mar 2019 17:39:44 +0000 (UTC) (envelope-from 
kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x24HdbGJ001660 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 4 Mar 2019 19:39:40 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x24HdbGJ001660 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x24HdbWf001659; Mon, 4 Mar 2019 19:39:37 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 4 Mar 2019 19:39:37 +0200 From: Konstantin Belousov To: Anthony Pankov Cc: Anthony Pankov via freebsd-hackers Subject: Re: building with WITHOUT_SSP side effect Message-ID: <20190304173937.GR68879@kib.kiev.ua> References: <434119194.20190304190732@mail.ru> <1122478880.20190304195602@mail.ru> <20190304171351.GQ68879@kib.kiev.ua> <1032136115.20190304203133@mail.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1032136115.20190304203133@mail.ru> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=0.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,FREEMAIL_REPLY, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 17:39:45 -0000 On Mon, Mar 04, 2019 at 08:31:33PM +0300, Anthony Pankov wrote: > Thank you for reply, > > Do you mean that I must install world explicity as > > make installworld WITHOUT_SSP=yes > > and the same string in src.conf is not enough? 
> I'm sure that I didn't
> touch src.conf between 'buildworld' and 'installworld'.

Check your /usr/lib/libc.a, if it mentions libssp_nonshared.a then
you have something broken.

From owner-freebsd-hackers@freebsd.org Mon Mar 4 17:56:48 2019
Date: Mon, 4 Mar 2019 20:56:34 +0300
From: Anthony Pankov
To: Konstantin Belousov
CC: Anthony Pankov via freebsd-hackers
Subject: Re: building with WITHOUT_SSP side effect
Message-ID: <1178496353.20190304205634@mail.ru>
In-Reply-To: <20190304173937.GR68879@kib.kiev.ua>

I have looked at it and found no ssp entries:

ar t /usr/lib/libc.a |grep ssp

wcsspn.o
readpassphrase.o

P.S.
touch /usr/lib/libssp_nonshared.a

is a cure. But it seems weird.

> On Mon, Mar 04, 2019 at 08:31:33PM +0300, Anthony Pankov wrote:
>> Thank you for the reply,
>>
>> Do you mean that I must install world explicitly as
>>
>> make installworld WITHOUT_SSP=yes
>>
>> and the same string in src.conf is not enough? I'm sure that I didn't
>> touch src.conf between 'buildworld' and 'installworld'.

> Check your /usr/lib/libc.a, if it mentions libssp_nonshared.a then
> you have something broken.
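The check Konstantin suggests ("does the file mention libssp_nonshared.a?") can be sketched in C as a plain byte search. One assumption that is not stated in the thread: on the FreeBSD installs we are aware of, the GROUP(...) linker script that references libssp_nonshared.a is installed as /usr/lib/libc.so (the file `-lc` resolves to in a dynamic link like the one in the log), not as libc.a, so listing the members of libc.a with `ar t` would not show it. The helpers buf_mentions() and file_mentions() are hypothetical. The sketch also shows why the grep above printed wcsspn.o and readpassphrase.o: both member names merely contain the substring "ssp".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if needle occurs anywhere in buf[0..len), else 0.
 * memcmp() is used instead of strstr() so that binary files
 * (which may contain NUL bytes) are searched in full. */
static int buf_mentions(const char *buf, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	assert(n > 0);
	if (n > len)
		return 0;
	for (size_t i = 0; i <= len - n; i++)
		if (memcmp(buf + i, needle, n) == 0)
			return 1;
	return 0;
}

/* Return 1 if the file at path mentions needle, 0 if it does not,
 * and -1 if the file cannot be read. The whole file is read into memory. */
static int file_mentions(const char *path, const char *needle)
{
	FILE *fp = fopen(path, "rb");
	char chunk[4096];
	char *buf = NULL, *nbuf;
	size_t len = 0, cap = 0, got;
	int found = -1;

	if (fp == NULL)
		return -1;
	while ((got = fread(chunk, 1, sizeof(chunk), fp)) > 0) {
		if (len + got > cap) {
			/* Doubling from 4096 always leaves room for one more chunk. */
			cap = cap ? cap * 2 : sizeof(chunk);
			nbuf = realloc(buf, cap);
			if (nbuf == NULL)
				goto out;
			buf = nbuf;
		}
		memcpy(buf + len, chunk, got);
		len += got;
	}
	if (!ferror(fp))
		found = buf_mentions(buf, len, needle);
out:
	free(buf);
	fclose(fp);
	return found;
}
```

For example, file_mentions("/usr/lib/libc.so", "libssp_nonshared.a") returning 1 on a system built WITHOUT_SSP would fit Konstantin's diagnosis of a stale linker script (again assuming the /usr/lib/libc.so location).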
--
Best regards,
Anthony mailto:ap00@mail.ru

From owner-freebsd-hackers@freebsd.org Mon Mar 4 18:06:10 2019
Date: Mon, 4 Mar 2019 13:05:33 -0500
From: Shawn Webb
To: Anthony Pankov
Cc: Konstantin Belousov, Anthony Pankov via freebsd-hackers
Subject: Re: building with WITHOUT_SSP side effect
Message-ID: <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd>
In-Reply-To: <1178496353.20190304205634@mail.ru>

I'm curious about your use case for building without stack cookies.

Thanks,

--
Shawn Webb
Cofounder and Security Engineer
HardenedBSD

Tor-ified Signal: +1 443-546-8752
Tor+XMPP+OTR: lattera@is.a.hacker.sx
GPG Key ID: 0x6A84658F52456EEE
GPG Key Fingerprint: 2ABA B6BD EF6A F486 BE89 3D9E 6A84 658F 5245 6EEE

On Mon, Mar 04, 2019 at 08:56:34PM +0300, Anthony Pankov via freebsd-hackers wrote:
> I have looked at it and found no ssp entries:
>
> ar t /usr/lib/libc.a |grep ssp
>
> wcsspn.o
> readpassphrase.o
>
> P.S.
> touch /usr/lib/libssp_nonshared.a
>
> is a cure.
> But it seems weird.

From owner-freebsd-hackers@freebsd.org Mon Mar 4 18:17:26 2019
Date: Tue, 5 Mar 2019 05:17:14 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans, Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190305031010.I4610@besplex.bde.org>
In-Reply-To: <20190304114150.GM68879@kib.kiev.ua>

On Mon, 4 Mar 2019, Konstantin Belousov wrote:

> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>>>>>
>>>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>> * ...
>>>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
>>>> step. i386 used to be faster here -- the first masking step of discarding
>>>> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
>>>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
>>>> has to do a not so slow shr.
>>> i386 cannot discard %edx after RDTSC since some bits from %edx come into
>>> the timecounter value.
>>
>> These bits are part of the tsc-low pessimization. The shift count should
>> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
>> sometimes.
>>
>> When tsc-low was new, the shift count was often larger (as much as 8),
>> and it is still changeable by a read-only tunable, but now it is 1 in
>> almost all cases. The code only limits the timecounter frequency
>> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
>> at nearly 4 GHz are usually limited to about 2 GHz. The comment about
>> this UINT_MAX doesn't match the code. The comment says int, but the
>> code says UINT.
>>
>> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
>> This much accuracy is noise for most purposes.
>>
>> The tunable is fairly undocumented. Its description is "Shift to apply
>> for the maximum TSC frequency". Of course, it has no effect on the TSC
>> frequency. It only affects the TSC timecounter frequency.
> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> Otherwise, I think, some multi-socket machines would start showing the
> detectable backward-counting bintime(). At the frequencies at 4GHz and
> above (Intel has 5GHz part numbers) I do not think that stability of
> 100MHz crystal and on-board traces is enough to avoid that.

I think it is just a kludge that reduced the problem before it was fixed
properly using fences.
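The frequency limit and the lost low bit discussed above can be sketched in C. This is not the tsc.c code: pick_tsc_shift() and tsc_low_read() are hypothetical names, and the rule they implement (the smallest shift that keeps the timecounter frequency within UINT_MAX, raised to the tunable's value) is only the behaviour as described in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical sketch of the selection rule described above: use the
 * smallest shift that keeps (tsc_freq >> shift) <= UINT_MAX, but never
 * less than the tunable (whose default is 1 per the thread). */
static int pick_tsc_shift(uint64_t tsc_freq, int tunable)
{
	int shift = 0;

	assert(tsc_freq > 0 && tunable >= 0 && tunable < 32);
	/* Terminates by shift 32, since any 64-bit value >> 32 fits in 32 bits. */
	while ((tsc_freq >> shift) > UINT_MAX)
		shift++;
	return shift > tunable ? shift : tunable;
}

/* A "tsc-low" style read: drop the low `shift` bits of the 64-bit TSC and
 * keep 32 bits for the timecounter. On i386 the 64-bit TSC arrives in
 * %edx:%eax, which is why shifting it needs shrd there. */
static uint32_t tsc_low_read(uint64_t tsc, int shift)
{
	assert(shift >= 0 && shift < 64);
	return (uint32_t)(tsc >> shift);
}
```

With a tunable of 0, a 4 GHz TSC already fits (shift 0); the default of 1 halves it to 2 GHz, and two TSC readings that differ only in the low bit, such as 2 and 3, then read the same.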
Cross-socket latency is over 100 cycles according to jhb's tscskew benchmark:

on Haswell 4x2:
CPU | TSC skew (min/avg/max/stddev)
----+------------------------------
  0 |     0     0     0    0.000
  1 |    24    49    84   14.353
  2 |   164   243   308   47.811
  3 |   164   238   312   47.242
  4 |   168   242   332   49.593
  5 |   168   243   324   48.722
  6 |   172   242   320   52.596
  7 |   172   240   316   53.014

freefall is similar. Latency is apparently measured relative to CPU 0.
It is much lower to CPU 1 since that is on the same core.

I played with this program a lot 3 and a half years ago, but forgot most
of what I learned :-(. I tried different fencing in it. This seems to
make little difference when the program is rerun. With the default
TESTS = 1024, the min skew sometimes goes negative on freefall, but with
TESTS = 1024000 that doesn't happen. This is the opposite of what I
would expect. freefall has load average about 1.

Removing the only fencing in it reduces average latency by 10-20 cycles
and minimum latency by over 100 cycles, except on freefall it is reduced
from 33 to 6. On Haswell it is 24 with fencing and I didn't test it with
no fencing.

I think tscskew doesn't really measure tsc skew. What it measures is the
time taken for a locking protocol, using the TSCs on different CPUs to
make the start and end timestamps. If the TSCs have a lot of skew or
jitter, then this will show up indirectly as inconsistent and possibly
negative differences.

A shift of just 1 can't hide latencies of hundreds of cycles on
single-socket machines. Even a shift of 8 only works sometimes, by
reducing the chance of observing the TSC going backwards by a factor of
256. E.g., assume for simplicity that all instructions and IPCs take 0-1
cycles, and that unfenced rdtsc's differ by at most +-5 cycles (with the
11 values between -5 and 5 uniformly distributed). Then with a shift of
0 and no fences, a CPU that updates the timehands is ahead of another CPU
that spins reading the timehands about 5/11 of the time.
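The 5/11 figure for a shift of 0, and the shift-of-8 estimate that follows in the text, can be checked by brute-force enumeration under the same toy assumptions: readings on two CPUs differ by d cycles, d uniform over the 11 values -5..5, and the first reading uniform within one 256-cycle window. count_backwards() is a hypothetical helper for this illustration; it models only the arithmetic, not real rdtsc behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model from the text: CPU A reads the TSC as t and CPU B reads
 * t + d, with d uniform over -spread..spread. t runs over one aligned
 * 256-cycle window. Count the (t, d) pairs where B's shifted value is
 * smaller than A's, i.e. the timecounter appears to go backwards.
 * The total number of pairs is returned through *total. */
static unsigned long count_backwards(int shift, int spread,
    unsigned long *total)
{
	const uint64_t base = 1024;	/* multiple of 256, so the window is aligned */
	unsigned long back = 0, n = 0;

	assert(shift >= 0 && shift <= 8 && spread >= 0 && spread < 256);
	for (uint64_t t = base; t < base + 256; t++) {
		for (int d = -spread; d <= spread; d++) {
			uint64_t t2 = (uint64_t)((int64_t)t + d);

			if ((t2 >> shift) < (t >> shift))
				back++;
			n++;
		}
	}
	*total = n;
	return back;
}
```

Under this model a shift of 0 goes backwards in 1280 of 2816 cases (5/11), and a shift of 8 in 15 of 2816, i.e. 3/256 of the unshifted rate, which is within the "at least a factor of 10/256" reduction.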
With a shift of 8, the CPUs are close enough when the first one reads a value at least 5 above the previous 256-boundary and at least 5 below the next one. The chance of seeing a negative difference is then reduced by a factor of at least 256/10.

> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy. Otherwise, I think, some multi-socket machines would start showing the detectable backward-counting bintime(). At the frequencies at 4GHz and above (Intel has 5GHz part numbers) I do not think that stability of 100MHz crystal and on-board traces is enough to avoid that.

Why would losing just 1 bit fix that?

Fences for rdtsc of course only serialize it for the CPU that runs it. The locking (ordering) protocol (for the generation count) orders the CPUs too. It takes longer than we would like, much more than the 1-cycle error that might be hidden by ignoring the low bit. Surely the ordering protocol must work across sockets? It then gives ordering of rdtsc's.

TSC-low was added in 2011. That was long before the ordering was fixed. You added fences in 2012 and memory ordering for the generation count in 2016. Fences slowed everything down by 10-20+ cycles and probably hide bugs in the memory ordering better than TSC-low. Memory ordering plus fences slow down the cross-core case by more than 100 cycles according to tscskew. That is enough to hide large hardware bugs.

> We can try to set the tsc-low shift count to 0 (but keep lfence) and see what is going on in HEAD, but I am afraid that the HEAD users population is not representative enough to catch the issue with certainty. More, it is unclear to me how to diagnose the cause, e.g. I would expect the sleeps to hang on timeouts, as was reported from the very beginning of this thread. How would we root-cause it ?

Negative time differences cause lots of overflows and so break the timecounter. The fix under discussion actually gives larger overflows in the positive direction. E.g., a delta of -1 first overflows to 0xffffffff.
The fix prevents overflow on multiplication by that. When the timecounter frequency is small, say 1 MHz, 0xffffffff means 4294 seconds, so the timecounter advances by that.

>>> amd64 cannot either, but amd64 does not need to mask out top bits in %rax, since the whole shrdl calculation occurs in 32bit registers, and the result is in %rax where top word is cleared by shrdl instruction automatically. But the clearing is not required since result is unsigned int anyway.
>>>
>>> Disassembly of tsc_get_timecount_low() is very clear:
>>> 0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
>>> 0xffffffff806767e7 <+7>:     rdtsc
>>> 0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
>>> ...
>>> 0xffffffff806767ed <+13>:    retq
>>> (I removed frame manipulations).

I checked that all compilers still produce horrible code for the better source code 'return (rdtsc() >> (intptr_t)tc->tc_priv);'. 64-bit shifts are apparently pessimal for compatibility. The above is written mostly in asm to avoid 2-5 extra instructions.

>>>> ...
>>>> Similarly in bintime().
>>> I merged two functions, finally. Having to copy the same code is too annoying for this change.

I strongly dislike the merge.

>>> So I verified that:
>>> - there is no 64bit multiplication in the generated code, for i386 both for clang 7.0 and gcc 8.3;
>>> - that everything is inlined, the only call from bintime/binuptime is the indirect call to get the timecounter value.
>>
>> I will have to fix it for compilers that I use.
> Ok, I will add __inline.

That will make it fast enough, but still hard to read.

>>> +	*bt = *bts;
>>> +	scale = th->th_scale;
>>> +	delta = tc_delta(th);
>>> +#ifdef _LP64
>>> +	if (__predict_false(th->th_large_delta <= delta)) {
>>> +		/* Avoid overflow for scale * delta. */
>>> +		bintime_helper(bt, scale, delta);
>>> +		bintime_addx(bt, (scale & 0xffffffff) * delta);
>>> +	} else {
>>> +		bintime_addx(bt, scale * delta);
>>> +	}
>>> +#else
>>> +	/*
>>> +	 * Use bintime_helper() unconditionally, since the fast
>>> +	 * path in the above method is not so fast here, since
>>> +	 * the 64 x 32 -> 64 bit multiplication is usually not
>>> +	 * available in hardware and emulating it using 2
>>> +	 * 32 x 32 -> 64 bit multiplications uses code much
>>> +	 * like that in bintime_helper().
>>> +	 */
>>> +	bintime_helper(bt, scale, delta);
>>> +	bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
>>> +#endif
>>
>> Check that this method is really better. Without this, the complicated part is about half as large and duplicating it is smaller than this version.
> Better in what sense ? I am fine with the C code, and asm code looks good.

Better in terms of actually running significantly faster. I fear the 32-bit method is actually slightly slower for the fast path.

>>> -	do {
>>> -		th = timehands;
>>> -		gen = atomic_load_acq_int(&th->th_generation);
>>> -		*bt = th->th_bintime;
>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>> -		atomic_thread_fence_acq();
>>> -	} while (gen == 0 || gen != th->th_generation);
>>
>> Duplicating this loop is much better than obfuscating it using inline functions. This loop was almost duplicated (except for the delta calculation) in no less than 17 functions in kern_tc.c (9 tc ones and 8 ffclock ones). Now it is only duplicated 16 times.
> How did you count the 16 ? I can see only 4 instances in the unpatched kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not touch ffclock until the patch is finalized. After that, it would be 1 instance for kernel and 1 for userspace.

Grep for the end condition in this loop. There are actually 20 of these. I'm counting the loops and not the previously-simple scaling operation in it. The scaling is indeed only done for 4 cases.

I prefer the 20 duplications (except I only want about 6 of the functions). Duplication works even better for only 4 cases. This should be written as a function call to 1 new function to replace the line with the overflowing multiplication. The line is always the same, so the new function call can look like bintime_xxx(bt, th).

Bruce

From owner-freebsd-hackers@freebsd.org Mon Mar 4 19:25:41 2019
Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8EB01151FFD0 for ; Mon, 4 Mar 2019 19:25:41 +0000 (UTC) (envelope-from ap00@mail.ru) Received: from smtp14.mail.ru (smtp14.mail.ru [94.100.181.95]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 328F483649 for ; Mon, 4 Mar 2019 19:25:39 +0000 (UTC) (envelope-from ap00@mail.ru) Received: by smtp14.mail.ru with esmtpa (envelope-from ) id 1h0tDO-00032k-0g; Mon, 04 Mar 2019 22:25:30 +0300 Date: Mon, 4 Mar 2019 22:25:26 +0300 From: Anthony Pankov X-Priority: 3 (Normal) Message-ID: <577261663.20190304222526@mail.ru> To: Shawn Webb CC: Anthony Pankov via freebsd-hackers Subject: Re: building with WITHOUT_SSP side effect In-Reply-To: <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd> References: <434119194.20190304190732@mail.ru> <1122478880.20190304195602@mail.ru> <20190304171351.GQ68879@kib.kiev.ua> <1032136115.20190304203133@mail.ru> <20190304173937.GR68879@kib.kiev.ua> <1178496353.20190304205634@mail.ru> <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-77F55803: 2D1AD755E866B1545A78504BD2AC294173B5FE5E8078F296BC8A6DABA11CA760A72207F55358C5EBE5C0825993A36E7A X-7FA49CB5:
0D63561A33F958A5BEFCD66EC12C75CE0B53608618351A6D981F630370E5D2DF8941B15DA834481FA18204E546F3947CD2DCF9CF1F528DBCF6B57BC7E64490618DEB871D839B7333395957E7521B51C2545D4CF71C94A83E9FA2833FD35BB23D27C277FBC8AE2E8B974A882099E279BDA471835C12D1D977C4224003CC8364767815B9869FA544D8D32BA5DBAC0009BE9E8FC8737B5C2249D99FB7B2A39B49613AA81AA40904B5D9CF19DD082D7633A0E7DDDDC251EA7DABD81D268191BDAD3D78DA827A17800CE7FBC5FED0552DA851CD04E86FAF290E2D40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE7ED9A86E2EB61E0EA46C550781D382B8C75ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57285124B2A10EEC6C00306258E7E6ABB4E4A6367B16DE6309 X-Mailru-Sender: D8D48EF70163D79D00784CDFC8FD3107F5F70E5BCFE1B6DD4883F302D92DCF67E9E5CDC777A08C4150D5CF8590B94F4EC77752E0C033A69E81198BD1A48777B793AC9912533B2342AE208404248635DF X-Mras: OK X-Rspamd-Queue-Id: 328F483649 X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.76 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:94.100.176.0/20]; FREEMAIL_FROM(0.00)[mail.ru]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mxs.mail.ru]; DKIM_TRACE(0.00)[mail.ru:+]; HAS_X_PRIO_THREE(0.00)[3]; NEURAL_HAM_SHORT(-0.68)[-0.677,0]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[mail.ru,reject]; RCVD_IN_DNSWL_LOW(-0.10)[95.181.100.94.list.dnswl.org : 127.0.5.1]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[mail.ru]; ASN(0.00)[asn:47764, ipnet:94.100.176.0/20, country:RU]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[mail.ru.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.997,0]; R_DKIM_ALLOW(-0.20)[mail.ru:s=mail2]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; IP_SCORE(0.03)[ipnet: 94.100.176.0/20(0.08), asn: 47764(0.05), country: RU(0.00)]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 19:25:41 -0000

In my case no applications from the base "world" listen to the Internet (no open ports from syslogd, bind, sendmail, etc.). Also, there is no public login to the servers. So I see SSP as a waste of billions and billions of instructions. The probability of the joint event (a known user becomes an evil hacker AND the weakest point is a buffer overflow in the base world) is near zero, if only because the weakest point is more easily found in a misconfiguration, additional packages, etc.

The idea was to throw out SSP from the kernel and base world but fortify sshd, postfix, etc. But things did not go as smoothly as desired.

> I'm curious about your use case for building without stack cookies.
>
> Thanks,

--
Best regards,
 Anthony Pankov                          mailto:ap00@mail.ru

From owner-freebsd-hackers@freebsd.org Mon Mar 4 20:50:22 2019
Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B651152274E for ; Mon, 4 Mar 2019 20:50:22 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from mx0a-00273201.pphosted.com (mx0a-00273201.pphosted.com [208.84.65.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.pphosted.com", Issuer "Thawte RSA CA 2018" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 71BA287375 for ; Mon, 4 Mar 2019 20:50:20 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from pps.filterd (m0108159.ppops.net [127.0.0.1]) by mx0a-00273201.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x24KhqeF007164; Mon, 4 Mar 2019 12:50:19 -0800 Received: from nam01-bn3-obe.outbound.protection.outlook.com (mail-bn3nam01lp2056.outbound.protection.outlook.com [104.47.33.56]) by mx0a-00273201.pphosted.com with ESMTP id 2r167s8g20-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256
verify=NOT); Mon, 04 Mar 2019 12:50:18 -0800 Received: from SN4PR0501CA0046.namprd05.prod.outlook.com (2603:10b6:803:41::23) by CY4PR05MB3079.namprd05.prod.outlook.com (2603:10b6:903:fd::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1686.6; Mon, 4 Mar 2019 20:50:16 +0000 Received: from BY2NAM05FT008.eop-nam05.prod.protection.outlook.com (2a01:111:f400:7e52::204) by SN4PR0501CA0046.outlook.office365.com (2603:10b6:803:41::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1686.6 via Frontend Transport; Mon, 4 Mar 2019 20:50:16 +0000 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.13 as permitted sender) Received: from P-EXFEND-EQX-02.jnpr.net (66.129.239.13) by BY2NAM05FT008.mail.protection.outlook.com (10.152.100.145) with Microsoft SMTP Server (version=TLS1_0, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA) id 15.20.1686.5 via Frontend Transport; Mon, 4 Mar 2019 20:50:15 +0000 Received: from P-EXBEND-EQX-02.jnpr.net (10.104.8.53) by P-EXFEND-EQX-02.jnpr.net (10.104.8.55) with Microsoft SMTP Server (TLS) id 15.0.847.32; Mon, 4 Mar 2019 12:50:13 -0800 Received: from p-mailhub01.juniper.net (10.104.20.6) by P-EXBEND-EQX-02.jnpr.net (10.104.8.53) with Microsoft SMTP Server (TLS) id 15.0.1367.3 via Frontend Transport; Mon, 4 Mar 2019 12:50:13 -0800 Received: from kaos.jnpr.net (kaos.jnpr.net [172.23.50.162]) by p-mailhub01.juniper.net (8.14.4/8.11.3) with ESMTP id x24KoBsU011256; Mon, 4 Mar 2019 12:50:11 -0800 (envelope-from sjg@juniper.net) Received: by kaos.jnpr.net (Postfix, from userid 1377) id C58D4737A5; Mon, 4 Mar 2019 12:50:11 -0800 (PST) Received: from kaos.jnpr.net (localhost [127.0.0.1]) by kaos.jnpr.net (Postfix) with ESMTP id C50AB737A4; Mon, 4 Mar 2019 12:50:11 -0800 (PST) To: Shawn Webb CC: Anthony Pankov , Konstantin Belousov , "Anthony Pankov via freebsd-hackers" , Subject: Re: 
building with WITHOUT_SSP side effect In-Reply-To: <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd> References: <434119194.20190304190732@mail.ru> <1122478880.20190304195602@mail.ru> <20190304171351.GQ68879@kib.kiev.ua> <1032136115.20190304203133@mail.ru> <20190304173937.GR68879@kib.kiev.ua> <1178496353.20190304205634@mail.ru> <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd> Comments: In-reply-to: Shawn Webb message dated "Mon, 04 Mar 2019 13:05:33 -0500." From: "Simon J. Gerraty" X-Mailer: MH-E 8.6+git; nmh 1.7.1; GNU Emacs 26.1 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <21488.1551732611.1@kaos.jnpr.net> Date: Mon, 4 Mar 2019 12:50:11 -0800 Message-ID: <23396.1551732611@kaos.jnpr.net> X-EXCLAIMER-MD-CONFIG: e3cb0ff2-54e7-4646-8a04-0dae4ac7b136 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:66.129.239.13; IPV:NLI; CTRY:US; EFV:NLI; SFV:NSPM; SFS:(10019020)(39860400002)(136003)(346002)(376002)(396003)(2980300002)(189003)(199004)(478600001)(16586007)(54906003)(117636001)(86362001)(97876018)(9686003)(47776003)(126002)(76176011)(558084003)(305945005)(4326008)(7696005)(55016002)(356004)(5660300002)(97756001)(90966002)(2906002)(23726003)(53936002)(69596002)(229853002)(68736007)(53416004)(97736004)(336012)(316002)(76506005)(26005)(81156014)(81166006)(8676002)(77096007)(93886005)(186003)(7126003)(6266002)(50226002)(446003)(11346002)(107886003)(50466002)(486006)(476003)(106466001)(105596002)(6246003)(8936002)(46406003)(6916009); DIR:OUT; SFP:1102; SCL:1; SRVR:CY4PR05MB3079; H:P-EXFEND-EQX-02.jnpr.net; FPR:; SPF:SoftFail; LANG:en; PTR:InfoDomainNonexistent; MX:1; A:1; X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 32768963-7a30-4ccd-e7bd-08d6a0e30146 X-Microsoft-Antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600127)(711020)(4605104)(4710095)(4711035)(2017052603328)(7153060); SRVR:CY4PR05MB3079; 
X-Spamd-Result: default: False [-3.30 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.998,0]; R_DKIM_ALLOW(-0.20)[juniper.net:s=PPS1017]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:208.84.65.16]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; RCPT_COUNT_FIVE(0.00)[5]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[juniper.net:+]; DMARC_POLICY_ALLOW(-0.50)[juniper.net,quarantine]; MX_GOOD(-0.01)[mxb-00273201.gslb.pphosted.com,mxa-00273201.gslb.pphosted.com]; IP_SCORE(-0.06)[ip: (-0.15), ipnet: 208.84.65.0/24(-0.07), asn: 26211(0.01), country: US(-0.07)]; NEURAL_HAM_SHORT(-0.13)[-0.134,0]; RCVD_IN_DNSWL_LOW(-0.10)[16.65.84.208.list.dnswl.org : 127.0.3.1]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:26211, ipnet:208.84.65.0/24, country:US]; FREEMAIL_CC(0.00)[mail.ru]; RCVD_COUNT_SEVEN(0.00)[11] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 20:50:22 -0000 Shawn Webb wrote: > I'm curious about your use case for building without stack cookies. GPL ? 
From owner-freebsd-hackers@freebsd.org Mon Mar 4 20:58:26 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6447E1522B61 for ; Mon, 4 Mar 2019 20:58:26 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic317-34.consmr.mail.ne1.yahoo.com (sonic317-34.consmr.mail.ne1.yahoo.com [66.163.184.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 236EB879B3 for ; Mon, 4 Mar 2019 20:58:25 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: Av7ZzPMVM1l7UlZBbiUW8idImHi0nHz4ITc3Jyo21R27rl_HGqXYs41UrIuhVHR BH0YrPEQ9Fm0UeS6UuNHdJ9Enbwsx0MyVCLaZVgg10n5hzowe3_n3FiKUbWi30GFxlIWSKPOufWa mzYT6yxrjrYX3C9HQZYWM6M.87.FH1lGBa8CkzSIj0tCwJB4UUcwg9OWmwT6aInR7bK_.qMYiQHl vZL5v6PAzqhAqvTRQcTrUh4lHaVYj2.wbhjMSG9Cgz7xRHTEzK1ERYYOZaAkCrAlXQNj1wC7iVvh PUJDJ_2iurCK0M_HL_LcDOEf5P3pwH9HhBHq8wIp4Tx1VrhHmtgRM0F61A0NdPP.7nBw1hg3HKrj RggP9i6Tjb.NvwL1o_lmo8pqT3AoNkomzm66.H1HqzyHh7BeXG8RN7miZOY6_nhzZ4geXrU6l4tI 3_ypiTtgepCy1ogdW.BqjFwG7Ds0o1OZ5ESDA0cAkkPgjA.i_6PLz_pfcSmFvB42iA4mK_ORiry5 A82qjPSCJWJmlFH2BV2FOiFbhMSSgaB85Sq2XoqAYnQwba0aVZRL_vo0xAU_GKqrgDRC8HEcZaeY QSXAPNKSjXdv8UzKdFi37fVCwPn8GpYXkyZA7tpiCsTO1oN6LgSlC3MYxz73qCkqaaWSLHoBLFqv vsHoau68oZDuuN8_preWIrBlzxOEMqTfJ3AzaItnQubExOZHzVcvO_ZBxZIj.dBKzuSO_JAGR87V Rtma0p_xcGI9pUKysOMuWUUdqaLReL_YOBCgaMSpFVupNAc3SKRdbkU2jb2StJ_vnyGPhUwNajjW aqKitxLmSrKAGNyRH0Hz.eKfhw1YiwlVgq37FZE3VPgKSB6KBRfQbsj7MMgZWvthI.OOX7DvoUC8 yQfCKPczTar5VcW0S1BeZw_Vn6nFGx.Fj9w8Nb1IgGckb1GBH64.2tcLxo3bFQJsOdJdZIAkhwgd s8g7C4dbBozuTbGFs8w_7a.W7IKHEMrcobqYm1O5pdyTI7TRDJAksNSnEDtD3Dhl3Qme6dVg3tdZ igPjHHwaTKd853_QYquCMKOfTPILL8VEgxMpgRJA3Vv90BpZUvARBYhpK8ETy6ja0Els- Received: from sonic.gate.mail.ne1.yahoo.com by sonic317.consmr.mail.ne1.yahoo.com with HTTP; Mon, 4 Mar 2019 20:58:23 +0000 Received: from 
c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp417.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID d6e36c6b74fad01663fd179bddcbc796; Mon, 04 Mar 2019 20:58:16 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] From: Mark Millard In-Reply-To: <20190305031010.I4610@besplex.bde.org> Date: Mon, 4 Mar 2019 12:58:14 -0800 Cc: Konstantin Belousov , freebsd-hackers Hackers , FreeBSD PowerPC ML Content-Transfer-Encoding: 7bit Message-Id: References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 236EB879B3 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.06 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DKIM_TRACE(0.00)[yahoo.com:+]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; 
NEURAL_SPAM_SHORT(0.62)[0.622,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.22)[ip: (3.86), ipnet: 66.163.184.0/21(1.29), asn: 36646(1.03), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.13)[0.130,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.60)[0.598,0]; RCVD_IN_DNSWL_NONE(0.00)[45.184.163.66.list.dnswl.org : 127.0.5.0]; RWL_MAILSPIKE_POSSIBLE(0.00)[45.184.163.66.rep.mailspike.net : 127.0.0.17]; FREEMAIL_CC(0.00)[gmail.com] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 20:58:26 -0000

On 2019-Mar-4, at 10:17, Bruce Evans wrote:

>> . . .
>
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
>
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
>
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>   0 |   0    0    0    0.000
>   1 |  24   49   84   14.353
>   2 | 164  243  308   47.811
>   3 | 164  238  312   47.242
>   4 | 168  242  332   49.593
>   5 | 168  243  324   48.722
>   6 | 172  242  320   52.596
>   7 | 172  240  316   53.014
>
> freefall is similar. Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
>

You may want to look at:

https://lists.freebsd.org/pipermail/freebsd-hackers/2019-March/054218.html

for cruder, but somewhat related, information for the old Powermac G5 2-socket with 2 cores each, given how FreeBSD tries to synchronize the tbr's across cores as it starts up the CPUs. It may give some idea of the ball-park scale involved in such a context, especially the reports of what happened when varying one figure in the source code.

As it stands, I've only done the experiments with a debug kernel build.

I built using devel/powerpc64-xtoolchain-gcc related infrastructure, not gcc 4.2.1. (This is typical for me.)
=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-hackers@freebsd.org Tue Mar 5 11:11:13 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D14C615267E3 for ; Tue, 5 Mar 2019 11:11:13 +0000 (UTC) (envelope-from shreyankfbsd@gmail.com) Received: from mail-yw1-xc2b.google.com (mail-yw1-xc2b.google.com [IPv6:2607:f8b0:4864:20::c2b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BE48D8D2EB for ; Tue, 5 Mar 2019 11:11:12 +0000 (UTC) (envelope-from shreyankfbsd@gmail.com) Received: by mail-yw1-xc2b.google.com with SMTP id z191so6632624ywa.6 for ; Tue, 05 Mar 2019 03:11:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to:cc; bh=ZJ2lKYolBUTkXCV9yzqz7A3os4X3qabD3/CzWVeXWX8=; b=hn+S3G75eJOWHQZdIv2XdsfL1SRHYkh4srRwzr3TnJWVK3DzU+6EKPapXwixmmuwVh Rv6vMGKKDX77QiqUzBDlIuUwzIXafU6wbtOaHrf9i/7GZaz7+d1+beGCgvvlVbQIr1As mrCP8hC///WdhQa7kLA0pNBijhLDmICOc/VMLs0G/2tVpAjV3nGcMEUSrVM4AGlirYqr 4Gi1Mh52ocejV8j4lBIjlFc9iXurdgWRHDrUyRWZ5VaVwZ+ezDaLXXYXRg1Xgq85XFRv NQa3daD/PQdhN/aK2PVTjkVP1TSW3N34La6efgE+sqhPosrAuV9FTi1xxpQB/yV01rHO kwaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; bh=ZJ2lKYolBUTkXCV9yzqz7A3os4X3qabD3/CzWVeXWX8=; b=m6bNk5sNOEMz9RHr12nPY560/ULowdLn/cskWOj683razVuQVNpBPJhFinAH1gwMCl 4pzA4n8I56/dYGY9MOPgpgE7xzxkmvqw25ql89EmJ5YcB8bG1AEl4sVd8Xtc0pzXSqNo 4fFbad1SGFPXccdQ7cLkB9YtVyoQ18CTIaZjBH9KMBth/CHDT0NgWW5jwQ/Qso3au6oP FHBsoIbenG2eQiznNJYTv7dD4oQcs9O1M1DL+Oq77wPXcbQoj3BfO6RMdVeyGElHa9H2 
LA6N+fcarUkyfVp66dVLp0b/OkVRCZZB1zfbqFacvMfME7JLZihN+hZRkm0wTCY0eHXE Gyqg== X-Gm-Message-State: APjAAAXXaPyTrbuSuMAeT3UAAvtC79MmBidyFf07ns/lqjlshcjJjSF3 LiK8gemjqFaKy4ALbA8QacmebeHAJ6WJ15j8PZJmI0E= X-Google-Smtp-Source: APXvYqzUpAsOYXeJkhqgflTjRdOPThVjIFZZmBYt1Uws4AL0oKoV+qjIbxkH0kBgZBHFrdx1a9ppKlLX9aYxfK4Jg+w= X-Received: by 2002:a0d:e082:: with SMTP id j124mr383939ywe.33.1551784272129; Tue, 05 Mar 2019 03:11:12 -0800 (PST) MIME-Version: 1.0 From: shreyank amartya Date: Tue, 5 Mar 2019 16:41:00 +0530 Message-ID: Subject: iflib MSI init To: mmacy@mattmacy.io Cc: freebsd-hackers@freebsd.org X-Rspamd-Queue-Id: BE48D8D2EB X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=hn+S3G75; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of shreyankfbsd@gmail.com designates 2607:f8b0:4864:20::c2b as permitted sender) smtp.mailfrom=shreyankfbsd@gmail.com X-Spamd-Result: default: False [-6.40 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36]; FREEMAIL_FROM(0.00)[gmail.com]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org]; TO_DN_NONE(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; RCPT_COUNT_TWO(0.00)[2]; RCVD_IN_DNSWL_NONE(0.00)[b.2.c.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org : 127.0.5.0]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; RCVD_TLS_LAST(0.00)[]; NEURAL_HAM_SHORT(-0.51)[-0.515,0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; RCVD_COUNT_TWO(0.00)[2]; IP_SCORE(-2.88)[ip: (-9.66), ipnet: 2607:f8b0::/32(-2.66), asn: 15169(-2.00), country: US(-0.07)]; 
DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 127.0.5.0] Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 11:11:14 -0000

Hi,

I'm trying to initialize a network interface using iflib. While configuring MSI interrupts for the device, pci_msi_count() returns 32 vectors (the maximum supported) in my case, so the condition (vectors == 1) fails and legacy mode is selected. Is this intentional? If so, how can I make sure the number of MSI vectors is 1?

/sys/net/iflib.c, lines 6126-6140:

msi:
	vectors = pci_msi_count(dev);
	scctx->isc_nrxqsets = 1;
	scctx->isc_ntxqsets = 1;
	scctx->isc_vectors = vectors;
	if (vectors == 1 && pci_alloc_msi(dev, &vectors) == 0) {
		device_printf(dev,"Using an MSI interrupt\n");
		scctx->isc_intr = IFLIB_INTR_MSI;
	} else {
		scctx->isc_vectors = 1;
		device_printf(dev,"Using a Legacy interrupt\n");
		scctx->isc_intr = IFLIB_INTR_LEGACY;
	}

	return (vectors);

Thanks
Shreyank Amartya

From owner-freebsd-hackers@freebsd.org Tue Mar 5 13:19:45 2019
Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1BAEF1529D22; Tue, 5 Mar 2019 13:19:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 3E5966A676; Tue, 5 Mar 2019 13:19:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS
id 9CDA243BF06; Wed, 6 Mar 2019 00:19:39 +1100 (AEDT) Date: Wed, 6 Mar 2019 00:19:38 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: TSC "skew" (was: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]) In-Reply-To: <20190305031010.I4610@besplex.bde.org> Message-ID: <20190305223415.U1563@besplex.bde.org> References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=aZ2SpzNVlL9aNEeq27IA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 3E5966A676 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.13 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; FREEMAIL_FROM(0.00)[optusnet.com.au]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; MIME_GOOD(-0.10)[text/plain]; MIME_TRACE(0.00)[0:+]; DMARC_NA(0.00)[optusnet.com.au]; RCPT_COUNT_FIVE(0.00)[5]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; 
NEURAL_HAM_SHORT(-0.74)[-0.739,0]; IP_SCORE(-3.08)[ip: (-8.06), ipnet: 211.28.0.0/14(-4.06), asn: 4804(-3.24), country: AU(-0.04)]; FREEMAIL_TO(0.00)[optusnet.com.au]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; FREEMAIL_CC(0.00)[gmail.com]; RCVD_COUNT_TWO(0.00)[2] X-Mailman-Approved-At: Tue, 05 Mar 2019 13:36:43 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 13:19:45 -0000

On Tue, 5 Mar 2019, Bruce Evans wrote:

> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>* [... shift for bogus TSC-low timecounter]
>> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
>> Otherwise, I think, some multi-socket machines would start showing the
>> detectable backward-counting bintime(). At the frequencies at 4GHz and
>> above (Intel has 5Ghz part numbers) I do not think that stability of
>> 100MHz crystal and on-board traces is enough to avoid that.
>
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
>
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
>
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>   0 |      0      0       0      0.000
>   1 |     24     49      84     14.353
>   2 |    164    243     308     47.811
>   3 |    164    238     312     47.242
>   4 |    168    242     332     49.593
>   5 |    168    243     324     48.722
>   6 |    172    242     320     52.596
>   7 |    172    240     316     53.014
>
> freefall is similar. Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
>
> I played with this program a lot 3 and a half years ago, but forgot
> most of what I learned :-(. I tried different fencing in it. This
> seems to make little difference when the program is rerun. With the
> default TESTS = 1024, the min skew sometimes goes negative on freefall,
> but with TESTS = 1024000 that doesn't happen. This is the opposite
> of what I would expect. freefall has load average about 1.

I understand this program again. First, its name is actually tscdrift. I tested the 2015 version, and this version is still in /usr/src/tools/tools/tscdrift/tscdrift.c, with no changes except to the copyright (rgrimes wouldn't like this) and to $FreeBSD$.

The program doesn't actually measure either TSC drift or TSC skew, except indirectly. What it actually measures is the IPC (Inter-Process Communication) time for synchronizing the drift and skew measurements, except that bugs or intentional sloppiness in its synchronization also make it give an indirect measurement of similar bugs or sloppiness in normal use.

After changing TESTS from 1024 to 1024000, it shows large errors in the negative direction, as expected from either large negative skew or program bugs. This is on freefall:

XX CPU | TSC skew (min/avg/max/stddev)
XX ----+------------------------------
XX   0 |      0      0       0      0.000
XX   1 |  -6148    108   10232     46.871
XX   2 |    114    209   95676    163.359
XX   3 |     96    202   47835    101.250
XX   4 |  -2223    207   34017    117.257
XX   5 |  -2349    206   33837    106.259
XX   6 |  -2664    213   33579     96.048
XX   7 |  -2451    212   49242    126.428

The negative "skews" occur because the server and the clients (1 client at a time) read the TSC with uncontrolled timing after the server opens the gate for this read (gate = 2). The IPC time is about 200 cycles to CPUs on different cores. So when neither thread is preempted, the server reads its TSC about 200 cycles before the client reads its TSC. Sometimes the server is preempted, so it reads its TSC later than the client (a maximum of about 6148 cycles later in this test). More often the client is preempted, since the IPC time is much larger than the time between the server opening the gate and the server reading its TSC.
The server is also missing fencing for its TSC read, so this read may appear to occur several cycles before opening the gate. This gives an error in the positive direction for the reported "skew" (the error is actually in the positive direction for the reported IPC time). It would be useful to measure this error by intentionally omitting fencing, but currently it is just a small amount of noise on top of the noise from preemption.

After fixing the synchronization:

XX CPU | TSC skew (min/avg/max/stddev)
XX ----+------------------------------
XX   0 |      0      0       0      0.000
XX   1 |     33     62   49161     57.652
XX   2 |    108    169   33678     73.456
XX   3 |    108    171   43053    119.256
XX   4 |    141    169   41289    114.567
XX   5 |    141    169   40035    112.755
XX   6 |    132    186  147099    269.449
XX   7 |    153    183  431526    436.689

Synchronization apparently takes a long time, especially to other cores. The minimum and average now give the best-case IPC time very accurately. The average is 20-30 cycles smaller than before, probably because I fixed the fencing. The maximum and standard deviation are garbage noise from preemption. Preemption should be disabled somehow.

Large drifts and skews would show up as nonsense values for the minimum IPC time. Small drifts would soon give large skews. To measure small skews, change the CPU of the server to measure the minimum IPC time in the opposite direction.

Fixes:

XX --- tscdrift.c	2015-07-10 06:22:36.505493000 +0000
XX +++ w.c	2019-03-05 11:22:22.232341000 +0000
XX @@ -32,6 +32,15 @@
XX  #include
XX  #include
XX  #include
XX +/*
XX + * XXX: atomic.h is not used. Instead we depend on x86 memory ordering and
XX + * do direct assignments to and comparisons of 'gate', and sometimes add
XX + * memory barriers. The correct atomic ops would do much the same with
XX + * clearer spelling. The 'lock' prefix is never needed and the barriers are
XX + * only to get program order so as to give acq or rel semantics for either
XX + * the loads, the stores or for buggy unfenced rdtsc's. Fences also give
XX + * program order, so some of the explicit barriers are redundant.
XX + */
XX  #include
XX  #include
XX  #include
XX @@ -45,7 +54,7 @@
XX
XX  #define barrier() __asm __volatile("" ::: "memory")
XX
XX -#define TESTS 1024
XX +#define TESTS 1024000
XX
XX  static volatile int gate;
XX  static volatile uint64_t thread_tsc;
XX @@ -74,12 +83,12 @@
XX  		gate = 1;
XX  		while (gate == 1)
XX  			cpu_spinwait();
XX -		barrier();
XX
XX +		barrier();
XX  		__asm __volatile("lfence");
XX  		thread_tsc = rdtsc();
XX -
XX  		barrier();
XX +
XX  		gate = 3;
XX  		while (gate == 3)
XX  			cpu_spinwait();

This is the client. The explicit barriers are confusing, and the blank lines are in all the wrong places. All the accesses to 'gate' need to be in program order. x86 memory ordering gives this automatically at the hardware level. 'gate' being volatile gives it at the compiler level. Both rdtsc() and storing the result to thread_tsc need to be in program order. lfence() in cpufunc.h has a memory clobber which gives the former, but we use a direct asm and need a barrier() before it to do the same thing. Then we need another barrier() after the assignment to thread_tsc so that the store for this is before the store to 'gate' (I think gate being volatile doesn't give this). This also keeps the rdtsc() in program order (the asm for rdtsc() doesn't have a memory clobber, and I haven't noticed care being taken about this anywhere else). Summary: only style changes in this section.

XX @@ -139,12 +148,13 @@
XX  	for (j = 0; j < TESTS; j++) {
XX  		while (gate != 1)
XX  			cpu_spinwait();
XX -		gate = 2;
XX -		barrier();

Move down opening the gate so that it is not opened until after reading the TSC on the server.

XX
XX +		barrier();
XX +		__asm __volatile("lfence");

Fencing is not critical here. Using an early TSC value just gives a larger reported IPC time. The barrier is important for getting program order of rdtsc().
XX  		tsc = rdtsc();
XX -
XX  		barrier();

This barrier is still associated with the TSC read, and the blank line is moved to reflect this. Here rdtsc() must occur in program order, but storing to tsc can be after storing to 'gate'. The barrier gives ordering for the store to tsc too.

XX +
XX +		gate = 2;
XX  		while (gate != 3)
XX  			cpu_spinwait();
XX  		gate = 4;

I tried some locked atomic ops on 'gate' and mfence instead of lfence to try to speed up the IPC. Nothing helped. We noticed long ago that fence instructions tend to be even slower than locked atomic ops for mutexes, and jhb guessed that this might be because fence instructions don't do so much to force out stores.

Similar IPC is needed for updating timecounters. This benchmark indicates that after an update, the change usually won't be visible on other CPUs for 100+ cycles. Since updates are rare, this isn't much of a problem.

Similar IPC is needed for comparing timecounters across CPUs. Any activity on different CPUs is incomparable without synchronization to establish an ordering. Since fences give ordering relative to memory and timecounters don't use anything except fences and memory order for the generation count to establish their order, the synchronization for comparing timecounters (or clock_gettime() at higher levels) must also use memory order. If the synchronization takes over 100 cycles, then smaller TSC skews don't matter much (they never break monotonicity, and only show up as time differences varying by 100 or so cycles depending on which CPU measures the start and end events). Small differences don't matter at all.

Skews may be caused by the TSCs actually being out of sync, or hardware only syncing them on average (hopefully with small jitter) or bugs like missing fences. Missing fences don't matter much provided unserialized TSC reads aren't too far in the past. E.g., if we had a guarantee of only 10 cycles in the past for the TSC and 160 cycles for IPCs to other CPUs, then we could omit the fences.
But IPCs to the same core are 100 cycles faster, so the margin is too close for omitting fences in all cases.

Similarly for imperfect hardware. Hopefully its skew is in the +-1 cycle range, but even +-10 isn't a problem if the IPC time is a bit larger than 10, and even +-100 isn't a problem if the IPC time is a bit larger than 100. And the problem scales nicely with the distance between the CPUs: when they are further apart, so that hardware synchronization of their TSCs is more difficult, the IPC time is larger too.

Hmm, that is only with physical IPCs. Since timecounters use physical IPCs for everything, they can't work right with virtual synchronization. Something like ntpd is needed to compare times across even small local networks. It does virtual synchronization by compensating for delays.

Bruce

From owner-freebsd-hackers@freebsd.org Wed Mar 6 17:20:15 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF9CF1520067; Wed, 6 Mar 2019 17:20:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4883494D6E; Wed, 6 Mar 2019 17:20:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x26HK4Km092433 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 6 Mar 2019 19:20:07 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x26HK4Km092433 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x26HK3r1092419; Wed, 6 Mar 2019 19:20:03 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to
kostikbel@gmail.com using -f Date: Wed, 6 Mar 2019 19:20:03 +0200 From: Konstantin Belousov To: Bruce Evans Cc: Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190306172003.GD2492@kib.kiev.ua> References: <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190305031010.I4610@besplex.bde.org> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 17:20:15 -0000 On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote: > On Mon, 4 Mar 2019, Konstantin Belousov wrote: > > > On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote: > >> On Sun, 3 Mar 2019, Konstantin Belousov wrote: > >> > >>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote: > >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote: > >>>> > >>>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote: > >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote: > >>>>>> > >>>>>>> On Sun, Mar 03, 2019 at 12:03:18AM 
+1100, Bruce Evans wrote:
> >>>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>> * ...
> >>>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> >>>> step. i386 used to be faster here -- the first masking step of discarding
> >>>> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
> >>>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> >>>> has to do a not so slow shr.
> >>> i386 cannot discard %edx after RDTSC since some bits from %edx come into
> >>> the timecounter value.
> >>
> >> These bits are part of the tsc-low pessimization. The shift count should
> >> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
> >> sometimes.
> >>
> >> When tsc-low was new, the shift count was often larger (as much as 8),
> >> and it is still changeable by a read-only tunable, but now it is 1 in
> >> almost all cases. The code only limits the timecounter frequency
> >> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
> >> at nearly 4 GHz are usually limited to about 2 GHz. The comment about
> >> this UINT_MAX doesn't match the code. The comment says int, but the
> >> code says UINT.
> >>
> >> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
> >> This much accuracy is noise for most purposes.
> >>
> >> The tunable is fairly undocumented. Its description is "Shift to apply
> >> for the maximum TSC frequency". Of course, it has no effect on the TSC
> >> frequency. It only affects the TSC timecounter frequency.
> > I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> > Otherwise, I think, some multi-socket machines would start showing the
> > detectable backward-counting bintime(). At the frequencies at 4GHz and
> > above (Intel has 5Ghz part numbers) I do not think that stability of
> > 100MHz crystal and on-board traces is enough to avoid that.
>
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
>
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
>
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>   0 |      0      0       0      0.000
>   1 |     24     49      84     14.353
>   2 |    164    243     308     47.811
>   3 |    164    238     312     47.242
>   4 |    168    242     332     49.593
>   5 |    168    243     324     48.722
>   6 |    172    242     320     52.596
>   7 |    172    240     316     53.014
>
> freefall is similar. Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
>
> I played with this program a lot 3 and a half years ago, but forgot
> most of what I learned :-(. I tried different fencing in it. This
> seems to make little difference when the program is rerun. With the
> default TESTS = 1024, the min skew sometimes goes negative on freefall,
> but with TESTS = 1024000 that doesn't happen. This is the opposite
> of what I would expect. freefall has load average about 1.
>
> Removing the only fencing in it reduces average latency by 10-20 cycles
> and minimum latency by over 100 cycles, except on freefall it is
> reduced from 33 to 6. On Haswell it is 24 with fencing and I didn't
> test it with no fencing.
>
> I think tscskew doesn't really measure tsc skew. What it measures is
> the time taken for a locking protocol, using the TSCs on different
> CPUs to make the start and end timestamps. If the TSCs have a lot of
> skew or jitter, then this will show up indirectly as inconsistent and
> possibly negative differences.
>
> A shift of just 1 can't hide latencies of hundreds of cycles on
> single-socket machines. Even a shift of 8 only works sometimes, by reducing
> the chance of observing the TSC going backwards by a factor of 256.
> E.g., assume for simplicity that all instructions and IPCs take 0-1
> cycles, and that unfenced rdtsc's differ by at most +-5 cycles (with
> the 11 values between -5 and 5 uniformly distributed). Then with a
> shift of 0 and no fences, a CPU that updates the timehands is ahead of
> another CPU that spins reading the timehands about 5/11 of the time.
> With a shift of 8, the CPUs are close enough when the first one reads
> at least 5 above and at least 5 below a 256-boundary. The chance of
> seeing a negative difference is reduced by at least a factor of 10/256.
>
> > I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> > Otherwise, I think, some multi-socket machines would start showing the
> > detectable backward-counting bintime(). At the frequencies at 4GHz and
> > above (Intel has 5Ghz part numbers) I do not think that stability of
> > 100MHz crystal and on-board traces is enough to avoid that.
>
> Why would losing just 1 bit fix that?
>
> Fences for rdtsc of course only serialize it for the CPU that runs it.
> The locking (ordering) protocol (for the generation count) orders the
> CPUs too. It takes longer than we would like, much more than the
> 1-cycle error that might be hidden by ignoring the low bit. Surely the
> ordering protocol must work across sockets? It then gives ordering of
> rdtsc's.
>
> TSC-low was added in 2011. That was long before the ordering was fixed.
> You added fences in 2012 and memory ordering for the generation count in
> 2016. Fences slowed everything down by 10-20+ cycles and probably hide
> bugs in the memory ordering better than TSC-low. Memory ordering plus
> fences slow down the cross-core case by more than 100 cycles according
> to tscskew. That is enough to hide large hardware bugs.
>
> > We can try to set the tsc-low shift count to 0 (but keep lfence) and see
> > what is going on in HEAD, but I am afraid that the HEAD users population
> > is not representative enough to catch the issue with certainty.
> > More, it is unclear to me how to diagnose the cause, e.g. I would expect
> > the sleeps to hang on timeouts, as was reported from the very beginning
> > of this thread. How would we root-cause it?
>
> Negative time differences cause lots of overflows so break the timecounter.
> The fix under discussion actually gives larger overflows in the positive
> direction. E.g., a delta of -1 first overflows to 0xffffffff. The fix
> prevents overflow on multiplication by that. When the timecounter
> frequency is small, say 1 MHz, 0xffffffff means 4294 seconds, so the
> timecounter advances by that.
>
> >>> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> >>> since the whole shrdl calculation occurs in 32bit registers, and the result
> >>> is in %rax where top word is cleared by shrdl instruction automatically.
> >>> But the clearing is not required since result is unsigned int anyway.
> >>>
> >>> Disassembly of tsc_get_timecount_low() is very clear:
> >>> 0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
> >>> 0xffffffff806767e7 <+7>: rdtsc
> >>> 0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
> >>> ...
> >>> 0xffffffff806767ed <+13>: retq
> >>> (I removed frame manipulations).
>
> I checked that all compilers still produce horrible code for the better
> source code 'return (rdtsc() << (intptr_t)tc->tc_priv);'. 64-bit shifts
> are apparently pessimal for compatibility. The above is written mostly
> in asm to avoid 2-5 extra instructions.
>
> >>>> ...
> >>>> Similarly in bintime().
> >>> I merged two functions, finally. Having to copy the same code is too
> >>> annoying for this change.
>
> I strongly dislike the merge.
>
> >>> So I verified that:
> >>> - there is no 64bit multiplication in the generated code, for i386 both
> >>> for clang 7.0 and gcc 8.3;
> >>> - that everything is inlined, the only call from bintime/binuptime is
> >>> the indirect call to get the timecounter value.
> >>
> >> I will have to fix it for compilers that I use.
> > Ok, I will add __inline.
>
> That will make it fast enough, but still hard to read.
>
> >>> +		*bt = *bts;
> >>> +		scale = th->th_scale;
> >>> +		delta = tc_delta(th);
> >>> +#ifdef _LP64
> >>> +		if (__predict_false(th->th_large_delta <= delta)) {
> >>> +			/* Avoid overflow for scale * delta. */
> >>> +			bintime_helper(bt, scale, delta);
> >>> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> >>> +		} else {
> >>> +			bintime_addx(bt, scale * delta);
> >>> +		}
> >>> +#else
> >>> +		/*
> >>> +		 * Use bintime_helper() unconditionally, since the fast
> >>> +		 * path in the above method is not so fast here, since
> >>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
> >>> +		 * available in hardware and emulating it using 2
> >>> +		 * 32 x 32 -> 64 bit multiplications uses code much
> >>> +		 * like that in bintime_helper().
> >>> +		 */
> >>> +		bintime_helper(bt, scale, delta);
> >>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> >>> +#endif
> >>
> >> Check that this method is really better. Without this, the complicated
> >> part is about half as large and duplicating it is smaller than this
> >> version.
> > Better in what sense? I am fine with the C code, and asm code looks
> > good.
>
> Better in terms of actually running significantly faster. I fear the
> 32-bit method is actually slightly slower for the fast path.
>
> >>> -	do {
> >>> -		th = timehands;
> >>> -		gen = atomic_load_acq_int(&th->th_generation);
> >>> -		*bt = th->th_bintime;
> >>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>> -		atomic_thread_fence_acq();
> >>> -	} while (gen == 0 || gen != th->th_generation);
> >>
> >> Duplicating this loop is much better than obfuscating it using inline
> >> functions. This loop was almost duplicated (except for the delta
> >> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> >> 8 ffclock ones). Now it is only duplicated 16 times.
> > How did you count the 16? I can see only 4 instances in the unpatched
> > kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> > touch ffclock until the patch is finalized. After that, it would be
> > 1 instance for kernel and 1 for userspace.
>
> Grep for the end condition in this loop. There are actually 20 of these.
> I'm counting the loops and not the previously-simple scaling operation in
> it. The scaling is indeed only done for 4 cases. I prefer the 20
> duplications (except I only want about 6 of the functions). Duplication
> works even better for only 4 cases.

Ok, I merged these as well. Now there are only four loops left in the
kernel. I do not think that merging them is beneficial, since they have
sufficiently different bodies.

I disagree with your characterization of it as obfuscation; IMO it
improves the maintainability of the code by reducing the number of places
which need careful inspection of the lock-less algorithm.

>
> This should be written as a function call to 1 new function to replace
> the line with the overflowing multiplication. The line is always the
> same, so the new function call can look like bintime_xxx(bt, th).

Again, please provide at least pseudocode of your preference. The current
patch is already too large; I want to test/commit what I already have, and
I will need to split it for the commit.

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 2656fb4d22f..7114a0e5219 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -72,6 +72,7 @@ struct timehands { struct timecounter *th_counter; int64_t th_adjustment; uint64_t th_scale; + uint64_t th_large_delta; u_int th_offset_count; struct bintime th_offset; struct bintime th_bintime; @@ -200,22 +201,77 @@ tc_delta(struct timehands *th) * the comment in for a description of these 12 functions.
*/ -#ifdef FFCLOCK -void -fbclock_binuptime(struct bintime *bt) +static __inline void +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta) +{ + uint64_t x; + + x = (scale >> 32) * delta; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); +} + +static __inline void +binnouptime(struct bintime *bt, u_int off) { struct timehands *th; - unsigned int gen; + struct bintime *bts; + uint64_t scale; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); + bts = (struct bintime *)(vm_offset_t)th + off; + *bt = *bts; + scale = th->th_scale; + delta = tc_delta(th); +#ifdef _LP64 + if (__predict_false(th->th_large_delta <= delta)) { + /* Avoid overflow for scale * delta. */ + bintime_helper(bt, scale, delta); + bintime_addx(bt, (scale & 0xffffffff) * delta); + } else { + bintime_addx(bt, scale * delta); + } +#else + /* + * Use bintime_helper() unconditionally, since the fast + * path in the above method is not so fast here, since + * the 64 x 32 -> 64 bit multiplication is usually not + * available in hardware and emulating it using 2 + * 32 x 32 -> 64 bit multiplications uses code much + * like that in bintime_helper(). 
+ */ + bintime_helper(bt, scale, delta); + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); +#endif atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } +static __inline void +getbinnouptime(void *out, size_t out_size, u_int off) +{ + struct timehands *th; + u_int gen; + + do { + th = timehands; + gen = atomic_load_acq_int(&th->th_generation); + memcpy(out, (char *)th + off, out_size); + atomic_thread_fence_acq(); + } while (gen == 0 || gen != th->th_generation); +} + +#ifdef FFCLOCK +void +fbclock_binuptime(struct bintime *bt) +{ + + binnouptime(bt, __offsetof(struct timehands, th_offset)); +} + void fbclock_nanouptime(struct timespec *tsp) { @@ -237,16 +293,8 @@ fbclock_microuptime(struct timeval *tvp) void fbclock_bintime(struct bintime *bt) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + binnouptime(bt, __offsetof(struct timehands, th_bintime)); } void @@ -270,100 +318,61 @@ fbclock_microtime(struct timeval *tvp) void fbclock_getbinuptime(struct bintime *bt) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands, + th_offset)); } void fbclock_getnanouptime(struct timespec *tsp) { - struct timehands *th; - unsigned int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timespec(&th->th_offset, tsp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timespec(&bt, tsp); } void fbclock_getmicrouptime(struct timeval 
*tvp) { - struct timehands *th; - unsigned int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timeval(&th->th_offset, tvp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timeval(&bt, tvp); } void fbclock_getbintime(struct bintime *bt) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands, + th_bintime)); } void fbclock_getnanotime(struct timespec *tsp) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } void fbclock_getmicrotime(struct timeval *tvp) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tvp = th->th_microtime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tvp, sizeof(*tvp), __offsetof(struct timehands, + th_microtime)); } #else /* !FFCLOCK */ + void binuptime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + binnouptime(bt, __offsetof(struct timehands, th_offset)); } void @@ -387,16 +396,8 @@ microuptime(struct timeval *tvp) void bintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = 
atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + binnouptime(bt, __offsetof(struct timehands, th_bintime)); } void @@ -420,85 +421,53 @@ microtime(struct timeval *tvp) void getbinuptime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands, + th_offset)); } void getnanouptime(struct timespec *tsp) { - struct timehands *th; - u_int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timespec(&th->th_offset, tsp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timespec(&bt, tsp); } void getmicrouptime(struct timeval *tvp) { - struct timehands *th; - u_int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timeval(&th->th_offset, tvp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timeval(&bt, tvp); } void getbintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands, + th_bintime)); } void getnanotime(struct timespec *tsp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } 
while (gen == 0 || gen != th->th_generation); + getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } void getmicrotime(struct timeval *tvp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tvp = th->th_microtime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tvp, sizeof(*tvp), __offsetof(struct timehands, + th_microtime)); } #endif /* FFCLOCK */ @@ -514,15 +483,9 @@ getboottime(struct timeval *boottime) void getboottimebin(struct bintime *boottimebin) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *boottimebin = th->th_boottime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(boottimebin, sizeof(*boottimebin), + __offsetof(struct timehands, th_boottime)); } #ifdef FFCLOCK @@ -1038,15 +1001,9 @@ getmicrotime(struct timeval *tvp) void dtrace_getnanotime(struct timespec *tsp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } /* @@ -1464,6 +1421,7 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; + th->th_large_delta = ((uint64_t)1 << 63) / scale; /* * Now that the struct timehands is again consistent, set the new From owner-freebsd-hackers@freebsd.org Wed Mar 6 21:03:53 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ADDC415273FF for ; Wed, 6 Mar 2019 21:03:53 +0000 (UTC) (envelope-from marklmi@yahoo.com) 
Received: from sonic305-22.consmr.mail.ne1.yahoo.com (sonic305-22.consmr.mail.ne1.yahoo.com [66.163.185.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A21DA70560 for ; Wed, 6 Mar 2019 21:03:52 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: ok9IFkgVM1mDn1NyROc6pAsNPoJjWLIQ1eZslNsDEbrtjuz3QthYjLnuT9m_imE x8kHcG0P3LT0gM_jJzyZfpMf_hLJKNikukaSwy1_JgGOmitIXNwkKV9MUYSJibh7_zNic79Ik_wH AsIejzs2Qvx17ShWTM43j5R3Z48XajS8.WZ4BR3rrnhu86bxqHfH89ssV3gXQcoqFUUMse4BOoEy 2wnWtLflah9DxhzKkYynVsa8Hyc7zaRLti3OTgvI5D.oW8_flNX7gRWXITj6crJb3_yTadMYYHP1 _YnQLgHpLTePtvbgrJmgAWzyQhtVctI1HHWtTZBlmiyq795aohDShVi0WML9z1aUnDczI2BhgIeJ WR7n1iVU8Pu1CJ7LVAYgpe5CNtuR1BQxXfwwCEhePGqMDxdqUSZ2rDXYynvNILC5M.QMrZQl4eJK FJoUwKdHfIefHQ2LpRxauxlI6wW9TSsN5eASnZsQGFJMNArscbxaZaBnMZBBcbNNec7o1nNhzi71 g0HmscP41eaFZLmAUMqSZWC0tkmhomHm1ej30..gaVcO6lz.5AYV9ix7UOMNoH1XCiG_0Ksn6MFg y2TrGQRA5QgzfU6B7vAicY4KKN2ojD27hf19gDvhzp8niI8.uusz.bPknnt1ID6WBEQTY_cdUzck Jbt0D6FmbTPZEKo5_5Se1d5F3I4ZwLUhex38sw.h75AK.URMKakP5M36z5rPERZq6oKtadg_Yfh2 sQq6.oV97g3HY6s3u2MQcRTlX6B83ztQ4.mwNeV8qOHhE_HQ3r.h_2CegvO9JwKbTAN9NmY8kqaR Qa9CLX5.hpuVQbmTFIA71rve6anB7DHQROaYsuAwPg8HjVfxN2lsI.qAZkJdtk7O1h8OjpNk6GI_ 0JHiOfJLR0UkGVuYRLSh51WGEaTsa4iylXSDbvFFAof7AlYUHAXjSO7.p4GRAunkU6V55oeap0S0 vTVHZ87aN36JLhrIJyWArJVSjrcvF.mn6JCS4TeGnZpNfXMKBXRnJ6oANhyBmk1mR6vkg68b2WhZ 0YwHYLpzFNh3C8Ean2uWb4G40WYMjFMa7xfQHO9K1mdd2_9vIeLgb99ckKkkOzCQl.Q-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic305.consmr.mail.ne1.yahoo.com with HTTP; Wed, 6 Mar 2019 21:03:46 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp407.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 989e6db6d298809b3619be81032a35c7; Wed, 06 Mar 2019 21:03:43 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X 
Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed] Date: Wed, 6 Mar 2019 13:03:42 -0800 References: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com> <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com> To: FreeBSD PowerPC ML , Mark Millard via freebsd-hackers In-Reply-To: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com> Message-Id: <23683875-418E-4E48-BE26-01221EABC906@yahoo.com> X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: A21DA70560 X-Spamd-Bar: +++ X-Spamd-Result: default: False [3.11 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.79)[0.791,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.31)[ip: (4.36), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.81)[0.808,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.71)[0.709,0]; RCVD_IN_DNSWL_NONE(0.00)[148.185.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 21:03:53 -0000 [I have a new observed maximum difference, having changed the code to record such.]
On 2019-Mar-4, at 01:40, Mark Millard wrote: > [I did some testing with figures other than < 0x10.] > > On 2019-Mar-3, at 13:23, Mark Millard wrote: > >> [So far the hack has been successful. Details given later >> below.] >> >> On 2019-Mar-2, at 21:20, Mark Millard wrote: >> >>> [This note goes in a different direction compared to my >>> prior evidence report for overflows and the later activity >>> that has been happening for it. This does *not* involve >>> the patches associated with that report.] >>> >>> I view the following as an evidence-gathering hack: >>> showing the change in behavior with the code changes, >>> not as directly what FreeBSD should do for powerpc64. >>> In code for defined(__powerpc64__) && defined(AIM) >>> I freely use knowledge of the PowerMac G5 context >>> instead of attempting general code. >>> >>> Also: the code is set up to record some information >>> that I've been looking at via ddb. The recording is >>> not part of what changes the behavior but I decided >>> to show that code too. >>> >>> It is preliminary, but, so far, the hack has avoided >>> buf*daemon* threads and pmac_thermal getting stuck >>> sleeping (or, at least, far less frequently). >>> >>> >>> The tbr-value hack: >>> >>> From what I see, the G5's various cores each have a tbr running at the >>> same rate but with some offsets as far as the base time >>> goes. cpu_mp_unleash does: >>> >>> ap_awake = 1; >>> >>> /* Provide our current DEC and TB values for APs */ >>> ap_timebase = mftb() + 10; >>> __asm __volatile("msync; isync"); >>> >>> /* Let APs continue */ >>> atomic_store_rel_int(&ap_letgo, 1); >>> >>> platform_smp_timebase_sync(ap_timebase, 0); >>> >>> and machdep_ap_bootstrap does: >>> >>> /* >>> * Set timebase as soon as possible to meet an implicit rendezvous >>> * from cpu_mp_unleash(), which sets ap_letgo and then immediately >>> * sets timebase.
>>> * >>> * Note that this is intrinsically racy and is only relevant on >>> * platforms that do not support better mechanisms. >>> */ >>> platform_smp_timebase_sync(ap_timebase, 1); >>> >>> >>> which attempts to set the tbrs appropriately. >>> >>> But on small scales of differences the various tbr >>> values from different cpus end up not well ordered >>> relative to time, synchronizes with, and the like. >>> Only large enough differences can well indicate an >>> ordering of interest. >>> >>> Note: tc->tc_get_timecount(tc) only provides the >>> least significant 32 bits of the tbr value. >>> th->th_offset_count is also 32 bits and based on >>> truncated tbr values. >>> >>> So I made binuptime avoid finishing when it sees >>> a small (<0x10) step backwards for a new >>> tc->tc_get_timecount(tc) value vs. the existing >>> th->th_offset_count value (values strongly tied >>> to powerpc64 tbr values): >>> >>> . . . [old code omitted] . . . >>> >>> So far as I can tell, the FreeBSD code is not designed to deal >>> with small differences in tc->tc_get_timecount(tc) not actually >>> indicating a useful < vs. == vs. > ordering relation uniquely. >>> >>> (I make no claim that the hack is a proper way to deal with >>> such.) >> >> I did a buildworld buildkernel that took somewhat over 7 hours on the >> PowerMac G5. Overall the G5 has been up over 13 hours and >> none of the buf*daemon* threads have gotten stuck sleeping. >> Nor has pmac_thermal gotten stuck. Similarly for vnlru >> and syncer: "top -HIStopid" still shows them all as >> periodically active. >> >> Previously for this usefdt=1 context (with the modern >> VM_MAX_KERNEL_ADDRESS), going more than a few minutes >> without at least one of those threads getting stuck >> sleeping was rare on the G5 (powerpc64 example).
>> >> So this hack has managed to avoid finding sbinuptime() >> in sleepq_timeout being less than the earlier (by call >> structure/code sequencing) sbinuptime() in timercb that >> led to the sleepq_timeout callout being called in the >> first place. >> >> So in the sleepq_timeout callout's: >> >> if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) { >> /* >> * The thread does not want a timeout (yet). >> */ >> } else . . . >> >> td->td_sleeptimo > sbinuptime() ends up false now for small >> enough original differences. >> >> This case does not set up another timeout, it just leaves the >> thread stuck sleeping, no longer doing periodic activities. >> >> As it stands, what I did (presuming an appropriate definition >> of "small differences in the problematical direction") should >> leave this and other sbinuptime-using code with: >> >> td->td_sleeptimo <= sbinuptime() >> >> for what were originally "small" tbr value differences in the >> problematical direction (in case other places require it in >> some way). >> >> If, instead, just sleepq_timeout's test could allow for >> some slop in the ordering, it could be a cheaper hack than >> looping in binuptime. >> >> At this point I've no clue what a correct/efficient FreeBSD >> design for allowing the sloppy match across tbr's for different >> CPUs would be. > > Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried > each of the following and they all failed: > > && tim_offset-tim_cnt<0x2 > && tim_offset-tim_cnt<0x4 > && tim_offset-tim_cnt<0x8 > && tim_offset-tim_cnt<0xc I've now seen a difference of 0x11 that led to hung up threads, hung waiting for sleep. > 0x2, 0x4, and 0x8 failed for the first boot attempt, > almost immediately having stuck-in-sleep threads. > > 0xc seemed to be working for the first boot (including > a buildworld buildkernel that did not have to rebuild > much). But the 2nd boot attempt had a stuck-in-sleep > thread by the time I logged in.
> > By contrast, for: > > && tim_offset-tim_cnt<0x10 > > I've not seen it fail so far, after many reboots, a full > buildworld buildkernel, and running over 24 hours > (that included the somewhat over 7 hours for build > world buildkernel). But it might be that some boots > would need a bigger figure. > During a ports-mgmt/poudriere-devel run I had some threads hang in sleep when the code was based on less than 0x10 differences. But I'd changed the code to record the maximum "small difference in the problematical direction" observed, and so was able to see that it got a 0x11 difference. Below is the newer code structure, as far as what is recorded. It already has 0x14 instead of 0x10 for the bound it uses to control the loop. I omitted #if 0 . . . #endif code that I'm not currently using. #if defined(__powerpc64__) && defined(AIM) void binuptime(struct bintime *bt) { struct timehands *th; u_int gen; u_int tim_cnt, tim_offset; // HACK!!! (for "small difference in problem direction loop") struct timecounter *tc; // HACK!!! (for recording other data for inspection via ddb) u_int tim_diff; // HACK!!! uint64_t scale_factor, diff_scaled; // HACK!!! #if 1 u_int tim_wrong_order_diff= 0u; // HACK!!! u_int max_wrong_order_diff= 0u; // HACK!!! u_int wrong_order_cnt= 0u; // HACK!!! u_int wrong_order_offset= 0u; // HACK!!! #endif do { do { // HACK!!! th= timehands; tc= th->th_counter; gen= atomic_load_acq_int(&th->th_generation); tim_cnt= tc->tc_get_timecount(tc); tim_offset= th->th_offset_count; #if 1 tim_wrong_order_diff= tim_offset-tim_cnt; if ( tim_cntth_offset; tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask; scale_factor= th->th_scale; diff_scaled= scale_factor * tim_diff; bintime_addx(bt, diff_scaled); atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); #if 1 // Uses direct-map addresses (mapping to the most significant c being masked off). // Justin H.
reported that some of the 0x0..0xff addresses were unused // and available. The 2 larger ranges that I observed to stay at zero // were 0x20..0x7f and 0xa..0xff --so that is what I limited myself to. if (*(volatile uint64_t*)0xc0000000000000b0 Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3F14152D5CB; Thu, 7 Mar 2019 12:22:23 +0000 (UTC) (envelope-from babupalit@gmail.com) Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9DD33821E1; Thu, 7 Mar 2019 12:22:22 +0000 (UTC) (envelope-from babupalit@gmail.com) Received: by mail-qt1-x844.google.com with SMTP id y4so16760213qtc.10; Thu, 07 Mar 2019 04:22:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=MkiqQeA1vGpih3llJog3SU2FoClkoQ+xZvBJkRhxkDY=; b=qo9RhNaZBEb8+79Shho0kFOhr0oO6NWgHqdQf3ra3dX9SrtXvZMd+MjQ6FlHsIVGWk 6VLrzIYrjnDwKqp0lM7xZ+LfUy5oL3eR63wXu5I9Lmfy4l+qqcY7MZqlwXIjZ0q7+RxN OjUx1Wvw4nduiSzI39iDzGAixOz7BhDHdIXkhQuFEjvIc63w0jcZAii26o8lfwB4LbNz BUuwFgF9j7r3eaQ46xmdItxnCNcF4D5qpH2SNswuL25G3GFCm1sfTHCeuTFfp3IMBq4h XRpY/ejWMwYMXJn25Af71pj1ahp8mSQtsS955nrzlP6RXhxvfoel/vEuGTu0SU1Fv0tI FnfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=MkiqQeA1vGpih3llJog3SU2FoClkoQ+xZvBJkRhxkDY=; b=MKJ00E+lXMyfiyOql9drEEVtBPL/2UJMGSEZmqNrgNTNKtaKGv9Qm0R2gwqh6DOQSD yRKkBBd6q1XjMvreZsgY1Kzjb6+2q9DUquwkHlZFeiB09HjOsHyVLP9RNEJSfFDZqeJ5 06QW/YY2c+bgjpK0oKyRFv9mJ5LGHQaHodUdbKOj6yLUm3rTBY5lv+l6tnXwlR21dQcB xM+22BSMx0vNy6JtZuVsUVNuRBuF6lTf/ZnkTjh4UKapPVo3QZnFJyp26cS2IAxz9/Hv
+R9tLb2kCn/vgvgV7qIaJ27NM8NQzKaaKl2MbXIzOUu305rP9kTiEJ2Lu0zkPQee5ieI akiQ== X-Gm-Message-State: APjAAAXE752Z3KhbJ4g+sxh3X6gZyrPmlaM4iWu9IVZd2EJGyRteiYJg 4F9PTI/yyQzyy/3kyWQo8mU+VHZlBMjjGRwPz83tTqVF X-Google-Smtp-Source: APXvYqzwurFceC/4oWAMIDHLObqM6nb3VGXcdeLNtM567KpjjgEntEBPdmebp+v5STwIxnM3W67MSIgJ/AYQa7KkW3U= X-Received: by 2002:ac8:1761:: with SMTP id u30mr9675836qtk.354.1551961340985; Thu, 07 Mar 2019 04:22:20 -0800 (PST) MIME-Version: 1.0 From: Arpan Palit Date: Thu, 7 Mar 2019 17:52:09 +0530 Message-ID: Subject: How to access external PHY on MDIO bus? To: freebsd-drivers@freebsd.org, freebsd-hackers@freebsd.org X-Rspamd-Queue-Id: 9DD33821E1 X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=qo9RhNaZ; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of babupalit@gmail.com designates 2607:f8b0:4864:20::844 as permitted sender) smtp.mailfrom=babupalit@gmail.com X-Spamd-Result: default: False [-3.00 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36]; FREEMAIL_FROM(0.00)[gmail.com]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; TO_DN_NONE(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_SHORT(-0.48)[-0.478,0]; RCVD_TLS_LAST(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; SUBJECT_ENDS_QUESTION(1.00)[]; RCVD_IN_DNSWL_NONE(0.00)[4.4.8.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org : 127.0.5.0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; RCVD_COUNT_TWO(0.00)[2]; IP_SCORE(-0.51)[ip: (2.26), ipnet: 2607:f8b0::/32(-2.72), asn: 15169(-2.05), country: US(-0.07)]; 
DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 127.0.5.0] X-Mailman-Approved-At: Thu, 07 Mar 2019 12:38:44 +0000 Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2019 12:22:23 -0000 Hi, I need to know how I can access a specific register offset in an external PHY in FreeBSD. In Linux the equivalent routines are phy_read/phy_write, which read/write a specific register and internally call the mdiobus_read/mdiobus_write functions. I can see that there is an mdio_readreg/mdio_writereg MDIO interface which is driven by the stack; what if a driver needs to do the same? Is there an equivalent, or any other way to do that? Thanks, Arpan Palit From owner-freebsd-hackers@freebsd.org Thu Mar 7 14:31:41 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4DDC153112F; Thu, 7 Mar 2019 14:31:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 74F1388991; Thu, 7 Mar 2019 14:31:39 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 7BC2C3D92DB; Fri, 8 Mar 2019 01:31:32 +1100 (AEDT) Date: Fri, 8 Mar 2019 01:31:30 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: Bruce Evans , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned
64 bits sometimes [patched failed] In-Reply-To: <20190306172003.GD2492@kib.kiev.ua> Message-ID: <20190308001005.M2756@besplex.bde.org> References: <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=GReyFr9QJwj15KPVhA0A:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 74F1388991 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.91 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[]; NEURAL_HAM_SHORT(-0.91)[-0.914,0] X-Mailman-Approved-At: Thu, 07 Mar 2019 16:29:06 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2019 14:31:41 -0000 On Wed, 6 Mar 2019, Konstantin Belousov wrote: > On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote: >> On Mon, 4 Mar 2019, Konstantin Belousov wrote: >> >>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote: >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote: >>>> >>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote: >* ... >> I strongly dislike the merge.
>> >>>>> So I verified that: >>>>> - there is no 64bit multiplication in the generated code, for i386 both >>>>> for clang 7.0 and gcc 8.3; >>>>> - that everything is inlined, the only call from bintime/binuptime is >>>>> the indirect call to get the timecounter value. >>>> >>>> I will have to fix it for compilers that I use. >>> Ok, I will add __inline. >> >> That will make it fast enough, but still hard to read. >> >>>>> + *bt = *bts; >>>>> + scale = th->th_scale; >>>>> + delta = tc_delta(th); >>>>> +#ifdef _LP64 >>>>> + if (__predict_false(th->th_large_delta <= delta)) { >>>>> + /* Avoid overflow for scale * delta. */ >>>>> + bintime_helper(bt, scale, delta); >>>>> + bintime_addx(bt, (scale & 0xffffffff) * delta); >>>>> + } else { >>>>> + bintime_addx(bt, scale * delta); >>>>> + } >>>>> +#else >>>>> + /* >>>>> + * Use bintime_helper() unconditionally, since the fast >>>>> + * path in the above method is not so fast here, since >>>>> + * the 64 x 32 -> 64 bit multiplication is usually not >>>>> + * available in hardware and emulating it using 2 >>>>> + * 32 x 32 -> 64 bit multiplications uses code much >>>>> + * like that in bintime_helper(). >>>>> + */ >>>>> + bintime_helper(bt, scale, delta); >>>>> + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); >>>>> +#endif >>>> >>>> Check that this method is really better. Without this, the complicated >>>> part is about half as large and duplicating it is smaller than this >>>> version. >>> Better in what sense? I am fine with the C code, and asm code looks >>> good. >> >> Better in terms of actually running significantly faster. I fear the >> 32-bit method is actually slightly slower for the fast path. I checked that it is just worse: significantly slower and more complicated. I wrote and ran a lot of timing benchmarks of various versions. All times in cycles on Haswell @4.08 GHz. On i386 except where noted: - the fastest case is when compiled by clang with the default of -O2.
binuptime() in a loop then takes 34 cycles. This is faster than possible for latency, since rdtsc alone has a latency of 24 cycles. There must be several iterations of the loop running in parallel. - the slowest case is when compiled by gcc-4.2.1 with my config of -Os. binuptime() in a loop then takes 116 cycles. -Os does at least the following pessimization: use memcpy() for copying the 12-byte struct bintime. - gcc-4.2.1 -O2 takes 74 cycles. -O2 still does the following pessimization: do a 64 x 32 -> 64 bit multiplication after not noticing that the first operand has been reduced to 32 bits by a shift or mask. The above tests were done with the final version. The version which tested alternatives used switch (method) and takes about 20 cycles longer for the fastest version, presumably by defeating parallelism. Times for various methods: - with clang -Os, about 54 cycles for the old method that allowed overflow, and the same for the version with the check of the overflow threshold (but with the threshold never reached), and 59 cycles for the branch-free method. 100-116 cycles with gcc-4.2.1 -Os, with the branch-free method taking 5-10 cycles longer. - on amd64, only a couple of cycles faster (49-50 cycles in best cases), and gcc-4.2.1 only taking a couple of cycles longer. The branch-free method still takes about 59 cycles so it is relatively worse. In userland, using the syscall for clock_gettime(), the extra 5-10 cycles for the branch-free method is relatively insignificant. It is about 2 nanoseconds. Other pessimizations are more significant. Times for this syscall: - amd64 now: 224 nsec (with gcc-4.2.1 -Os) - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os); even getpid(2) takes 280 nsec. Add at least 140 more nsec for pae. - i386 3+1: 224 nsec (with gcc 4.2.1 -Os) - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O). - i386 4+4 nopae old library version of clock_gettime() compiled by clang: 29 nsec.
In some tests, the version with the branch was even a cycle or two faster. In the tests, the branch was always perfectly predicted, so it costs nothing except possibly by changing scheduling in an accidentally good way. The tests were too small to measure the cost of using branch prediction resources. I've never noticed a case where 1 more branch causes thrashing. >>>>> - do { >>>>> - th = timehands; >>>>> - gen = atomic_load_acq_int(&th->th_generation); >>>>> - *bt = th->th_bintime; >>>>> - bintime_addx(bt, th->th_scale * tc_delta(th)); >>>>> - atomic_thread_fence_acq(); >>>>> - } while (gen == 0 || gen != th->th_generation); >>>> >>>> Duplicating this loop is much better than obfuscating it using inline >>>> functions. This loop was almost duplicated (except for the delta >>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and >>>> 8 ffclock ones). Now it is only duplicated 16 times. >>> How did you count the 16? I can see only 4 instances in the unpatched >>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not >>> touch ffclock until the patch is finalized. After that, it would be >>> 1 instance for kernel and 1 for userspace. >> >> Grep for the end condition in this loop. There are actually 20 of these. >> I'm counting the loops and not the previously-simple scaling operation in >> it. The scaling is indeed only done for 4 cases. I prefer the 20 >> duplications (except I only want about 6 of the functions). Duplication >> works even better for only 4 cases. > Ok, I merged these as well. Now there are only four loops left in kernel. > I do not think that merging them is beneficial, since they have sufficiently > different bodies. This is exactly what I don't want. > > I disagree with your characterization of it as obfuscation, IMO it improves > the maintainability of the code by reducing the number of places which need > careful inspection of the lock-less algorithm.
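A minimal user-space sketch of the generation-count read loop being debated here may help show what "careful inspection" has to check in each copy. It assumes C11 <stdatomic.h> in place of FreeBSD's atomic_load_acq_int() and atomic_thread_fence_acq(); the names toy_timehands, read_time() and update_time() are hypothetical, and the payload is two counters rather than a struct bintime. As in the quoted loop, a generation of 0 means "update in progress", and the reader retries until it sees the same non-zero generation before and after its copy. The writer imitates the tc_windup() pattern (zero the generation, update, publish a new non-zero generation) and assumes there is only one writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timehands; gen == 0 means "being updated". */
struct toy_timehands {
	_Atomic unsigned int gen;
	_Atomic uint64_t sec;
	_Atomic uint64_t frac;
};

static struct toy_timehands th_a = { 1, 0, 0 };
static struct toy_timehands th_b = { 1, 0, 0 };
static struct toy_timehands *_Atomic current = &th_a;

/* Reader: copy the payload, retrying if an update raced with the copy. */
static void
read_time(uint64_t *sec, uint64_t *frac)
{
	struct toy_timehands *th;
	unsigned int gen;

	do {
		th = atomic_load_explicit(&current, memory_order_acquire);
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&th->sec, memory_order_relaxed);
		*frac = atomic_load_explicit(&th->frac, memory_order_relaxed);
		/* Keep the payload loads ordered before the generation re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}

/* Single writer: update the inactive copy, then publish it. */
static void
update_time(uint64_t sec, uint64_t frac)
{
	struct toy_timehands *cur, *th;
	unsigned int ogen;

	cur = atomic_load_explicit(&current, memory_order_relaxed);
	th = (cur == &th_a) ? &th_b : &th_a;
	ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/* A reader that sees any new payload must also see gen == 0. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&th->frac, frac, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;	/* never publish the "in progress" value */
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
	atomic_store_explicit(&current, th, memory_order_release);
}
```

A reader that races with update_time() never blocks; it just goes around the loop again. The correctness argument lives entirely in the order of the loads, stores and fences, which is why each duplicated copy of the loop has to be checked separately.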
It makes the inspection and changes more difficult for each instance. General functions are more difficult to work with since they need more args to control them and can't be changed without affecting all callers. In another thread, you didn't like similar churn for removing td args. Here there isn't even a bug, since overflow only occurs when an invariant is violated. >> This should be written as a function call to 1 new function to replace >> the line with the overflowing multiplication. The line is always the >> same, so the new function call can look like bintime_xxx(bt, th). > Again, please provide at least a pseudocode of your preference. The following is a complete tested and benchmarked implementation, with a couple more minor fixes: XX Index: kern_tc.c XX =================================================================== XX --- kern_tc.c (revision 344852) XX +++ kern_tc.c (working copy) XX @@ -72,6 +72,7 @@ XX struct timecounter *th_counter; XX int64_t th_adjustment; XX uint64_t th_scale; XX + u_int th_large_delta; XX u_int th_offset_count; XX struct bintime th_offset; XX struct bintime th_bintime; Improvement not already discussed: use a u_int limit for the u_int variable. XX @@ -90,6 +91,7 @@ XX static struct timehands th0 = { XX .th_counter = &dummy_timecounter, XX .th_scale = (uint64_t)-1 / 1000000, XX + .th_large_delta = 1000000, XX .th_offset = { .sec = 1 }, XX .th_generation = 1, XX .th_next = &th1 Fix not already discussed: th_large_delta was used in the dummy timehands before it was initialized. Static initialization to 0 gives fail-safe behaviour and unintended exercising of the slow path. The dummy timecounter has a low frequency, so its overflow threshold is quite low. I think it is not used even 1000000 times unless there is a bug in the boot code, so it doesn't overflow in practice. I did see some strange crashes at boot time while testing this.
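To make the threshold concrete, here is a sketch that repeats the tc_windup() arithmetic quoted in this thread, assuming th_adjustment is 0. compute_scale() and fast_path_is_safe() are hypothetical helpers, not kernel functions. Since th_scale is about 2^64 / frequency, th_scale * delta measures elapsed time in units of 2^-64 s, so the plain 64-bit product is only trustworthy for deltas below roughly one second's worth of counts. For the dummy timecounter's 1000000 Hz this yields exactly the 1000000 used in the th0 initializer above. A 32-bit count that steps backwards by 0x11, as reported earlier in this archive, wraps under the counter mask to 0xffffffef, far above the threshold for a timebase in the tens of MHz (33333333 Hz is used below only as an assumed example).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the tc_windup() scale computation with th_adjustment == 0
 * (an assumption): scale = 2^63 / freq, th_scale = 2 * scale, and the
 * overflow threshold 2^63 / scale, clamped to fit a 32-bit u_int.
 * Assumes freq >= 2, otherwise scale * 2 would wrap to 0.
 */
static void
compute_scale(uint64_t freq, uint64_t *th_scale, uint32_t *large_delta)
{
	uint64_t scale = ((uint64_t)1 << 63) / freq;
	uint64_t ld = ((uint64_t)1 << 63) / scale;

	*th_scale = scale * 2;
	*large_delta = ld > UINT32_MAX ? UINT32_MAX : (uint32_t)ld;
}

/* True iff th_scale * delta fits in 64 bits, i.e. the fast path is exact. */
static int
fast_path_is_safe(uint64_t th_scale, uint32_t delta)
{
	return delta == 0 || th_scale <= UINT64_MAX / delta;
}
```

Every delta strictly below the computed threshold is safe, and the threshold plus one always overflows, so testing delta against th_large_delta is a conservative, exact-enough guard for choosing the slow path.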
XX @@ -351,6 +353,26 @@ XX } while (gen == 0 || gen != th->th_generation); XX } XX #else /* !FFCLOCK */ XX + XX +static __inline void XX +bintime_adddelta(struct bintime *bt, struct timehands *th) Only 1 utility function now. XX +{ XX + uint64_t scale, x; XX + u_int delta; XX + XX + scale = th->th_scale; XX + delta = tc_delta(th); XX + if (__predict_false(delta >= th->th_large_delta)) { XX + /* Avoid overflow for scale * delta. */ XX + x = (scale >> 32) * delta; XX + bt->sec += x >> 32; XX + bintime_addx(bt, x << 32); XX + bintime_addx(bt, (scale & 0xffffffff) * delta); This is clearer with all the scaling code together. I thought of renaming x to x95_32 to sort of document that it holds bits 95..32 in a component of the product. XX + } else { XX + bintime_addx(bt, scale * delta); XX + } XX +} XX + XX void XX binuptime(struct bintime *bt) XX { XX @@ -361,7 +383,7 @@ XX th = timehands; XX gen = atomic_load_acq_int(&th->th_generation); XX *bt = th->th_offset; XX - bintime_addx(bt, th->th_scale * tc_delta(th)); XX + bintime_adddelta(bt, th); XX atomic_thread_fence_acq(); XX } while (gen == 0 || gen != th->th_generation); XX } This is the kind of non-churning change that I like. The function name bintime_adddelta() isn't so good, but it is in the same style as bintime_addx() where the names are worse. bintime_addx() is global so it needs a descriptive name more. 'delta' is more descriptive than 'x' (x means a scalar and not a bintime). The 'bintime' prefix is verbose. It should be bt, especially in non-global APIs.
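The overflow-avoiding branch of bintime_adddelta() can be checked in isolation. This sketch copies its arithmetic into user space using a hypothetical struct bt (a stand-in for struct bintime with a 64-bit sec) and a local bt_addx() written like the kernel's bintime_addx(). th_scale and th_large_delta are passed as plain arguments rather than read from timehands, __predict_false() is dropped, and the split path is taken when delta >= large_delta. bt_adddelta_ref() relies on the GCC/Clang unsigned __int128 extension and exists only as a reference for testing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime: frac is in units of 2^-64 s. */
struct bt {
	int64_t sec;
	uint64_t frac;
};

/* Add x * 2^-64 seconds, carrying into sec on wraparound (like bintime_addx()). */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Add scale * delta * 2^-64 seconds without losing bits above 2^64. */
static void
bt_adddelta(struct bt *bt, uint64_t scale, uint32_t delta, uint32_t large_delta)
{
	uint64_t x;

	if (delta >= large_delta) {
		/* High 32 bits of scale times delta: the product's bits 95..32 part. */
		x = (scale >> 32) * delta;
		bt->sec += x >> 32;
		bt_addx(bt, x << 32);
		/* Low 32 bits of scale times delta always fits in 64 bits. */
		bt_addx(bt, (scale & 0xffffffff) * delta);
	} else {
		/* Below the threshold the 64-bit product cannot overflow. */
		bt_addx(bt, scale * delta);
	}
}

/* Reference using unsigned __int128 (GCC/Clang extension), for testing only. */
static void
bt_adddelta_ref(struct bt *bt, uint64_t scale, uint32_t delta)
{
	unsigned __int128 p = (unsigned __int128)scale * delta;

	bt->sec += (int64_t)(uint64_t)(p >> 64);
	bt_addx(bt, (uint64_t)p);
}
```

With a correct threshold both paths agree with the 128-bit reference for every delta. Forcing the threshold to UINT32_MAX makes a wrapped delta such as 0xffffffef take the plain 64-bit multiply, which (at the assumed 33333333 Hz) silently drops a little over two minutes.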
XX @@ -394,7 +416,7 @@
XX  		th = timehands;
XX  		gen = atomic_load_acq_int(&th->th_generation);
XX  		*bt = th->th_bintime;
XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
XX +		bintime_adddelta(bt, th);
XX  		atomic_thread_fence_acq();
XX  	} while (gen == 0 || gen != th->th_generation);
XX  }
XX @@ -1464,6 +1486,7 @@
XX  	scale += (th->th_adjustment / 1024) * 2199;
XX  	scale /= th->th_counter->tc_frequency;
XX  	th->th_scale = scale * 2;
XX +	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
XX 
XX  	/*
XX  	 * Now that the struct timehands is again consistent, set the new

Clamp this to UINT_MAX now that it is stored in a u_int.

> The current patch becomes too large already, I want to test/commit what
> I already have, and I will need to split it for the commit.

It was already too large.

>
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..7114a0e5219 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> ...
> @@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
>   * the comment in <sys/time.h> for a description of these 12 functions.
>   */
>
> -#ifdef FFCLOCK
> -void
> -fbclock_binuptime(struct bintime *bt)
> +static __inline void
> +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)

This name is not descriptive.

> +static __inline void
> +binnouptime(struct bintime *bt, u_int off)

This name is an example of further problems with the naming scheme.
The bintime_ prefix used above is verbose, but it is at least a prefix
and is in the normal bintime_ namespace. Here the prefix is 'bin',
which is neither of these. It means bintime_ again, but this duplicates
'time'.

If I liked churn, then I would have changed all names here long ago.
E.g.:
- bintime_ -> bt_, and use it consistently
- timecounter -> tc except for the timecounter public variable
- fb_ -> facebook_ -> /dev/null. Er, fb_ -> fbt_ or -> ft_.
- bt -> btp when bt is a pointer. You used bts for a struct in this patch
- unsigned int -> u_int. I policed this in early timecounter code.
  You fixed some instances of this too.
- th_generation -> th_gen.

Bruce

From owner-freebsd-hackers@freebsd.org Thu Mar 7 22:22:35 2019
Date: Fri, 8 Mar 2019 00:22:20 +0200
From: Konstantin Belousov
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190307222220.GK2492@kib.kiev.ua>
In-Reply-To: <20190308001005.M2756@besplex.bde.org>

On Fri, Mar 08, 2019 at 01:31:30AM +1100, Bruce Evans wrote:
> On Wed, 6 Mar 2019, Konstantin Belousov wrote:
>
> > On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
> >> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >* ...
> >> I strongly dislike the merge.
> >>
> >>>>> So I verified that:
> >>>>> - there is no 64bit multiplication in the generated code, for i386 both
> >>>>> for clang 7.0 and gcc 8.3;
> >>>>> - that everything is inlined, the only call from bintime/binuptime is
> >>>>> the indirect call to get the timecounter value.
> >>>>
> >>>> I will have to fix it for compilers that I use.
> >>> Ok, I will add __inline.
> >>
> >> That will make it fast enough, but still hard to read.
> >>
> >>>>> +	*bt = *bts;
> >>>>> +	scale = th->th_scale;
> >>>>> +	delta = tc_delta(th);
> >>>>> +#ifdef _LP64
> >>>>> +	if (__predict_false(th->th_large_delta <= delta)) {
> >>>>> +		/* Avoid overflow for scale * delta. */
> >>>>> +		bintime_helper(bt, scale, delta);
> >>>>> +		bintime_addx(bt, (scale & 0xffffffff) * delta);
> >>>>> +	} else {
> >>>>> +		bintime_addx(bt, scale * delta);
> >>>>> +	}
> >>>>> +#else
> >>>>> +	/*
> >>>>> +	 * Use bintime_helper() unconditionally, since the fast
> >>>>> +	 * path in the above method is not so fast here, since
> >>>>> +	 * the 64 x 32 -> 64 bit multiplication is usually not
> >>>>> +	 * available in hardware and emulating it using 2
> >>>>> +	 * 32 x 32 -> 64 bit multiplications uses code much
> >>>>> +	 * like that in bintime_helper().
> >>>>> +	 */
> >>>>> +	bintime_helper(bt, scale, delta);
> >>>>> +	bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> >>>>> +#endif
> >>>>
> >>>> Check that this method is really better. Without this, the complicated
> >>>> part is about half as large and duplicating it is smaller than this
> >>>> version.
> >>> Better in what sense? I am fine with the C code, and asm code looks
> >>> good.
> >>
> >> Better in terms of actually running significantly faster. I fear the
> >> 32-bit method is actually slightly slower for the fast path.
> > I checked that it is just worse. Significantly slower and more complicated.
>
> I wrote and ran a lot of timing benchmarks of various versions. All
> times in cycles on Haswell @4.08 GHz. On i386 except where noted:
>
> - the fastest case is when compiled by clang with the default of -O2.
>   binuptime() in a loop then takes 34 cycles. This is faster than possible
>   for latency, since rdtsc alone has a latency of 24 cycles. There must be
>   several iterations of the loop running in parallel.
>
> - the slowest case is when compiled by gcc-4.2.1 with my config of -Os.
>   binuptime() in a loop then takes 116 cycles. -Os does at least the
>   following pessimization: use memcpy() for copying the 12-byte struct
>   bintime.
>
> - gcc-4.2.1 -O2 takes 74 cycles.
>   -O2 still does the following pessimization:
>   do a 64 x 32 -> 64 bit multiplication after not noticing that the first
>   operand has been reduced to 32 bits by a shift or mask.
>
> The above tests were done with the final version. The version which tested
> alternatives used switch (method) and takes about 20 cycles longer for the
> fastest version, presumably by defeating parallelism. Times for various
> methods:
>
> - with clang -Os, about 54 cycles for the old method that allowed overflow,
>   and the same for the version with the check of the overflow threshold
>   (but with the threshold never reached), and 59 cycles for the branch-free
>   method. 100-116 cycles with gcc-4.2.1 -Os, with the branch-free
>   method taking 5-10 cycles longer.
>
> - on amd64, only a couple of cycles faster (49-50 cycles in best cases),
>   and gcc-4.2.1 only taking a couple of cycles longer. The branch-free
>   method still takes about 59 cycles so it is relatively worse.
>
> In userland, using the syscall for clock_gettime(), the
> extra 5-10 cycles for the branch-free method is relatively insignificant.
> It is about 2 nanoseconds. Other pessimizations are more significant.
> Times for this syscall:
> - amd64 now: 224 nsec (with gcc-4.2.1 -Os)
> - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os)
>   even getpid(2) takes 280 nsec. Add at least 140 more nsec for pae.
> - i386 3+1: 224 nsec (with gcc 4.2.1 -Os)
> - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O).
> - i386 4+4 nopae old library version of clock_gettime() compiled by
>   clang: 29 nsec.
>
> In some tests, the version with the branch was even a cycle or two faster.
> In the tests, the branch was always perfectly predicted, so costs nothing
> except possibly by changing scheduling in an accidentally good way. The
> tests were too small to measure the cost of using branch prediction
> resources. I've never noticed a case where 1 more branch causes thrashing.

About testing such tight loops.
There is a known phenomenon where Intel CPUs give absurd times when code
in the loop has unsuitable alignment. The manifestation of the phenomenon
is very surprising and hardly controllable. It is due to the way the CPU
front-end prefetches blocks of bytes for instruction decoding, and to the
locations of jmps within those blocks. The only source explaining it is
the talk by an Intel engineer at https://www.youtube.com/watch?v=IX16gcX4vDQ

>
> >>>>> -	do {
> >>>>> -		th = timehands;
> >>>>> -		gen = atomic_load_acq_int(&th->th_generation);
> >>>>> -		*bt = th->th_bintime;
> >>>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>>>> -		atomic_thread_fence_acq();
> >>>>> -	} while (gen == 0 || gen != th->th_generation);
> >>>>
> >>>> Duplicating this loop is much better than obfuscating it using inline
> >>>> functions. This loop was almost duplicated (except for the delta
> >>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> >>>> 8 ffclock ones). Now it is only duplicated 16 times.
> >>> How did you count the 16? I can see only 4 instances in the unpatched
> >>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> >>> touch ffclock until the patch is finalized. After that, it would be
> >>> 1 instance for kernel and 1 for userspace.
> >>
> >> Grep for the end condition in this loop. There are actually 20 of these.
> >> I'm counting the loops and not the previously-simple scaling operation in
> >> it. The scaling is indeed only done for 4 cases. I prefer the 20
> >> duplications (except I only want about 6 of the functions). Duplication
> >> works even better for only 4 cases.
> > Ok, I merged these as well. Now there are only four loops left in kernel.
> > I do not think that merging them is beneficial, since they have sufficiently
> > different bodies.
>
> This is exactly what I don't want.
>
> > I disagree with your characterization of it as obfuscation, IMO it improves
> > the maintainability of the code by reducing the number of places which need
> > careful inspection of the lock-less algorithm.
>
> It makes the inspection and changes more difficult for each instance.
> General functions are more difficult to work with since they need more
> args to control them and can't be changed without affecting all callers.
>
> In another thread, you didn't like similar churn for removing td args.

It is not similar. I do valid refactoring there (in terms of that thread;
I do not like the term refactoring). I eliminate a dozen instances of a
very intricate loop which implements quite a delicate lockless algorithm.
Its trickiness can be illustrated by the fact that it is the only valid
use of thread_fence_acq() which cannot be replaced by load_acq() (a similar
case is present in sys/seq.h).

> Here there isn't even a bug, since overflow only occurs when an invariant
> is violated.
>
> >> This should be written as a function call to 1 new function to replace
> >> the line with the overflowing multiplication. The line is always the
> >> same, so the new function call can look like bintime_xxx(bt, th).
> > Again, please provide at least pseudocode of your preference.
>
> The following is a complete tested and benchmarked implementation, with a
> couple more minor fixes:
>
> XX Index: kern_tc.c
> XX ===================================================================
> XX --- kern_tc.c	(revision 344852)
> XX +++ kern_tc.c	(working copy)
> XX @@ -72,6 +72,7 @@
> XX  	struct timecounter	*th_counter;
> XX  	int64_t			th_adjustment;
> XX  	uint64_t		th_scale;
> XX +	u_int			th_large_delta;
> XX  	u_int			th_offset_count;
> XX  	struct bintime		th_offset;
> XX  	struct bintime		th_bintime;
>
> Improvement not already discussed: use a u_int limit for the u_int variable.
>
> XX @@ -90,6 +91,7 @@
> XX  static struct timehands th0 = {
> XX  	.th_counter = &dummy_timecounter,
> XX  	.th_scale = (uint64_t)-1 / 1000000,
> XX +	.th_large_delta = 1000000,
> XX  	.th_offset = { .sec = 1 },
> XX  	.th_generation = 1,
> XX  	.th_next = &th1
>
> Fix not already discussed: th_large_delta was used in the dummy timehands
> before it was initialized. Static initialization to 0 gives fail-safe
> behaviour and unintended exercising of the slow path.
>
> The dummy timecounter has a low frequency, so its overflow threshold is
> quite low. I think it is not used even 1000000 times unless there is a
> bug in the boot code, so it doesn't overflow in practice. I did see
> some strange crashes at boot time while testing this.
>
> XX @@ -351,6 +353,26 @@
> XX  	} while (gen == 0 || gen != th->th_generation);
> XX  }
> XX  #else /* !FFCLOCK */
> XX +
> XX +static __inline void
> XX +bintime_adddelta(struct bintime *bt, struct timehands *th)
>
> Only 1 utility function now.

And in my patch this helper function is called only once, so I inlined it
manually.

>
> XX +{
> XX +	uint64_t scale, x;
> XX +	u_int delta;
> XX +
> XX +	scale = th->th_scale;
> XX +	delta = tc_delta(th);
> XX +	if (__predict_false(delta < th->th_large_delta)) {
> XX +		/* Avoid overflow for scale * delta. */
> XX +		x = (scale >> 32) * delta;
> XX +		bt->sec += x >> 32;
> XX +		bintime_addx(bt, x << 32);
> XX +		bintime_addx(bt, (scale & 0xffffffff) * delta);
>
> This is clearer with all the scaling code together.
>
> I thought of renaming x to x95_32 to sort of document that it holds bits
> 95..32 in a component of the product.
>
> XX +	} else {
> XX +		bintime_addx(bt, scale * delta);
> XX +	}
> XX +}
> XX +
> XX  void
> XX  binuptime(struct bintime *bt)
> XX  {
> XX @@ -361,7 +383,7 @@
> XX  		th = timehands;
> XX  		gen = atomic_load_acq_int(&th->th_generation);
> XX  		*bt = th->th_offset;
> XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
> XX +		bintime_adddelta(bt, th);
> XX  		atomic_thread_fence_acq();
> XX  	} while (gen == 0 || gen != th->th_generation);
> XX  }
>
> This is the kind of non-churning change that I like.

Ok. I made all cases where timehands are read more uniform, by moving the
calculations after the generation loop. This makes the atomic part of the
functions easier to see, and the loop body has a lower chance of hitting a
generation reset.

>
> The function name bintime_adddelta() isn't so good, but it is in the same
> style as bintime_addx() where the names are worse. bintime_addx() is global
> so it needs a descriptive name more. 'delta' is more descriptive than 'x'
> (x means a scalar and not a bintime). The 'bintime' prefix is verbose. It
> should be bt, especially in non-global APIs.
>
> XX @@ -394,7 +416,7 @@
> XX  		th = timehands;
> XX  		gen = atomic_load_acq_int(&th->th_generation);
> XX  		*bt = th->th_bintime;
> XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
> XX +		bintime_adddelta(bt, th);
> XX  		atomic_thread_fence_acq();
> XX  	} while (gen == 0 || gen != th->th_generation);
> XX  }
> XX @@ -1464,6 +1486,7 @@
> XX  	scale += (th->th_adjustment / 1024) * 2199;
> XX  	scale /= th->th_counter->tc_frequency;
> XX  	th->th_scale = scale * 2;
> XX +	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
> XX
> XX  	/*
> XX  	 * Now that the struct timehands is again consistent, set the new
>
> Clamp this to UINT_MAX now that it is stored in a u_int.
>
> > The current patch becomes too large already, I want to test/commit what
> > I already have, and I will need to split it for the commit.
>
> It was already too large.
>
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..7114a0e5219 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > @@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
> >   * the comment in <sys/time.h> for a description of these 12 functions.
> >   */
> >
> > -#ifdef FFCLOCK
> > -void
> > -fbclock_binuptime(struct bintime *bt)
> > +static __inline void
> > +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
>
> This name is not descriptive.
>
> > +static __inline void
> > +binnouptime(struct bintime *bt, u_int off)
>
> This name is an example of further problems with the naming scheme.
> The bintime_ prefix used above is verbose, but it is at least a prefix
> and is in the normal bintime_ namespace. Here the prefix is 'bin',
> which is neither of these. It means bintime_ again, but this duplicates
> 'time'.

I agree, and I chose the name getthmember for the other function, which
clearly reflects its operation. For this one, I ended up with bintime_off().

>
> If I liked churn, then I would have changed all names here long ago.
> E.g.:
> - bintime_ -> bt_, and use it consistently
> - timecounter -> tc except for the timecounter public variable
> - fb_ -> facebook_ -> /dev/null. Er, fb_ -> fbt_ or -> ft_.
> - bt -> btp when bt is a pointer. You used bts for a struct in this patch
> - unsigned int -> u_int. I policed this in early timecounter code.
>   You fixed some instances of this too.
> - th_generation -> th_gen.
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..8d12847f2cd 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	u_int			th_large_delta;
 	u_int			th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -90,6 +91,7 @@ static struct timehands th1 = {
 static struct timehands th0 = {
 	.th_counter = &dummy_timecounter,
 	.th_scale = (uint64_t)-1 / 1000000,
+	.th_large_delta = 1000000,
 	.th_offset = { .sec = 1 },
 	.th_generation = 1,
 	.th_next = &th1
@@ -200,20 +202,56 @@ tc_delta(struct timehands *th)
  * the comment in <sys/time.h> for a description of these 12 functions.
  */
 
-#ifdef FFCLOCK
-void
-fbclock_binuptime(struct bintime *bt)
+static __inline void
+bintime_off(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	unsigned int gen;
+	struct bintime *btp;
+	uint64_t scale, x;
+	u_int delta, gen, large_delta;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		btp = (struct bintime *)((vm_offset_t)th + off);
+		*bt = *btp;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		large_delta = th->th_large_delta;
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
+
+	if (__predict_false(delta < large_delta)) {
+		/* Avoid overflow for scale * delta. */
+		x = (scale >> 32) * delta;
+		bt->sec += x >> 32;
+		bintime_addx(bt, x << 32);
+		bintime_addx(bt, (scale & 0xffffffff) * delta);
+	} else {
+		bintime_addx(bt, scale * delta);
+	}
+}
+
+static __inline void
+getthmember(void *out, size_t out_size, u_int off)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		memcpy(out, (char *)th + off, out_size);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+
+#ifdef FFCLOCK
+void
+fbclock_binuptime(struct bintime *bt)
+{
+
+	bintime_off(bt, __offsetof(struct timehands, th_offset));
 }
 
 void
@@ -237,16 +275,8 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	bintime_off(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -270,100 +300,61 @@ fbclock_microtime(struct timeval *tvp)
 void
 fbclock_getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }
 
 void
 fbclock_getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }
 
 void
 fbclock_getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }
 
 void
 fbclock_getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }
 
 void
 fbclock_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 void
 fbclock_getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #else /* !FFCLOCK */
+
 void
 binuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	bintime_off(bt, __offsetof(struct timehands, th_offset));
 }
 
 void
@@ -387,16 +378,8 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	bintime_off(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -420,85 +403,53 @@ microtime(struct timeval *tvp)
 void
 getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }
 
 void
 getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }
 
 void
 getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }
 
 void
 getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }
 
 void
 getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 void
 getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #endif /* FFCLOCK */
 
@@ -514,15 +465,9 @@ getboottime(struct timeval *boottime)
 void
 getboottimebin(struct bintime *boottimebin)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*boottimebin = th->th_boottime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(boottimebin, sizeof(*boottimebin),
+	    __offsetof(struct timehands, th_boottime));
 }
 
 #ifdef FFCLOCK
@@ -1038,15 +983,9 @@ getmicrotime(struct timeval *tvp)
 void
 dtrace_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 /*
@@ -1464,6 +1403,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new
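The generation-checked read loop that bintime_off() and getthmember() now share can be modelled in user space. The following is a minimal sketch, assuming C11 `<stdatomic.h>` as a stand-in for FreeBSD's atomic_load_acq_int() and atomic_thread_fence_acq(). The `struct snap` type and the writer `snap_write()` are hypothetical. The writer follows the usual pattern of zeroing the generation while updating and then publishing a new nonzero value, and it is not copied from tc_windup(). The payload is copied with memcpy() to mirror getthmember(). Strictly conforming C11 would need the payload accessed atomically to be race-free, so treat this as a model of the control flow rather than a portable seqlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot protected by a generation count. */
struct snap {
	atomic_uint	gen;		/* 0 while an update is in progress */
	uint64_t	data[2];	/* payload, e.g. a struct bintime */
};

/* Single, serialized writer (assumed, as for the timehands updater). */
static void
snap_write(struct snap *s, const uint64_t v[2])
{
	unsigned int ogen;

	ogen = atomic_load_explicit(&s->gen, memory_order_relaxed);
	assert(ogen != 0);
	atomic_store_explicit(&s->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	memcpy(s->data, v, sizeof(s->data));
	if (++ogen == 0)		/* never publish generation 0 */
		ogen = 1;
	atomic_store_explicit(&s->gen, ogen, memory_order_release);
}

/* Reader: retry until the copy was made under a stable, nonzero gen. */
static void
snap_read(struct snap *s, uint64_t out[2])
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&s->gen, memory_order_acquire);
		memcpy(out, s->data, sizeof(s->data));
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&s->gen, memory_order_relaxed));
}
```

The acquire fence sits after the payload copy and before the final generation load, so the re-check cannot be satisfied by a value read before the copy. This is the ordering that the acquire fence in the kernel loop provides.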
From owner-freebsd-hackers@freebsd.org Fri Mar 8 01:30:04 2019
From: Mark Millard
Date: Thu, 7 Mar 2019 17:29:51 -0800
To: Konstantin Belousov
Cc: Bruce Evans, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-Id: <5EED3352-2E8C-4BEE-B281-4AC8DE9570C2@yahoo.com>
In-Reply-To: <20190307222220.GK2492@kib.kiev.ua>

A basic question and a small note.

Context for the question, about the tc->tc_get_timecount(tc) values:
In the powerpc64 context, tc->tc_get_timecount(tc) is the lower 32 bits
of the tbr, which in my context increments at about 33,333,333 Hz
(roughly 33.3 MHz) on a machine with a clock rate of about 2.5 GHz.
The truncated 32-bit tbr value wraps about every 128 seconds.
There are 2 sockets with 2 cores per socket, so 4 separate tbr values.

The question is . . .
In tc_delta's:

tc->tc_get_timecount(tc) - th->th_offset_count

is observing tc->tc_get_timecount(tc) < th->th_offset_count ever supposed
to be possible in correct operation, other than when
tc->tc_get_timecount(tc) has wrapped around (and so is newly 0 or "near" 0;
there is no evidence of it having been near 128 seconds or more in my
context)?

The note:

On 2019-Mar-7, at 14:22, Konstantin Belousov wrote:

> . . .
> +
> +	if (__predict_false(delta < large_delta)) {

I thought that the test for the overflow-avoiding path was delta >= large_delta .

> +		/* Avoid overflow for scale * delta. */
> +		x = (scale >> 32) * delta;
> +		bt->sec += x >> 32;
> +		bintime_addx(bt, x << 32);
> +		bintime_addx(bt, (scale & 0xffffffff) * delta);
> +	} else {
> +		bintime_addx(bt, scale * delta);
> +	}
> . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
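On the tc_delta() question: the subtraction is done in unsigned arithmetic and then masked. A single wrap of the counter between recording th_offset_count and reading the counter is therefore handled automatically. What the expression cannot distinguish is a counter value slightly behind th_offset_count (for example, if counters on different CPUs disagree). That case yields a delta just below 2^32 rather than a small negative number. The following is a minimal sketch; `counter_delta()` and `wrap_seconds()` are hypothetical helpers modelled on tc_delta()'s `(count - offset_count) & tc_counter_mask` expression.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on tc_delta(): unsigned difference, masked to the counter width. */
static uint32_t
counter_delta(uint32_t now, uint32_t offset_count, uint32_t mask)
{
	return ((uint32_t)(now - offset_count) & mask);
}

/* Seconds for a counter of the given width (mask) to wrap at freq_hz. */
static double
wrap_seconds(uint32_t mask, double freq_hz)
{
	assert(freq_hz > 0);
	return (((double)mask + 1.0) / freq_hz);
}
```

For example, `counter_delta(5, 0xfffffffb, 0xffffffff)` is 10 (a normal wrap), while `counter_delta(100, 101, 0xffffffff)` is 0xffffffff. With a 33,333,333 Hz counter, a delta that large corresponds to about 128.8 seconds of counts, which is far above the roughly one-second threshold at which `th_scale * delta` can overflow.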
2019 09:38:41 +1030 (ACDT) (envelope-from darius@dons.net.au) X-Authentication-Warning: midget.dons.net.au: mailnull set sender to using -f Received: from [10.0.2.26] ([10.0.2.26]) by ns.dons.net.au (envelope-sender ) (MIMEDefang) with ESMTP id x28N8fq1094864; Sat, 09 Mar 2019 09:38:41 +1030 From: "O'Connor, Daniel" Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Date: Sat, 9 Mar 2019 09:38:40 +1030 Subject: USB stack getting confused Message-Id: To: FreeBSD Hackers X-Mailer: Apple Mail (2.3445.102.3) X-Spam-Score: -1 () No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable autolearn_force=no version=3.4.1 X-Scanned-By: MIMEDefang 2.83 on 10.0.2.1 X-Rspamd-Queue-Id: 2F7F891D23 X-Spamd-Bar: +++++ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [5.16 / 15.00]; MV_CASE(0.50)[]; HAS_XAW(0.00)[]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[midget.dons.net.au]; RCVD_NO_TLS_LAST(0.10)[]; RECEIVED_SPAMHAUS_PBL(0.00)[201.135.210.118.zen.spamhaus.org : 127.0.0.11]; RCVD_IN_DNSWL_LOW(-0.10)[131.137.101.150.list.dnswl.org : 127.0.5.1]; R_DKIM_NA(0.00)[]; ASN(0.00)[asn:4739, ipnet:150.101.0.0/16, country:AU]; MIME_TRACE(0.00)[0:+]; MID_RHS_MATCH_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[]; RCVD_COUNT_FIVE(0.00)[5]; IP_SCORE(0.80)[ip: (2.65), ipnet: 150.101.0.0/16(1.08), asn: 4739(0.33), country: AU(-0.04)]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_SPAM_SHORT(0.97)[0.970,0]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org]; AUTH_NA(1.00)[]; NEURAL_SPAM_MEDIUM(1.00)[1.000,0]; RCPT_COUNT_ONE(0.00)[1]; DMARC_NA(0.00)[dons.net.au]; NEURAL_SPAM_LONG(1.00)[1.000,0]; R_SPF_NA(0.00)[] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 08 Mar 2019 23:36:24 -0000 Hi, I'm developing a data acquisition system on FreeBSD using a USB3 = interface (the OrangeTree ZestSC3) and I find that the USB stack appears = to 'lose' the device after a while. My program normally runs continually doing acquisitions of data for N = seconds, doing some checks and restarting. After a while (~30 1 minute = acquisitions or ~8 30 minute ones) my program can't 'see' the device (it = uses libusb10) any more (it reconnects each acquisition for $REASONS). = Also pretty weirdly usbconfig can't see it either(!). If I stop my program the device reappears in usbconfig. If I restart my = program it works. I did some GDB'ing and it appears that ugen20_enumerate (the libusb10 = interface is implemented by calling libusb20 functions) can't open = /dev/ugenX.Y and errno is 12 (ENOMEM). After digging with dtrace I have seen the open method be something = different for this device. I have also seen it where opening the device = doesn't call usb_fifo_open (not sure what it *does* call though - I see = user land call openat but haven't traced through what gets called). I'm still digging but am somewhat hopeful someone can suggest some = things to look at :) This is on 11.2 if it matters. Thanks. -- Daniel O'Connor "The nice thing about standards is that there are so many of them to choose from." 
 -- Andrew Tanenbaum

From owner-freebsd-hackers@freebsd.org Sat Mar 9 09:01:23 2019
Subject: Re: USB stack getting confused
To: "O'Connor, Daniel" , FreeBSD Hackers
From: Hans Petter Selasky
Date: Sat, 9 Mar 2019 10:00:56 +0100

On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).

What is printed in dmesg? Maybe the device has a problem.
--HPS

From owner-freebsd-hackers@freebsd.org Sat Mar 9 10:36:15 2019
Subject: Re: USB stack getting confused
From: "O'Connor, Daniel"
Date: Sat, 9 Mar 2019 20:59:30 +1030
Cc: FreeBSD Hackers
Message-Id: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
To: Hans Petter Selasky

> On 9 Mar 2019, at 19:30, Hans Petter Selasky wrote:
> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
>
> What is printed in dmesg? Maybe the device has a problem.

There is nothing in dmesg - no disconnect / reconnect etc.
If I hold the user space process in gdb 'forever' (e.g. overnight), usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

--
Daniel O'Connor
"The nice thing about standards is that there are so many of them to choose from."
 -- Andrew Tanenbaum

From owner-freebsd-hackers@freebsd.org Sat Mar 9 15:26:31 2019
Subject: Re: USB stack getting confused
To: "O'Connor, Daniel"
Cc: FreeBSD Hackers
From: Hans Petter Selasky
Message-ID: <6dd8fe5f-6835-d98a-7592-0293406ccd63@selasky.org>
Date: Sat, 9 Mar 2019 16:25:58 +0100
In-Reply-To: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>

On 3/9/19 11:29 AM, O'Connor, Daniel wrote:
> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

Check the output from "procstat -ak". Likely your application is not closing the USB handle during device detach, and so a deadlock happens.

Also see libusb20_dev_check_connected(). Poll this function regularly to figure out whether a disconnect is needed.
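A minimal sketch of the polling pattern suggested above, under stated assumptions: it is not libusb20 code. struct dev_handle, check_connected(), close_handle() and acquire() are hypothetical stand-ins, and the comments name the libusb20 calls they would correspond to (per libusb20(3), libusb20_dev_check_connected() returns zero while an opened device is still connected and a LIBUSB20_ERROR value otherwise). The only point illustrated is the ordering: check before each transfer, and close the handle as soon as a disconnect is reported, so that the kernel's detach, which waits for open descriptors to be closed, is not held up by the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an opened device.  In a real program this
 * would be a struct libusb20_device * opened with libusb20_dev_open();
 * polls_left only simulates the moment the device gets detached.
 */
struct dev_handle {
	bool open;
	int  polls_left;	/* simulated: checks that succeed before detach */
};

/*
 * Stand-in for libusb20_dev_check_connected(): 0 while the device is
 * still connected, nonzero (an error code) once it is gone.
 */
static int
check_connected(struct dev_handle *h)
{
	if (h->polls_left <= 0)
		return (-1);
	h->polls_left--;
	return (0);
}

/* Stand-in for libusb20_dev_close(). */
static void
close_handle(struct dev_handle *h)
{
	h->open = false;
}

/*
 * Acquire up to nblocks blocks, checking the connection before each
 * one.  On disconnect the handle is closed immediately instead of being
 * kept open across the detach.  Returns the number of blocks acquired.
 */
static int
acquire(struct dev_handle *h, int nblocks)
{
	int n;

	assert(h != NULL && h->open);
	for (n = 0; n < nblocks; n++) {
		if (check_connected(h) != 0) {
			close_handle(h);
			break;
		}
		/* ... transfer one block of data here ... */
	}
	return (n);
}
```

With polls_left set to 3, acquire(&h, 10) stops after three blocks and leaves the handle closed; a real program would then re-enumerate and reopen the device for the next acquisition.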
--HPS

From owner-freebsd-hackers@freebsd.org Sat Mar 9 16:27:10 2019
Date: Sat, 9 Mar 2019 18:26:40 +0200
From: Konstantin Belousov
To: Hans Petter Selasky
Cc: "O'Connor, Daniel" , FreeBSD Hackers
Subject: Re: USB stack getting confused
Message-ID: <20190309162640.GN2492@kib.kiev.ua>

On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> >>
> >>
> >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky wrote:
> >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> >>>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
> >>>
> >>> What is printed in dmesg? Maybe the device has a problem.
> >>
> >> There is nothing in dmesg - no disconnect / reconnect etc.
> >>
> >> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.
> >
> > Does it mean that the file descriptor opened for ugen has a chance to
> > be closed ?
>
> The USB stack will wait for all FDs to be closed during detach also via
> destroy_dev().

So my guess was correct. Do you agree that this behaviour is wrong?

In fact I saw something similar with apcupsd and either USB/COM adapters or the native USB control card for APC UPSes. For reasons I do not understand, these devices are often disconnected. Older versions of apcupsd required a restart for the newly reattached device to be recreated in /dev. Sometimes it hangs the whole USB stack. Newer apcupsd seems to open /dev/ugen only for the duration of a query, which makes the erratic behaviour much less likely, but it could still cause breakage when the device disappears while apcupsd has it open.
> >
>
> > I suspect that usb subsystem tried to destroy the device but some internal
> > refcounting prevents it. Proper use of destroy_dev(_cb)(9) avoids
> > the issue.
>
> --HPS

From owner-freebsd-hackers@freebsd.org Sat Mar 9 07:00:28 2019
Date: Sat, 9 Mar 2019 18:00:14 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190307222220.GK2492@kib.kiev.ua>
Message-ID: <20190309144844.K1166@besplex.bde.org>

On Fri, 8 Mar 2019, Konstantin Belousov wrote:
> On Fri, Mar 08, 2019 at 01:31:30AM +1100, Bruce Evans wrote:
>> On Wed, 6 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
>>>> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
>>>>>> On Sun, 3 Mar 2019, Konstantin
Belousov wrote: >>>>>> >>>>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote: >>> * ... >>>> I strongly disklike the merge. I more strongly disclike (sic) the more complete merge. The central APIs have even more parameters and reduced type safety to describe objects as (offset, size) pairs. >* ... >>>>>>> +#else >>>>>>> + /* >>>>>>> + * Use bintime_helper() unconditionally, since the fast >>>>>>> + * path in the above method is not so fast here, since >>>>>>> + * the 64 x 32 -> 64 bit multiplication is usually not >>>>>>> + * available in hardware and emulating it using 2 >>>>>>> + * 32 x 32 -> 64 bit multiplications uses code much >>>>>>> + * like that in bintime_helper(). >>>>>>> + */ >>>>>>> + bintime_helper(bt, scale, delta); >>>>>>> + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); >>>>>>> +#endif >>>>>> >>>>>> Check that this method is really better. Without this, the complicated >>>>>> part is about half as large and duplicating it is smaller than this >>>>>> version. >>>>> Better in what sense? I am fine with the C code, and asm code looks >>>>> good. >>>> >>>> Better in terms of actually running significantly faster. I fear the >>>> 32-bit method is actually slightly slower for the fast path. >> >> I checked that it is just worse. Significantly slower and more complicated. >> >> I wrote and ran a lot of timing benchmarks of various versions. All >> times in cycles on Haswell @4.08 GHz. On i386 except where noted: >> ... >> The above tests were done with the final version. The version which tested >> alternatives used switch (method) and takes about 20 cycles longer for the >> fastest version, presumably by defeating parallelism. Times for various >> methods: >> >> - with clang -Os, about 54 cycles for the old method that allowed overflow, >> and the same for the version with the check of the overflow threshold >> (but with the threshold never reached), and 59 cycles for the branch- >> free method.
100-116 cycles with gcc-4.2.1 -Os, with the branch-free >> method taking 5-10 cycles longer. >> >> - on amd64, only a couple of cycles faster (49-50 cycles in best cases), >> and gcc-4.2.1 only taking a couple of cycles longer. The branch-free >> method still takes about 59 cycles so it is relatively worse. >> >> In userland, using the syscall for clock_gettime(), the >> extra 5-10 cycles for the branch-free method is relatively insignificant. >> It is about 2 nanoseconds. Other pessimizations are more significant. >> Times for this syscall: >> - amd64 now: 224 nsec (with gcc-4.2.1 -Os) >> - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os) >> even getpid(2) takes 280 nsec. Add at least 140 more nsec for pae. >> - i386 3+1: 224 nsec (with gcc 4.2.1 -Os) >> - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O). >> - i386 4+4 nopae old library version of clock_gettime() compiled by >> clang: 29 nsec. >> >> In some tests, the version with the branch was even a cycle or two faster. >> In the tests, the branch was always perfectly predicted, so it costs nothing >> except possibly by changing scheduling in an accidentally good way. The >> tests were too small to measure the cost of using branch prediction >> resources. I've never noticed a case where 1 more branch causes thrashing. > About testing such tight loops. There is a known phenomenon where Intel > CPUs give absurd times when code in the loop has unsuitable alignment. > The manifestation of the phenomenon is very surprising and hardly > controllable. It is due to the way the CPU front-end prefetches blocks > of bytes for instruction decoding and the locations of jmps in the blocks. > > The only source explaining it is https://www.youtube.com/watch?v=IX16gcX4vDQ > a talk by an Intel engineer. I know a little about such tests since I have written thousands and interpreted millions of them (mostly automatically).
There are a lot of other side effects of caching resources that usually make more difference than alignment. The most mysterious one that I noticed was apparently due to alignment, but in a makeworld macro-benchmark. Minor changes even in unused functions or data gave differences of about 2% in real time and many more % in system time. This only showed up on an old Turion2 (early Athlon64) system. I think it is due to limited cache associativity causing many cache misses by lining up unrelated far-apart code or data addresses mod some power of 2. Padding to give the same alignment as the best case was too hard, but I eventually found a configuration accidentally giving nearly the best case even with its alignments changed by small modifications to the areas that I was working on. >* ... >>>>>>> - do { >>>>>>> - th = timehands; >>>>>>> - gen = atomic_load_acq_int(&th->th_generation); >>>>>>> - *bt = th->th_bintime; >>>>>>> - bintime_addx(bt, th->th_scale * tc_delta(th)); >>>>>>> - atomic_thread_fence_acq(); >>>>>>> - } while (gen == 0 || gen != th->th_generation); >>>>>> >>>>>> Duplicating this loop is much better than obfuscating it using inline >>>>>> functions. This loop was almost duplicated (except for the delta >>>>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and >>>>>> 8 ffclock ones). Now it is only duplicated 16 times. >>>>> How did you count the 16? I can see only 4 instances in the unpatched >>>>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not >>>>> touch ffclock until the patch is finalized. After that, it would be >>>>> 1 instance for kernel and 1 for userspace. >>>> >>>> Grep for the end condition in this loop. There are actually 20 of these. >>>> I'm counting the loops and not the previously-simple scaling operation in >>>> it. The scaling is indeed only done for 4 cases. I prefer the 20 >>>> duplications (except I only want about 6 of the functions). Duplication
Duplication >>>> works even better for only 4 cases. >>> Ok, I merged these as well. Now there are only four loops left in kernel. >>> I do not think that merging them is beneficial, since they have sufficiently >>> different bodies. >> >> This is exacly what I don't want. >>> >>> I disagree with you characterization of it as obfuscation, IMO it improves >>> the maintainability of the code by reducing number of places which need >>> careful inspection of the lock-less algorithm. >> >> It makes the inspection and changes more difficult for each instance. >> General functions are more difficult to work with since they need more >> args to control them and can't be changed without affecting all callers. >> >> In another thread, you didn't like similar churn for removing td args. > It is not similar. I do valid refactoring there (in terms of that > thread, I do not like the term refactoring). I eliminate dozen instrances > of very intricate loop which implements quite delicate lockless algorithm. > Its trickiness can be illustrated by the fact that it is only valid > use of thread_fence_acq() which cannot be replaced by load_acq() (similar > case is present in sys/seq.h). Small delicate loops are ideal for duplicating. They are easier to understand individually and short enough to compare without using diff to see gratuitous and substantive differences. Multiple instances are only hard to write and maintain. Since these multiple instances are already written, they are only harder to maintain. >> XX void >> XX binuptime(struct bintime *bt) >> XX { >> XX @@ -361,7 +383,7 @@ >> XX th = timehands; >> XX gen = atomic_load_acq_int(&th->th_generation); >> XX *bt = th->th_offset; >> XX - bintime_addx(bt, th->th_scale * tc_delta(th)); >> XX + bintime_adddelta(bt, th); >> XX atomic_thread_fence_acq(); >> XX } while (gen == 0 || gen != th->th_generation); >> XX } >> >> This is the kind of non-churning change that I like. > Ok. 
I made all cases where timehands are read more uniform by > moving calculations after the generation loop. This makes the > atomic part of the functions easier to see, and the loop body has a lower > chance to hit a generation reset. I think this change is slightly worse: - it increases register pressure. 'scale' and 'delta' must be read before the loop exit test. The above order uses them and stores the results to memory, so more registers are free for the exit test. i386 certainly runs out of registers. IIRC, i386 now spills 'gen'. It would have to spill something to load 'gen' or 'th' for the test. - it enlarges the window between reading 'scale' and 'delta' and the caller seeing the results. Preemption in this window gives results that may be far in the past. >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c >>> index 2656fb4d22f..7114a0e5219 100644 >>> --- a/sys/kern/kern_tc.c >>> +++ b/sys/kern/kern_tc.c >>> ... >>> @@ -200,22 +201,77 @@ tc_delta(struct timehands *th) >>> * the comment in for a description of these 12 functions. >>> */ >>> >>> -#ifdef FFCLOCK >>> -void >>> -fbclock_binuptime(struct bintime *bt) >>> +static __inline void >>> +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta) >> >> This name is not descriptive. >> >>> +static __inline void >>> +binnouptime(struct bintime *bt, u_int off) >> >> This name is an example of further problems with the naming scheme. >> The bintime_ prefix used above is verbose, but it is at least a prefix >> and is in the normal bintime_ namespace. Here the prefix is 'bin', >> which is neither of these. It means bintime_ again, but this duplicates >> 'time'. > I agree, and I made a name getthmember for the other function which clearly > reflects its operation. For this one, I ended with bintime_off(). The 'get' name is another problem. I would like all the get*time functions and not add new names starting with 'get'.
The library implementation already doesn't bother optimizing the get*time functions, but always uses the hardware timecounter. getfoo() is a more natural name than foo_get() for the action of getting foo, but the latter is better for consistency, especially in code that puts the subsystem name first in nearby code. The get*time functions would be better if they were more like time_second. Note that time_second is racy if time_t is too large for the arch so that accesses to it are not atomic, as happens on 32-bit arches with premature 64-bit time_t. However, in this 32/64 case, the race is only run every 136 years, with the next event scheduled in 2038, so this race is even less important now than other events scheduled in 2038. Bintimes are 96 or 128 bits, so directly copying a global like time_second for them would race every 1/2**32 second on 32-bit arches or every 1 second on 64-bit arches. Most of the loops on the generation count are for fixing these races, but perhaps a simpler method would work. On 64-bit arches with atomic 64-bit accesses on 32-bit boundaries, the following would work: - set the lower 32 bits of the fraction to 0, or ignore them - load the higher 32 bits of the fraction and the lower 32 bits of the seconds - race once every 136 years starting in 2038 reading the higher 32 bits of the seconds non-atomically. - alternatively, break instead of racing in 2038 by setting the higher 32 bits to 0. This is the same as using sbintimes instead of bintimes. - drop a few more lower bits by storing a right-shifted value. Right shifting by just 1 gives a race frequency of once per 272 years, with the next one in 2106. > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > index 2656fb4d22f..8d12847f2cd 100644 > --- a/sys/kern/kern_tc.c > +++ b/sys/kern/kern_tc.c > @@ -200,20 +202,56 @@ tc_delta(struct timehands *th) > * the comment in for a description of these 12 functions.
> */ > > -#ifdef FFCLOCK > -void > -fbclock_binuptime(struct bintime *bt) > +static __inline void > +bintime_off(struct bintime *bt, u_int off) > { > struct timehands *th; > - unsigned int gen; > + struct bintime *btp; > + uint64_t scale, x; > + u_int delta, gen, large_delta; > > do { > th = timehands; > gen = atomic_load_acq_int(&th->th_generation); > - *bt = th->th_offset; > - bintime_addx(bt, th->th_scale * tc_delta(th)); You didn't fully obfuscate this by combining this function with getthmember() so as to deduplicate the loop. > + btp = (struct bintime *)((vm_offset_t)th + off); Ugly conversion to share code. This is technically incorrect. Improving the casts gives: btp = (void *)(uintptr_t)((uintptr_t)(void *)th + off); but this assumes that arithmetic on the intermediate integer does what is expected. uintptr_t is only guaranteed to work when the intermediate representation held in it is not adjusted. Fixing the API gives static __inline void bintime_off(struct bintime *btp, struct bintime *base_btp) where base_btp is &th->th_bintime or &th->th_offset. (th_offset and th_bintime are badly named. th_offset is really a base time and the offset is tc_delta(). th_bintime is also a base time. It is the same as th_offset with another actual offset (the difference between UTC and local time) already added to it as an optimization. In old versions, th_bintime didn't exist, but the related struct members th_nanotime and th_microtime existed, since these benefit more from not converting on every call. My old version even documents the struct members, while -current still has no comments. The comments were lost to staticization. My version mostly adds "duh" to the banal comments after recovering them: XX /* XX * XXX rotted comment cloned from . XX * XX * th_counter is undocumented (duh). XX * XX * th_adjustment [PPM << 16] which means that the smallest unit of correction XX * you can apply amounts to 481.5 usec/year. XX * XX * th_scale is undocumented (duh).
XX * XX * th_offset_count is the contents of the counter which corresponds to the XX * XX * rest of the offset_* values. XX * XX * th_offset is undocumented (duh). XX * XX * th_microtime is undocumented (duh). XX * XX * th_nanotime is undocumented (duh). XX * XX * XXX especially massive bitrot here. "three" is now "many"... XX * Each timecounter must supply an array of three timecounters. This is needed XX * to guarantee atomicity in the code. Index zero is used to transport XX * modifications, for instance done with sysctl, into the timecounter being XX * used in a safe way. Such changes may be adopted with a delay of up to 1/HZ. XX * Index one and two are used alternately for the actual timekeeping. XX * XX * th_generation is undocumented (duh). XX * XX * th_next is undocumented (duh). XX */ > + *bt = *btp; > + scale = th->th_scale; > + delta = tc_delta(th); > + large_delta = th->th_large_delta; I had forgotten that th_scale is so volatile (it may be adjusted on every windup). th_large_delta is equally volatile. So moving the calculation outside of the loop gives even more register pressure than I noticed above. > atomic_thread_fence_acq(); > } while (gen == 0 || gen != th->th_generation); > + > + if (__predict_false(delta < large_delta)) { > + /* Avoid overflow for scale * delta. */ > + x = (scale >> 32) * delta; > + bt->sec += x >> 32; > + bintime_addx(bt, x << 32); > + bintime_addx(bt, (scale & 0xffffffff) * delta); > + } else { > + bintime_addx(bt, scale * delta); > + } > +} > + > +static __inline void > +getthmember(void *out, size_t out_size, u_int off) > +{ > + struct timehands *th; > + u_int gen; > + > + do { > + th = timehands; > + gen = atomic_load_acq_int(&th->th_generation); > + memcpy(out, (char *)th + off, out_size); This isn't so ugly or technically incorrect. Now the object is generic, so the reference to it should be passed as (void *objp, size_t objsize) instead of the type-safe (struct bintime *base_btp).
> + atomic_thread_fence_acq(); > + } while (gen == 0 || gen != th->th_generation); > +} I can see a use for copying methods like this in sysctls. All sysctl accesses except possibly for aligned register_t's were originally racy, but we sprinkled mutexes for large objects and reduced race windows for smaller objects. E.g., sysctl_handle_long() still makes a copy with no locking, but this has no effect except on my i386-with-64-bit-longs, since on 32-bit arches longs have the same size as ints and so are as atomic as ints. sysctl_handle_64() uses the same method. It works to reduce the race window on 32-bit arches. sysctl_handle_string() makes a copy to malloc()ed storage. memcpy() to that risks losing the NUL terminator, and subsequent strlen() on the copy gives buffer overrun if the result has no terminators. sysctl_handle_opaque() uses a generation count method, like the one used by timecounters before the ordering bugs were fixed, but even more primitive and probably even more in need of ordering fixes. It would be good to fix all sysctls using the same generation count method as above. A loop at the top level might work. I wouldn't like a structure like the above where the top level calls individual sysctl functions which do nothing except wrap themselves in a generic function like the above. The above does give this structure to clock_gettime() calls. The top level converts the clock id to a function and the above makes the function essentially convert back to another clock id (the offset of the relevant field in timehands), especially for the get*time functions where the call just copies the relevant field to userland. Unfortunately, the individual time functions are called directly in the kernel. I prefer this to generic APIs based on ids.
This lets callers use simple, efficient APIs like nanouptime() instead of complicated inefficiencies like kern_clock_gettime_generic(int clock_id = CLOCK_MONOTONIC, int format_id = CLOCK_TYPE_TIMESPEC, int precision = CLOCK_PRECISION_NSEC, void *dstp = &ts); Bruce From owner-freebsd-hackers@freebsd.org Sat Mar 9 15:43:16 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9B92152A3A0 for ; Sat, 9 Mar 2019 15:43:16 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (turbocat.net [88.99.82.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6C8F78C75B for ; Sat, 9 Mar 2019 15:43:16 +0000 (UTC) (envelope-from hps@selasky.org) Received: from hps2016.home.selasky.org (unknown [176.74.212.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 6C5152603CF; Sat, 9 Mar 2019 16:43:13 +0100 (CET) Subject: Re: USB stack getting confused To: Konstantin Belousov , "O'Connor, Daniel" Cc: FreeBSD Hackers References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> From: Hans Petter Selasky Message-ID: Date: Sat, 9 Mar 2019 16:42:50 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <20190309152613.GM2492@kib.kiev.ua> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 6C8F78C75B X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-7.00 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-1.00)[-0.996,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 15:43:17 -0000 On 3/9/19 4:26 PM, Konstantin Belousov wrote: > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote: >> >> >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky wrote: >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote: >>>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!). >>> >>> What is printed in dmesg? Maybe the device has a problem. >> >> There is nothing in dmesg - no disconnect / reconnect etc. >> >> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again. > > Does it mean that the file descriptor opened for ugen has a chance to > be closed ? The USB stack will wait for all FDs to be closed during detach also via destroy_dev(). > > I suspect that usb subsystem tried to destroy the device but some internal > refcounting prevents it. Proper use of destroy_dev(_cb)(9) avoids > the issue. 
--HPS From owner-freebsd-hackers@freebsd.org Sat Mar 9 19:28:36 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8BE8B15334A8 for ; Sat, 9 Mar 2019 19:28:36 +0000 (UTC) (envelope-from rozhuk.im@gmail.com) Received: from mail-lj1-x230.google.com (mail-lj1-x230.google.com [IPv6:2a00:1450:4864:20::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E7D066F08B for ; Sat, 9 Mar 2019 19:28:35 +0000 (UTC) (envelope-from rozhuk.im@gmail.com) Received: by mail-lj1-x230.google.com with SMTP id d14so754016ljl.9 for ; Sat, 09 Mar 2019 11:28:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:date:to:cc:subject:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ASIRQ+rXHp3Y/pFm1CLL1O/yDFyfVNzdlP2/UTzs3w4=; b=mO4Z1mneEYhpBIAurcsCfEx+1AbPLAa0dSeWHv0kkxJhEk8mH59B0KBw83Xb5o8pCl EL0/jNliAp8w5wDlbej8D9cfaCqXy65sCJiOTvpStGQWW7/DGYxbN7MiQkHoZvXW8A/A cIg4yf/hcVvFWzf6VZysqtN5I7Db6MUCCORwD/OWb/tBnumE7andxhRmpYfpQLUAZ/Gy +UpQY5nS0J02f6rxiuwDgJBFqx0FBUSZy4mbRBuuTVW42ZZ99guun6D1+zB60eMjojr8 IZYGb14EcW+jzrDWyucJgyTm+H0MFd6cdKgSEcOb9RFeKy1kn7hqPXy0tvv5wsZ/jQNQ P1vg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:date:to:cc:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ASIRQ+rXHp3Y/pFm1CLL1O/yDFyfVNzdlP2/UTzs3w4=; b=axMGInJ0kUUaAEhFqahYQynl4t26lEPT6I/zeb0n8PjVLw8GaCC8r5aCP+n2LgEdop f6YDVqyvc52T/GkRslCnMzGxt5ArjHFc9+oI4iQkWSckDgEtToKu9debRNuYmZYy6Sqh qZuDznFhA4jrTks2lavsFvrd8mgB2+KhsYDrT/Szj4y1kWoJwBY/Bk5iUP3p9YJpUwhr ugtij0aHp6iXiXW62GeSNCekMXU41EHyz46wwqxxcB2JIAXmHdQS7rFwawVnP3n3TY/G 
AtdzYF6e+BSRcZIrQIO32qoU0czDcuEQeUgP6R7eRyZKSd1QToJOyqpwjxHOaQc+X/wr tYIA== X-Gm-Message-State: APjAAAXgCQlI0Bq8pst3uFUW/BVYMny394Wz9eWdWoaO0RrUnYp54ghs vLz1kfwJizwcLRjjngDEscI= X-Google-Smtp-Source: APXvYqxqY5y6S+5iqPw7lpdAkyqEy4rWoN9V3ytW/Y9XESJqSTN77lBBgeqG/hmWtUAo+sMzgJpU1g== X-Received: by 2002:a2e:7314:: with SMTP id o20mr12478741ljc.111.1552159714347; Sat, 09 Mar 2019 11:28:34 -0800 (PST) Received: from rimwks ([2001:470:1f15:3d8:7285:c2ff:fe43:675b]) by smtp.gmail.com with ESMTPSA id m1sm287795lfh.36.2019.03.09.11.28.33 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Sat, 09 Mar 2019 11:28:33 -0800 (PST) From: Rozhuk Ivan X-Google-Original-From: Rozhuk Ivan Date: Sat, 9 Mar 2019 22:28:27 +0300 To: Konstantin Belousov Cc: Hans Petter Selasky , FreeBSD Hackers , "O'Connor, Daniel" Subject: Re: USB stack getting confused Message-ID: <20190309222827.5407ddbf@rimwks> In-Reply-To: <20190309162640.GN2492@kib.kiev.ua> References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua> X-Mailer: Claws Mail 3.17.3 (GTK+ 2.24.32; amd64-portbld-freebsd12.0) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: E7D066F08B X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-7.00 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-1.00)[-0.997,0]; REPLY(-4.00)[]; TAGGED_FROM(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 19:28:36 -0000 On Sat, 9 Mar 2019 18:26:40 +0200 Konstantin Belousov wrote: > In fact I saw something similar with apcupsd and either usb/com > adapters or native usb control card for APC UPSes. 
For reasons I do > not understand, these devices are often disconnected. For older > versions of apcupsd, it required restart for newly reattached device > to be recreated in /dev. Sometimes it hangs whole usb stack. > > Newer apcupsd seems to open /dev/ugen only for the duration of the > query, which makes the erratic behaviour is much less likely, but > could still cause breakage when device disappear while apcupsd has it > opened. > Same problem with USB sound cards. I tried to fix it, but failed with dsp; only the mixer could be fixed with a small code change. https://reviews.freebsd.org/D11140 From owner-freebsd-hackers@freebsd.org Sat Mar 9 15:26:39 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9335B1529B54 for ; Sat, 9 Mar 2019 15:26:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A6DD28BE9C for ; Sat, 9 Mar 2019 15:26:38 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x29FQDfp071741 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 9 Mar 2019 17:26:16 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x29FQDfp071741 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x29FQDYo071740; Sat, 9 Mar 2019 17:26:13 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 9 Mar 2019 17:26:13 +0200 From: Konstantin Belousov To: "O'Connor, Daniel" Cc: Hans Petter Selasky , FreeBSD Hackers Subject: Re: USB stack getting confused
Message-ID: <20190309152613.GM2492@kib.kiev.ua> References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 15:26:39 -0000 On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote: > > > > On 9 Mar 2019, at 19:30, Hans Petter Selasky wrote: > > On 3/9/19 12:08 AM, O'Connor, Daniel wrote: > >> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!). > > > > What is printed in dmesg? Maybe the device has a problem. > > There is nothing in dmesg - no disconnect / reconnect etc. > > If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again. Does it mean that the file descriptor opened for ugen has a chance to be closed? I suspect that the usb subsystem tried to destroy the device but some internal refcounting prevents it. Proper use of destroy_dev(_cb)(9) avoids the issue.
From owner-freebsd-hackers@freebsd.org Sat Mar 9 21:35:55 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A516A1537F76 for ; Sat, 9 Mar 2019 21:35:55 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (turbocat.net [88.99.82.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3718274233 for ; Sat, 9 Mar 2019 21:35:55 +0000 (UTC) (envelope-from hps@selasky.org) Received: from hps2016.home.selasky.org (unknown [176.74.212.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 4C742260377; Sat, 9 Mar 2019 22:35:52 +0100 (CET) Subject: Re: USB stack getting confused To: Konstantin Belousov , Warner Losh Cc: FreeBSD Hackers , "O'Connor, Daniel" References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua> <20190309192330.GO2492@kib.kiev.ua> From: Hans Petter Selasky Message-ID: Date: Sat, 9 Mar 2019 22:35:28 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <20190309192330.GO2492@kib.kiev.ua> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 3718274233 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.94 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.94)[-0.944,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 21:35:55 -0000 On 3/9/19 8:23 PM, Konstantin Belousov wrote: > On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote: >> >> Is there a form of destroy_dev() that does a revoke on all open instances? >> Eg, this is gone, you can't use it anymore, and all further attempts to use >> the device will generate an error, but in the mean time we destroy the >> device and let the detach routine get on with life. waiting may make sense >> when you are merely unloading the driver (and getting to the detach routine >> that way), but when the device is gone, I've come around to the point of >> view that we should just destroy it w/o waiting for closes and anybody that >> touches it afterwards gets an error and has to cope with the error. But >> even in the unload case, we maybe we shouldn't get to the detach routine >> unless we're forcing and/or the detach routine just returns EBUSY since the >> only one that knows what dev_t's are associated with the device_t is the >> driver itself. > You are asking very basic questions about devfs there. > > destroy_dev(9) waits for two things: > - that all threads left the cdevsw methods for the given device; > - that all cdevpriv destructors finished running. Hi, > To facilitate waking up threads potentially sleeping inside the cdevsw > methods, drivers might implement d_purge method which must weed out sleeping > threads from inside the code in the bound time. USB will purge all callers before calling destroy_dev(). This is not the problem. > After that we return from destroy_dev(9) and guarantee that no new calls > into cdevsw is done for this device. devfs magic consumes the fo_ and > VOP_ calls and does not allow them to reach into the driver. 
When I designed the current USB devfs it was important to me to keep open() and close() calls balanced, to avoid situations where an open() call sets up some resource and the matching close(), which frees that resource again, never gets called. destroy_dev(9) makes no such guarantee, and I think that is a failure of destroy_dev(9). That's when I started using the cdev's destructor callback function. > So what usb does there is actively defeating existing mechanism by > keeping internal refcount on opens and refusing to call destroy_dev() > until the count goes to zero The FreeBSD USB stack is also used in environments without DEVFS and needs its own refcounts. > (I did not read the usb code, but I believe > that I am not too wrong). > Would usb core just destroy_dev() when the > physical device goes away, then at worst the existing file descriptors > opened against the lost devices would become dead (not same dead as > terminals after revoke(2), but very similar). Yes, I can do that if destroy_dev() ensures that d_close is called for all open file handles once and only once before it returns. I think this is where the problem comes from. > > If the problem is due to keeping some instance data for the opened device, > then cdevpriv might be the better fit (at least the KPI was designed > to be) than blocking destroy until all users are gone. > The USB stack does not use MMAP, so this is not a problem.
--HPS From owner-freebsd-hackers@freebsd.org Sat Mar 9 18:41:45 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4C25F1531D8D for ; Sat, 9 Mar 2019 18:41:45 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CF1B06D62B for ; Sat, 9 Mar 2019 18:41:44 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-qt1-x841.google.com with SMTP id s1so882815qte.5 for ; Sat, 09 Mar 2019 10:41:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=24DamIhEtYgu9AJuB4aZZtn3WPvNEr/Wvg55cTkzOto=; b=ms6GynYaXuGJPP/9M/LM5u5zDF/yOhmzCgmkOPgBSwkGVhX5HTpGQc/X0/vRN/O9f3 kNCxVNWnvOmyokKRcsh7e3RAcEzxXj6R6n0VNVseBYaU9E9GS0SL0+SPWcFVwSF+hvZL MQFVkf3+aD03xQM1i7SzJ8W3JI5CrWERhJGUNK3cUMjN86MRhWtll4FRcTf3f0vEJmIl RTqCiJivIx10cwQDJ0psSI18tVJhuQ2GoR1ZW/WinPsntyP+nzv2ho/TwRHRti7o0W2H VT0iCG6MrPjH2nEvC1MWbsMS3NvRODf6M+/1iIJSDYDP6qXkultYG4gO/rfJT5e4smy4 jpbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=24DamIhEtYgu9AJuB4aZZtn3WPvNEr/Wvg55cTkzOto=; b=HuLn4mBdPDTe6bbt8akE+3ZCx7ClhhG3mUeDrnvoQKxLVXy8S74uXKHfLHb6XMZ58j 3es7BTuQgHoduliSyzViZiNvsAwz1NoIxXfChmGN4AL7lB2o5TYLSRDZD3I2doq9u6XI 3mLds2qW/hXKK3nRbS8Pk9TyxGCEVRoq0ZPOclc00OLXF3GRxaDD4UVjP+JdZzRD30ej mHh8tsQv8xqC+Nu79F2b/Ui3qckqQUdMSb9HLxeSVGke/WJCqoDEDHlDkEx0zMF/KkrQ l9assGMp8DY1GwNymnNsK2r5WPLrUF8IOH8ZFAxJRzDp+85fzM3ySc5adFS9F48uojAC 8JEQ== 
X-Gm-Message-State: APjAAAW6MOjwoskq4jIhmm8ODC5K6VgQjrf2mAmSBhG2mW1O7euWBkm3 fNwQDYxro4ZQ5jvVPjHP8pkGsNozWcZXPgSqh++kKg== X-Google-Smtp-Source: APXvYqyp2msurvoc+LZPlRiHfBN/AHYoxsS4b6sfY8TzmeRCqr7j2rU40tWFDQ7yUfRMlKwuE6PtMDeeGqzjAMN5Ju8= X-Received: by 2002:a0c:9ba7:: with SMTP id o39mr19638971qve.153.1552156904308; Sat, 09 Mar 2019 10:41:44 -0800 (PST) MIME-Version: 1.0 References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua> In-Reply-To: <20190309162640.GN2492@kib.kiev.ua> From: Warner Losh Date: Sat, 9 Mar 2019 11:41:31 -0700 Message-ID: Subject: Re: USB stack getting confused To: Konstantin Belousov Cc: Hans Petter Selasky , FreeBSD Hackers , "O'Connor, Daniel" X-Rspamd-Queue-Id: CF1B06D62B X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.99 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.99)[-0.993,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 18:41:45 -0000 On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov wrote: > On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote: > > On 3/9/19 4:26 PM, Konstantin Belousov wrote: > > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote: > > >> > > >> > > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky > wrote: > > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote: > > >>>> My program normally runs continually doing acquisitions of data for > N seconds, doing some checks and restarting. 
After a while (~30 1 minute > acquisitions or ~8 30 minute ones) my program can't 'see' the device (it > uses libusb10) any more (it reconnects each acquisition for $REASONS). Also > pretty weirdly usbconfig can't see it either(!). > > >>> > > >>> What is printed in dmesg? Maybe the device has a problem. > > >> > > >> There is nothing in dmesg - no disconnect / reconnect etc. > > >> > > >> If I hold the user space process in gdb 'forever' (eg over night) > usbconfig doesn't see the device, but the moment I quit the user space > process it can be seen again. > > > > > > Does it mean that the file descriptor opened for ugen has a chance to > > > be closed ? > > > > The USB stack will wait for all FDs to be closed during detach also via > > destroy_dev(). > So my guess was correct. Do you agree that this behaviour is wrong ? > > In fact I saw something similar with apcupsd and either usb/com adapters > or native usb control card for APC UPSes. For reasons I do not understand, > these devices are often disconnected. For older versions of apcupsd, > it required restart for newly reattached device to be recreated in /dev. > Sometimes it hangs whole usb stack. > > Newer apcupsd seems to open /dev/ugen only for the duration of the query, > which makes the erratic behaviour is much less likely, but could still > cause > breakage when device disappear while apcupsd has it opened. > Is there a form of destroy_dev() that does a revoke on all open instances? E.g., this is gone, you can't use it anymore, and all further attempts to use the device will generate an error, but in the meantime we destroy the device and let the detach routine get on with life. Waiting may make sense when you are merely unloading the driver (and getting to the detach routine that way), but when the device is gone, I've come around to the point of view that we should just destroy it without waiting for closes, and anybody that touches it afterwards gets an error and has to cope with the error.
But even in the unload case, maybe we shouldn't get to the detach routine unless we're forcing and/or the detach routine just returns EBUSY, since the only one that knows what dev_t's are associated with the device_t is the driver itself. Warner > > > > > > > I suspect that usb subsystem tried to destroy the device but some > internal > > > refcounting prevents it. Proper use of destroy_dev(_cb)(9) avoids > > > the issue. > > > > --HPS > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@freebsd.org Sat Mar 9 19:23:57 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2E8C91533181 for ; Sat, 9 Mar 2019 19:23:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7B8916EDDD for ; Sat, 9 Mar 2019 19:23:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x29JNV95026317 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 9 Mar 2019 21:23:34 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x29JNV95026317 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x29JNUJK026315; Sat, 9 Mar 2019 21:23:30 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 9 Mar 2019 21:23:30 +0200 From: Konstantin Belousov To: Warner Losh Cc: Hans Petter
Selasky , FreeBSD Hackers , "O'Connor, Daniel" Subject: Re: USB stack getting confused Message-ID: <20190309192330.GO2492@kib.kiev.ua> References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 19:23:57 -0000 On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote: > On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov > wrote: > > > On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote: > > > On 3/9/19 4:26 PM, Konstantin Belousov wrote: > > > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote: > > > >> > > > >> > > > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky > > wrote: > > > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote: > > > >>>> My program normally runs continually doing acquisitions of data for > > N seconds, doing some checks and restarting. After a while (~30 1 minute > > acquisitions or ~8 30 minute ones) my program can't 'see' the device (it > > uses libusb10) any more (it reconnects each acquisition for $REASONS). Also > > pretty weirdly usbconfig can't see it either(!). > > > >>> > > > >>> What is printed in dmesg? Maybe the device has a problem. > > > >> > > > >> There is nothing in dmesg - no disconnect / reconnect etc. 
> > > >> > > > >> If I hold the user space process in gdb 'forever' (eg over night) > > usbconfig doesn't see the device, but the moment I quit the user space > > process it can be seen again. > > > > > > > > Does it mean that the file descriptor opened for ugen has a chance to > > > > be closed ? > > > > > > The USB stack will wait for all FDs to be closed during detach also via > > > destroy_dev(). > > So my guess was correct. Do you agree that this behaviour is wrong ? > > > > In fact I saw something similar with apcupsd and either usb/com adapters > > or native usb control card for APC UPSes. For reasons I do not understand, > > these devices are often disconnected. For older versions of apcupsd, > > it required restart for newly reattached device to be recreated in /dev. > > Sometimes it hangs whole usb stack. > > > > Newer apcupsd seems to open /dev/ugen only for the duration of the query, > > which makes the erratic behaviour is much less likely, but could still > > cause > > breakage when device disappear while apcupsd has it opened. > > > > Is there a form of destroy_dev() that does a revoke on all open instances? > Eg, this is gone, you can't use it anymore, and all further attempts to use > the device will generate an error, but in the mean time we destroy the > device and let the detach routine get on with life. waiting may make sense > when you are merely unloading the driver (and getting to the detach routine > that way), but when the device is gone, I've come around to the point of > view that we should just destroy it w/o waiting for closes and anybody that > touches it afterwards gets an error and has to cope with the error. But > even in the unload case, we maybe we shouldn't get to the detach routine > unless we're forcing and/or the detach routine just returns EBUSY since the > only one that knows what dev_t's are associated with the device_t is the > driver itself. You are asking very basic questions about devfs there. 
destroy_dev(9) waits for two things:
- that all threads have left the cdevsw methods for the given device;
- that all cdevpriv destructors have finished running.

To facilitate waking up threads that may be sleeping inside the cdevsw methods, drivers may implement the d_purge method, which must weed out sleeping threads from inside the driver code in bounded time. After that we return from destroy_dev(9) and guarantee that no new calls into the cdevsw are made for this device. devfs magic consumes the fo_ and VOP_ calls and does not allow them to reach into the driver.

So what usb does there is actively defeat the existing mechanism, by keeping an internal refcount on opens and refusing to call destroy_dev() until the count goes to zero (I did not read the usb code, but I believe that I am not too wrong). If the usb core just called destroy_dev() when the physical device goes away, then at worst the existing file descriptors opened against the lost device would become dead (not the same dead as terminals after revoke(2), but very similar).

If the problem is due to keeping some instance data for the opened device, then cdevpriv might be the better fit (at least the KPI was designed to be) than blocking destroy until all users are gone.
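The contract described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the devfs code: `struct dev_gate`, `gate_enter()`, `gate_leave()` and `gate_destroy()` are hypothetical names, a pthread mutex and condition variable stand in for the kernel's own accounting, the `purge` callback plays the role of d_purge, ENXIO as the error for new calls is an assumption, and cdevpriv destructors are not modelled. What it shows is that destruction waits only for threads currently inside the methods, never for open descriptors to be closed.

```c
/*
 * Userspace model of the destroy_dev(9) contract described above.
 * All names are hypothetical; this is not how devfs is implemented,
 * and cdevpriv destructors are not modelled.
 */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_gate {
	pthread_mutex_t	mtx;
	pthread_cond_t	drained;	/* broadcast when active drops to 0 */
	unsigned	active;		/* threads currently inside a method */
	bool		dead;		/* set once destruction has started */
};

/* Stand-in for d_purge: must wake sleepers so they leave in bounded time. */
typedef void (*purge_fn)(void *arg);

void
gate_init(struct dev_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->drained, NULL);
	g->active = 0;
	g->dead = false;
}

/* Called on entry to every method; new calls fail once the device is dead. */
int
gate_enter(struct dev_gate *g)
{
	int error = 0;

	pthread_mutex_lock(&g->mtx);
	if (g->dead)
		error = ENXIO;
	else
		g->active++;
	pthread_mutex_unlock(&g->mtx);
	return (error);
}

/* Called on exit from every method that gate_enter() admitted. */
void
gate_leave(struct dev_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	if (--g->active == 0)
		pthread_cond_broadcast(&g->drained);
	pthread_mutex_unlock(&g->mtx);
}

/*
 * Model of destroy_dev(): refuse new entries, let the purge hook wake any
 * sleepers, then wait only for threads already inside the methods.
 * Open descriptors are never counted, so this does not wait for close().
 */
void
gate_destroy(struct dev_gate *g, purge_fn purge, void *arg)
{
	pthread_mutex_lock(&g->mtx);
	g->dead = true;
	pthread_mutex_unlock(&g->mtx);

	if (purge != NULL)
		purge(arg);

	pthread_mutex_lock(&g->mtx);
	while (g->active > 0)
		pthread_cond_wait(&g->drained, &g->mtx);
	pthread_mutex_unlock(&g->mtx);
}
```

In this model a method that sleeps would wait on its own condition variable and re-check `dead` under the mutex whenever it wakes, so the purge hook can make it return promptly. An open file descriptor by itself holds nothing here, which is the property contrasted above with keeping a refcount on opens.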
From owner-freebsd-hackers@freebsd.org Sat Mar 9 20:57:40 2019
Subject: Re: USB stack getting confused
To: Warner Losh, Konstantin Belousov
Cc: FreeBSD Hackers, "O'Connor, Daniel"
References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua>
From: Hans Petter Selasky
Message-ID: <44116887-3dc8-d3a9-e9b6-c32a6876b1ec@selasky.org>
Date: Sat, 9 Mar 2019 21:57:13 +0100

On 3/9/19 7:41 PM, Warner Losh wrote:
>> Newer apcupsd seems to open /dev/ugen only for the duration of the query, which makes the erratic behaviour is much less likely, but could still cause breakage when device disappear while apcupsd has it opened.
>>
> Is there a form of destroy_dev() that does a revoke on all open instances? Eg, this is gone, you can't use it anymore, and all further attempts to use the device will generate an error, but in the mean time we destroy the device and let the detach routine get on with life. waiting may make sense when you are merely unloading the driver (and getting to the detach routine that way), but when the device is gone, I've come around to the point of view that we should just destroy it w/o waiting for closes and anybody that touches it afterwards gets an error and has to cope with the error. But even in the unload case, we maybe we shouldn't get to the detach routine unless we're forcing and/or the detach routine just returns EBUSY since the only one that knows what dev_t's are associated with the device_t is the driver itself.

Hi,

There are multiple issues here:

1) The USB stack uses device numbers from device_get_unit() when creating character devices.
That means it must wait at least until the VNODE in /dev is removed before the same device name can be re-used.

2) When disconnecting the "struct file" from the USB, lost memory might pile up if these daemons, which are typically started by devd, don't get killed.

Many of these applications use libusb. We can add a heartbeat thread inside libusb that simply closes the ugen device handle once it detects that the device is gone. That will close 99% of these issues.

--HPS

From owner-freebsd-hackers@freebsd.org Sat Mar 9 21:40:29 2019
Subject: Re: USB stack getting confused
To: Rozhuk Ivan, Konstantin Belousov
Cc: FreeBSD Hackers, "O'Connor, Daniel"
References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua> <20190309222827.5407ddbf@rimwks>
From: Hans Petter Selasky
Date: Sat, 9 Mar 2019 22:40:02 +0100
In-Reply-To: <20190309222827.5407ddbf@rimwks>

On 3/9/19 8:28 PM, Rozhuk Ivan wrote:
> On Sat, 9 Mar 2019 18:26:40 +0200
> Konstantin Belousov wrote:
>
>> In fact I saw something similar with apcupsd and either usb/com adapters or native usb control card for APC UPSes. For reasons I do not understand, these devices are often disconnected. For older versions of apcupsd, it required restart for newly reattached device to be recreated in /dev. Sometimes it hangs whole usb stack.
>>
>> Newer apcupsd seems to open /dev/ugen only for the duration of the query, which makes the erratic behaviour is much less likely, but could still cause breakage when device disappear while apcupsd has it opened.
>>
>
> Same problem with usb sound cards.
> I try to fix it, but fail with dsp, only mixer can be fixed with small code change.
> https://reviews.freebsd.org/D11140
>

Hi,

How will these apps detect that they need to open the new /dev/mixer node?

I mean, after hang is fixed, mixer app will still try to query the old file handle forever?

--HPS

From owner-freebsd-hackers@freebsd.org Sat Mar 9 22:56:15 2019
From: Rozhuk Ivan
Date: Sun, 10 Mar 2019 01:56:08 +0300
To: Hans Petter Selasky
Cc: Konstantin Belousov, FreeBSD Hackers, "O'Connor, Daniel"
Subject: Re: USB stack getting confused
Message-ID: <20190310015608.4d32e14f@rimwks>
References: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au> <20190309152613.GM2492@kib.kiev.ua> <20190309162640.GN2492@kib.kiev.ua> <20190309222827.5407ddbf@rimwks>

On Sat, 9 Mar 2019 22:40:02 +0100
Hans Petter Selasky wrote:

> > Same problem with usb sound cards.
> > I try to fix it, but fail with dsp, only mixer can be fixed with small code change. https://reviews.freebsd.org/D11140
> >
>
> Hi,
>
> How will these apps detect that they need to open the new /dev/mixer node?
>
> I mean, after hang is fixed, mixer app will still try to query the old file handle forever?
>

The main problem for me is: the usb device is lost/reconnected and the new device is connected, but FreeBSD does nothing because the USB stack hangs: it waits for all fds for the mixer and dsp to be closed.
Apps can be rewritten/patched: on device loss, get an error on operations with the fd, then try to reopen it.
I don't remember now how that works in the patch; it is not finished.

Another OSS issue: apps do not react to a hw.snd.default_unit change.

I mitigated the reconnect issue in hardware:
- switched to sound via HDMI
- added a real LC filter to the home power line: I have a long USB link from the PC to a USB hub at my work place with the kb, mouse, usb sound ..., and every time the refrigerator started or stopped I lost the USB link to the hub. After that, the kb, mouse and other usb devices did not replug until I closed all apps that had fds open on the mixer and dsp. The LC filter fixed this.
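The application-side recovery described above (get an error on the fd, then reopen) also needs an answer to the question of how an app notices that the /dev node it opened is gone or has been re-created. A minimal POSIX sketch follows, assuming that once a device is destroyed either fstat() on the old descriptor fails or the path no longer resolves to the same (st_dev, st_ino) pair. `node_lost_or_replaced()` and `reopen_if_lost()` are hypothetical helpers, not part of libusb or any FreeBSD API, and the errno a destroyed device returns is not assumed here.

```c
/*
 * Hypothetical helpers (not part of libusb or any FreeBSD API) for an
 * application that wants to notice a lost or re-created device node.
 * Assumption: once the device is destroyed, either fstat() on the old
 * descriptor fails or the path no longer names the same (st_dev, st_ino).
 */
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if the object open on fd is no longer the one reachable via path. */
bool
node_lost_or_replaced(int fd, const char *path)
{
	struct stat opened, current;

	if (fd < 0 || fstat(fd, &opened) != 0)
		return (true);		/* no descriptor, or it has gone dead */
	if (stat(path, &current) != 0)
		return (true);		/* node has disappeared */
	/* A node re-created under the same name is a different object. */
	return (opened.st_dev != current.st_dev ||
	    opened.st_ino != current.st_ino);
}

/*
 * Close a stale descriptor and reopen path.  flags must not include
 * O_CREAT.  Returns 0 if *fdp is usable afterwards; otherwise returns -1
 * (errno set by open()) with *fdp == -1, so the caller can retry later.
 */
int
reopen_if_lost(const char *path, int *fdp, int flags)
{
	if (!node_lost_or_replaced(*fdp, path))
		return (0);
	if (*fdp >= 0)
		close(*fdp);
	*fdp = open(path, flags);
	return (*fdp >= 0 ? 0 : -1);
}
```

An app could call reopen_if_lost() after a read or ioctl on the device fails, or periodically in the spirit of the heartbeat idea mentioned earlier in the thread, and keep retrying while it returns -1. This only helps once the kernel side actually lets the old node go away.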