From owner-freebsd-hackers@freebsd.org  Sat Mar  2 17:14:31 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 45B3E150565B;
 Sat,  2 Mar 2019 17:14:31 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 78E0B700AC;
 Sat,  2 Mar 2019 17:14:29 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 5BD983DD847;
 Sun,  3 Mar 2019 04:14:24 +1100 (AEDT)
Date: Sun, 3 Mar 2019 04:14:23 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
cc: Konstantin Belousov <kostikbel@gmail.com>, Ian Lepore <ian@freebsd.org>, 
 Mark Millard <marklmi@yahoo.com>, 
 Mark Millard via freebsd-hackers <freebsd-hackers@freebsd.org>, 
 Konstantin Belousov <kib@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <9993.1551536230@critter.freebsd.dk>
Message-ID: <20190303032006.T4781@besplex.bde.org>
References: <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com>
 <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua>
 <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com>
 <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com>
 <20190301112717.GW2420@kib.kiev.ua>
 <679402FF-907C-43AF-B18C-8C9CC857D7A6@yahoo.com>
 <6669.1551473821@critter.freebsd.dk>
 <210dfd0f50ee6b1149c914ee503502654eb5f328.camel@freebsd.org>
 <20190302105652.GD68879@kib.kiev.ua> <9993.1551536230@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Sun, 03 Mar 2019 01:45:11 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Mar 2019 17:14:31 -0000

On Sat, 2 Mar 2019, Poul-Henning Kamp wrote:

> --------
> In message <20190302105652.GD68879@kib.kiev.ua>, Konstantin Belousov writes:
>
>> Using more than two timehands increases a chance of reader to try to
>> use outdated timehands.
>
> No, using only two timehands increase the chance that the reader tries
> to use the timehand which is being updated.

Then it sees the generation change and retries.  We fixed the ordering
of accesses to the generation count so that this is robust.  1 timehands
is always valid, so with 2 timehands there is no wait for the retry
except in the very unlikely event that the generation changes for the
new timehands too.  1 timehands would work too, but the retries would
have to wait while it is updated.
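
A minimal user-space sketch of the generation-count protocol described
above, assuming C11 atomics in place of the kernel's
atomic_load_acq_int() and atomic_thread_fence_acq().  The names
th_sketch, th_publish() and th_read() are hypothetical and the structure
is reduced to a single payload field; it only illustrates why a reader
retries when the generation is 0 or has changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, much simplified stand-in for struct timehands. */
struct th_sketch {
	_Atomic unsigned int gen;	/* 0 means "update in progress" */
	_Atomic uint64_t offset;	/* payload protected by gen */
};

/* Writer: invalidate, update the payload, then publish a nonzero gen. */
static void
th_publish(struct th_sketch *th, uint64_t offset, unsigned int newgen)
{
	assert(newgen != 0);
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->gen, newgen, memory_order_release);
}

/* Reader: retry until the same nonzero gen is seen before and after. */
static uint64_t
th_read(struct th_sketch *th)
{
	unsigned int gen;
	uint64_t offset;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		offset = atomic_load_explicit(&th->offset,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	return (offset);
}
```

With only one structure, as here, a reader that arrives during an update
spins until the writer publishes; this is the "1 timehands" case.  With
2 or more, the reader would normally pick up a structure that is not
being written.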

> As long as the reader does not use the timehand being updated, using
> a one or two generations old timehand is OK.

In old versions, there were races checking the generation count.  Having
multiple timehands made these races more unlikely to matter.

> The target-value for delta-t was "a few milliseconds" when I wrote
> timecounters, if somebody has changed that since, I hope they did
> their math first.

Tickless kernels complicate things.  It's surprising that tc_ticktock()
works so well with them.  Calls to hardclock() are not periodic, so
calls to tc_ticktock() are not periodic either.  It has to handle
coalesced and 1/hz ticks.  Too much coalescing would break it.  With
my normal hz = 100, cpu0:timer interrupts still occur at a rate of at
least 100 Hz.  These presumably go to hardclock(), so the timing is
satisfied.  With hz = 1000, cpu0:timer interrupts only occur at a rate
of at least 200 Hz.  This is less than tc_ticktock() expects, but it
still works.

Bruce

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 01:02:26 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3FE541519EE1
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 01:02:26 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lf1-f52.google.com (mail-lf1-f52.google.com
 [209.85.167.52])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5CCC68D115
 for <freebsd-hackers@freebsd.org>; Sun,  3 Mar 2019 01:02:25 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lf1-f52.google.com with SMTP id 131so1055631lfa.5
 for <freebsd-hackers@freebsd.org>; Sat, 02 Mar 2019 17:02:25 -0800 (PST)
MIME-Version: 1.0
From: Alan Somers <asomers@freebsd.org>
Date: Sat, 2 Mar 2019 18:02:06 -0700
Message-ID: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
Subject: Adding namecache entries outside of vfs_lookup and vn_open ?
To: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset="UTF-8"
X-List-Received-Date: Sun, 03 Mar 2019 01:02:26 -0000

It looks like lookup and open are the only common vops that create new
namecache entries.  At least, those are the only ones that set
MAKEENTRY in the cn_flags field.  However, fuse(4)'s create-like
operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
information to create a namecache entry for the newly created file.
As-is, an operation like FUSE_CREATE will almost always be followed up
by a FUSE_LOOKUP, necessitating an extra round-trip to userland.

Would it be possible and wise to add these newly created entries to
the namecache automatically?

-Alan

From owner-freebsd-hackers@freebsd.org  Sat Mar  2 17:43:24 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 993A2150643F;
 Sat,  2 Mar 2019 17:43:24 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id E180270CEF;
 Sat,  2 Mar 2019 17:43:23 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 02F9B432B9E;
 Sun,  3 Mar 2019 04:43:20 +1100 (AEDT)
Date: Sun, 3 Mar 2019 04:43:20 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
cc: Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190302142521.GE68879@kib.kiev.ua>
Message-ID: <20190303041441.V4781@besplex.bde.org>
References: <20190228145542.GT2420@kib.kiev.ua>
 <20190228150811.GU2420@kib.kiev.ua>
 <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com>
 <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com>
 <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org>
 <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Sun, 03 Mar 2019 01:53:54 +0000
X-List-Received-Date: Sat, 02 Mar 2019 17:43:24 -0000

On Sat, 2 Mar 2019, Konstantin Belousov wrote:

> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>> ...
>>> So I am able to reproduce it with some surprising ease on HPET running
>>> on Haswell.
>>
>> So what is the cause of it?  Maybe the tickless code doesn't generate
>> fake clock ticks right.  Or it is just a library bug.  The kernel has
>> to be slightly real-time to satisfy the requirement of 1 update per.
>> Applications are further from being real-time.  But isn't it enough
>> for the kernel to ensure that the timehands cycle more than once per
>> second?
> No, I entered ddb as you suggested.

But using ddb is not normal.  It is convenient that this fixes HPET and
ACPI timecounters after using ddb, but this method doesn't help for
timecounters that wrap fast.  TSC-low at 2GHz wraps in 2 seconds, and
i8254 wraps in a few milliseconds.
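
The wrap times above follow from counter width and frequency.  A small
sketch of the arithmetic, assuming a free-running counter that wraps
after 2^bits counts; tc_wrap_seconds() is a hypothetical helper, not a
kernel function.  The i8254 figure depends on how the chip is
programmed, so it is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of the given width wraps. */
static double
tc_wrap_seconds(unsigned int bits, uint64_t freq_hz)
{
	double counts = 1.0;

	assert(bits >= 1 && bits <= 64 && freq_hz != 0);
	/* Build 2^bits as a double without needing <math.h>. */
	for (unsigned int i = 0; i < bits; i++)
		counts *= 2.0;
	return (counts / (double)freq_hz);
}
```

For a 32-bit counter at 2 GHz this gives 2^32 / 2e9, a little over 2
seconds, which matches the TSC-low figure.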

>> I don't changing this at all this.  binuptime() was carefully written
>> to not need so much 64-bit arithmetic.
>>
>> If this pessimization is allowed, then it can also handle a 64-bit
>> deltas.  Using the better kernel method:
>>
>>  		if (__predict_false(delta >= th->th_large_delta)) {
>>  			bt->sec += (scale >> 32) * (delta >> 32);
>>  			x = (scale >> 32) * (delta & 0xffffffff);
>>  			bt->sec += x >> 32;
>>  			bintime_addx(bt, x << 32);
>>  			x = (scale & 0xffffffff) * (delta >> 32);
>>  			bt->sec += x >> 32;
>>  			bintime_addx(bt, x << 32);
>>  			bintime_addx(bt, (scale & 0xffffffff) *
>>  			    (delta & 0xffffffff));
>>  		} else
>>  			bintime_addx(bt, scale * (delta & 0xffffffff));
> This only makes sense if delta is extended to uint64_t, which requires
> the pass over timecounters.

Yes, that was its point.  It is a bit annoying to have a hardware
timecounter like the TSC that doesn't wrap naturally, but then make it
wrap by masking high bits.

The masking step is also a bit wasteful.  For the TSC, it is 1 step to
discard high bits at the register level, then another step to apply the
mask to discard the high bits again.

>> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the
>> current method.  This can be changed to do expicit 32 x 32 -> 64 bit
>> multiplications and fix the overflow problem at small extra cost on
>> 32-bit arches:
>>
>>  		/* 32-bit arches did the next multiplication implicitly. */
>>  		x = (scale >> 32) * delta;
>>  		/*
>>  		 * And they did the following shifts and most of the adds
>>  		 * implicitly too.  Except shifting x left by 32 lost the
>>  		 * seconds part that the next line handles.  The next line
>>  		 * is the only extra cost for them.
>>  		 */
>>  		bt->sec += x >> 32;
>>  		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
>
> Ok, what about the following.

I'm not sure that I really want this, even if the pessimization is done.
But it avoids using fls*(), so is especially good for 32-bit systems and
OK for 64-bit systems too, especially in userland where fls*() is in the
fast path.
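
The split multiplications quoted above can be tried in user space.  The
following is a sketch only: struct bt_sketch and bt_addx() are
hypothetical stand-ins for struct bintime and bintime_addx(), with sec
widened to uint64_t for simplicity.  bt_add_scaled32() follows the
32 x 32 -> 64 bit form for a 32-bit delta, and bt_add_scaled64() follows
the four-product form for a 64-bit delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime: seconds plus a 2^-64 fraction. */
struct bt_sketch {
	uint64_t sec;
	uint64_t frac;
};

/* Add x to the fraction and carry into sec, like bintime_addx(). */
static void
bt_addx(struct bt_sketch *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* scale * delta for a 32-bit delta, using two 32 x 32 -> 64 products. */
static void
bt_add_scaled32(struct bt_sketch *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	x = (scale >> 32) * delta;
	bt->sec += x >> 32;		/* seconds part of the high product */
	bt_addx(bt, x << 32);		/* its fraction part */
	bt_addx(bt, (scale & 0xffffffff) * delta);
}

/* scale * delta for a 64-bit delta, using four 32 x 32 -> 64 products. */
static void
bt_add_scaled64(struct bt_sketch *bt, uint64_t scale, uint64_t delta)
{
	uint64_t x;

	bt->sec += (scale >> 32) * (delta >> 32);
	x = (scale >> 32) * (delta & 0xffffffff);
	bt->sec += x >> 32;
	bt_addx(bt, x << 32);
	x = (scale & 0xffffffff) * (delta >> 32);
	bt->sec += x >> 32;
	bt_addx(bt, x << 32);
	bt_addx(bt, (scale & 0xffffffff) * (delta & 0xffffffff));
}

/* Self-check: 0.5 s per count times 3 counts is 1.5 s. */
static void
bt_selfcheck(void)
{
	struct bt_sketch bt = { 0, 0 };

	bt_add_scaled32(&bt, (uint64_t)1 << 63, 3);
	assert(bt.sec == 1 && bt.frac == (uint64_t)1 << 63);
}
```

In the sketch the two partial terms of the 32-bit case are added with
separate bt_addx() calls so that each carry out of the fraction reaches
sec.  Adding them together first and calling bt_addx() once can lose a
carry when their 64-bit sum wraps, for example with scale = 0x1ffffffff
and delta = 0xffffffff.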

>
> diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> index 3749e0473af..cfe3d96d001 100644
> --- a/lib/libc/sys/__vdso_gettimeofday.c
> +++ b/lib/libc/sys/__vdso_gettimeofday.c
> @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
> #include <sys/time.h>
> #include <sys/vdso.h>
> #include <errno.h>
> +#include <limits.h>

Not needed with 0xffffffff instead of UINT_MAX.

The userland part is otherwise little changed.

> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..2e28f872229 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> ...
> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
> 	} while (gen == 0 || gen != th->th_generation);
> }
> #else /* !FFCLOCK */
> +
> +static void
> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> +{
> +	uint64_t x;
> +
> +	x = (*scale >> 32) * delta;
> +	*scale &= 0xffffffff;
> +	bt->sec += x >> 32;
> +	bintime_addx(bt, x << 32);
> +}

It is probably best to not inline the slow path, but clang tends to
inline everything anyway.

I prefer my way of writing this in 3 lines.  Modifying 'scale' for
the next step is especially ugly and pessimal when the next step is
in the caller and this function is not inlined.

> +
> void
> binuptime(struct bintime *bt)
> {
> 	struct timehands *th;
> -	u_int gen;
> +	uint64_t scale;
> +	u_int delta, gen;
>
> 	do {
> 		th = timehands;
> 		gen = atomic_load_acq_int(&th->th_generation);
> 		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		/* Avoid overflow for scale * delta. */
> +		if (__predict_false(th->th_large_delta <= delta))
> +			bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, scale * delta);
> +#else
> +		/*
> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
> +		 * multiplication on 32bit arches.
> +		 */

"Also avoid overflow for ..."

> +		bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, (u_int)scale * delta);

The cast should be to uint32_t, but better write it as & 0xffffffff as
elsewhere.

bintime_helper() already reduced 'scale' to 32 bits.  The cast might be
needed to tell the compiler this, especially when the function is not
inlined.  Better not do it in the function.  The function doesn't even
use the reduced value.

bintime_helper() is in the fast path in this case, so should be inlined.

> +#endif
> 		atomic_thread_fence_acq();
> 	} while (gen == 0 || gen != th->th_generation);
> }

This needs lots of testing of course.

Bruce

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 05:21:06 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DF6A915244D5
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 05:21:05 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic317-36.consmr.mail.ne1.yahoo.com
 (sonic317-36.consmr.mail.ne1.yahoo.com [66.163.184.47])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9747A6EDE5
 for <freebsd-hackers@freebsd.org>; Sun,  3 Mar 2019 05:21:04 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic.gate.mail.ne1.yahoo.com by
 sonic317.consmr.mail.ne1.yahoo.com with HTTP; Sun, 3 Mar 2019 05:21:02 +0000
Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115])
 ([67.170.167.181])
 by smtp421.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID
 44d2170f61a51d6c5540268ab4cad8d3; 
 Sun, 03 Mar 2019 05:21:00 +0000 (UTC)
From: Mark Millard <marklmi@yahoo.com>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: powerpc64 on PowerMac G5 4-core (system total): a hack that so far
 seem to avoid the stuck-sleeping issue
Message-Id: <B898BF60-2872-4FFC-AD72-A32591BC7D20@yahoo.com>
Date: Sat, 2 Mar 2019 21:20:58 -0800
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>,
 Mark Millard via freebsd-hackers <freebsd-hackers@freebsd.org>
X-Mailer: Apple Mail (2.3445.102.3)
X-List-Received-Date: Sun, 03 Mar 2019 05:21:06 -0000

[This note goes in a different direction compared to my
prior evidence report for overflows and the later activity
that has been happening for it. This does *not* involve
the patches associated with that report.]

I view the following as an evidence-gathering hack:
showing the change in behavior with the code changes,
not as directly what FreeBSD should do for powerpc64.
In code for defined(__powerpc64__) && defined(AIM)
I freely use knowledge of the PowerMac G5 context
instead of attempting general code.

Also: the code is set up to record some information
that I've been looking at via ddb. The recording is
not part of what changes the behavior but I decided
to show that code too.

It is preliminary, but, so far, the hack has avoided
buf*daemon* threads and pmac_thermal getting stuck
sleeping (or, at least, far less frequently).


The tbr-value hack:

Judging from what I see, the various G5 cores each have their tbr
running at the same rate but with some offsets as far as the base time
goes. cpu_mp_unleash does:

        ap_awake = 1;

        /* Provide our current DEC and TB values for APs */
        ap_timebase = mftb() + 10;
        __asm __volatile("msync; isync");

        /* Let APs continue */
        atomic_store_rel_int(&ap_letgo, 1);

        platform_smp_timebase_sync(ap_timebase, 0);

and machdep_ap_bootstrap does:

        /*
         * Set timebase as soon as possible to meet an implicit rendezvous
         * from cpu_mp_unleash(), which sets ap_letgo and then immediately
         * sets timebase.
         *
         * Note that this is intrinsically racy and is only relevant on
         * platforms that do not support better mechanisms.
         */
        platform_smp_timebase_sync(ap_timebase, 1);


which attempts to set the tbrs appropriately.

But at small scales, differences between the tbr values read on
different cpus are not well ordered with respect to time,
synchronizes-with relationships, and the like.  Only large enough
differences can reliably indicate an ordering of interest.

Note: tc->tc_get_timecount(tc) only provides the
least significant 32 bits of the tbr value.
th->th_offset_count is also 32 bits and based on
truncated tbr values.

So I made binuptime avoid finishing when it sees
a small (<0x10) step backwards for a new
tc->tc_get_timecount(tc) value vs. the existing
th->th_offset_count value (values strongly tied
to powerpc64 tbr values):

void
binuptime(struct bintime *bt)
{
        struct timehands *th;
        u_int gen;

        struct bintime old_bt= *bt; // HACK!!!
        struct timecounter *tc; // HACK!!!
        u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
        uint64_t freq, scale_factor, diff_scaled; // HACK!!!

        u_int try_cnt= 0ull; // HACK!!!

        do {
                do { // HACK!!!
                    th = timehands;
                    tc = th->th_counter;
                    gen = atomic_load_acq_int(&th->th_generation);
                    tim_cnt= tc->tc_get_timecount(tc);
                    tim_offset= th->th_offset_count;
                } while (tim_cnt<tim_offset && tim_offset-tim_cnt<0x10);
                *bt = th->th_offset;
                tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
                scale_factor= th->th_scale;
                diff_scaled= scale_factor * tim_diff;
                bintime_addx(bt, diff_scaled);
                freq= tc->tc_frequency;
                atomic_thread_fence_acq();
                try_cnt++;
        } while (gen == 0 || gen != th->th_generation);

        if (*(volatile uint64_t*)0xc000000000000020==0u &&
            (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
                *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
                *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
                *(volatile uint64_t*)0xc000000000000030= freq;
                *(volatile uint64_t*)0xc000000000000038= scale_factor;
                *(volatile uint64_t*)0xc000000000000040= tim_offset;
                *(volatile uint64_t*)0xc000000000000048= tim_cnt;
                *(volatile uint64_t*)0xc000000000000050= tim_diff;
                *(volatile uint64_t*)0xc000000000000058= try_cnt;
                *(volatile uint64_t*)0xc000000000000060= diff_scaled;
                *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
                __asm__ ("sync");
        } else if (*(volatile uint64_t*)0xc0000000000000a0==0u &&
            (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
                *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
                *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
                *(volatile uint64_t*)0xc0000000000000b0= freq;
                *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
                *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
                *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
                *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
                *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
                *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
                *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
                __asm__ ("sync");
        }
}
#else
. . .
#endif

So far as I can tell, the FreeBSD code is not designed to deal
with small differences in tc->tc_get_timecount(tc) values that do
not actually indicate a unique <, ==, or > ordering relation.

(I make no claim that the hack is a proper way to deal with
such.)
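
A user-space sketch of why a small backwards step matters, under stated
assumptions: the scale factor is taken as roughly 2^64 / frequency (the
adjustment term is ignored), the counter mask is 32 bits, and the
33333333 Hz frequency used in the example is only an assumed
illustrative value for a timebase.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Masked counter difference: (now - base) & mask. */
static uint32_t
masked_delta(uint32_t now, uint32_t base, uint32_t mask)
{
	return ((now - base) & mask);
}

/* Scale factor of about 2^64 / frequency (adjustment ignored). */
static uint64_t
scale_for_freq(uint64_t freq_hz)
{
	assert(freq_hz != 0);
	return (((uint64_t)1 << 63) / freq_hz * 2);
}

/* Nonzero if scale * delta does not fit in 64 bits. */
static int
scale_mul_overflows(uint64_t scale, uint32_t delta)
{
	assert(scale != 0);
	return (UINT64_MAX / scale < delta);
}
```

With these, a forward step of 5 counts gives a delta of 5, while a
backwards step of 5 counts gives a masked delta of 0xfffffffb.  At the
assumed frequency that delta exceeds UINT64_MAX / scale, so the 64-bit
product scale * delta wraps.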

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


From owner-freebsd-hackers@freebsd.org  Sun Mar  3 11:03:55 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9523F1506379
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 11:03:55 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id BF52C804F5;
 Sun,  3 Mar 2019 11:03:54 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x23B3lct050818
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sun, 3 Mar 2019 13:03:50 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x23B3lct050818
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x23B3kgh050817;
 Sun, 3 Mar 2019 13:03:46 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 3 Mar 2019 13:03:46 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
Message-ID: <20190303110346.GH68879@kib.kiev.ua>
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-List-Received-Date: Sun, 03 Mar 2019 11:03:55 -0000

On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> It looks like lookup and open are the only common vops that create new
> namecache entries.  At least, those are the only ones that set
> MAKEENTRY in the cn_flags field.  However, fuse(4)'s create-like
> operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
> information to create a namecache entry for the newly created file.
> As-is, an operation like FUSE_CREATE will almost always be followed up
> by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
In VFS, a new file is created by VOP_CREATE() after a negative
VOP_LOOKUP().  VOP_CREATE() returns the new vnode, which is then installed
into the file.  [A flag, VN_OPEN_NAMECACHE, was added for vn_open_cred();
it causes the created name to be inserted into the namecache.  It was added
to handle a very specific situation in the core dump code, which is no
longer relevant.  The flag is still there.]

A similar discussion occurred some time ago.  I think that the current
selection of cases where a namecache entry is created is optimized for
the scenario where extracting a large tarball should not significantly
affect the non-directory elements of the cache.  If you do such an
extraction, it is unlikely that you will access most of the files soon
afterwards.

> Would it be possible and wise to add these newly created entries to
> the namecache automatically?
Not from VFS, but the filesystem can override the policy by inserting
elements into the cache from its VOPs as it sees fit.

Does FUSE cache vnodes?  I would find aggressive caching on the kernel
side somewhat unexpected for it.


From owner-freebsd-hackers@freebsd.org  Sun Mar  3 16:16:47 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1BDCC150FB82;
 Sun,  3 Mar 2019 16:16:47 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 41FCD8A1A6;
 Sun,  3 Mar 2019 16:16:46 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x23GGaML078609
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sun, 3 Mar 2019 18:16:39 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x23GGaML078609
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x23GGZF2078608;
 Sun, 3 Mar 2019 18:16:35 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 3 Mar 2019 18:16:35 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mark Millard <marklmi@yahoo.com>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190303161635.GJ68879@kib.kiev.ua>
References: <20190301112717.GW2420@kib.kiev.ua>
 <20190302043936.A4444@besplex.bde.org>
 <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua>
 <20190303223100.B3572@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190303223100.B3572@besplex.bde.org>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Mar 2019 16:16:47 -0000

On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> 
> > On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >* ...
> >>>> I don't like changing this at all.  binuptime() was carefully written
> >>>> to not need so much 64-bit arithmetic.
> >>>>
> >>>> If this pessimization is allowed, then it can also handle 64-bit
> >>>> deltas.  Using the better kernel method:
> >>>>
> >>>>  		if (__predict_false(delta >= th->th_large_delta)) {
> >>>>  			bt->sec += (scale >> 32) * (delta >> 32);
> >>>>  			x = (scale >> 32) * (delta & 0xffffffff);
> >>>>  			bt->sec += x >> 32;
> >>>>  			bintime_addx(bt, x << 32);
> >>>>  			x = (scale & 0xffffffff) * (delta >> 32);
> >>>>  			bt->sec += x >> 32;
> >>>>  			bintime_addx(bt, x << 32);
> >>>>  			bintime_addx(bt, (scale & 0xffffffff) *
> >>>>  			    (delta & 0xffffffff));
> >>>>  		} else
> >>>>  			bintime_addx(bt, scale * (delta & 0xffffffff));
> >>> This only makes sense if delta is extended to uint64_t, which requires
> >>> the pass over timecounters.
> >>
> >> Yes, that was its point.  It is a bit annoying to have a hardware
> >> timecounter like the TSC that doesn't wrap naturally, but then make it
> >> wrap by masking high bits.
> >>
> >> The masking step is also a bit wasteful.  For the TSC, it is 1 step to
> >> discard high bits at the register level, then another step to apply the
> >> mask to discard the high bits again.
> > rdtsc-low is implemented in the natural way, after RDTSC, no register
> > combining into 64bit value is done, instead shrd operates on %edx:%eax
> > to get the final result into %eax.  I am not sure what you refer to.
> 
> I was referring mostly to the masking step '& tc->tc_counter_mask' and
> the lack of register combining in rdtsc().
> 
> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> step.  i386 used to be faster here -- the first masking step of discarding
> %edx doesn't take any code.  amd64 has to mask out the top bits in %rax.
> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> has to do a not so slow shr.
i386 cannot discard %edx after RDTSC, since some bits from %edx end up in
the timecounter value.
amd64 cannot either, but amd64 does not need to mask out the top bits in
%rax: the whole shrdl calculation occurs in 32-bit registers, and writing
the result to %eax automatically clears the top half of %rax.
The clearing is not even required, since the result is unsigned int anyway.

The disassembly of tsc_get_timecount_low() is very clear:
   0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
   0xffffffff806767e7 <+7>:     rdtsc  
   0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
...
   0xffffffff806767ed <+13>:    retq
(I removed frame manipulations).
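
For readers who do not read x86 assembly, here is a minimal C model of
what the three instructions above compute.  This is only a sketch: the
function name tsc_low_model() and its arguments are hypothetical, it is
not the kernel source, and it only models shift counts below 32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userland model of the sequence above: edx:eax is the
 * RDTSC result and shift is the count that was loaded into %ecx.
 */
static uint32_t
tsc_low_model(uint32_t edx, uint32_t eax, unsigned shift)
{
	uint64_t tsc;

	/* Only counts 0..31 are modeled here. */
	assert(shift < 32);
	tsc = (uint64_t)edx << 32 | eax;
	/* Keep the low 32 bits of the shifted value; some come from %edx. */
	return ((uint32_t)(tsc >> shift));
}
```

With a shift of 1, the low bit of %edx becomes the top bit of the
result, which is why %edx cannot simply be discarded.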

> 
> Then the '& tc->tc_counter_mask' step has no effect.
This is true.

> 
> All this is wrapped in many layers of function calls which are quite slow
> but this lets the other operations run in parallel on some CPUs.
> 
> >>>>  		/* 32-bit arches did the next multiplication implicitly. */
> >>>>  		x = (scale >> 32) * delta;
> >>>>  		/*
> >>>>  		 * And they did the following shifts and most of the adds
> >>>>  		 * implicitly too.  Except shifting x left by 32 lost the
> >>>>  		 * seconds part that the next line handles.  The next line
> >>>>  		 * is the only extra cost for them.
> >>>>  		 */
> >>>>  		bt->sec += x >> 32;
> >>>>  		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
> >>>
> >>> Ok, what about the following.
> >>
> >> I'm not sure that I really want this, even if the pessimization is done.
> >> But it avoids using fls*(), so is especially good for 32-bit systems and
> >> OK for 64-bit systems too, especially in userland where fls*() is in the
> >> fast path.
> > For userland I looked at the generated code, and BSR usage seems to be
> > good enough, for default compilation settings with clang.
> 
> I use gcc-4.2.1, and it doesn't do this optimization.
> 
> I already reported this in connection with fixing calcru1().  calcru1()
> is unnecessarily several times slower on i386 than on amd64 even after
> avoiding using flsll() on it.  The main slowness is in converting 'usec'
> to tv_sec and tv_usec, due to the bad design and implementation of the
> __udivdi3 and __umoddi3 libcalls.  The bad design is having to make 2
> libcalls to get the quotient and remainder.  The bad implementation is
> the portable C version in libkern.  libgcc provides a better implementation,
> but this is not available in the kernel.
> 
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 2656fb4d22f..2e28f872229 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> ...
> >>> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
> >>> 	} while (gen == 0 || gen != th->th_generation);
> >>> }
> >>> #else /* !FFCLOCK */
> >>> +
> >>> +static void
> >>> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> >>> +{
> >>> +	uint64_t x;
> >>> +
> >>> +	x = (*scale >> 32) * delta;
> >>> +	*scale &= 0xffffffff;
> >>> +	bt->sec += x >> 32;
> >>> +	bintime_addx(bt, x << 32);
> >>> +}
> >>
> >> It is probably best to not inline the slow path, but clang tends to
> >> inline everything anyway.
> > It does not matter if it inlines it, as far as it is moved out of the
> > linear sequence for the fast path.
> >>
> >> I prefer my way of writing this in 3 lines.  Modifying 'scale' for
> >> the next step is especially ugly and pessimal when the next step is
> >> in the caller and this function is not inlined.
> > Can you show exactly what do you want ?
> 
> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
> and don't pass 'scale' indirectly to bintime_helper() and don't modify
> it there.
> 
> Oops, there is a problem.  'scale' must be reduced iff bintime_helper()
> was used.  Duplicate some source code so as to not need a fall-through
> to the fast path.  See below.
Yes, this is the reason why it is passed by pointer (C has no references).

> 
> >>> void
> >>> binuptime(struct bintime *bt)
> >>> {
> >>> 	struct timehands *th;
> >>> -	u_int gen;
> >>> +	uint64_t scale;
> >>> +	u_int delta, gen;
> >>>
> >>> 	do {
> >>> 		th = timehands;
> >>> 		gen = atomic_load_acq_int(&th->th_generation);
> >>> 		*bt = th->th_offset;
> >>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>> +		scale = th->th_scale;
> >>> +		delta = tc_delta(th);
> >>> +#ifdef _LP64
> >>> +		/* Avoid overflow for scale * delta. */
> >>> +		if (__predict_false(th->th_large_delta <= delta))
> >>> +			bintime_helper(bt, &scale, delta);
> >>> +		bintime_addx(bt, scale * delta);
> >>> +#else
> >>> +		/*
> >>> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
> >>> +		 * multiplication on 32bit arches.
> >>> +		 */
> >>
> >> "Also avoid overflow for ..."
> >>
> >>> +		bintime_helper(bt, &scale, delta);
> >>> +		bintime_addx(bt, (u_int)scale * delta);
> >>
> >> The cast should be to uint32_t, but better write it as & 0xffffffff as
> >> elsewhere.
> 
> This is actually very broken.  The cast gives a 32 x 32 -> 32 bit
> multiplication, but all 64 bits of the result are needed.
Yes, fixed in the updated version.
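
To make the difference concrete, here is a small userland sketch of the
two spellings.  The function names are hypothetical and exist only for
illustration; the sketch assumes the usual 32-bit unsigned int.

```c
#include <stdint.h>

/* Broken: 32 x 32 -> 32 bit product, the high half is lost. */
static uint64_t
mul_truncated(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}

/* Intended: 32 x 32 -> 64 bit product of the low half of scale. */
static uint64_t
mul_widened(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)scale * delta);
}
```

For scale = 0x80000000 and delta = 4 the first form yields 0, while the
second yields 0x200000000.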

> 
> >>
> >> bintime_helper() already reduced 'scale' to 32 bits.  The cast might be
> >> needed to tell the compiler this, especially when the function is not
> >> inlined.  Better not do it in the function.  The function doesn't even
> >> use the reduced value.
> > I used cast to use 32x32 multiplication.  I am not sure that all (or any)
> > compilers are smart enough to deduce that they can use 32 bit mul.
> 
> Writing the reduction to 32 bits using a mask instead of a cast automatically
> avoids the bug, but might not give the optimization.
> 
> They do do this optimization, but might need the cast as well as the mask.
> At worst, '(uint64_t)(uint32_t)(scale & 0xffffffff)', where the mask is
> now redundant but the cast back to 64 bits is needed if the cast to 32
> bits is used.
> 
> You already depended on them not needing the cast for the expression
> '(*scale >> 32) * delta'.  Here delta is 32 bits and the other operand
> must remain 64 bits so that after default promotions the multiplication
> is 64 x 64 -> 64 bits, but the compiler should optimize this to
> 32 x 32 -> 64 bits.  (*scale >> 32) would need to be cast to 32 bits
> and then back to 64 bits if the compiler can't do this automatically.
> 
> I checked what some compilers do.  Both gcc-3.3.3 and gcc-4.2.1
> optimize only (uint64_t)x * y (where x and y have type uint32_t), so they
> need to be helped by casts if x and y have a larger type even if
> their values obviously fit in 32 bits.  So the expressions should be
> written as:
> 
>  	(uint64_t)(uint32_t)(scale >> 32) * delta;
> 
> and
> 
>  	(uint64_t)(uint32_t)scale * delta;
> 
> The 2 casts are always needed, but the '& 0xffffffff' operation doesn't
> need to be explicit because the cast does.
This is what I do now.

> 
> >> This needs lots of testing of course.
> >
> > Current kernel-only part of the change is below, see the question about
> > your preference for binuptime_helper().
> >
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..6c41ab22288 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > @@ -72,6 +71,7 @@ struct timehands {
> > 	struct timecounter	*th_counter;
> > 	int64_t			th_adjustment;
> > 	uint64_t		th_scale;
> > +	uint64_t		th_large_delta;
> > 	u_int	 		th_offset_count;
> > 	struct bintime		th_offset;
> > 	struct bintime		th_bintime;
> > @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
> > 	} while (gen == 0 || gen != th->th_generation);
> > }
> > #else /* !FFCLOCK */
> > +
> > +static void
> 
> Add __inline.  This is in the fast path for 32-bit systems.
Compilers do not need this hand-holding, and I prefer to avoid __inline
unless really necessary.  I checked with both clang 7.0 and gcc 8.3
that autoinlining did occur.

> 
> > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> > +{
> > +	uint64_t x;
> > +
> > +	x = (*scale >> 32) * delta;
> > +	*scale &= 0xffffffff;
> 
> Remove the '*' on scale, cast (scale >> 32) to
> (uint64_t)(uint32_t)(scale >> 32), and remove the change to *scale.
> 
> > +	bt->sec += x >> 32;
> > +	bintime_addx(bt, x << 32);
> > +}
> > +
> > void
> > binuptime(struct bintime *bt)
> > {
> > 	struct timehands *th;
> > -	u_int gen;
> > +	uint64_t scale;
> > +	u_int delta, gen;
> >
> > 	do {
> > 		th = timehands;
> > 		gen = atomic_load_acq_int(&th->th_generation);
> > 		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +#ifdef _LP64
> > +		/* Avoid overflow for scale * delta. */
> > +		if (__predict_false(th->th_large_delta <= delta))
> > +			bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, scale * delta);
> 
> Change to:
> 
>  		if (__predict_false(th->th_large_delta <= delta)) {
>  			bintime_helper(bt, scale, delta);
>  			bintime_addx(bt, (scale & 0xffffffff) * delta);
>  		} else
>  			bintime_addx(bt, scale * delta);
I do not like it, but ok.

> 
> > +#else
> > +		/*
> > +		 * Avoid both overflow as above and
> > +		 * (uint64_t, uint32_t) -> uint64_t
> > +		 * multiplication on 32bit arches.
> > +		 */
> 
> This is a bit unclear.  Better emphasize avoidance of the 64 x 32 -> 64 bit
> multiplication.  Something like:
> 
>  		/*
>  		 * Use bintime_helper() unconditionally, since the fast
>  		 * path in the above method is not so fast here, since
>  		 * the 64 x 32 -> 64 bit multiplication is usually not
>  		 * available in hardware and emulating it using 2
>  		 * 32 x 32 -> 64 bit multiplications uses code much
>  		 * like that in bintime_helper().
>  		 */
> 
> > +		bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, (uint32_t)scale * delta);
> > +#endif
> 
> Remove '&' as usual, and fix this by casting the reduced scale back to
> 64 bits.
> 
> Similarly in bintime().
I finally merged the two functions.  Having to copy the same code is too
annoying for this change.

So I verified that:
- there is no 64-bit multiplication in the generated i386 code, for both
  clang 7.0 and gcc 8.3;
- everything is inlined; the only call from bintime/binuptime is
  the indirect call to get the timecounter value.
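
The arithmetic of the split multiplication can also be checked outside
the kernel.  The following is a minimal userland sketch, not the kernel
code: struct bt, bt_addx(), bt_add_scaled() and bt_matches_ref() are
hypothetical stand-ins (bt_addx() mirrors the carry handling of
bintime_addx()), and unsigned __int128 is a gcc/clang extension that is
assumed to be available, as on 64-bit targets.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct bintime. */
struct bt {
	uint64_t sec;
	uint64_t frac;
};

/* Add x to the fraction, carrying into seconds on wraparound. */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t u;

	u = bt->frac;
	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/* Add scale * delta (up to 96 bits) using two 32 x 32 -> 64 products. */
static void
bt_add_scaled(struct bt *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	/* High half of scale: contributes to sec and to the top of frac. */
	x = (uint64_t)(uint32_t)(scale >> 32) * delta;
	bt->sec += x >> 32;
	bt_addx(bt, x << 32);
	/* Low half of scale: fits in 64 bits, goes into frac directly. */
	bt_addx(bt, (uint64_t)(uint32_t)scale * delta);
}

/* Compare against a 128-bit reference product; returns 1 on a match. */
static int
bt_matches_ref(uint64_t scale, uint32_t delta)
{
	unsigned __int128 p = (unsigned __int128)scale * delta;
	struct bt b = { 0, 0 };

	bt_add_scaled(&b, scale, delta);
	return (b.sec == (uint64_t)(p >> 64) && b.frac == (uint64_t)p);
}
```

Calling bt_matches_ref() with the largest inputs, scale = UINT64_MAX and
delta = UINT32_MAX, exercises the case where the plain 64-bit product
would overflow.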

> 
> Similarly in libc -- don't use the slow flsll() method in the 32-bit
> case where it is especially slow.  Don't use it in the 64-bit case either,
> since this would need to be changed when th_large_delta is added to the
> API.
> 
> Now I don't like my method in the kernel.  It is unnecessarily
> complicated to have a special case, and not faster either.

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..0fd39e25058 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	uint64_t		th_large_delta;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -351,21 +352,63 @@ fbclock_getmicrotime(struct timeval *tvp)
 	} while (gen == 0 || gen != th->th_generation);
 }
 #else /* !FFCLOCK */
-void
-binuptime(struct bintime *bt)
+
+static void
+bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (scale >> 32) * delta;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
+static void
+binnouptime(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	u_int gen;
+	struct bintime *bts;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		bts = (struct bintime *)((vm_offset_t)th + off);
+		*bt = *bts;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		if (__predict_false(th->th_large_delta <= delta)) {
+			/* Avoid overflow for scale * delta. */
+			bintime_helper(bt, scale, delta);
+			bintime_addx(bt, (scale & 0xffffffff) * delta);
+		} else {
+			bintime_addx(bt, scale * delta);
+		}
+#else
+		/*
+		 * Use bintime_helper() unconditionally, since the fast
+		 * path in the above method is not so fast here, since
+		 * the 64 x 32 -> 64 bit multiplication is usually not
+		 * available in hardware and emulating it using 2
+		 * 32 x 32 -> 64 bit multiplications uses code much
+		 * like that in bintime_helper().
+		 */
+		bintime_helper(bt, scale, delta);
+		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
 
+void
+binuptime(struct bintime *bt)
+{
+
+	binnouptime(bt, __offsetof(struct timehands, th_offset));
+}
+
 void
 nanouptime(struct timespec *tsp)
 {
@@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	binnouptime(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -1464,6 +1499,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = ((uint64_t)1 << 63) / scale;
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 16:25:26 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 07C73151015F
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 16:25:26 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 522448A6C6;
 Sun,  3 Mar 2019 16:25:25 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x23GPIna080120
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sun, 3 Mar 2019 18:25:21 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x23GPIna080120
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x23GPIVo080118;
 Sun, 3 Mar 2019 18:25:18 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 3 Mar 2019 18:25:18 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
Message-ID: <20190303162518.GK68879@kib.kiev.ua>
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
 <20190303110346.GH68879@kib.kiev.ua>
 <CAOtMX2jTjocm1u60hCXF9+XRLhpK90HWtkPx_OEO=j10WxGWzw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAOtMX2jTjocm1u60hCXF9+XRLhpK90HWtkPx_OEO=j10WxGWzw@mail.gmail.com>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Mar 2019 16:25:26 -0000

On Sun, Mar 03, 2019 at 09:02:07AM -0700, Alan Somers wrote:
> On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> >
> > On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> > > It looks like lookup and open are the only common vops that create new
> > > namecache entries.  At least, those are the only ones that set
> > > MAKEENTRY in the cn_flags field.  However, fuse(4)'s create-like
> > > operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
> > > information to create a namecache entry for the newly created file.
> > > As-is, an operation like FUSE_CREATE will almost always be followed up
> > > by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
> > In VFS, creation of the new file is done by VOP_CREATE() after negative
> > VOP_LOOKUP().   VOP_CREATE() returns the new vnode that is installed into
> > file.  [A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results
> > in created name entry insertion into namecache.  It was done to handle
> > very specific situation in core dump code, which is no longer relevant.
> > The flag is still there.]
> >
> > Similar discussion occurred some time ago.  I think that the current
> > selection of the cases where namecache entry is created, is optimized
> > for the scenario where extracting large tarball does not largely affect
> > the non-directory elements of the cache.  If you do such extraction,
> > it is unlikely that you will access most of the files shortly.
> >
> > > Would it be possible and wise to add these newly created entries to
> > > the namecache automatically?
> > Not from VFS, but the policy can be overridden by the filesystem by inserting
> > the elements into cache from VOPs as it finds suitable.
> 
> So MAKEENTRY is just advisory, and there shouldn't be a problem with
> inserting cache entries from fuse_nop_create even if MAKEENTRY isn't
> set?  I might try that.  The penalty for not doing so is an extra trip
> to userland, which is greater than the penalty for other file systems
> not doing it.
Too aggressive caching can cause problems.  See below.

> 
> >
> > Does FUSE cache vnodes ?  I would find aggressive caching on the kernel
> > side somewhat unexpected for it.
> 
> No, it just uses the regular vnode cache.  The unique things that it
> does is it caches file attributes within the vnode, and the daemon can
> request a timeout period for either the attr cache or the entry cache.
> When the timeout expires, the kernel is supposed to purge (or ignore)
> its cached values.

This is what I mean.  E.g., one of the strategies there might be to reclaim
the fuse vnode on inactivation.  This is very harsh, of course, but it was
done by nullfs not too long ago.

For a less contrived example, on NFS, with its relatively well-defined
semantics, caching on the client sometimes becomes problematic.  AFAIR,
the NFS client re-checks mtime in strategic places, and ensures
close-to-open consistency by always flushing attributes on close, at least
for NFS v3.

I am somewhat surprised that for FUSE it is considered safe (and useful)
to cache at all.

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 11:19:45 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9A6E11506BBC;
 Sun,  3 Mar 2019 11:19:45 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 02A5180B42;
 Sun,  3 Mar 2019 11:19:44 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x23BJWMX054208
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sun, 3 Mar 2019 13:19:36 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x23BJWMX054208
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x23BJVXN054206;
 Sun, 3 Mar 2019 13:19:31 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 3 Mar 2019 13:19:31 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mark Millard <marklmi@yahoo.com>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190303111931.GI68879@kib.kiev.ua>
References: <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com>
 <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com>
 <20190301112717.GW2420@kib.kiev.ua>
 <20190302043936.A4444@besplex.bde.org>
 <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190303041441.V4781@besplex.bde.org>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Mar 2019 11:19:45 -0000

On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> 
> > On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>> ...
> >>> So I am able to reproduce it with some surprising ease on HPET running
> >>> on Haswell.
> >>
> >> So what is the cause of it?  Maybe the tickless code doesn't generate
> >> fake clock ticks right.  Or it is just a library bug.  The kernel has
> >> to be slightly real-time to satisfy the requirement of 1 update per.
> >> Applications are further from being real-time.  But isn't it enough
> >> for the kernel to ensure that the timehands cycle more than once per
> >> second?
> > No, I entered ddb as you suggested.
> 
> But using ddb is not normal.  It is convenient that this fixes HPET and
> ACPI timecounters after using ddb, but this method doesn't help for
> timecounters that wrap fast.  TSC-low at 2GHz wraps in 2 seconds, and
> i8254 wraps in a few milliseconds.
> 
> >> I don't like changing this at all.  binuptime() was carefully written
> >> to not need so much 64-bit arithmetic.
> >>
> >> If this pessimization is allowed, then it can also handle 64-bit
> >> deltas.  Using the better kernel method:
> >>
> >>  		if (__predict_false(delta >= th->th_large_delta)) {
> >>  			bt->sec += (scale >> 32) * (delta >> 32);
> >>  			x = (scale >> 32) * (delta & 0xffffffff);
> >>  			bt->sec += x >> 32;
> >>  			bintime_addx(bt, x << 32);
> >>  			x = (scale & 0xffffffff) * (delta >> 32);
> >>  			bt->sec += x >> 32;
> >>  			bintime_addx(bt, x << 32);
> >>  			bintime_addx(bt, (scale & 0xffffffff) *
> >>  			    (delta & 0xffffffff));
> >>  		} else
> >>  			bintime_addx(bt, scale * (delta & 0xffffffff));
> > This only makes sense if delta is extended to uint64_t, which requires
> > the pass over timecounters.
> 
> Yes, that was its point.  It is a bit annoying to have a hardware
> timecounter like the TSC that doesn't wrap naturally, but then make it
> wrap by masking high bits.
> 
> The masking step is also a bit wasteful.  For the TSC, it is 1 step to
> discard high bits at the register level, then another step to apply the
> mask to discard the high bits again.
rdtsc-low is implemented in the natural way: after RDTSC, the registers
are not combined into a 64-bit value; instead, shrd operates on %edx:%eax
to get the final result into %eax.  I am not sure what you are referring to.
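
As a rough illustration of the shrd point, the following C sketch models the
arithmetic of that instruction sequence.  The name tsc_low_model and the
explicit hi/lo parameters are assumptions made for illustration (the real
code obtains them from RDTSC); this is a model, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "rdtsc; shrd %cl, %edx, %eax": shift the 64-bit pair
 * edx:eax right by 'shift' (0..31) and keep the low 32 bits.
 * hi and lo stand in for %edx and %eax (hypothetical parameters).
 */
static uint32_t
tsc_low_model(uint32_t hi, uint32_t lo, unsigned shift)
{
	uint64_t tsc = ((uint64_t)hi << 32) | lo;

	/* The truncation to 32 bits is the only place high bits are dropped. */
	return ((uint32_t)(tsc >> (shift & 31)));
}
```

In this model the high bits are discarded once, by the final truncation;
whether a later mask is redundant depends on the counter mask in use.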

> 
> >> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the
> >> current method.  This can be changed to do explicit 32 x 32 -> 64 bit
> >> multiplications and fix the overflow problem at small extra cost on
> >> 32-bit arches:
> >>
> >>  		/* 32-bit arches did the next multiplication implicitly. */
> >>  		x = (scale >> 32) * delta;
> >>  		/*
> >>  		 * And they did the following shifts and most of the adds
> >>  		 * implicitly too.  Except shifting x left by 32 lost the
> >>  		 * seconds part that the next line handles.  The next line
> >>  		 * is the only extra cost for them.
> >>  		 */
> >>  		bt->sec += x >> 32;
> >>  		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
> >
> > Ok, what about the following.
> 
> I'm not sure that I really want this, even if the pessimization is done.
> But it avoids using fls*(), so is especially good for 32-bit systems and
> OK for 64-bit systems too, especially in userland where fls*() is in the
> fast path.
For userland I looked at the generated code, and BSR usage seems to be
good enough, for default compilation settings with clang.
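
For readers unfamiliar with the fls*()-style test under discussion, here is
a minimal sketch of a conservative overflow pre-check built on a
find-last-set operation.  The names my_flsll and mul_may_overflow are
hypothetical, and the exact form of the check in the userland code is an
assumption; __builtin_clzll is the GCC/Clang builtin that typically
compiles to BSR or LZCNT on x86.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 if x == 0. */
static int
my_flsll(uint64_t x)
{
	return (x == 0 ? 0 : 64 - __builtin_clzll(x));
}

/*
 * Conservative test: a value with m significant bits times a value with
 * n significant bits is below 2^(m+n), so the product is guaranteed to
 * fit in 64 bits when m + n <= 64.  A true result means "may overflow";
 * it can be a false positive.
 */
static int
mul_may_overflow(uint64_t scale, uint32_t delta)
{
	return (my_flsll(scale) + my_flsll(delta) > 64);
}
```

The check costs two bit scans and an add on every call, which is the cost
being weighed against a precomputed threshold such as th_large_delta.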

> 
> >
> > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> > index 3749e0473af..cfe3d96d001 100644
> > --- a/lib/libc/sys/__vdso_gettimeofday.c
> > +++ b/lib/libc/sys/__vdso_gettimeofday.c
> > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
> > #include <sys/time.h>
> > #include <sys/vdso.h>
> > #include <errno.h>
> > +#include <limits.h>
> 
> Not needed with 0xffffffff instead of UINT_MAX.
> 
> The userland part is otherwise little changed.
Yes, see above.  If the ABI for the shared page is going to be changed at
some point in the future, I will export th_large_delta as well.

> 
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..2e28f872229 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
> > 	} while (gen == 0 || gen != th->th_generation);
> > }
> > #else /* !FFCLOCK */
> > +
> > +static void
> > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> > +{
> > +	uint64_t x;
> > +
> > +	x = (*scale >> 32) * delta;
> > +	*scale &= 0xffffffff;
> > +	bt->sec += x >> 32;
> > +	bintime_addx(bt, x << 32);
> > +}
> 
> It is probably best to not inline the slow path, but clang tends to
> inline everything anyway.
It does not matter if it inlines it, as long as it is moved out of the
linear sequence for the fast path.

> 
> I prefer my way of writing this in 3 lines.  Modifying 'scale' for
> the next step is especially ugly and pessimal when the next step is
> in the caller and this function is not inlined.
Can you show exactly what you want?

> 
> > +
> > void
> > binuptime(struct bintime *bt)
> > {
> > 	struct timehands *th;
> > -	u_int gen;
> > +	uint64_t scale;
> > +	u_int delta, gen;
> >
> > 	do {
> > 		th = timehands;
> > 		gen = atomic_load_acq_int(&th->th_generation);
> > 		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +#ifdef _LP64
> > +		/* Avoid overflow for scale * delta. */
> > +		if (__predict_false(th->th_large_delta <= delta))
> > +			bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, scale * delta);
> > +#else
> > +		/*
> > +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
> > +		 * multiplication on 32bit arches.
> > +		 */
> 
> "Also avoid overflow for ..."
> 
> > +		bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, (u_int)scale * delta);
> 
> The cast should be to uint32_t, but better write it as & 0xffffffff as
> elsewhere.
> 
> bintime_helper() already reduced 'scale' to 32 bits.  The cast might be
> needed to tell the compiler this, especially when the function is not
> inlined.  Better not do it in the function.  The function doesn't even
> use the reduced value.
I used the cast to get a 32x32 multiplication.  I am not sure that all (or any)
compilers are smart enough to deduce that they can use 32 bit mul.
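
One detail worth checking with a cast like this: in C, when both operands
are 32 bits wide, the multiplication itself is carried out in 32 bits and
the high half of the product is lost before any widening happens.  The
sketch below shows the difference; the function names are hypothetical and
it assumes the usual 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* 32 x 32 -> 32: both operands are 32-bit, so the product is truncated. */
static uint64_t
mul_truncating(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}

/* 32 x 32 -> 64: reduce to 32 bits, then widen one operand again. */
static uint64_t
mul_widening(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)scale * delta);
}
```

For example, with the low half of scale equal to 0x10000 and delta equal to
0x10000, the first form yields 0 and the second yields 0x100000000.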

> 
> bintime_helper() is in the fast path in this case, so should be inlined.
> 
> > +#endif
> > 		atomic_thread_fence_acq();
> > 	} while (gen == 0 || gen != th->th_generation);
> > }
> 
> This needs lots of testing of course.

The current kernel-only part of the change is below; see the question about
your preference for bintime_helper().

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..6c41ab22288 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +71,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	uint64_t		th_large_delta;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
 	} while (gen == 0 || gen != th->th_generation);
 }
 #else /* !FFCLOCK */
+
+static void
+bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (*scale >> 32) * delta;
+	*scale &= 0xffffffff;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
 void
 binuptime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		/* Avoid overflow for scale * delta. */
+		if (__predict_false(th->th_large_delta <= delta))
+			bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, scale * delta);
+#else
+		/*
+		 * Avoid both overflow as above and
+		 * (uint64_t, uint32_t) -> uint64_t
+		 * multiplication on 32bit arches.
+		 */
+		bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, (uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -388,13 +416,29 @@ void
 bintime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		/* Avoid overflow for scale * delta. */
+		if (__predict_false(th->th_large_delta <= delta))
+			bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, scale * delta);
+#else
+		/*
+		 * Avoid both overflow as above and
+		 * (uint64_t, uint32_t) -> uint64_t
+		 * multiplication on 32bit arches.
+		 */
+		bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, (uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -1464,6 +1508,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = ((uint64_t)1 << 63) / scale;
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new
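
The threshold and the split multiplication in the patch above can be
checked in userland.  The following is a sketch under stated assumptions,
not the kernel code: struct bintime and bintime_addx() are re-declared
locally, add_scaled() and split_matches_reference() are hypothetical names,
the th_adjustment term of tc_windup() is ignored, and unsigned __int128 (a
GCC/Clang extension) serves as the reference.

```c
#include <assert.h>
#include <stdint.h>

struct bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* Add x (in units of 2^-64 s) to bt, carrying into the seconds. */
static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/* Add scale * delta to bt, splitting the product when it could overflow. */
static void
add_scaled(struct bintime *bt, uint64_t scale, uint32_t delta,
    uint64_t large_delta)
{
	uint64_t x;

	if (large_delta <= delta) {
		x = (scale >> 32) * delta;
		scale &= 0xffffffff;
		bt->sec += x >> 32;
		bintime_addx(bt, x << 32);
	}
	bintime_addx(bt, scale * delta);
}

/* Compare with a 128-bit reference product; returns 1 on a match. */
static int
split_matches_reference(uint64_t freq, uint32_t delta)
{
	uint64_t half = ((uint64_t)1 << 63) / freq;	/* adjustment ignored */
	uint64_t scale = half * 2;
	uint64_t large_delta = ((uint64_t)1 << 63) / half;
	unsigned __int128 ref = (unsigned __int128)scale * delta;
	struct bintime bt = { 0, 0 };

	add_scaled(&bt, scale, delta, large_delta);
	return (bt.sec == (int64_t)(ref >> 64) && bt.frac == (uint64_t)ref);
}
```

Calling split_matches_reference(1000000000, 3000000000u) exercises the slow
path for a made-up 1 GHz counter about three seconds after the last windup,
a case where the plain 64-bit product would wrap.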

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 16:02:27 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76CA0150F4E2
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 16:02:27 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lj1-f194.google.com (mail-lj1-f194.google.com
 [209.85.208.194])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E644C89C18
 for <freebsd-hackers@freebsd.org>; Sun,  3 Mar 2019 16:02:26 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lj1-f194.google.com with SMTP id g80so2163275ljg.6
 for <freebsd-hackers@freebsd.org>; Sun, 03 Mar 2019 08:02:26 -0800 (PST)
MIME-Version: 1.0
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
 <20190303110346.GH68879@kib.kiev.ua>
In-Reply-To: <20190303110346.GH68879@kib.kiev.ua>
From: Alan Somers <asomers@freebsd.org>
Date: Sun, 3 Mar 2019 09:02:07 -0700
Message-ID: <CAOtMX2jTjocm1u60hCXF9+XRLhpK90HWtkPx_OEO=j10WxGWzw@mail.gmail.com>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset="UTF-8"

On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> > It looks like lookup and open are the only common vops that create new
> > namecache entries.  At least, those are the only ones that set
> > MAKEENTRY in the cn_flags field.  However, fuse(4)'s create-like
> > operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
> > information to create a namecache entry for the newly created file.
> > As-is, an operation like FUSE_CREATE will almost always be followed up
> > by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
> In VFS, creation of a new file is done by VOP_CREATE() after a negative
> VOP_LOOKUP().   VOP_CREATE() returns the new vnode, which is installed into
> the file.  [A flag, VN_OPEN_NAMECACHE, was added for vn_open_cred(); it causes
> the created name entry to be inserted into the namecache.  It was done to
> handle a very specific situation in the core dump code, which is no longer
> relevant.  The flag is still there.]
>
> A similar discussion occurred some time ago.  I think that the current
> selection of the cases where a namecache entry is created is optimized
> for the scenario where extracting a large tarball does not largely affect
> the non-directory elements of the cache.  If you do such an extraction,
> it is unlikely that you will access most of the files shortly afterwards.
>
> > Would it be possible and wise to add these newly created entries to
> > the namecache automatically?
> Not from VFS, but the policy can be overridden by the filesystem by inserting
> the elements into the cache from VOPs as it finds suitable.

So MAKEENTRY is just advisory, and there shouldn't be a problem with
inserting cache entries from fuse_vnop_create even if MAKEENTRY isn't
set?  I might try that.  The penalty for not doing so is an extra trip
to userland, which is greater than the penalty for other file systems
not doing it.

>
> Does FUSE cache vnodes ?  I would find aggressive caching on the kernel
> side somewhat unexpected for it.

No, it just uses the regular vnode cache.  The unique thing that it
does is cache file attributes within the vnode, and the daemon can
request a timeout period for either the attr cache or the entry cache.
When the timeout expires, the kernel is supposed to purge (or ignore)
its cached values.

-Alan

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 16:41:06 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E777315109D3
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 16:41:05 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lf1-f65.google.com (mail-lf1-f65.google.com
 [209.85.167.65])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6034D8AECB
 for <freebsd-hackers@freebsd.org>; Sun,  3 Mar 2019 16:41:05 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lf1-f65.google.com with SMTP id p73so1177672lfe.10
 for <freebsd-hackers@freebsd.org>; Sun, 03 Mar 2019 08:41:05 -0800 (PST)
MIME-Version: 1.0
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
 <20190303110346.GH68879@kib.kiev.ua>
 <CAOtMX2jTjocm1u60hCXF9+XRLhpK90HWtkPx_OEO=j10WxGWzw@mail.gmail.com>
 <20190303162518.GK68879@kib.kiev.ua>
In-Reply-To: <20190303162518.GK68879@kib.kiev.ua>
From: Alan Somers <asomers@freebsd.org>
Date: Sun, 3 Mar 2019 09:40:46 -0700
Message-ID: <CAOtMX2i0j+1vSmPM6jN3dzxCDyO-73hnnePmwjTrDsiAg-1H+g@mail.gmail.com>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset="UTF-8"

On Sun, Mar 3, 2019 at 9:25 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sun, Mar 03, 2019 at 09:02:07AM -0700, Alan Somers wrote:
> > On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> > > > It looks like lookup and open are the only common vops that create new
> > > > namecache entries.  At least, those are the only ones that set
> > > > MAKEENTRY in the cn_flags field.  However, fuse(4)'s create-like
> > > > operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
> > > > information to create a namecache entry for the newly created file.
> > > > As-is, an operation like FUSE_CREATE will almost always be followed up
> > > > by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
> > > In VFS, creation of a new file is done by VOP_CREATE() after a negative
> > > VOP_LOOKUP().   VOP_CREATE() returns the new vnode, which is installed into
> > > the file.  [A flag, VN_OPEN_NAMECACHE, was added for vn_open_cred(); it causes
> > > the created name entry to be inserted into the namecache.  It was done to
> > > handle a very specific situation in the core dump code, which is no longer
> > > relevant.  The flag is still there.]
> > >
> > > A similar discussion occurred some time ago.  I think that the current
> > > selection of the cases where a namecache entry is created is optimized
> > > for the scenario where extracting a large tarball does not largely affect
> > > the non-directory elements of the cache.  If you do such an extraction,
> > > it is unlikely that you will access most of the files shortly afterwards.
> > >
> > > > Would it be possible and wise to add these newly created entries to
> > > > the namecache automatically?
> > > Not from VFS, but the policy can be overridden by the filesystem by inserting
> > > the elements into the cache from VOPs as it finds suitable.
> >
> > So MAKEENTRY is just advisory, and there shouldn't be a problem with
> > inserting cache entries from fuse_vnop_create even if MAKEENTRY isn't
> > set?  I might try that.  The penalty for not doing so is an extra trip
> > to userland, which is greater than the penalty for other file systems
> > not doing it.
> There can be problems from too aggressive caching.  See below.
>
> >
> > >
> > > Does FUSE cache vnodes ?  I would find aggressive caching on the kernel
> > > side somewhat unexpected for it.
> >
> > No, it just uses the regular vnode cache.  The unique thing that it
> > does is cache file attributes within the vnode, and the daemon can
> > request a timeout period for either the attr cache or the entry cache.
> > When the timeout expires, the kernel is supposed to purge (or ignore)
> > its cached values.
>
> This is what I mean, e.g. one of the strategies there might be to reclaim
> the fuse vnode on inactivation.  This is very harsh, of course, but was done
> by nullfs not too long ago.

Currently fuse doesn't do anything special when the timeout expires.
It only checks the timeout on lookup, and ignores the cached value if
the timeout has already expired.
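
A minimal userland sketch of that check-on-lookup behavior follows.  The
struct cached_entry layout and the name entry_lookup are hypothetical and
only model the idea; they are not taken from the fuse(4) sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached entry: an inode number plus an absolute expiry. */
struct cached_entry {
	uint64_t	ino;
	int64_t		expires;	/* time at which the entry goes stale */
	bool		valid;
};

/*
 * Return true and fill *ino only if the entry is present and has not
 * expired at time 'now'.  An expired entry is ignored, not purged, so
 * nothing happens at the moment the timeout fires.
 */
static bool
entry_lookup(const struct cached_entry *ce, int64_t now, uint64_t *ino)
{
	if (!ce->valid || now >= ce->expires)
		return (false);
	*ino = ce->ino;
	return (true);
}
```

In this model a timeout of 0 makes expires equal to the insertion time, so
every lookup misses, while an "infinite" timeout corresponds to INT64_MAX.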

>
> For a less contrived example, on NFS with its relatively well-defined
> semantics, caching on the client sometimes becomes problematic.  AFAIR, the
> NFS client re-checks mtime in strategic places, and ensures close-to-open
> consistency by always flushing attributes on close, at least for NFS v3.
>
> I am somewhat surprised that for FUSE it is considered safe (and useful)
> to cache at all.

The daemon can choose the timeout period.  For local filesystems like
fusefs-ext2 it might set the timeout to infinity.  For simple network
filesystems like fusefs-sshfs it might set the timeout to 0, disabling
all kernel caching.  And a more sophisticated network filesystem,
like an NFSv4 client, might set the timeout to a finite non-zero time.
Later versions of the fuse protocol also allow the daemon to tell the
kernel to immediately expire its cache.

-Alan

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 21:33:40 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 50E6F151CACC
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Mar 2019 21:33:40 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic305-49.consmr.mail.ne1.yahoo.com
 (sonic305-49.consmr.mail.ne1.yahoo.com [66.163.185.175])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3A7026E392
 for <freebsd-hackers@freebsd.org>; Sun,  3 Mar 2019 21:33:39 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic.gate.mail.ne1.yahoo.com by
 sonic305.consmr.mail.ne1.yahoo.com with HTTP; Sun, 3 Mar 2019 21:33:31 +0000
Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115])
 ([67.170.167.181])
 by smtp410.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID
 908822f7437e5714b55851d382380e1a; 
 Sun, 03 Mar 2019 21:23:06 +0000 (UTC)
From: Mark Millard <marklmi@yahoo.com>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so
 far seem to avoid the stuck-sleeping issue [self-hosted
 buildworld/buildkernel completed]
Date: Sun, 3 Mar 2019 13:23:04 -0800
References: <B898BF60-2872-4FFC-AD72-A32591BC7D20@yahoo.com>
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>,
 Mark Millard via freebsd-hackers <freebsd-hackers@freebsd.org>
In-Reply-To: <B898BF60-2872-4FFC-AD72-A32591BC7D20@yahoo.com>
Message-Id: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
X-Mailer: Apple Mail (2.3445.102.3)

[So far the hack has been successful. Details given later
below.]

On 2019-Mar-2, at 21:20, Mark Millard <marklmi at yahoo.com> wrote:

> [This note goes in a different direction compared to my
> prior evidence report for overflows and the later activity
> that has been happening for it. This does *not* involve
> the patches associated with that report.]
>
> I view the following as an evidence-gathering hack:
> showing the change in behavior with the code changes,
> not as directly what FreeBSD should do for powerpc64.
> In code for defined(__powerpc64__) && defined(AIM)
> I freely use knowledge of the PowerMac G5 context
> instead of attempting general code.
>
> Also: the code is set up to record some information
> that I've been looking at via ddb. The recording is
> not part of what changes the behavior but I decided
> to show that code too.
>
> It is preliminary, but, so far, the hack has avoided
> buf*daemon* threads and pmac_thermal getting stuck
> sleeping (or, at least, far less frequently).
>
>
> The tbr-value hack:
>
> From what I see, the various G5 cores each have their tbr running at
> the same rate but have some offsets as far as the base time
> goes. cpu_mp_unleash does:
>
>        ap_awake = 1;
>
>        /* Provide our current DEC and TB values for APs */
>        ap_timebase = mftb() + 10;
>        __asm __volatile("msync; isync");
>
>        /* Let APs continue */
>        atomic_store_rel_int(&ap_letgo, 1);
>
>        platform_smp_timebase_sync(ap_timebase, 0);
>
> and machdep_ap_bootstrap does:
>
>        /*
>         * Set timebase as soon as possible to meet an implicit rendezvous
>         * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>         * sets timebase.
>         *
>         * Note that this is intrinsically racy and is only relevant on
>         * platforms that do not support better mechanisms.
>         */
>        platform_smp_timebase_sync(ap_timebase, 1);
>
>
> which attempts to set the tbrs appropriately.
>
> But on small scales of differences the various tbr
> values from different cpus end up not well ordered
> relative to time, "synchronizes with" relations, and the like.
> Only large enough differences can well indicate an
> ordering of interest.
>
> Note: tc->tc_get_timecount(tc) only provides the
> least significant 32 bits of the tbr value.
> th->th_offset_count is also 32 bits and based on
> truncated tbr values.
>
> So I made binuptime avoid finishing when it sees
> a small (<0x10) step backwards for a new
> tc->tc_get_timecount(tc) value vs. the existing
> th->th_offset_count value (values strongly tied
> to powerpc64 tbr values):
>
> void
> binuptime(struct bintime *bt)
> {
>        struct timehands *th;
>        u_int gen;
>
>        struct bintime old_bt= *bt; // HACK!!!
>        struct timecounter *tc; // HACK!!!
>        u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>        uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>
>        u_int try_cnt= 0ull; // HACK!!!
>
>        do {
>                do { // HACK!!!
>                    th = timehands;
>                    tc = th->th_counter;
>                    gen = atomic_load_acq_int(&th->th_generation);
>                    tim_cnt= tc->tc_get_timecount(tc);
>                    tim_offset= th->th_offset_count;
>                } while (tim_cnt<tim_offset && tim_offset-tim_cnt<0x10);
>                *bt = th->th_offset;
>                tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>                scale_factor= th->th_scale;
>                diff_scaled= scale_factor * tim_diff;
>                bintime_addx(bt, diff_scaled);
>                freq= tc->tc_frequency;
>                atomic_thread_fence_acq();
>                try_cnt++;
>        } while (gen == 0 || gen != th->th_generation);
>
>        if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>                *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>                *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>                *(volatile uint64_t*)0xc000000000000030= freq;
>                *(volatile uint64_t*)0xc000000000000038= scale_factor;
>                *(volatile uint64_t*)0xc000000000000040= tim_offset;
>                *(volatile uint64_t*)0xc000000000000048= tim_cnt;
>                *(volatile uint64_t*)0xc000000000000050= tim_diff;
>                *(volatile uint64_t*)0xc000000000000058= try_cnt;
>                *(volatile uint64_t*)0xc000000000000060= diff_scaled;
>                *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>                __asm__ ("sync");
>        } else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>                *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>                *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>                *(volatile uint64_t*)0xc0000000000000b0= freq;
>                *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>                *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>                *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>                *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>                *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>                *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>                *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>                __asm__ ("sync");
>        }
> }
> #else
> . . .
> #endif
>
> So far as I can tell, the FreeBSD code is not designed to deal
> with small differences in tc->tc_get_timecount(tc) not actually
> indicating a useful < vs. == vs. > ordering relation uniquely.
>
> (I make no claim that the hack is a proper way to deal with
> such.)
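
To make the failure mode concrete, here is a small sketch of why a tiny
backwards step matters.  The names delta32 and small_backstep are
hypothetical; delta32 models the modular subtraction and masking that the
quoted code applies, and small_backstep is the hack's retry condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counter delta: modular 32-bit subtraction, then the counter mask. */
static uint32_t
delta32(uint32_t now, uint32_t offset, uint32_t mask)
{
	return ((now - offset) & mask);
}

/* The retry condition: 'now' is behind 'offset' by fewer than 0x10. */
static bool
small_backstep(uint32_t now, uint32_t offset)
{
	return (now < offset && offset - now < 0x10);
}
```

With offset 0x1000 and a reading of 0x0ffd from another CPU, the delta is
0xfffffffd rather than -3, so the scaled result lands far in the future
(or overflows) instead of slightly in the past.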

I did a buildworld buildkernel lasting somewhat over 7 hours on the
PowerMac G5. Overall the G5 has been up over 13 hours and
none of the buf*daemon* threads have gotten stuck sleeping.
Nor has pmac_thermal gotten stuck. Similarly for vnlru
and syncer: "top -HIStopid" still shows them all as
periodically active.

Previously for this usefdt=1 context (with the modern
VM_MAX_KERNEL_ADDRESS), going more than a few minutes
without at least one of those threads getting stuck
sleeping was rare on the G5 (powerpc64 example).

So this hack has managed to avoid finding sbinuptime()
in sleepq_timeout being less than the earlier (by call
structure/code sequencing) sbinuptime() in timercb that
led to the sleepq_timeout callout being called in the
first place.

So in the sleepq_timeout callout's:

        if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
                /*
                 * The thread does not want a timeout (yet).
                 */
        } else . . .

td->td_sleeptimo > sbinuptime() ends up false now for small
enough original differences.

This case does not set up another timeout, it just leaves the
thread stuck sleeping, no longer doing periodic activities.

As it stands, what I did (presuming an appropriate definition
of "small differences in the problematical direction") should
leave this and other sbinuptime-using code with:

td->td_sleeptimo <= sbinuptime()

for what were originally "small" tbr value differences in the
problematical direction (in case other places require it in
some way).

If, instead, just sleepq_timeout's test could allow for
some slop in the ordering, it could be a cheaper hack than
looping in binuptime.

At this point I've no clue what a correct/efficient FreeBSD
design for allowing the sloppy match across tbr's for different
CPUs would be.
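
A minimal sketch of the "allow some slop in sleepq_timeout's test" idea,
not a tested kernel change.  sbintime_t is modeled as int64_t; the helper
name sleeptimo_expired() and its slop parameter are hypothetical, and
picking a defensible slop value is exactly the open question above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t sbintime_t;	/* modeled after FreeBSD's 32.32 fixed point */

/*
 * Hypothetical helper: report the timeout as expired when "now" has
 * reached the deadline, or falls short of it by no more than "slop".
 * A deadline of 0 means that no timeout is wanted.
 */
static bool
sleeptimo_expired(sbintime_t sleeptimo, sbintime_t now, sbintime_t slop)
{
	if (sleeptimo == 0)
		return (false);
	return (sleeptimo - now <= slop);
}
```

With slop == 0 this is the complement of the existing test; a small
positive slop would accept a "now" read from a CPU whose tbr is slightly
behind the one that scheduled the callout.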


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


From owner-freebsd-hackers@freebsd.org  Sun Mar  3 13:32:23 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D82A8150AD29;
 Sun,  3 Mar 2019 13:32:22 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 2DFAB84F1C;
 Sun,  3 Mar 2019 13:32:21 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 2FB8F436AEC;
 Mon,  4 Mar 2019 00:32:12 +1100 (AEDT)
Date: Mon, 4 Mar 2019 00:32:12 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
cc: Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303111931.GI68879@kib.kiev.ua>
Message-ID: <20190303223100.B3572@besplex.bde.org>
References: <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com>
 <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com>
 <20190301112717.GW2420@kib.kiev.ua>
 <20190302043936.A4444@besplex.bde.org> <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=L2uf15vNulIdqj9DapQA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 2DFAB84F1C
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.90 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.90)[-0.900,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-Mailman-Approved-At: Sun, 03 Mar 2019 22:44:33 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Mar 2019 13:32:23 -0000

On Sun, 3 Mar 2019, Konstantin Belousov wrote:

> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>* ...
>>>> I don't like changing this at all.  binuptime() was carefully written
>>>> to not need so much 64-bit arithmetic.
>>>>
>>>> If this pessimization is allowed, then it can also handle 64-bit
>>>> deltas.  Using the better kernel method:
>>>>
>>>>  		if (__predict_false(delta >= th->th_large_delta)) {
>>>>  			bt->sec += (scale >> 32) * (delta >> 32);
>>>>  			x = (scale >> 32) * (delta & 0xffffffff);
>>>>  			bt->sec += x >> 32;
>>>>  			bintime_addx(bt, x << 32);
>>>>  			x = (scale & 0xffffffff) * (delta >> 32);
>>>>  			bt->sec += x >> 32;
>>>>  			bintime_addx(bt, x << 32);
>>>>  			bintime_addx(bt, (scale & 0xffffffff) *
>>>>  			    (delta & 0xffffffff));
>>>>  		} else
>>>>  			bintime_addx(bt, scale * (delta & 0xffffffff));
>>> This only makes sense if delta is extended to uint64_t, which requires
>>> the pass over timecounters.
>>
>> Yes, that was its point.  It is a bit annoying to have a hardware
>> timecounter like the TSC that doesn't wrap naturally, but then make it
>> wrap by masking high bits.
>>
>> The masking step is also a bit wasteful.  For the TSC, it is 1 step to
>> discard high bits at the register level, then another step to apply the
>> mask to discard the high bits again.
> rdtsc-low is implemented in the natural way, after RDTSC, no register
> combining into 64bit value is done, instead shrd operates on %edx:%eax
> to get the final result into %eax.  I am not sure what you refer to.

I was referring mostly to the masking step '& tc->tc_counter_mask' and
the lack of register combining in rdtsc().

However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
step.  i386 used to be faster here -- the first masking step of discarding
%edx doesn't take any code.  amd64 has to mask out the top bits in %rax.
Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
has to do a not so slow shr.

Then the '& tc->tc_counter_mask' step has no effect.

All this is wrapped in many layers of function calls which are quite slow
but this lets the other operations run in parallel on some CPUs.
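
To illustrate why the mask step is redundant for a full-width 32-bit
counter, here is a small model.  The function names are hypothetical;
the first models a tsc-low style read and the second models the
subtract-and-mask form used by tc_delta().

```c
#include <stdint.h>

/* Model of a tsc-low read: shift the 64-bit count, keep the low 32 bits. */
static uint32_t
tsc_low_model(uint64_t tsc, unsigned shift)
{
	return ((uint32_t)(tsc >> shift));
}

/* Model of the delta calculation: subtract, then apply the counter mask. */
static uint32_t
masked_delta_model(uint32_t now, uint32_t then, uint32_t mask)
{
	return ((now - then) & mask);
}
```

The subtraction already wraps modulo 2^32, so with a mask of 0xffffffff
the '&' changes nothing.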

>>>>  		/* 32-bit arches did the next multiplication implicitly. */
>>>>  		x = (scale >> 32) * delta;
>>>>  		/*
>>>>  		 * And they did the following shifts and most of the adds
>>>>  		 * implicitly too.  Except shifting x left by 32 lost the
>>>>  		 * seconds part that the next line handles.  The next line
>>>>  		 * is the only extra cost for them.
>>>>  		 */
>>>>  		bt->sec += x >> 32;
>>>>  		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
>>>
>>> Ok, what about the following.
>>
>> I'm not sure that I really want this, even if the pessimization is done.
>> But it avoids using fls*(), so is especially good for 32-bit systems and
>> OK for 64-bit systems too, especially in userland where fls*() is in the
>> fast path.
> For userland I looked at the generated code, and BSR usage seems to be
> good enough, for default compilation settings with clang.

I use gcc-4.2.1, and it doesn't do this optimization.

I already reported this in connection with fixing calcru1().  calcru1()
is unnecessarily several times slower on i386 than on amd64 even after
avoiding using flsll() on it.  The main slowness is in converting 'usec'
to tv_sec and tv_usec, due to the bad design and implementation of the
__udivdi3 and __umoddi3 libcalls.  The bad design is having to make 2
libcalls to get the quotient and remainder.  The bad implementation is
the portable C version in libkern.  libgcc provides a better implementation,
but this is not available in the kernel.
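
A sketch of avoiding the second libcall by deriving the remainder from
the quotient.  The struct and function names are hypothetical stand-ins,
not the calcru1() code, and whether a given compiler does this on its
own varies.

```c
#include <stdint.h>

struct tv_model {		/* stand-in for struct timeval */
	int64_t	 sec;
	int32_t	 usec;
};

static struct tv_model
usec_to_tv(uint64_t usec)
{
	struct tv_model tv;
	uint64_t q;

	q = usec / 1000000;		/* the only 64-bit division */
	tv.sec = (int64_t)q;
	/* Remainder via multiply and subtract instead of a second divide. */
	tv.usec = (int32_t)(usec - q * 1000000);
	return (tv);
}
```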

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..2e28f872229 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> ...
>>> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
>>> 	} while (gen == 0 || gen != th->th_generation);
>>> }
>>> #else /* !FFCLOCK */
>>> +
>>> +static void
>>> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
>>> +{
>>> +	uint64_t x;
>>> +
>>> +	x = (*scale >> 32) * delta;
>>> +	*scale &= 0xffffffff;
>>> +	bt->sec += x >> 32;
>>> +	bintime_addx(bt, x << 32);
>>> +}
>>
>> It is probably best to not inline the slow path, but clang tends to
>> inline everything anyway.
> It does not matter if it inlines it, as far as it is moved out of the
> linear sequence for the fast path.
>>
>> I prefer my way of writing this in 3 lines.  Modifying 'scale' for
>> the next step is especially ugly and pessimal when the next step is
>> in the caller and this function is not inlined.
> Can you show exactly what do you want ?

Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
and don't pass 'scale' indirectly to bintime_helper() and don't modify
it there.

Oops, there is a problem.  'scale' must be reduced iff bintime_helper()
was used.  Duplicate some source code so as to not need a fall-through
to the fast path.  See below.

>>> void
>>> binuptime(struct bintime *bt)
>>> {
>>> 	struct timehands *th;
>>> -	u_int gen;
>>> +	uint64_t scale;
>>> +	u_int delta, gen;
>>>
>>> 	do {
>>> 		th = timehands;
>>> 		gen = atomic_load_acq_int(&th->th_generation);
>>> 		*bt = th->th_offset;
>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>> +		scale = th->th_scale;
>>> +		delta = tc_delta(th);
>>> +#ifdef _LP64
>>> +		/* Avoid overflow for scale * delta. */
>>> +		if (__predict_false(th->th_large_delta <= delta))
>>> +			bintime_helper(bt, &scale, delta);
>>> +		bintime_addx(bt, scale * delta);
>>> +#else
>>> +		/*
>>> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
>>> +		 * multiplication on 32bit arches.
>>> +		 */
>>
>> "Also avoid overflow for ..."
>>
>>> +		bintime_helper(bt, &scale, delta);
>>> +		bintime_addx(bt, (u_int)scale * delta);
>>
>> The cast should be to uint32_t, but better write it as & 0xffffffff as
>> elsewhere.

This is actually very broken.  The cast gives a 32 x 32 -> 32 bit
multiplication, but all 64 bits of the result are needed.

>>
>> bintime_helper() already reduced 'scale' to 32 bits.  The cast might be
>> needed to tell the compiler this, especially when the function is not
>> inlined.  Better not do it in the function.  The function doesn't even
>> use the reduced value.
> I used cast to use 32x32 multiplication.  I am not sure that all (or any)
> compilers are smart enough to deduce that they can use 32 bit mul.

Writing the reduction to 32 bits using a mask instead of a cast automatically
avoids the bug, but might not give the optimization.

They do do this optimization, but might need the cast as well as the mask.
At worst, '(uint64_t)(uint32_t)(scale & 0xffffffff)', where the mask is
now redundant but the cast back to 64 bits is needed if the cast to 32
bits is used.

You already depended on them not needing the cast for the expression
'(*scale >> 32) * delta'.  Here delta is 32 bits and the other operand
must remain 64 bits so that after default promotions the multiplication
is 64 x 64 -> 64 bits, but the compiler should optimize this to
32 x 32 -> 64 bits.  (*scale >> 32) would need to be cast to 32 bits
and then back to 64 bits if the compiler can't do this automatically.

I checked what some compilers do.  Both gcc-3.3.3 and gcc-4.2.1
optimize only (uint64_t)x * y (where x and y have type uint32_t), so they
need to be helped by casts if x and y have a larger type even if
their values obviously fit in 32 bits.  So the expressions should be
written as:

 	(uint64_t)(uint32_t)(scale >> 32) * delta;

and

 	(uint64_t)(uint32_t)scale * delta;

The 2 casts are always needed, but the '& 0xffffffff' operation doesn't
need to be explicit because the cast to uint32_t already does it.
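
A small demonstration of the difference, assuming unsigned int is 32
bits wide.  The function names are made up for the illustration.

```c
#include <stdint.h>

/* Broken form: the cast makes this a 32 x 32 -> 32 bit multiplication. */
static uint64_t
mul_lo_truncating(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}

/* Intended form: low half of scale, 32 x 32 -> 64 bits. */
static uint64_t
mul_lo_widening(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)scale * delta);
}

/* Intended form: high half of scale, 32 x 32 -> 64 bits. */
static uint64_t
mul_hi_widening(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)(scale >> 32) * delta);
}
```

For scale = 0x1ffffffff and delta = 2 the truncating form gives
0xfffffffe while the widening form gives 0x1fffffffe.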

>> This needs lots of testing of course.
>
> Current kernel-only part of the change is below, see the question about
> your preference for binuptime_helper().
>
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..6c41ab22288 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -72,6 +71,7 @@ struct timehands {
> 	struct timecounter	*th_counter;
> 	int64_t			th_adjustment;
> 	uint64_t		th_scale;
> +	uint64_t		th_large_delta;
> 	u_int	 		th_offset_count;
> 	struct bintime		th_offset;
> 	struct bintime		th_bintime;
> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
> 	} while (gen == 0 || gen != th->th_generation);
> }
> #else /* !FFCLOCK */
> +
> +static void

Add __inline.  This is in the fast path for 32-bit systems.

> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> +{
> +	uint64_t x;
> +
> +	x = (*scale >> 32) * delta;
> +	*scale &= 0xffffffff;

Remove the '*' on scale, cast (scale >> 32) to
(uint64_t)(uint32_t)(scale >> 32), and remove the change to *scale.

> +	bt->sec += x >> 32;
> +	bintime_addx(bt, x << 32);
> +}
> +
> void
> binuptime(struct bintime *bt)
> {
> 	struct timehands *th;
> -	u_int gen;
> +	uint64_t scale;
> +	u_int delta, gen;
>
> 	do {
> 		th = timehands;
> 		gen = atomic_load_acq_int(&th->th_generation);
> 		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		/* Avoid overflow for scale * delta. */
> +		if (__predict_false(th->th_large_delta <= delta))
> +			bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, scale * delta);

Change to:

 		if (__predict_false(th->th_large_delta <= delta)) {
 			bintime_helper(bt, scale, delta);
 			bintime_addx(bt, (scale & 0xffffffff) * delta);
 		} else
 			bintime_addx(bt, scale * delta);
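
A self-contained model of this shape, for experimenting outside the
kernel.  bt_model, bt_addx(), bt_helper() and bt_add_scaled() are
stand-ins for struct bintime, bintime_addx(), bintime_helper() and the
body of the loop; the threshold is passed in by the caller because how
th_large_delta gets computed is not shown here.

```c
#include <stdint.h>

struct bt_model {		/* stand-in for struct bintime */
	int64_t	 sec;
	uint64_t frac;		/* fraction of a second in units of 2^-64 s */
};

static void
bt_addx(struct bt_model *bt, uint64_t x)
{
	uint64_t old;

	old = bt->frac;
	bt->frac += x;
	if (old > bt->frac)	/* carry out of the fraction */
		bt->sec++;
}

/* Add the high half of scale times delta; scale is passed by value. */
static void
bt_helper(struct bt_model *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	x = (uint64_t)(uint32_t)(scale >> 32) * delta;
	bt->sec += x >> 32;
	bt_addx(bt, x << 32);
}

static void
bt_add_scaled(struct bt_model *bt, uint64_t scale, uint32_t delta,
    uint64_t large_delta)
{
	if (large_delta <= delta) {
		/* Slow path: split the product so that nothing is lost. */
		bt_helper(bt, scale, delta);
		bt_addx(bt, (scale & 0xffffffff) * delta);
	} else
		bt_addx(bt, scale * delta);
}
```

With scale = 2^63 (half a second per count) and delta = 3, the slow path
yields 1.5 seconds, while the plain 64-bit product wraps and yields only
0.5 seconds.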

> +#else
> +		/*
> +		 * Avoid both overflow as above and
> +		 * (uint64_t, uint32_t) -> uint64_t
> +		 * multiplication on 32bit arches.
> +		 */

This is a bit unclear.  Better emphasize avoidance of the 64 x 32 -> 64 bit
multiplication.  Something like:

 		/*
 		 * Use bintime_helper() unconditionally, since the fast
 		 * path in the above method is not so fast here, since
 		 * the 64 x 32 -> 64 bit multiplication is usually not
 		 * available in hardware and emulating it using 2
 		 * 32 x 32 -> 64 bit multiplications uses code much
 		 * like that in bintime_helper().
 		 */

> +		bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, (uint32_t)scale * delta);
> +#endif

Remove '&' as usual, and fix this by casting the reduced scale back to
64 bits.

Similarly in bintime().

Similarly in libc -- don't use the slow flsll() method in the 32-bit
case where it is especially slow.  Don't use it in the 64-bit case either,
since this would need to be changed when th_large_delta is added to the
API.

Now I don't like my method in the kernel.  It is unnecessarily
complicated to have a special case, and not faster either.

Bruce

From owner-freebsd-hackers@freebsd.org  Sun Mar  3 18:29:54 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A3861513E35;
 Sun,  3 Mar 2019 18:29:54 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id C60318DC84;
 Sun,  3 Mar 2019 18:29:53 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id F3142433301;
 Mon,  4 Mar 2019 05:29:49 +1100 (AEDT)
Date: Mon, 4 Mar 2019 05:29:48 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
cc: Bruce Evans <brde@optusnet.com.au>, Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303161635.GJ68879@kib.kiev.ua>
Message-ID: <20190304043416.V5640@besplex.bde.org>
References: <20190301112717.GW2420@kib.kiev.ua>
 <20190302043936.A4444@besplex.bde.org>
 <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=8yM2XH24hrI5ozH3vLgA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: C60318DC84
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.97 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.97)[-0.973,0];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[]
X-Mailman-Approved-At: Sun, 03 Mar 2019 22:45:01 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
X-List-Received-Date: Sun, 03 Mar 2019 18:29:54 -0000

On Sun, 3 Mar 2019, Konstantin Belousov wrote:

> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> * ...
>>>> Yes, that was its point.  It is a bit annoying to have a hardware
>>>> timecounter like the TSC that doesn't wrap naturally, but then make it
>>>> wrap by masking high bits.
>>>>
>>>> The masking step is also a bit wasteful.  For the TSC, it is 1 step to
>>>> discard high bits at the register level, then another step to apply the
>>>> mask to discard the high bits again.
>>> rdtsc-low is implemented in the natural way, after RDTSC, no register
>>> combining into 64bit value is done, instead shrd operates on %edx:%eax
>>> to get the final result into %eax.  I am not sure what you refer to.
>>
>> I was referring mostly to the masking step '& tc->tc_counter_mask' and
>> the lack of register combining in rdtsc().
>>
>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
>> step.  i386 used to be faster here -- the first masking step of discarding
>> %edx doesn't take any code.  amd64 has to mask out the top bits in %rax.
>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
>> has to do a not so slow shr.
> i386 cannot discard %edx after RDTSC since some bits from %edx come into
> the timecounter value.

These bits are part of the tsc-low pessimization.  The shift count should
always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
sometimes.

When tsc-low was new, the shift count was often larger (as much as 8),
and it is still changeable by a read-only tunable, but now it is 1 in
almost all cases.  The code only limits the timecounter frequency
to UINT_MAX, except the tunable defaults to 1 so average CPUs running
at nearly 4 GHz are usually limited to about 2 GHz.  The comment about
this UINT_MAX doesn't match the code.  The comment says int, but the
code says UINT.

All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
This much accuracy is noise for most purposes.
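
The arithmetic behind this can be sketched as follows.  Both helper
names are hypothetical; the second one shows how long a 32-bit counter
running at a given frequency takes to wrap.

```c
#include <stdint.h>

/* Timecounter frequency seen after discarding "shift" low TSC bits. */
static uint64_t
tc_freq_after_shift(uint64_t tsc_freq, unsigned shift)
{
	return (tsc_freq >> shift);
}

/* Whole milliseconds until a 32-bit counter at "freq" Hz wraps. */
static uint64_t
counter_wrap_ms(uint64_t freq)
{
	return (((uint64_t)1 << 32) * 1000 / freq);
}
```

A 4 GHz TSC with a shift of 1 gives a 2 GHz timecounter, and a 32-bit
counter at that rate wraps after a little over 2 seconds instead of a
little over 1 second.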

The tunable is fairly undocumented.  Its description is "Shift to apply
for the maximum TSC frequency".  Of course, it has no effect on the TSC
frequency.  It only affects the TSC timecounter frequency.

The cputicker normally uses the TSC without even an lfence.  This use
only has to be monotonic per-CPU, so this is OK.  Also, any bugs hidden
by discarding low bits shouldn't show up per-CPU.  However, keeping
the cputicker below 4G actually has some efficiency advantages.  For
timecounters, there are no multiplications or divisions by the frequency
in the fast path, but cputicker use isn't so optimized and it does a
slow 64-bit division in cputick2usec().  Keeping cpu_tick_frequency
below UINT_MAX allows dividing by it in integer arithmetic in some cases.
This optimization is not done.

> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> since the whole shrdl calculation occurs in 32bit registers, and the result
> is in %rax where top word is cleared by shrdl instruction automatically.
> But the clearing is not required since result is unsigned int anyway.
>
> Dissassemble of tsc_get_timecount_low() is very clear:
>   0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
>   0xffffffff806767e7 <+7>:     rdtsc
>   0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
> ...
>   0xffffffff806767ed <+13>:    retq
> (I removed frame manipulations).

It would without the shift pessimization, since the function returns uint32_t
but rdtsc() gives uint64_t.  Removing the top bits is not needed since
tc_delta() removes them again, but the API doesn't allow expressing this.

Without the shift pessimization, we just do rdtsc() in all cases and don't
need this function call.  I think this is about 5-10 cycles faster after
some parallelism.

>>>> I prefer my way of writing this in 3 lines.  Modifying 'scale' for
>>>> the next step is especially ugly and pessimal when the next step is
>>>> in the caller and this function is not inlined.
>>> Can you show exactly what do you want ?
>>
>> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
>> and don't pass 'scale' indirectly to bintime_helper() and don't modify
>> it there.
>>
>> Oops, there is a problem.  'scale' must be reduced iff bintime_helper()
>> was used.  Duplicate some source code so as to not need a fall-through
>> to the fast path.  See below.
> Yes, this is the reason why it is passed by pointer (C has no references).

The indirection is slow no matter how it is spelled, unless it is inlined
away.

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..6c41ab22288 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> @@ -72,6 +71,7 @@ struct timehands {
>>> 	struct timecounter	*th_counter;
>>> 	int64_t			th_adjustment;
>>> 	uint64_t		th_scale;
>>> +	uint64_t		th_large_delta;
>>> 	u_int	 		th_offset_count;
>>> 	struct bintime		th_offset;
>>> 	struct bintime		th_bintime;
>>> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
>>> 	} while (gen == 0 || gen != th->th_generation);
>>> }
>>> #else /* !FFCLOCK */
>>> +
>>> +static void
>>
>> Add __inline.  This is in the fast path for 32-bit systems.
> Compilers do not need this hand-holding, and I prefer to avoid __inline
> unless really necessary.  I checked with both clang 7.0 and gcc 8.3
> that autoinlining did occur.

But they do.  I don't use either of these compilers, and turn off inlining
as much as possible anyway using -fno-inline
-fno-inline-functions-called-once (this is very broken in clang --
-fno-inline turns off inlining of even functions declared as __inline
(like curthread), and clang doesn't support -fno-inline
-fno-inline-functions-called-once).

>> ...
>> Similarly in bintime().
> I merged two functions, finally.  Having to copy the same code is too
> annoying for this change.
>
> So I verified that:
> - there is no 64bit multiplication in the generated code, for i386 both
>  for clang 7.0 and gcc 8.3;
> - that everything is inlined, the only call from bintime/binuptime is
>  the indirect call to get the timecounter value.

I will have to fix it for compilers that I use.

> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..0fd39e25058 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
+ ...
> +static void
> +binnouptime(struct bintime *bt, u_int off)
> {
> 	struct timehands *th;
> -	u_int gen;
> +	struct bintime *bts;
> +	uint64_t scale;
> +	u_int delta, gen;
>
> 	do {
> 		th = timehands;
> 		gen = atomic_load_acq_int(&th->th_generation);
> -		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		bts = (struct bintime *)(vm_offset_t)th + off;

I don't like the merging.  It obscures the code with conversions like this.
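
One detail worth checking in any conversion of this kind is precedence:
in C a cast binds tighter than binary '+', so adding a byte offset to an
already-cast struct pointer advances by that many elements, not bytes.
A sketch of selecting a member by byte offset, using cut-down stand-in
types (all names here are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

struct bt_model {		/* stand-in for struct bintime */
	int64_t	 sec;
	uint64_t frac;
};

struct th_model {		/* cut-down stand-in for struct timehands */
	uint64_t	 scale;
	struct bt_model	 offset;
	struct bt_model	 bintime;
};

/* Return the bt_model member that lives "off" bytes into *th. */
static const struct bt_model *
th_field(const struct th_model *th, size_t off)
{
	/* Do the addition in bytes first, then cast to the member type. */
	return ((const struct bt_model *)((const char *)th + off));
}
```

A caller would pass offsetof(struct th_model, offset) or
offsetof(struct th_model, bintime) to pick the member.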

> +		*bt = *bts;
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		if (__predict_false(th->th_large_delta <= delta)) {
> +			/* Avoid overflow for scale * delta. */
> +			bintime_helper(bt, scale, delta);
> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> +		} else {
> +			bintime_addx(bt, scale * delta);
> +		}
> +#else
> +		/*
> +		 * Use bintime_helper() unconditionally, since the fast
> +		 * path in the above method is not so fast here, since
> +		 * the 64 x 32 -> 64 bit multiplication is usually not
> +		 * available in hardware and emulating it using 2
> +		 * 32 x 32 -> 64 bit multiplications uses code much
> +		 * like that in bintime_helper().
> +		 */
> +		bintime_helper(bt, scale, delta);
> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> +#endif

Check that this method is really better.  Without this, the complicated
part is about half as large and duplicating it is smaller than this
version.

> @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
> void
> bintime(struct bintime *bt)
> {
> -	struct timehands *th;
> -	u_int gen;
>
> -	do {
> -		th = timehands;
> -		gen = atomic_load_acq_int(&th->th_generation);
> -		*bt = th->th_bintime;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> -		atomic_thread_fence_acq();
> -	} while (gen == 0 || gen != th->th_generation);

Duplicating this loop is much better than obfuscating it using inline
functions.  This loop was almost duplicated (except for the delta
calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
8 fflock ones).  Now it is only duplicated 16 times.
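
For reference, the shape of the loop being duplicated can be modeled in
portable C11 roughly as below.  This is a single-reader illustration
with hypothetical names, using <stdatomic.h> in place of the kernel's
atomic_load_acq_int() and atomic_thread_fence_acq(); it is not the
kern_tc.c code and does not model the writer side.

```c
#include <stdatomic.h>
#include <stdint.h>

struct snap_model {
	_Atomic unsigned gen;	/* 0 while an update is in progress */
	uint64_t	 value;	/* data published under gen */
};

static uint64_t
snap_read(struct snap_model *s)
{
	unsigned gen;
	uint64_t v;

	do {
		gen = atomic_load_explicit(&s->gen, memory_order_acquire);
		v = s->value;
		atomic_thread_fence(memory_order_acquire);
		/* Retry if an update was in progress or has happened since. */
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&s->gen, memory_order_relaxed));
	return (v);
}
```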

> +	binnouptime(bt, __offsetof(struct timehands, th_bintime));
> }
>
> void

Bruce

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 09:40:32 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E6FE1509777
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 09:40:32 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic309-20.consmr.mail.ne1.yahoo.com
 (sonic309-20.consmr.mail.ne1.yahoo.com [66.163.184.146])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id D66448FD77
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 09:40:29 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic.gate.mail.ne1.yahoo.com by
 sonic309.consmr.mail.ne1.yahoo.com with HTTP; Mon, 4 Mar 2019 09:40:23 +0000
Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115])
 ([67.170.167.181])
 by smtp413.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID
 d5eeb14818ac1606459c94027e379899; 
 Mon, 04 Mar 2019 09:40:19 +0000 (UTC)
From: Mark Millard <marklmi@yahoo.com>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so
 far seem to avoid the stuck-sleeping issue [self-hosted
 buildworld/buildkernel completed]
Date: Mon, 4 Mar 2019 01:40:18 -0800
References: <B898BF60-2872-4FFC-AD72-A32591BC7D20@yahoo.com>
 <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>,
 Mark Millard via freebsd-hackers <freebsd-hackers@freebsd.org>
In-Reply-To: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
Message-Id: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>
X-Mailer: Apple Mail (2.3445.102.3)
X-Rspamd-Queue-Id: D66448FD77
X-Spamd-Bar: +++
X-Spamd-Result: default: False [3.37 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[];
 FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3];
 TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net];
 DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject];
 FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[];
 MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com];
 ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048];
 FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.96)[0.960,0];
 MIME_GOOD(-0.10)[text/plain];
 IP_SCORE(1.28)[ip: (4.16), ipnet: 66.163.184.0/21(1.29), asn: 36646(1.04),
 country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.75)[0.754,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.89)[0.886,0];
 RCVD_IN_DNSWL_NONE(0.00)[146.184.163.66.list.dnswl.org : 127.0.5.0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
X-List-Received-Date: Mon, 04 Mar 2019 09:40:32 -0000

[I did some testing of other figures than testing for < 0x10.]

On 2019-Mar-3, at 13:23, Mark Millard <marklmi at yahoo.com> wrote:

> [So far the hack has been successful. Details given later
> below.]
>
> On 2019-Mar-2, at 21:20, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [This note goes in a different direction compared to my
>> prior evidence report for overflows and the later activity
>> that has been happening for it. This does *not* involve
>> the patches associated with that report.]
>>
>> I view the following as an evidence-gathering hack:
>> showing the change in behavior with the code changes,
>> not as directly what FreeBSD should do for powerpc64.
>> In code for defined(__powerpc64__) && defined(AIM)
>> I freely use knowledge of the PowerMac G5 context
>> instead of attempting general code.
>>
>> Also: the code is set up to record some information
>> that I've been looking at via ddb. The recording is
>> not part of what changes the behavior but I decided
>> to show that code too.
>>
>> It is preliminary, but, so far, the hack has avoided
>> buf*daemon* threads and pmac_thermal getting stuck
>> sleeping (or, at least, far less frequently).
>>
>>
>> The tbr-value hack:
>>
>> From what I see, the various G5 cores each have the tbr running at the
>> same rate but have some offsets as far as the base time
>> goes. cpu_mp_unleash does:
>>
>>       ap_awake = 1;
>>
>>       /* Provide our current DEC and TB values for APs */
>>       ap_timebase = mftb() + 10;
>>       __asm __volatile("msync; isync");
>>
>>       /* Let APs continue */
>>       atomic_store_rel_int(&ap_letgo, 1);
>>
>>       platform_smp_timebase_sync(ap_timebase, 0);
>>
>> and machdep_ap_bootstrap does:
>>
>>       /*
>>        * Set timebase as soon as possible to meet an implicit rendezvous
>>        * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>>        * sets timebase.
>>        *
>>        * Note that this is instrinsically racy and is only relevant on
>>        * platforms that do not support better mechanisms.
>>        */
>>       platform_smp_timebase_sync(ap_timebase, 1);
>>
>>
>> which attempts to set the tbrs appropriately.
>>
>> But on small scales of differences the various tbr
>> values from different cpus end up not well ordered
>> relative to time, synchronizes with, and the like.
>> Only large enough differences can well indicate an
>> ordering of interest.
>>
>> Note: tc->tc_get_timecount(tc) only provides the
>> least significant 32 bits of the tbr value.
>> th->th_offset_count is also 32 bits and based on
>> truncated tbr values.
>>
>> So I made binuptime avoid finishing when it sees
>> a small (<0x10) step backwards for a new
>> tc->tc_get_timecount(tc) value vs. the existing
>> th->th_offset_count value (values strongly tied
>> to powerpc64 tbr values):
>>
>> void
>> binuptime(struct bintime *bt)
>> {
>>       struct timehands *th;
>>       u_int gen;
>>
>>       struct bintime old_bt= *bt; // HACK!!!
>>       struct timecounter *tc; // HACK!!!
>>       u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>>       uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>>
>>       u_int try_cnt= 0ull; // HACK!!!
>>
>>       do {
>>               do { // HACK!!!
>>                   th = timehands;
>>                   tc = th->th_counter;
>>                   gen = atomic_load_acq_int(&th->th_generation);
>>                   tim_cnt= tc->tc_get_timecount(tc);
>>                   tim_offset= th->th_offset_count;
>>               } while (tim_cnt<tim_offset && tim_offset-tim_cnt<0x10);
>>               *bt = th->th_offset;
>>               tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>>               scale_factor= th->th_scale;
>>               diff_scaled= scale_factor * tim_diff;
>>               bintime_addx(bt, diff_scaled);
>>               freq= tc->tc_frequency;
>>               atomic_thread_fence_acq();
>>               try_cnt++;
>>       } while (gen == 0 || gen != th->th_generation);
>>
>>       if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>>               *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>>               *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>>               *(volatile uint64_t*)0xc000000000000030= freq;
>>               *(volatile uint64_t*)0xc000000000000038= scale_factor;
>>               *(volatile uint64_t*)0xc000000000000040= tim_offset;
>>               *(volatile uint64_t*)0xc000000000000048= tim_cnt;
>>               *(volatile uint64_t*)0xc000000000000050= tim_diff;
>>               *(volatile uint64_t*)0xc000000000000058= try_cnt;
>>               *(volatile uint64_t*)0xc000000000000060= diff_scaled;
>>               *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>>               __asm__ ("sync");
>>       } else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>>               *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>>               *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>>               *(volatile uint64_t*)0xc0000000000000b0= freq;
>>               *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>>               *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>>               *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>>               *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>>               *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>>               *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>>               *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>>               __asm__ ("sync");
>>       }
>> }
>> #else
>> . . .
>> #endif
>>
>> So far as I can tell, the FreeBSD code is not designed to deal
>> with small differences in tc->tc_get_timecount(tc) not actually
>> indicating a useful < vs. == vs. > ordering relation uniquely.
>>
>> (I make no claim that the hack is a proper way to deal with
>> such.)
>
> I ran a buildworld buildkernel that took somewhat over 7 hours on the
> PowerMac G5. Overall the G5 has been up over 13 hours and
> none of the buf*daemon* threads have gotten stuck sleeping.
> Nor has pmac_thermal gotten stuck. Similarly for vnlru
> and syncer: "top -HIStopid" still shows them all as
> periodically active.
>
> Previously for this usefdt=1 context (with the modern
> VM_MAX_KERNEL_ADDRESS), going more than a few minutes
> without at least one of those threads getting stuck
> sleeping was rare on the G5 (powerpc64 example).
>
> So this hack has managed to avoid finding sbinuptime()
> in sleepq_timeout being less than the earlier (by call
> structure/code sequencing) sbinuptime() in timercb that
> led to the sleepq_timeout callout being called in the
> first place.
>
> So in the sleepq_timeout callout's:
>
>        if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
>                /*
>                 * The thread does not want a timeout (yet).
>                 */
>        } else . . .
>
> td->td_sleeptimo > sbinuptime() ends up false now for small
> enough original differences.
>
> The true branch (no timeout wanted yet) does not set up another
> timeout; it just leaves the thread stuck sleeping, no longer doing
> periodic activities.
>
> As it stands, what I did (presuming an appropriate definition
> of "small differences in the problematical direction") should
> leave this and other sbinuptime-using code with:
>
> td->td_sleeptimo <= sbinuptime()
>
> for what were originally "small" tbr value differences in the
> problematical direction (in case other places require it in
> some way).
>
> If, instead, just sleepq_timeout's test could allow for
> some slop in the ordering, it could be a cheaper hack than
> looping in binuptime.
>
> At this point I've no clue what a correct/efficient FreeBSD
> design for allowing the sloppy match across tbr's for different
> CPUs would be.

Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried
each of the following and they all failed:

&& tim_offset-tim_cnt<0x2
&& tim_offset-tim_cnt<0x4
&& tim_offset-tim_cnt<0x8
&& tim_offset-tim_cnt<0xc

0x2, 0x4, and 0x8 failed on the first boot attempt,
almost immediately having stuck-in-sleep threads.

0xc seemed to be working for the first boot (including
a buildworld buildkernel that did not have to rebuild
much). But the 2nd boot attempt had a stuck-in-sleep
thread by the time I logged in.

By contrast, for:

&& tim_offset-tim_cnt<0x10

I've not seen it fail so far, after many reboots, a full
buildworld buildkernel, and running over 24 hours
(that included the somewhat over 7 hours for the
buildworld buildkernel). But it might be that some boots
would need a bigger figure.
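
To make the arithmetic concrete, here is a minimal standalone sketch (not
the kernel code) of why a small backward step matters and what the retry
condition tests. The helper names are hypothetical, and the 33333333 Hz
timebase frequency in the note below is only an assumed figure for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the delta as the timecounter code computes it,
 * (count - offset) masked to the counter width.  Unsigned subtraction
 * wraps, so a count slightly behind the offset gives a huge delta.
 */
static uint32_t
masked_delta(uint32_t count, uint32_t offset, uint32_t mask)
{
	return ((count - offset) & mask);
}

/* Hypothetical predicate mirroring the retry condition in the hack. */
static int
small_backward_step(uint32_t count, uint32_t offset, uint32_t limit)
{
	return (count < offset && offset - count < limit);
}

/* Nonzero when scale * delta cannot be represented in 64 bits. */
static int
scale_mul_overflows(uint64_t scale, uint32_t delta)
{
	assert(scale != 0);
	return (UINT64_MAX / scale < delta);
}
```

With count 0x1000 and offset 0x1005, masked_delta() yields 0xfffffffb
rather than a small negative number, and small_backward_step() with a
limit of 0x10 reports that the read should be retried. With a scale of
about 2^64 / 33333333 (the assumed frequency), scale_mul_overflows()
reports that 0xfffffffb ticks no longer fit in the 64-bit product.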


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


From owner-freebsd-hackers@freebsd.org  Mon Mar  4 11:42:00 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 82D9F150DC4D;
 Mon,  4 Mar 2019 11:42:00 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id C4A1C95862;
 Mon,  4 Mar 2019 11:41:59 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x24BfplY084864
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Mon, 4 Mar 2019 13:41:54 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x24BfplY084864
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x24BfopB084863;
 Mon, 4 Mar 2019 13:41:50 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Mon, 4 Mar 2019 13:41:50 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mark Millard <marklmi@yahoo.com>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190304114150.GM68879@kib.kiev.ua>
References: <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua>
 <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua>
 <20190304043416.V5640@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190304043416.V5640@besplex.bde.org>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 11:42:00 -0000

On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> 
> > On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> > * ...
> >>>> Yes, that was its point.  It is a bit annoying to have a hardware
> >>>> timecounter like the TSC that doesn't wrap naturally, but then make it
> >>>> wrap by masking high bits.
> >>>>
> >>>> The masking step is also a bit wasteful.  For the TSC, it is 1 step to
> >>>> discard high bits at the register level, then another step to apply the
> >>>> mask to discard the high bits again.
> >>> rdtsc-low is implemented in the natural way, after RDTSC, no register
> >>> combining into 64bit value is done, instead shrd operates on %edx:%eax
> >>> to get the final result into %eax.  I am not sure what you refer to.
> >>
> >> I was referring mostly to the masking step '& tc->tc_counter_mask' and
> >> the lack of register combining in rdtsc().
> >>
> >> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> >> step.  i386 used to be faster here -- the first masking step of discarding
> >> %edx doesn't take any code.  amd64 has to mask out the top bits in %rax.
> >> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> >> has to do a not so slow shr.
> > i386 cannot discard %edx after RDTSC since some bits from %edx come into
> > the timecounter value.
> 
> These bits are part of the tsc-low pessimization.  The shift count should
> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
> sometimes.
> 
> When tsc-low was new, the shift count was often larger (as much as 8),
> and it is still changeable by a read-only tunable, but now it is 1 in
> almost all cases.  The code only limits the timecounter frequency
> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
> at nearly 4 GHz are usually limited to about 2 GHz.  The comment about
> this UINT_MAX doesn't match the code.  The comment says int, but the
> code says UINT.
> 
> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
> This much accuracy is noise for most purposes.
> 
> The tunable is fairly undocumented.  Its description is "Shift to apply
> for the maximum TSC frequency".  Of course, it has no effect on the TSC
> frequency.  It only affects the TSC timecounter frequency.
I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
Otherwise, I think, some multi-socket machines would start showing
detectable backward-counting bintime().  At frequencies of 4GHz and
above (Intel has 5GHz part numbers) I do not think that the stability of
the 100MHz crystal and on-board traces is enough to avoid that.

We can try to set the tsc-low shift count to 0 (but keep lfence) and see
what happens in HEAD, but I am afraid that the HEAD user population
is not representative enough to catch the issue with certainty.
Moreover, it is unclear to me how to diagnose the cause, e.g. I would expect
the sleeps to hang on timeouts, as was reported from the very beginning
of this thread.  How would we root-cause it?
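
As a rough illustration of what the shift does to two nearby readings,
here is a minimal sketch.  tsc_low() is a hypothetical stand-in for
tsc_get_timecount_low(), taking the raw 64-bit TSC value as an argument
instead of executing RDTSC.  Note that under this model a shift only
hides a one-tick skew when both readings land in the same 2^shift
bucket, so it narrows the window but does not close it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for tsc_get_timecount_low(): drop the low
 * `shift` bits of the 64-bit TSC and keep the next 32 bits, which is
 * what "shrd %cl,%edx,%eax" leaves in %eax for shift counts below 32.
 */
static uint32_t
tsc_low(uint64_t tsc, unsigned shift)
{
	assert(shift < 32);
	return ((uint32_t)(tsc >> shift));
}
```

For example, with a shift of 1, raw readings 1001 and 1000 both map to
500, so a one-tick backward step is invisible, while 1000 and 999 map to
500 and 499 and the step is still seen.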

> 
> The cputicker normally uses the TSC without even an lfence.  This use
> only has to be monotonic per-CPU, so this is OK.  Also, any bugs hidden
> by discarding low bits shouldn't show up per-CPU.  However, keeping
> the cputicker below 4G actually has some efficiency advantages.  For
> timecounters, there are no multiplications or divisions by the frequency
> in the fast path, but cputicker use isn't so optimized and it does a
> slow 64-bit division in cputick2usec().  Keeping cpu_tick_frequency
> below UINT_MAX allows dividing by it in integer arithmetic in some cases.
> This optimization is not done.
> 
> > amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> > since the whole shrdl calculation occurs in 32bit registers, and the result
> > is in %rax where top word is cleared by shrdl instruction automatically.
> > But the clearing is not required since result is unsigned int anyway.
> >
> > Disassembly of tsc_get_timecount_low() is very clear:
> >   0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
> >   0xffffffff806767e7 <+7>:     rdtsc
> >   0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
> > ...
> >   0xffffffff806767ed <+13>:    retq
> > (I removed frame manipulations).
> 
> It would without the shift pessimization, since the function returns uint32_t
> but rdtsc() gives uint64_t.  Removing the top bits is not needed since
> tc_delta() removes them again, but the API doesn't allow expressing this.
> 
> Without the shift pessimization, we just do rdtsc() in all cases and don't
> need this function call.  I think this is about 5-10 cycles faster after
> some parallelism.
> 
> >>>> I prefer my way of writing this in 3 lines.  Modifying 'scale' for
> >>>> the next step is especially ugly and pessimal when the next step is
> >>>> in the caller and this function is not inlined.
> >>> Can you show exactly what you want?
> >>
> >> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
> >> and don't pass 'scale' indirectly to bintime_helper() and don't modify
> >> it there.
> >>
> >> Oops, there is a problem.  'scale' must be reduced iff bintime_helper()
> >> was used.  Duplicate some source code so as to not need a fall-through
> >> to the fast path.  See below.
> > Yes, this is the reason why it is passed by pointer (C has no references).
> 
> The indirection is slow no matter how it is spelled, unless it is inlined
> away.
> 
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 2656fb4d22f..6c41ab22288 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> @@ -72,6 +71,7 @@ struct timehands {
> >>> 	struct timecounter	*th_counter;
> >>> 	int64_t			th_adjustment;
> >>> 	uint64_t		th_scale;
> >>> +	uint64_t		th_large_delta;
> >>> 	u_int	 		th_offset_count;
> >>> 	struct bintime		th_offset;
> >>> 	struct bintime		th_bintime;
> >>> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
> >>> 	} while (gen == 0 || gen != th->th_generation);
> >>> }
> >>> #else /* !FFCLOCK */
> >>> +
> >>> +static void
> >>
> >> Add __inline.  This is in the fast path for 32-bit systems.
> > Compilers do not need this hand-holding, and I prefer to avoid __inline
> > unless really necessary.  I checked with both clang 7.0 and gcc 8.3
> > that autoinlining did occur.
> 
> But they do.  I don't use either of these compilers, and turn off inlining
> as much as possible anyway using -fno-inline
> -fno-inline-functions-called-once (this is very broken in clang --
> -fno-inline turns off inlining of even functions declared as __inline
> (like curthread), and clang doesn't support -fno-inline
> -fno-inline-functions-called-once).
> 
> >> ...
> >> Similarly in bintime().
> > I merged two functions, finally.  Having to copy the same code is too
> > annoying for this change.
> >
> > So I verified that:
> > - there is no 64bit multiplication in the generated code, for i386 both
> >  for clang 7.0 and gcc 8.3;
> > - that everything is inlined, the only call from bintime/binuptime is
> >  the indirect call to get the timecounter value.
> 
> I will have to fix it for compilers that I use.
Ok, I will add __inline.

> 
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..0fd39e25058 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> + ...
> > +static void
> > +binnouptime(struct bintime *bt, u_int off)
> > {
> > 	struct timehands *th;
> > -	u_int gen;
> > +	struct bintime *bts;
> > +	uint64_t scale;
> > +	u_int delta, gen;
> >
> > 	do {
> > 		th = timehands;
> > 		gen = atomic_load_acq_int(&th->th_generation);
> > -		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		bts = (struct bintime *)((vm_offset_t)th + off);
> 
> I don't like the merging.  It obscures the code with conversions like this.
> 
> > +		*bt = *bts;
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +#ifdef _LP64
> > +		if (__predict_false(th->th_large_delta <= delta)) {
> > +			/* Avoid overflow for scale * delta. */
> > +			bintime_helper(bt, scale, delta);
> > +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> > +		} else {
> > +			bintime_addx(bt, scale * delta);
> > +		}
> > +#else
> > +		/*
> > +		 * Use bintime_helper() unconditionally, since the fast
> > +		 * path in the above method is not so fast here, since
> > +		 * the 64 x 32 -> 64 bit multiplication is usually not
> > +		 * available in hardware and emulating it using 2
> > +		 * 32 x 32 -> 64 bit multiplications uses code much
> > +		 * like that in bintime_helper().
> > +		 */
> > +		bintime_helper(bt, scale, delta);
> > +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> > +#endif
> 
> Check that this method is really better.  Without this, the complicated
> part is about half as large and duplicating it is smaller than this
> version.
Better in what sense?  I am fine with the C code, and asm code looks
good.
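
For readers following the overflow argument, here is a minimal
standalone sketch of the split multiplication.  The body of
bintime_helper() is not shown in this message, so struct bt, bt_addx()
and bt_add_scaled() below are hypothetical stand-ins that only assume
the helper handles the high 32 bits of scale, as the call sites suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct bintime. */
struct bt {
	int64_t  sec;
	uint64_t frac;	/* binary fraction of a second, units of 2^-64 s */
};

/* Add x to the fraction, carrying into seconds when it wraps. */
static void
bt_addx(struct bt *b, uint64_t x)
{
	uint64_t old = b->frac;

	b->frac += x;
	if (old > b->frac)
		b->sec++;
}

/* Add scale * delta (up to 96 bits wide) without 64-bit overflow. */
static void
bt_add_scaled(struct bt *b, uint64_t scale, uint32_t delta)
{
	uint64_t hi;

	assert(b != NULL);
	/* High half of scale times delta: weight 2^32, fits in 64 bits. */
	hi = (scale >> 32) * delta;
	b->sec += hi >> 32;	/* bits at or above 2^64 are whole seconds */
	bt_addx(b, hi << 32);
	/* Low half of scale times delta: a 32 x 32 -> 64 bit product. */
	bt_addx(b, (scale & 0xffffffff) * delta);
}
```

With scale = 2^63 (half a second per tick) and delta = 3, this yields
1 second plus a fraction of 2^63, i.e. 1.5 s, whereas the plain 64-bit
product scale * delta wraps and loses the whole second.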

> 
> > @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
> > void
> > bintime(struct bintime *bt)
> > {
> > -	struct timehands *th;
> > -	u_int gen;
> >
> > -	do {
> > -		th = timehands;
> > -		gen = atomic_load_acq_int(&th->th_generation);
> > -		*bt = th->th_bintime;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > -		atomic_thread_fence_acq();
> > -	} while (gen == 0 || gen != th->th_generation);
> 
> Duplicating this loop is much better than obfuscating it using inline
> functions.  This loop was almost duplicated (except for the delta
> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> 8 ffclock ones).  Now it is only duplicated 16 times.
How did you count the 16?  I can see only 4 instances in the unpatched
kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
touch ffclock until the patch is finalized.  After that, it would be
1 instance for kernel and 1 for userspace.
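
The merged function picks its base value by byte offset into the
timehands.  A minimal standalone sketch of that technique follows;
struct hands and base_at() are hypothetical cut-down stand-ins, not the
kernel definitions.  One thing to watch in C is that a cast binds
tighter than "+", so the byte offset has to be added before converting
to the member's pointer type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bt {
	int64_t  sec;
	uint64_t frac;
};

/* Hypothetical cut-down timehands with the two candidate bases. */
struct hands {
	uint64_t  scale;
	struct bt offset;	/* uptime base */
	struct bt bintime;	/* wall-clock base */
};

/* Return the struct bt stored `off` bytes into *h. */
static const struct bt *
base_at(const struct hands *h, size_t off)
{
	assert(off == offsetof(struct hands, offset) ||
	    off == offsetof(struct hands, bintime));
	/* Add the byte offset first, then cast to the member type. */
	return ((const struct bt *)((const char *)h + off));
}
```

A caller would pass offsetof(struct hands, offset) for the uptime
variant and offsetof(struct hands, bintime) for the wall-clock variant.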

> 
> > +	binnouptime(bt, __offsetof(struct timehands, th_bintime));
> > }
> >
> > void
> 
> Bruce

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 15:33:05 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 551BD1516375
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 15:33:05 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lj1-f173.google.com (mail-lj1-f173.google.com
 [209.85.208.173])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3E7A96F6FD
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 15:33:04 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lj1-f173.google.com with SMTP id z20so4678600ljj.10
 for <freebsd-hackers@freebsd.org>; Mon, 04 Mar 2019 07:33:04 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=0xe9UuPxJzxXPTEcF365k/2KkWtbYwZnhc36PLa+oL0=;
 b=Q1oPNOY2BLj2kS6KZC7udLvnYDbUxRzF0evzbSF5P+NbUOXky2YMZNQPwVygJVcWw/
 Tz6giC0ng+XM3N6tnFo7odO2vIivHLDNCgR1VEiY0I0ZgkjcchXgGo5znNkzC2eCTYFi
 UiYRTIFoEaylJ01VuMEF5E7GdfB8BUriGVQzyKndIx/fGMctc6w+lCWAKsT/brEGzPWO
 hGyotD2f8sv3juV4bHmpNYPIpTdgqby0S0gZ/qZfM43Nc8CtZUn0L8etTMvsdFczmlop
 Qv3BZ3ItmD8mgjBxSY7RX2993bXqFGkTPtwShmnLiavf4XxOPBHc9SCeQWbvyG6mhpKy
 kunA==
X-Gm-Message-State: APjAAAWN6p3FMl1LuAV4+wWPxsnvZw2b769NmuEsCbJy8IEO6LjXHLl3
 1acb/AC00lE6vPCmv7OHFP8K5HhpTRzHAZKxCeo=
X-Google-Smtp-Source: APXvYqxAJYg9J3WRZbRBomVy7W4CKuzLu4zrUozWbU0V6+Gc8tmQ6XTzvhBp6/3+/AQzCdr7BnipKwcdqReX8+hDnmE=
X-Received: by 2002:a2e:1510:: with SMTP id s16mr10965232ljd.62.1551713078715; 
 Mon, 04 Mar 2019 07:24:38 -0800 (PST)
MIME-Version: 1.0
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
 <20190303110346.GH68879@kib.kiev.ua>
In-Reply-To: <20190303110346.GH68879@kib.kiev.ua>
From: Alan Somers <asomers@freebsd.org>
Date: Mon, 4 Mar 2019 08:24:27 -0700
Message-ID: <CAOtMX2hkwYG_Db4pgb5HdXuMTa7UAS6bQ8pNAhhS45mmJsao3Q@mail.gmail.com>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset="UTF-8"
X-Rspamd-Queue-Id: 3E7A96F6FD
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of asomers@gmail.com designates
 209.85.208.173 as permitted sender) smtp.mailfrom=asomers@gmail.com
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 15:33:05 -0000

On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Sat, Mar 02, 2019 at 06:02:06PM -0700, Alan Somers wrote:
> > It looks like lookup and open are the only common vops that create new
> > namecache entries.  At least, those are the only ones that set
> > MAKEENTRY in the cn_flags field.  However, fuse(4)'s create-like
> > operations (FUSE_CREATE, FUSE_SYMLINK, etc) all return enough
> > information to create a namecache entry for the newly created file.
> > As-is, an operation like FUSE_CREATE will almost always be followed up
> > by a FUSE_LOOKUP, necessitating an extra round-trip to userland.
> In VFS, creation of the new file is done by VOP_CREATE() after negative
> VOP_LOOKUP().   VOP_CREATE() returns the new vnode that is installed into
> file.  [A flag VN_OPEN_NAMECACHE was added for vn_open_cred() which results
> in created name entry insertion into namecache.  It was done to handle
> very specific situation in core dump code, which is no longer relevant.
> The flag is still there.]
>
> Similar discussion occurred some time ago.  I think that the current
> selection of the cases where namecache entry is created, is optimized
> for the scenario where extracting large tarball does not largely affect
> the non-directory elements of the cache.  If you do such extraction,
> it is unlikely that you will access most of the files shortly.

I don't understand this objection.  When you extract a tarball full of
non-empty files, don't you still need to open every file to write its
contents, creating a namecache entry for each one?

>
> > Would it be possible and wise to add these newly created entries to
> > the namecache automatically?
> Not from VFS, but the policy can be overridden by the filesystem by inserting
> the elements into cache from VOPs as it finds suitable.
>
> Does FUSE cache vnodes ?  I would find aggressive caching on the kernel
> side somewhat unexpected for it.
>

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 15:42:24 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id BA68D15167E5
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 15:42:24 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id E40246FCC3;
 Mon,  4 Mar 2019 15:42:23 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x24FgCt2067452
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Mon, 4 Mar 2019 17:42:15 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x24FgCt2067452
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x24FgC70067451;
 Mon, 4 Mar 2019 17:42:12 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Mon, 4 Mar 2019 17:42:12 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
Message-ID: <20190304154212.GP68879@kib.kiev.ua>
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
 <20190303110346.GH68879@kib.kiev.ua>
 <CAOtMX2hkwYG_Db4pgb5HdXuMTa7UAS6bQ8pNAhhS45mmJsao3Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAOtMX2hkwYG_Db4pgb5HdXuMTa7UAS6bQ8pNAhhS45mmJsao3Q@mail.gmail.com>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 15:42:24 -0000

On Mon, Mar 04, 2019 at 08:24:27AM -0700, Alan Somers wrote:
> On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > Similar discussion occurred some time ago.  I think that the current
> > selection of the cases where namecache entry is created, is optimized
> > for the scenario where extracting large tarball does not largely affect
> > the non-directory elements of the cache.  If you do such extraction,
> > it is unlikely that you will access most of the files shortly.
> 
> I don't understand this objection.  When you extract a tarball full of
> non-empty files, don't you still need to open every file to write its
> contents, creating a namecache entry for each one?
No, you don't.

Typically, when an archiver parses the stream and finds a file with
content to create, it
- opens the file, and gets the file descriptor returned to usermode.
  Internally, the kernel does (in vn_open_cred()):
	namei() <- this call returns no vnode because the file does not exist,
		   and does not create a negative cache entry; see the NOCACHE
		   flag in cn_flags.
	VOP_CREATE() <- creates the file, again without caching
	assign the returned vnode to the file
- now the process has the descriptor for writes, but the namecache entry
  is still not installed.
- content is written, file is closed.
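
The userland side of these steps can be sketched in shell. This is only an
illustration under stated assumptions: extract_one is a hypothetical helper,
not part of any real archiver, and the comments about kernel behaviour
restate the explanation above rather than anything the script can observe.

```sh
#!/bin/sh
# Hypothetical helper mimicking what an archiver does for one regular file.
#   $1 = target path, $2 = file content
extract_one() {
    # The redirection makes the shell call open(2) with O_CREAT, so the
    # file is created and opened in a single step. Per the explanation
    # above, that path goes through namei() and VOP_CREATE() without
    # installing a namecache entry.
    printf '%s' "$2" > "$1"
    # The descriptor is closed as soon as the redirection ends.
}

# Work in a scratch directory so nothing outside it is touched.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/extract.XXXXXX") || exit 1
extract_one "$scratch/a.txt" "hello"
extract_one "$scratch/b.txt" "world"
ls "$scratch"
```

A symlink has no equivalent single-step path, since it is created by
symlink(2), which returns no descriptor.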

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 16:07:46 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3D749151843C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 16:07:46 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com
 [209.85.208.196])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C43F2709AD
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 16:07:45 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lj1-f196.google.com with SMTP id q128so4791366ljb.11
 for <freebsd-hackers@freebsd.org>; Mon, 04 Mar 2019 08:07:45 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=8+1TJBVjbUBvS7tBG9NXOZphS4GQaRY9lmI351kexVE=;
 b=cokLpiY0xQ3ht8M9pcHKKOCezhWhpWyG94DQApcLqIk6xuasDsPAMjh0k9rjMh4w+6
 Dl2ldk6Wcvi5Pc/nitNicwI/z7wzhqt+lO6ot6B5EL5Xc2YV78eO1HguP1YLPL/4dJpN
 Td+lYK9RjXg0pQQl8tCVdDHJsIVMF8iPS2VwPwazfY0mDpNrJZq1PFvh+p8dD37CEejk
 kC/SPA697sclZpdm3XcfyA6RcIBIyfQVhlQmhpNCvXPM4pslFIFwLdt3qxTHxbf/wuC4
 8VE7JUEg4d+URIVxNd5lbeUcu7IW90y6gQzrQWxGL6hqGK2ZkOSPRbqfK/uC5wv4dtSP
 LTQw==
X-Gm-Message-State: APjAAAXsvSkn6INoWoVyorWZKy1zjZoMtPPfuE/n+ZRL7SCa6IPz3szo
 H6nm/pBj+U1WKI2oQflzuzZpVbC0HBnlQPEZKX0=
X-Google-Smtp-Source: APXvYqxdwyciuOwvPOfSbKNTvvTyw4WTeYotf6KOGaNWO4JZhU9K8JcLg9UicM4KDXJZqzLPEXzsz5IFK+uCLTsIT+s=
X-Received: by 2002:a2e:1510:: with SMTP id s16mr11056159ljd.62.1551715195466; 
 Mon, 04 Mar 2019 07:59:55 -0800 (PST)
MIME-Version: 1.0
References: <CAOtMX2inYez8dXbmA5b1wj9Uhh_Nbp-gnFmtT_=T1mpWdyAUVw@mail.gmail.com>
 <20190303110346.GH68879@kib.kiev.ua>
 <CAOtMX2hkwYG_Db4pgb5HdXuMTa7UAS6bQ8pNAhhS45mmJsao3Q@mail.gmail.com>
 <20190304154212.GP68879@kib.kiev.ua>
In-Reply-To: <20190304154212.GP68879@kib.kiev.ua>
From: Alan Somers <asomers@freebsd.org>
Date: Mon, 4 Mar 2019 08:59:43 -0700
Message-ID: <CAOtMX2hYETCArxE_9qVcjj2dNr1LXdLKnU4yJ3u8ZrhfF_37cQ@mail.gmail.com>
Subject: Re: Adding namecache entries outside of vfs_lookup and vn_open ?
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset="UTF-8"
X-Rspamd-Queue-Id: C43F2709AD
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.98 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[];
 NEURAL_HAM_SHORT(-0.98)[-0.985,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 16:07:46 -0000

On Mon, Mar 4, 2019 at 8:42 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Mar 04, 2019 at 08:24:27AM -0700, Alan Somers wrote:
> > On Sun, Mar 3, 2019 at 4:03 AM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > > Similar discussion occured some time ago.  I think that the current
> > > selection of the cases where namecache entry is created, is optimized
> > > for the scenario where extracting large tarball does not largely affect
> > > the non-directory elements of the cache.  If you do such extraction,
> > > it is unlikely that you will access most of the files shortly.
> >
> > I don't understand this objection.  When you extract a tarball full of
> > non-empty files, don't you still need to open every file to write its
> > contents, creating a namecache entry for each one?
> No, you don't.
>
> Typically, when archiver parsed the stream and noted that there is a file
> to create with a content, it
> - opens the file, and gets the file descriptor returned to usermode.
>   Internally, kernel does (vn_open_cred())
>         namei() <- this call returns no vnode because the file is non-existent,
>                    and does not create negative cache entry, see NOCACHE
>                    argument for cn_flags.
>         VOP_CREATE() <- creating the file, again not caching
>         assign the vnode returned, to the file
> - now the process has the descriptor for writes, but namecache entry is
>   still not installed.
> - content is written, file is closed.

OK, that makes sense.  So I guess the problem only really applies to
filetypes like symlinks that can't create-and-open.  But in the
tarball case, you wouldn't need to access the symlink again anyway.
-Alan

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 16:07:48 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C023151844E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 16:07:48 +0000 (UTC) (envelope-from ap00@mail.ru)
Received: from smtp16.mail.ru (smtp16.mail.ru [94.100.176.153])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id CF88F709AF
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 16:07:46 +0000 (UTC)
 (envelope-from ap00@mail.ru)
Received: by smtp16.mail.ru with esmtpa (envelope-from <ap00@mail.ru>)
 id 1h0q7r-0000lX-OC
 for freebsd-hackers@freebsd.org; Mon, 04 Mar 2019 19:07:36 +0300
Date: Mon, 4 Mar 2019 19:07:32 +0300
From: Anthony Pankov <ap00@mail.ru>
X-Priority: 3 (Normal)
Message-ID: <434119194.20190304190732@mail.ru>
To: freebsd-hackers@freebsd.org
Subject: building with WITHOUT_SSP side effect
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-77F55803: BBE463BEF7A60BD05A78504BD2AC294173B5FE5E8078F29671F281D99DEE432334F7C2E3C076D04EB2A465EF40EED4CF
X-7FA49CB5: 0D63561A33F958A576BBAA4014634BBA13B007AA445EFFC5FE0C38FC6E8DE41D8941B15DA834481FA18204E546F3947C744B801E316CB65FF6B57BC7E64490618DEB871D839B7333395957E7521B51C2545D4CF71C94A83E9FA2833FD35BB23D27C277FBC8AE2E8B3733B5EC72352B9FA471835C12D1D977C4224003CC8364767815B9869FA544D8D32BA5DBAC0009BE9E8FC8737B5C2249309DFB797F6729CB3AA81AA40904B5D9CF19DD082D7633A0E7DDDDC251EA7DABD81D268191BDAD3D78DA827A17800CE73753CEE10E4ED4A7CD04E86FAF290E2D40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE777EBE22FC43B5F5CA21B9635CCCA6ACB75ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57285124B2A10EEC6C00306258E7E6ABB4E4A6367B16DE6309
X-Mailru-Sender: D8D48EF70163D79D00784CDFC8FD3107C6C3F753F0081E4B2BBA2B88EEBD1C1303EE00DEB249E58D50D5CF8590B94F4EC77752E0C033A69E81198BD1A48777B793AC9912533B2342AE208404248635DF
X-Mras: OK
X-Rspamd-Queue-Id: CF88F709AF
X-Spamd-Bar: --
X-Spamd-Result: default: False [-2.47 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:94.100.176.0/20];
 FREEMAIL_FROM(0.00)[mail.ru]; TO_DN_NONE(0.00)[];
 DKIM_TRACE(0.00)[mail.ru:+];
 DMARC_POLICY_ALLOW(-0.50)[mail.ru,reject];
 HAS_X_PRIO_THREE(0.00)[3];
 MX_GOOD(-0.01)[mxs.mail.ru,mxs.mail.ru];
 RCVD_IN_DNSWL_LOW(-0.10)[153.176.100.94.list.dnswl.org : 127.0.5.1];
 FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+];
 FREEMAIL_ENVFROM(0.00)[mail.ru];
 ASN(0.00)[asn:47764, ipnet:94.100.176.0/20, country:RU];
 MID_RHS_MATCH_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[mail.ru.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.98)[-0.977,0];
 R_DKIM_ALLOW(-0.20)[mail.ru:s=mail2]; FROM_HAS_DN(0.00)[];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-1.00)[-0.999,0];
 MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[];
 RCPT_COUNT_ONE(0.00)[1];
 IP_SCORE(0.03)[ipnet: 94.100.176.0/20(0.08), asn: 47764(0.05), country:
 RU(0.00)]; NEURAL_SPAM_SHORT(0.59)[0.594,0];
 RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 16:07:48 -0000

Greetings,

I've built 11-stable (11.2-STABLE r344696) from source with the option
WITHOUT_SSP="yes" in src.conf.

Installing the kernel and world went fine, but when I tried to build a port it gave me an error:
configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5':
configure: error: C compiler cannot create executables

config.log:
...
configure:3555: cc -v >&5
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
Target: x86_64-unknown-freebsd11.2
...
configure:3608: cc -O2 -pipe  -Wno-error -fno-strict-aliasing     conftest.c  >&5
/usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)

And yes, there is SSP_UNSAFE=yes in make.conf

Is this a bug or a feature?

-- 
Best regards,
 Anthony Pankov                         mailto:ap00@mail.ru


From owner-freebsd-hackers@freebsd.org  Mon Mar  4 16:56:18 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 00FDC1519AF8
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 16:56:18 +0000 (UTC) (envelope-from ap00@mail.ru)
Received: from smtp29.i.mail.ru (smtp29.i.mail.ru [94.100.177.89])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 8BB8F72DDE
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 16:56:15 +0000 (UTC)
 (envelope-from ap00@mail.ru)
Received: by smtp29.i.mail.ru with esmtpa (envelope-from <ap00@mail.ru>)
 id 1h0qsn-0002la-WD
 for freebsd-hackers@freebsd.org; Mon, 04 Mar 2019 19:56:06 +0300
Date: Mon, 4 Mar 2019 19:56:02 +0300
From: Anthony Pankov <ap00@mail.ru>
X-Priority: 3 (Normal)
Message-ID: <1122478880.20190304195602@mail.ru>
To: Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <434119194.20190304190732@mail.ru>
References: <434119194.20190304190732@mail.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-77F55803: BBE463BEF7A60BD05A78504BD2AC294173B5FE5E8078F296209616C39EACCCB9268608C7137D5E2FF74DF7540681202C
X-7FA49CB5: 0D63561A33F958A56A3B061B6E4F86418CD123DFA0DB66FE4A423FDB79F1A4D78941B15DA834481FA18204E546F3947CEDCF5861DED71B2F389733CBF5DBD5E9C8A9BA7A39EFB7666BA297DBC24807EA117882F44604297287769387670735209ECD01F8117BC8BEA471835C12D1D977C4224003CC8364767815B9869FA544D8D32BA5DBAC0009BE9E8FC8737B5C2249668B94F0A65C3A0C3AA81AA40904B5D9CF19DD082D7633A0E7DDDDC251EA7DABD81D268191BDAD3D78DA827A17800CE7542AF255F21831B5CD04E86FAF290E2D40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE77FA89C872EA2218695742EC39967965D75ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57285124B2A10EEC6C00306258E7E6ABB4E4A6367B16DE6309
X-Mailru-Sender: D8D48EF70163D79D00784CDFC8FD31072CC0BE42E31726121943B29DF3553A1CFBA4D9A6C41392ED50D5CF8590B94F4EC77752E0C033A69E81198BD1A48777B793AC9912533B2342AE208404248635DF
X-Mras: OK
X-Rspamd-Queue-Id: 8BB8F72DDE
X-Spamd-Bar: ---
X-Spamd-Result: default: False [-3.53 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:94.100.176.0/20];
 FREEMAIL_FROM(0.00)[mail.ru]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: mxs.mail.ru];
 DKIM_TRACE(0.00)[mail.ru:+]; HAS_X_PRIO_THREE(0.00)[3];
 NEURAL_HAM_SHORT(-0.47)[-0.468,0];
 DMARC_POLICY_ALLOW(-0.50)[mail.ru,reject];
 FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+];
 RCVD_TLS_LAST(0.00)[];
 RCVD_IN_DNSWL_LOW(-0.10)[89.177.100.94.list.dnswl.org : 127.0.5.1];
 ASN(0.00)[asn:47764, ipnet:94.100.176.0/20, country:RU];
 MID_RHS_MATCH_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[mail.ru.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[mail.ru];
 R_DKIM_ALLOW(-0.20)[mail.ru:s=mail2];
 NEURAL_HAM_MEDIUM(-0.98)[-0.980,0]; FROM_HAS_DN(0.00)[];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-1.00)[-0.999,0];
 MIME_GOOD(-0.10)[text/plain];
 IP_SCORE(0.03)[ipnet: 94.100.176.0/20(0.08), asn: 47764(0.05), country:
 RU(0.00)]; RCPT_COUNT_ONE(0.00)[1]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 16:56:18 -0000

It seems that a world built with WITHOUT_SSP=yes loses the ability to
build anything.

# cc -v test.c
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
Target: x86_64-unknown-freebsd11.2
Thread model: posix
InstalledDir: /usr/bin
 "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/clang/7.0.1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-d853d1.o -x c test.c -faddrsig
clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unknown-freebsd11.2
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/clang/7.0.1/include
 /usr/include
End of search list.
 "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o
/usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a


> Greetings,

> I've builded 11-stable ( 11.2-STABLE  r344696) from source with option
> WITHOUT_SSP="yes" in src.conf.

> Installing kernel and world was OK. But  when I tried to build from port it give me an error:
> configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5':
> configure: error: C compiler cannot create executables

> config.log:
> ...
> configure:3555: cc -v >&5
> FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
> Target: x86_64-unknown-freebsd11.2
> ...
> configure:3608: cc -O2 -pipe  -Wno-error -fno-strict-aliasing     conftest.c  >&5
> /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
> cc: error: linker command failed with exit code 1 (use -v to see invocation)

> And yes, there is SSP_UNSAFE=yes in make.conf

> Is this a bug or feature?


-- 
Best regards,
 Anthony Pankov                         mailto:ap00@mail.ru


From owner-freebsd-hackers@freebsd.org  Mon Mar  4 17:14:01 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9EE9151B015
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 17:14:01 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EA65773A44
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 17:14:00 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x24HDqxr095713
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Mon, 4 Mar 2019 19:13:55 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x24HDqxr095713
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x24HDpZF095712;
 Mon, 4 Mar 2019 19:13:51 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Mon, 4 Mar 2019 19:13:51 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Anthony Pankov <ap00@mail.ru>
Cc: Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
Message-ID: <20190304171351.GQ68879@kib.kiev.ua>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1122478880.20190304195602@mail.ru>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=0.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,FREEMAIL_REPLY,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 17:14:02 -0000

On Mon, Mar 04, 2019 at 07:56:02PM +0300, Anthony Pankov via freebsd-hackers wrote:
> It  seems  that  world  builded with  WITHOUT_SSP=yes loose ability to
> build anything.
> 
> # cc -v test.c
> FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
> Target: x86_64-unknown-freebsd11.2
> Thread model: posix
> InstalledDir: /usr/bin
>  "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/clang/7.0.1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-d853d1.o -x c test.c -faddrsig
> clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unknown-freebsd11.2
> #include "..." search starts here:
> #include <...> search starts here:
>  /usr/lib/clang/7.0.1/include
>  /usr/include
> End of search list.
>  "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o
> /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
It seems that you installed without specifying WITHOUT_SSP, which
ended up installing the wrong linker script as libc.a.  Either create a
dummy libssp_nonshared.a, reinstall libc.a (look at lib/libc/Makefile for
SHLIB_LDSCRIPT), or reinstall the world.
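
The first option, a dummy libssp_nonshared.a, could look roughly like the
sketch below. It assumes that a file holding only the ar(1) magic string
counts as an empty archive and that the linker accepts one; the helper name
make_dummy_archive is hypothetical, and the demo writes to a scratch
directory instead of /usr/lib.

```sh
#!/bin/sh
# Hypothetical helper: write an empty ar(1) archive to the path in $1.
# An ar archive begins with the 8-byte magic "!<arch>\n"; a file that
# contains only the magic is an archive with no members.
make_dummy_archive() {
    printf '!<arch>\n' > "$1"
}

# Demonstrate in a scratch directory rather than touching /usr/lib.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/ssp.XXXXXX") || exit 1
make_dummy_archive "$scratch/libssp_nonshared.a"
ls -l "$scratch/libssp_nonshared.a"
```

On the affected system the target would be /usr/lib/libssp_nonshared.a,
written as root. This only papers over the mismatch; reinstalling libc or
the world addresses the wrong linker script itself.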

> 
> 
> > Greetings,
> 
> > I've builded 11-stable ( 11.2-STABLE  r344696) from source with option
> > WITHOUT_SSP="yes" in src.conf.
> 
> > Installing kernel and world was OK. But  when I tried to build from port it give me an error:
> > configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5':
> > configure: error: C compiler cannot create executables
> 
> > config.log:
> > ...
> > configure:3555: cc -v >&5
> > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
> > Target: x86_64-unknown-freebsd11.2
> > ...
> > configure:3608: cc -O2 -pipe  -Wno-error -fno-strict-aliasing     conftest.c  >&5
> > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
> > cc: error: linker command failed with exit code 1 (use -v to see invocation)
> 
> > And yes, there is SSP_UNSAFE=yes in make.conf
> 
> > Is this a bug or feature?
> 
> 
> 
> 
> -- 
> Best regards,
>  Anthony Pankov                         mailto:ap00@mail.ru
> 
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 17:31:47 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6C179151B86C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 17:31:47 +0000 (UTC) (envelope-from ap00@mail.ru)
Received: from smtp39.i.mail.ru (smtp39.i.mail.ru [94.100.177.99])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EF630745DB
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 17:31:46 +0000 (UTC)
 (envelope-from ap00@mail.ru)
Received: by smtp39.i.mail.ru with esmtpa (envelope-from <ap00@mail.ru>)
 id 1h0rRB-0000Vh-0l; Mon, 04 Mar 2019 20:31:37 +0300
Date: Mon, 4 Mar 2019 20:31:33 +0300
From: Anthony Pankov <ap00@mail.ru>
X-Priority: 3 (Normal)
Message-ID: <1032136115.20190304203133@mail.ru>
To: Konstantin Belousov <kostikbel@gmail.com>
CC: Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <20190304171351.GQ68879@kib.kiev.ua>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru> 
 <20190304171351.GQ68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
X-77F55803: 257C4F86AB09C89C5A78504BD2AC2941988784FC6C4AE31F8AB91D8030D92387D63009B91FF4146F18B539348254898DE6C6E5F6ACBD9482
X-7FA49CB5: 0D63561A33F958A5D6E224D12FFAC8C5054B814514D99F4C82A1CE9E894AF5B58941B15DA834481FA18204E546F3947CEDCF5861DED71B2F389733CBF5DBD5E9C8A9BA7A39EFB7666BA297DBC24807EA117882F44604297287769387670735209ECD01F8117BC8BEA471835C12D1D977C4224003CC8364767815B9869FA544D8D32BA5DBAC0009BE9E8FC8737B5C2249668B94F0A65C3A0C3AA81AA40904B5D9CF19DD082D7633A0E7DDDDC251EA7DABD81D268191BDAD3D78DA827A17800CE7542AF255F21831B5CD04E86FAF290E2D40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE79EBEB503AFBA2DD44EED78E81DD8BDE975ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57285124B2A10EEC6C00306258E7E6ABB4E4A6367B16DE6309
X-Mailru-Sender: D8D48EF70163D79D00784CDFC8FD31077F8BD95612A2B4898D77F1A9FF468278C0F74C40C00C070950D5CF8590B94F4EC77752E0C033A69E81198BD1A48777B793AC9912533B2342AE208404248635DF
X-Mras: OK
X-Rspamd-Queue-Id: EF630745DB
X-Spamd-Bar: ------
X-Spamd-Result: default: False [-6.99 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.99)[-0.993,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 17:31:47 -0000

Thank you for the reply,

Do you mean that I must install world explicitly as

make installworld WITHOUT_SSP=yes

and the same string in src.conf is not enough? I'm sure that I didn't
touch src.conf between 'buildworld' and 'installworld'.


> On Mon, Mar 04, 2019 at 07:56:02PM +0300, Anthony Pankov via freebsd-hackers wrote:
>> It  seems  that  world  builded with  WITHOUT_SSP=yes loose ability to
>> build anything.
>>
>> # cc -v test.c
>> FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
>> Target: x86_64-unknown-freebsd11.2
>> Thread model: posix
>> InstalledDir: /usr/bin
>>  "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/clang/7.0.1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-d853d1.o -x c test.c -faddrsig
>> clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unknown-freebsd11.2
>> #include "..." search starts here:
>> #include <...> search starts here:
>>  /usr/lib/clang/7.0.1/include
>>  /usr/include
>> End of search list.
>>  "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o
>> /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
> It seems that you installed without specifying WITHOUT_SSP, which
> ended up installing wrong linker script as libc.a.  Either create dummy
> libssp_nonshared.a, or reinstall libc.a (look at lib/libc/Makefile for
> SHLIB_LDSCRIPT), or reinstall the world.

>>
>>
>> > Greetings,
>>
>> > I've builded 11-stable ( 11.2-STABLE  r344696) from source with option
>> > WITHOUT_SSP="yes" in src.conf.
>>
>> > Installing kernel and world was OK. But  when I tried to build from port it give me an error:
>> > configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5':
>> > configure: error: C compiler cannot create executables
>>
>> > config.log:
>> > ...
>> > configure:3555: cc -v >&5
>> > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
>> > Target: x86_64-unknown-freebsd11.2
>> > ...
>> > configure:3608: cc -O2 -pipe  -Wno-error -fno-strict-aliasing     conftest.c  >&5
>> > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
>> > cc: error: linker command failed with exit code 1 (use -v to see invocation)
>>
>> > And yes, there is SSP_UNSAFE=yes in make.conf
>>
>> > Is this a bug or feature?
>>
>>
>>
>>
>> --
>> Best regards,
>>  Anthony Pankov                         mailto:ap00@mail.ru
>>


-- 
Best regards,
 Anthony                          mailto:ap00@mail.ru


From owner-freebsd-hackers@freebsd.org  Mon Mar  4 17:39:45 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 649DA151BAB3
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 17:39:45 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id AD47D74A09
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 17:39:44 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x24HdbGJ001660
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Mon, 4 Mar 2019 19:39:40 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x24HdbGJ001660
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x24HdbWf001659;
 Mon, 4 Mar 2019 19:39:37 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Mon, 4 Mar 2019 19:39:37 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Anthony Pankov <ap00@mail.ru>
Cc: Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
Message-ID: <20190304173937.GR68879@kib.kiev.ua>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru>
 <20190304171351.GQ68879@kib.kiev.ua>
 <1032136115.20190304203133@mail.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1032136115.20190304203133@mail.ru>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=0.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,FREEMAIL_REPLY,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 17:39:45 -0000

On Mon, Mar 04, 2019 at 08:31:33PM +0300, Anthony Pankov wrote:
> Thank you for the reply,
> 
> Do you mean that I must install world explicitly as
> 
> make installworld WITHOUT_SSP=yes
> 
> and that the same line in src.conf is not enough? I'm sure that I didn't
> touch src.conf between 'buildworld' and 'installworld'.
Check your /usr/lib/libc.a; if it mentions libssp_nonshared.a, then
you have something broken.

> 
> 
> > On Mon, Mar 04, 2019 at 07:56:02PM +0300, Anthony Pankov via freebsd-hackers wrote:
> >> It seems that a world built with WITHOUT_SSP=yes loses the ability to
> >> build anything.
> >> 
> >> # cc -v test.c
> >> FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
> >> Target: x86_64-unknown-freebsd11.2
> >> Thread model: posix
> >> InstalledDir: /usr/bin
> >>  "/usr/bin/cc" -cc1 -triple x86_64-unknown-freebsd11.2 -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/clang/7.0.1 -fdebug-compilation-dir /root/test -ferror-limit 19 -fmessage-length 90 -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test-d853d1.o -x c test.c -faddrsig
> >> clang -cc1 version 7.0.1 based upon LLVM 7.0.1 default target x86_64-unknown-freebsd11.2
> >> #include "..." search starts here:
> >> #include <...> search starts here:
> >>  /usr/lib/clang/7.0.1/include
> >>  /usr/include
> >> End of search list.
> >>  "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib /tmp/test-d853d1.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o
> >> /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
> > It seems that you installed without specifying WITHOUT_SSP, which
> > ended up installing the wrong linker script as libc.a.  Either create a dummy
> > libssp_nonshared.a, or reinstall libc.a (look at lib/libc/Makefile for
> > SHLIB_LDSCRIPT), or reinstall the world.
> 
> >> 
> >> 
> >> > Greetings,
> >> 
> >> > I've built 11-stable (11.2-STABLE r344696) from source with the option
> >> > WITHOUT_SSP="yes" in src.conf.
> >> 
> >> > Installing the kernel and world went fine, but when I tried to build a port it gave me an error:
> >> > configure: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.5':
> >> > configure: error: C compiler cannot create executables
> >> 
> >> > config.log:
> >> > ...
> >> > configure:3555: cc -v >&5
> >> > FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
> >> > Target: x86_64-unknown-freebsd11.2
> >> > ...
> >> > configure:3608: cc -O2 -pipe  -Wno-error -fno-strict-aliasing     conftest.c  >&5
> >> > /usr/bin/ld: cannot find /usr/lib/libssp_nonshared.a
> >> > cc: error: linker command failed with exit code 1 (use -v to see invocation)
> >> 
> >> > And yes, there is SSP_UNSAFE=yes in make.conf
> >> 
> >> > Is this a bug or feature?
> >> 
> >> 
> >> 
> >> 
> >> -- 
> >> Best regards,
> >>  Anthony Pankov                         mailto:ap00@mail.ru
> >> 
> 
> 
> 
> -- 
> Best regards,
>  Anthony                          mailto:ap00@mail.ru
> 

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 17:56:48 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8DA1151C57A
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 17:56:47 +0000 (UTC) (envelope-from ap00@mail.ru)
Received: from smtp5.mail.ru (smtp5.mail.ru [94.100.179.24])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 48D777586D
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 17:56:47 +0000 (UTC)
 (envelope-from ap00@mail.ru)
Received: by smtp5.mail.ru with esmtpa (envelope-from <ap00@mail.ru>)
 id 1h0rpN-0007HQ-AH; Mon, 04 Mar 2019 20:56:37 +0300
Date: Mon, 4 Mar 2019 20:56:34 +0300
From: Anthony Pankov <ap00@mail.ru>
X-Priority: 3 (Normal)
Message-ID: <1178496353.20190304205634@mail.ru>
To: Konstantin Belousov <kostikbel@gmail.com>
CC: Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <20190304173937.GR68879@kib.kiev.ua>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru> 
 <20190304171351.GQ68879@kib.kiev.ua> <1032136115.20190304203133@mail.ru>
 <20190304173937.GR68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-77F55803: BBE463BEF7A60BD05A78504BD2AC294173B5FE5E8078F296934FF6215BFBAED1D9BD7BD9299EB09D76C5711636B2200D
X-7FA49CB5: 0D63561A33F958A5268FC42D51BE80F42976F3D4F0E0E38FEFC645FA292A76658941B15DA834481FA18204E546F3947CEDCF5861DED71B2F389733CBF5DBD5E9C8A9BA7A39EFB7666BA297DBC24807EA117882F44604297287769387670735209ECD01F8117BC8BEA471835C12D1D977C4224003CC8364767815B9869FA544D8D32BA5DBAC0009BE9E8FC8737B5C2249668B94F0A65C3A0C3AA81AA40904B5D9CF19DD082D7633A0E7DDDDC251EA7DABD81D268191BDAD3D78DA827A17800CE7542AF255F21831B5CD04E86FAF290E2D40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE7DBA9D19EC28D74DCABBED4C59776AF2D75ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57285124B2A10EEC6C00306258E7E6ABB4E4A6367B16DE6309
X-Mailru-Sender: D8D48EF70163D79D00784CDFC8FD3107B19AC08ED0E7A9241832297969E15CEEB3C773650554347350D5CF8590B94F4EC77752E0C033A69E81198BD1A48777B793AC9912533B2342AE208404248635DF
X-Mras: OK
X-Rspamd-Queue-Id: 48D777586D
X-Spamd-Bar: ------
X-Spamd-Result: default: False [-6.99 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.99)[-0.993,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 17:56:48 -0000

I looked at it and found no ssp entries:

ar t /usr/lib/libc.a |grep ssp

wcsspn.o
readpassphrase.o

P.S. Running

touch /usr/lib/libssp_nonshared.a

works around the problem, but it seems weird.


> On Mon, Mar 04, 2019 at 08:31:33PM +0300, Anthony Pankov wrote:
>> Thank you for the reply,
>>
>> Do you mean that I must install world explicitly as
>>
>> make installworld WITHOUT_SSP=yes
>>
>> and that the same line in src.conf is not enough? I'm sure that I didn't
>> touch src.conf between 'buildworld' and 'installworld'.
> Check your /usr/lib/libc.a, if it mentions libssp_nonshared.a then
> you have something broken.



--=20
Best regards,
 Anthony                          mailto:ap00@mail.ru


From owner-freebsd-hackers@freebsd.org  Mon Mar  4 18:06:10 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 707B4151D02F
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 18:06:10 +0000 (UTC)
 (envelope-from shawn.webb@hardenedbsd.org)
Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com
 [IPv6:2607:f8b0:4864:20::844])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 588A676D1A
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 18:06:09 +0000 (UTC)
 (envelope-from shawn.webb@hardenedbsd.org)
Received: by mail-qt1-x844.google.com with SMTP id o6so6106637qtk.6
 for <freebsd-hackers@freebsd.org>; Mon, 04 Mar 2019 10:06:09 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=hardenedbsd.org; s=google;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to:user-agent;
 bh=e/AgxHwEKMfjJwXgArj73TJhdZgscDaLbOmOPph0Ijc=;
 b=gs/WjFlt9jkszp61tyxW0/wClMSc9NpY47D0n2qTPlbzZJysHyEBgCFxySnJYMjLpD
 P/NPgMVjecelqU1UF3OOgH0VHwVFFnpzL40aEWbgLgmRr03aipNwmGMzyWu946pJFltu
 OXQOq713LgUeGwHvEp7fo7A+a8sC8nxmtjw7DQ/394OHO/cvaJ0eGf4AUwqAOR4OfikP
 DewSq8bZDrr4rQaZJ7GyiKb/DX8nE4qJrxPSWcbcUU3WA6iCTI3C8dDtP2CJ0h0cTUgd
 JI7Pt/Ka4xSC6e5UikMODpSFJFt4cWMEcWYPGTZ/L6NJOTSMNKOKQo6+CuZ/JR9C9T68
 gM1Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to:user-agent;
 bh=e/AgxHwEKMfjJwXgArj73TJhdZgscDaLbOmOPph0Ijc=;
 b=dpOn/cntpdIL8udi13UaFpVe6zATfBYYbA2eSKcHF9okTCu+PvSVrqO9LveYtfPzoF
 3dOVWSki04Nt0lLm69OJUIvQuoRCBvbqJLSEkXqj/zM8i1PDGtwLZAu3o8ro8LbON3np
 E1lEwqPPdgeM2A8t108HEdaSabIh98ENv2OVH4rOYy8BQ5tPxJEHnkk7ao3M/OBe1o0r
 FqL1OXhrr98zvZvkTTOqg4DZhuXYksPcOV2Q/ZRHgnaLlMceFrbtZ3p7Jy+lPg7Uo3fp
 uOBZ8c4oP5gMjodWdIT+GTewLfX48Ma7EpUt4u9YDM0iqapNYrDfWILaK3bHH7EiWqGv
 ctZA==
X-Gm-Message-State: APjAAAUL9VMy7A1g5+voa9R9vogyvV07Ia9CBmHPn7c1dE+rf1twHzSz
 nQBM0AbuseT7p+Oqgf7ZqW6UFw==
X-Google-Smtp-Source: APXvYqw/fUMIM9Gh9JCYfOROTdtAJ8MRqczipIt2SE470TGwjNg1qxoMW+Pl5wR+jpYtF+jd1VUItw==
X-Received: by 2002:ac8:396b:: with SMTP id t40mr15297860qtb.159.1551722768886; 
 Mon, 04 Mar 2019 10:06:08 -0800 (PST)
Received: from mutt-hbsd ([63.88.83.108])
 by smtp.gmail.com with ESMTPSA id k27sm3514370qki.19.2019.03.04.10.06.07
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Mon, 04 Mar 2019 10:06:08 -0800 (PST)
Date: Mon, 4 Mar 2019 13:05:33 -0500
From: Shawn Webb <shawn.webb@hardenedbsd.org>
To: Anthony Pankov <ap00@mail.ru>
Cc: Konstantin Belousov <kostikbel@gmail.com>,
 Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
Message-ID: <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru>
 <20190304171351.GQ68879@kib.kiev.ua>
 <1032136115.20190304203133@mail.ru>
 <20190304173937.GR68879@kib.kiev.ua>
 <1178496353.20190304205634@mail.ru>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="p2lpnnazo2wwjgz2"
Content-Disposition: inline
In-Reply-To: <1178496353.20190304205634@mail.ru>
X-Operating-System: FreeBSD mutt-hbsd 13.0-CURRENT-HBSD FreeBSD
 13.0-CURRENT-HBSD  HARDENEDBSD-13-CURRENT  amd64
X-PGP-Key: http://pgp.mit.edu/pks/lookup?op=vindex&search=0x6A84658F52456EEE
User-Agent: NeoMutt/20180716
X-Rspamd-Queue-Id: 588A676D1A
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
 dkim=pass header.d=hardenedbsd.org header.s=google header.b=gs/WjFlt;
 spf=pass (mx1.freebsd.org: domain of shawn.webb@hardenedbsd.org designates
 2607:f8b0:4864:20::844 as permitted sender)
 smtp.mailfrom=shawn.webb@hardenedbsd.org
X-Spamd-Result: default: False [-5.54 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36];
 RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: alt1.aspmx.l.google.com];
 DKIM_TRACE(0.00)[hardenedbsd.org:+];
 NEURAL_HAM_SHORT(-0.97)[-0.972,0]; SIGNED_PGP(-2.00)[];
 FREEMAIL_TO(0.00)[mail.ru]; FROM_EQ_ENVFROM(0.00)[];
 IP_SCORE(-0.46)[ip: (2.52), ipnet: 2607:f8b0::/32(-2.70), asn: 15169(-2.04),
 country: US(-0.07)]; MIME_TRACE(0.00)[0:+,1:+];
 ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US];
 RCVD_TLS_LAST(0.00)[]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 R_DKIM_ALLOW(-0.20)[hardenedbsd.org:s=google];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 MIME_GOOD(-0.20)[multipart/signed,text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org];
 DMARC_NA(0.00)[hardenedbsd.org]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_NONE(0.00)[4.4.8.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org
 : 127.0.5.0]; MID_RHS_NOT_FQDN(0.50)[];
 FREEMAIL_CC(0.00)[gmail.com]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 18:06:10 -0000


--p2lpnnazo2wwjgz2
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I'm curious about your use case for building without stack cookies.

Thanks,

--=20
Shawn Webb
Cofounder and Security Engineer
HardenedBSD

Tor-ified Signal:    +1 443-546-8752
Tor+XMPP+OTR:        lattera@is.a.hacker.sx
GPG Key ID:          0x6A84658F52456EEE
GPG Key Fingerprint: 2ABA B6BD EF6A F486 BE89  3D9E 6A84 658F 5245 6EEE

On Mon, Mar 04, 2019 at 08:56:34PM +0300, Anthony Pankov via freebsd-hackers wrote:
> I have looked on it and found no ssp entries:
>
> ar t /usr/lib/libc.a |grep ssp
>
> wcsspn.o
> readpassphrase.o
>
> P.S.
> touch  /usr/lib/libssp_nonshared.a
>
> is a cure. But it seems weird.

--p2lpnnazo2wwjgz2
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEKrq2ve9q9Ia+iT2eaoRlj1JFbu4FAlx9aOgACgkQaoRlj1JF
bu6ncw/+NX4oLR0HaXK4bgmth4xMwQ/3MfGyhDT/+p0j/TN6QtlcECKYDdHriFV9
RfjtgsPytdHFb8eb3nwnR4EjL2DqN3y0LQq7WPwZVKPlHm+ohqIGx3F+7REBXCL2
zamwwQSqOgX7EwOXKEQWobGXBMwwTklf8pl9G/h5+1MuwxYANEMKKBGzWsOah+a4
chGIFyi+b8smykeOy7h4y1YznblrbQcbN7IhAaHYpE7NmS8LQLIMcMNdb1baOpOx
EhDJmth+UWv+3is2wkL1UCqbMbNfatjs/nOAmUVZIO33GVYPsjI8ElgjhmJ6cz3S
Q1HS1ucVNCeB7okcU0Z2DuYexljr8/4k2x9qTE6yJs4N/lMRqm0mEOZmBBnmuhMh
OoUl8kj5+U7hSSttNTEgRYQELESCq7pPNJgOeNZLG0h0F7NfLMmqu+PG9fyeELOz
L/o+zbxHD3NfWsih+11zEnxJ7XJCcA7LY2Fkl2ekETQk1bA/dfxRaYhuD0bAT+RR
5Eso9mgX3X5DM+KBiJE2zzYEs6P6xnfGkGiFPxTqTn3MxPt5Rcv63OY9+kKGX4fZ
DKEjsMLCJNU/+Z4w+KvyqQCJZssn1UawfdA4eGLJfFdhUI0HlJmbwJytagZqJUpP
rLAZYJ8bUquh27qV8/MV2RTuhQeGQOEMOfQvuLMZp7PKoVg6gm0=
=g3NW
-----END PGP SIGNATURE-----

--p2lpnnazo2wwjgz2--

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 18:17:26 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 75BCC151DB25;
 Mon,  4 Mar 2019 18:17:26 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 6A52577842;
 Mon,  4 Mar 2019 18:17:25 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 41B5B43A329;
 Tue,  5 Mar 2019 05:17:15 +1100 (AEDT)
Date: Tue, 5 Mar 2019 05:17:14 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
cc: Bruce Evans <brde@optusnet.com.au>, Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190304114150.GM68879@kib.kiev.ua>
Message-ID: <20190305031010.I4610@besplex.bde.org>
References: <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=2apI1eGbhsv_kSbrP38A:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 6A52577842
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.99 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.99)[-0.994,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-Mailman-Approved-At: Mon, 04 Mar 2019 19:10:34 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 18:17:26 -0000

On Mon, 4 Mar 2019, Konstantin Belousov wrote:

> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>>>>>
>>>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>> * ...
>>>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
>>>> step.  i386 used to be faster here -- the first masking step of discarding
>>>> %edx doesn't take any code.  amd64 has to mask out the top bits in %rax.
>>>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
>>>> has to do a not so slow shr.
>>> i386 cannot discard %edx after RDTSC since some bits from %edx come into
>>> the timecounter value.
>>
>> These bits are part of the tsc-low pessimization.  The shift count should
>> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
>> sometimes.
>>
>> When tsc-low was new, the shift count was often larger (as much as 8),
>> and it is still changeable by a read-only tunable, but now it is 1 in
>> almost all cases.  The code only limits the timecounter frequency
>> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
>> at nearly 4 GHz are usually limited to about 2 GHz.  The comment about
>> this UINT_MAX doesn't match the code.  The comment says int, but the
>> code says UINT.
>>
>> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
>> This much accuracy is noise for most purposes.
>>
>> The tunable is fairly undocumented.  Its description is "Shift to apply
>> for the maximum TSC frequency".  Of course, it has no effect on the TSC
>> frequency.  It only affects the TSC timecounter frequency.
> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> Otherwise, I think, some multi-socket machines would start showing the
> detectable backward-counting bintime().  At frequencies of 4GHz and
> above (Intel has 5GHz part numbers) I do not think that the stability of
> the 100MHz crystal and on-board traces is enough to avoid that.

I think it is just a kludge that reduced the problem before it was fixed
properly using fences.

Cross-socket latency is over 100 cycles according to jhb's tscskew
benchmark: on Haswell 4x2:

CPU | TSC skew (min/avg/max/stddev)
----+------------------------------
   0 |     0     0     0    0.000
   1 |    24    49    84   14.353
   2 |   164   243   308   47.811
   3 |   164   238   312   47.242
   4 |   168   242   332   49.593
   5 |   168   243   324   48.722
   6 |   172   242   320   52.596
   7 |   172   240   316   53.014

freefall is similar.  Latency is apparently measured relative to CPU 0.
It is much lower to CPU 1 since that is on the same core.

I played with this program a lot 3 and a half years ago, but forgot
most of what I learned :-(.  I tried different fencing in it.  This
seems to make little difference when the program is rerun.  With the
default TESTS = 1024, the min skew sometimes goes negative on freefall,
but with TESTS = 1024000 that doesn't happen.  This is the opposite
of what I would expect.  freefall has load average about 1.

Removing the only fencing in it reduces average latency by 10-20 cycles
and minimum latency by over 100 cycles, except on freefall it is
reduced from 33 to 6.  On Haswell it is 24 with fencing and I didn't
test it with no fencing.

I think tscskew doesn't really measure tsc skew.  What it measures is
the time taken for a locking protocol, using the TSCs on different
CPUs to make the start and end timestamps.  If the TSCs have a lot of
skew or jitter, then this will show up indirectly as inconsistent and
possibly negative differences.

A shift of just 1 can't hide latencies of hundreds of cycles on single-
socket machines.  Even a shift of 8 only works sometimes, by reducing
the chance of observing the TSC going backwards by a factor of 256.
E.g., assume for simplicity that all instructions and IPCs take 0-1
cycles, and that unfenced rdtsc's differ by at most +-5 cycles (with
the 11 values between -5 and 5 uniformly distributed).  Then with a
shift of 0 and no fences, a CPU that updates the timehands is ahead of
another CPU that spins reading the timehands about 5/11 of the time.
With a shift of 8, the CPUs are close enough whenever the first one reads
at least 5 above and at least 5 below a 256-boundary.  The chance of
seeing a negative difference is then at most about 10/256 of what it
was with a shift of 0.
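
The estimate above can be checked by brute force.  The following is only
a sketch under the same simplifying assumptions (a reader whose unfenced
rdtsc differs from the updater's by a uniformly distributed -5..+5
cycles).  count_backward() and BASE are names invented for this
illustration; they are not kernel code.

```c
#include <stdint.h>

/* Arbitrary offset (a multiple of 256) so that value + skew stays positive. */
#define BASE 1024

/*
 * Hypothetical helper: over one full period of the low 8 counter bits,
 * count the (value, skew) pairs for which the reader's shifted counter
 * is behind the updater's shifted counter.
 */
static unsigned
count_backward(unsigned shift, int max_skew)
{
	unsigned count = 0;

	for (int v = 0; v < 256; v++) {
		for (int d = -max_skew; d <= max_skew; d++) {
			int64_t updater = (int64_t)(BASE + v) >> shift;
			int64_t reader = (int64_t)(BASE + v + d) >> shift;

			if (reader < updater)
				count++;
		}
	}
	return (count);
}
```

There are 256 * 11 = 2816 pairs in total.  count_backward(0, 5) gives
1280 of them (5/11), and count_backward(8, 5) gives 15, i.e. only the
pairs that straddle a 256-boundary.  In this toy model the ratio is
15/1280 = 3/256, which is inside the 10/256 bound.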

> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> Otherwise, I think, some multi-socket machines would start showing the
> detectable backward-counting bintime().  At frequencies of 4GHz and
> above (Intel has 5GHz part numbers) I do not think that the stability of
> the 100MHz crystal and on-board traces is enough to avoid that.

Why would losing just 1 bit fix that?

Fences for rdtsc of course only serialize it for the CPU that runs it.
The locking (ordering) protocol (for the generation count) orders the
CPUs too.  It takes longer than we would like, much more than the 1-
cycle error that might be hidden by ignoring the low bit.  Surely the
ordering protocol must work across sockets?  It then gives ordering of
rdtsc's.
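
As a rough userland illustration of the generation-count protocol
referred to above, here is a C11 sketch.  It is not the kern_tc.c code:
the struct and function names are invented, it assumes a single writer,
and it uses <stdatomic.h> in place of the kernel's atomic(9) primitives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands-like record. */
struct gen_sample {
	atomic_uint generation;		/* 0 means "update in progress" */
	_Atomic uint64_t value;
};

static void
gen_sample_init(struct gen_sample *s, uint64_t v)
{
	atomic_init(&s->value, v);
	atomic_init(&s->generation, 1);
}

/* Single writer: invalidate, update, then publish a new nonzero generation. */
static void
gen_sample_write(struct gen_sample *s, uint64_t v)
{
	unsigned next;

	next = atomic_load_explicit(&s->generation, memory_order_relaxed) + 1;
	if (next == 0)
		next = 1;
	atomic_store_explicit(&s->generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->value, v, memory_order_relaxed);
	atomic_store_explicit(&s->generation, next, memory_order_release);
}

/* Reader: retry until the generation is nonzero and unchanged. */
static uint64_t
gen_sample_read(struct gen_sample *s)
{
	unsigned gen;
	uint64_t v;

	do {
		gen = atomic_load_explicit(&s->generation,
		    memory_order_acquire);
		v = atomic_load_explicit(&s->value, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&s->generation,
	    memory_order_relaxed));
	return (v);
}
```

The reader never blocks the writer; it only pays for the two generation
loads and the fence, which is the cost being discussed here.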

TSC-low was added in 2011.  That was long before the ordering was fixed.
You added fences in 2012 and memory ordering for the generation count in
2016.  Fences slowed everything down by 10-20+ cycles and probably hide
bugs in the memory ordering better than TSC-low.  Memory ordering plus
fences slow down the cross-core case by more than 100 cycles according
to tscskew.  That is enough to hide large hardware bugs.

> We can try to set the tsc-low shift count to 0 (but keep lfence) and see
> what is going on in HEAD, but I am afraid that the HEAD user population
> is not representative enough to catch the issue with certainty.
> Moreover, it is unclear to me how to diagnose the cause, e.g. I would expect
> the sleeps to hang on timeouts, as was reported from the very beginning
> of this thread.  How would we root-cause it?

Negative time differences cause lots of overflows so break the timecounter.
The fix under discussion actually gives larger overflows in the positive
direction.  E.g., a delta of -1 first overflows to 0xffffffff.  The fix
prevents overflow on multiplication by that.  When the timecounter
frequency is small, say 1 MHz, 0xffffffff means 4294 seconds, so the
timecounter advances by that.
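
The arithmetic can be sketched as follows.  This is an illustration
only: wrapped_delta() mimics a masked unsigned counter difference in the
style of tc_delta(), and delta_seconds() is a hypothetical helper; both
names are invented here.

```c
#include <stdint.h>

/* Masked unsigned difference: a reading 1 tick "behind" wraps around. */
static uint32_t
wrapped_delta(uint32_t now, uint32_t then, uint32_t mask)
{
	return ((now - then) & mask);
}

/* Whole seconds represented by `delta` ticks of a counter at `freq` Hz. */
static uint64_t
delta_seconds(uint32_t delta, uint64_t freq)
{
	return (delta / freq);
}
```

For example, wrapped_delta(99, 100, 0xffffffff) is 0xffffffff, and
delta_seconds(0xffffffff, 1000000) is 4294.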

>>> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
>>> since the whole shrdl calculation occurs in 32bit registers, and the result
>>> is in %rax where top word is cleared by shrdl instruction automatically.
>>> But the clearing is not required since result is unsigned int anyway.
>>>
>>> Disassembly of tsc_get_timecount_low() is very clear:
>>>   0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
>>>   0xffffffff806767e7 <+7>:     rdtsc
>>>   0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
>>> ...
>>>   0xffffffff806767ed <+13>:    retq
>>> (I removed frame manipulations).

I checked that all compilers still produce horrible code for the better
source code 'return (rdtsc() >> (intptr_t)tc->tc_priv);'.  64-bit shifts
are apparently pessimal for compatibility.  The above is written mostly
in asm to avoid 2-5 extra instructions.
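
Ignoring code generation, the value that the quoted shrd sequence
computes is the 64-bit TSC shifted right by the tunable count and
truncated to 32 bits.  A portable sketch, with the counter value passed
in instead of read with rdtsc (tsc_low() and tsc_low_freq() are
hypothetical names, not the kernel functions):

```c
#include <stdint.h>

/* Low 32 bits of (tsc >> shift); bits from the high half shift in. */
static uint32_t
tsc_low(uint64_t tsc, unsigned shift)
{
	return ((uint32_t)(tsc >> shift));
}

/* The timecounter frequency seen after applying the same shift. */
static uint64_t
tsc_low_freq(uint64_t tsc_freq, unsigned shift)
{
	return (tsc_freq >> shift);
}
```

With a shift of 1, tsc_low(0x100000003, 1) is 0x80000001: the low bit of
the high word (%edx) becomes the top bit of the result, which is why
i386 cannot simply discard %edx.  tsc_low_freq(4000000000, 1) is
2000000000, matching the "nearly 4 GHz limited to about 2 GHz" remark.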

>>>> ...
>>>> Similarly in bintime().
>>> I merged two functions, finally.  Having to copy the same code is too
>>> annoying for this change.

I strongly dislike the merge.

>>> So I verified that:
>>> - there is no 64bit multiplication in the generated code, for i386 both
>>>  for clang 7.0 and gcc 8.3;
>>> - that everything is inlined, the only call from bintime/binuptime is
>>>  the indirect call to get the timecounter value.
>>
>> I will have to fix it for compilers that I use.
> Ok, I will add __inline.

That will make it fast enough, but still hard to read.

>>> +		*bt = *bts;
>>> +		scale = th->th_scale;
>>> +		delta = tc_delta(th);
>>> +#ifdef _LP64
>>> +		if (__predict_false(th->th_large_delta <= delta)) {
>>> +			/* Avoid overflow for scale * delta. */
>>> +			bintime_helper(bt, scale, delta);
>>> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
>>> +		} else {
>>> +			bintime_addx(bt, scale * delta);
>>> +		}
>>> +#else
>>> +		/*
>>> +		 * Use bintime_helper() unconditionally, since the fast
>>> +		 * path in the above method is not so fast here, since
>>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
>>> +		 * available in hardware and emulating it using 2
>>> +		 * 32 x 32 -> 64 bit multiplications uses code much
>>> +		 * like that in bintime_helper().
>>> +		 */
>>> +		bintime_helper(bt, scale, delta);
>>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
>>> +#endif
>>
>> Check that this method is really better.  Without this, the complicated
>> part is about half as large and duplicating it is smaller than this
>> version.
> Better in what sense ?  I am fine with the C code, and asm code looks
> good.

Better in terms of actually running significantly faster.  I fear the
32-bit method is actually slightly slower for the fast path.

>>> -	do {
>>> -		th = timehands;
>>> -		gen = atomic_load_acq_int(&th->th_generation);
>>> -		*bt = th->th_bintime;
>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>> -		atomic_thread_fence_acq();
>>> -	} while (gen == 0 || gen != th->th_generation);
>>
>> Duplicating this loop is much better than obfuscating it using inline
>> functions.  This loop was almost duplicated (except for the delta
>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
>> 8 ffclock ones).  Now it is only duplicated 16 times.
> How did you count the 16 ?  I can see only 4 instances in the unpatched
> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> touch ffclock until the patch is finalized.  After that, it would be
> 1 instance for kernel and 1 for userspace.

Grep for the end condition in this loop.  There are actually 20 of these.
I'm counting the loops and not the previously-simple scaling operation in
it.  The scaling is indeed only done for 4 cases.  I prefer the 20 
duplications (except I only want about 6 of the functions).  Duplication
works even better for only 4 cases.

This should be written as a function call to 1 new function to replace
the line with the overflowing multiplication.  The line is always the
same, so the new function call can look like bintime_xxx(bt, th).
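
A minimal sketch of what such a single helper could look like, using the
usual high/low split of the scale so that no 64 x 32 bit product overflows.
Assumptions: struct bintime is simplified, the helper takes the scale and
delta directly instead of a timehands pointer, and the name
bintime_add_scaled() is hypothetical (only "bintime_xxx" is given above).

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct bintime. */
struct bintime {
	int64_t		sec;	/* whole seconds */
	uint64_t	frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Add x (in 2^-64 s units) to bt, carrying into sec on wrap. */
static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t u;

	u = bt->frac;
	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/*
 * Hypothetical replacement for the line
 *	bintime_addx(bt, th->th_scale * tc_delta(th));
 * Adds scale * delta to bt without losing the bits above 2^64.
 */
static void
bintime_add_scaled(struct bintime *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	/* High 32 bits of scale: the product is in units of 2^-32 s. */
	x = (scale >> 32) * delta;
	bt->sec += x >> 32;
	bintime_addx(bt, x << 32);
	/* Low 32 bits of scale: the product fits in 64 bits. */
	bintime_addx(bt, (scale & 0xffffffff) * delta);
}
```

Each of the 4 callers would then keep its own loop and change only the one
scaling line.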

Bruce

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 19:25:41 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8EB01151FFD0
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 19:25:41 +0000 (UTC) (envelope-from ap00@mail.ru)
Received: from smtp14.mail.ru (smtp14.mail.ru [94.100.181.95])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 328F483649
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 19:25:39 +0000 (UTC)
 (envelope-from ap00@mail.ru)
Received: by smtp14.mail.ru with esmtpa (envelope-from <ap00@mail.ru>)
 id 1h0tDO-00032k-0g; Mon, 04 Mar 2019 22:25:30 +0300
Date: Mon, 4 Mar 2019 22:25:26 +0300
From: Anthony Pankov <ap00@mail.ru>
X-Priority: 3 (Normal)
Message-ID: <577261663.20190304222526@mail.ru>
To: Shawn Webb <shawn.webb@hardenedbsd.org>
CC: Anthony Pankov via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru> 
 <20190304171351.GQ68879@kib.kiev.ua> <1032136115.20190304203133@mail.ru>
 <20190304173937.GR68879@kib.kiev.ua> <1178496353.20190304205634@mail.ru>
 <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mras: OK
X-Rspamd-Queue-Id: 328F483649
X-Spamd-Bar: ---
X-Spamd-Result: default: False [-3.76 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:94.100.176.0/20];
 FREEMAIL_FROM(0.00)[mail.ru]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: mxs.mail.ru];
 DKIM_TRACE(0.00)[mail.ru:+]; HAS_X_PRIO_THREE(0.00)[3];
 NEURAL_HAM_SHORT(-0.68)[-0.677,0]; RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[mail.ru,reject];
 RCVD_IN_DNSWL_LOW(-0.10)[95.181.100.94.list.dnswl.org : 127.0.5.1];
 FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+];
 FREEMAIL_ENVFROM(0.00)[mail.ru];
 ASN(0.00)[asn:47764, ipnet:94.100.176.0/20, country:RU];
 MID_RHS_MATCH_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[mail.ru.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.997,0];
 R_DKIM_ALLOW(-0.20)[mail.ru:s=mail2]; FROM_HAS_DN(0.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 RCVD_TLS_LAST(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 IP_SCORE(0.03)[ipnet: 94.100.176.0/20(0.08), asn: 47764(0.05), country:
 RU(0.00)]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 19:25:41 -0000

In my case, no applications from the base "world" listen to the
internet (no open ports from syslogd, bind, sendmail, etc.). Also, there
is no public login to the servers.

So I see SSP as a waste of billions and billions of instructions. The
probability of the joint event (a known user becomes an evil hacker
AND the weakest point is a buffer overflow in the system's base world)
is near zero, if only because a weaker point can be found more
easily in misconfiguration, additional packages, etc.

The idea was to throw SSP out of the kernel and base world but
fortify sshd, postfix, etc. But things did not go as smoothly as desired.

> I'm curious about your use case for building without stack cookies.

> Thanks,

-- 
Best regards,
 Anthony Pankov                          mailto:ap00@mail.ru


From owner-freebsd-hackers@freebsd.org  Mon Mar  4 20:50:22 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B651152274E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 20:50:22 +0000 (UTC) (envelope-from sjg@juniper.net)
Received: from mx0a-00273201.pphosted.com (mx0a-00273201.pphosted.com
 [208.84.65.16])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "*.pphosted.com", Issuer "Thawte RSA CA 2018" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 71BA287375
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 20:50:20 +0000 (UTC)
 (envelope-from sjg@juniper.net)
Received: from pps.filterd (m0108159.ppops.net [127.0.0.1])
 by mx0a-00273201.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id
 x24KhqeF007164; Mon, 4 Mar 2019 12:50:19 -0800
Received: from nam01-bn3-obe.outbound.protection.outlook.com
 (mail-bn3nam01lp2056.outbound.protection.outlook.com [104.47.33.56])
 by mx0a-00273201.pphosted.com with ESMTP id 2r167s8g20-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT);
 Mon, 04 Mar 2019 12:50:18 -0800
Received: from SN4PR0501CA0046.namprd05.prod.outlook.com
 (2603:10b6:803:41::23) by CY4PR05MB3079.namprd05.prod.outlook.com
 (2603:10b6:903:fd::15) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1686.6; Mon, 4 Mar
 2019 20:50:16 +0000
Received: from BY2NAM05FT008.eop-nam05.prod.protection.outlook.com
 (2a01:111:f400:7e52::204) by SN4PR0501CA0046.outlook.office365.com
 (2603:10b6:803:41::23) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1686.6 via Frontend
 Transport; Mon, 4 Mar 2019 20:50:16 +0000
Received-SPF: SoftFail (protection.outlook.com: domain of transitioning
 juniper.net discourages use of 66.129.239.13 as permitted sender)
Received: from P-EXFEND-EQX-02.jnpr.net (66.129.239.13) by
 BY2NAM05FT008.mail.protection.outlook.com (10.152.100.145) with Microsoft
 SMTP Server (version=TLS1_0, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA) id
 15.20.1686.5 via Frontend Transport; Mon, 4 Mar 2019 20:50:15 +0000
Received: from P-EXBEND-EQX-02.jnpr.net (10.104.8.53) by
 P-EXFEND-EQX-02.jnpr.net (10.104.8.55) with Microsoft SMTP Server (TLS) id
 15.0.847.32; Mon, 4 Mar 2019 12:50:13 -0800
Received: from p-mailhub01.juniper.net (10.104.20.6) by
 P-EXBEND-EQX-02.jnpr.net (10.104.8.53) with Microsoft SMTP Server (TLS) id
 15.0.1367.3 via Frontend Transport; Mon, 4 Mar 2019 12:50:13 -0800
Received: from kaos.jnpr.net (kaos.jnpr.net [172.23.50.162])	by
 p-mailhub01.juniper.net (8.14.4/8.11.3) with ESMTP id x24KoBsU011256;	Mon, 4
 Mar 2019 12:50:11 -0800	(envelope-from sjg@juniper.net)
Received: by kaos.jnpr.net (Postfix, from userid 1377)	id C58D4737A5; Mon,  4
 Mar 2019 12:50:11 -0800 (PST)
Received: from kaos.jnpr.net (localhost [127.0.0.1])	by kaos.jnpr.net
 (Postfix) with ESMTP id C50AB737A4;	Mon,  4 Mar 2019 12:50:11 -0800 (PST)
To: Shawn Webb <shawn.webb@hardenedbsd.org>
CC: Anthony Pankov <ap00@mail.ru>, Konstantin Belousov <kostikbel@gmail.com>, 
 "Anthony Pankov via  freebsd-hackers" <freebsd-hackers@freebsd.org>,
 <sjg@juniper.net>
Subject: Re: building with WITHOUT_SSP side effect
In-Reply-To: <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd>
References: <434119194.20190304190732@mail.ru>
 <1122478880.20190304195602@mail.ru> <20190304171351.GQ68879@kib.kiev.ua>
 <1032136115.20190304203133@mail.ru> <20190304173937.GR68879@kib.kiev.ua>
 <1178496353.20190304205634@mail.ru>
 <20190304180533.rkpfkg5qxmhifeiy@mutt-hbsd>
Comments: In-reply-to: Shawn Webb <shawn.webb@hardenedbsd.org>
 message dated "Mon, 04 Mar 2019 13:05:33 -0500."
From: "Simon J. Gerraty" <sjg@juniper.net>
X-Mailer: MH-E 8.6+git; nmh 1.7.1; GNU Emacs 26.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <21488.1551732611.1@kaos.jnpr.net>
Date: Mon, 4 Mar 2019 12:50:11 -0800
Message-ID: <23396.1551732611@kaos.jnpr.net>
X-EXCLAIMER-MD-CONFIG: e3cb0ff2-54e7-4646-8a04-0dae4ac7b136
X-EOPAttributedMessage: 0
X-MS-Office365-Filtering-HT: Tenant
X-Forefront-Antispam-Report: CIP:66.129.239.13; IPV:NLI; CTRY:US; EFV:NLI;
 SFV:NSPM;
 SFS:(10019020)(39860400002)(136003)(346002)(376002)(396003)(2980300002)(189003)(199004)(478600001)(16586007)(54906003)(117636001)(86362001)(97876018)(9686003)(47776003)(126002)(76176011)(558084003)(305945005)(4326008)(7696005)(55016002)(356004)(5660300002)(97756001)(90966002)(2906002)(23726003)(53936002)(69596002)(229853002)(68736007)(53416004)(97736004)(336012)(316002)(76506005)(26005)(81156014)(81166006)(8676002)(77096007)(93886005)(186003)(7126003)(6266002)(50226002)(446003)(11346002)(107886003)(50466002)(486006)(476003)(106466001)(105596002)(6246003)(8936002)(46406003)(6916009);
 DIR:OUT; SFP:1102; SCL:1; SRVR:CY4PR05MB3079; H:P-EXFEND-EQX-02.jnpr.net; FPR:;
 SPF:SoftFail; LANG:en; PTR:InfoDomainNonexistent; MX:1; A:1; 
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-Correlation-Id: 32768963-7a30-4ccd-e7bd-08d6a0e30146
X-Microsoft-Antispam: BCL:0; PCL:0;
 RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600127)(711020)(4605104)(4710095)(4711035)(2017052603328)(7153060);
 SRVR:CY4PR05MB3079; 
X-MS-TrafficTypeDiagnostic: CY4PR05MB3079:
X-Microsoft-Antispam-PRVS: <CY4PR05MB30795639A4996756ADD88DB7AA710@CY4PR05MB3079.namprd05.prod.outlook.com>
X-Forefront-PRVS: 09669DB681
X-MS-Exchange-SenderADCheck: 1
X-OriginatorOrg: juniper.net
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 04 Mar 2019 20:50:15.8668 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 32768963-7a30-4ccd-e7bd-08d6a0e30146
X-MS-Exchange-CrossTenant-Id: bea78b3c-4cdb-4130-854a-1d193232e5f4
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=bea78b3c-4cdb-4130-854a-1d193232e5f4; Ip=[66.129.239.13];
 Helo=[P-EXFEND-EQX-02.jnpr.net]
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR05MB3079
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, ,
 definitions=2019-03-04_11:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_spam_notspam policy=outbound_spam
 score=0 priorityscore=1501
 malwarescore=0 suspectscore=18 phishscore=0 bulkscore=0 spamscore=0
 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0
 mlxlogscore=455 adultscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=8.0.1-1810050000 definitions=main-1903040147
X-Rspamd-Queue-Id: 71BA287375
X-Spamd-Bar: ---
X-Spamd-Result: default: False [-3.30 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-0.998,0];
 R_DKIM_ALLOW(-0.20)[juniper.net:s=PPS1017];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:208.84.65.16];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 RCVD_TLS_LAST(0.00)[]; RCPT_COUNT_FIVE(0.00)[5];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[juniper.net:+];
 DMARC_POLICY_ALLOW(-0.50)[juniper.net,quarantine];
 MX_GOOD(-0.01)[mxb-00273201.gslb.pphosted.com,mxa-00273201.gslb.pphosted.com]; 
 IP_SCORE(-0.06)[ip: (-0.15), ipnet: 208.84.65.0/24(-0.07), asn: 26211(0.01),
 country: US(-0.07)]; NEURAL_HAM_SHORT(-0.13)[-0.134,0];
 RCVD_IN_DNSWL_LOW(-0.10)[16.65.84.208.list.dnswl.org : 127.0.3.1];
 FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:26211, ipnet:208.84.65.0/24, country:US];
 FREEMAIL_CC(0.00)[mail.ru]; RCVD_COUNT_SEVEN(0.00)[11]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 20:50:22 -0000

Shawn Webb <shawn.webb@hardenedbsd.org> wrote:

> I'm curious about your use case for building without stack cookies.

GPL ?

From owner-freebsd-hackers@freebsd.org  Mon Mar  4 20:58:26 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6447E1522B61
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Mar 2019 20:58:26 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic317-34.consmr.mail.ne1.yahoo.com
 (sonic317-34.consmr.mail.ne1.yahoo.com [66.163.184.45])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 236EB879B3
 for <freebsd-hackers@freebsd.org>; Mon,  4 Mar 2019 20:58:25 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic.gate.mail.ne1.yahoo.com by
 sonic317.consmr.mail.ne1.yahoo.com with HTTP; Mon, 4 Mar 2019 20:58:23 +0000
Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113])
 ([67.170.167.181])
 by smtp417.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID
 d6e36c6b74fad01663fd179bddcbc796; 
 Mon, 04 Mar 2019 20:58:16 +0000 (UTC)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
From: Mark Millard <marklmi@yahoo.com>
In-Reply-To: <20190305031010.I4610@besplex.bde.org>
Date: Mon, 4 Mar 2019 12:58:14 -0800
Cc: Konstantin Belousov <kostikbel@gmail.com>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Content-Transfer-Encoding: 7bit
Message-Id: <E473E08C-5EAD-4414-96B3-2EDF7B671974@yahoo.com>
References: <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua>
 <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua>
 <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua>
 <20190305031010.I4610@besplex.bde.org>
To: Bruce Evans <brde@optusnet.com.au>
X-Mailer: Apple Mail (2.3445.102.3)
X-Rspamd-Queue-Id: 236EB879B3
X-Spamd-Bar: ++
X-Spamd-Result: default: False [2.06 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[];
 FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3];
 TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net];
 DKIM_TRACE(0.00)[yahoo.com:+];
 DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject];
 FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[];
 MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[];
 FREEMAIL_ENVFROM(0.00)[yahoo.com];
 ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4];
 NEURAL_SPAM_SHORT(0.62)[0.622,0]; MIME_GOOD(-0.10)[text/plain];
 IP_SCORE(1.22)[ip: (3.86), ipnet: 66.163.184.0/21(1.29), asn: 36646(1.03),
 country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.13)[0.130,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.60)[0.598,0];
 RCVD_IN_DNSWL_NONE(0.00)[45.184.163.66.list.dnswl.org : 127.0.5.0];
 RWL_MAILSPIKE_POSSIBLE(0.00)[45.184.163.66.rep.mailspike.net : 127.0.0.17];
 FREEMAIL_CC(0.00)[gmail.com]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Mar 2019 20:58:26 -0000


On 2019-Mar-4, at 10:17, Bruce Evans <brde at optusnet.com.au> wrote:

>> . . .
> 
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
> 
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
> 
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>  0 |     0     0     0    0.000
>  1 |    24    49    84   14.353
>  2 |   164   243   308   47.811
>  3 |   164   238   312   47.242
>  4 |   168   242   332   49.593
>  5 |   168   243   324   48.722
>  6 |   172   242   320   52.596
>  7 |   172   240   316   53.014
> 
> freefall is similar.  Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
> 

You may want to look at:

https://lists.freebsd.org/pipermail/freebsd-hackers/2019-March/054218.html

for cruder, but somewhat related, information for
the old Powermac G5 2-socket with 2 cores each, given
how FreeBSD tries to synchronize the tbr's across
cores as it starts up the CPUs.

It may give some idea of a ball-park scale involved for
such context, especially the reports of what happened
for varying one figure in the source code.

As stands, I've only done the experiments with a debug
kernel build.

I built using devel/powerpc64-xtoolchain-gcc related
infrastructure, not gcc 4.2.1. (This is typical for me.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


From owner-freebsd-hackers@freebsd.org  Tue Mar  5 11:11:13 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D14C615267E3
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Mar 2019 11:11:13 +0000 (UTC)
 (envelope-from shreyankfbsd@gmail.com)
Received: from mail-yw1-xc2b.google.com (mail-yw1-xc2b.google.com
 [IPv6:2607:f8b0:4864:20::c2b])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id BE48D8D2EB
 for <freebsd-hackers@freebsd.org>; Tue,  5 Mar 2019 11:11:12 +0000 (UTC)
 (envelope-from shreyankfbsd@gmail.com)
Received: by mail-yw1-xc2b.google.com with SMTP id z191so6632624ywa.6
 for <freebsd-hackers@freebsd.org>; Tue, 05 Mar 2019 03:11:12 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:from:date:message-id:subject:to:cc;
 bh=ZJ2lKYolBUTkXCV9yzqz7A3os4X3qabD3/CzWVeXWX8=;
 b=hn+S3G75eJOWHQZdIv2XdsfL1SRHYkh4srRwzr3TnJWVK3DzU+6EKPapXwixmmuwVh
 Rv6vMGKKDX77QiqUzBDlIuUwzIXafU6wbtOaHrf9i/7GZaz7+d1+beGCgvvlVbQIr1As
 mrCP8hC///WdhQa7kLA0pNBijhLDmICOc/VMLs0G/2tVpAjV3nGcMEUSrVM4AGlirYqr
 4Gi1Mh52ocejV8j4lBIjlFc9iXurdgWRHDrUyRWZ5VaVwZ+ezDaLXXYXRg1Xgq85XFRv
 NQa3daD/PQdhN/aK2PVTjkVP1TSW3N34La6efgE+sqhPosrAuV9FTi1xxpQB/yV01rHO
 kwaA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc;
 bh=ZJ2lKYolBUTkXCV9yzqz7A3os4X3qabD3/CzWVeXWX8=;
 b=m6bNk5sNOEMz9RHr12nPY560/ULowdLn/cskWOj683razVuQVNpBPJhFinAH1gwMCl
 4pzA4n8I56/dYGY9MOPgpgE7xzxkmvqw25ql89EmJ5YcB8bG1AEl4sVd8Xtc0pzXSqNo
 4fFbad1SGFPXccdQ7cLkB9YtVyoQ18CTIaZjBH9KMBth/CHDT0NgWW5jwQ/Qso3au6oP
 FHBsoIbenG2eQiznNJYTv7dD4oQcs9O1M1DL+Oq77wPXcbQoj3BfO6RMdVeyGElHa9H2
 LA6N+fcarUkyfVp66dVLp0b/OkVRCZZB1zfbqFacvMfME7JLZihN+hZRkm0wTCY0eHXE
 Gyqg==
X-Gm-Message-State: APjAAAXXaPyTrbuSuMAeT3UAAvtC79MmBidyFf07ns/lqjlshcjJjSF3
 LiK8gemjqFaKy4ALbA8QacmebeHAJ6WJ15j8PZJmI0E=
X-Google-Smtp-Source: APXvYqzUpAsOYXeJkhqgflTjRdOPThVjIFZZmBYt1Uws4AL0oKoV+qjIbxkH0kBgZBHFrdx1a9ppKlLX9aYxfK4Jg+w=
X-Received: by 2002:a0d:e082:: with SMTP id j124mr383939ywe.33.1551784272129; 
 Tue, 05 Mar 2019 03:11:12 -0800 (PST)
MIME-Version: 1.0
From: shreyank amartya <shreyankfbsd@gmail.com>
Date: Tue, 5 Mar 2019 16:41:00 +0530
Message-ID: <CAD9jf8Bg+-kroGEiRuHBpaVbjCV-n=zQwE=UtOkcAcHG1AfDpQ@mail.gmail.com>
Subject: iflib MSI init
To: mmacy@mattmacy.io
Cc: freebsd-hackers@freebsd.org
X-Rspamd-Queue-Id: BE48D8D2EB
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 dkim=pass header.d=gmail.com header.s=20161025 header.b=hn+S3G75;
 dmarc=pass (policy=none) header.from=gmail.com;
 spf=pass (mx1.freebsd.org: domain of shreyankfbsd@gmail.com designates
 2607:f8b0:4864:20::c2b as permitted sender)
 smtp.mailfrom=shreyankfbsd@gmail.com
X-Spamd-Result: default: False [-6.40 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36];
 FREEMAIL_FROM(0.00)[gmail.com];
 MIME_GOOD(-0.10)[multipart/alternative,text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org];
 TO_DN_NONE(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+];
 MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com];
 RCPT_COUNT_TWO(0.00)[2];
 RCVD_IN_DNSWL_NONE(0.00)[b.2.c.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org
 : 127.0.5.0]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none];
 RCVD_TLS_LAST(0.00)[]; NEURAL_HAM_SHORT(-0.51)[-0.515,0];
 FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+];
 FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US];
 RCVD_COUNT_TWO(0.00)[2];
 IP_SCORE(-2.88)[ip: (-9.66), ipnet: 2607:f8b0::/32(-2.66), asn: 15169(-2.00),
 country: US(-0.07)]; 
 DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 127.0.5.0]
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Mar 2019 11:11:14 -0000

Hi,

I'm trying to initialize a network interface using iflib. While configuring
MSI interrupts for the device, pci_msi_count() returns 32 vectors (the
maximum supported) in my case, so the condition (vectors == 1) fails and
legacy mode is selected. Is this intentional? If so, how can I make sure
the number of MSI vectors is 1?

/sys/net/iflib.c

msi:
        vectors = pci_msi_count(dev);
        scctx->isc_nrxqsets = 1;
        scctx->isc_ntxqsets = 1;
        scctx->isc_vectors = vectors;
        if (vectors == 1 && pci_alloc_msi(dev, &vectors) == 0) {
                device_printf(dev,"Using an MSI interrupt\n");
                scctx->isc_intr = IFLIB_INTR_MSI;
        } else {
                scctx->isc_vectors = 1;
                device_printf(dev,"Using a Legacy interrupt\n");
                scctx->isc_intr = IFLIB_INTR_LEGACY;
        }

        return (vectors);
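
To make the behaviour concrete, here is a small stand-alone model of just
that branch. It is only a sketch: select_intr_mode() and the enum are
hypothetical names, 'msi_count' stands in for the value returned by
pci_msi_count(), and 'alloc_ok' stands in for pci_alloc_msi() returning 0.
No real PCI calls are made.

```c
#include <stdbool.h>

enum intr_mode { INTR_LEGACY, INTR_MSI };

/*
 * Models only the quoted condition: MSI is chosen when the device
 * reports exactly one MSI vector and the allocation succeeds;
 * anything else falls through to legacy mode.
 */
static enum intr_mode
select_intr_mode(int msi_count, bool alloc_ok)
{
	if (msi_count == 1 && alloc_ok)
		return (INTR_MSI);
	return (INTR_LEGACY);
}
```

With msi_count == 32 the model returns INTR_LEGACY even when the
allocation would succeed, which matches what I see.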


Thanks
Shreyank Amartya

From owner-freebsd-hackers@freebsd.org  Tue Mar  5 13:19:45 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1BAEF1529D22;
 Tue,  5 Mar 2019 13:19:45 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 3E5966A676;
 Tue,  5 Mar 2019 13:19:42 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 9CDA243BF06;
 Wed,  6 Mar 2019 00:19:39 +1100 (AEDT)
Date: Wed, 6 Mar 2019 00:19:38 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Konstantin Belousov <kostikbel@gmail.com>, 
 Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: TSC "skew" (was: Re: powerpc64 head -r344018 stuck sleeping problems:
 th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes
 [patched failed])
In-Reply-To: <20190305031010.I4610@besplex.bde.org>
Message-ID: <20190305223415.U1563@besplex.bde.org>
References: <20190301194217.GB68879@kib.kiev.ua>
 <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=aZ2SpzNVlL9aNEeq27IA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 3E5966A676
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.13 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; FREEMAIL_FROM(0.00)[optusnet.com.au];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 MIME_GOOD(-0.10)[text/plain]; MIME_TRACE(0.00)[0:+];
 DMARC_NA(0.00)[optusnet.com.au]; RCPT_COUNT_FIVE(0.00)[5];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 NEURAL_HAM_SHORT(-0.74)[-0.739,0];
 IP_SCORE(-3.08)[ip: (-8.06), ipnet: 211.28.0.0/14(-4.06), asn: 4804(-3.24),
 country: AU(-0.04)]; FREEMAIL_TO(0.00)[optusnet.com.au];
 RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 FREEMAIL_CC(0.00)[gmail.com]; RCVD_COUNT_TWO(0.00)[2]
X-Mailman-Approved-At: Tue, 05 Mar 2019 13:36:43 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Mar 2019 13:19:45 -0000

On Tue, 5 Mar 2019, Bruce Evans wrote:

> On Mon, 4 Mar 2019, Konstantin Belousov wrote:

>* [... shift for bogus TSC-low timecounter]
>> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
>> Otherwise, I think, some multi-socket machines would start showing the
>> detectable backward-counting bintime().  At the frequencies at 4GHz and
>> above (Intel has 5Ghz part numbers) I do not think that stability of
>> 100MHz crystal and on-board traces is enough to avoid that.
>
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
>
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
>
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>  0 |     0     0     0    0.000
>  1 |    24    49    84   14.353
>  2 |   164   243   308   47.811
>  3 |   164   238   312   47.242
>  4 |   168   242   332   49.593
>  5 |   168   243   324   48.722
>  6 |   172   242   320   52.596
>  7 |   172   240   316   53.014
>
> freefall is similar.  Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
>
> I played with this program a lot 3 and a half years ago, but forgot
> most of what I learned :-(.  I tried different fencing in it.  This
> seems to make little difference when the program is rerun.  With the
> default TESTS = 1024, the min skew sometimes goes negative on freefall,
> but with TESTS = 1024000 that doesn't happen.  This is the opposite
> of what I would expect.  freefall has load average about 1.

I understand this program again.  First, its name is actually tscdrift.
I tested the 2015 version, and this version is still in
/usr/src/tools/tools/tscdrift/tscdrift.c, with no changes except to
the copyright (rgrimes wouldn't like this) and to $FreeBSD$.

The program doesn't actually measure either TSC drift or TSC skew, except
indirectly.  What it actually measures is the IPC (Inter-Process-
Communication) time for synchronizing the drift and skew measurements,
except bugs or intentional sloppiness in its synchronization also make it
give an indirect measurement of similar bugs or sloppiness in normal use.

After changing TESTS from 1024 to 1024000, it shows large errors in the
negative direction, as expected from either large negative skew or program
bugs: this is on freefall:

XX CPU | TSC skew (min/avg/max/stddev)
XX ----+------------------------------
XX   0 |      0      0      0    0.000
XX   1 |  -6148    108  10232   46.871
XX   2 |    114    209  95676  163.359
XX   3 |     96    202  47835  101.250
XX   4 |  -2223    207  34017  117.257
XX   5 |  -2349    206  33837  106.259
XX   6 |  -2664    213  33579   96.048
XX   7 |  -2451    212  49242  126.428

The negative "skews" occur because the server and the clients (1 client at
a time) read the TSC with uncontrolled timing after the server opens the
gate for this read (gate = 2).  The IPC time is about 200 cycles to CPUs
on different cores.  So when neither thread is preempted, the TSC on the
server is about 200 cycles in advance.  Sometimes the server is preempted,
so it reads its TSC later than the client (a maximum of about 6148 cycles
later in this test).  More often the client is preempted, since the IPC
time is much larger than the time between the server opening the gate and
the server reading its TSC.

The server is also missing fencing for its TSC read, so this read may
appear to occur several cycles before opening the gate.  This gives
an error in the positive direction for the reported "skew" (the error
is actually in the positive direction for the reported IPC time).  It
would be useful to measure this error by intentionally omitting fencing,
but currently it is just a small amount of noise on top of the noise from
preemption.

After fixing the synchronization:

XX CPU | TSC skew (min/avg/max/stddev)
XX ----+------------------------------
XX   0 |      0      0       0    0.000
XX   1 |     33     62   49161   57.652
XX   2 |    108    169   33678   73.456
XX   3 |    108    171   43053  119.256
XX   4 |    141    169   41289  114.567
XX   5 |    141    169   40035  112.755
XX   6 |    132    186  147099  269.449
XX   7 |    153    183  431526  436.689

Synchronization apparently takes a long time, especially to other cores.
The minimum and average now give the best-case IPC time very accurately.
The average is 20-30 cycles smaller than before, probably because I
fixed the fencing.  The maximum and standard deviation are garbage noise
from preemption.  Preemption should be disabled somehow.

Large drifts and skews would show up as nonsense values for the minimum
IPC time.  Small drifts would soon give large skews.  To measure small
skews, change the CPU of the server to measure the minimum IPC time in
the opposite direction.

Fixes:

XX --- tscdrift.c	2015-07-10 06:22:36.505493000 +0000
XX +++ w.c	2019-03-05 11:22:22.232341000 +0000
XX @@ -32,6 +32,15 @@
XX  #include <sys/param.h>
XX  #include <sys/cpuset.h>
XX  #include <machine/atomic.h>
XX +/*
XX + * XXX: atomic.h is not used.  Instead we depend on x86 memory ordering and
XX + * do direct assignments to and comparisons of 'gate', and sometimes add
XX + * memory barriers.  The correct atomic ops would do much the same with
XX + * clearer spelling.  The 'lock' prefix is never needed and the barriers are
XX + * only to get program order so as to give acq or rel semantics for either
XX + * the loads, the stores or for buggy unfenced rdtsc's.  Fences also give
XX + * program order, so some of the explicit barriers are redundant.
XX + */
XX  #include <machine/cpu.h>
XX  #include <machine/cpufunc.h>
XX  #include <assert.h>
XX @@ -45,7 +54,7 @@
XX 
XX  #define	barrier()	__asm __volatile("" ::: "memory")
XX 
XX -#define	TESTS		1024
XX +#define	TESTS		1024000
XX 
XX  static volatile int gate;
XX  static volatile uint64_t thread_tsc;
XX @@ -74,12 +83,12 @@
XX  		gate = 1;
XX  		while (gate == 1)
XX  			cpu_spinwait();
XX -		barrier();
XX 
XX +		barrier();
XX  		__asm __volatile("lfence");
XX  		thread_tsc = rdtsc();
XX -
XX  		barrier();
XX +
XX  		gate = 3;
XX  		while (gate == 3)
XX  			cpu_spinwait();

This is the client.  The explicit barriers are confusing, and the blank
lines are in all the wrong places.  All the accesses to 'gate' need
to be in program order.  x86 memory ordering gives this automatically
at the hardware level.  'gate' being volatile gives it at the compiler
level.  Both rdtsc() and storing the result to thread_tsc need to be
in program order.  lfence() in cpufunc.h has a memory clobber which
gives the former, but we use a direct asm and need a barrier() before
it to do the same thing.  Then we need another barrier() after the
assignment to thread_tsc so that the store for this is before the store
to 'gate' (I think gate being volatile doesn't give this).  This also
keeps the rdtsc() in program order (the asm for rdtsc() doesn't have
a memory clobber.  I haven't noticed care about this being taken
anywhere else).

Summary: only style changes in this section.

XX @@ -139,12 +148,13 @@
XX  		for (j = 0; j < TESTS; j++) {
XX  			while (gate != 1)
XX  				cpu_spinwait();
XX -			gate = 2;
XX -			barrier();

Move down opening the gate so that it is not opened until after reading the
TSC on the server.

XX 
XX +			barrier();
XX +			__asm __volatile("lfence");

Fencing is not critical here.  Using an early TSC value just gives a larger
reported IPC time.  The barrier is important for getting program order of
rdtsc().

XX  			tsc = rdtsc();
XX -
XX  			barrier();

This barrier is still associated with the TSC read, and the blank line is
moved to reflect this.  Here rdtsc() must occur in program order, but
storing to tsc can be after storing to 'gate'.  The barrier gives ordering
for the store to tsc too.

XX +
XX +			gate = 2;
XX  			while (gate != 3)
XX  				cpu_spinwait();
XX  			gate = 4;

I tried some locked atomic ops on 'gate' and mfence instead of lfence
to try to speed up the IPC.  Nothing helped.  We noticed long ago that
fence instructions tend to be even slower than locked atomic ops for
mutexes, and jhb guessed that this might be because fence instructions
don't do so much to force out stores.

Similar IPC is needed for updating timecounters.  This benchmark indicates
that after an update, the change usually won't be visible on other CPUs
for 100+ cycles.  Since updates are rare, this isn't much of a problem.

Similar IPC is needed for comparing timecounters across CPUs.  Any activity
on different CPUs is incomparable without synchronization to establish an
ordering.  Since fences give ordering relative to memory and timecounters
don't use anything except fences and memory order for the generation count
to establish their order, the synchronization for comparing timecounters
(or clock_gettime() at higher levels) must also use memory order.
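
As a reminder of what "fences and memory order for the generation count"
means in practice, here is a minimal userland sketch of a
generation-count read loop, written with C11 atomics.  struct
sample_hands, hands_update() and hands_read() are hypothetical and only
loosely modeled on the timehands loops; the kernel uses its own
atomic_load_acq_int() and atomic_thread_fence_acq() primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct sample_hands {		/* hypothetical, not struct timehands */
	uint64_t	offset;
	uint64_t	scale;
	atomic_uint	generation;	/* 0 means "update in progress" */
};

/* Writer: mark the record invalid, update it, then publish a new generation. */
static void
hands_update(struct sample_hands *h, uint64_t offset, uint64_t scale)
{
	unsigned int next;

	next = atomic_load_explicit(&h->generation, memory_order_relaxed) + 1;
	if (next == 0)		/* skip the reserved value on wraparound */
		next = 1;
	atomic_store_explicit(&h->generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	h->offset = offset;
	h->scale = scale;
	atomic_store_explicit(&h->generation, next, memory_order_release);
}

/* Reader: retry until the generation is nonzero and unchanged across the copy. */
static void
hands_read(struct sample_hands *h, uint64_t *offset, uint64_t *scale)
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&h->generation,
		    memory_order_acquire);
		*offset = h->offset;
		*scale = h->scale;
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&h->generation,
	    memory_order_relaxed));
}
```

The reader learns nothing about the writer's progress except through
memory, which is why any comparison of timestamps across CPUs inherits
the latency of that memory traffic.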

If the synchronization takes over 100 cycles, then smaller TSC skews don't
matter much (they never break monotonicity, and only show up as time differences
varying by 100 or so cycles depending on which CPU measures the start and
end events).  Small differences don't matter at all.  Skews may be caused
by the TSCs actually being out of sync, or hardware only syncing them on
average (hopefully with small jitter) or bugs like missing fences.  Missing
fences don't matter much provided unserialized TSC reads aren't too far
in the past.  E.g., if we had a guarantee of only 10 cycles in the past for
the TSC and 160 cycles for IPCs to other CPUs, then we could omit the fences.
But IPCs to the same core are 100 cycles faster so the margin is too close
for omitting fences in all cases.

Similarly for imperfect hardware.  Hopefully its skew is in the +-1 cycle
range, but even +-10 isn't a problem if the IPC time is a bit larger than
10 and even +-100 if the IPC time is a bit larger than 100.  And the problem
scales nicely with the distance of the CPUs -- when they are further apart
so that hardware synchronization of their TSCs is more difficult, the IPC
time is large too.
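
The margin argument can be written down as simple arithmetic.  The
function below is a deliberately simplified model (an assumption of
this sketch, not anything in the kernel): a reader on another CPU can
only see time go backwards if the staleness of an unfenced TSC read
plus the hardware skew exceeds the minimum IPC time.  The inputs are
the hypothetical figures used above: 160 cycles for IPC to another
core, 100 cycles less to the same core, 10 cycles of staleness.

```c
#include <assert.h>

/*
 * Cycles of slack left before a backwards step could be observed,
 * under the simplified model described in the lead-in.  A value that
 * is small or negative means fences (or better hardware) are needed.
 */
static int
ordering_margin(int ipc_min, int stale_max, int skew_max)
{
	assert(ipc_min >= 0 && stale_max >= 0 && skew_max >= 0);
	return (ipc_min - stale_max - skew_max);
}
```

ordering_margin(160, 10, 0) leaves 150 cycles, while
ordering_margin(60, 10, 0) leaves only 50, and adding +-10 cycles of
hardware skew to the same-core case brings it down to 40.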

Hmm, that is only with physical IPCs.  Since timecounters use physical
IPCs for everything, they can't work right with virtual synchronization.
Something like ntpd is needed to compare times across even small local
networks.  It does virtual synchronization by compensating for delays.

Bruce

From owner-freebsd-hackers@freebsd.org  Wed Mar  6 17:20:15 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF9CF1520067;
 Wed,  6 Mar 2019 17:20:14 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4883494D6E;
 Wed,  6 Mar 2019 17:20:14 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x26HK4Km092433
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 6 Mar 2019 19:20:07 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x26HK4Km092433
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x26HK3r1092419;
 Wed, 6 Mar 2019 19:20:03 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 6 Mar 2019 19:20:03 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mark Millard <marklmi@yahoo.com>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190306172003.GD2492@kib.kiev.ua>
References: <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua>
 <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua>
 <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua>
 <20190305031010.I4610@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190305031010.I4610@besplex.bde.org>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Mar 2019 17:20:15 -0000

On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
> 
> > On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> >> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>>>>>
> >>>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>> * ...
> >>>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> >>>> step.  i386 used to be faster here -- the first masking step of discarding
> >>>> %edx doesn't take any code.  amd64 has to mask out the top bits in %rax.
> >>>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> >>>> has to do a not so slow shr.
> >>> i386 cannot discard %edx after RDTSC since some bits from %edx come into
> >>> the timecounter value.
> >>
> >> These bits are part of the tsc-low pessimization.  The shift count should
> >> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
> >> sometimes.
> >>
> >> When tsc-low was new, the shift count was often larger (as much as 8),
> >> and it is still changeable by a read-only tunable, but now it is 1 in
> >> almost all cases.  The code only limits the timecounter frequency
> >> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
> >> at nearly 4 GHz are usually limited to about 2 GHz.  The comment about
> >> this UINT_MAX doesn't match the code.  The comment says int, but the
> >> code says UINT.
> >>
> >> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
> >> This much accuracy is noise for most purposes.
> >>
> >> The tunable is fairly undocumented.  Its description is "Shift to apply
> >> for the maximum TSC frequency".  Of course, it has no effect on the TSC
> >> frequency.  It only affects the TSC timecounter frequency.
> > I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> > Otherwise, I think, some multi-socket machines would start showing the
> > detectable backward-counting bintime().  At the frequencies at 4GHz and
> > above (Intel has 5Ghz part numbers) I do not think that stability of
> > 100MHz crystal and on-board traces is enough to avoid that.
> 
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
> 
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
> 
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>    0 |     0     0     0    0.000
>    1 |    24    49    84   14.353
>    2 |   164   243   308   47.811
>    3 |   164   238   312   47.242
>    4 |   168   242   332   49.593
>    5 |   168   243   324   48.722
>    6 |   172   242   320   52.596
>    7 |   172   240   316   53.014
> 
> freefall is similar.  Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
> 
> I played with this program a lot 3 and a half years ago, but forgot
> most of what I learned :-(.  I tried different fencing in it.  This
> seems to make little difference when the program is rerun.  With the
> default TESTS = 1024, the min skew sometimes goes negative on freefall,
> but with TESTS = 1024000 that doesn't happen.  This is the opposite
> of what I would expect.  freefall has load average about 1.
> 
> Removing the only fencing in it reduces average latency by 10-20 cycles
> and minimum latency by over 100 cycles, except on freefall it is
> reduced from 33 to 6.  On Haswell it is 24 with fencing and I didn't
> test it with no fencing.
> 
> I think tscskew doesn't really measure tsc skew.  What it measures is
> the time taken for a locking protocol, using the TSCs on different
> CPUs to make the start and end timestamps.  If the TSCs have a lot of
> skew or jitter, then this will show up indirectly as inconsistent and
> possibly negative differences.
> 
> A shift of just 1 can't hide latencies of hundreds of cycles on single-
> socket machines.  Even a shift of 8 only works sometimes, by reducing
> the chance of observing the TSC going backwards by a factor of 256.
> E.g., assume for simplicity that all instructions and IPCs take 0-1
> cycles, and that unfenced rdtsc's differ by at most +-5 cycles (with
> the 11 values between -5 and 5 uniformly distributed).  Then with a
> shift of 0 and no fences, a CPU that updates the timehands is ahead of
> another CPU that spins reading the timehands about 5/11 of the time.
> With a shift of 8, the CPUs are close enough when the first one reads
> at least 5 above and at least 5 below a 256-boundary.  The chance of
> seeing a negative difference is reduced by at least a factor of 10/256.
> 
> > I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> > Otherwise, I think, some multi-socket machines would start showing the
> > detectable backward-counting bintime().  At the frequencies at 4GHz and
> > above (Intel has 5Ghz part numbers) I do not think that stability of
> > 100MHz crystal and on-board traces is enough to avoid that.
> 
> Why would losing just 1 bit fix that?
> 
> Fences for rdtsc of course only serialize it for the CPU that runs it.
> The locking (ordering) protocol (for the generation count) orders the
> CPUs too.  It takes longer than we would like, much more than the 1-
> cycle error that might be hidden by ignoring the low bit.  Surely the
> ordering protocol must work across sockets?  It then gives ordering of
> rdtsc's.
> 
> TSC-low was added in 2011.  That was long before the ordering was fixed.
> You added fences in 2012 and memory ordering for the generation count in
> 2016.  Fences slowed everything down by 10-20+ cycles and probably hide
> bugs in the memory ordering better than TSC-low.  Memory ordering plus
> fences slow down the cross-core case by more than 100 cycles according
> to tscskew.  That is enough to hide large hardware bugs.
> 
> > We can try to set the tsc-low shift count to 0 (but keep lfence) and see
> > what is going on in HEAD, but I am afraid that the HEAD users population
> > is not representative enough to catch the issue with certainty.
> > More, it is unclear to me how to diagnose the cause, e.g. I would expect
> > the sleeps to hang on timeouts, as was reported from the very beginning
> > of this thread. How would we root-cause it ?
> 
> Negative time differences cause lots of overflows so break the timecounter.
> The fix under discussion actually gives larger overflows in the positive
> direction.  E.g., a delta of -1 first overflows to 0xffffffff.  The fix
> prevents overflow on multiplication by that.  When the timecounter
> frequency is small, say 1 MHz, 0xffffffff means 4294 seconds, so the
> timecounter advances by that.
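
The arithmetic in the quoted paragraph can be checked with a short
userland sketch.  It assumes a 32-bit counter with a full 0xffffffff
mask; counter_delta() and delta_to_seconds() are illustrative names,
not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Modular 32-bit subtraction: a counter read that is "behind" wraps around. */
static uint32_t
counter_delta(uint32_t now, uint32_t last)
{
	return (now - last);
}

/* Whole seconds represented by 'delta' ticks at 'freq_hz' ticks per second. */
static uint32_t
delta_to_seconds(uint32_t delta, uint32_t freq_hz)
{
	assert(freq_hz != 0);
	return (delta / freq_hz);
}
```

A read that is one tick behind gives counter_delta(999, 1000) ==
0xffffffff, and at 1 MHz delta_to_seconds(0xffffffff, 1000000) is 4294.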
> 
> >>> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> >>> since the whole shrdl calculation occurs in 32bit registers, and the result
> >>> is in %rax where top word is cleared by shrdl instruction automatically.
> >>> But the clearing is not required since result is unsigned int anyway.
> >>>
> >>> Disassembly of tsc_get_timecount_low() is very clear:
> >>>   0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
> >>>   0xffffffff806767e7 <+7>:     rdtsc
> >>>   0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
> >>> ...
> >>>   0xffffffff806767ed <+13>:    retq
> >>> (I removed frame manipulations).
> 
> I checked that all compilers still produce horrible code for the better
> source code 'return (rdtsc() >> (intptr_t)tc->tc_priv);'.  64-bit shifts
> are apparently pessimal for compatibility.  The above is written mostly
> in asm to avoid 2-5 extra instructions.
> 
> >>>> ...
> >>>> Similarly in bintime().
> >>> I merged two functions, finally.  Having to copy the same code is too
> >>> annoying for this change.
> 
> I strongly dislike the merge.
> 
> >>> So I verified that:
> >>> - there is no 64bit multiplication in the generated code, for i386 both
> >>>  for clang 7.0 and gcc 8.3;
> >>> - that everything is inlined, the only call from bintime/binuptime is
> >>>  the indirect call to get the timecounter value.
> >>
> >> I will have to fix it for compilers that I use.
> > Ok, I will add __inline.
> 
> That will make it fast enough, but still hard to read.
> 
> >>> +		*bt = *bts;
> >>> +		scale = th->th_scale;
> >>> +		delta = tc_delta(th);
> >>> +#ifdef _LP64
> >>> +		if (__predict_false(th->th_large_delta <= delta)) {
> >>> +			/* Avoid overflow for scale * delta. */
> >>> +			bintime_helper(bt, scale, delta);
> >>> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> >>> +		} else {
> >>> +			bintime_addx(bt, scale * delta);
> >>> +		}
> >>> +#else
> >>> +		/*
> >>> +		 * Use bintime_helper() unconditionally, since the fast
> >>> +		 * path in the above method is not so fast here, since
> >>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
> >>> +		 * available in hardware and emulating it using 2
> >>> +		 * 32 x 32 -> 64 bit multiplications uses code much
> >>> +		 * like that in bintime_helper().
> >>> +		 */
> >>> +		bintime_helper(bt, scale, delta);
> >>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> >>> +#endif
> >>
> >> Check that this method is really better.  Without this, the complicated
> >> part is about half as large and duplicating it is smaller than this
> >> version.
> > Better in what sense ?  I am fine with the C code, and asm code looks
> > good.
> 
> Better in terms of actually running significantly faster.  I fear the
> 32-bit method is actually slightly slower for the fast path.
> 
> >>> -	do {
> >>> -		th = timehands;
> >>> -		gen = atomic_load_acq_int(&th->th_generation);
> >>> -		*bt = th->th_bintime;
> >>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>> -		atomic_thread_fence_acq();
> >>> -	} while (gen == 0 || gen != th->th_generation);
> >>
> >> Duplicating this loop is much better than obfuscating it using inline
> >> functions.  This loop was almost duplicated (except for the delta
> >> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> >> 8 fflock ones).  Now it is only duplicated 16 times.
> > How did you count the 16 ?  I can see only 4 instances in the unpatched
> > kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> > touch ffclock until the patch is finalized.  After that, it would be
> > 1 instance for kernel and 1 for userspace.
> 
> Grep for the end condition in this loop.  There are actually 20 of these.
> I'm counting the loops and not the previously-simple scaling operation in
> it.  The scaling is indeed only done for 4 cases.  I prefer the 20 
> duplications (except I only want about 6 of the functions).  Duplication
> works even better for only 4 cases.
Ok, I merged these as well.  Now there are only four loops left in kernel.
I do not think that merging them is beneficial, since they have sufficiently
different bodies.

I disagree with your characterization of it as obfuscation, IMO it improves
the maintainability of the code by reducing number of places which need
careful inspection of the lock-less algorithm.

> 
> This should be written as a function call to 1 new function to replace
> the line with the overflowing multiplication.  The line is always the
> same, so the new function call can look like bintime_xxx(bt, th).
Again, please provide at least pseudocode of your preference.

The current patch is becoming too large already; I want to test/commit what
I already have, and I will need to split it for the commit.
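
For readers who want to try the overflow handling outside the kernel,
here is a minimal userland sketch of the split multiplication that
bintime_helper() in the patch below performs.  struct bt_sketch and the
bt_* functions are stand-ins written for this sketch; they mirror the
shape of struct bintime and bintime_addx() but are not the kernel
definitions.

```c
#include <assert.h>
#include <stdint.h>

struct bt_sketch {		/* stand-in for struct bintime */
	int64_t		sec;
	uint64_t	frac;	/* units of 2^-64 second */
};

/* Add x to the fraction, carrying into the seconds on wraparound. */
static void
bt_addx(struct bt_sketch *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Naive form: scale * delta silently wraps once it exceeds 64 bits. */
static void
bt_add_scaled_naive(struct bt_sketch *bt, uint64_t scale, uint32_t delta)
{
	bt_addx(bt, scale * delta);
}

/* Split form: multiply the high and low 32 bits of scale separately. */
static void
bt_add_scaled_split(struct bt_sketch *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	x = (scale >> 32) * delta;	/* both factors < 2^32: no overflow */
	bt->sec += x >> 32;		/* bits that belong to whole seconds */
	bt_addx(bt, x << 32);
	bt_addx(bt, (scale & 0xffffffff) * delta);
}
```

With a scale corresponding to a 1 MHz counter (UINT64_MAX / 1000000)
and a delta of 0xffffffff, the split form advances the seconds by 4294,
while the naive form loses the whole-second part entirely.  For small
deltas the two forms agree.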

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..7114a0e5219 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	uint64_t		th_large_delta;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
  * the comment in <sys/time.h> for a description of these 12 functions.
  */
 
-#ifdef FFCLOCK
-void
-fbclock_binuptime(struct bintime *bt)
+static __inline void
+bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (scale >> 32) * delta;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
+static __inline void
+binnouptime(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	unsigned int gen;
+	struct bintime *bts;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		bts = (struct bintime *)((vm_offset_t)th + off);
+		*bt = *bts;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		if (__predict_false(th->th_large_delta <= delta)) {
+			/* Avoid overflow for scale * delta. */
+			bintime_helper(bt, scale, delta);
+			bintime_addx(bt, (scale & 0xffffffff) * delta);
+		} else {
+			bintime_addx(bt, scale * delta);
+		}
+#else
+		/*
+		 * Use bintime_helper() unconditionally, since the fast
+		 * path in the above method is not so fast here, since
+		 * the 64 x 32 -> 64 bit multiplication is usually not
+		 * available in hardware and emulating it using 2
+		 * 32 x 32 -> 64 bit multiplications uses code much
+		 * like that in bintime_helper().
+		 */
+		bintime_helper(bt, scale, delta);
+		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
 
+static __inline void
+getbinnouptime(void *out, size_t out_size, u_int off)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		memcpy(out, (char *)th + off, out_size);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+
+#ifdef FFCLOCK
+void
+fbclock_binuptime(struct bintime *bt)
+{
+
+	binnouptime(bt, __offsetof(struct timehands, th_offset));
+}
+
 void
 fbclock_nanouptime(struct timespec *tsp)
 {
@@ -237,16 +293,8 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	binnouptime(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -270,100 +318,61 @@ fbclock_microtime(struct timeval *tvp)
 void
 fbclock_getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }
 
 void
 fbclock_getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }
 
 void
 fbclock_getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }
 
 void
 fbclock_getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }
 
 void
 fbclock_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 void
 fbclock_getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #else /* !FFCLOCK */
+
 void
 binuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	binnouptime(bt, __offsetof(struct timehands, th_offset));
 }
 
 void
@@ -387,16 +396,8 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	binnouptime(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -420,85 +421,53 @@ microtime(struct timeval *tvp)
 void
 getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }
 
 void
 getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }
 
 void
 getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }
 
 void
 getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }
 
 void
 getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 void
 getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #endif /* FFCLOCK */
 
@@ -514,15 +483,9 @@ getboottime(struct timeval *boottime)
 void
 getboottimebin(struct bintime *boottimebin)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*boottimebin = th->th_boottime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(boottimebin, sizeof(*boottimebin),
+	    __offsetof(struct timehands, th_boottime));
 }
 
 #ifdef FFCLOCK
@@ -1038,15 +1001,9 @@ getmicrotime(struct timeval *tvp)
 void
 dtrace_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 /*
@@ -1464,6 +1421,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = ((uint64_t)1 << 63) / scale;
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new
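
[Illustrative sketch, not part of the patch: a standalone userland version
of the overflow-avoiding path above. struct bt, bt_addx(), add_naive() and
add_split() are hypothetical stand-ins for struct bintime, bintime_addx()
and the two branches in the patch; the 33333333 Hz frequency used in the
note after the code is an assumed example value.]

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct bintime. */
struct bt {
	int64_t		sec;	/* whole seconds */
	uint64_t	frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Add a 64-bit fraction, carrying into sec on wrap-around. */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Old method: the product silently wraps once scale * delta >= 2^64. */
static void
add_naive(struct bt *bt, uint64_t scale, uint32_t delta)
{
	bt_addx(bt, scale * delta);
}

/*
 * Split method: write scale = hi * 2^32 + lo.  Then
 * scale * delta = (hi * delta) * 2^32 + lo * delta,
 * and each partial product fits in 64 bits.
 */
static void
add_split(struct bt *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x = (scale >> 32) * delta;

	bt->sec += x >> 32;		/* bits of x * 2^32 at or above 2^64 */
	bt_addx(bt, x << 32);		/* remaining bits of x * 2^32 */
	bt_addx(bt, (scale & 0xffffffff) * delta);
}
```

With scale computed the way tc_windup() does it for a 33333333 Hz counter,
(((uint64_t)1 << 63) / 33333333) * 2, and delta = 100000000 ticks (about 3
seconds), the true product is a little over 3 * 2^64. add_split() then
leaves sec == 3, while add_naive() leaves sec == 0 with the same frac, so
the whole seconds are lost without any indication.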

From owner-freebsd-hackers@freebsd.org  Wed Mar  6 21:03:53 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id ADDC415273FF
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Mar 2019 21:03:53 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic305-22.consmr.mail.ne1.yahoo.com
 (sonic305-22.consmr.mail.ne1.yahoo.com [66.163.185.148])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A21DA70560
 for <freebsd-hackers@freebsd.org>; Wed,  6 Mar 2019 21:03:52 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
X-YMail-OSG: ok9IFkgVM1mDn1NyROc6pAsNPoJjWLIQ1eZslNsDEbrtjuz3QthYjLnuT9m_imE
 x8kHcG0P3LT0gM_jJzyZfpMf_hLJKNikukaSwy1_JgGOmitIXNwkKV9MUYSJibh7_zNic79Ik_wH
 AsIejzs2Qvx17ShWTM43j5R3Z48XajS8.WZ4BR3rrnhu86bxqHfH89ssV3gXQcoqFUUMse4BOoEy
 2wnWtLflah9DxhzKkYynVsa8Hyc7zaRLti3OTgvI5D.oW8_flNX7gRWXITj6crJb3_yTadMYYHP1
 _YnQLgHpLTePtvbgrJmgAWzyQhtVctI1HHWtTZBlmiyq795aohDShVi0WML9z1aUnDczI2BhgIeJ
 WR7n1iVU8Pu1CJ7LVAYgpe5CNtuR1BQxXfwwCEhePGqMDxdqUSZ2rDXYynvNILC5M.QMrZQl4eJK
 FJoUwKdHfIefHQ2LpRxauxlI6wW9TSsN5eASnZsQGFJMNArscbxaZaBnMZBBcbNNec7o1nNhzi71
 g0HmscP41eaFZLmAUMqSZWC0tkmhomHm1ej30..gaVcO6lz.5AYV9ix7UOMNoH1XCiG_0Ksn6MFg
 y2TrGQRA5QgzfU6B7vAicY4KKN2ojD27hf19gDvhzp8niI8.uusz.bPknnt1ID6WBEQTY_cdUzck
 Jbt0D6FmbTPZEKo5_5Se1d5F3I4ZwLUhex38sw.h75AK.URMKakP5M36z5rPERZq6oKtadg_Yfh2
 sQq6.oV97g3HY6s3u2MQcRTlX6B83ztQ4.mwNeV8qOHhE_HQ3r.h_2CegvO9JwKbTAN9NmY8kqaR
 Qa9CLX5.hpuVQbmTFIA71rve6anB7DHQROaYsuAwPg8HjVfxN2lsI.qAZkJdtk7O1h8OjpNk6GI_
 0JHiOfJLR0UkGVuYRLSh51WGEaTsa4iylXSDbvFFAof7AlYUHAXjSO7.p4GRAunkU6V55oeap0S0
 vTVHZ87aN36JLhrIJyWArJVSjrcvF.mn6JCS4TeGnZpNfXMKBXRnJ6oANhyBmk1mR6vkg68b2WhZ
 0YwHYLpzFNh3C8Ean2uWb4G40WYMjFMa7xfQHO9K1mdd2_9vIeLgb99ckKkkOzCQl.Q--
Received: from sonic.gate.mail.ne1.yahoo.com by
 sonic305.consmr.mail.ne1.yahoo.com with HTTP; Wed, 6 Mar 2019 21:03:46 +0000
Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115])
 ([67.170.167.181])
 by smtp407.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID
 989e6db6d298809b3619be81032a35c7; 
 Wed, 06 Mar 2019 21:03:43 +0000 (UTC)
From: Mark Millard <marklmi@yahoo.com>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so
 far seem to avoid the stuck-sleeping issue [self-hosted
 buildworld/buildkernel completed]
Date: Wed, 6 Mar 2019 13:03:42 -0800
References: <B898BF60-2872-4FFC-AD72-A32591BC7D20@yahoo.com>
 <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
 <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>,
 Mark Millard via freebsd-hackers <freebsd-hackers@freebsd.org>
In-Reply-To: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>
Message-Id: <23683875-418E-4E48-BE26-01221EABC906@yahoo.com>
X-Mailer: Apple Mail (2.3445.102.3)
X-Rspamd-Queue-Id: A21DA70560
X-Spamd-Bar: +++
X-Spamd-Result: default: False [3.11 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[];
 FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3];
 TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net];
 DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject];
 FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[];
 MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com];
 ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048];
 FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.79)[0.791,0];
 MIME_GOOD(-0.10)[text/plain];
 IP_SCORE(1.31)[ip: (4.36), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00),
 country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.81)[0.808,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.71)[0.709,0];
 RCVD_IN_DNSWL_NONE(0.00)[148.185.163.66.list.dnswl.org : 127.0.5.0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Mar 2019 21:03:53 -0000

[I have a new observed maximum difference, having changed
the code to record such.]

On 2019-Mar-4, at 01:40, Mark Millard <marklmi at yahoo.com> wrote:

> [I did some testing of other figures than testing for < 0x10.]
>
> On 2019-Mar-3, at 13:23, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [So far the hack has been successful. Details given later
>> below.]
>>
>> On 2019-Mar-2, at 21:20, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> [This note goes in a different direction compared to my
>>> prior evidence report for overflows and the later activity
>>> that has been happening for it. This does *not* involve
>>> the patches associated with that report.]
>>>
>>> I view the following as an evidence-gathering hack:
>>> showing the change in behavior with the code changes,
>>> not as directly what FreeBSD should do for powerpc64.
>>> In code for defined(__powerpc64__) && defined(AIM)
>>> I freely use knowledge of the PowerMac G5 context
>>> instead of attempting general code.
>>>
>>> Also: the code is set up to record some information
>>> that I've been looking at via ddb. The recording is
>>> not part of what changes the behavior but I decided
>>> to show that code too.
>>>
>>> It is preliminary, but, so far, the hack has avoided
>>> buf*daemon* threads and pmac_thermal getting stuck
>>> sleeping (or, at least, far less frequently).
>>>
>>>
>>> The tbr-value hack:
>>>
>>> From what I see, the various G5 cores each have their tbr running at the
>>> same rate but have some offsets as far as the base time
>>> goes. cpu_mp_unleash does:
>>>
>>>      ap_awake = 1;
>>>
>>>      /* Provide our current DEC and TB values for APs */
>>>      ap_timebase = mftb() + 10;
>>>      __asm __volatile("msync; isync");
>>>
>>>      /* Let APs continue */
>>>      atomic_store_rel_int(&ap_letgo, 1);
>>>
>>>      platform_smp_timebase_sync(ap_timebase, 0);
>>>
>>> and machdep_ap_bootstrap does:
>>>
>>>      /*
>>>       * Set timebase as soon as possible to meet an implicit rendezvous
>>>       * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>>>       * sets timebase.
>>>       *
>>>       * Note that this is instrinsically racy and is only relevant on
>>>       * platforms that do not support better mechanisms.
>>>       */
>>>      platform_smp_timebase_sync(ap_timebase, 1);
>>>
>>>
>>> which attempts to set the tbrs appropriately.
>>>
>>> But on small scales of differences the various tbr
>>> values from different cpus end up not well ordered
>>> relative to time, synchronizes with, and the like.
>>> Only large enough differences can well indicate an
>>> ordering of interest.
>>>
>>> Note: tc->tc_get_timecount(tc) only provides the
>>> least significant 32 bits of the tbr value.
>>> th->th_offset_count is also 32 bits and based on
>>> truncated tbr values.
>>>
>>> So I made binuptime avoid finishing when it sees
>>> a small (<0x10) step backwards for a new
>>> tc->tc_get_timecount(tc) value vs. the existing
>>> th->th_offset_count value (values strongly tied
>>> to powerpc64 tbr values):
>>>
>>> . . . [old code omitted] . . .
>>>
>>> So far as I can tell, the FreeBSD code is not designed to deal
>>> with small differences in tc->tc_get_timecount(tc) not actually
>>> indicating a useful < vs. == vs. > ordering relation uniquely.
>>>
>>> (I make no claim that the hack is a proper way to deal with
>>> such.)
>>
>> I did a somewhat over 7 hours buildworld buildkernel on the
>> PowerMac G5. Overall the G5 has been up over 13 hours and
>> none of the buf*daemon* threads have gotten stuck sleeping.
>> Nor has pmac_thermal gotten stuck. Similarly for vnlru
>> and syncer: "top -HIStopid" still shows them all as
>> periodically active.
>>
>> Previously for this usefdt=1 context (with the modern
>> VM_MAX_KERNEL_ADDRESS), going more than a few minutes
>> without at least one of those threads getting stuck
>> sleeping was rare on the G5 (powerpc64 example).
>>
>> So this hack has managed to avoid finding sbinuptime()
>> in sleepq_timeout being less than the earlier (by call
>> structure/code sequencing) sbinuptime() in timercb that
>> led to the sleepq_timeout callout being called in the
>> first place.
>>
>> So in the sleepq_timeout callout's:
>>
>>       if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
>>               /*
>>                * The thread does not want a timeout (yet).
>>                */
>>       } else . . .
>>
>> td->td_sleeptimo > sbinuptime() ends up false now for small
>> enough original differences.
>>
>> This case does not set up another timeout, it just leaves the
>> thread stuck sleeping, no longer doing periodic activities.
>>
>> As it stands, what I did (presuming an appropriate definition
>> of "small differences in the problematical direction") should
>> leave this and other sbinuptime-using code with:
>>
>> td->td_sleeptimo <= sbinuptime()
>>
>> for what were originally "small" tbr value differences in the
>> problematical direction (in case other places require it in
>> some way).
>>
>> If, instead, just sleepq_timeout's test could allow for
>> some slop in the ordering, it could be a cheaper hack than
>> looping in binuptime.
>>
>> At this point I've no clue what a correct/efficient FreeBSD
>> design for allowing the sloppy match across tbr's for different
>> CPUs would be.
>
> Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried
> each of the following and they all failed:
>
> && tim_offset-tim_cnt<0x2
> && tim_offset-tim_cnt<0x4
> && tim_offset-tim_cnt<0x8
> && tim_offset-tim_cnt<0xc

I've now seen a difference of 0x11 that led to hung-up
threads, stuck sleeping.

> 0x2, 0x4, and 0x8 failed for the first boot attempt,
> almost immediately having stuck-in-sleep threads.
>
> 0xc seemed to be working for the first boot (including
> a buildworld buildkernel that did not have to rebuild
> much). But the 2nd boot attempt had a stuck-in-sleep
> thread by the time I logged in.
>
> By contrast, for:
>
> && tim_offset-tim_cnt<0x10
>
> I've not seen it fail so far, after many reboots, a full
> buildworld buildkernel, and running over 24 hours
> (that included the somewhat over 7 hours for build
> world buildkernel). But it might be that some boots
> would need a bigger figure.
>

During a ports-mgmt/poudriere-devel run I had some
threads hang in sleep when the code was based on
less than 0x10 differences. But I'd changed to
be recording the maximum "small difference in the
problematical direction" observed and so was able
to see that it got a:

0x11

difference.

The below is the newer code structure as far as what
is recorded. It already has 0x14 instead of 0x10 for
the bound it uses to control the loop. I omitted
#if 0 . . . #endif code that I'm not currently using.

#if defined(__powerpc64__) && defined(AIM)
void
binuptime(struct bintime *bt)
{
        struct timehands *th;
        u_int gen;

        u_int tim_cnt, tim_offset; // HACK!!! (for "small difference is problem direction loop")

        struct timecounter *tc; // HACK!!! (for recording other data for inspection via ddb)
        u_int tim_diff; // HACK!!!
        uint64_t scale_factor, diff_scaled; // HACK!!!

#if 1
        u_int tim_wrong_order_diff= 0u; // HACK!!!
        u_int max_wrong_order_diff= 0u; // HACK!!!
        u_int wrong_order_cnt=      0u; // HACK!!!
        u_int wrong_order_offset=   0u; // HACK!!!
#endif

        do {
                do { // HACK!!!
                    th=  timehands;
                    tc=  th->th_counter;
                    gen= atomic_load_acq_int(&th->th_generation);
                    tim_cnt=    tc->tc_get_timecount(tc);
                    tim_offset= th->th_offset_count;
#if 1
                    tim_wrong_order_diff= tim_offset-tim_cnt;
                    if (  tim_cnt<tim_offset
                       && tim_wrong_order_diff<0x100u
                       && max_wrong_order_diff<tim_wrong_order_diff
                       ) {
                        wrong_order_cnt=      tim_cnt;
                        wrong_order_offset=   tim_offset;
                        max_wrong_order_diff= tim_wrong_order_diff;
                    }
#endif
                } while (tim_cnt<tim_offset && tim_wrong_order_diff<0x14u); // HACK!!!
                *bt = th->th_offset;
                tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
                scale_factor= th->th_scale;
                diff_scaled= scale_factor * tim_diff;
                bintime_addx(bt, diff_scaled);
                atomic_thread_fence_acq();
        } while (gen == 0 || gen != th->th_generation);

#if 1
        // Uses direct-map addresses (mapping to the most signficant c being masked off).
        // Justin H. reported that some of the 0x0..0xff addresses were unused
        // and available. The 2 larger ranges that I observed to stay at zero
        // were 0x20..0x7f and 0xa..0xff --so that is what I limited myself to.
        if (*(volatile uint64_t*)0xc0000000000000b0<max_wrong_order_diff) { // HACK!!!
                *(volatile uint64_t*)0xc0000000000000a0= wrong_order_cnt;
                *(volatile uint64_t*)0xc0000000000000a8= wrong_order_offset;
                *(volatile uint64_t*)0xc0000000000000b0= max_wrong_order_diff;
        }
#endif
}
#else
. . .
#endif
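
[Illustration only, not part of the hack: a standalone sketch of why a
small backwards step matters. forward_delta() mirrors the
"(tim_cnt - tim_offset) & tc->tc_counter_mask" computation above, and
small_step_back() mirrors the inner loop's test; both function names and
the sample values in the note below are hypothetical.]

```c
#include <stdint.h>

/*
 * Forward distance from offset to cnt, modulo the counter mask.
 * When cnt is slightly *behind* offset, the unsigned subtraction
 * wraps and the result is close to the full counter range.
 */
static uint32_t
forward_delta(uint32_t cnt, uint32_t offset, uint32_t mask)
{
	return ((cnt - offset) & mask);
}

/*
 * Nonzero when cnt is behind offset by less than bound ticks,
 * i.e. the case the hack retries on instead of finishing.
 */
static int
small_step_back(uint32_t cnt, uint32_t offset, uint32_t bound)
{
	return (cnt < offset && offset - cnt < bound);
}
```

For a count 0x11 ticks behind the offset (say cnt 0xfef vs. offset 0x1000)
and a mask of 0xffffffff, forward_delta() yields 0xffffffef rather than a
small number. small_step_back() reports that case for a bound of 0x14 but
not for a bound of 0x10, which matches the observed 0x11 difference getting
past the earlier 0x10 bound.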

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


From owner-freebsd-hackers@freebsd.org  Thu Mar  7 12:22:23 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3F14152D5CB;
 Thu,  7 Mar 2019 12:22:23 +0000 (UTC)
 (envelope-from babupalit@gmail.com)
Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com
 [IPv6:2607:f8b0:4864:20::844])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 9DD33821E1;
 Thu,  7 Mar 2019 12:22:22 +0000 (UTC)
 (envelope-from babupalit@gmail.com)
Received: by mail-qt1-x844.google.com with SMTP id y4so16760213qtc.10;
 Thu, 07 Mar 2019 04:22:22 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:from:date:message-id:subject:to;
 bh=MkiqQeA1vGpih3llJog3SU2FoClkoQ+xZvBJkRhxkDY=;
 b=qo9RhNaZBEb8+79Shho0kFOhr0oO6NWgHqdQf3ra3dX9SrtXvZMd+MjQ6FlHsIVGWk
 6VLrzIYrjnDwKqp0lM7xZ+LfUy5oL3eR63wXu5I9Lmfy4l+qqcY7MZqlwXIjZ0q7+RxN
 OjUx1Wvw4nduiSzI39iDzGAixOz7BhDHdIXkhQuFEjvIc63w0jcZAii26o8lfwB4LbNz
 BUuwFgF9j7r3eaQ46xmdItxnCNcF4D5qpH2SNswuL25G3GFCm1sfTHCeuTFfp3IMBq4h
 XRpY/ejWMwYMXJn25Af71pj1ahp8mSQtsS955nrzlP6RXhxvfoel/vEuGTu0SU1Fv0tI
 FnfQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
 bh=MkiqQeA1vGpih3llJog3SU2FoClkoQ+xZvBJkRhxkDY=;
 b=MKJ00E+lXMyfiyOql9drEEVtBPL/2UJMGSEZmqNrgNTNKtaKGv9Qm0R2gwqh6DOQSD
 yRKkBBd6q1XjMvreZsgY1Kzjb6+2q9DUquwkHlZFeiB09HjOsHyVLP9RNEJSfFDZqeJ5
 06QW/YY2c+bgjpK0oKyRFv9mJ5LGHQaHodUdbKOj6yLUm3rTBY5lv+l6tnXwlR21dQcB
 xM+22BSMx0vNy6JtZuVsUVNuRBuF6lTf/ZnkTjh4UKapPVo3QZnFJyp26cS2IAxz9/Hv
 +R9tLb2kCn/vgvgV7qIaJ27NM8NQzKaaKl2MbXIzOUu305rP9kTiEJ2Lu0zkPQee5ieI
 akiQ==
X-Gm-Message-State: APjAAAXE752Z3KhbJ4g+sxh3X6gZyrPmlaM4iWu9IVZd2EJGyRteiYJg
 4F9PTI/yyQzyy/3kyWQo8mU+VHZlBMjjGRwPz83tTqVF
X-Google-Smtp-Source: APXvYqzwurFceC/4oWAMIDHLObqM6nb3VGXcdeLNtM567KpjjgEntEBPdmebp+v5STwIxnM3W67MSIgJ/AYQa7KkW3U=
X-Received: by 2002:ac8:1761:: with SMTP id u30mr9675836qtk.354.1551961340985; 
 Thu, 07 Mar 2019 04:22:20 -0800 (PST)
MIME-Version: 1.0
From: Arpan Palit <babupalit@gmail.com>
Date: Thu, 7 Mar 2019 17:52:09 +0530
Message-ID: <CAF3txfhyhKgq0YOhn_XsPKKA2emVVFdgn9hQuA-4pWxkjmeGFg@mail.gmail.com>
Subject: How to access external PHY on MDIO bus?
To: freebsd-drivers@freebsd.org, freebsd-hackers@freebsd.org
X-Rspamd-Queue-Id: 9DD33821E1
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org;
 dkim=pass header.d=gmail.com header.s=20161025 header.b=qo9RhNaZ;
 dmarc=pass (policy=none) header.from=gmail.com;
 spf=pass (mx1.freebsd.org: domain of babupalit@gmail.com designates
 2607:f8b0:4864:20::844 as permitted sender) smtp.mailfrom=babupalit@gmail.com
X-Spamd-Result: default: False [-3.00 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
 R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36];
 FREEMAIL_FROM(0.00)[gmail.com];
 MIME_GOOD(-0.10)[multipart/alternative,text/plain];
 TO_DN_NONE(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_ALL(0.00)[];
 NEURAL_HAM_SHORT(-0.48)[-0.478,0]; RCVD_TLS_LAST(0.00)[];
 DKIM_TRACE(0.00)[gmail.com:+]; RCPT_COUNT_TWO(0.00)[2];
 DMARC_POLICY_ALLOW(-0.50)[gmail.com,none];
 MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com];
 SUBJECT_ENDS_QUESTION(1.00)[];
 RCVD_IN_DNSWL_NONE(0.00)[4.4.8.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org
 : 127.0.5.0]; FROM_EQ_ENVFROM(0.00)[];
 MIME_TRACE(0.00)[0:+,1:+]; FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US];
 RCVD_COUNT_TWO(0.00)[2];
 IP_SCORE(-0.51)[ip: (2.26), ipnet: 2607:f8b0::/32(-2.72), asn: 15169(-2.05),
 country: US(-0.07)]; 
 DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 127.0.5.0]
X-Mailman-Approved-At: Thu, 07 Mar 2019 12:38:44 +0000
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Mar 2019 12:22:23 -0000

Hi,

I need to know how I can access a specific register offset in an external
PHY on FreeBSD. In Linux the equivalent routines are phy_read/phy_write,
which read/write a specific register and internally call the
mdiobus_read/mdiobus_write functions.
I can see that there is an mdio_readreg/mdio_writereg MDIO interface,
which is driven by the stack. What if a driver needs to do the same? Is
there an equivalent, or any other way to do that?

Thanks,
Arpan Palit

From owner-freebsd-hackers@freebsd.org  Thu Mar  7 14:31:41 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4DDC153112F;
 Thu,  7 Mar 2019 14:31:40 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 74F1388991;
 Thu,  7 Mar 2019 14:31:39 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 7BC2C3D92DB;
 Fri,  8 Mar 2019 01:31:32 +1100 (AEDT)
Date: Fri, 8 Mar 2019 01:31:30 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
cc: Bruce Evans <brde@optusnet.com.au>, Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190306172003.GD2492@kib.kiev.ua>
Message-ID: <20190308001005.M2756@besplex.bde.org>
References: <20190302105140.GC68879@kib.kiev.ua>
 <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org>
 <20190306172003.GD2492@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=GReyFr9QJwj15KPVhA0A:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 74F1388991
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.91 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[];
 NEURAL_HAM_SHORT(-0.91)[-0.914,0]
X-Mailman-Approved-At: Thu, 07 Mar 2019 16:29:06 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Mar 2019 14:31:41 -0000

On Wed, 6 Mar 2019, Konstantin Belousov wrote:

> On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
>> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
>>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>* ...
>> I strongly dislike the merge.
>>
>>>>> So I verified that:
>>>>> - there is no 64bit multiplication in the generated code, for i386 both
>>>>>  for clang 7.0 and gcc 8.3;
>>>>> - that everything is inlined, the only call from bintime/binuptime is
>>>>>  the indirect call to get the timecounter value.
>>>>
>>>> I will have to fix it for compilers that I use.
>>> Ok, I will add __inline.
>>
>> That will make it fast enough, but still hard to read.
>>
>>>>> +		*bt = *bts;
>>>>> +		scale = th->th_scale;
>>>>> +		delta = tc_delta(th);
>>>>> +#ifdef _LP64
>>>>> +		if (__predict_false(th->th_large_delta <= delta)) {
>>>>> +			/* Avoid overflow for scale * delta. */
>>>>> +			bintime_helper(bt, scale, delta);
>>>>> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
>>>>> +		} else {
>>>>> +			bintime_addx(bt, scale * delta);
>>>>> +		}
>>>>> +#else
>>>>> +		/*
>>>>> +		 * Use bintime_helper() unconditionally, since the fast
>>>>> +		 * path in the above method is not so fast here, since
>>>>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
>>>>> +		 * available in hardware and emulating it using 2
>>>>> +		 * 32 x 32 -> 64 bit multiplications uses code much
>>>>> +		 * like that in bintime_helper().
>>>>> +		 */
>>>>> +		bintime_helper(bt, scale, delta);
>>>>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
>>>>> +#endif
>>>>
>>>> Check that this method is really better.  Without this, the complicated
>>>> part is about half as large and duplicating it is smaller than this
>>>> version.
>>> Better in what sense?  I am fine with the C code, and asm code looks
>>> good.
>>
>> Better in terms of actually running significantly faster.  I fear the
>> 32-bit method is actually slightly slower for the fast path.

I checked that it is just worse.  Significantly slower and more complicated.

I wrote and ran a lot of timing benchmarks of various versions.  All
times in cycles on Haswell @4.08 GHz.  On i386 except where noted:

- the fastest case is when compiled by clang with the default of -O2.
   binuptime() in a loop then takes 34 cycles.  This is faster than possible
   for latency, since rdtsc alone has a latency of 24 cycles.  There must be
   several iterations of the loop running in parallel.

- the slowest case is when compiled by gcc-4.2.1 with my config of -Os.
   binuptime() in a loop then takes 116 cycles.  -Os does at least the
   following pessimization: use memcpy() for copying the 12-byte struct
   bintime.

- gcc-4.2.1 -O2 takes 74 cycles.  -O2 still does the following pessimization:
   do a 64 x 32 -> 64 bit multiplication after not noticing that the first
   operand has been reduced to 32 bits by a shift or mask.

The above tests were done with the final version.  The version which tested
alternatives used switch (method) and takes about 20 cycles longer for the
fastest version, presumably by defeating parallelism.  Times for various
methods:

- with clang -Os, about 54 cycles for the old method that allowed overflow,
   and the same for the version with the check of the overflow threshold
   (but with the threshold never reached), and 59 cycles for the branch-
   free method.  100-116 cycles with gcc-4.2.1 -Os, with the branch-free
   method taking 5-10 cycles longer.

- on amd64, only a couple of cycles faster (49-50 cycles in best cases),
   and gcc-4.2.1 only taking a couple of cycles longer.  The branch-free
   method still takes about 59 cycles so it is relatively worse.

In userland, using the syscall for clock_gettime(), the extra 5-10
cycles for the branch-free method are relatively insignificant.  They
amount to about 2 nanoseconds.  Other pessimizations are more significant.
Times for this syscall:
- amd64 now: 224 nsec (with gcc-4.2.1 -Os)
- i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os)
   even getpid(2) takes 280 nsec.  Add at least 140 more nsec for pae.
- i386 3+1: 224 nsec (with gcc 4.2.1 -Os)
- i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O).
- i386 4+4 nopae old library version of clock_gettime() compiled by
   clang: 29 nsec.

In some tests, the version with the branch was even a cycle or two faster.
In the tests, the branch was always perfectly predicted, so it costs nothing
except possibly by changing scheduling in an accidentally good way.  The
tests were too small to measure the cost of using branch prediction
resources.  I've never noticed a case where 1 more branch causes thrashing.

>>>>> -	do {
>>>>> -		th = timehands;
>>>>> -		gen = atomic_load_acq_int(&th->th_generation);
>>>>> -		*bt = th->th_bintime;
>>>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>>>> -		atomic_thread_fence_acq();
>>>>> -	} while (gen == 0 || gen != th->th_generation);
>>>>
>>>> Duplicating this loop is much better than obfuscating it using inline
>>>> functions.  This loop was almost duplicated (except for the delta
>>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
>>>> 8 ffclock ones).  Now it is only duplicated 16 times.
>>> How did you count the 16?  I can see only 4 instances in the unpatched
>>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
>>> touch ffclock until the patch is finalized.  After that, it would be
>>> 1 instance for kernel and 1 for userspace.
>>
>> Grep for the end condition in this loop.  There are actually 20 of these.
>> I'm counting the loops and not the previously-simple scaling operation in
>> it.  The scaling is indeed only done for 4 cases.  I prefer the 20
>> duplications (except I only want about 6 of the functions).  Duplication
>> works even better for only 4 cases.
> Ok, I merged these as well.  Now there are only four loops left in kernel.
> I do not think that merging them is beneficial, since they have sufficiently
> different bodies.

This is exactly what I don't want.
>
> I disagree with your characterization of it as obfuscation, IMO it improves
> the maintainability of the code by reducing the number of places which need
> careful inspection of the lock-less algorithm.

It makes the inspection and changes more difficult for each instance.
General functions are more difficult to work with since they need more
args to control them and can't be changed without affecting all callers.

In another thread, you didn't like similar churn for removing td args.
Here there isn't even a bug, since overflow only occurs when an invariant
is violated.

>> This should be written as a function call to 1 new function to replace
>> the line with the overflowing multiplication.  The line is always the
>> same, so the new function call can look like bintime_xxx(bt, th).
> Again, please provide at least pseudocode of your preference.

The following is a complete tested and benchmarked implementation, with a
couple more minor fixes:

XX Index: kern_tc.c
XX ===================================================================
XX --- kern_tc.c	(revision 344852)
XX +++ kern_tc.c	(working copy)
XX @@ -72,6 +72,7 @@
XX  	struct timecounter	*th_counter;
XX  	int64_t			th_adjustment;
XX  	uint64_t		th_scale;
XX +	u_int			th_large_delta;
XX  	u_int	 		th_offset_count;
XX  	struct bintime		th_offset;
XX  	struct bintime		th_bintime;

Improvement not already discussed: use a u_int limit for the u_int variable.

XX @@ -90,6 +91,7 @@
XX  static struct timehands th0 = {
XX  	.th_counter = &dummy_timecounter,
XX  	.th_scale = (uint64_t)-1 / 1000000,
XX +	.th_large_delta = 1000000,
XX  	.th_offset = { .sec = 1 },
XX  	.th_generation = 1,
XX  	.th_next = &th1

Fix not already discussed: th_large_delta was used in the dummy timehands
before it was initialized.  Static initialization to 0 gives fail-safe
behaviour and unintended exercising of the slow path.

The dummy timecounter has a low frequency, so its overflow threshold is
quite low.  I think it is not used even 1000000 times unless there is a
bug in the boot code, so it doesn't overflow in practice.  I did see
some strange crashes at boot time while testing this.

XX @@ -351,6 +353,26 @@
XX  	} while (gen == 0 || gen != th->th_generation);
XX  }
XX  #else /* !FFCLOCK */
XX +
XX +static __inline void
XX +bintime_adddelta(struct bintime *bt, struct timehands *th)

Only 1 utility function now.

XX +{
XX +	uint64_t scale, x;
XX +	u_int delta;
XX +
XX +	scale = th->th_scale;
XX +	delta = tc_delta(th);
XX +	if (__predict_false(delta >= th->th_large_delta)) {
XX +		/* Avoid overflow for scale * delta. */
XX +		x = (scale >> 32) * delta;
XX +		bt->sec += x >> 32;
XX +		bintime_addx(bt, x << 32);
XX +		bintime_addx(bt, (scale & 0xffffffff) * delta);

This is clearer with all the scaling code together.

I thought of renaming x to x95_32 to sort of document that it holds bits
95..32 in a component of the product.
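
The split can be checked in isolation.  The following is a minimal
userland sketch, not the kernel code: struct bt and the bt_* names are
stand-ins invented for illustration, and a 64-bit fraction with a carry
into the seconds field is assumed.  It adds scale * delta as two partial
products so that no bit above 63 is lost:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct bintime (assumption, not the kernel type). */
struct bt {
	int64_t		sec;
	uint64_t	frac;
};

/* Add the 64-bit binary fraction x to bt, carrying into sec on wraparound. */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t old;

	old = bt->frac;
	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Add scale * delta to bt using two partial products. */
static void
bt_add_scaled(struct bt *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	assert(bt != NULL);
	x = (scale >> 32) * delta;	/* bits 95..32 of the product */
	bt->sec += x >> 32;		/* bits 95..64 are whole seconds */
	bt_addx(bt, x << 32);		/* bits 63..32 go to the fraction */
	bt_addx(bt, (scale & 0xffffffff) * delta);	/* low partial product */
}
```

With scale = 2^64 / 10^6 (truncated) and delta = 3000000, this should give
2 whole seconds plus a fraction just under 1, where the plain 64-bit
product scale * delta would have dropped the seconds.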

XX +	} else {
XX +		bintime_addx(bt, scale * delta);
XX +	}
XX +}
XX +
XX  void
XX  binuptime(struct bintime *bt)
XX  {
XX @@ -361,7 +383,7 @@
XX  		th = timehands;
XX  		gen = atomic_load_acq_int(&th->th_generation);
XX  		*bt = th->th_offset;
XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
XX +		bintime_adddelta(bt, th);
XX  		atomic_thread_fence_acq();
XX  	} while (gen == 0 || gen != th->th_generation);
XX  }

This is the kind of non-churning change that I like.

The function name bintime_adddelta() isn't so good, but it is in the same
style as bintime_addx() where the names are worse.  bintime_addx() is global
so it needs a descriptive name more.  'delta' is more descriptive than 'x'
(x means a scalar and not a bintime).  The 'bintime' prefix is verbose.  It
should be bt, especially in non-global APIs.

XX @@ -394,7 +416,7 @@
XX  		th = timehands;
XX  		gen = atomic_load_acq_int(&th->th_generation);
XX  		*bt = th->th_bintime;
XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
XX +		bintime_adddelta(bt, th);
XX  		atomic_thread_fence_acq();
XX  	} while (gen == 0 || gen != th->th_generation);
XX  }
XX @@ -1464,6 +1486,7 @@
XX  	scale += (th->th_adjustment / 1024) * 2199;
XX  	scale /= th->th_counter->tc_frequency;
XX  	th->th_scale = scale * 2;
XX +	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
XX 
XX  	/*
XX  	 * Now that the struct timehands is again consistent, set the new

Clamp this to UINT_MAX now that it is stored in a u_int.
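
To see what the threshold works out to, here is a rough userland sketch.
It assumes that scale at this point in tc_windup() is about 2^63 divided
by the counter frequency (the adjustment term is ignored), and
large_delta_for() is a name invented for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: the overflow threshold for a counter of the given
 * frequency, computed the same way as th_large_delta in the patch but
 * with the adjustment term left out.
 */
static unsigned int
large_delta_for(uint64_t freq)
{
	uint64_t limit, scale;

	assert(freq != 0 && freq <= ((uint64_t)1 << 63));
	scale = ((uint64_t)1 << 63) / freq;	/* th_scale would be scale * 2 */
	limit = ((uint64_t)1 << 63) / scale;
	/* Clamp, since the result is stored in a u_int. */
	return (limit > UINT_MAX ? UINT_MAX : (unsigned int)limit);
}
```

For a 1 MHz counter this should give 1000000, the same value as the static
initializer for th0.  For frequencies above 2^32 Hz the quotient no longer
fits in a u_int, which is the case the clamp is there for.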

> The current patch has become too large already, I want to test/commit what
> I already have, and I will need to split it for the commit.

It was already too large.
>
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..7114a0e5219 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> ...
> @@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
>  * the comment in <sys/time.h> for a description of these 12 functions.
>  */
>
> -#ifdef FFCLOCK
> -void
> -fbclock_binuptime(struct bintime *bt)
> +static __inline void
> +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)

This name is not descriptive.

> +static __inline void
> +binnouptime(struct bintime *bt, u_int off)

This name is an example of further problems with the naming scheme.
The bintime_ prefix used above is verbose, but it is at least a prefix
and is in the normal bintime_ namespace.  Here the prefix is 'bin',
which is neither of these.  It means bintime_ again, but this duplicates
'time'.

If I liked churn, then I would have changed all names here long ago.
E.g.:
- bintime_ -> bt_, and use it consistently
- timecounter -> tc except for the timecounter public variable
- fb_ -> facebook_ -> /dev/null.  Er, fb_ -> fbt_ or -> ft_.
- bt -> btp when bt is a pointer.  You used bts for a struct in this patch
- unsigned int -> u_int.  I policed this in early timecounter code.
   You fixed some instances of this too.
- th_generation -> th_gen.

Bruce

From owner-freebsd-hackers@freebsd.org  Thu Mar  7 22:22:35 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF9B215280EE;
 Thu,  7 Mar 2019 22:22:34 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EACD675CB4;
 Thu,  7 Mar 2019 22:22:33 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x27MMMbY024576
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Fri, 8 Mar 2019 00:22:25 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x27MMMbY024576
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x27MMKjN024519;
 Fri, 8 Mar 2019 00:22:20 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 8 Mar 2019 00:22:20 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mark Millard <marklmi@yahoo.com>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190307222220.GK2492@kib.kiev.ua>
References: <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua>
 <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua>
 <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua>
 <20190305031010.I4610@besplex.bde.org>
 <20190306172003.GD2492@kib.kiev.ua>
 <20190308001005.M2756@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190308001005.M2756@besplex.bde.org>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Mar 2019 22:22:35 -0000

On Fri, Mar 08, 2019 at 01:31:30AM +1100, Bruce Evans wrote:
> On Wed, 6 Mar 2019, Konstantin Belousov wrote:
> 
> > On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
> >> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >* ...
> >> I strongly dislike the merge.
> >>
> >>>>> So I verified that:
> >>>>> - there is no 64bit multiplication in the generated code, for i386 both
> >>>>>  for clang 7.0 and gcc 8.3;
> >>>>> - that everything is inlined, the only call from bintime/binuptime is
> >>>>>  the indirect call to get the timecounter value.
> >>>>
> >>>> I will have to fix it for compilers that I use.
> >>> Ok, I will add __inline.
> >>
> >> That will make it fast enough, but still hard to read.
> >>
> >>>>> +		*bt = *bts;
> >>>>> +		scale = th->th_scale;
> >>>>> +		delta = tc_delta(th);
> >>>>> +#ifdef _LP64
> >>>>> +		if (__predict_false(th->th_large_delta <= delta)) {
> >>>>> +			/* Avoid overflow for scale * delta. */
> >>>>> +			bintime_helper(bt, scale, delta);
> >>>>> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> >>>>> +		} else {
> >>>>> +			bintime_addx(bt, scale * delta);
> >>>>> +		}
> >>>>> +#else
> >>>>> +		/*
> >>>>> +		 * Use bintime_helper() unconditionally, since the fast
> >>>>> +		 * path in the above method is not so fast here, since
> >>>>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
> >>>>> +		 * available in hardware and emulating it using 2
> >>>>> +		 * 32 x 32 -> 64 bit multiplications uses code much
> >>>>> +		 * like that in bintime_helper().
> >>>>> +		 */
> >>>>> +		bintime_helper(bt, scale, delta);
> >>>>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> >>>>> +#endif
> >>>>
> >>>> Check that this method is really better.  Without this, the complicated
> >>>> part is about half as large and duplicating it is smaller than this
> >>>> version.
> >>> Better in what sense?  I am fine with the C code, and asm code looks
> >>> good.
> >>
> >> Better in terms of actually running significantly faster.  I fear the
> >> 32-bit method is actually slightly slower for the fast path.
> 
> I checked that it is just worse.  Significantly slower and more complicated.
> 
> I wrote and ran a lot of timing benchmarks of various versions.  All
> times in cycles on Haswell @4.08 GHz.  On i386 except where noted:
> 
> - the fastest case is when compiled by clang with the default of -O2.
>    binuptime() in a loop then takes 34 cycles.  This is faster than possible
>    for latency, since rdtsc alone has a latency of 24 cycles.  There must be
>    several iterations of the loop running in parallel.
> 
> - the slowest case is when compiled by gcc-4.2.1 with my config of -Os.
>    binuptime() in a loop then takes 116 cycles.  -Os does at least the
>    following pessimization: use memcpy() for copying the 12-byte struct
>    bintime.
> 
> - gcc-4.2.1 -O2 takes 74 cycles.  -O2 still does the following pessimization:
>    do a 64 x 32 -> 64 bit multiplication after not noticing that the first
>    operand has been reduced to 32 bits by a shift or mask.
> 
> The above tests were done with the final version.  The version which tested
> alternatives used switch (method) and takes about 20 cycles longer for the
> fastest version, presumably by defeating parallelism.  Times for various
> methods:
> 
> - with clang -Os, about 54 cycles for the old method that allowed overflow,
>    and the same for the version with the check of the overflow threshold
>    (but with the threshold never reached), and 59 cycles for the branch-
>    free method.  100-116 cycles with gcc-4.2.1 -Os, with the branch-free
>    method taking 5-10 cycles longer.
> 
> - on amd64, only a couple of cycles faster (49-50 cycles in best cases),
>    and gcc-4.2.1 only taking a couple of cycles longer.  The branch-free
>    method still takes about 59 cycles so it is relatively worse.
> 
> In userland, using the syscall for clock_gettime(), the extra 5-10
> cycles for the branch-free method are relatively insignificant.  They
> amount to about 2 nanoseconds.  Other pessimizations are more significant.
> Times for this syscall:
> - amd64 now: 224 nsec (with gcc-4.2.1 -Os)
> - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os)
>    even getpid(2) takes 280 nsec.  Add at least 140 more nsec for pae.
> - i386 3+1: 224 nsec (with gcc 4.2.1 -Os)
> - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O).
> - i386 4+4 nopae old library version of clock_gettime() compiled by
>    clang: 29 nsec.
> 
> In some tests, the version with the branch was even a cycle or two faster.
> In the tests, the branch was always perfectly predicted, so it costs nothing
> except possibly by changing scheduling in an accidentally good way.  The
> tests were too small to measure the cost of using branch prediction
> resources.  I've never noticed a case where 1 more branch causes thrashing.
About testing such tight loops: there is a known phenomenon where Intel
CPUs give absurd times when code in the loop has unsuitable alignment.
The manifestation of the phenomenon is very surprising and hardly
controllable.  It is due to the way the CPU front-end prefetches blocks
of bytes for instruction decoding, and to the locations of jumps within
those blocks.

The only source explaining it is https://www.youtube.com/watch?v=IX16gcX4vDQ,
a talk by an Intel engineer.

> 
> >>>>> -	do {
> >>>>> -		th = timehands;
> >>>>> -		gen = atomic_load_acq_int(&th->th_generation);
> >>>>> -		*bt = th->th_bintime;
> >>>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>>>> -		atomic_thread_fence_acq();
> >>>>> -	} while (gen == 0 || gen != th->th_generation);
> >>>>
> >>>> Duplicating this loop is much better than obfuscating it using inline
> >>>> functions.  This loop was almost duplicated (except for the delta
> >>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> >>>> 8 ffclock ones).  Now it is only duplicated 16 times.
> >>> How did you count the 16?  I can see only 4 instances in the unpatched
> >>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> >>> touch ffclock until the patch is finalized.  After that, it would be
> >>> 1 instance for kernel and 1 for userspace.
> >>
> >> Grep for the end condition in this loop.  There are actually 20 of these.
> >> I'm counting the loops and not the previously-simple scaling operation in
> >> it.  The scaling is indeed only done for 4 cases.  I prefer the 20
> >> duplications (except I only want about 6 of the functions).  Duplication
> >> works even better for only 4 cases.
> > Ok, I merged these as well.  Now there are only four loops left in kernel.
> > I do not think that merging them is beneficial, since they have sufficiently
> > different bodies.
> 
> This is exactly what I don't want.
> >
> > I disagree with your characterization of it as obfuscation, IMO it improves
> > the maintainability of the code by reducing the number of places which need
> > careful inspection of the lock-less algorithm.
> 
> It makes the inspection and changes more difficult for each instance.
> General functions are more difficult to work with since they need more
> args to control them and can't be changed without affecting all callers.
> 
> In another thread, you didn't like similar churn for removing td args.
It is not similar.  I do valid refactoring there (in terms of that
thread; I do not like the term refactoring).  I eliminate a dozen instances
of a very intricate loop which implements a quite delicate lockless
algorithm.  Its trickiness can be illustrated by the fact that it is the
only valid use of thread_fence_acq() which cannot be replaced by load_acq()
(a similar case is present in sys/seq.h).
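
For readers not familiar with the pattern, a rough userland analogue in
C11 atomics follows.  It is a sketch only: struct snap and snap_read() are
invented names, the kernel uses its own atomic(9) primitives rather than
<stdatomic.h>, and strictly by the C11 rules the plain reads of a and b
would race with a concurrent writer.  The point is only the shape of the
reader loop:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timehands; not the kernel layout. */
struct snap {
	atomic_uint	gen;	/* 0 while a writer is updating a and b */
	uint64_t	a;
	uint64_t	b;
};

/* Copy a consistent (a, b) pair out of *s, retrying if a writer interfered. */
static void
snap_read(struct snap *s, uint64_t *a, uint64_t *b)
{
	unsigned int gen;

	assert(s != NULL && a != NULL && b != NULL);
	do {
		gen = atomic_load_explicit(&s->gen, memory_order_acquire);
		*a = s->a;
		*b = s->b;
		/* Order the data reads above before the re-read of gen below. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&s->gen, memory_order_relaxed));
}
```

The fence orders the data reads before the second read of gen.  An acquire
load of gen at that point would only order later accesses after it, and
would not keep the earlier data reads from moving past it, which is why it
cannot stand in for the fence.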

> Here there isn't even a bug, since overflow only occurs when an invariant
> is violated.
> 
> >> This should be written as a function call to 1 new function to replace
> >> the line with the overflowing multiplication.  The line is always the
> >> same, so the new function call can look like bintime_xxx(bt, th).
> > Again, please provide at least pseudocode of your preference.
> 
> The following is a complete tested and benchmarked implementation, with a
> couple more minor fixes:
> 
> XX Index: kern_tc.c
> XX ===================================================================
> XX --- kern_tc.c	(revision 344852)
> XX +++ kern_tc.c	(working copy)
> XX @@ -72,6 +72,7 @@
> XX  	struct timecounter	*th_counter;
> XX  	int64_t			th_adjustment;
> XX  	uint64_t		th_scale;
> XX +	u_int			th_large_delta;
> XX  	u_int	 		th_offset_count;
> XX  	struct bintime		th_offset;
> XX  	struct bintime		th_bintime;
> 
> Improvement not already discussed: use a u_int limit for the u_int variable.
> 
> XX @@ -90,6 +91,7 @@
> XX  static struct timehands th0 = {
> XX  	.th_counter = &dummy_timecounter,
> XX  	.th_scale = (uint64_t)-1 / 1000000,
> XX +	.th_large_delta = 1000000,
> XX  	.th_offset = { .sec = 1 },
> XX  	.th_generation = 1,
> XX  	.th_next = &th1
> 
> Fix not already discussed: th_large_delta was used in the dummy timehands
> before it was initialized.  Static initialization to 0 gives fail-safe
> behaviour and unintended exercising of the slow path.
> 
> The dummy timecounter has a low frequency, so its overflow threshold is
> quite low.  I think it is not used even 1000000 times unless there is a
> bug in the boot code, so it doesn't overflow in practice.  I did see
> some strange crashes at boot time while testing this.
> 
> XX @@ -351,6 +353,26 @@
> XX  	} while (gen == 0 || gen != th->th_generation);
> XX  }
> XX  #else /* !FFCLOCK */
> XX +
> XX +static __inline void
> XX +bintime_adddelta(struct bintime *bt, struct timehands *th)
> 
> Only 1 utility function now.
And in my patch this helper function is called only once, so I
inlined it manually.

> 
> XX +{
> XX +	uint64_t scale, x;
> XX +	u_int delta;
> XX +
> XX +	scale = th->th_scale;
> XX +	delta = tc_delta(th);
> XX +	if (__predict_false(delta >= th->th_large_delta)) {
> XX +		/* Avoid overflow for scale * delta. */
> XX +		x = (scale >> 32) * delta;
> XX +		bt->sec += x >> 32;
> XX +		bintime_addx(bt, x << 32);
> XX +		bintime_addx(bt, (scale & 0xffffffff) * delta);
> 
> This is clearer with all the scaling code together.
> 
> I thought of renaming x to x95_32 to sort of document that it holds bits
> 95..32 in a component of the product.
> 
> XX +	} else {
> XX +		bintime_addx(bt, scale * delta);
> XX +	}
> XX +}
> XX +
> XX  void
> XX  binuptime(struct bintime *bt)
> XX  {
> XX @@ -361,7 +383,7 @@
> XX  		th = timehands;
> XX  		gen = atomic_load_acq_int(&th->th_generation);
> XX  		*bt = th->th_offset;
> XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
> XX +		bintime_adddelta(bt, th);
> XX  		atomic_thread_fence_acq();
> XX  	} while (gen == 0 || gen != th->th_generation);
> XX  }
> 
> This is the kind of non-churning change that I like.
Ok.  I made all cases where timehands are read more uniform, by
moving the calculations after the generation loop.  This makes the
atomic part of the functions easier to see, and the loop body has a lower
chance of hitting a generation reset.

> 
> The function name bintime_adddelta() isn't so good, but it is in the same
> style as bintime_addx() where the names are worse.  bintime_addx() is global
> so it needs a descriptive name more.  'delta' is more descriptive than 'x'
> (x means a scalar and not a bintime).  The 'bintime' prefix is verbose.  It
> should be bt, especially in non-global APIs.
> 
> XX @@ -394,7 +416,7 @@
> XX  		th = timehands;
> XX  		gen = atomic_load_acq_int(&th->th_generation);
> XX  		*bt = th->th_bintime;
> XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
> XX +		bintime_adddelta(bt, th);
> XX  		atomic_thread_fence_acq();
> XX  	} while (gen == 0 || gen != th->th_generation);
> XX  }
> XX @@ -1464,6 +1486,7 @@
> XX  	scale += (th->th_adjustment / 1024) * 2199;
> XX  	scale /= th->th_counter->tc_frequency;
> XX  	th->th_scale = scale * 2;
> XX +	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
> XX 
> XX  	/*
> XX  	 * Now that the struct timehands is again consistent, set the new
> 
> Clamp this to UINT_MAX now that it is stored in a u_int.
> 
> > The current patch has become too large already, I want to test/commit what
> > I already have, and I will need to split it for the commit.
> 
> It was already too large.
> >
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..7114a0e5219 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > @@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
> >  * the comment in <sys/time.h> for a description of these 12 functions.
> >  */
> >
> > -#ifdef FFCLOCK
> > -void
> > -fbclock_binuptime(struct bintime *bt)
> > +static __inline void
> > +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
> 
> This name is not descriptive.
> 
> > +static __inline void
> > +binnouptime(struct bintime *bt, u_int off)
> 
> This name is an example of further problems with the naming scheme.
> The bintime_ prefix used above is verbose, but it is at least a prefix
> and is in the normal bintime_ namespace.  Here the prefix is 'bin',
> which is neither of these.  It means bintime_ again, but this duplicates
> 'time'.
I agree, and I chose the name getthmember for the other function, which
clearly reflects its operation.  For this one, I ended up with bintime_off().

> 
> If I liked churn, then I would have changed all names here long ago.
> E.g.:
> - bintime_ -> bt_, and use it consistently
> - timecounter -> tc except for the timecounter public variable
> - fb_ -> facebook_ -> /dev/null.  Er, fb_ -> fbt_ or -> ft_.
> - bt -> btp when bt is a pointer.  You used bts for a struct in this patch
> - unsigned int -> u_int.  I policed this in early timecounter code.
>    You fixed some instances of this too.
> - th_generation -> th_gen.

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..8d12847f2cd 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	u_int			th_large_delta;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -90,6 +91,7 @@ static struct timehands th1 = {
 static struct timehands th0 = {
 	.th_counter = &dummy_timecounter,
 	.th_scale = (uint64_t)-1 / 1000000,
+	.th_large_delta = 1000000,
 	.th_offset = { .sec = 1 },
 	.th_generation = 1,
 	.th_next = &th1
@@ -200,20 +202,56 @@ tc_delta(struct timehands *th)
  * the comment in <sys/time.h> for a description of these 12 functions.
  */
 
-#ifdef FFCLOCK
-void
-fbclock_binuptime(struct bintime *bt)
+static __inline void
+bintime_off(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	unsigned int gen;
+	struct bintime *btp;
+	uint64_t scale, x;
+	u_int delta, gen, large_delta;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		btp = (struct bintime *)((vm_offset_t)th + off);
+		*bt = *btp;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		large_delta = th->th_large_delta;
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
+
+	if (__predict_false(delta >= large_delta)) {
+		/* Avoid overflow for scale * delta. */
+		x = (scale >> 32) * delta;
+		bt->sec += x >> 32;
+		bintime_addx(bt, x << 32);
+		bintime_addx(bt, (scale & 0xffffffff) * delta);
+	} else {
+		bintime_addx(bt, scale * delta);
+	}
+}
+
+static __inline void
+getthmember(void *out, size_t out_size, u_int off)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		memcpy(out, (char *)th + off, out_size);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+
+#ifdef FFCLOCK
+void
+fbclock_binuptime(struct bintime *bt)
+{
+
+	bintime_off(bt, __offsetof(struct timehands, th_offset));
 }
 
 void
@@ -237,16 +275,8 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	bintime_off(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -270,100 +300,61 @@ fbclock_microtime(struct timeval *tvp)
 void
 fbclock_getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }
 
 void
 fbclock_getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }
 
 void
 fbclock_getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }
 
 void
 fbclock_getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }
 
 void
 fbclock_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 void
 fbclock_getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #else /* !FFCLOCK */
+
 void
 binuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	bintime_off(bt, __offsetof(struct timehands, th_offset));
 }
 
 void
@@ -387,16 +378,8 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	bintime_off(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -420,85 +403,53 @@ microtime(struct timeval *tvp)
 void
 getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }
 
 void
 getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }
 
 void
 getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
+	struct bintime bt;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }
 
 void
 getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }
 
 void
 getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 void
 getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #endif /* FFCLOCK */
 
@@ -514,15 +465,9 @@ getboottime(struct timeval *boottime)
 void
 getboottimebin(struct bintime *boottimebin)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*boottimebin = th->th_boottime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(boottimebin, sizeof(*boottimebin),
+	    __offsetof(struct timehands, th_boottime));
 }
 
 #ifdef FFCLOCK
@@ -1038,15 +983,9 @@ getmicrotime(struct timeval *tvp)
 void
 dtrace_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }
 
 /*
@@ -1464,6 +1403,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new

From owner-freebsd-hackers@freebsd.org  Fri Mar  8 01:30:04 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D94C6152DFD0
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri,  8 Mar 2019 01:30:03 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
Received: from sonic309-22.consmr.mail.ne1.yahoo.com
 (sonic309-22.consmr.mail.ne1.yahoo.com [66.163.184.148])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6A57A847A6
 for <freebsd-hackers@freebsd.org>; Fri,  8 Mar 2019 01:30:03 +0000 (UTC)
 (envelope-from marklmi@yahoo.com)
X-YMail-OSG: z7zvHfUVM1mQmMyjx4YL2r07_97JyYrstZve3Kprd7hdb8Zf3FX97lkWdYwxOem
 K2rTM9be1lSJH0yuEcOwo5Y_k8ea01xm6t23GwJ7ygwcqU2OX7MfiRkd0BE_855EnA9xkvZzrzC_
 hZIEB7PU9iMioc9RmA06Q51xjPHwK8HRwodgvDtbmjrOrSHA9hYMdbQO_leLQhqk_3mnAPRhCUK9
 rg0TX3rXjrlUFb0xy32FJ2ta2UO4Zhlal63JsUZlhIqwr6YrZY8R7L2xM7bWPioYv8i59NaZtYAz
 kRMLHnqYtjS.U0qigzQ6Lwvf0_w37ZXjOcGhiOes9KFNj96rc4bE3_XQs9MqaRLzeVWYSAatz4tQ
 wca8vd6IZzAMs3xkUv1Ul4v7gcZzMiC_95BAYGo6kA9sN7z6AgsfTx_lxt7neFgthSyc81OZ2EqI
 lDsLQOT3Tl3LLqW87xR0kXAUdE9NTZyW7y1B4NNK8RMGnwFeXJec1guAf7RQ.55K89Do.zv9dUEA
 kBuOK_QvLZ3JkhKpISbVSEOzSFdL5mzHaOsVGMYwBgfGgAVZ.6ZMHDoMVnf5pYY7P_eEv9.XNNsG
 8b3QGI7KlOksdMqNie3QfnFb_SMMWVAcSgqWxxpY4f2cYErrY4XJETCph4K4nIzplxe9fgYj30_7
 QjhqetzjHOmYVqy4cx8S4LTDT57GfoBd3kRLY.6YVXSjEb1WvjY6LHnnFgggVrhAvSXA.U24EJxE
 F8VRxFRy04MVERSirC7mn6I0X0kBZ6mjrfRV9ZFdWJFpJ5hSavjOOR6s4.9nebI9iBjy2C51Y3i9
 NmA0sT6_1315MSiC156DRUL1MU0t7GGSUCbWyvv8__d0wPatERnPbZqpMzbWvV1IfxJd06pFEvFx
 fr0X30.4Ga8AeYR_g6jg7F3r2_Wuf7yEz9Sl7536KbCPGZFLnUR.fttF1p6yrL0pz_PzJzSZIe1R
 VCkXhztBy3VMjKLyGCPFQ8U4MjMTT3B34beonXdJXa_44xixvLGSGrNnVt.7moNBKXiRGrRkzDOO
 F6Wgpyj565EcsXqeCr8E_H5H7qF5jcT1zzv9ZeZxaTu6vPZNEk03Yp9L4NxU5U8sbmOHTfg--
Received: from sonic.gate.mail.ne1.yahoo.com by
 sonic309.consmr.mail.ne1.yahoo.com with HTTP; Fri, 8 Mar 2019 01:29:56 +0000
Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113])
 ([67.170.167.181])
 by smtp428.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID
 f665bae4c52bbab61751dd46e593eb0e; 
 Fri, 08 Mar 2019 01:29:53 +0000 (UTC)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale *
 tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
From: Mark Millard <marklmi@yahoo.com>
In-Reply-To: <20190307222220.GK2492@kib.kiev.ua>
Date: Thu, 7 Mar 2019 17:29:51 -0800
Cc: Bruce Evans <brde@optusnet.com.au>,
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Content-Transfer-Encoding: 7bit
Message-Id: <5EED3352-2E8C-4BEE-B281-4AC8DE9570C2@yahoo.com>
References: <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua>
 <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua>
 <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua>
 <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua>
 <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua>
To: Konstantin Belousov <kostikbel@gmail.com>
X-Mailer: Apple Mail (2.3445.102.3)
X-Rspamd-Queue-Id: 6A57A847A6
X-Spamd-Bar: ------
X-Spamd-Result: default: False [-6.98 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[];
 NEURAL_HAM_SHORT(-0.98)[-0.977,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Mar 2019 01:30:04 -0000

A basic question and a small note.

Context for the question, regarding tc->tc_get_timecount(tc) values:

In the powerpc64 context tc->tc_get_timecount(tc) is the lower
32 bits of the tbr, in my context having a 33,333,333 Hz or so
increment rate for a machine with a 2.5 GHz or so clock rate.
The truncated 32 bit tbr value wraps every 128 seconds or so.
2 sockets, 2 cores per socket, so 4 separate tbr values.

The question is . . .

In tc_delta's:

    tc->tc_get_timecount(tc) - th->th_offset_count

is observing tc->tc_get_timecount(tc) < th->th_offset_count
ever supposed to be possible in correct operation, other than
tc->tc_get_timecount(tc) having wrapped around (and so being
newly 0 or "near" 0; no evidence of it having been near
128 seconds or more in my context)?
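
As a minimal sketch of the subtraction in question, assuming (as
tc_delta() appears to do) that it is carried out in unsigned 32-bit
arithmetic and masked with the counter mask: a single wrap of the
counter still yields the right delta, while a reading even one tick
behind th_offset_count yields a delta just under 2^32 ticks, roughly
128.8 seconds at 33,333,333 Hz. The names delta_since and
delta_since_selfcheck are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for a tc_delta()-style computation.
 * counter_mask is 0xffffffff for the truncated 32-bit tbr.
 */
static uint32_t
delta_since(uint32_t now, uint32_t offset_count, uint32_t counter_mask)
{
	/* The mask must have the form 2^n - 1. */
	assert((counter_mask & (uint32_t)(counter_mask + 1)) == 0);

	/* Unsigned subtraction is modulo 2^32, so one wrap is harmless. */
	return ((uint32_t)(now - offset_count) & counter_mask);
}

static void
delta_since_selfcheck(void)
{
	/* Counter wrapped between the two readings: delta is still 0x20. */
	assert(delta_since(0x00000010, 0xfffffff0, 0xffffffff) == 0x20);

	/* Reading one tick "in the past": delta looks like almost 2^32. */
	assert(delta_since(99, 100, 0xffffffff) == 0xffffffff);
}
```

So without a wrap, now < offset_count can only show up as a very large
delta, which is the case that feeds the scale * delta overflow.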


The note:

On 2019-Mar-7, at 14:22, Konstantin Belousov <kostikbel@gmail.com> wrote:

> . . .
> +
> +	if (__predict_false(delta < large_delta)) {

I thought that delta<large_delta was the non-overflow context
for scale*delta and that the overflow case for the multiplication
was when delta>=large_delta .

> +		/* Avoid overflow for scale * delta. */
> +		x = (scale >> 32) * delta;
> +		bt->sec += x >> 32;
> +		bintime_addx(bt, x << 32);
> +		bintime_addx(bt, (scale & 0xffffffff) * delta);
> +	} else {
> +		bintime_addx(bt, scale * delta);
> +	}
> . . .
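
A minimal, self-contained sketch of the split multiplication, with the
test written as delta >= large_delta so that the large-delta case takes
the overflow-safe path. The names bt_sketch, bt_addx and bt_add_scaled
are stand-ins for struct bintime and bintime_addx(), and large_delta is
assumed to be about 2^64 / scale; none of this is the committed code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct bintime: seconds plus a 64-bit binary fraction. */
struct bt_sketch {
	int64_t		sec;
	uint64_t	frac;
};

/* Add x to the fraction, carrying into the seconds on wrap. */
static void
bt_addx(struct bt_sketch *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Add scale * delta (in units of 2^-64 s) without losing high bits. */
static void
bt_add_scaled(struct bt_sketch *bt, uint64_t scale, uint32_t delta,
    uint32_t large_delta)
{
	uint64_t x;

	if (delta >= large_delta) {
		/* scale * delta may exceed 64 bits: multiply in halves. */
		x = (scale >> 32) * delta;	/* high half, units of 2^-32 s */
		bt->sec += x >> 32;
		bt_addx(bt, x << 32);
		bt_addx(bt, (scale & 0xffffffff) * delta);
	} else {
		/* Product is known to fit in 64 bits. */
		bt_addx(bt, scale * delta);
	}
}

static void
bt_add_scaled_selfcheck(void)
{
	struct bt_sketch safe = { 0, 0 }, naive = { 0, 0 };
	uint64_t half = (uint64_t)1 << 63;	/* 0.5 s per tick */

	/* 5 ticks of 0.5 s: the split path gives 2.5 s. */
	bt_add_scaled(&safe, half, 5, 2);
	assert(safe.sec == 2 && safe.frac == half);

	/* Forcing the plain multiply drops the two whole seconds. */
	bt_add_scaled(&naive, half, 5, UINT32_MAX);
	assert(naive.sec == 0 && naive.frac == half);
}
```

With the comparison the other way around, the large-delta case would
fall through to the plain multiply and lose the whole seconds, as the
second check shows.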

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


From owner-freebsd-hackers@freebsd.org  Fri Mar  8 23:36:24 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 39C0C152DF4C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri,  8 Mar 2019 23:36:24 +0000 (UTC)
 (envelope-from darius@dons.net.au)
Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net
 [150.101.137.131])
 by mx1.freebsd.org (Postfix) with ESMTP id 2F7F891D23
 for <freebsd-hackers@freebsd.org>; Fri,  8 Mar 2019 23:36:14 +0000 (UTC)
 (envelope-from darius@dons.net.au)
Received: from ppp118-210-135-201.adl-adc-lon-bras33.tpg.internode.on.net
 (HELO midget.dons.net.au) ([118.210.135.201])
 by ipmail07.adl2.internode.on.net with ESMTP; 09 Mar 2019 10:00:53 +1030
Received: from midget.dons.net.au (localhost [127.0.0.1])
 by midget.dons.net.au (8.15.2/8.15.2) with ESMTPS id x28NUgQt013125
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-hackers@freebsd.org>; Sat, 9 Mar 2019 10:00:47 +1030 (ACDT)
 (envelope-from darius@dons.net.au)
Received: (from mailnull@localhost)
 by midget.dons.net.au (8.15.2/8.15.2/Submit) id x28N8fHB094865
 for <freebsd-hackers@freebsd.org>; Sat, 9 Mar 2019 09:38:41 +1030 (ACDT)
 (envelope-from darius@dons.net.au)
X-Authentication-Warning: midget.dons.net.au: mailnull set sender to
 <darius@dons.net.au> using -f
Received: from [10.0.2.26] ([10.0.2.26])
 by ns.dons.net.au (envelope-sender <darius@dons.net.au>) (MIMEDefang) with
 ESMTP id x28N8fq1094864; Sat, 09 Mar 2019 09:38:41 +1030
From: "O'Connor, Daniel" <darius@dons.net.au>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Date: Sat, 9 Mar 2019 09:38:40 +1030
Subject: USB stack getting confused
Message-Id: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
To: FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-Mailer: Apple Mail (2.3445.102.3)
X-Spam-Score: -1 () No,
 score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable
 autolearn_force=no version=3.4.1
X-Scanned-By: MIMEDefang 2.83 on 10.0.2.1
X-Rspamd-Queue-Id: 2F7F891D23
X-Spamd-Bar: +++++
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [5.16 / 15.00]; MV_CASE(0.50)[];
 HAS_XAW(0.00)[]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[midget.dons.net.au]; RCVD_NO_TLS_LAST(0.10)[];
 RECEIVED_SPAMHAUS_PBL(0.00)[201.135.210.118.zen.spamhaus.org : 127.0.0.11];
 RCVD_IN_DNSWL_LOW(-0.10)[131.137.101.150.list.dnswl.org : 127.0.5.1];
 R_DKIM_NA(0.00)[];
 ASN(0.00)[asn:4739, ipnet:150.101.0.0/16, country:AU];
 MIME_TRACE(0.00)[0:+]; MID_RHS_MATCH_FROM(0.00)[];
 FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[];
 RCVD_COUNT_FIVE(0.00)[5];
 IP_SCORE(0.80)[ip: (2.65), ipnet: 150.101.0.0/16(1.08), asn: 4739(0.33),
 country: AU(-0.04)]; FROM_HAS_DN(0.00)[];
 TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_SPAM_SHORT(0.97)[0.970,0];
 MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org];
 AUTH_NA(1.00)[]; NEURAL_SPAM_MEDIUM(1.00)[1.000,0];
 RCPT_COUNT_ONE(0.00)[1]; DMARC_NA(0.00)[dons.net.au];
 NEURAL_SPAM_LONG(1.00)[1.000,0]; R_SPF_NA(0.00)[]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Mar 2019 23:36:24 -0000

Hi,
I'm developing a data acquisition system on FreeBSD using a USB3
interface (the OrangeTree ZestSC3) and I find that the USB stack appears
to 'lose' the device after a while.

My program normally runs continually doing acquisitions of data for N
seconds, doing some checks and restarting. After a while (~30 1 minute
acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
uses libusb10) any more (it reconnects each acquisition for $REASONS).
Also pretty weirdly usbconfig can't see it either(!).

If I stop my program the device reappears in usbconfig. If I restart my
program it works.

I did some GDB'ing and it appears that ugen20_enumerate (the libusb10
interface is implemented by calling libusb20 functions) can't open
/dev/ugenX.Y and errno is 12 (ENOMEM).

After digging with dtrace I have seen the open method be something
different for this device. I have also seen it where opening the device
doesn't call usb_fifo_open (not sure what it *does* call though - I see
user land call openat but haven't traced through what gets called).

I'm still digging but am somewhat hopeful someone can suggest some
things to look at :)

This is on 11.2 if it matters.

Thanks.

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum


From owner-freebsd-hackers@freebsd.org  Sat Mar  9 09:01:23 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9D8D153DCAC
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 09:01:23 +0000 (UTC) (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (turbocat.net [IPv6:2a01:4f8:c17:6c4b::2])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B018F758CA
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 09:01:22 +0000 (UTC)
 (envelope-from hps@selasky.org)
Received: from hps2016.home.selasky.org (unknown [176.74.212.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id F368B260209;
 Sat,  9 Mar 2019 10:01:18 +0100 (CET)
Subject: Re: USB stack getting confused
To: "O'Connor, Daniel" <darius@dons.net.au>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
From: Hans Petter Selasky <hps@selasky.org>
Message-ID: <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
Date: Sat, 9 Mar 2019 10:00:56 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: B018F758CA
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of hps@selasky.org designates
 2a01:4f8:c17:6c4b::2 as permitted sender) smtp.mailfrom=hps@selasky.org
X-Spamd-Result: default: False [-5.86 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+a:mail.turbocat.net];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 DMARC_NA(0.00)[selasky.org]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mail.turbocat.net];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.90)[-0.895,0];
 IP_SCORE(-2.66)[ip: (-8.73), ipnet: 2a01:4f8::/29(-2.31), asn: 24940(-2.23),
 country: DE(-0.01)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:24940, ipnet:2a01:4f8::/29, country:DE];
 MID_RHS_MATCH_FROM(0.00)[]; RCVD_TLS_ALL(0.00)[];
 RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 09:01:23 -0000

On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).

What is printed in dmesg? Maybe the device has a problem.

--HPS

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 10:36:15 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9DD631540E58
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 10:36:15 +0000 (UTC)
 (envelope-from darius@dons.net.au)
Received: from ipmail02.adl2.internode.on.net (ipmail02.adl2.internode.on.net
 [150.101.137.139])
 by mx1.freebsd.org (Postfix) with ESMTP id 3E0308115D
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 10:36:11 +0000 (UTC)
 (envelope-from darius@dons.net.au)
Received: from 124-148-131-52.dyn.iinet.net.au (HELO midget.dons.net.au)
 ([124.148.131.52])
 by ipmail02.adl2.internode.on.net with ESMTP; 09 Mar 2019 21:00:56 +1030
Received: from midget.dons.net.au (localhost [127.0.0.1])
 by midget.dons.net.au (8.15.2/8.15.2) with ESMTPS id x29AUhwe080338
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-hackers@freebsd.org>; Sat, 9 Mar 2019 21:00:52 +1030 (ACDT)
 (envelope-from darius@dons.net.au)
Received: (from mailnull@localhost)
 by midget.dons.net.au (8.15.2/8.15.2/Submit) id x29ATVuT076664
 for <freebsd-hackers@freebsd.org>; Sat, 9 Mar 2019 20:59:31 +1030 (ACDT)
 (envelope-from darius@dons.net.au)
X-Authentication-Warning: midget.dons.net.au: mailnull set sender to
 <darius@dons.net.au> using -f
Received: from [10.0.2.26] ([10.0.2.26])
 by ns.dons.net.au (envelope-sender <darius@dons.net.au>) (MIMEDefang) with
 ESMTP id x29ATUqq076662; Sat, 09 Mar 2019 20:59:31 +1030
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: Re: USB stack getting confused
From: "O'Connor, Daniel" <darius@dons.net.au>
In-Reply-To: <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
Date: Sat, 9 Mar 2019 20:59:30 +1030
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
To: Hans Petter Selasky <hps@selasky.org>
X-Mailer: Apple Mail (2.3445.102.3)
X-Spam-Score: -1 () No,
 score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable
 autolearn_force=no version=3.4.1
X-Scanned-By: MIMEDefang 2.83 on 10.0.2.1
X-Rspamd-Queue-Id: 3E0308115D
X-Spamd-Bar: ++++
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [4.41 / 15.00]; MV_CASE(0.50)[];
 HAS_XAW(0.00)[]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: midget.dons.net.au];
 RCPT_COUNT_TWO(0.00)[2]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[];
 RCVD_IN_DNSWL_LOW(-0.10)[139.137.101.150.list.dnswl.org : 127.0.5.1];
 R_DKIM_NA(0.00)[];
 ASN(0.00)[asn:4739, ipnet:150.101.0.0/16, country:AU];
 MIME_TRACE(0.00)[0:+]; MID_RHS_MATCH_FROM(0.00)[];
 RECEIVED_SPAMHAUS_PBL(0.00)[52.131.148.124.zen.spamhaus.org : 127.0.0.11];
 ARC_NA(0.00)[]; RCVD_COUNT_FIVE(0.00)[5];
 IP_SCORE(0.27)[ipnet: 150.101.0.0/16(1.06), asn: 4739(0.33), country:
 AU(-0.04)]; FROM_HAS_DN(0.00)[];
 NEURAL_SPAM_SHORT(0.93)[0.931,0]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org];
 AUTH_NA(1.00)[]; NEURAL_SPAM_MEDIUM(0.90)[0.901,0];
 DMARC_NA(0.00)[dons.net.au]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 NEURAL_SPAM_LONG(0.92)[0.916,0]; R_SPF_NA(0.00)[]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 10:36:15 -0000


> On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org> wrote:
> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
>> My program normally runs continually doing acquisitions of data for N
>> seconds, doing some checks and restarting. After a while (~30 1 minute
>> acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
>> uses libusb10) any more (it reconnects each acquisition for $REASONS).
>> Also pretty weirdly usbconfig can't see it either(!).
>
> What is printed in dmesg? Maybe the device has a problem.

There is nothing in dmesg - no disconnect / reconnect etc.

If I hold the user space process in gdb 'forever' (eg over night)
usbconfig doesn't see the device, but the moment I quit the user space
process it can be seen again.

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum


From owner-freebsd-hackers@freebsd.org  Sat Mar  9 15:26:32 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id F143B1529B52
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 15:26:31 +0000 (UTC) (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (turbocat.net [88.99.82.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id C30F98BE9B
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 15:26:30 +0000 (UTC)
 (envelope-from hps@selasky.org)
Received: from hps2016.home.selasky.org (unknown [176.74.212.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 3B1A32603CF;
 Sat,  9 Mar 2019 16:26:21 +0100 (CET)
Subject: Re: USB stack getting confused
To: "O'Connor, Daniel" <darius@dons.net.au>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
From: Hans Petter Selasky <hps@selasky.org>
Message-ID: <6dd8fe5f-6835-d98a-7592-0293406ccd63@selasky.org>
Date: Sat, 9 Mar 2019 16:25:58 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: C30F98BE9B
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of hps@selasky.org designates 88.99.82.50 as
 permitted sender) smtp.mailfrom=hps@selasky.org
X-Spamd-Result: default: False [-6.55 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+a:mail.turbocat.net];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 DMARC_NA(0.00)[selasky.org]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mail.turbocat.net];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.96)[-0.963,0];
 IP_SCORE(-3.28)[ip: (-9.49), ipnet: 88.99.0.0/16(-4.66), asn: 24940(-2.23),
 country: DE(-0.01)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:24940, ipnet:88.99.0.0/16, country:DE];
 MID_RHS_MATCH_FROM(0.00)[]; RCVD_TLS_ALL(0.00)[];
 RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 15:26:32 -0000

On 3/9/19 11:29 AM, O'Connor, Daniel wrote:
> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

Check the output from "procstat -ak". Likely your application is not 
closing the USB handle during device detach and so a deadlock happens.

Also see libusb20_dev_check_connected(). Poll this function regularly
to figure out whether a disconnect is needed.

--HPS

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 16:27:10 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 614A5152CA8F
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 16:27:10 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9D3328DCFE
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 16:27:09 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x29GQgHF086341
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 9 Mar 2019 18:26:45 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x29GQgHF086341
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x29GQeOL086339;
 Sat, 9 Mar 2019 18:26:40 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sat, 9 Mar 2019 18:26:40 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Hans Petter Selasky <hps@selasky.org>
Cc: "O'Connor, Daniel" <darius@dons.net.au>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: USB stack getting confused
Message-ID: <20190309162640.GN2492@kib.kiev.ua>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 16:27:10 -0000

On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> >>
> >>
> >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org> wrote:
> >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> >>>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
> >>>
> >>> What is printed in dmesg? Maybe the device has a problem.
> >>
> >> There is nothing in dmesg - no disconnect / reconnect etc.
> >>
> >> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.
> > 
> > Does it mean that the file descriptor opened for ugen has a chance to
> > be closed ?
> 
> The USB stack will wait for all FDs to be closed during detach also via 
> destroy_dev().
So my guess was correct.  Do you agree that this behaviour is wrong ?

In fact I saw something similar with apcupsd and either usb/com adapters
or native usb control card for APC UPSes.  For reasons I do not understand,
these devices are often disconnected.  For older versions of apcupsd,
it required a restart for the newly reattached device to be recreated in /dev.
Sometimes it hangs the whole usb stack.

Newer apcupsd seems to open /dev/ugen only for the duration of the query,
which makes the erratic behaviour much less likely, but could still cause
breakage when the device disappears while apcupsd has it opened.

> 
> > 
> > I suspect that usb subsystem tried to destroy the device but some internal
> > refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
> > the issue.
> 
> --HPS

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 07:00:28 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2067F153A1F7;
 Sat,  9 Mar 2019 07:00:28 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id 22F7F71414;
 Sat,  9 Mar 2019 07:00:25 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 84AD9105AD5E;
 Sat,  9 Mar 2019 18:00:15 +1100 (AEDT)
Date: Sat, 9 Mar 2019 18:00:14 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
cc: Bruce Evans <brde@optusnet.com.au>, Mark Millard <marklmi@yahoo.com>, 
 freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, 
 FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale
 * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190307222220.GK2492@kib.kiev.ua>
Message-ID: <20190309144844.K1166@besplex.bde.org>
References: <20190302142521.GE68879@kib.kiev.ua>
 <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org>
 <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org>
 <20190307222220.GK2492@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=vnREMb7VAAAA:8 a=ClMc5Of-GfaXbdAZ3JQA:9
 a=f8I4eRmMFRTVFEQH:21 a=DjpI8WK0P_VDdg0N:21 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 22F7F71414
X-Spamd-Bar: -----
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.00 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; MIME_TRACE(0.00)[0:+];
 DMARC_NA(0.00)[optusnet.com.au]; RCPT_COUNT_FIVE(0.00)[5];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 NEURAL_HAM_SHORT(-0.83)[-0.826,0];
 IP_SCORE(-2.86)[ip: (-7.21), ipnet: 211.28.0.0/14(-3.92), asn: 4804(-3.13),
 country: AU(-0.04)]; FREEMAIL_TO(0.00)[gmail.com];
 RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 FREEMAIL_CC(0.00)[optusnet.com.au]; RCVD_COUNT_TWO(0.00)[2]
X-Mailman-Approved-At: Sat, 09 Mar 2019 14:00:12 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 07:00:28 -0000

On Fri, 8 Mar 2019, Konstantin Belousov wrote:

> On Fri, Mar 08, 2019 at 01:31:30AM +1100, Bruce Evans wrote:
>> On Wed, 6 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
>>>> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
>>>>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>>>>>
>>>>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>>> * ...
>>>> I strongly disklike the merge.

I more strongly disclike (sic) the more complete merge.  The central APIs
have even more parameters and reduced type safety to describe objects as
(offset, size) pairs.

>* ...
>>>>>>> +#else
>>>>>>> +		/*
>>>>>>> +		 * Use bintime_helper() unconditionally, since the fast
>>>>>>> +		 * path in the above method is not so fast here, since
>>>>>>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
>>>>>>> +		 * available in hardware and emulating it using 2
>>>>>>> +		 * 32 x 32 -> 64 bit multiplications uses code much
>>>>>>> +		 * like that in bintime_helper().
>>>>>>> +		 */
>>>>>>> +		bintime_helper(bt, scale, delta);
>>>>>>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
>>>>>>> +#endif
>>>>>>
>>>>>> Check that this method is really better.  Without this, the complicated
>>>>>> part is about half as large and duplicating it is smaller than this
>>>>>> version.
>>>>> Better in what sense ?  I am fine with the C code, and asm code looks
>>>>> good.
>>>>
>>>> Better in terms of actually running significantly faster.  I fear the
>>>> 32-bit method is actually slightly slower for the fast path.
>>
>> I checked that it is just worse.  Significantly slower and more complicated.
>>
>> I wrote and run a lot of timing benchmarks of various versions.  All
>> times in cycles on Haswell @4.08 GHz.  On i386 except where noted:
>> ...
>> The above tests were done with the final version.  The version which tested
>> alternatives used switch (method) and takes about 20 cycles longer for the
>> fastest version, presumably by defeating parallelism.  Times for various
>> methods:
>>
>> - with clang -Os, about 54 cycles for the old method that allowed overflow,
>>    and the same for the version with the check of the overflow threshold
>>    (but with the threshold never reached), and 59 cycles for the branch-
>>    free method.  100-116 cycles with gcc-4.2.1 -Os, with the branch-free
>>    method taking 5-10 cycles longer.
>>
>> - on amd64, only a couple of cycles faster (49-50 cycles in best cases),
>>    and gcc-4.2.1 only taking a couple of cycles longer.  The branch-free
>>    method still takes about 59 cycles so it is relatively worse.
>>
>> In userland, using the syscall for clock_gettime(), the
>> extra 5-10 cycles for the branch-free method is relatively insignificant.
>> It is about 2 nanoseconds.  Other pessimizations are more significant.
>> Times for this syscall:
>> - amd64 now: 224 nsec (with gcc-4.2.1 -Os)
>> - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os)
>>    even getpid(2) takes 280 nsec.  Add at least 140 more nsec for pae.
>> - i386 3+1: 224 nsec (with gcc 4.2.1 -Os)
>> - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O).
>> - i386 4+4 nopae old library version of clock_gettime() compiled by
>>    clang: 29 nsec.
>>
>> In some tests, the version with the branch was even a cycle or two faster.
>> In the tests, the branch was always perfectly predicted, so costs nothing
>> except possibly by changing scheduling in an accidentally good way.  The
>> tests were too small to measure the cost of using branch prediction
>> resources.  I've never noticed a case where 1 more branch causes thrashing.
> About testing such tight loops. There is a known phenomenon where Intel
> CPUs give absurd times when code in the loop has unsuitable alignment.
> The manifestation of the phenomenon is very surprising and hardly
> controllable. It is due to the way the CPU front-end prefetches blocks
> of bytes for instruction decoding and jmps locations in the blocks.
>
> The only source explaining it is https://www.youtube.com/watch?v=IX16gcX4vDQ
> the talk of Intel engineer.

I know a little about such tests since I have written thousands and
interpreted millions of them (mostly automatically).  There are a lot
of other side effects of caching resources that usually make more
difference than alignment.  The most mysterious one that I noticed was
apparently due to alignment, but in a makeworld macro-benchmark.  Minor
changes even in unused functions or data gave differences of about
2% in real time and many more % in system time.  This only showed up
on an old Turion2 (early Athlon64) system.  I think it is due to limited
cache associativity causing many cache misses by lining up unrelated
far apart code or data addresses mod some power of 2.  Padding to give
the same alignment as the best case was too hard, but I eventually
found a configuration accidentally giving nearly the best case even
with its alignments changed by small modifications to the areas that I
was working on.

>* ...
>>>>>>> -	do {
>>>>>>> -		th = timehands;
>>>>>>> -		gen = atomic_load_acq_int(&th->th_generation);
>>>>>>> -		*bt = th->th_bintime;
>>>>>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>>>>>> -		atomic_thread_fence_acq();
>>>>>>> -	} while (gen == 0 || gen != th->th_generation);
>>>>>>
>>>>>> Duplicating this loop is much better than obfuscating it using inline
>>>>>> functions.  This loop was almost duplicated (except for the delta
>>>>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
>>>>>> 8 ffclock ones).  Now it is only duplicated 16 times.
>>>>> How did you count the 16 ?  I can see only 4 instances in the unpatched
>>>>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
>>>>> touch ffclock until the patch is finalized.  After that, it would be
>>>>> 1 instance for kernel and 1 for userspace.
>>>>
>>>> Grep for the end condition in this loop.  There are actually 20 of these.
>>>> I'm counting the loops and not the previously-simple scaling operation in
>>>> it.  The scaling is indeed only done for 4 cases.  I prefer the 20
>>>> duplications (except I only want about 6 of the functions).  Duplication
>>>> works even better for only 4 cases.
>>> Ok, I merged these as well.  Now there are only four loops left in kernel.
>>> I do not think that merging them is beneficial, since they have sufficiently
>>> different bodies.
>>
>> This is exactly what I don't want.
>>>
>>> I disagree with your characterization of it as obfuscation, IMO it improves
>>> the maintainability of the code by reducing number of places which need
>>> careful inspection of the lock-less algorithm.
>>
>> It makes the inspection and changes more difficult for each instance.
>> General functions are more difficult to work with since they need more
>> args to control them and can't be changed without affecting all callers.
>>
>> In another thread, you didn't like similar churn for removing td args.
> It is not similar.  I do valid refactoring there (in terms of that
> thread, I do not like the term refactoring).  I eliminate a dozen instances
> of very intricate loop which implements quite delicate lockless algorithm.
> Its trickiness can be illustrated by the fact that it is only valid
> use of thread_fence_acq() which cannot be replaced by load_acq() (similar
> case is present in sys/seq.h).

Small delicate loops are ideal for duplicating.  They are easier to
understand individually and short enough to compare without using diff
to see gratuitous and substantive differences.  Multiple instances are
only hard to write and maintain.  Since these multiple instances are
already written, they are only harder to maintain.
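
For readers who do not have kern_tc.c open, the loop being argued about can
be approximated in userland with C11 atomics.  This is only a sketch: the
names demo_hands, demo_init, demo_update and demo_read are hypothetical, the
payload is reduced to two words, and the writer updates the structure in
place, whereas the kernel fills in the next timehands and then switches to
it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userland stand-in for the kernel's struct timehands. */
struct demo_hands {
	_Atomic uint64_t sec;	/* payload word 1 */
	_Atomic uint64_t frac;	/* payload word 2 */
	atomic_uint	 gen;	/* 0 means "update in progress" */
};

static void
demo_init(struct demo_hands *h)
{
	assert(h != NULL);
	atomic_init(&h->sec, 0);
	atomic_init(&h->frac, 0);
	atomic_init(&h->gen, 1);	/* nonzero: contents are valid */
}

/* Writer: invalidate, update the payload, publish a new nonzero generation. */
static void
demo_update(struct demo_hands *h, uint64_t sec, uint64_t frac)
{
	unsigned int ogen;

	ogen = atomic_load_explicit(&h->gen, memory_order_relaxed);
	atomic_store_explicit(&h->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&h->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&h->frac, frac, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;	/* never publish 0 as a valid generation */
	atomic_store_explicit(&h->gen, ogen, memory_order_release);
}

/* Reader: retry until the same nonzero generation brackets the copy. */
static void
demo_read(struct demo_hands *h, uint64_t *sec, uint64_t *frac)
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&h->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&h->sec, memory_order_relaxed);
		*frac = atomic_load_explicit(&h->frac, memory_order_relaxed);
		/* Order the payload loads before the generation re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&h->gen, memory_order_relaxed));
}
```

The reader is short enough that each copy can be compared by eye, which is
the property the paragraph above relies on.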

>> XX  void
>> XX  binuptime(struct bintime *bt)
>> XX  {
>> XX @@ -361,7 +383,7 @@
>> XX  		th = timehands;
>> XX  		gen = atomic_load_acq_int(&th->th_generation);
>> XX  		*bt = th->th_offset;
>> XX -		bintime_addx(bt, th->th_scale * tc_delta(th));
>> XX +		bintime_adddelta(bt, th);
>> XX  		atomic_thread_fence_acq();
>> XX  	} while (gen == 0 || gen != th->th_generation);
>> XX  }
>>
>> This is the kind of non-churning change that I like.
> Ok.  I made all cases where timehands are read, more uniform by
> moving calculations after the generation loop.  This makes the
> atomic part of the functions easier to see, and loop body has lower
> chance to hit generation reset.

I think this change is slightly worse:
- it increases register pressure.  'scale' and 'delta' must be read in
   almost program order before the loop exit test.  The above order uses
   them and stores the results to memory, so more registers are free for
   the exit test.  i386 certainly runs out of registers.  IIRC, i386 now
   spills 'gen'.  It would have to spill something to load 'gen' or 'th'
   for the test.
- it enlarges the window between reading 'scale' and 'delta' and the
   caller seeing the results.  Preemption in this window gives results
   that may be far in the past.

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..7114a0e5219 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> ...
>>> @@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
>>>  * the comment in <sys/time.h> for a description of these 12 functions.
>>>  */
>>>
>>> -#ifdef FFCLOCK
>>> -void
>>> -fbclock_binuptime(struct bintime *bt)
>>> +static __inline void
>>> +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
>>
>> This name is not descriptive.
>>
>>> +static __inline void
>>> +binnouptime(struct bintime *bt, u_int off)
>>
>> This name is an example of further problems with the naming scheme.
>> The bintime_ prefix used above is verbose, but it is at least a prefix
>> and is in the normal bintime_ namespace.  Here the prefix is 'bin',
>> which is neither of these.  It means bintime_ again, but this duplicates
>> 'time'.
> I agree, and I made a name getthmember for the other function which clearly
> reflect its operation.  For this one, I ended with bintime_off().

The 'get' name is another problem.  I would like to remove all the get*time
functions and not add new names starting with 'get'.  The library
implementation already doesn't bother optimizing the get*time functions,
but always uses the hardware timecounter.

getfoo() is a more natural name than foo_get() for the action of getting
foo, but the latter is better for consistency, especially in code that
puts the subsystem name first in nearby code.

The get*time functions would be better if they were more like
time_second.  Note that time_second is racy if time_t is too large
for the arch so that accesses to it are not atomic, as happens on
32-bit arches with premature 64-bit time_t.  However, in this 32/64
case, the race is only run every 136 years, with the next event
scheduled in 2038, so this race is even less important now than other
events scheduled in 2038.  Bintimes are 96 or 128 bits, so directly
copying a global like time_second for them would race every 1/2**32
second on 32-bit arches or every 1 second on 64-bit arches.  Most of
the loops on the generation count are for fixing these races, but
perhaps a simpler method would work.  On 64-bit arches with atomic
64-bit accesses on 32-bit boundaries, the following would work:
- set the lower 32 bits of the fraction to 0, or ignore them
- load the higher 32 bits of the fraction and the lower 32 bits of the
   seconds
- race once every 136 years starting in 2038 reading the higher 32 bits
   of the seconds non-atomically.
- alternatively, break instead of racing in 2038 by setting the higher
   32 bits to 0.  This is the same as using sbintimes instead of bintimes.
- drop a few more lower bits by storing a right-shifted value.  Right
   shifting by just 1 gives a race frequency of once per 272 years, with
   the next one in 2106.
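
A rough userland sketch of the first two items in the list above, assuming a
platform where a 64-bit atomic load is a single access.  The names
demo_bintime, demo_packed_time, demo_publish and demo_snapshot are
hypothetical; the packed layout (32 bits of seconds over 32 bits of
fraction) is the sbintime-like format mentioned in the list, and the upper
32 bits of the seconds are simply dropped here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime. */
struct demo_bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* One 64-bit word: low 32 bits of sec above the high 32 bits of frac. */
static _Atomic uint64_t demo_packed_time;

static void
demo_publish(const struct demo_bintime *bt)
{
	uint64_t p;

	assert(bt != NULL);
	/* The shift discards the upper 32 bits of the seconds. */
	p = ((uint64_t)bt->sec << 32) | (bt->frac >> 32);
	atomic_store_explicit(&demo_packed_time, p, memory_order_relaxed);
}

static void
demo_snapshot(struct demo_bintime *bt)
{
	uint64_t p;

	assert(bt != NULL);
	/* A single load, so sec and frac always come from the same update. */
	p = atomic_load_explicit(&demo_packed_time, memory_order_relaxed);
	bt->sec = (int64_t)(p >> 32);
	bt->frac = p << 32;	/* low 32 bits of the fraction read as 0 */
}
```

No generation loop is needed for this reduced format; the cost is the lost
precision and the wrap of the 32-bit seconds field every 2**32 seconds
(about 136 years).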

> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..8d12847f2cd 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -200,20 +202,56 @@ tc_delta(struct timehands *th)
>  * the comment in <sys/time.h> for a description of these 12 functions.
>  */
>
> -#ifdef FFCLOCK
> -void
> -fbclock_binuptime(struct bintime *bt)
> +static __inline void
> +bintime_off(struct bintime *bt, u_int off)
> {
> 	struct timehands *th;
> -	unsigned int gen;
> +	struct bintime *btp;
> +	uint64_t scale, x;
> +	u_int delta, gen, large_delta;
>
> 	do {
> 		th = timehands;
> 		gen = atomic_load_acq_int(&th->th_generation);
> -		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));

You didn't fully obfuscate this by combining this function with
getthmember() so as to deduplicate the loop.

> +		btp = (struct bintime *)((vm_offset_t)th + off);

Ugly conversion to share code.  This is technically incorrect.  Improving
the casts gives:

 	btp = (void *)(uintptr_t)((uintptr_t)(void *)th + off);

but this assumes that arithmetic on the intermediate integer does what
is expected.  uintptr_t is only guaranteed to work when the intermediate
representation held in it is not adjusted.

Fixing the API gives

     static __inline void
     bintime_off(struct bintime *btp, struct bintime *base_btp)

where base_btp is &th->th_bintime or &th->th_offset.
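
The difference between the two interfaces can be seen in a small userland
sketch.  The types demo_bintime and demo_hands and both accessor functions
are hypothetical, and the sketch leaves out the generation loop entirely;
it only contrasts selecting a member by byte offset with selecting it by a
typed pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct bintime and struct timehands. */
struct demo_bintime {
	int64_t		sec;
	uint64_t	frac;
};

struct demo_hands {
	uint64_t		scale;
	struct demo_bintime	offset;
	struct demo_bintime	bintime;
};

/*
 * Offset-based selection: compiles for any 'off', including the offset
 * of 'scale', which is not a struct demo_bintime at all.
 */
static void
demo_get_by_off(struct demo_bintime *bt, const struct demo_hands *h,
    size_t off)
{
	assert(off + sizeof(*bt) <= sizeof(*h));
	*bt = *(const struct demo_bintime *)(const void *)
	    ((const char *)h + off);
}

/* Typed selection: the compiler rejects a pointer to the wrong member. */
static void
demo_get_by_ptr(struct demo_bintime *bt, const struct demo_bintime *base)
{
	assert(base != NULL);
	*bt = *base;
}
```

A caller would write demo_get_by_off(&bt, &h, offsetof(struct demo_hands,
bintime)) in the first style and demo_get_by_ptr(&bt, &h.offset) in the
second.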

(th_offset and th_bintime are badly named.  th_offset is really a base
time and the offset is tc_delta().  th_bintime is also a base time.
It is the same as th_offset with another actual offset (the difference
between UTC and local time) already added to it as an optimization.  In
old versions, th_bintime didn't exist, but the related struct members
th_nanotime and th_microtime existed, since these benefit more from
not converting on every call.)

My old version even documents the struct members, while -current still
has no comments.  The comments were lost to staticization.  My version
mostly adds "duh" to the banal comments after recovering them:

XX /*
XX  * XXX rotted comment cloned from <sys/timetc.h>.
XX  *
XX  * th_counter is undocumented (duh).
XX  *
XX  * th_adjustment [PPM << 16] which means that the smallest unit of correction
XX  *     you can apply amounts to 481.5 usec/year.
XX  *
XX  * th_scale is undocumented (duh).
XX  *
XX  * th_offset_count is the contents of the counter which corresponds to the
XX  *
XX  *     rest of the offset_* values.
XX  *
XX  * th_offset is undocumented (duh).
XX  *
XX  * th_microtime is undocumented (duh).
XX  *
XX  * th_nanotime is undocumented (duh).
XX  *
XX  * XXX especially massive bitrot here.  "three" is now "many"...
XX  * Each timecounter must supply an array of three timecounters.  This is needed
XX  * to guarantee atomicity in the code.  Index zero is used to transport
XX  * modifications, for instance done with sysctl, into the timecounter being
XX  * used in a safe way.  Such changes may be adopted with a delay of up to 1/HZ.
XX  * Index one and two are used alternately for the actual timekeeping.
XX  *
XX  * th_generation is undocumented (duh).
XX  *
XX  * th_next is undocumented (duh).
XX  */
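
The 481.5 usec/year figure in the th_adjustment comment can be reproduced
with a few lines of arithmetic.  This is a sketch that assumes a year of
365.25 days; the function name is hypothetical.

```c
#include <assert.h>

/*
 * Smallest correction step, in microseconds per year, for an adjustment
 * stored as parts-per-million shifted left by 'shift' bits.
 */
static double
demo_adj_unit_usec_per_year(int shift)
{
	const double sec_per_year = 365.25 * 86400.0;	/* assumed year */
	double unit;

	assert(shift >= 0 && shift < 32);
	unit = 1e-6 / (double)(1u << shift);	/* one unit as a ratio */
	return (unit * sec_per_year * 1e6);	/* seconds -> microseconds */
}
```

With shift = 16 this gives 31557600 / 65536, or roughly 481.5.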

> +		*bt = *btp;
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +		large_delta = th->th_large_delta;

I had forgotten that th_scale is so volatile (it may be adjusted on
every windup).  th_large_delta is equally volatile.  So moving the
calculation outside of the loop gives even more register pressure
than I noticed above.
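
The overflow that started this thread, and the split multiplication used to
avoid it, can be checked in userland.  This is a sketch with hypothetical
names (demo_bintime, demo_addx, demo_add_naive, demo_add_scaled); it mirrors
the arithmetic of the slow path in the quoted patch, not the surrounding
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime. */
struct demo_bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* Add x to the 64-bit fraction, carrying into the seconds. */
static void
demo_addx(struct demo_bintime *bt, uint64_t x)
{
	uint64_t old;

	assert(bt != NULL);
	old = bt->frac;
	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Naive scaling: scale * delta silently wraps modulo 2**64. */
static void
demo_add_naive(struct demo_bintime *bt, uint64_t scale, uint32_t delta)
{
	demo_addx(bt, scale * delta);
}

/* Split scaling: two 32 x 32 -> 64 bit products, neither can overflow. */
static void
demo_add_scaled(struct demo_bintime *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	x = (scale >> 32) * delta;	/* high half of scale times delta */
	bt->sec += x >> 32;		/* whole seconds */
	demo_addx(bt, x << 32);		/* fractional part of the high product */
	demo_addx(bt, (scale & 0xffffffff) * delta);	/* low half */
}
```

For example, with scale = 2**63 (half a second per count) and delta = 5, the
naive version adds only the wrapped product 2**63 to the fraction and loses
2 seconds, while the split version adds 2 seconds plus half a second.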

> 		atomic_thread_fence_acq();
> 	} while (gen == 0 || gen != th->th_generation);
> +
> +	if (__predict_false(delta >= large_delta)) {
> +		/* Avoid overflow for scale * delta. */
> +		x = (scale >> 32) * delta;
> +		bt->sec += x >> 32;
> +		bintime_addx(bt, x << 32);
> +		bintime_addx(bt, (scale & 0xffffffff) * delta);
> +	} else {
> +		bintime_addx(bt, scale * delta);
> +	}
> +}
> +
> +static __inline void
> +getthmember(void *out, size_t out_size, u_int off)
> +{
> +	struct timehands *th;
> +	u_int gen;
> +
> +	do {
> +		th = timehands;
> +		gen = atomic_load_acq_int(&th->th_generation);
> +		memcpy(out, (char *)th + off, out_size);

This isn't so ugly or technically incorrect.  Now the object is generic,
so the reference to it should be passed as (void *objp, size_t objsize)
instead of the type-safe (struct bintime *base_btp).

> +		atomic_thread_fence_acq();
> +	} while (gen == 0 || gen != th->th_generation);
> +}

I can see a useful use of copying methods like this for sysctls.  All
sysctl accesses except possibly for aligned register_t's were originally
racy, but we sprinkled mutexes for large objects and reduced race windows
for smaller objects.  E.g., sysctl_handle_long() still makes a copy with
no locking, but this has no effect except on my i386-with-64-bit-longs
since longs have the same size as ints so are as atomic as ints on
32-bit arches.  sysctl_handle_64() uses the same method.  It works to
reduce the race window on 32-bit arches.  sysctl_handle_string() makes
a copy to malloc()ed storage.  memcpy() to that risks losing the NUL
terminator, and subsequent strlen() on the copy gives buffer overrun if
the result has no terminators.  sysctl_handle_opaque() uses a generation
count method, like the one used by timecounters before the ordering bugs
were fixed, but even more primitive and probably even more in need of
ordering fixes.
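
The string hazard described above is easy to guard against in the copy
itself.  The following is a userland sketch with a hypothetical helper name,
not the code in sysctl_handle_string(): it copies a fixed number of bytes
and then writes its own terminator, so a later strlen() on the copy is
bounded even if the source lost its NUL while being copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy exactly 'srcsize' bytes from 'src' into fresh storage and always
 * terminate the copy.  Returns NULL if allocation fails.  The caller
 * frees the result.
 */
static char *
demo_copy_string(const char *src, size_t srcsize)
{
	char *tmp;

	assert(src != NULL);
	tmp = malloc(srcsize + 1);
	if (tmp == NULL)
		return (NULL);
	memcpy(tmp, src, srcsize);
	tmp[srcsize] = '\0';	/* terminator owned by the copy */
	return (tmp);
}
```

This only bounds the length; it does nothing about a torn copy, which still
needs locking or a generation count.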

It would be good to fix all sysctl using the same generation count method
as above.  A loop at the top level might work.  I wouldn't like a structure
like the above where the top level calls individual sysctl functions which
do nothing except wrap themselves in a generic function like the above.

The above does give this structure to clock_gettime() calls.  The top
level converts the clock id to a function and the above makes the
function essentially convert back to another clock id (the offset of
the relevant field in timehands), especially for the get*time functions
where the call just copies the relevant field to userland.

Unfortunately, the individual time functions are called directly in the
kernel.  I prefer this to generic APIs based on ids, so that callers
can use simple efficient APIs like nanouptime() instead of using
complicated inefficiencies like

 	kern_clock_gettime_generic(int clock_id = CLOCK_MONOTONIC,
 	    int format_id = CLOCK_TYPE_TIMESPEC,
 	    int precision = CLOCK_PRECISION_NSEC,
 	    void *dstp = &ts);

Bruce

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 15:43:16 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9B92152A3A0
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 15:43:16 +0000 (UTC) (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (turbocat.net [88.99.82.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6C8F78C75B
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 15:43:16 +0000 (UTC)
 (envelope-from hps@selasky.org)
Received: from hps2016.home.selasky.org (unknown [176.74.212.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 6C5152603CF;
 Sat,  9 Mar 2019 16:43:13 +0100 (CET)
Subject: Re: USB stack getting confused
To: Konstantin Belousov <kostikbel@gmail.com>,
 "O'Connor, Daniel" <darius@dons.net.au>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
From: Hans Petter Selasky <hps@selasky.org>
Message-ID: <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
Date: Sat, 9 Mar 2019 16:42:50 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <20190309152613.GM2492@kib.kiev.ua>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: 6C8F78C75B
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-7.00 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-1.00)[-0.996,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 15:43:17 -0000

On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
>>
>>
>>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org> wrote:
>>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
>>>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
>>>
>>> What is printed in dmesg? Maybe the device has a problem.
>>
>> There is nothing in dmesg - no disconnect / reconnect etc.
>>
>> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.
> 
> Does it mean that the file descriptor opened for ugen has a chance to
> be closed ?

The USB stack will wait for all FDs to be closed during detach also via 
destroy_dev().

> 
> I suspect that usb subsystem tried to destroy the device but some internal
> refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
> the issue.

--HPS


From owner-freebsd-hackers@freebsd.org  Sat Mar  9 19:28:36 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8BE8B15334A8
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 19:28:36 +0000 (UTC)
 (envelope-from rozhuk.im@gmail.com)
Received: from mail-lj1-x230.google.com (mail-lj1-x230.google.com
 [IPv6:2a00:1450:4864:20::230])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E7D066F08B
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 19:28:35 +0000 (UTC)
 (envelope-from rozhuk.im@gmail.com)
Received: by mail-lj1-x230.google.com with SMTP id d14so754016ljl.9
 for <freebsd-hackers@freebsd.org>; Sat, 09 Mar 2019 11:28:35 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:date:to:cc:subject:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=ASIRQ+rXHp3Y/pFm1CLL1O/yDFyfVNzdlP2/UTzs3w4=;
 b=mO4Z1mneEYhpBIAurcsCfEx+1AbPLAa0dSeWHv0kkxJhEk8mH59B0KBw83Xb5o8pCl
 EL0/jNliAp8w5wDlbej8D9cfaCqXy65sCJiOTvpStGQWW7/DGYxbN7MiQkHoZvXW8A/A
 cIg4yf/hcVvFWzf6VZysqtN5I7Db6MUCCORwD/OWb/tBnumE7andxhRmpYfpQLUAZ/Gy
 +UpQY5nS0J02f6rxiuwDgJBFqx0FBUSZy4mbRBuuTVW42ZZ99guun6D1+zB60eMjojr8
 IZYGb14EcW+jzrDWyucJgyTm+H0MFd6cdKgSEcOb9RFeKy1kn7hqPXy0tvv5wsZ/jQNQ
 P1vg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:date:to:cc:subject:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=ASIRQ+rXHp3Y/pFm1CLL1O/yDFyfVNzdlP2/UTzs3w4=;
 b=axMGInJ0kUUaAEhFqahYQynl4t26lEPT6I/zeb0n8PjVLw8GaCC8r5aCP+n2LgEdop
 f6YDVqyvc52T/GkRslCnMzGxt5ArjHFc9+oI4iQkWSckDgEtToKu9debRNuYmZYy6Sqh
 qZuDznFhA4jrTks2lavsFvrd8mgB2+KhsYDrT/Szj4y1kWoJwBY/Bk5iUP3p9YJpUwhr
 ugtij0aHp6iXiXW62GeSNCekMXU41EHyz46wwqxxcB2JIAXmHdQS7rFwawVnP3n3TY/G
 AtdzYF6e+BSRcZIrQIO32qoU0czDcuEQeUgP6R7eRyZKSd1QToJOyqpwjxHOaQc+X/wr
 tYIA==
X-Gm-Message-State: APjAAAXgCQlI0Bq8pst3uFUW/BVYMny394Wz9eWdWoaO0RrUnYp54ghs
 vLz1kfwJizwcLRjjngDEscI=
X-Google-Smtp-Source: APXvYqxqY5y6S+5iqPw7lpdAkyqEy4rWoN9V3ytW/Y9XESJqSTN77lBBgeqG/hmWtUAo+sMzgJpU1g==
X-Received: by 2002:a2e:7314:: with SMTP id o20mr12478741ljc.111.1552159714347; 
 Sat, 09 Mar 2019 11:28:34 -0800 (PST)
Received: from rimwks ([2001:470:1f15:3d8:7285:c2ff:fe43:675b])
 by smtp.gmail.com with ESMTPSA id m1sm287795lfh.36.2019.03.09.11.28.33
 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256);
 Sat, 09 Mar 2019 11:28:33 -0800 (PST)
From: Rozhuk Ivan <rozhuk.im@gmail.com>
X-Google-Original-From: Rozhuk Ivan <Rozhuk.IM@gmail.com>
Date: Sat, 9 Mar 2019 22:28:27 +0300
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Hans Petter Selasky <hps@selasky.org>, FreeBSD Hackers
 <freebsd-hackers@freebsd.org>, "O'Connor, Daniel" <darius@dons.net.au>
Subject: Re: USB stack getting confused
Message-ID: <20190309222827.5407ddbf@rimwks>
In-Reply-To: <20190309162640.GN2492@kib.kiev.ua>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua>
X-Mailer: Claws Mail 3.17.3 (GTK+ 2.24.32; amd64-portbld-freebsd12.0)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: E7D066F08B
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-7.00 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-1.00)[-0.997,0]; REPLY(-4.00)[];
 TAGGED_FROM(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 19:28:36 -0000

On Sat, 9 Mar 2019 18:26:40 +0200
Konstantin Belousov <kostikbel@gmail.com> wrote:

> In fact I saw something similar with apcupsd and either usb/com
> adapters or native usb control card for APC UPSes.  For reasons I do
> not understand, these devices are often disconnected.  For older
> versions of apcupsd, it required restart for newly reattached device
> to be recreated in /dev. Sometimes it hangs whole usb stack.
> 
> Newer apcupsd seems to open /dev/ugen only for the duration of the
> query, which makes the erratic behaviour is much less likely, but
> could still cause breakage when device disappear while apcupsd has it
> opened.
> 

Same problem with usb sound cards.
I tried to fix it but failed with dsp; only the mixer can be fixed with a small code change.
https://reviews.freebsd.org/D11140

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 15:26:39 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9335B1529B54
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 15:26:39 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A6DD28BE9C
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 15:26:38 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x29FQDfp071741
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 9 Mar 2019 17:26:16 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x29FQDfp071741
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x29FQDYo071740;
 Sat, 9 Mar 2019 17:26:13 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sat, 9 Mar 2019 17:26:13 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: "O'Connor, Daniel" <darius@dons.net.au>
Cc: Hans Petter Selasky <hps@selasky.org>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: USB stack getting confused
Message-ID: <20190309152613.GM2492@kib.kiev.ua>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 15:26:39 -0000

On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> 
> 
> > On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org> wrote:
> > On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> >> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
> > 
> > What is printed in dmesg? Maybe the device has a problem.
> 
> There is nothing in dmesg - no disconnect / reconnect etc.
> 
> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

Does it mean that the file descriptor opened for ugen has a chance to
be closed?

I suspect that the usb subsystem tried to destroy the device but some internal
refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
the issue.

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 21:35:55 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A516A1537F76
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 21:35:55 +0000 (UTC) (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (turbocat.net [88.99.82.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3718274233
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 21:35:55 +0000 (UTC)
 (envelope-from hps@selasky.org)
Received: from hps2016.home.selasky.org (unknown [176.74.212.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 4C742260377;
 Sat,  9 Mar 2019 22:35:52 +0100 (CET)
Subject: Re: USB stack getting confused
To: Konstantin Belousov <kostikbel@gmail.com>, Warner Losh <imp@bsdimp.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 "O'Connor, Daniel" <darius@dons.net.au>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua>
 <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
 <20190309192330.GO2492@kib.kiev.ua>
From: Hans Petter Selasky <hps@selasky.org>
Message-ID: <fd5038a4-406b-6e4b-bb52-b567b1954ad1@selasky.org>
Date: Sat, 9 Mar 2019 22:35:28 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <20190309192330.GO2492@kib.kiev.ua>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: 3718274233
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.94 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.94)[-0.944,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 21:35:55 -0000

On 3/9/19 8:23 PM, Konstantin Belousov wrote:
> On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
>>
>> Is there a form of destroy_dev() that does a revoke on all open instances?
>> Eg, this is gone, you can't use it anymore, and all further attempts to use
>> the device will generate an error, but in the mean time we destroy the
>> device and let the detach routine get on with life. waiting may make sense
>> when you are merely unloading the driver (and getting to the detach routine
>> that way), but when the device is gone, I've come around to the point of
>> view that we should just destroy it w/o waiting for closes and anybody that
>> touches it afterwards gets an error and has to cope with the error. But
>> even in the unload case, we maybe we shouldn't get to the detach routine
>> unless we're forcing and/or the detach routine just returns EBUSY since the
>> only one that knows what dev_t's are associated with the device_t is the
>> driver itself.
> You are asking very basic questions about devfs there.
> 
> destroy_dev(9) waits for two things:
> - that all threads left the cdevsw methods for the given device;
> - that all cdevpriv destructors finished running.

Hi,

> To facilitate waking up threads potentially sleeping inside the cdevsw
> methods, drivers might implement d_purge method which must weed out sleeping
> threads from inside the code in the bound time.

USB will purge all callers before calling destroy_dev(). This is not the 
problem.

> After that we return from destroy_dev(9) and guarantee that no new calls
> into cdevsw is done for this device.  devfs magic consumes  the fo_ and
> VOP_ calls and does not allow them to reach into the driver.

When I designed the current USB devfs it was important to me to keep 
open() and close() calls balanced, to avoid situations where an open call 
may set up some resource and then close(), which frees this resource 
again, never gets called. destroy_dev(9) makes no such guarantee, and I 
think that is a failure of destroy_dev(9). That's when I started using 
the cdev's destructor callback function.

> So what usb does there is actively defeating existing mechanism by
> keeping internal refcount on opens and refusing to call destroy_dev()
> until the count goes to zero 

The FreeBSD USB stack is also used in environments w/o DEVFS and needs 
its own refcounts.

> (I did not read the usb code, but I believe
> that I am not too wrong).  
> Would usb core just destroy_dev() when the
> physical device goes away, then at worst the existing file descriptors
> opened against the lost devices would become dead (not same dead as
> terminals after revoke(2), but very similar).

Yes, I can do that if destroy_dev() ensures that d_close is called for 
all open file handles once and only once before it returns. I think this 
is where the problem comes from.

> 
> If the problem is due to keeping some instance data for the opened device,
> then cdevpriv might be the better fit (at least the KPI was designed
> to be) than blocking destroy until all users are gone.
> 

The USB stack does not use MMAP, so this is not a problem.

--HPS

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 18:41:45 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4C25F1531D8D
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 18:41:45 +0000 (UTC)
 (envelope-from wlosh@bsdimp.com)
Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com
 [IPv6:2607:f8b0:4864:20::841])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id CF1B06D62B
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 18:41:44 +0000 (UTC)
 (envelope-from wlosh@bsdimp.com)
Received: by mail-qt1-x841.google.com with SMTP id s1so882815qte.5
 for <freebsd-hackers@freebsd.org>; Sat, 09 Mar 2019 10:41:44 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=bsdimp-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=24DamIhEtYgu9AJuB4aZZtn3WPvNEr/Wvg55cTkzOto=;
 b=ms6GynYaXuGJPP/9M/LM5u5zDF/yOhmzCgmkOPgBSwkGVhX5HTpGQc/X0/vRN/O9f3
 kNCxVNWnvOmyokKRcsh7e3RAcEzxXj6R6n0VNVseBYaU9E9GS0SL0+SPWcFVwSF+hvZL
 MQFVkf3+aD03xQM1i7SzJ8W3JI5CrWERhJGUNK3cUMjN86MRhWtll4FRcTf3f0vEJmIl
 RTqCiJivIx10cwQDJ0psSI18tVJhuQ2GoR1ZW/WinPsntyP+nzv2ho/TwRHRti7o0W2H
 VT0iCG6MrPjH2nEvC1MWbsMS3NvRODf6M+/1iIJSDYDP6qXkultYG4gO/rfJT5e4smy4
 jpbQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=24DamIhEtYgu9AJuB4aZZtn3WPvNEr/Wvg55cTkzOto=;
 b=HuLn4mBdPDTe6bbt8akE+3ZCx7ClhhG3mUeDrnvoQKxLVXy8S74uXKHfLHb6XMZ58j
 3es7BTuQgHoduliSyzViZiNvsAwz1NoIxXfChmGN4AL7lB2o5TYLSRDZD3I2doq9u6XI
 3mLds2qW/hXKK3nRbS8Pk9TyxGCEVRoq0ZPOclc00OLXF3GRxaDD4UVjP+JdZzRD30ej
 mHh8tsQv8xqC+Nu79F2b/Ui3qckqQUdMSb9HLxeSVGke/WJCqoDEDHlDkEx0zMF/KkrQ
 l9assGMp8DY1GwNymnNsK2r5WPLrUF8IOH8ZFAxJRzDp+85fzM3ySc5adFS9F48uojAC
 8JEQ==
X-Gm-Message-State: APjAAAW6MOjwoskq4jIhmm8ODC5K6VgQjrf2mAmSBhG2mW1O7euWBkm3
 fNwQDYxro4ZQ5jvVPjHP8pkGsNozWcZXPgSqh++kKg==
X-Google-Smtp-Source: APXvYqyp2msurvoc+LZPlRiHfBN/AHYoxsS4b6sfY8TzmeRCqr7j2rU40tWFDQ7yUfRMlKwuE6PtMDeeGqzjAMN5Ju8=
X-Received: by 2002:a0c:9ba7:: with SMTP id o39mr19638971qve.153.1552156904308; 
 Sat, 09 Mar 2019 10:41:44 -0800 (PST)
MIME-Version: 1.0
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua>
In-Reply-To: <20190309162640.GN2492@kib.kiev.ua>
From: Warner Losh <imp@bsdimp.com>
Date: Sat, 9 Mar 2019 11:41:31 -0700
Message-ID: <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
Subject: Re: USB stack getting confused
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Hans Petter Selasky <hps@selasky.org>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>, 
 "O'Connor, Daniel" <darius@dons.net.au>
X-Rspamd-Queue-Id: CF1B06D62B
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.99 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.99)[-0.993,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 18:41:45 -0000

On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> > On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> > >>
> > >>
> > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org>
> wrote:
> > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> > >>>> My program normally runs continually doing acquisitions of data for
> N seconds, doing some checks and restarting. After a while (~30 1 minute
> acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
> uses libusb10) any more (it reconnects each acquisition for $REASONS). Also
> pretty weirdly usbconfig can't see it either(!).
> > >>>
> > >>> What is printed in dmesg? Maybe the device has a problem.
> > >>
> > >> There is nothing in dmesg - no disconnect / reconnect etc.
> > >>
> > >> If I hold the user space process in gdb 'forever' (eg over night)
> usbconfig doesn't see the device, but the moment I quit the user space
> process it can be seen again.
> > >
> > > Does it mean that the file descriptor opened for ugen has a chance to
> > > be closed ?
> >
> > The USB stack will wait for all FDs to be closed during detach also via
> > destroy_dev().
> So my guess was correct.  Do you agree that this behaviour is wrong ?
>
> In fact I saw something similar with apcupsd and either usb/com adapters
> or native usb control card for APC UPSes.  For reasons I do not understand,
> these devices are often disconnected.  For older versions of apcupsd,
> it required restart for newly reattached device to be recreated in /dev.
> Sometimes it hangs whole usb stack.
>
> Newer apcupsd seems to open /dev/ugen only for the duration of the query,
> which makes the erratic behaviour is much less likely, but could still
> cause
> breakage when device disappear while apcupsd has it opened.
>

Is there a form of destroy_dev() that does a revoke on all open instances?
E.g., this is gone, you can't use it anymore, and all further attempts to use
the device will generate an error, but in the meantime we destroy the
device and let the detach routine get on with life. Waiting may make sense
when you are merely unloading the driver (and getting to the detach routine
that way), but when the device is gone, I've come around to the point of
view that we should just destroy it w/o waiting for closes and anybody that
touches it afterwards gets an error and has to cope with the error. But
even in the unload case, maybe we shouldn't get to the detach routine
unless we're forcing and/or the detach routine just returns EBUSY since the
only one that knows what dev_t's are associated with the device_t is the
driver itself.

Warner

>
> > >
> > > I suspect that usb subsystem tried to destroy the device but some
> internal
> > > refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
> > > the issue.
> >
> > --HPS

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 19:23:57 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2E8C91533181
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 19:23:57 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7B8916EDDD
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 19:23:56 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x29JNV95026317
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 9 Mar 2019 21:23:34 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x29JNV95026317
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id x29JNUJK026315;
 Sat, 9 Mar 2019 21:23:30 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sat, 9 Mar 2019 21:23:30 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Warner Losh <imp@bsdimp.com>
Cc: Hans Petter Selasky <hps@selasky.org>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 "O'Connor, Daniel" <darius@dons.net.au>
Subject: Re: USB stack getting confused
Message-ID: <20190309192330.GO2492@kib.kiev.ua>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua>
 <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
 NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 19:23:57 -0000

On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
> On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov <kostikbel@gmail.com>
> wrote:
> 
> > On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> > > On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> > > >>
> > > >>
> > > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <hps@selasky.org>
> > wrote:
> > > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> > > >>>> My program normally runs continually doing acquisitions of data for
> > N seconds, doing some checks and restarting. After a while (~30 1 minute
> > acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
> > uses libusb10) any more (it reconnects each acquisition for $REASONS). Also
> > pretty weirdly usbconfig can't see it either(!).
> > > >>>
> > > >>> What is printed in dmesg? Maybe the device has a problem.
> > > >>
> > > >> There is nothing in dmesg - no disconnect / reconnect etc.
> > > >>
> > > >> If I hold the user space process in gdb 'forever' (eg over night)
> > usbconfig doesn't see the device, but the moment I quit the user space
> > process it can be seen again.
> > > >
> > > > Does it mean that the file descriptor opened for ugen has a chance to
> > > > be closed ?
> > >
> > > The USB stack will wait for all FDs to be closed during detach also via
> > > destroy_dev().
> > So my guess was correct.  Do you agree that this behaviour is wrong ?
> >
> > In fact I saw something similar with apcupsd and either usb/com adapters
> > or native usb control card for APC UPSes.  For reasons I do not understand,
> > these devices are often disconnected.  For older versions of apcupsd,
> > it required restart for newly reattached device to be recreated in /dev.
> > Sometimes it hangs whole usb stack.
> >
> > Newer apcupsd seems to open /dev/ugen only for the duration of the query,
> > which makes the erratic behaviour is much less likely, but could still
> > cause
> > breakage when device disappear while apcupsd has it opened.
> >
> 
> Is there a form of destroy_dev() that does a revoke on all open instances?
> Eg, this is gone, you can't use it anymore, and all further attempts to use
> the device will generate an error, but in the mean time we destroy the
> device and let the detach routine get on with life. waiting may make sense
> when you are merely unloading the driver (and getting to the detach routine
> that way), but when the device is gone, I've come around to the point of
> view that we should just destroy it w/o waiting for closes and anybody that
> touches it afterwards gets an error and has to cope with the error. But
> even in the unload case, we maybe we shouldn't get to the detach routine
> unless we're forcing and/or the detach routine just returns EBUSY since the
> only one that knows what dev_t's are associated with the device_t is the
> driver itself.
You are asking very basic questions about devfs there.

destroy_dev(9) waits for two things:
- that all threads left the cdevsw methods for the given device;
- that all cdevpriv destructors finished running.
To facilitate waking up threads potentially sleeping inside the cdevsw
methods, drivers may implement a d_purge method, which must weed out sleeping
threads from inside the code in bounded time.

After that we return from destroy_dev(9) and guarantee that no new calls
into cdevsw are made for this device.  devfs magic consumes the fo_ and
VOP_ calls and does not allow them to reach into the driver.

So what usb does there is actively defeating the existing mechanism by
keeping an internal refcount on opens and refusing to call destroy_dev()
until the count goes to zero (I did not read the usb code, but I believe
that I am not too wrong).  If the usb core just called destroy_dev() when the
physical device goes away, then at worst the existing file descriptors
opened against the lost devices would become dead (not the same dead as
terminals after revoke(2), but very similar).

If the problem is due to keeping some instance data for the opened device,
then cdevpriv might be the better fit (at least the KPI was designed
to be) than blocking destroy until all users are gone.
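
A toy userland sketch of the behaviour argued for above, for illustration
only.  This is not devfs or usb code; every name in it (toy_dev, toy_file,
toy_open, toy_read, toy_close, toy_destroy) is hypothetical, and the errno
values are assumptions chosen for the model.  It shows destroy not waiting
for open handles, later calls on a dead handle failing without reaching the
driver, and the per-open cleanup still running exactly once on close.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the devfs implementation. */
struct toy_dev {
	bool destroyed;		/* set once the physical device is gone */
	int closes_run;		/* number of per-open cleanups that have run */
};

struct toy_file {
	struct toy_dev *dev;	/* device this handle was opened against */
	bool open;		/* false before open and after close */
};

/* New opens are refused once the device has been destroyed. */
int
toy_open(struct toy_dev *d, struct toy_file *f)
{
	if (d->destroyed)
		return (ENXIO);
	f->dev = d;
	f->open = true;
	return (0);
}

/* I/O on a dead handle fails here and never reaches the "driver". */
int
toy_read(struct toy_file *f)
{
	if (!f->open)
		return (EBADF);
	if (f->dev->destroyed)
		return (ENXIO);
	return (0);
}

/* Per-open cleanup runs exactly once, even after the device is gone. */
int
toy_close(struct toy_file *f)
{
	if (!f->open)
		return (EBADF);
	assert(f->dev != NULL);
	f->open = false;
	f->dev->closes_run++;
	return (0);
}

/* Destroy marks the device dead immediately; it does not wait for closes. */
void
toy_destroy(struct toy_dev *d)
{
	d->destroyed = true;
}
```

In this model an open handle that outlives toy_destroy() gets ENXIO from
toy_read(), and its toy_close() still performs the per-open cleanup once.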

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 20:57:40 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9D281536CCA
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 20:57:40 +0000 (UTC) (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (turbocat.net [88.99.82.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 683457253D
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 20:57:39 +0000 (UTC)
 (envelope-from hps@selasky.org)
Received: from hps2016.home.selasky.org (unknown [176.74.212.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 86CCB26011B;
 Sat,  9 Mar 2019 21:57:36 +0100 (CET)
Subject: Re: USB stack getting confused
To: Warner Losh <imp@bsdimp.com>, Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 "O'Connor, Daniel" <darius@dons.net.au>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua>
 <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
From: Hans Petter Selasky <hps@selasky.org>
Message-ID: <44116887-3dc8-d3a9-e9b6-c32a6876b1ec@selasky.org>
Date: Sat, 9 Mar 2019 21:57:13 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <CANCZdfr9jRcXQeZWMPKSMvUB5u7kE0eDvbuKrtGvuUDYOr=n4A@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: 683457253D
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of hps@selasky.org designates 88.99.82.50 as
 permitted sender) smtp.mailfrom=hps@selasky.org
X-Spamd-Result: default: False [-6.26 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4];
 R_SPF_ALLOW(-0.20)[+a:mail.turbocat.net];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain];
 DMARC_NA(0.00)[selasky.org]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[mail.turbocat.net];
 NEURAL_HAM_SHORT(-0.67)[-0.671,0];
 IP_SCORE(-3.28)[ip: (-9.49), ipnet: 88.99.0.0/16(-4.66), asn: 24940(-2.23),
 country: DE(-0.01)]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:24940, ipnet:88.99.0.0/16, country:DE];
 MID_RHS_MATCH_FROM(0.00)[]; RCVD_TLS_ALL(0.00)[];
 RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 20:57:41 -0000

On 3/9/19 7:41 PM, Warner Losh wrote:
>> Newer apcupsd seems to open /dev/ugen only for the duration of the query,
>> which makes the erratic behaviour is much less likely, but could still
>> cause
>> breakage when device disappear while apcupsd has it opened.
>>
> Is there a form of destroy_dev() that does a revoke on all open instances?
> Eg, this is gone, you can't use it anymore, and all further attempts to use
> the device will generate an error, but in the mean time we destroy the
> device and let the detach routine get on with life. waiting may make sense
> when you are merely unloading the driver (and getting to the detach routine
> that way), but when the device is gone, I've come around to the point of
> view that we should just destroy it w/o waiting for closes and anybody that
> touches it afterwards gets an error and has to cope with the error. But
> even in the unload case, we maybe we shouldn't get to the detach routine
> unless we're forcing and/or the detach routine just returns EBUSY since the
> only one that knows what dev_t's are associated with the device_t is the
> driver itself.

Hi,

There are multiple issues here:

1) The USB stack uses device numbers from device_get_unit() when creating 
character devices. That means it must wait at least until the VNODE in 
/dev is removed before the same device name can be re-used.

2) When disconnecting the "struct file" from the USB, lost memory might 
pile up if these daemons which are typically created by devd don't get 
killed.

Many of these applications are using libusb. We can add a heartbeat 
thread inside there to simply close the ugen device handle when we 
understand the device is gone. That will close 99% of these issues.

--HPS



From owner-freebsd-hackers@freebsd.org  Sat Mar  9 21:40:29 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52B7B1538274
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 21:40:29 +0000 (UTC) (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (turbocat.net [88.99.82.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 93766744CF
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 21:40:28 +0000 (UTC)
 (envelope-from hps@selasky.org)
Received: from hps2016.home.selasky.org (unknown [176.74.212.121])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.turbocat.net (Postfix) with ESMTPSA id 6CD1D260377;
 Sat,  9 Mar 2019 22:40:26 +0100 (CET)
Subject: Re: USB stack getting confused
To: Rozhuk Ivan <rozhuk.im@gmail.com>,
 Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 "O'Connor, Daniel" <darius@dons.net.au>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua> <20190309222827.5407ddbf@rimwks>
From: Hans Petter Selasky <hps@selasky.org>
Message-ID: <e2f04e0b-0f52-686a-5253-caa25a498182@selasky.org>
Date: Sat, 9 Mar 2019 22:40:02 +0100
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101
 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <20190309222827.5407ddbf@rimwks>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: 93766744CF
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of hps@selasky.org designates 88.99.82.50 as
 permitted sender) smtp.mailfrom=hps@selasky.org
X-Spamd-Result: default: False [-6.47 / 15.00]; ARC_NA(0.00)[];
 RCVD_VIA_SMTP_AUTH(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4];
 R_SPF_ALLOW(-0.20)[+a:mail.turbocat.net];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]; TAGGED_RCPT(0.00)[];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[selasky.org];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: mail.turbocat.net];
 NEURAL_HAM_SHORT(-0.88)[-0.878,0];
 IP_SCORE(-3.28)[ip: (-9.49), ipnet: 88.99.0.0/16(-4.66), asn: 24940(-2.23),
 country: DE(-0.01)]; FREEMAIL_TO(0.00)[gmail.com];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:24940, ipnet:88.99.0.0/16, country:DE];
 MID_RHS_MATCH_FROM(0.00)[]; RCVD_TLS_ALL(0.00)[];
 RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 21:40:29 -0000

On 3/9/19 8:28 PM, Rozhuk Ivan wrote:
> On Sat, 9 Mar 2019 18:26:40 +0200
> Konstantin Belousov <kostikbel@gmail.com> wrote:
> 
>> In fact I saw something similar with apcupsd and either usb/com
>> adapters or native usb control card for APC UPSes.  For reasons I do
>> not understand, these devices are often disconnected.  For older
>> versions of apcupsd, it required restart for newly reattached device
>> to be recreated in /dev. Sometimes it hangs whole usb stack.
>>
>> Newer apcupsd seems to open /dev/ugen only for the duration of the
>> query, which makes the erratic behaviour much less likely, but
>> could still cause breakage when the device disappears while apcupsd
>> has it opened.
>>
> 
> Same problem with usb sound cards.
> I try to fix it, but fail with dsp, only mixer can be fixed with small code change.
> https://reviews.freebsd.org/D11140
> 

Hi,

How will these apps detect that they need to open the new /dev/mixer node?

I mean, after the hang is fixed, won't the mixer app still try to query 
the old file handle forever?

--HPS

From owner-freebsd-hackers@freebsd.org  Sat Mar  9 22:56:15 2019
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 364FC153B914
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Mar 2019 22:56:15 +0000 (UTC)
 (envelope-from rozhuk.im@gmail.com)
Received: from mail-lf1-x130.google.com (mail-lf1-x130.google.com
 [IPv6:2a00:1450:4864:20::130])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0D21877E9C
 for <freebsd-hackers@freebsd.org>; Sat,  9 Mar 2019 22:56:14 +0000 (UTC)
 (envelope-from rozhuk.im@gmail.com)
Received: by mail-lf1-x130.google.com with SMTP id f16so822196lfk.12
 for <freebsd-hackers@freebsd.org>; Sat, 09 Mar 2019 14:56:13 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:date:to:cc:subject:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=U+R8/bTK3ypl25ePSe246exdMmdiI8tlt9vUM1360wI=;
 b=dPgKmX5DRVIpVfSK3HMKtZFx063b827+L/8BGytYaYezu6jtFf+Jx+IzH2/PRV192M
 FPtfbJJbjvKOpmGd+wlLxcxGP+Pv58Q6EqImuQncQP1LBfNlfQwbY2stkO4x34C89jh+
 vK0bpSck2lmfzxQWCeSzrAT8jbzwDYqcWX60IGqMNj+T1IVspXX1D4oGMqgMLAf+lmWx
 LgKX6+7QJmBs8zwIFiYYMr0HDR2iKvJubrHCeuoN4fgTZrS+TiY0G15dZ/YJajHGzMmN
 yQC0AygpJnjjPQaHJ+TR+ggPKIA7pjiVGSArX5Qs1aX5PMMBNYxXMuSPJhTbuCSF2Bl9
 CFug==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:date:to:cc:subject:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=U+R8/bTK3ypl25ePSe246exdMmdiI8tlt9vUM1360wI=;
 b=nU/Z3j2Ly9nKemK+ts8d7FFZ57x05Qu+T1FB9lsplHSTWw/LwhzW85/lEWpBWOJgui
 5T8/d1qRoyH07Hoo4lFKwl5WdjrmHyJ00HVnmnkD5WyokaYQXcsxF1KokMNiUK124M/4
 j4dPg+XN3zYKzYU3dTiQHJYTIPtRYq24gaECFujvZPYSn4q9XM8gBr2p8mnvb8+o+Bou
 g+ue7Op8i9F8fosSWolj6YDlaT6RjEqOEPPxUk7o/oHowpxH0o/CbzFmBgjMnXUYNi64
 k1M/tqTwy5xWYJ05sF8cHyh4KJm/yMb0juGb2fsIFCqgCE35qFqdpwN3tYC02pR6ELLX
 T/bg==
X-Gm-Message-State: APjAAAWrT8VpPySiRZl9A7S2Ib/63l9TKmTOMQ7hZn1yyu0ss1M/F8iE
 Wu7LnjsO2f1jdUtFpF5o5qE=
X-Google-Smtp-Source: APXvYqzXBjyRwwO49nWDgFXE13ZC1k1Pxt3x8qHdS7sTGwnjNlB47QmsFvRPBBMRoftLUTJtai0LMg==
X-Received: by 2002:ac2:5228:: with SMTP id i8mr13587152lfl.162.1552172171278; 
 Sat, 09 Mar 2019 14:56:11 -0800 (PST)
Received: from rimwks ([2001:470:1f15:3d8:7285:c2ff:fe43:675b])
 by smtp.gmail.com with ESMTPSA id u18sm338516lfd.15.2019.03.09.14.56.10
 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256);
 Sat, 09 Mar 2019 14:56:10 -0800 (PST)
From: Rozhuk Ivan <rozhuk.im@gmail.com>
X-Google-Original-From: Rozhuk Ivan <Rozhuk.IM@gmail.com>
Date: Sun, 10 Mar 2019 01:56:08 +0300
To: Hans Petter Selasky <hps@selasky.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, FreeBSD Hackers
 <freebsd-hackers@freebsd.org>, "O'Connor, Daniel" <darius@dons.net.au>
Subject: Re: USB stack getting confused
Message-ID: <20190310015608.4d32e14f@rimwks>
In-Reply-To: <e2f04e0b-0f52-686a-5253-caa25a498182@selasky.org>
References: <E0371188-FD0A-47E1-8378-40239F5C6622@dons.net.au>
 <f3e6e30b-8b62-546b-2b51-e841f2e645bd@selasky.org>
 <3B29D870-41F9-46AF-B9F3-03106DEC417D@dons.net.au>
 <20190309152613.GM2492@kib.kiev.ua>
 <ea6e2690-1ad7-6c06-49e5-c528013f26c0@selasky.org>
 <20190309162640.GN2492@kib.kiev.ua>
 <20190309222827.5407ddbf@rimwks>
 <e2f04e0b-0f52-686a-5253-caa25a498182@selasky.org>
X-Mailer: Claws Mail 3.17.3 (GTK+ 2.24.32; amd64-portbld-freebsd12.0)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: 0D21877E9C
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 dkim=pass header.d=gmail.com header.s=20161025 header.b=dPgKmX5D;
 dmarc=pass (policy=none) header.from=gmail.com;
 spf=pass (mx1.freebsd.org: domain of rozhukim@gmail.com designates
 2a00:1450:4864:20::130 as permitted sender) smtp.mailfrom=rozhukim@gmail.com
X-Spamd-Result: default: False [-6.25 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36];
 FREEMAIL_FROM(0.00)[gmail.com]; RCVD_COUNT_THREE(0.00)[3];
 TO_DN_ALL(0.00)[];
 MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com];
 DKIM_TRACE(0.00)[gmail.com:+];
 DMARC_POLICY_ALLOW(-0.50)[gmail.com,none];
 NEURAL_HAM_SHORT(-0.98)[-0.977,0]; FROM_EQ_ENVFROM(0.00)[];
 MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[];
 FREEMAIL_ENVFROM(0.00)[gmail.com];
 ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US];
 TAGGED_FROM(0.00)[];
 DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 127.0.5.0];
 ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[];
 RCPT_COUNT_THREE(0.00)[4]; NEURAL_HAM_LONG(-1.00)[-1.000,0];
 MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_NONE(0.00)[0.3.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.5.4.1.0.0.a.2.list.dnswl.org
 : 127.0.5.0]; 
 IP_SCORE(-2.76)[ip: (-9.34), ipnet: 2a00:1450::/32(-2.32), asn: 15169(-2.06),
 country: US(-0.07)]; MID_RHS_NOT_FQDN(0.50)[];
 FREEMAIL_CC(0.00)[gmail.com]
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Mar 2019 22:56:15 -0000

On Sat, 9 Mar 2019 22:40:02 +0100
Hans Petter Selasky <hps@selasky.org> wrote:

> > Same problem with usb sound cards.
> > I try to fix it, but fail with dsp, only mixer can be fixed with
> > small code change. https://reviews.freebsd.org/D11140
> >   
> 
> Hi,
> 
> How will these apps detect that they need to open the new /dev/mixer
> node?
> 
> I mean, after hang is fixed, mixer app will still try to query the
> old file handle forever?
> 

The main problem for me: a USB device is lost and reconnected (a new device is connected),
but FreeBSD does nothing because the USB stack hangs; it waits until all open fds for mixer and dsp are closed.

Apps can be rewritten/patched: when the device is lost, they get an error on operations with the fd and then try to reopen it.
I don't remember now how that works in the patch; it is unfinished.
Another OSS issue: apps do not react to a hw.snd.default_unit change.
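
The reopen-on-error approach could look roughly like the sketch below. It is not taken from D11140: dev_handle and dev_read_reopen are hypothetical names, read() stands in for whatever operation the app performs (a mixer app would mostly use ioctl()), and treating every non-transient error as "device lost" is an assumption, because which errno the kernel returns is not stated here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle that remembers how to reopen its device node. */
struct dev_handle {
    const char *path;   /* e.g. a /dev/mixer or /dev/dsp style node */
    int flags;          /* open(2) flags to reuse on reopen */
    int fd;             /* current descriptor, or -1 if not open */
};

/*
 * Read from the device. On a non-transient error, drop the stale
 * descriptor, reopen the node by path and retry once.
 * Returns bytes read, or -1 if the device cannot be (re)opened or read.
 */
static ssize_t dev_read_reopen(struct dev_handle *h, void *buf, size_t len)
{
    ssize_t n;

    assert(h != NULL && h->path != NULL);
    if (h->fd >= 0) {
        n = read(h->fd, buf, len);
        if (n >= 0 || errno == EINTR || errno == EAGAIN)
            return n;           /* success, or a transient condition */
        close(h->fd);           /* assumed device loss: release the old fd */
        h->fd = -1;
    }
    h->fd = open(h->path, h->flags);
    if (h->fd < 0)
        return -1;              /* node not back yet; caller may retry later */
    return read(h->fd, buf, len);
}
```

Releasing the stale fd promptly is the part that matters for the hang described above, since the stack waits for those descriptors to be closed.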

I mitigated the reconnect issue in hardware:
- switched to sound via HDMI
- added a real LC filter to the home power line: I have a long USB link from the PC to the USB hub at my work place with
kb, mouse, USB sound, ..., and every time the refrigerator started or stopped I lost the USB link to the hub;
the LC filter fixed this. After such a disconnect, the kb, mouse and other USB devices did not reattach until I closed
all apps that had open fds for mixer and dsp.