From owner-dev-commits-src-all@freebsd.org Mon Aug 2 03:44:05 2021
In-Reply-To: <202108020258.1722walG094152@gitrepo.freebsd.org>
References: <202108020258.1722walG094152@gitrepo.freebsd.org>
From: Mateusz Guzik <mjguzik@gmail.com>
Date: Mon, 2 Aug 2021 05:43:57 +0200
Subject: Re: git: 8bb173fb5bc3 - main - sched_ule(4): Use trylock when stealing load.
To: Alexander Motin
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org

Mostly related, I just had a look at the code and noticed:

static uint32_t
sched_random(void)
{
	uint32_t *rndptr;

	rndptr = DPCPU_PTR(randomval);
	*rndptr = *rndptr * 69069 + 5;
	return (*rndptr >> 16);
}

Except randomval starts at 0 for all CPUs, so every CPU walks the exact
same sequence. Should be easy to pre-init it with an actual random
value. Is that something you played with?
On 8/2/21, Alexander Motin wrote:
> The branch main has been updated by mav:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=8bb173fb5bc33a02d5a4670c9a60bba0ece07bac
>
> commit 8bb173fb5bc33a02d5a4670c9a60bba0ece07bac
> Author:     Alexander Motin
> AuthorDate: 2021-08-02 02:42:01 +0000
> Commit:     Alexander Motin
> CommitDate: 2021-08-02 02:42:01 +0000
>
>     sched_ule(4): Use trylock when stealing load.
>
>     On some load patterns it is possible for several CPUs to try to
>     steal a thread from the same CPU despite the randomization
>     introduced.  It can cause significant lock contention when an idle
>     thread, already holding one queue lock, tries to acquire another.
>     Using trylock on the remote queue both reduces that contention and
>     makes the lock ordering easier to handle.  If we can't get the lock
>     inside tdq_trysteal() we just return, letting tdq_idled() handle
>     it.  If it happens in tdq_idled(), we repeat the search for load,
>     skipping this CPU.
>
>     On a 2-socket 80-thread Xeon system I am observing a dramatic
>     reduction of the lock spinning time when doing random uncached 4KB
>     reads from 12 ZVOLs, while IOPS increase from 327K to 403K.
>
> MFC after:	1 month
> ---
>  sys/kern/sched_ule.c | 39 +++++++++++++++++++++++++--------------
>  1 file changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
> index 028e07efa889..1bdcfb1f793d 100644
> --- a/sys/kern/sched_ule.c
> +++ b/sys/kern/sched_ule.c
> @@ -300,6 +300,8 @@ static struct tdq	tdq_cpu;
>  #define	TDQ_LOCK_ASSERT(t, type)	mtx_assert(TDQ_LOCKPTR((t)), (type))
>  #define	TDQ_LOCK(t)		mtx_lock_spin(TDQ_LOCKPTR((t)))
>  #define	TDQ_LOCK_FLAGS(t, f)	mtx_lock_spin_flags(TDQ_LOCKPTR((t)), (f))
> +#define	TDQ_TRYLOCK(t)		mtx_trylock_spin(TDQ_LOCKPTR((t)))
> +#define	TDQ_TRYLOCK_FLAGS(t, f)	mtx_trylock_spin_flags(TDQ_LOCKPTR((t)), (f))
>  #define	TDQ_UNLOCK(t)		mtx_unlock_spin(TDQ_LOCKPTR((t)))
>  #define	TDQ_LOCKPTR(t)		((struct mtx *)(&(t)->tdq_lock))
>
> @@ -989,13 +991,22 @@ tdq_idled(struct tdq *tdq)
>  		if (steal->tdq_load < steal_thresh ||
>  		    steal->tdq_transferable == 0)
>  			goto restart;
> -		tdq_lock_pair(tdq, steal);
>  		/*
> -		 * We were assigned a thread while waiting for the locks.
> -		 * Switch to it now instead of stealing a thread.
> +		 * Try to lock both queues. If we are assigned a thread while
> +		 * waiting for the lock, switch to it now instead of stealing.
> +		 * If we can't get the lock, then somebody likely got there
> +		 * first, so continue searching.
>  		 */
> -		if (tdq->tdq_load)
> -			break;
> +		TDQ_LOCK(tdq);
> +		if (tdq->tdq_load > 0) {
> +			mi_switch(SW_VOL | SWT_IDLE);
> +			return (0);
> +		}
> +		if (TDQ_TRYLOCK_FLAGS(steal, MTX_DUPOK) == 0) {
> +			TDQ_UNLOCK(tdq);
> +			CPU_CLR(cpu, &mask);
> +			continue;
> +		}
>  		/*
>  		 * The data returned by sched_highest() is stale and
>  		 * the chosen CPU no longer has an eligible thread, or
> @@ -1948,18 +1959,18 @@ tdq_trysteal(struct tdq *tdq)
>  		if (steal->tdq_load < steal_thresh ||
>  		    steal->tdq_transferable == 0)
>  			continue;
> -		tdq_lock_pair(tdq, steal);
>  		/*
> -		 * If we get to this point, unconditonally exit the loop
> -		 * to bound the time spent in the critcal section.
> -		 *
> -		 * If a thread was added while interrupts were disabled don't
> -		 * steal one here.
> +		 * Try to lock both queues. If we are assigned a thread while
> +		 * waiting for the lock, switch to it now instead of stealing.
> +		 * If we can't get the lock, then somebody likely got there
> +		 * first. At this point unconditionally exit the loop to
> +		 * bound the time spent in the critical section.
>  		 */
> -		if (tdq->tdq_load > 0) {
> -			TDQ_UNLOCK(steal);
> +		TDQ_LOCK(tdq);
> +		if (tdq->tdq_load > 0)
> +			break;
> +		if (TDQ_TRYLOCK_FLAGS(steal, MTX_DUPOK) == 0)
>  			break;
> -		}
>  		/*
>  		 * The data returned by sched_highest() is stale and
>  		 * the chosen CPU no longer has an eligible thread.

-- 
Mateusz Guzik