From: "Jonathan T. Looney" <jonlooney@gmail.com>
Date: Sat, 8 Jun 2019 19:35:38 -0400
Subject: Re: svn commit: r348810 - head/sys/x86/x86
To: Enji Cooper
Cc: src-committers, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Looney" Date: Sat, 8 Jun 2019 19:35:38 -0400 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: svn commit: r348810 - head/sys/x86/x86 To: Enji Cooper Cc: src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org X-Rspamd-Queue-Id: 10B0B77880 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.97 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[]; NEURAL_HAM_SHORT(-0.97)[-0.969,0] Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Jun 2019 23:35:58 -0000 Hi Enji, On Sat, Jun 8, 2019 at 3:58 PM Enji Cooper wrote: > > Modified: > > head/sys/x86/x86/mca.c > > > > Modified: head/sys/x86/x86/mca.c > > > ============================================================================== > > --- head/sys/x86/x86/mca.c Sat Jun 8 17:49:17 2019 (r348809) > > +++ head/sys/x86/x86/mca.c Sat Jun 8 18:26:48 2019 (r348810) > > @@ -86,7 +86,6 @@ struct amd_et_state { > > > > struct mca_internal { > > struct mca_record rec; > > - int logged; > > STAILQ_ENTRY(mca_internal) link; > > }; > > > > @@ -101,6 +100,7 @@ static MALLOC_DEFINE(M_MCA, "MCA", "Machine Check > Arch > > > > static volatile int mca_count; /* Number of records stored. */ > > static int mca_banks; /* Number of per-CPU register banks. */ > > +static int mca_maxcount = -1; /* Limit on records stored. (-1 = > unlimited) */ > > > > static SYSCTL_NODE(_hw, OID_AUTO, mca, CTLFLAG_RD, NULL, > > "Machine Check Architecture"); > > @@ -125,10 +125,11 @@ SYSCTL_INT(_hw_mca, OID_AUTO, erratum383, > CTLFLAG_RDTU > > static STAILQ_HEAD(, mca_internal) mca_freelist; > > static int mca_freecount; > > static STAILQ_HEAD(, mca_internal) mca_records; > > +static STAILQ_HEAD(, mca_internal) mca_pending; > > static struct callout mca_timer; > > static int mca_ticks = 3600; /* Check hourly by default. */ > > static struct taskqueue *mca_tq; > > -static struct task mca_refill_task, mca_scan_task; > > +static struct task mca_resize_task, mca_scan_task; > > static struct mtx mca_lock; > > > > static unsigned int > > @@ -557,32 +558,49 @@ mca_check_status(int bank, struct mca_record *rec) > > } > > > > static void > > -mca_fill_freelist(void) > > +mca_resize_freelist(void) > > { > > - struct mca_internal *rec; > > - int desired; > > + struct mca_internal *next, *rec; > > + STAILQ_HEAD(, mca_internal) tmplist; > > + int count, i, desired_max, desired_min; > > > > /* > > * Ensure we have at least one record for each bank and one > > - * record per CPU. > > + * record per CPU, but no more than twice that amount. 
> >           */
> > -        desired = imax(mp_ncpus, mca_banks);
> > +        desired_min = imax(mp_ncpus, mca_banks);
> > +        desired_max = imax(mp_ncpus, mca_banks) * 2;
> > +        STAILQ_INIT(&tmplist);
> >          mtx_lock_spin(&mca_lock);
> > -        while (mca_freecount < desired) {
> > +        while (mca_freecount > desired_max) {
> > +                rec = STAILQ_FIRST(&mca_freelist);
> > +                KASSERT(rec != NULL, ("mca_freecount is %d, but list is empty",
> > +                    mca_freecount));
> > +                STAILQ_REMOVE_HEAD(&mca_freelist, link);
> > +                mca_freecount--;
> > +                STAILQ_INSERT_TAIL(&tmplist, rec, link);
> > +        }
> > +        while (mca_freecount < desired_min) {
> > +                count = desired_min - mca_freecount;
> >                  mtx_unlock_spin(&mca_lock);
>
> Should this also be outside the loop, like it was before, to ensure the
> locking is correct, the lock is always unlocked, and the current thread
> yields to the other threads, or should the lock be held over both loops?

I think it is correct as is, but I welcome feedback if you still think it
is incorrect after my explanation.

Here is what the new code does:

1. While holding the lock, we figure out how many entries we want to add
   to the free list. We store that number in a local variable (count).
2. We drop the lock and allocate count entries.
3. We reacquire the lock, add those entries to the list, and increment
   mca_freecount by count.
4. We then check whether we need to allocate more entries (which might
   occur if, for example, the free list was depleted by other threads
   between steps 1 and 3, while we didn't hold the lock).

The key requirement is that we hold the lock while reading or writing
mca_freecount and mca_freelist, and this code meets that requirement.
Even if mca_freecount changes between steps 1 and 3 above, we still add
count entries to the list, and we still increment mca_freecount by count.
So, in the end, mca_freecount matches the number of entries on the list.
(It might be more or less than our target, but it will be accurate.)

The only reason we need to drop the lock is to allocate more entries,
which we can't do while holding a spin lock. I don't see any particular
reason to yield more often than that.

Also, note that you'll only ever enter one of the two loops. (If you are
freeing excess entries, you won't need to allocate more to cover a
shortage.)

Finally, note that you'll never leave this function with fewer than the
desired minimum number of entries, but you might leave with more than the
desired maximum (if entries are freed by other threads between steps 1
and 3). I considered that an acceptable tradeoff between complexity and
functionality. In any case, if entries are being used and freed quickly
enough for this to happen, two things are likely true: the free list can
probably use the extra headroom, and something will very likely schedule
the resize task again soon.

If I haven't explained this well enough and you still have questions,
feel free to let me know.

Jonathan
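
P.S. In case a compact illustration helps, here is a minimal userland
sketch of the refill pattern from steps 1-4 above. It is an analogue, not
the committed kernel code: the names "entry", "freelist", "freecount",
and "refill" are hypothetical stand-ins for struct mca_internal,
mca_freelist, mca_freecount, and mca_resize_freelist, and a pthread mutex
stands in for the mca_lock spin mutex.

/*
 * Userland sketch of the refill loop described in steps 1-4.
 * All names here are illustrative stand-ins for the kernel code.
 */
#include <sys/queue.h>
#include <pthread.h>
#include <stdlib.h>

struct entry {
        STAILQ_ENTRY(entry) link;
};

static STAILQ_HEAD(, entry) freelist = STAILQ_HEAD_INITIALIZER(freelist);
static int freecount;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void
refill(int desired_min)
{
        STAILQ_HEAD(, entry) tmplist = STAILQ_HEAD_INITIALIZER(tmplist);
        struct entry *rec;
        int count, i;

        pthread_mutex_lock(&lock);
        while (freecount < desired_min) {
                /* Step 1: compute the shortage while holding the lock. */
                count = desired_min - freecount;

                /*
                 * Step 2: drop the lock to allocate. (In the kernel, the
                 * allocation can sleep, which is forbidden while holding
                 * a spin lock.)
                 */
                pthread_mutex_unlock(&lock);
                for (i = 0; i < count; i++) {
                        rec = malloc(sizeof(*rec));
                        if (rec == NULL)
                                abort();
                        STAILQ_INSERT_TAIL(&tmplist, rec, link);
                }

                /*
                 * Step 3: reacquire the lock and publish exactly 'count'
                 * entries, bumping freecount by the same 'count'. Even if
                 * other threads changed freecount while the lock was
                 * dropped, the counter stays in sync with the list.
                 */
                pthread_mutex_lock(&lock);
                for (i = 0; i < count; i++) {
                        rec = STAILQ_FIRST(&tmplist);
                        STAILQ_REMOVE_HEAD(&tmplist, link);
                        STAILQ_INSERT_TAIL(&freelist, rec, link);
                }
                freecount += count;

                /*
                 * Step 4: loop. If the list was depleted while the lock
                 * was dropped, the while condition computes a new
                 * shortage and we allocate again.
                 */
        }
        pthread_mutex_unlock(&lock);
}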