Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Apr 2023 15:29:07 -0500
From:      Kyle Evans <kevans@freebsd.org>
To:        Alexander Motin <mav@freebsd.org>
Cc:        Kyle Evans <kevans@freebsd.org>, src-committers@freebsd.org,  dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org,  "Sideropoulos, Alexander" <alexander.sideropoulos@netapp.com>, Andrew Gallatin <gallatin@freebsd.org>,  Toomas Soome <tsoome@freebsd.org>
Subject:   Re: git: c1e813d12309 - main - hwpmc: Correct selection of Intel fixed counters.
Message-ID:  <CACNAnaH4ckGvKQkcLkdZGyHeQJszMWKA_6sYnsm92G9Mc3hePg@mail.gmail.com>
In-Reply-To: <5593fcdc-9adb-89a9-3ad8-151a9e7c5c0d@FreeBSD.org>
References:  <202205310005.24V05MTi034088@gitrepo.freebsd.org> <CACNAnaGyE6GunLMN-KNtuVPAMC7MNeRQU1uonaXBnyk_FdCFUw@mail.gmail.com> <5593fcdc-9adb-89a9-3ad8-151a9e7c5c0d@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Apr 20, 2023 at 1:57=E2=80=AFPM Alexander Motin <mav@freebsd.org> w=
rote:
>
> Kyle, Andrew,
>
> On 19.04.2023 16:06, Kyle Evans wrote:
> > On Mon, May 30, 2022 at 7:05=E2=80=AFPM Alexander Motin <mav@freebsd.or=
g> wrote:
> >>
> >> The branch main has been updated by mav:
> >>
> >> URL: https://cgit.FreeBSD.org/src/commit/?id=3Dc1e813d1230915e19a236ec=
687cadc1051841e56
> >>
> >> commit c1e813d1230915e19a236ec687cadc1051841e56
> >> Author:     Alexander Motin <mav@FreeBSD.org>
> >> AuthorDate: 2022-05-30 23:46:48 +0000
> >> Commit:     Alexander Motin <mav@FreeBSD.org>
> >> CommitDate: 2022-05-31 00:05:15 +0000
> >>
> >>      hwpmc: Correct selection of Intel fixed counters.
> >>
> >>      Intel json's use event=3D0 to specify fixed counter number via um=
ask.
> >>      Alternatively fixed counters have equivalent programmable event/u=
mask.
> >>
> >>      MFC after:      1 month
> >> ---
> >>   sys/dev/hwpmc/hwpmc_core.c | 34 +++++++++++++++++++++++++---------
> >>   1 file changed, 25 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/sys/dev/hwpmc/hwpmc_core.c b/sys/dev/hwpmc/hwpmc_core.c
> >> index fce97cbd5b8e..73cfee0b3975 100644
> >> --- a/sys/dev/hwpmc/hwpmc_core.c
> >> +++ b/sys/dev/hwpmc/hwpmc_core.c
> >> @@ -245,15 +245,31 @@ iaf_allocate_pmc(int cpu, int ri, struct pmc *pm=
,
> >>          ev =3D IAP_EVSEL_GET(config);
> >>          umask =3D IAP_UMASK_GET(config);
> >>
> >> -       /* INST_RETIRED.ANY */
> >> -       if (ev =3D=3D 0xC0 && ri !=3D 0)
> >> -               return (EINVAL);
> >> -       /* CPU_CLK_UNHALTED.THREAD */
> >> -       if (ev =3D=3D 0x3C && ri !=3D 1)
> >> -               return (EINVAL);
> >> -       /* CPU_CLK_UNHALTED.REF */
> >> -       if (ev =3D=3D 0x0 && umask =3D=3D 0x3 && ri !=3D 2)
> >> -               return (EINVAL);
> >> +       if (ev =3D=3D 0x0) {
> >> +               if (umask !=3D ri + 1)
> >> +                       return (EINVAL);
> >
> > Hey Alexander,
> >
> > This seems to break system-mode sampling of fixed-mode counters; e.g.,
> > from the manpage, `pmcstat -S instructions`, and I'm not at all
> > familiar with these parts of pmc. We come in once with umask=3D1, ri=3D=
0;
> > then again with umask=3D1, ri=3D1. The latter fails and we try umask=3D=
1,
> > ri=3D2, which again fails, and the PMCALLOCATE op ultimately fails.
> >
> > Is `umask !=3D ri + 1` correct, or should this be reverted back to
> > `umask =3D=3D 0x3 && ri !=3D 2` or something else? I've no idea what th=
e
> > `umask` values represent in this context.
>
> I still believe this code is correct, at least as much as data in
> lib/libpmc/pmu-events/arch/x86 is consistent (look for
> "INST_RETIRED.ANY", that actually fails here after translation from
> "instructions").  The multiple calls you see is the result of hwpmc
> trying to find counter "matching" parameters.  For fixed counters the
> match is really just (umask =3D=3D ri + 1), since umask for fixed counter=
s
> in the Intel events data is just a counter number starting from 1.
>
> I've tried to test fixed-mode counters myself and got it failing also.
> But looking a bit deeper, I think the problem may be caused by this
> patch: https://reviews.freebsd.org/D35709 .  I think it tries to
> allocate the same fixed counter twice, where the first attempt succeeds,
> while the second fails, since the hardware is already busy.  I haven't
> looked close on the pmcstat code, but I suppose the patch was wrong, so
> if you or Andrew wish to fix it properly -- I'd be happy.
>

Hi,

I looked at the initial review where Andrew reported his issue:
https://reviews.freebsd.org/D35342 -- I think we need to take a closer
look at understanding how the original issue happened. I note:

Core was generated by `pmcstat -R out.pmclog_cpi01 -z 32 -G out.pmcstat_cpi=
01'.

With those flags, the patch in D35709 could not have actually fixed
the assertion. Note that the -R flag sets FLAG_READ_LOGFILE:

https://cgit.freebsd.org/src/tree/usr.sbin/pmcstat/pmcstat.c#n825

There's a sanity check a little later; if any `args.pa_events` are
specified (-p/-s/-P/-S), it throws an error:

https://cgit.freebsd.org/src/tree/usr.sbin/pmcstat/pmcstat.c#n950

Thus, we have nothing to iterate over, and the pmc_allocate() call at
https://cgit.freebsd.org/src/tree/usr.sbin/pmcstat/pmcstat.c#n1188 is
not actually reachable in any circumstance with -R specified.

I think 0aa150775179a4f683f should be backed out, but it's also
important to figure out what actually happened to induce those crashes
and why it has apparently disappeared in the interim.

Thanks,

Kyle Evans



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CACNAnaH4ckGvKQkcLkdZGyHeQJszMWKA_6sYnsm92G9Mc3hePg>