Date:      Mon, 11 Jun 2018 20:28:44 +0200
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        Scott Long <scottl@samsco.org>
Cc:        scsi@freebsd.org
Subject:   Re: What is ENXIO – MSI allocation regression in :[Was Re: svn commit: r321714 - in head/sys/dev: mpr mps]
Message-ID:  <6e1e5f9f-4ece-dc9d-b059-08d52c9e6965@omnilan.de>
In-Reply-To: <A2AECF9F-2BB1-4FC0-8330-658336A3A4F0@samsco.org>
References:  <201707300653.v6U6rwLN099096@repo.freebsd.org> <597DA578.6030101@omnilan.de> <597F56A8.1060603@omnilan.de> <D18DFAD4-6E93-4AE2-BE15-EFF4D8ABCB2A@samsco.org> <59804C8C.1020003@omnilan.de> <e7d94e6a-89e8-ffa1-40da-7fb67e6bfc2b@omnilan.de> <78611650-D7A4-4B1D-A254-DB058E1AC1C6@samsco.org> <d99e383d-b09a-f3bd-f1e2-a6a808016347@omnilan.de> <A2AECF9F-2BB1-4FC0-8330-658336A3A4F0@samsco.org>

On 05.06.2018 at 19:54, Scott Long wrote:
…
>>>> Late in the 11.2 phase, I identified this commit as a regression for MSI (non-x) allocation.
>>>> I have an idea what probably causes the problem here (INTx allocation, although MSI (and MSI-x) capability):
>>>> disable_msix is not 0 (I need to disable MSI-x because of ESXi-passthru…).
>>>>
>>>> Corresponding lines:
>>>> {
>>>>          device_t dev;
>>>>          int error, msgs;
>>>>
>>>>          dev = sc->mps_dev;
>>>>          error = 0;
>>>>          msgs = 0;
>>>>
>>>>          if ((sc->disable_msix == 0) &&
>>>>              ((msgs = pci_msix_count(dev)) >= MPS_MSI_COUNT))
>>>>                  error = mps_alloc_msix(sc, MPS_MSI_COUNT);
>>>>          if ((error != 0) && (sc->disable_msi == 0) &&
>>>>              ((msgs = pci_msi_count(dev)) >= MPS_MSI_COUNT))
>>>>                  error = mps_alloc_msi(sc, MPS_MSI_COUNT);
>>>>          if (error != 0)
>>>>                  msgs = 0;
>>>>
>>>>          sc->msi_msgs = msgs;
>>>>          return (error);
>>>> }
>>>>
>>>> Before r321714, error was assigned ENXIO, which, if != 0, could help make me understand the problem.
>>>> Unfortunately I have no idea what ENXIO means, where it's defined and most important, how to find the place where the declaration/definition happens.  Only joe and vi available here, any hints highly appreciated.
>>>>
>>>> I can confirm that MSI allocation works with mps.ko_21.02.00.00-fbsd-r321415 with my ESXi-passthru-non_msi-x setup.
>>>> Although the driver emits no message that an MSI was allocated, like other drivers do.  That's a cosmetic one though.
>>>> But the MSI->INTx regression is a severe one for me, which I'd like to fix myself but I'm missing so many fundamental skills :-(
>>>>
>>> Hi Harry,
>>> You are correct about the bug.  Please change the line at the top of the function that reads
>>> error = 0;
>>> to
>>> error = ENXIO;
>>> Let me know if that fixes the MSI problem for you.

…
>> BTW, does anybody have a link where I can get info about ENXIO?
> 
> ENXIO means that the device is not available.  I use it in the driver to signal when the hardware cannot be accessed.  The manual page for error codes is "man errno"

Oic, there's a man page :-)
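
For anyone else searching the archive for the same thing: ENXIO is one of 
the error constants from <errno.h> (in the FreeBSD source tree the header 
is sys/sys/errno.h), and like every errno value it is a positive integer, 
so it always trips an "error != 0" check.  A minimal userland sketch, not 
driver code; the two helper names are made up for illustration only:

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helpers, for illustration only.
 * ENXIO comes from <errno.h>; errno constants are positive integers,
 * so this returns 1.
 */
static int
enxio_is_nonzero(void)
{
	return (ENXIO != 0);
}

/* Human-readable text for ENXIO; the exact wording is platform-specific. */
static const char *
enxio_text(void)
{
	return (strerror(ENXIO));
}
```

Printing enxio_text() from a small main() shows the same description that 
"man errno" lists for the code.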

Haven't had time to look into it, but since you confirmed that ENXIO!=0, 
I simply changed that and now mps(4) allocates MSI again in my setup.
For completeness my diff:
Index: src/sys/dev/mps/mps_pci.c
===================================================================
--- sys/dev/mps/mps_pci.c   (revision 334948)
+++ sys/dev/mps/mps_pci.c   (working copy)
@@ -244,7 +244,7 @@
         int error, msgs;

         dev = sc->mps_dev;
-       error = 0;
+       error = ENXIO;
         msgs = 0;

         if ((sc->disable_msix == 0) &&
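
To spell out why the initial value matters, here is a simplified userland 
model of the fallback chain, not the driver code itself.  Everything in it 
(struct fake_sc, NEEDED_MSGS, the enum, pretend_alloc()) is a hypothetical 
stand-in, and it assumes that each allocation succeeds whenever it is 
attempted.  With MSI-X disabled, the first "if" is skipped, so the second 
"if" only fires when error started out non-zero:

```c
#include <errno.h>
#include <stdbool.h>

#define NEEDED_MSGS 1	/* hypothetical stand-in for MPS_MSI_COUNT */

enum intr_kind { INTR_INTX, INTR_MSIX, INTR_MSI };

/* Hypothetical stand-in for the softc fields the real function consults. */
struct fake_sc {
	bool disable_msix;
	bool disable_msi;
	int msix_avail;		/* what pci_msix_count() would report */
	int msi_avail;		/* what pci_msi_count() would report */
	enum intr_kind chosen;	/* which interrupt type ended up in use */
};

/* Pretend allocation: records the choice and always succeeds. */
static int
pretend_alloc(struct fake_sc *sc, enum intr_kind kind)
{
	sc->chosen = kind;
	return (0);
}

/* Same control flow as the quoted function; initial_error is the knob. */
static int
alloc_interrupts(struct fake_sc *sc, int initial_error)
{
	int error = initial_error;

	sc->chosen = INTR_INTX;	/* legacy interrupt unless something below wins */
	if (!sc->disable_msix && sc->msix_avail >= NEEDED_MSGS)
		error = pretend_alloc(sc, INTR_MSIX);
	if (error != 0 && !sc->disable_msi && sc->msi_avail >= NEEDED_MSGS)
		error = pretend_alloc(sc, INTR_MSI);
	return (error);
}
```

Calling alloc_interrupts() with disable_msix set and an initial error of 0 
leaves the model on INTx; passing ENXIO instead lets it reach the MSI 
branch, which matches what I see with the one-line change.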

Unfortunately my other real problem persists – iSCSI sessions lock up 
the machine (11.2-RC2).  No deadlock, since it will recover within some 
minutes, but otherwise a complete lock-up until the iSCSI sessions time out, 
since not a single Ethernet/IP packet gets processed.

Unfortunately I'm very short on testing resources here and don't know 
how to trace ctld/whatelse to find the lock-circle.
So far I can only tell that it happens only with Server2016 iSCSI 
connections (using 4k block size).

Will open a different thread/PR as soon as I find out anything…

Thanks,

-harry

P.S.: I guess it's far too late to get that into 11.2?


