Date:      Tue, 6 May 2014 17:40:38 -0400
From:      Ryan Stone <rysto32@gmail.com>
To:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   RFC: PCI SR-IOV Driver interface
Message-ID:  <CAFMmRNyDpLuxqJVC%2Bwdm856E0Abx4XrOZyR9iB7g2dvDeX4BMQ@mail.gmail.com>

PCI Single Root I/O Virtualization (SR-IOV) is an optional part of the
PCIe standard that provides hardware acceleration for the
virtualization of PCIe devices. When SR-IOV is in use, a function in a
PCI device (known as a Physical Function, or PF) will present multiple
Virtual PCI Functions (VF) on the PCI bus. These VFs are fully
independent PCI devices that have access to the resources of the PF.
For example, on a network interface card, VFs could transmit and
receive packets independent of the PF.

I've been working on FreeBSD support for SR-IOV.  Because the
capabilities of the VFs are very much dependent on the hardware, the
PF driver (which is just a normal PCI driver) has to do a lot of the
work in configuring the VFs.  The SR-IOV infrastructure needs to
accept configuration requests, do the work to create the VFs
(including creating the PCI layer device_t objects in the kernel), and
then hand things off to the PF and VF drivers for them to do the
device-specific configuration.

One of my goals in this project is to have a single unified tool for
configuring SR-IOV.  This is quite complicated because of the various
capabilities that different PCI devices can have.  I don't want driver
maintainers to have to extend the IOV infrastructure or the
configuration tool to add support for new capabilities in new
hardware, so the tool needs to be flexible in the type of configuration
that it accepts.  On the other hand, I don't want to burden driver
writers with the need to do complicated parsing of configuration in
the driver itself.

The approach that I've taken has been to port pjd@'s nv(3) interface
into the kernel.  I have a functional prototype of this now but it's
not production ready yet.  The driver will implement a method that
returns a schema defining the names and types of the configuration
parameters that it accepts.  Parameters can be flagged as required, in
which case the infrastructure will reject the configuration if it does
not contain the parameter.  Alternatively, the schema can define a
default value for the parameter, in which case the infrastructure will
add the parameter to the configuration if the user did not specify a
value for it.  The SR-IOV infrastructure will also validate that every
parameter has the correct type as specified in the schema, and will
reject any configuration that contains a parameter that is not
defined in the schema at all.  The goal is, as much as possible, to
free the driver writers from having to do complicated verification of
the configuration.
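
The validation rules above can be illustrated with a small userland
sketch.  It is not the prototype's code: the struct layouts, the
function names and the fixed-size parameter array are all hypothetical
stand-ins for the nvlist-based schema, and the parameter names are only
illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the nvlist-based schema/config. */
enum ptype { PT_BOOL, PT_UINT16 };

struct schema_entry {
	const char *name;
	enum ptype  type;
	bool        required;
	bool        has_default;
	unsigned    def;	/* used only when has_default is true */
};

struct param {
	const char *name;
	enum ptype  type;
	unsigned    value;
};

#define	MAX_PARAMS	8

struct config {
	struct param p[MAX_PARAMS];
	size_t       n;
};

/* Illustrative schema: one optional parameter and two with defaults. */
const struct schema_entry vf_schema[] = {
	{ "vlan",          PT_UINT16, false, false, 0 },
	{ "spoof-check",   PT_BOOL,   false, true,  1 },
	{ "allow-set-mac", PT_BOOL,   false, true,  0 },
};

/* Look up a parameter in the configuration by name; NULL if absent. */
const struct param *
config_find(const struct config *c, const char *name)
{
	for (size_t i = 0; i < c->n; i++)
		if (strcmp(c->p[i].name, name) == 0)
			return (&c->p[i]);
	return (NULL);
}

static const struct schema_entry *
schema_find(const struct schema_entry *s, size_t ns, const char *name)
{
	for (size_t i = 0; i < ns; i++)
		if (strcmp(s[i].name, name) == 0)
			return (&s[i]);
	return (NULL);
}

/* Returns 0 (after filling in defaults) if c matches the schema, else -1. */
int
config_validate(const struct schema_entry *s, size_t ns, struct config *c)
{
	assert(c->n <= MAX_PARAMS);

	/* Reject parameters that are unknown or have the wrong type. */
	for (size_t i = 0; i < c->n; i++) {
		const struct schema_entry *e = schema_find(s, ns, c->p[i].name);
		if (e == NULL || e->type != c->p[i].type)
			return (-1);
	}
	/* Enforce required parameters; add defaults for missing ones. */
	for (size_t i = 0; i < ns; i++) {
		if (config_find(c, s[i].name) != NULL)
			continue;
		if (s[i].required)
			return (-1);
		if (s[i].has_default) {
			if (c->n == MAX_PARAMS)
				return (-1);
			c->p[c->n++] =
			    (struct param){ s[i].name, s[i].type, s[i].def };
		}
	}
	return (0);
}
```

After a successful config_validate(), the driver side only ever sees
parameters that are in the schema and have the declared type, and every
parameter with a default is guaranteed to be present.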

That was a big wall of text, so let's look at what the interface
actually looks like.


int pci_setup_iov(device_t dev);

This function should be called by a PF driver during device_attach()
to register itself as an SR-IOV-capable driver (perhaps
pci_attach_iov() would be a better name).  An error from it probably
shouldn't be treated as fatal, but it does mean the SR-IOV
functionality won't be available.

int pci_cleanup_iov(device_t dev);

This function should be called during device_detach().  It's safe to
call if pci_setup_iov() failed.  If it fails, the detach must be
aborted (the most likely cause is active VFs that must be destroyed
before the PF driver can detach).
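
As a rough sketch of how a PF driver might use these two calls, the
following userland mock-up stubs out pci_setup_iov() and
pci_cleanup_iov() so that it compiles outside the kernel.  The struct
device layout, the mydrv_* names and the EBUSY return value are
assumptions made for illustration only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for the kernel's device_t (hypothetical layout). */
struct device {
	bool attached;
	bool iov_registered;
	int  active_vfs;
};
typedef struct device *device_t;

/* Stub: the real function would be provided by the SR-IOV infrastructure. */
int
pci_setup_iov(device_t dev)
{
	dev->iov_registered = true;
	return (0);
}

/* Stub: refuses to clean up while VFs still exist (EBUSY is assumed). */
int
pci_cleanup_iov(device_t dev)
{
	if (dev->active_vfs > 0)
		return (EBUSY);
	dev->iov_registered = false;
	return (0);
}

/* Hypothetical PF driver attach routine. */
int
mydrv_attach(device_t dev)
{
	/* ... normal hardware initialization would go here ... */
	dev->attached = true;

	/* Not fatal: on failure the PF still works, just without SR-IOV. */
	if (pci_setup_iov(dev) != 0) {
		/* A real driver might log a warning here. */
	}
	return (0);
}

/* Hypothetical PF driver detach routine. */
int
mydrv_detach(device_t dev)
{
	int error;

	/* Must succeed before any teardown; otherwise abort the detach. */
	error = pci_cleanup_iov(dev);
	if (error != 0)
		return (error);

	/* ... normal hardware teardown would go here ... */
	dev->attached = false;
	return (0);
}
```

Calling pci_cleanup_iov() first keeps the detach path simple: if VFs
are still active, nothing has been torn down yet and the driver can
just return the error.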



PF drivers implement the following method to advertise their
configuration schema:

METHOD void get_iov_config_schema {
        device_t        dev;
        nvlist_t        *pf_schema;
        nvlist_t        *vf_schema;
};

The use of the nvlist_t in the interface is somewhat unfortunate.  The
problem is that now every driver that includes "pci_if.h" needs to
have the typedef nvlist_t defined (and I *really* don't want to modify
every PCI driver in the tree...).  I have a somewhat hacky workaround
for the problem in my tree right now but I thought that I would
highlight the issue in case people have opinions on it.

There are separate schemas for the PF and VF.  The drivers are not
expected to manipulate the nvlists directly.  Instead the
infrastructure provides some functions for defining the schema:

#define    IOV_SCHEMA_HASDEFAULT    (1 << 0)
#define    IOV_SCHEMA_REQUIRED    (1 << 1)

void    pci_iov_schema_add_binary(nvlist_t *schema, const char *name,
        const char *type, uint32_t flags, uint8_t *defaultVal,
        size_t len);
void    pci_iov_schema_add_bool(nvlist_t *schema, const char *name,
        uint32_t flags, int defaultVal);
void    pci_iov_schema_add_string(nvlist_t *schema, const char *name,
        uint32_t flags, const char *defaultVal);
void    pci_iov_schema_add_uint8(nvlist_t *schema, const char *name,
        uint32_t flags, uint8_t defaultVal);
void    pci_iov_schema_add_uint16(nvlist_t *schema, const char *name,
        uint32_t flags, uint16_t defaultVal);
void    pci_iov_schema_add_uint32(nvlist_t *schema, const char *name,
        uint32_t flags, uint32_t defaultVal);
void    pci_iov_schema_add_uint64(nvlist_t *schema, const char *name,
        uint32_t flags, uint64_t defaultVal);


A sample usage of these functions (from the ixgbe PF driver that I
have been working on):

static void
ixgbe_get_iov_schema(device_t dev, nvlist_t *pf, nvlist_t *vf)
{
    uint8_t null_mac[ETHER_ADDR_LEN] = {0, 0, 0, 0, 0, 0};

    pci_iov_schema_add_binary(vf, "mac-addr", "mac-addr",
        IOV_SCHEMA_HASDEFAULT, null_mac, sizeof(null_mac));
    pci_iov_schema_add_uint16(vf, "vlan", 0, 0);
    pci_iov_schema_add_bool(vf, "spoof-check", IOV_SCHEMA_HASDEFAULT, 1);
    pci_iov_schema_add_bool(vf, "allow-set-mac", IOV_SCHEMA_HASDEFAULT, 0);
}


This says that:
- The VF accepts a parameter called "mac-addr" using the "mac-addr"
type.  The default value of this is six zero bytes (00:00:00:00:00:00).
- The VF accepts an optional uint16_t parameter called vlan.  I chose
not to set a default value because a VF could be configured to send
untagged traffic.
- The VF accepts a boolean parameter called spoof-check, which defaults
to true.
- The VF accepts a boolean parameter called allow-set-mac, which
defaults to false.  I have a default value here because the VF must
either be permitted to set the MAC or not, and it's better to
explicitly document the default values in the schema rather than
encode them implicitly in the driver (the schema will be viewable
using the userland configuration tool).

- The PF doesn't have any configuration parameters in this driver.


METHOD int init_iov {
        device_t        dev;
        int             num_vfs;
        const nvlist_t  *config;
};

This method is called by the SR-IOV infrastructure when a request to
enable SR-IOV has been received.  This is called before SR-IOV is
actually enabled in the hardware.  The driver should use this method
to do global (non per-VF) configuration of the PF.  The config nvlist
contains configuration parameters from the PF schema.  If this method
returns an error, the request is aborted.

METHOD int uninit_iov {
        device_t        dev;
};

This method is called when SR-IOV is disabled, or if an enable request
fails after init_iov() was called and returned without an error.
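
The pairing between init_iov and uninit_iov can be sketched from the
infrastructure's side as follows.  This is a userland mock-up, not the
prototype's code: struct pf, hw_enable_sriov() and the iov_* names are
hypothetical, and the order of operations in iov_disable() is an
assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for a PF driver plus its hardware state. */
struct pf {
	int init_result;	/* what the driver's init_iov returns */
	int hw_enable_result;	/* what the hardware enable step returns */
	int inited;		/* 1 between init_iov and uninit_iov */
	int sriov_on;		/* 1 while SR-IOV is enabled in hardware */
	int uninit_calls;	/* number of times uninit_iov has run */
};

/* Mock of the driver's init_iov method. */
static int
pf_init_iov(struct pf *pf, int num_vfs)
{
	(void)num_vfs;
	if (pf->init_result == 0)
		pf->inited = 1;
	return (pf->init_result);
}

/* Mock of the driver's uninit_iov method. */
static int
pf_uninit_iov(struct pf *pf)
{
	pf->inited = 0;
	pf->uninit_calls++;
	return (0);
}

/* Stand-in for whatever turns SR-IOV on in the hardware. */
static int
hw_enable_sriov(struct pf *pf)
{
	if (pf->hw_enable_result == 0)
		pf->sriov_on = 1;
	return (pf->hw_enable_result);
}

/* Handle an enable request. */
int
iov_enable(struct pf *pf, int num_vfs)
{
	int error;

	/* init_iov runs before SR-IOV is enabled in the hardware. */
	error = pf_init_iov(pf, num_vfs);
	if (error != 0)
		return (error);	/* request aborted; nothing to undo */

	error = hw_enable_sriov(pf);
	if (error != 0) {
		/* init_iov succeeded, so the driver gets to undo it. */
		pf_uninit_iov(pf);
		return (error);
	}
	return (0);
}

/* Handle a disable request (ordering here is an assumption). */
int
iov_disable(struct pf *pf)
{
	pf->sriov_on = 0;
	return (pf_uninit_iov(pf));
}
```

The point of the pairing is that a driver can allocate resources in
init_iov and rely on uninit_iov being called exactly when init_iov
succeeded, whether the request later fails or SR-IOV is disabled
normally.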

METHOD int add_vf {
        device_t        dev;
        int             vfnum;
        const nvlist_t  *config;
};

This method is called once per VF, after SR-IOV has been enabled but
before the VFs have been probed.  The driver should set any
VF-specific configuration in the PF at this time.  The config nvlist
contains parameters specified in the VF schema.  For example, from the
ixgbe schema example that I gave above the driver could call
nvlist_get_bool(config, "allow-set-mac") to get the value of the
"allow-set-mac" parameter (and this call would be guaranteed to
succeed because a default value was set).  On the other hand the
driver would have to test for the presence of the "vlan" parameter as
it was neither set as a required parameter nor was it given a default
value.  However if the parameter is present, the driver may assume
that it's a
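
The difference between reading a defaulted parameter and an optional
one might look roughly like this in an add_vf implementation.  The
nvlist here is a tiny userland mock whose functions are modeled on
nv(3); whether the in-kernel port exposes exactly these signatures,
and how a uint16_t is stored in the nvlist, are assumptions.  The
softc layout and mydrv_add_vf() are hypothetical, and the real method
would receive a device_t rather than the softc directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal mock of an nvlist: a fixed array of name/number pairs. */
struct nvpair {
	const char *name;
	uint64_t    value;
};
typedef struct nvlist {
	const struct nvpair *pairs;
	size_t               n;
} nvlist_t;

static const struct nvpair *
nv_find(const nvlist_t *nvl, const char *name)
{
	for (size_t i = 0; i < nvl->n; i++)
		if (strcmp(nvl->pairs[i].name, name) == 0)
			return (&nvl->pairs[i]);
	return (NULL);
}

/* Mock accessors modeled on nv(3); the getters require the name to exist. */
bool
nvlist_exists(const nvlist_t *nvl, const char *name)
{
	return (nv_find(nvl, name) != NULL);
}

bool
nvlist_get_bool(const nvlist_t *nvl, const char *name)
{
	const struct nvpair *p = nv_find(nvl, name);

	assert(p != NULL);
	return (p->value != 0);
}

uint64_t
nvlist_get_number(const nvlist_t *nvl, const char *name)
{
	const struct nvpair *p = nv_find(nvl, name);

	assert(p != NULL);
	return (p->value);
}

/* Hypothetical per-VF state kept by the PF driver. */
#define	MAX_VFS	4

struct vf_state {
	bool     allow_set_mac;
	bool     has_vlan;
	uint16_t vlan;
};

struct mydrv_softc {
	struct vf_state vfs[MAX_VFS];
};

/* Sketch of an add_vf method body. */
int
mydrv_add_vf(struct mydrv_softc *sc, int vfnum, const nvlist_t *config)
{
	struct vf_state *vf;

	if (vfnum < 0 || vfnum >= MAX_VFS)
		return (EINVAL);
	vf = &sc->vfs[vfnum];

	/* Has a schema default, so it is always present. */
	vf->allow_set_mac = nvlist_get_bool(config, "allow-set-mac");

	/* Neither required nor defaulted: test for presence first. */
	vf->has_vlan = nvlist_exists(config, "vlan");
	vf->vlan = vf->has_vlan ?
	    (uint16_t)nvlist_get_number(config, "vlan") : 0;
	return (0);
}
```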


If you want an example of what a PF driver using this interface might
look like, you can check out my (in-progress) ixgbe PF driver on
github:

https://github.com/rysto32/freebsd/blob/38518e85c1e50254c78cfa9e0cc9cd1a7d8b10cf/sys/dev/ixgbe/ixgbe.c
(Note: As I start preparing this work for review I will be rebasing
and editing history fairly extensively.  Clone my private branches at
your own risk. :) )

The majority of the changes to ixgbe.c have to do with configuring the
hardware and not dealing with the SR-IOV infrastructure, so I hope
that's an indication that I'm on the right path.

At this point I'm not ready for the code to be reviewed (although any
comments would be welcome).  At this stage I'm looking for comments on
the design and the interface before I become fully committed to this
path.


