Date: Tue, 6 May 2014 17:40:38 -0400
From: Ryan Stone <rysto32@gmail.com>
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: RFC: PCI SR-IOV Driver interface
Message-ID: <CAFMmRNyDpLuxqJVC%2Bwdm856E0Abx4XrOZyR9iB7g2dvDeX4BMQ@mail.gmail.com>

PCI Single Root I/O Virtualization (SR-IOV) is an optional part of the PCIe standard that provides hardware acceleration for the virtualization of PCIe devices. When SR-IOV is in use, a function in a PCI device (known as a Physical Function, or PF) will present multiple Virtual PCI Functions (VF) on the PCI bus. These VFs are fully independent PCI devices that have access to the resources of the PF. For example, on a network interface card, VFs could transmit and receive packets independent of the PF.

I've been working on FreeBSD support for SR-IOV. Because the capabilities of the VFs are very much dependent on the hardware, the PF driver (which is just a normal PCI driver) has to do a lot of the work in configuring the VFs. The SR-IOV infrastructure needs to accept configuration requests, do the work to create the VFs (including creating the PCI layer device_t objects in the kernel), and then hand things off to the PF and VF drivers for them to do the device-specific configuration.

One of my goals in this project is to have a single unified tool for configuring SR-IOV. This is quite complicated because of the various capabilities that different PCI devices can have. I don't want driver maintainers to have to extend the IOV infrastructure or the configuration tool to add support for new capabilities in new hardware, so the tool needs to be flexible in the type of configuration that it accepts. On the other hand, I don't want to burden driver writers with the need to do complicated parsing of configuration in the driver itself.

The approach that I've taken has been to port pjd@'s nv(3) interface into the kernel. I have a functional prototype of this now but it's not production ready yet. The driver will implement a method that returns a schema defining the names and types of the configuration parameters that it accepts. Parameters can be flagged as required, in which case the infrastructure will reject the configuration if it does not contain the parameter.
Alternatively, the schema can define a default value for the parameter, in which case the infrastructure will add the parameter to the configuration if the user did not specify a value for it. The SR-IOV infrastructure will also validate that every parameter has the correct type as specified in the schema, and will reject any configuration that contains a parameter that is not defined in the schema at all. The goal is, as much as possible, to free the driver writers from having to do complicated verification of the configuration.

That was a big wall of text, so let's look at what the interface actually looks like.

    int pci_setup_iov(device_t dev);

This function should be called by a PF driver during device_attach() to register itself as an SR-IOV-capable driver (perhaps pci_attach_iov() would be a better name). An error from it probably shouldn't be treated as fatal, but it does mean the SR-IOV functionality won't be available.

    int pci_cleanup_iov(device_t dev);

This function should be called during device_detach(). It's safe to call if pci_setup_iov() failed. If it fails, the detach must be aborted (the most likely cause is active VFs that must be destroyed before the PF driver can detach).

PF drivers implement the following method to advertise their configuration schema:

    METHOD void get_iov_config_schema {
            device_t dev;
            nvlist_t *pf_schema;
            nvlist_t *vf_schema;
    };

The use of the nvlist_t in the interface is somewhat unfortunate. The problem is that now every driver that includes "pci_if.h" needs to have the typedef nvlist_t defined (and I *really* don't want to modify every PCI driver in the tree...). I have a somewhat hacky workaround for the problem in my tree right now, but I thought that I would highlight it in case people had opinions.

There are separate schemas for the PF and VF. The drivers are not expected to manipulate the nvlists directly.
Instead the infrastructure provides some functions for defining the schema:

    #define IOV_SCHEMA_HASDEFAULT (1 << 0)
    #define IOV_SCHEMA_REQUIRED (1 << 1)

    void pci_iov_schema_add_binary(nvlist_t *schema, const char *name,
        const char *type, uint32_t flags, uint8_t *defaultVal, size_t len);
    void pci_iov_schema_add_bool(nvlist_t *schema, const char *name,
        uint32_t flags, int defaultVal);
    void pci_iov_schema_add_string(nvlist_t *schema, const char *name,
        uint32_t flags, const char *defaultVal);
    void pci_iov_schema_add_uint8(nvlist_t *schema, const char *name,
        uint32_t flags, uint8_t defaultVal);
    void pci_iov_schema_add_uint16(nvlist_t *schema, const char *name,
        uint32_t flags, uint16_t defaultVal);
    void pci_iov_schema_add_uint32(nvlist_t *schema, const char *name,
        uint32_t flags, uint32_t defaultVal);
    void pci_iov_schema_add_uint64(nvlist_t *schema, const char *name,
        uint32_t flags, uint64_t defaultVal);

A sample usage of these functions (from the ixgbe PF driver that I have been working on):

    static void
    ixgbe_get_iov_schema(device_t dev, nvlist_t *pf, nvlist_t *vf)
    {
            uint8_t null_mac[ETHER_ADDR_LEN] = {0, 0, 0, 0, 0, 0};

            pci_iov_schema_add_binary(vf, "mac-addr", "mac-addr",
                IOV_SCHEMA_HASDEFAULT, null_mac, sizeof(null_mac));
            pci_iov_schema_add_uint16(vf, "vlan", 0, 0);
            pci_iov_schema_add_bool(vf, "spoof-check",
                IOV_SCHEMA_HASDEFAULT, 1);
            pci_iov_schema_add_bool(vf, "allow-set-mac",
                IOV_SCHEMA_HASDEFAULT, 0);
    }

This says that:

- The VF accepts a parameter called "mac-addr" using the "mac-addr" type. The default value is six zero bytes (00:00:00:00:00:00).
- The VF accepts an optional uint16_t parameter called vlan. I chose not to set a default value because a VF could be configured to send untagged traffic.
- The VF accepts a boolean parameter called spoof-check, which defaults to true.
- The VF accepts a boolean parameter called allow-set-mac, which defaults to false.
  I have a default value here because the VF has to either be permitted to set the MAC or not, and it's better to explicitly document the default values in the schema rather than encode them implicitly in the driver (the schema will be viewable using the userland configuration tool).
- The PF doesn't have any configuration parameters in this driver.

    METHOD int init_iov {
            device_t dev;
            int num_vfs;
            const nvlist_t *config;
    };

This method is called by the SR-IOV infrastructure when a request to enable SR-IOV has been received. This is called before SR-IOV is actually enabled in the hardware. The driver should use this method to do global (non per-VF) configuration of the PF. The config nvlist contains configuration parameters from the PF schema. If this method returns an error, the request is aborted.

    METHOD int uninit_iov {
            device_t dev;
    };

This method is called when SR-IOV is disabled, or if an enable request fails after init_iov() was called and returned without an error.

    METHOD int add_vf {
            device_t dev;
            int vfnum;
            const nvlist_t *config;
    };

This method is called once per VF, after SR-IOV has been enabled but before the VFs have been probed. The driver should set any VF-specific configuration in the PF at this time. The config nvlist contains parameters specified in the VF schema. For example, with the ixgbe schema that I gave above, the driver could call nvlist_get_bool(config, "allow-set-mac") to get the value of the "allow-set-mac" parameter (and this call would be guaranteed to succeed because a default value was set). On the other hand, the driver would have to test for the presence of the "vlan" parameter, as it was neither set as a required parameter nor given a default value.
However if the parameter is present, the driver may assume that it's a

If you want an example of what a PF driver using this interface might look like, you can check out my (in-progress) ixgbe PF driver on GitHub:

https://github.com/rysto32/freebsd/blob/38518e85c1e50254c78cfa9e0cc9cd1a7d8b10cf/sys/dev/ixgbe/ixgbe.c

(Note: As I start preparing this work for review I will be rebasing and editing history fairly extensively. Clone my private branches at your own risk. :) )

The majority of the changes to ixgbe.c have to do with configuring the hardware and not dealing with the SR-IOV infrastructure, so I hope that's an indication that I'm on the right path.

At this point I'm not ready for the code to be reviewed (although any comments would be welcome). At this stage I'm looking for comments on the design and the interface before I become fully committed to this path.