From: ShengYi Hung
To: Zhenlei Huang
Cc: hackers@freebsd.org, John Baldwin
Subject: Re: downgraded pcie link width
In-Reply-To: <062CDEA5-69A0-49BA-A294-75DBBC3F5829@FreeBSD.org> (Zhenlei Huang's message of "Sat, 23 Aug 2025 00:46:20 +0800")
References: <062CDEA5-69A0-49BA-A294-75DBBC3F5829@FreeBSD.org>
Date: Wed, 27 Aug 2025 20:39:05 +0800
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

From my
understanding, PCIe training does not only occur during POST. After boot, the OS can instruct a PCI device to retrain, or the hardware itself can perform retraining (up-configuration, as you mentioned).

To clarify, the final bandwidth of a PCI device is determined by two parameters: speed (Gen1 2.5 GT/s, Gen2 5 GT/s, Gen3 8 GT/s) and width (x1, x2, x4, x8, or x16).

Speed: This can be configured via the Target Link Speed (TLS)[1] field of the Link Control 2 register (LCTL2) in the PCI Express capability. After you overwrite the value, you have to trigger retraining by writing the Retrain Link[2] bit in LCTL.

Width: This is determined automatically by the LTSSM during training (the TS2 exchange) and cannot be directly configured. However, it is possible to disable hardware-initiated width changes (up-configuration) by setting the Hardware Autonomous Width Disable (HAWD)[3] bit in LCTL. After doing so, issuing a retrain[2] allows the device to re-evaluate and set its width.

There is a tool called setpci that can write to the PCI configuration space from userspace. On FreeBSD, this tool is available in sysutils/pciutils.

[1]: https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-2-lctl2-offset-70_2/
[2]: https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-lctl-offset-50_2/
[3]: https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-lctl-offset-50_2/

Zhenlei Huang writes:

> Hi,
>
> I'm recently hacking on the QLogic FastLinQ QL41212HLCU 25GbE adapter, and found something weird.
>
> It is a two SFP28 port card with PCIe 3.0 x8 link [1]. I connected the two ports with DAC cable directly to do benchmark.
> The weirdness is that no matter how much load I try to put into the card, it can only reach to about 13Gbps.
> I used iperf3 to do the benchmark. Also tried disabling TSO and LRO, enabling Jumbo MTU, but no luck.
>
> I checked the SFP module ( SFP28 DAC cable ) and ifconfig shows the link is 25000G,
>
> ```
> # ifconfig -j1 -mv ql0
> ql0: flags=1008843 metric 0 mtu 1500
>         options=8d00bb
>         capabilities=8d07bb
>         ether xx:xx:xx:xx:xx:xx
>         inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
>         media: Ethernet autoselect (25GBase-CR )
>         status: active
>         supported media:
>                 media autoselect
>                 media autoselect mediaopt full-duplex
>                 media 25GBase-CR
>                 media 25GBase-SR
>         nd6 options=29
>         drivername: ql0
>         plugged: SFP/SFP+/SFP28 25GBASE-CR CA-25G-S (Copper pigtail)
>         vendor: OEM PN: CAB-ZSP/ZSP-P2M SN: XXXXXXXXXXXXX DATE: 2025-07-04
> ```
>
> and finally I observed something unusual from pciconf,
>
> ```
> # pciconf -lcv ql0
> ...
>     cap 10[70] = PCI-Express 2 endpoint max data 256(512) FLR NS
>                  max read 4096
>                  link x2(x8) speed 8.0(8.0) ClockPM disabled
> ```
>
> That can also be verified by lspci from pciutils ports.
> ```
> # lspci -s 08:00.0 -vv
> ...
>         LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
>                 ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>         LnkSta: Speed 8GT/s, Width x2 (downgraded)
>                 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> ```
>
> What I have tried,
>
> 1. Plugged the card into different mother board ( 3 different vendors, Dell, HP, and Gigabyte ), and different PCIe slot ( x16 and x4 ).
> 2. Upgraded the BIOS of mother board.
> 3. Disabled ASPM in BIOS.
> 4. Upgraded the firmware of card.
> 5. Booted with Debian 13 live CD.
>
> Nothing has changed. The PCIe link width can only be negotiated to maximum of x2, with or without driver loaded, with / without load on the card.
> It is also interesting that it can only be negotiated to x1 on Gigabyte motherboard, which has only one PCIe 2.0 x16 slot.
>
> After Googling I found some articles say that the PCIe link width is negotiated at the training stage, which is at POST before the driver loads.
> They hint that downgraded link width is mostly caused by wrong BIOS configure, or hardware issues such as scratched gold fingers.
> I would almost give up and found the product brief [2], in which it declares `Supports PCIe upconfigure to reduce link width to conserve power`.
> So interesting, maybe it is the firmware's fault that the firmware does not **upconfigure** ( retraining ) on sufficient load ?
>
> Are your FastLinQ 41000 ethernet cards been rightly negotiated to x8 ?
>
> What can I do next ?
>
> CC John, I guess he is familiar with PCIe spec :)
>
>
> [1] https://www.marvell.com/products/ethernet-adapters-and-controllers/41000-ethernet-adapters.html
> [2] https://www.marvell.com/content/dam/marvell/en/public-collateral/ethernet-adaptersandcontrollers/marvell-ethernet-adapters-fastlinq-41000-series-product-brief.pdf
>
> Best regards,
> Zhenlei

-- 
Best Regards.
ShengYi Hung.
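
P.S. A rough sketch of the arithmetic behind the speed/width point above, in Python. The function name `link_gbps` and the table `GEN_RATES` are made up for illustration. The line-encoding overheads (8b/10b for Gen1/Gen2, 128b/130b for Gen3) come from the PCIe specification, not from this thread. It only gives the raw link rate in one direction and ignores TLP/DLLP protocol overhead, so real throughput will be lower.

```python
# Raw PCIe link bandwidth from generation and negotiated width.
# Illustrative helper only; the names here are hypothetical.

# generation -> (transfer rate in GT/s per lane, encoding efficiency)
GEN_RATES = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_gbps(gen: int, width: int) -> float:
    """Return the raw one-direction data rate in Gbit/s, before TLP/DLLP overhead."""
    rate, efficiency = GEN_RATES[gen]
    return rate * efficiency * width


if __name__ == "__main__":
    # The card as negotiated in the report (Gen3 x2) versus its capability (Gen3 x8).
    print(f"Gen3 x2: {link_gbps(3, 2):.2f} Gbit/s")
    print(f"Gen3 x8: {link_gbps(3, 8):.2f} Gbit/s")
```

For Gen3 x2 this gives about 15.75 Gbit/s, so the roughly 13Gbps seen with iperf3 looks plausible for a link stuck at x2 once protocol overhead is taken into account, while Gen3 x8 gives about 63 Gbit/s raw.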