From owner-freebsd-net@FreeBSD.ORG Wed Oct 3 10:42:33 2007
Message-ID: <47037246.2070400@yandex-team.ru>
Date: Wed, 03 Oct 2007 14:43:18 +0400
From: Vladimir Ivanov
Organization: Yandex LLC
To: Bruce Evans
Cc: "freebsd-net@freebsd.org", Jack Vogel
Subject: Re: SMPable version of EM driver
In-Reply-To: <20071003111737.U14276@delplex.bde.org>
References: <46B07931.3080300@yandex-team.ru> <2a41acea0708010923m7b21095ajc2ee84c37e0d5354@mail.gmail.com> <470280F6.9070009@yandex-team.ru> <20071003111737.U14276@delplex.bde.org>
List-Id: Networking and TCP/IP with FreeBSD

Bruce Evans wrote:
> On Tue, 2 Oct 2007, Vladimir Ivanov wrote:
>
>> Main improvement of this version: driver does not use TX interrupts
>> at all. So, interrupt rate reduced significantly.
>
> Polling for anything is a bug IMO.  Buggy hardware may work better
> with it, but em is not buggy :-).

The driver does not use polling. We disabled TX interrupts because we
think the interrupt hook is an awkward and inefficient place for
watchdog calculations and queue cleaning. We can do both much more
easily from application context.

The RX path uses a different technique. We send a wakeup message to the
RX kernel threads and mask RX interrupts. Each RX thread processes the
RX queue until it is empty, and then unmasks the interrupt. This trick
lets us avoid both RX interrupt storms and the additional latency
caused by the admin's throttling. The RX interrupt is masked if and
only if there are no threads available to handle it. Under heavy load
the driver also behaves as if it were in polling mode.

But the major benefit of our patchset is SMP.

> For bge, I tune the interrupt moderation parameters to reduce the tx
> interrupt rate to almost as low as possible without doing polling.
> The rate is either 1 interrupt per second if the tx is almost inactive
> or 1 interrupt every 384 packets if the tx is active.  -current mistunes
> these parameters to 150 (microseconds) and 10 (descriptors).  Old tuning
> of 150 and 128 only loses a little compared with 1000000 and 384.  (150
> gives 6667 interrupts per second under load.  This interrupt rate is
> quite manageable and is about the same rate as you have to use with
> polling to get the same throughput but lower efficiency as with
> interrupts.
> 128 for the descriptor limit causes in a max interrupt rate
> of only a few hundred per second except with tiny packets, but 10 is
> excessively small and requires a rate of up to 140000 per second to keep
> up with tiny packets.  140000 isn't manageable.)
>
> em has more/better interrupt parameters with non-broken defaults so I
> haven't needed to tune them.  For bge, I implement dynamic rx interrupt
> moderation in software where em has it in hardware.  10000
> interrupts/second for rx is a good limit.  IIRC, em uses 8000 which is
> a bit low for a max, and is missing a sysctl for easy tuning.

I've spent a lot of time tuning em. That approach has its limits. :-)

>
> Bruce

Regards,

PS: We have published the newest revision, 1.16. It contains just a
small tuning fix.

--
Vladimir Ivanov
Network Operations Center
OOO "Yandex"
t: +7 495 739-7000
f: +7 495 739-7070
@: noc@yandex.net (corporate)
   wawa@yandex-team.ru (personal)
www: www.yandex.ru
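
The interrupt rates quoted in the thread can be checked with a little arithmetic. The following is a minimal sketch, not anything taken from the bge or em drivers. It assumes a 1 Gbit/s link and 20 bytes of per-frame wire overhead (preamble plus inter-frame gap), neither of which is stated in the thread, and the function names are purely illustrative.

```python
# Rough check of the interrupt rates quoted in the thread.
# Assumptions (not stated in the thread): 1 Gbit/s link, and 20 bytes of
# per-frame wire overhead (8-byte preamble/SFD + 12-byte inter-frame gap).

LINK_BPS = 1_000_000_000
WIRE_OVERHEAD_BYTES = 20


def timer_limited_rate(delay_us):
    """Interrupts per second if one interrupt fires every delay_us microseconds."""
    return 1_000_000 / delay_us


def packets_per_second(frame_bytes):
    """Line-rate packets per second for frames of frame_bytes (CRC included)."""
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def descriptor_limited_rate(frame_bytes, descriptors):
    """Interrupts per second if one interrupt fires every `descriptors` packets."""
    return packets_per_second(frame_bytes) / descriptors


if __name__ == "__main__":
    print(round(timer_limited_rate(150)))             # 6667
    print(round(descriptor_limited_rate(64, 10)))     # 148810
    print(round(descriptor_limited_rate(64, 384)))    # 3875
    print(round(descriptor_limited_rate(1518, 128)))  # 635
```

Under these assumptions a 150 microsecond timer gives the 6667 interrupts per second mentioned above, a limit of 10 descriptors with minimum-size frames lands in the same range as the quoted 140000 per second, and a limit of 128 descriptors with full-size frames gives a few hundred per second.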