Subject: Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
To: freebsd-net@freebsd.org
From: BulkMailForRudy
Date: Fri, 14 Feb 2020 12:00:25 -0800
On 2/14/20 10:00 AM, Olivier Cochard-Labbé wrote:
> On Fri, Feb 14, 2020 at 6:25 PM Rudy wrote:
>
>> On 2/12/20 7:21 PM, Rudy wrote:
>> > I'm having issues with a box that is acting as a BGP router for my
>> > network. 3 Chelsio cards, two T5 and one T6. It was working great
>> > until I turned up our first port on the T6. It seems like traffic
>> > passing in from a T5 card and out the T6 causes a really high load (and
>> > high interrupts).
>>
>> Looking better! I made some changes based on BSDRP, which I hadn't known
>> about -- I think ifqmaxlen was the tunable I overlooked.
>>
>> # https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
>> net.link.ifqmaxlen="16384"
>>
> This net.link.ifqmaxlen was set to help in case of lagg usage: I was not
> aware it could improve your use case.

Thanks for the feedback. Maybe it was a coincidence. Load has crept back up to 15.

> From your first post, it looks like your setup is 2 packages, 10 cores,
> 20 threads (disabled).
> And you have configured your Chelsio to use 16 queues (hw.cxgbe.Xrx=16).
> It's a good thing to have a power-of-2 number of queues with Chelsio, but
> I'm not sure it's a good idea to spread those queues across the 2 packages.
> So perhaps you should try:
> 1. Reducing to 8 queues and binding them to the local domain.
> 2. Or keeping 16 queues, but re-enabling HyperThreading and binding them to
>    the local domain too. (On -head with a recent CPU
>    and machdep.hyperthreading_intr_allowed, using hyper-threading improves
>    forwarding performance.)
>
> But anyway, even with 16 queues spread over 2 domains, you should have
> better performance:
> https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650v4_2x12Cores-Chelsio_T520-CR/hw.cxgbe.nXxq/results/fbsd12-stable.r354440.BSDRP.1.96/README.md

OK, I can work on the chelsio_affinity script.

... an hour later ...

OK, tested and updated on GitHub.

> Note that I never monitored the CPU load during my benchmarks.
> Increasing hw.cxgbe.holdoff_timer_idx was a good idea: I would expect
> lower interrupt usage too.
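Binding each queue interrupt to a CPU in the adapter's own NUMA domain (option 1 above) can be sketched in a few lines. This is a hypothetical helper, bind_nic_irqs, not the chelsio_affinity script mentioned in this thread. It assumes interrupt lines of the form "irqNNN: <nic>:<queue>:<n> ..." (both `vmstat -i` and the listing in this message start that way), and it relies on cpuset(1)'s `-l cpu-list -x irq` to pin an interrupt. Going by the queue interrupts in the listing, domain0 holds cpu1-cpu9 and domain1 holds cpu10-cpu19; adjust the CPU range to your own topology.

```shell
#!/bin/sh
# Hypothetical helper (not the chelsio_affinity script from this thread).
# Reads FreeBSD interrupt lines ("irqNNN: <nic>:<queue>:<n> ...") on stdin
# and prints one "cpuset -l CPU -x IRQ" command per queue interrupt of NIC,
# spreading them round-robin over CPUs first_cpu..last_cpu.
# Nothing is executed: review the output, then pipe it to sh to apply.
bind_nic_irqs() {
    nic=$1 first_cpu=$2 last_cpu=$3
    [ "$last_cpu" -ge "$first_cpu" ] || return 1
    awk -v nic="$nic" -v first="$first_cpu" -v last="$last_cpu" '
        # Only lines whose name field starts with "<nic>:" (e.g. "t6nex0:").
        index($2, nic ":") == 1 {
            split($2, parts, ":")
            # Leave the adapter-wide error and event interrupts alone.
            if (parts[2] == "err" || parts[2] == "evt") next
            irq = $1
            sub(/^irq/, "", irq)   # "irq291:" -> "291:"
            sub(/:$/, "", irq)     # "291:"    -> "291"
            cpu = first + (n % (last - first + 1))
            n++
            printf "cpuset -l %d -x %s\n", cpu, irq
        }'
}
```

For example, `vmstat -i | bind_nic_irqs t6nex0 1 9` prints the commands for the T6 queues on domain0 CPUs 1-9; appending `| sh` (as root) applies them. Printing instead of executing keeps the first run a dry run, which is useful when the queue count is about to change.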
I have some standard SNMP monitoring and can correlate the load spinning out of control with ping loss and packet loss.

# vmstat -i | tail -1
Total                        12217353774     324329

> Did you monitor the QPI link usage? (kldload cpuctl && pcm-numa.x)

I haven't. I'll look into that. Hoping the NUMA-domain locking helps.

Currently I have things bound to the right domain; I just need to shrink the number of queues and reboot!

irq289: t6nex0:err:261 @cpu0(domain0): 0
irq290: t6nex0:evt:263 @cpu0(domain0): 4
irq291: t6nex0:0a0:265 @cpu1(domain0): 0
irq292: t6nex0:0a1:267 @cpu2(domain0): 0
irq293: t6nex0:0a2:269 @cpu3(domain0): 0
irq294: t6nex0:0a3:271 @cpu4(domain0): 0
irq295: t6nex0:0a4:273 @cpu5(domain0): 0
irq296: t6nex0:0a5:275 @cpu6(domain0): 0
irq297: t6nex0:0a6:277 @cpu7(domain0): 0
irq298: t6nex0:0a7:279 @cpu8(domain0): 0
irq299: t6nex0:0a8:281 @cpu9(domain0): 0
irq300: t6nex0:0a9:283 @cpu1(domain0): 0
irq301: t6nex0:0aa:285 @cpu2(domain0): 0
irq302: t6nex0:0ab:287 @cpu3(domain0): 0
irq303: t6nex0:0ac:289 @cpu4(domain0): 0
irq304: t6nex0:0ad:291 @cpu5(domain0): 0
irq305: t6nex0:0ae:293 @cpu6(domain0): 0
irq306: t6nex0:0af:295 @cpu7(domain0): 0
irq307: t6nex0:1a0:297 @cpu8(domain0): 185404641
irq308: t6nex0:1a1:299 @cpu9(domain0): 146802111
irq309: t6nex0:1a2:301 @cpu1(domain0): 133930820
irq310: t6nex0:1a3:303 @cpu2(domain0): 173156318
irq311: t6nex0:1a4:305 @cpu3(domain0): 132151349
irq312: t6nex0:1a5:307 @cpu4(domain0): 149108252
irq313: t6nex0:1a6:309 @cpu5(domain0): 149196634
irq314: t6nex0:1a7:311 @cpu6(domain0): 184211395
irq315: t6nex0:1a8:313 @cpu7(domain0): 151266056
irq316: t6nex0:1a9:315 @cpu8(domain0): 169259534
irq317: t6nex0:1aa:317 @cpu9(domain0): 164117244
irq318: t6nex0:1ab:319 @cpu1(domain0): 157471862
irq319: t6nex0:1ac:321 @cpu2(domain0): 127662140
irq320: t6nex0:1ad:323 @cpu3(domain0): 172750013
irq321: t6nex0:1ae:325 @cpu4(domain0): 173559485
irq322: t6nex0:1af:327 @cpu5(domain0): 227842473
irq323: t5nex0:err:329 @cpu0(domain1): 0
irq324: t5nex0:evt:331 @cpu0(domain1): 8
irq325: t5nex0:0a0:333 @cpu10(domain1): 1340449
irq326: t5nex0:0a1:335 @cpu11(domain1): 1128580
irq327: t5nex0:0a2:337 @cpu12(domain1): 1311599
irq328: t5nex0:0a3:339 @cpu13(domain1): 1157356
irq329: t5nex0:0a4:341 @cpu14(domain1): 1257426
irq330: t5nex0:0a5:343 @cpu15(domain1): 1169697
irq331: t5nex0:0a6:345 @cpu16(domain1): 1089689
irq332: t5nex0:0a7:347 @cpu17(domain1): 1117782
irq333: t5nex0:0a8:349 @cpu18(domain1): 1186770
irq334: t5nex0:0a9:351 @cpu19(domain1): 1147015
irq335: t5nex0:0aa:353 @cpu10(domain1): 1238148
irq336: t5nex0:0ab:355 @cpu11(domain1): 1134259
irq337: t5nex0:0ac:357 @cpu12(domain1): 1262301
irq338: t5nex0:0ad:359 @cpu13(domain1): 1233933
irq339: t5nex0:0ae:361 @cpu14(domain1): 1284298
irq340: t5nex0:0af:363 @cpu15(domain1): 1257873
irq341: t5nex0:1a0:365 @cpu16(domain1): 204307929
irq342: t5nex0:1a1:367 @cpu17(domain1): 221035308
irq343: t5nex0:1a2:369 @cpu18(domain1): 218431173
irq344: t5nex0:1a3:371 @cpu19(domain1): 197270425
irq345: t5nex0:1a4:373 @cpu10(domain1): 181544184
irq346: t5nex0:1a5:375 @cpu11(domain1): 187715982
irq347: t5nex0:1a6:377 @cpu12(domain1): 184945609
irq348: t5nex0:1a7:379 @cpu13(domain1): 161060780
irq349: t5nex0:1a8:381 @cpu14(domain1): 162546561
irq350: t5nex0:1a9:383 @cpu15(domain1): 188539721
irq351: t5nex0:1aa:385 @cpu16(domain1): 153407315
irq352: t5nex0:1ab:387 @cpu17(domain1): 171904505
irq353: t5nex0:1ac:389 @cpu18(domain1): 163256903
irq354: t5nex0:1ad:391 @cpu19(domain1): 162976257
irq355: t5nex0:1ae:393 @cpu10(domain1): 186167299
irq356: t5nex0:1af:395 @cpu11(domain1): 205566989
irq357: t5nex0:2a0:397 @cpu12(domain1): 113070700
irq358: t5nex0:2a1:399 @cpu13(domain1): 172641475
irq359: t5nex0:2a2:401 @cpu14(domain1): 121577604
irq360: t5nex0:2a3:403 @cpu15(domain1): 109659638
irq361: t5nex0:2a4:405 @cpu16(domain1): 112705459
irq362: t5nex0:2a5:407 @cpu17(domain1): 127206944
irq363: t5nex0:2a6:409 @cpu18(domain1): 109712072
irq364: t5nex0:2a7:411 @cpu19(domain1): 108579249
irq365: t5nex0:2a8:413 @cpu10(domain1): 121687614
irq366: t5nex0:2a9:415 @cpu11(domain1): 100657878
irq367: t5nex0:2aa:417 @cpu12(domain1): 99212108
irq368: t5nex0:2ab:419 @cpu13(domain1): 107358669
irq369: t5nex0:2ac:421 @cpu14(domain1): 114883419
irq370: t5nex0:2ad:423 @cpu15(domain1): 104580916
irq371: t5nex0:2ae:425 @cpu16(domain1): 107601764
irq372: t5nex0:2af:427 @cpu17(domain1): 116284819
irq373: t5nex0:3a0:429 @cpu18(domain1): 341626
irq374: t5nex0:3a1:431 @cpu19(domain1): 254931
irq375: t5nex0:3a2:433 @cpu10(domain1): 273165
irq376: t5nex0:3a3:435 @cpu11(domain1): 254925
irq377: t5nex0:3a4:437 @cpu12(domain1): 254915
irq378: t5nex0:3a5:439 @cpu13(domain1): 254917
irq379: t5nex0:3a6:441 @cpu14(domain1): 254942
irq380: t5nex0:3a7:443 @cpu15(domain1): 254943
irq381: t5nex0:3a8:445 @cpu16(domain1): 254928
irq382: t5nex0:3a9:447 @cpu17(domain1): 254936
irq383: t5nex0:3aa:449 @cpu18(domain1): 254941
irq384: t5nex0:3ab:451 @cpu19(domain1): 254927
irq385: t5nex0:3ac:453 @cpu10(domain1): 255604
irq386: t5nex0:3ad:455 @cpu11(domain1): 254923
irq387: t5nex0:3ae:457 @cpu12(domain1): 254937
irq388: t5nex0:3af:459 @cpu13(domain1): 254931
irq389: t5nex1:err:461 @cpu0(domain1): 0
irq390: t5nex1:evt:463 @cpu0(domain1): 5
irq391: t5nex1:0a0:465 @cpu14(domain1): 0
irq392: t5nex1:0a1:467 @cpu15(domain1): 0
irq393: t5nex1:0a2:469 @cpu16(domain1): 0
irq394: t5nex1:0a3:471 @cpu17(domain1): 0
irq395: t5nex1:0a4:473 @cpu18(domain1): 0
irq396: t5nex1:0a5:475 @cpu19(domain1): 0
irq397: t5nex1:0a6:477 @cpu10(domain1): 0
irq398: t5nex1:0a7:479 @cpu11(domain1): 0
irq399: t5nex1:0a8:481 @cpu12(domain1): 0
irq400: t5nex1:0a9:483 @cpu13(domain1): 0
irq401: t5nex1:0aa:485 @cpu14(domain1): 0
irq402: t5nex1:0ab:487 @cpu15(domain1): 0
irq403: t5nex1:0ac:489 @cpu16(domain1): 0
irq404: t5nex1:0ad:491 @cpu17(domain1): 0
irq405: t5nex1:0ae:493 @cpu18(domain1): 0
irq406: t5nex1:0af:495 @cpu19(domain1): 0
irq407: t5nex1:1a0:497 @cpu10(domain1): 0
irq408: t5nex1:1a1:499 @cpu11(domain1): 0
irq409: t5nex1:1a2:501 @cpu12(domain1): 0
irq410: t5nex1:1a3:503 @cpu13(domain1): 0
irq411: t5nex1:1a4:505 @cpu14(domain1): 0
irq412: t5nex1:1a5:507 @cpu15(domain1): 0
irq413: t5nex1:1a6:509 @cpu16(domain1): 0
irq414: t5nex1:1a7:511 @cpu17(domain1): 0
irq415: t5nex1:1a8:513 @cpu18(domain1): 0
irq416: t5nex1:1a9:515 @cpu19(domain1): 0
irq417: t5nex1:1aa:517 @cpu10(domain1): 0
irq418: t5nex1:1ab:519 @cpu11(domain1): 0
irq419: t5nex1:1ac:521 @cpu12(domain1): 0
irq420: t5nex1:1ad:523 @cpu13(domain1): 0
irq421: t5nex1:1ae:525 @cpu14(domain1): 0
irq422: t5nex1:1af:527 @cpu15(domain1): 0
irq423: t5nex1:2a0:529 @cpu16(domain1): 159872451
irq424: t5nex1:2a1:531 @cpu17(domain1): 154946549
irq425: t5nex1:2a2:533 @cpu18(domain1): 163392585
irq426: t5nex1:2a3:535 @cpu19(domain1): 248248091
irq427: t5nex1:2a4:537 @cpu10(domain1): 151825795
irq428: t5nex1:2a5:539 @cpu11(domain1): 211623937
irq429: t5nex1:2a6:541 @cpu12(domain1): 146996842
irq430: t5nex1:2a7:543 @cpu13(domain1): 149654776
irq431: t5nex1:2a8:545 @cpu14(domain1): 159051009
irq432: t5nex1:2a9:547 @cpu15(domain1): 147511578
irq433: t5nex1:2aa:549 @cpu16(domain1): 151366677
irq434: t5nex1:2ab:551 @cpu17(domain1): 166419088
irq435: t5nex1:2ac:553 @cpu18(domain1): 155997667
irq436: t5nex1:2ad:555 @cpu19(domain1): 153777002
irq437: t5nex1:2ae:557 @cpu10(domain1): 148026677
irq438: t5nex1:2af:559 @cpu11(domain1): 146783174
irq439: t5nex1:3a0:561 @cpu12(domain1): 156624537
irq440: t5nex1:3a1:563 @cpu13(domain1): 173749953
irq441: t5nex1:3a2:565 @cpu14(domain1): 177033995
irq442: t5nex1:3a3:567 @cpu15(domain1): 173715859
irq443: t5nex1:3a4:569 @cpu16(domain1): 174333864
irq444: t5nex1:3a5:571 @cpu17(domain1): 157006064
irq445: t5nex1:3a6:573 @cpu18(domain1): 160822294
irq446: t5nex1:3a7:575 @cpu19(domain1): 153622866
irq447: t5nex1:3a8:577 @cpu10(domain1): 158965692
irq448: t5nex1:3a9:579 @cpu11(domain1): 153345040
irq449: t5nex1:3aa:581 @cpu12(domain1): 166902519
irq450: t5nex1:3ab:583 @cpu13(domain1): 159972013
irq451: t5nex1:3ac:585 @cpu14(domain1): 171917959
irq452: t5nex1:3ad:587 @cpu15(domain1): 166200690
irq453: t5nex1:3ae:589 @cpu16(domain1): 152933459
irq454: t5nex1:3af:591 @cpu17(domain1): 144512181
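The per-queue counters above are easier to compare once they are summed by adapter and queue group. This is a hypothetical awk-based helper, sum_irq_groups, not an existing tool. It reads lines in the format of the listing above, where the count is the last field. It also assumes that the first two characters of a queue name (the "1a" in "1a0") identify the group that queue belongs to. That grouping is consistent with the listing, where entire groups such as t6nex0 0a* and t5nex1 0a*/1a* sit at zero.

```shell
#!/bin/sh
# Hypothetical helper: sum interrupt counts per adapter and queue group.
# Input: lines as in the listing above, e.g.
#   irq307: t6nex0:1a0:297 @cpu8(domain0): 185404641
# (the count is the last field). Assumption: the first two characters of
# the queue name ("1a" in "1a0") identify the group the queue belongs to.
sum_irq_groups() {
    awk '
        $2 ~ /^t[0-9]+nex[0-9]+:/ {
            split($2, parts, ":")
            q = parts[2]
            # Skip the adapter-wide error and event interrupts.
            if (q == "err" || q == "evt") next
            key = parts[1] ":" substr(q, 1, 2)
            # Remember first-seen order so output follows the input.
            if (!(key in total)) order[++nkeys] = key
            total[key] += $NF
            count[key]++
        }
        END {
            for (i = 1; i <= nkeys; i++) {
                k = order[i]
                # %.0f avoids 32-bit %d overflow in some awks.
                printf "%s queues=%d interrupts=%.0f\n", k, count[k], total[k]
            }
        }'
}
```

After saving the listing to a file (irqs.txt is a hypothetical name here), `sum_irq_groups < irqs.txt` prints one line per adapter and group. This makes it easy to see that only one group per port is carrying the load on each domain.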