Date:      Fri, 14 Feb 2020 12:00:25 -0800
From:      BulkMailForRudy <crapsh@monkeybrains.net>
To:        freebsd-net@freebsd.org
Subject:   Re: Issue with BGP router / high interrupt / Chelsio / FreeBSD 12.1
Message-ID:  <9e1c0666-3dea-f946-24d4-e2dea48b30af@monkeybrains.net>
In-Reply-To: <CA%2Bq%2BTcrUyzfNtLGD4Vtc3a0v5MHcxVvED=eK57aGL21LTQzL4w@mail.gmail.com>
References:  <1aa78c6e-e640-623c-73d3-473df132eb72@monkeybrains.net> <c921825a-3a9c-cc15-78e6-c7e3776ab12a@monkeybrains.net> <bb6c3997-c369-28c3-9d85-c9cca526e093@monkeybrains.net> <a4c98e33-3aae-f08b-4132-52350a33a56c@monkeybrains.net> <428f3cdf-9035-90a7-14f8-f294c2131682@monkeybrains.net> <CA%2Bq%2BTcrUyzfNtLGD4Vtc3a0v5MHcxVvED=eK57aGL21LTQzL4w@mail.gmail.com>

On 2/14/20 10:00 AM, Olivier Cochard-Labbé wrote:
> On Fri, Feb 14, 2020 at 6:25 PM Rudy <crapsh@monkeybrains.net> wrote:
>
>> On 2/12/20 7:21 PM, Rudy wrote:
>>   > I'm having issues with a box that is acting as a BGP router for my
>> network.  3 Chelsio cards, two T5 and one T6.  It was working great
>> until I turned up our first port on the T6.  It seems like traffic
>> passing in from a T5 card and out the T6 causes a really high load (and
>> high interrupts).
>>
>>
>> Looking better!  I made some changes based on BSDRP which I hadn't known
>> about -- I think ifqmaxlen was the tunable I overlooked.
>>
>> # https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/boot/loader.conf.local
>> net.link.ifqmaxlen="16384"
>>
>>
> This net.link.ifqmaxlen was set to help in case of lagg usage: I was not
> aware it could improve your use case.


Thanks for the feedback.  Maybe it was a coincidence.  Load has crept 
back up to 15.


> From your first post, it looks like your setup has 2 packages, 10 cores,
> 20 threads (disabled).
> And you have configured your Chelsio to use 16 queues (hw.cxgbe.Xrx=16):
> It's a good thing to have a power-of-2 number of queues with Chelsio, but
> I'm not sure it's a good idea to spread those queues across the 2 packages.
> So perhaps you should try:
> 1. Reducing to 8 queues and binding them to the local domain
> 2. Or keeping 16 queues, but re-enabling HyperThreading and binding them to
> the local domain too (on -head with a recent CPU
> and machdep.hyperthreading_intr_allowed, using hyper-threading improves
> forwarding performance).
>
> But anyway even with 16 queues spread over 2 domains, you should have
> better performance:
> https://github.com/ocochard/netbenches/blob/master/Xeon_E5-2650v4_2x12Cores-Chelsio_T520-CR/hw.cxgbe.nXxq/results/fbsd12-stable.r354440.BSDRP.1.96/README.md


OK, I can work on the chelsio_affinity script.  ... an hour later ... OK, 
tested and updated on GitHub.
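
For illustration only, here is a minimal sketch of the round-robin binding 
idea (this is NOT the actual chelsio_affinity script).  It assumes you 
already know the queue IRQ numbers and which CPUs sit in the NIC's local 
NUMA domain; the function names and the example values are hypothetical.  
It only prints cpuset(1) commands, it does not run them.

```python
from itertools import cycle


def plan_bindings(irqs, local_cpus):
    """Pair each IRQ with a CPU from the local NUMA domain, round-robin."""
    # zip() stops when the IRQ list is exhausted; cycle() wraps the CPU list.
    return list(zip(irqs, cycle(local_cpus)))


def as_cpuset_commands(bindings):
    """Render each (irq, cpu) pair as a cpuset(1) command line."""
    # cpuset(1): -l gives the CPU list, -x names the IRQ to bind.
    return [f"cpuset -l {cpu} -x {irq}" for irq, cpu in bindings]


if __name__ == "__main__":
    # Hypothetical example: 8 queue IRQs bound to CPUs 1-9 of domain 0.
    irqs = range(291, 299)
    local_cpus = range(1, 10)
    for cmd in as_cpuset_commands(plan_bindings(irqs, local_cpus)):
        print(cmd)
```

With more queues than local CPUs, the CPU list simply wraps around, which 
means some CPUs end up servicing two queues.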



> Note that I never monitored the CPU load during my benches.
> Increasing the hw.cxgbe.holdoff_timer_idx was a good idea: I would expect
> lower interrupt usage too.

I have some standard SNMP monitoring and can correlate the load 
spinning out of control to ping loss and packet loss.

# vmstat -i | tail -1
Total                        12217353774     324329
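
A note on reading that line: as far as I know, the second number from 
vmstat -i is the average rate since boot (total divided by uptime), so it 
will not show a short spike.  A small sketch of the arithmetic, assuming 
that interpretation; the function names are made up for this example.

```python
def uptime_seconds(total_interrupts, avg_rate):
    """Uptime implied by a since-boot total and its average rate."""
    return total_interrupts / avg_rate


def rate_between(count_before, count_after, interval_seconds):
    """Interrupt rate over a sampling window, from two 'Total' readings."""
    return (count_after - count_before) / interval_seconds


if __name__ == "__main__":
    # Values from the 'Total' line above.
    secs = uptime_seconds(12217353774, 324329)
    print(f"implied uptime: {secs / 3600:.1f} hours")
```

To see the rate during a load spike, take two readings a known number of 
seconds apart and feed them to rate_between() instead of relying on the 
since-boot average.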


> Did you monitor the QPI link usage ? (kldload cpuctl && pcm-numa.x)


I haven't.  I'll look into that.  Hoping the numa-domain locking helps.




Currently I have things bound to the right domain, just need to reduce 
the number of queues and reboot!


irq289: t6nex0:err:261 @cpu0(domain0): 0
irq290: t6nex0:evt:263 @cpu0(domain0): 4
irq291: t6nex0:0a0:265 @cpu1(domain0): 0
irq292: t6nex0:0a1:267 @cpu2(domain0): 0
irq293: t6nex0:0a2:269 @cpu3(domain0): 0
irq294: t6nex0:0a3:271 @cpu4(domain0): 0
irq295: t6nex0:0a4:273 @cpu5(domain0): 0
irq296: t6nex0:0a5:275 @cpu6(domain0): 0
irq297: t6nex0:0a6:277 @cpu7(domain0): 0
irq298: t6nex0:0a7:279 @cpu8(domain0): 0
irq299: t6nex0:0a8:281 @cpu9(domain0): 0
irq300: t6nex0:0a9:283 @cpu1(domain0): 0
irq301: t6nex0:0aa:285 @cpu2(domain0): 0
irq302: t6nex0:0ab:287 @cpu3(domain0): 0
irq303: t6nex0:0ac:289 @cpu4(domain0): 0
irq304: t6nex0:0ad:291 @cpu5(domain0): 0
irq305: t6nex0:0ae:293 @cpu6(domain0): 0
irq306: t6nex0:0af:295 @cpu7(domain0): 0
irq307: t6nex0:1a0:297 @cpu8(domain0): 185404641
irq308: t6nex0:1a1:299 @cpu9(domain0): 146802111
irq309: t6nex0:1a2:301 @cpu1(domain0): 133930820
irq310: t6nex0:1a3:303 @cpu2(domain0): 173156318
irq311: t6nex0:1a4:305 @cpu3(domain0): 132151349
irq312: t6nex0:1a5:307 @cpu4(domain0): 149108252
irq313: t6nex0:1a6:309 @cpu5(domain0): 149196634
irq314: t6nex0:1a7:311 @cpu6(domain0): 184211395
irq315: t6nex0:1a8:313 @cpu7(domain0): 151266056
irq316: t6nex0:1a9:315 @cpu8(domain0): 169259534
irq317: t6nex0:1aa:317 @cpu9(domain0): 164117244
irq318: t6nex0:1ab:319 @cpu1(domain0): 157471862
irq319: t6nex0:1ac:321 @cpu2(domain0): 127662140
irq320: t6nex0:1ad:323 @cpu3(domain0): 172750013
irq321: t6nex0:1ae:325 @cpu4(domain0): 173559485
irq322: t6nex0:1af:327 @cpu5(domain0): 227842473
irq323: t5nex0:err:329 @cpu0(domain1): 0
irq324: t5nex0:evt:331 @cpu0(domain1): 8
irq325: t5nex0:0a0:333 @cpu10(domain1): 1340449
irq326: t5nex0:0a1:335 @cpu11(domain1): 1128580
irq327: t5nex0:0a2:337 @cpu12(domain1): 1311599
irq328: t5nex0:0a3:339 @cpu13(domain1): 1157356
irq329: t5nex0:0a4:341 @cpu14(domain1): 1257426
irq330: t5nex0:0a5:343 @cpu15(domain1): 1169697
irq331: t5nex0:0a6:345 @cpu16(domain1): 1089689
irq332: t5nex0:0a7:347 @cpu17(domain1): 1117782
irq333: t5nex0:0a8:349 @cpu18(domain1): 1186770
irq334: t5nex0:0a9:351 @cpu19(domain1): 1147015
irq335: t5nex0:0aa:353 @cpu10(domain1): 1238148
irq336: t5nex0:0ab:355 @cpu11(domain1): 1134259
irq337: t5nex0:0ac:357 @cpu12(domain1): 1262301
irq338: t5nex0:0ad:359 @cpu13(domain1): 1233933
irq339: t5nex0:0ae:361 @cpu14(domain1): 1284298
irq340: t5nex0:0af:363 @cpu15(domain1): 1257873
irq341: t5nex0:1a0:365 @cpu16(domain1): 204307929
irq342: t5nex0:1a1:367 @cpu17(domain1): 221035308
irq343: t5nex0:1a2:369 @cpu18(domain1): 218431173
irq344: t5nex0:1a3:371 @cpu19(domain1): 197270425
irq345: t5nex0:1a4:373 @cpu10(domain1): 181544184
irq346: t5nex0:1a5:375 @cpu11(domain1): 187715982
irq347: t5nex0:1a6:377 @cpu12(domain1): 184945609
irq348: t5nex0:1a7:379 @cpu13(domain1): 161060780
irq349: t5nex0:1a8:381 @cpu14(domain1): 162546561
irq350: t5nex0:1a9:383 @cpu15(domain1): 188539721
irq351: t5nex0:1aa:385 @cpu16(domain1): 153407315
irq352: t5nex0:1ab:387 @cpu17(domain1): 171904505
irq353: t5nex0:1ac:389 @cpu18(domain1): 163256903
irq354: t5nex0:1ad:391 @cpu19(domain1): 162976257
irq355: t5nex0:1ae:393 @cpu10(domain1): 186167299
irq356: t5nex0:1af:395 @cpu11(domain1): 205566989
irq357: t5nex0:2a0:397 @cpu12(domain1): 113070700
irq358: t5nex0:2a1:399 @cpu13(domain1): 172641475
irq359: t5nex0:2a2:401 @cpu14(domain1): 121577604
irq360: t5nex0:2a3:403 @cpu15(domain1): 109659638
irq361: t5nex0:2a4:405 @cpu16(domain1): 112705459
irq362: t5nex0:2a5:407 @cpu17(domain1): 127206944
irq363: t5nex0:2a6:409 @cpu18(domain1): 109712072
irq364: t5nex0:2a7:411 @cpu19(domain1): 108579249
irq365: t5nex0:2a8:413 @cpu10(domain1): 121687614
irq366: t5nex0:2a9:415 @cpu11(domain1): 100657878
irq367: t5nex0:2aa:417 @cpu12(domain1): 99212108
irq368: t5nex0:2ab:419 @cpu13(domain1): 107358669
irq369: t5nex0:2ac:421 @cpu14(domain1): 114883419
irq370: t5nex0:2ad:423 @cpu15(domain1): 104580916
irq371: t5nex0:2ae:425 @cpu16(domain1): 107601764
irq372: t5nex0:2af:427 @cpu17(domain1): 116284819
irq373: t5nex0:3a0:429 @cpu18(domain1): 341626
irq374: t5nex0:3a1:431 @cpu19(domain1): 254931
irq375: t5nex0:3a2:433 @cpu10(domain1): 273165
irq376: t5nex0:3a3:435 @cpu11(domain1): 254925
irq377: t5nex0:3a4:437 @cpu12(domain1): 254915
irq378: t5nex0:3a5:439 @cpu13(domain1): 254917
irq379: t5nex0:3a6:441 @cpu14(domain1): 254942
irq380: t5nex0:3a7:443 @cpu15(domain1): 254943
irq381: t5nex0:3a8:445 @cpu16(domain1): 254928
irq382: t5nex0:3a9:447 @cpu17(domain1): 254936
irq383: t5nex0:3aa:449 @cpu18(domain1): 254941
irq384: t5nex0:3ab:451 @cpu19(domain1): 254927
irq385: t5nex0:3ac:453 @cpu10(domain1): 255604
irq386: t5nex0:3ad:455 @cpu11(domain1): 254923
irq387: t5nex0:3ae:457 @cpu12(domain1): 254937
irq388: t5nex0:3af:459 @cpu13(domain1): 254931
irq389: t5nex1:err:461 @cpu0(domain1): 0
irq390: t5nex1:evt:463 @cpu0(domain1): 5
irq391: t5nex1:0a0:465 @cpu14(domain1): 0
irq392: t5nex1:0a1:467 @cpu15(domain1): 0
irq393: t5nex1:0a2:469 @cpu16(domain1): 0
irq394: t5nex1:0a3:471 @cpu17(domain1): 0
irq395: t5nex1:0a4:473 @cpu18(domain1): 0
irq396: t5nex1:0a5:475 @cpu19(domain1): 0
irq397: t5nex1:0a6:477 @cpu10(domain1): 0
irq398: t5nex1:0a7:479 @cpu11(domain1): 0
irq399: t5nex1:0a8:481 @cpu12(domain1): 0
irq400: t5nex1:0a9:483 @cpu13(domain1): 0
irq401: t5nex1:0aa:485 @cpu14(domain1): 0
irq402: t5nex1:0ab:487 @cpu15(domain1): 0
irq403: t5nex1:0ac:489 @cpu16(domain1): 0
irq404: t5nex1:0ad:491 @cpu17(domain1): 0
irq405: t5nex1:0ae:493 @cpu18(domain1): 0
irq406: t5nex1:0af:495 @cpu19(domain1): 0
irq407: t5nex1:1a0:497 @cpu10(domain1): 0
irq408: t5nex1:1a1:499 @cpu11(domain1): 0
irq409: t5nex1:1a2:501 @cpu12(domain1): 0
irq410: t5nex1:1a3:503 @cpu13(domain1): 0
irq411: t5nex1:1a4:505 @cpu14(domain1): 0
irq412: t5nex1:1a5:507 @cpu15(domain1): 0
irq413: t5nex1:1a6:509 @cpu16(domain1): 0
irq414: t5nex1:1a7:511 @cpu17(domain1): 0
irq415: t5nex1:1a8:513 @cpu18(domain1): 0
irq416: t5nex1:1a9:515 @cpu19(domain1): 0
irq417: t5nex1:1aa:517 @cpu10(domain1): 0
irq418: t5nex1:1ab:519 @cpu11(domain1): 0
irq419: t5nex1:1ac:521 @cpu12(domain1): 0
irq420: t5nex1:1ad:523 @cpu13(domain1): 0
irq421: t5nex1:1ae:525 @cpu14(domain1): 0
irq422: t5nex1:1af:527 @cpu15(domain1): 0
irq423: t5nex1:2a0:529 @cpu16(domain1): 159872451
irq424: t5nex1:2a1:531 @cpu17(domain1): 154946549
irq425: t5nex1:2a2:533 @cpu18(domain1): 163392585
irq426: t5nex1:2a3:535 @cpu19(domain1): 248248091
irq427: t5nex1:2a4:537 @cpu10(domain1): 151825795
irq428: t5nex1:2a5:539 @cpu11(domain1): 211623937
irq429: t5nex1:2a6:541 @cpu12(domain1): 146996842
irq430: t5nex1:2a7:543 @cpu13(domain1): 149654776
irq431: t5nex1:2a8:545 @cpu14(domain1): 159051009
irq432: t5nex1:2a9:547 @cpu15(domain1): 147511578
irq433: t5nex1:2aa:549 @cpu16(domain1): 151366677
irq434: t5nex1:2ab:551 @cpu17(domain1): 166419088
irq435: t5nex1:2ac:553 @cpu18(domain1): 155997667
irq436: t5nex1:2ad:555 @cpu19(domain1): 153777002
irq437: t5nex1:2ae:557 @cpu10(domain1): 148026677
irq438: t5nex1:2af:559 @cpu11(domain1): 146783174
irq439: t5nex1:3a0:561 @cpu12(domain1): 156624537
irq440: t5nex1:3a1:563 @cpu13(domain1): 173749953
irq441: t5nex1:3a2:565 @cpu14(domain1): 177033995
irq442: t5nex1:3a3:567 @cpu15(domain1): 173715859
irq443: t5nex1:3a4:569 @cpu16(domain1): 174333864
irq444: t5nex1:3a5:571 @cpu17(domain1): 157006064
irq445: t5nex1:3a6:573 @cpu18(domain1): 160822294
irq446: t5nex1:3a7:575 @cpu19(domain1): 153622866
irq447: t5nex1:3a8:577 @cpu10(domain1): 158965692
irq448: t5nex1:3a9:579 @cpu11(domain1): 153345040
irq449: t5nex1:3aa:581 @cpu12(domain1): 166902519
irq450: t5nex1:3ab:583 @cpu13(domain1): 159972013
irq451: t5nex1:3ac:585 @cpu14(domain1): 171917959
irq452: t5nex1:3ad:587 @cpu15(domain1): 166200690
irq453: t5nex1:3ae:589 @cpu16(domain1): 152933459
irq454: t5nex1:3af:591 @cpu17(domain1): 144512181
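
A listing this long is easier to read when summed up.  Below is a rough 
sketch that totals the interrupt counts per CPU and per device.  It assumes 
the exact line format shown above ("irqN: dev:queue:vector @cpuN(domainN): 
count"); the regex and function name are my own and would need adjusting 
for any other format.

```python
import re
from collections import Counter

# Matches lines such as:
#   irq307: t6nex0:1a0:297 @cpu8(domain0): 185404641
LINE_RE = re.compile(
    r"irq(?P<irq>\d+): (?P<dev>\w+):(?P<queue>\w+):(?P<vec>\d+) "
    r"@cpu(?P<cpu>\d+)\(domain(?P<domain>\d+)\): (?P<count>\d+)"
)


def summarize(lines):
    """Return (per_cpu, per_dev) Counters of interrupt counts."""
    per_cpu, per_dev = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not an IRQ line
        count = int(m["count"])
        per_cpu[int(m["cpu"])] += count
        per_dev[m["dev"]] += count
    return per_cpu, per_dev


if __name__ == "__main__":
    # Three lines copied from the listing above.
    sample = [
        "irq307: t6nex0:1a0:297 @cpu8(domain0): 185404641",
        "irq316: t6nex0:1a9:315 @cpu8(domain0): 169259534",
        "irq341: t5nex0:1a0:365 @cpu16(domain1): 204307929",
    ]
    per_cpu, per_dev = summarize(sample)
    for cpu, total in sorted(per_cpu.items()):
        print(f"cpu{cpu}: {total}")
    for dev, total in sorted(per_dev.items()):
        print(f"{dev}: {total}")
```

Feeding it the whole listing should make it obvious which CPUs are shared 
by several busy queues.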




