From owner-freebsd-net@FreeBSD.ORG Wed Oct 8 17:56:57 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A1C3669E; Wed, 8 Oct 2014 17:56:57 +0000 (UTC) Received: from mail-oi0-x22a.google.com (mail-oi0-x22a.google.com [IPv6:2607:f8b0:4003:c06::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 534812B6; Wed, 8 Oct 2014 17:56:57 +0000 (UTC) Received: by mail-oi0-f42.google.com with SMTP id a141so8482066oig.29 for ; Wed, 08 Oct 2014 10:56:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=dJo0eP/xdn+4fSu9l6b8OthGg0xSus03yXgCMvltTlY=; b=Pv9GQgEhrkLRZCNFsg66YuZAYL758Av4AvTlhvPzpu4VAr7DHWOQtjjd7KXnC1BfuO zUi0vKIozFXqME2jTXE0+EZdeE0tVZBEz+1cU6ASx2CCeD8k/6ZNLgq/990oypwfSJX8 ZdACYKWwoYUF9yU1pHo4BuUj8XXOJQy9Plj0fJra8goCggJbUgjezBI+g1CtJf5g5oMV myVJGLX/rHhVZDVkNjQgRuy0kFXAzqN6cRg9bLt66Zv3PZbx0NXgtoJyA1CG0EB4EKRl uoVQG9mIiwmVVoR5rwzlzGGyMbIwZQCjpkg8147ckJuSlJc/wDpXjITs0Pt5467H3PU2 hbSw== MIME-Version: 1.0 X-Received: by 10.182.38.138 with SMTP id g10mr14621306obk.21.1412791016668; Wed, 08 Oct 2014 10:56:56 -0700 (PDT) Received: by 10.202.200.196 with HTTP; Wed, 8 Oct 2014 10:56:56 -0700 (PDT) In-Reply-To: <201410071428.15753.jhb@freebsd.org> References: <1410203348.1343.1.camel@bruno> <201410071146.26274.jhb@freebsd.org> <201410071428.15753.jhb@freebsd.org> Date: Wed, 8 Oct 2014 10:56:56 -0700 Message-ID: Subject: Re: ixgbe(4) spin lock held too long From: Jason Wolfe To: John Baldwin , freebsd-net@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: Sean Bruno , davide@freebsd.org, Eric van Gyzen X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Oct 2014 17:56:57 -0000 On Tue, Oct 7, 2014 at 11:28 AM, John Baldwin wrote: > On Tuesday, October 07, 2014 2:06:42 pm Jason Wolfe wrote: > > Hey John, > > > > Happy to do this, but the pool of boxes is about 500 large, which is the > > reason I'm able to see a crash every day or so. I've pulled a portion of > > them out of service to switch them over to a a kernel with WITNESS > options, > > might that assist in the same manner if we catch a crash there? If > you're > > thinking the KTR traces are required to diagnose this I'll get started on > > that though. I've also cut 10.1-RC1 with some ixgbe fixes for future > use, > > but noticed kern_timeout.c has been static for the past 8 months. Thanks > > again for your help. > > I don't think WITNESS will really help in this case. ixgbe just happens to > be trying to schedule a timer and running into a problem that the timer is > spinning on another CPU. While the kern_timeout code hasn't changed in a > while, it may be an edge case provoked by another bug elsewhere. It's also > possible I suppose that a lot of callouts are scheduled for some reason. > > Hmm, maybe we can still figure this out from your dumps. We can still get > to > '*cc' via an alternate route. First, IIRC, the thread is an idle thread, > so its name should indicate which CPU it is for: > > % sudo procstat -at | grep 100003 > 11 100003 idle idle: cpu0 0 255 run - > > (Hmm, I bet that is always CPU 0). That means we can find 'cc' by doing > 'p cc_cpu[0]' in kgdb. Can you do that from the last core dump along with > 'p last' in frame 4? > > Grrr. Looks like cc->cc_lastscan is already overwritten by the time you've > crashed, so we can't work back to the original value of 'firstb'. > > However, there is a sysctl you can invoke to get callout stats. Maybe run > 'sysctl kern.callout_stats=1' on one of your existing servers and then > check > dmesg. On an idle machine sitting here running stable/10 I have 86 callout > scheduled currently: > > Scheduled callouts statistic snapshot: > Callouts: 86 Buckets: 32768*8 Bucket size: 0.003906s > C/Bk: med 0 avg 0.000328 max 19 > Time: med 2.000000s avg 719.566333s max 19807.162515s > Prec: med 0.500000s avg 60.250613s max 1799.999875s > Distribution: buckets time tcum prec pcum > 0.000000s 2**-12 0 0 1 1 > 0.007812s 2**1 1 1 1 2 > 0.015625s 2**2 0 1 2 4 > 0.031250s 2**3 5 6 4 8 > 0.062500s 2**4 0 6 20 28 > 0.125000s 2**5 1 7 1 29 > 0.250000s 2**6 4 11 11 40 > 0.500000s 2**7 7 18 14 54 > 1.000000s 2**8 23 41 11 65 > 2.000000s 2**9 2 43 3 68 > 4.000000s 2**10 13 56 4 72 > 8.000000s 2**11 2 58 0 72 > 16.000000s 2**12 11 69 2 74 > 32.000000s 2**13 1 70 1 75 > 64.000000s 2**14 3 73 0 75 > 128.000000s 2**15 1 74 3 78 > 256.000000s 2**16 4 78 1 79 > 512.000000s 2**17 0 78 6 85 > 2048.000000s 2**19 2 80 1 86 > 8192.000000s 2**21 5 85 0 86 > 16384.000000s 2**22 1 86 0 86 > > It would be interesting to see if you have a much higher number. > > -- > John Baldwin > I was trying to keep the chatter off the list, but John has been insanely helpful here. Looping the list back in as his insight into debugging this is likely relevant to others. John, Correct 100003 is always CPU0, on our end at least. Jackpot on the cc_cpu[0] also, here are cores across 3 different 'spin lock callout held by tid 100003 too long' machines. Callouts on an idle machine are in line with yours at 90, and a somewhat idle box pushing ~500Mb/s has in the 3000-4000 range. (kgdb) tid 100003 [Switching to thread 40 (Thread 100003)]#0 0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 1432 savectx(&stoppcbs[cpu]); (kgdb) up 4 #4 0xffffffff80733f79 in callout_process (now=3549882673091690) at /usr/src/sys/kern/kern_timeout.c:443 443 tmp = LIST_FIRST(sc); (kgdb) list 438 } 439 440 /* Iterate callwheel from firstb to nowb and then up to lastb. */ 441 do { 442 sc = &cc->cc_callwheel[firstb & callwheelmask]; 443 tmp = LIST_FIRST(sc); 444 while (tmp != NULL) { 445 /* Run the callout if present time within allowed. */ 446 if (tmp->c_time <= now) { 447 /* (kgdb) p cc_cpu[0] $1 = {cc_lock = {lock_object = {lo_name = 0xffffffff80ccbec7 "callout", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, cc_exec_entity = {{cc_next = 0x0, cc_curr = 0x0, ce_migration_func = 0, ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0, ce_migration_prec = 0, cc_cancel = false, cc_waiting = false}, { cc_next = 0xffffffff812a7af0, cc_curr = 0x0, ce_migration_func = 0, ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0, ce_migration_prec = 0, cc_cancel = false, cc_waiting = false}}, cc_callout = 0xfffffe000069f000, cc_callwheel = 0xfffffe00007c1000, cc_expireq = {tqh_first = 0x0, tqh_last = 0xffffffff812a0510}, cc_callfree = {slh_first = 0xfffffe00007c02c0}, cc_firstevent = 3549882673088716, cc_lastscan = 3549882673091690, cc_cookie = 0xfffff800151f4b80, cc_bucket = 6515} (kgdb) p last $2 = 3549882929381376 (kgdb) tid 100003 [Switching to thread 40 (Thread 100003)]#0 0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 1432 savectx(&stoppcbs[cpu]); (kgdb) up 4 #4 0xffffffff80733fab in callout_process (now=6792755852650669) at /usr/src/sys/kern/kern_timeout.c:467 467 LIST_REMOVE(tmp, c_links.le); (kgdb) list 462 #endif 463 1); 464 tmp = cc->cc_exec_next_dir; 465 } else { 466 tmpn = LIST_NEXT(tmp, c_links.le); 467 LIST_REMOVE(tmp, c_links.le); 468 TAILQ_INSERT_TAIL(&cc->cc_expireq, 469 tmp, c_links.tqe); 470 tmp->c_flags |= CALLOUT_PROCESSED; 471 tmp = tmpn; (kgdb) p cc_cpu[0] $1 = {cc_lock = {lock_object = {lo_name = 0xffffffff80ccbec7 "callout", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, cc_exec_entity = {{cc_next = 0x0, cc_curr = 0x0, ce_migration_func = 0, ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0, ce_migration_prec = 0, cc_cancel = true, cc_waiting = false}, { cc_next = 0x0, cc_curr = 0x0, ce_migration_func = 0, ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0, ce_migration_prec = 0, cc_cancel = false, cc_waiting = false}}, cc_callout = 0xfffffe000069f000, cc_callwheel = 0xfffffe00007c1000, cc_expireq = {tqh_first = 0xfffffe0003f461b0, tqh_last = 0xffffffff81257138}, cc_callfree = {slh_first = 0xfffffe00007c02c0}, cc_firstevent = 6792755922031366, cc_lastscan = 6792755852650669, cc_cookie = 0xfffff800151f4b80, cc_bucket = 31145} (kgdb) p last $2 = 6792757989343232 (kgdb) tid 100003 [Switching to thread 40 (Thread 100003)]#0 0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432 1432 savectx(&stoppcbs[cpu]); (kgdb) up 4 #4 0xffffffff8073406e in callout_process (now=4354614125734311) at /usr/src/sys/kern/kern_timeout.c:487 487 if (tmp->c_time < first) (kgdb) list 482 if (tmp->c_time > last) { 483 lastb = nowb; 484 goto next; 485 } 486 /* Update first and last time, respecting this event. */ 487 if (tmp->c_time < first) 488 first = tmp->c_time; 489 tmp_max = tmp->c_time + tmp->c_precision; 490 if (tmp_max < last) 491 last = tmp_max; (kgdb) p cc_cpu[0] $1 = {cc_lock = {lock_object = {lo_name = 0xffffffff80ccbec7 "callout", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, cc_exec_entity = {{cc_next = 0x0, cc_curr = 0x0, ce_migration_func = 0, ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0, ce_migration_prec = 0, cc_cancel = false, cc_waiting = false}, { cc_next = 0x0, cc_curr = 0x0, ce_migration_func = 0, ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0, ce_migration_prec = 0, cc_cancel = false, cc_waiting = false}}, cc_callout = 0xfffffe000069f000, cc_callwheel = 0xfffffe00007c1000, cc_expireq = {tqh_first = 0x0, tqh_last = 0xffffffff812a0510}, cc_callfree = {slh_first = 0xfffffe00007c02c0}, cc_firstevent = 4354614125730601, cc_lastscan = 4354614125734311, cc_cookie = 0xfffff800151f4b80, cc_bucket = 32667} (kgdb) p last $2 = 4354614660956160 Scheduled callouts statistic snapshot: Callouts: 90 Buckets: 32768*6 Bucket size: 0.003906s C/Bk: med 0 avg 0.000457 max 13 Time: med 4.000000s avg 1231.642745s max 56836.522122s Prec: med 1.000000s avg 194.507344s max 5399.999627s Distribution: buckets time tcum prec pcum 0.000000s 2**-12 0 0 1 1 0.000030s 2**-7 1 1 0 1 0.015625s 2**2 0 1 2 3 0.031250s 2**3 1 2 17 20 0.062500s 2**4 1 3 14 34 0.125000s 2**5 15 18 1 35 0.250000s 2**6 6 24 1 36 0.500000s 2**7 12 36 5 41 1.000000s 2**8 3 39 15 56 4.000000s 2**10 17 56 9 65 8.000000s 2**11 2 58 0 65 16.000000s 2**12 0 58 12 77 32.000000s 2**13 2 60 1 78 64.000000s 2**14 6 66 4 82 128.000000s 2**15 17 83 3 85 256.000000s 2**16 0 83 2 87 1024.000000s 2**18 4 87 0 87 4096.000000s 2**20 0 87 3 90 32768.000000s 2**23 2 89 0 90 65536.000000s 2**24 1 90 0 90 Scheduled callouts statistic snapshot: Callouts: 3613 Buckets: 32768*6 Bucket size: 0.003906s C/Bk: med 0 avg 0.018376 max 22 Time: med 64.000000s avg 167.175352s max 86397.094792s Prec: med 4.000000s avg 13.837570s max 5399.999627s Distribution: buckets time tcum prec pcum 0.000000s 2**-12 0 0 1 1 0.001953s 2**-1 0 0 2 3 0.003906s 2**0 2 2 0 3 0.007812s 2**1 2 4 0 3 0.015625s 2**2 3 7 48 51 0.031250s 2**3 5 12 79 130 0.062500s 2**4 9 21 139 269 0.125000s 2**5 24 45 27 296 0.250000s 2**6 74 119 120 416 0.500000s 2**7 117 236 30 446 1.000000s 2**8 89 325 202 648 2.000000s 2**9 76 401 24 672 4.000000s 2**10 156 557 2862 3534 8.000000s 2**11 187 744 1 3535 16.000000s 2**12 370 1114 12 3547 32.000000s 2**13 589 1703 1 3548 64.000000s 2**14 1876 3579 55 3603 128.000000s 2**15 18 3597 2 3605 256.000000s 2**16 1 3598 2 3607 512.000000s 2**17 1 3599 0 3607 1024.000000s 2**18 6 3605 0 3607 2048.000000s 2**19 2 3607 0 3607 4096.000000s 2**20 0 3607 6 3613 32768.000000s 2**23 1 3608 0 3613 65536.000000s 2**24 5 3613 0 3613 Jason