From: Anton Yuzhaninov <citrin@citrin.ru>
Date: Wed, 25 Jan 2012 16:14:06 +0400
To: freebsd-net@freebsd.org
Message-ID: <4F1FF20E.7080108@citrin.ru>
Subject: livelock with fully loaded em(4)

Hello.

I have test boxes with an em(4) network card (Intel 82563EB), running
FreeBSD 8.2-STABLE from 2012-01-15, amd64.

When this NIC is fully loaded, a livelock occurs: the system becomes
unresponsive even from the local console. To generate load I use
netsend from /usr/src/tools/tools/netrate/, but other traffic sources
(e.g. TCP instead of UDP) cause the same problem.

Two conditions are needed for this livelock:

1. Under full NIC load, the kernel thread "em1 taskq" hogs a CPU.

top -zISHP for interface load a bit less than full. Traffic is
generated by

# netsend 172.16.0.2 9001 8500 14300 3600

where the arguments are destination IP, port, payload size in bytes,
packets per second and duration in seconds (8500-byte UDP payloads at
14300 pps is about 980 Mbit/s on the wire, i.e. nearly gigabit line
rate):

112 processes: 10 running, 82 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice, 27.1% system,  0.0% interrupt, 72.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:  2.3% user,  0.0% nice, 97.7% system,  0.0% interrupt,  0.0% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 4:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 5:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 6:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 7:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 26M Active, 378M Inact, 450M Wired, 132K Cache, 63M Buf, 15G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 7737 ayuzhaninov 119    0  5832K  1116K CPU2    2   0:04 100.00% netsend
    0 root        -68    0     0K   144K -       0   2:17  22.27% {em1 taskq}

top -zISHP for full interface load (some drops occur); load is
generated by

# netsend 172.16.0.2 9001 8500 14400 3600

112 processes: 11 running, 81 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
CPU 1:  4.1% user,  0.0% nice, 95.9% system,  0.0% interrupt,  0.0% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 4:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 5:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 6:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 7:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 26M Active, 378M Inact, 450M Wired, 132K Cache, 63M Buf, 15G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    0 root        -68    0     0K   144K CPU0    0   2:17 100.00% {em1 taskq}
 7759 ayuzhaninov 119    0  5832K  1116K CPU1    1   0:01 100.00% netsend

So pps increased only from 14300 to 14400 (0.7%), while the CPU load
of the "em1 taskq" thread jumped from 27.1% to 100.00%. That alone is
strange, but the system still works fine until I run

# sysctl dev.cpu.0.temperature

2. The sysctl handler for coretemp must execute on the target CPU,
e.g. for dev.cpu.0.temperature the code runs on CPU0. If CPU0 is fully
loaded by "em1 taskq", the sysctl handler for dev.cpu.0.temperature
acquires the Giant mutex and then tries to run on CPU0, but it can't:
CPU0 is busy. When Giant is held that long, the system becomes
unresponsive. In my case Giant is acquired as soon as
"sysctl dev.cpu.0.temperature" starts and is held the whole time
netsend keeps running.
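As far as I can tell from the driver source, the read path looks
roughly like this (a from-memory sketch, not verbatim 8.x code, and
coretemp_read_temp is my placeholder name): the handler binds the
current thread to the target CPU with sched_bind() in order to read
that CPU's thermal MSR, while the sysctl framework above it already
holds Giant because the handler is not marked MPSAFE:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <machine/cpufunc.h>    /* rdmsr() */
#include <machine/specialreg.h> /* MSR_THERM_STATUS */

static int
coretemp_read_temp(device_t dev)
{
        int cpu = device_get_unit(dev); /* target CPU, e.g. 0 */
        uint64_t msr;

        /*
         * Migrate this thread to the target CPU.  If that CPU is
         * monopolized by a kernel thread that never yields ("em1
         * taskq" at 100%), we sit here runnable forever - and the
         * sysctl code that called us already holds Giant.
         */
        thread_lock(curthread);
        sched_bind(curthread, cpu);
        thread_unlock(curthread);

        msr = rdmsr(MSR_THERM_STATUS);  /* per-CPU thermal status */

        thread_lock(curthread);
        sched_unbind(curthread);
        thread_unlock(curthread);

        /*
         * Bits 22:16 hold the offset below Tj(max); the real driver
         * converts this to degrees Celsius.
         */
        return ((msr >> 16) & 0x7f);
}

So the sysctl thread sits on CPU0's run queue waiting for a CPU it
never gets, and everything else that needs Giant piles up behind it,
which matches the "unresponsive even on the console" symptom.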
This seems to be a scheduler problem:

1. Why does "em1 taskq" run only on CPU0? There is no affinity set for
this thread:

# procstat -k 0 | egrep '(PID|em1)'
  PID    TID COMM             TDNAME           KSTACK
    0 100038 kernel           em1 taskq
# cpuset -g -t 100038
tid 100038 mask: 0, 1, 2, 3, 4, 5, 6, 7

2. Why is "em1 taskq" not preempted to let the sysctl handler code
run? This is not a short-lived condition: if netsend runs for an
hour, "em1 taskq" is not preempted for an hour - the sysctl stays
runnable all that time but never gets a chance to execute.

-- 
Anton Yuzhaninov

P. S. I tried EM_MULTIQUEUE, but it doesn't help in my case.
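P. P. S. An untested idea for a stopgap (not a fix): move the taskq
thread off CPU0 with cpuset(1), using the TID from the procstat output
above, e.g.

# cpuset -l 1-7 -t 100038

Even if that helps, it wouldn't explain why the scheduler never
migrates or preempts the thread on its own.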