From owner-freebsd-net@FreeBSD.ORG  Thu Jan 12 09:31:19 2012
Date: Thu, 12 Jan 2012 13:31:12 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <1379921442.20120112133112@serebryakov.spb.ru>
To: freebsd-current@freebsd.org, freebsd-net@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: avg@FreeBSD.org, jhb@FreeBSD.org
Subject: SCHED_ULE / NetGraph interaction broken somewhere between r227874
	and r229818

Hello, Freebsd-current.

  I have a router that connects to my upstream ISP via PPPoE, using mpd5
 from ports.

  I've used SCHED_ULE for a long time without any problems. Under heavy
 network load (the router is not the fastest one: a 500 MHz Geode CPU) the
 main consumer of CPU was the "intr{swi1: netisr 0}" thread. But it never
 consumed more than 75%, and even when the upstream channel was
 completely saturated the router stayed accessible and responsive.

  The latest revision I'm sure is "good" is about r227874 (yes, from
  November 2011; I hadn't updated the router's system for a long time).

  But revision r229818 behaves completely differently: under network
 load, 100% of the CPU is consumed by the "ng_queue" thread (which never
 consumed any CPU on the old system). The system is unresponsive, the DNS
 server running on it returns timeouts, and I cannot log in via SSH or the
 serial console (the pause between the login and password prompts is so
 long that it leads to timeouts). The load average jumps to 20+, and a
 pre-started `top' updates its screen once every 3-4 minutes.

  Switching to SCHED_4BSD helps. 4BSD works as usual: all CPU time goes
 to the interrupt and network threads, the system stays responsive under
 the heaviest load, and DNS, DHCP and hostapd operate normally.

  There were NO significant changes in netgraph (svn log -r
 227874:229818 sys/netgraph) and three changes (r229429, r228960,
 r228718) in the kern/sched_*.c files. But I'm not sure that these
 are the only changes which could affect this behavior.

  Now I'm trying to find the "bad" revision by binary search, but it is
 very hard to do: the old mpd5 doesn't work on the new kernel and vice
 versa, so I need to rebuild the whole world, update my build box, rebuild
 ports with the new world, and only after that build a NanoBSD image for
 my router. It takes about 5 hours per iteration and there are more than
 512 revisions, so it is about 10 iterations :(
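
  The binary search over revisions can be sketched as below. This is
 only a rough illustration, not a real test harness: the two endpoint
 revisions and the 5-hour figure come from this message, while
 `is_bad` is a hypothetical callback standing in for "build this
 revision, boot the router and check whether ng_queue eats the CPU".
 For the 1944 revisions between r227874 and r229818 the worst case
 works out to 11 builds, close to the rough estimate above.

```python
GOOD = 227874             # last revision known to work (from this message)
BAD = 229818              # first revision known to fail (from this message)
HOURS_PER_ITERATION = 5   # rough rebuild time quoted in this message


def bisect_steps(good, bad):
    """Worst-case number of builds needed to isolate the first bad revision."""
    candidates = bad - good              # revisions in the range (good, bad]
    # Integer form of ceil(log2(candidates)), avoiding floating point.
    return (candidates - 1).bit_length()


def find_first_bad(good, bad, is_bad):
    """Return the first bad revision in (good, bad].

    is_bad is a hypothetical callback: is_bad(rev) must return True if
    the system built from revision rev shows the problem.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid        # the regression is at mid or earlier
        else:
            good = mid       # the regression is after mid
    return bad


if __name__ == "__main__":
    steps = bisect_steps(GOOD, BAD)
    print(steps, "builds worst case, about",
          steps * HOURS_PER_ITERATION, "hours")
```

  If one of the three kern/sched_*.c changes is the suspect, testing
 just those revisions first would be much cheaper than a full bisection,
 though that only helps if the assumption about the scheduler is right.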

  I can provide any debug information from the old and new systems.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>