From owner-freebsd-hackers@FreeBSD.ORG Mon Feb 7 23:49:36 2011 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CCBEA106564A for ; Mon, 7 Feb 2011 23:49:36 +0000 (UTC) (envelope-from dieterbsd@engineer.com) Received: from imr-ma03.mx.aol.com (imr-ma03.mx.aol.com [64.12.206.41]) by mx1.freebsd.org (Postfix) with ESMTP id 8A6818FC0A for ; Mon, 7 Feb 2011 23:49:36 +0000 (UTC) Received: from imo-da02.mx.aol.com (imo-da02.mx.aol.com [205.188.169.200]) by imr-ma03.mx.aol.com (8.14.1/8.14.1) with ESMTP id p17NnPwt013533; Mon, 7 Feb 2011 18:49:25 -0500 Received: from dieterbsd@engineer.com by imo-da02.mx.aol.com (mail_out_v42.9.) id n.f85.aa90127 (37516); Mon, 7 Feb 2011 18:49:17 -0500 (EST) Received: from smtprly-da01.mx.aol.com (smtprly-da01.mx.aol.com [205.188.249.144]) by cia-ma08.mx.aol.com (v129.8) with ESMTP id MAILCIAMA081-5baf4d5084f888; Mon, 07 Feb 2011 18:49:17 -0500 Received: from web-mmc-m02 (web-mmc-m02.sim.aol.com [64.12.224.135]) by smtprly-da01.mx.aol.com (v129.8) with ESMTP id MAILSMTPRLYDA016-5baf4d5084f888; Mon, 07 Feb 2011 18:49:12 -0500 To: freebsd-hackers@freebsd.org Content-Transfer-Encoding: quoted-printable Date: Mon, 07 Feb 2011 18:49:12 -0500 X-AOL-IP: 67.206.164.141 X-MB-Message-Source: WebUI Received: from 67.206.164.141 by web-mmc-m02.sysops.aol.com (64.12.224.135) with HTTP (WebMailUI); Mon, 07 Feb 2011 18:49:12 -0500 MIME-Version: 1.0 From: dieterbsd@engineer.com X-MB-Message-Type: User Content-Type: text/plain; charset="us-ascii"; format=flowed X-Mailer: Mail.com Webmail 33189-STANDARD Message-Id: <8CD9538C148865F-1B10-A4C8@web-mmc-m02.sysops.aol.com> X-Spam-Flag: NO X-AOL-SENDER: dieterbsd@engineer.com Cc: Subject: Re: Why does printf(9) hang network? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Feb 2011 23:49:37 -0000 Robert writes: >> Why would doing a printf(9) in a device driver (usb, firewire,=20 probably >> others) cause an obscenely long lockout on >> /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) ? >> >> Printf(9) alone isn't the problem, adding printfs to chown(2) does=20 not >> cause the problem, but printfs from device drivers do. >> >> Grep says that uipc_sockbuf.c is the only file that locks/unlocks=20 sb_sx. >> The device drivers and printf don't even know that sb_sx exists. > > I can't speak to the details of your situation, but one possible > explanation might be: printf runs at the speed of the console, which=20 for > serious consoles can be extremely slowly. But shouldn't the RS-232 driver just fill up the UART's FIFO and then=20 sleep? Why can't the network code run while the RS-232 driver sleeps? Actually this must be happening in the call-printf-from-chown case. There must=20 be something different when printf is called from a device driver. > Device driver interrupt threads can preempt other threads, > possibly while those threads hold locks. That causes them to hold > the locks for much longer, as the threads may not get rescheduled for=20 some > period (for example, until the device driver is done doing a printf), If the CPU is mostly idle, shouldn't the network thread get scheduled=20 right away? > leading > other threads waiting for that lock to wait significantly longer. > Especially the case if the other thread was spinning adaptively, in=20 which > case it will then yield since the holder of the lock effectively=20 yielded. My head is spinning attempting to understand this... > You might try forcing all the various threads to run on different CPUs > using cpuset and see if the variance goes down. Uniprocessor > You can also use KTR + schedgraph > to explore the specific scheduling going on, although be aware that=20 KTR > can also noticeably perturb schediling itself. Scheduling? The CPU can be mostly idle and the problem still happens. > In general, things shouldn't call kernel printf in steady state=20 operation; > if they need to log something, they should use log(9) or similar. =20 printf > is primarily a tool for printing out device probe information, and for > debugging purposes: it is not intended to be fast. Sounds fine to me. Is there a consensus on this? If so, does this=20 need to go into some developer's handbook? How do we get developers to fix the existing code?