From: pluknet
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Date: Tue, 16 Jun 2009 19:03:34 +0400
Subject: Re: 6.2 sporadically locks up

2009/6/16 John Baldwin:
> On Tuesday 16 June 2009 6:23:47 am pluknet wrote:
>> Hi all.
>>
>> This is one of the livelocks we hit on a weekly basis.
>> Yes, we still use the ULE scheduler on 6.2 and have not moved to 7 yet.
>> Any thoughts?
>>
>> db> ps
>>   pid  ppid  pgrp   uid  state  wmesg    wchan      cmd
>> 70304 69700 69670     0  R                          sh
>> 70303 70292 93818  3572  RL     CPU 2               chrsh
>> 70302 70294 93818  3572  R                          crond
>> 70299 93818 93818     0  R      CPU 1               crond
>> 70298 93818 93818     0  R                          crond
>> 70294 93818 93818  3572  S      piperd   0xd1d8d330 crond
>> 70292 93818 93818  3572  R                          crond
>> 70284 70279 70040 10229  S      biord    0xdbe2e4e8 perl5.8.8
>> 70283 70278 93818 10229  SL     biord    0xdbd70710 exim-4.63-0
>> 70279 70040 70040 10229  S      wait     0xc9005860 sh
>> 70278 69996 93818 10229  S      wait     0xcaf4ac90 sh
>> 70191  4680  4680  9738  S      select   0xc0a12944 httpd
>> 70190  4796  4796 10008  R                          httpd
>> 70188  5043  5043 30532  RL                         httpd
>> 70043 69999 70043  3572  Ss     select   0xc0a12944 wget
>> 70042 70000 70042  3572  Ss     select   0xc0a12944 wget
>> 70041 70001 70041  3572  Ss     select   0xc0a12944 wget
>> 70040 69996 70040 10229  Ss     piperd   0xca35e990 perl5.8.8
>> 70039 70002 70039  3572  Ss     select   0xc0a12944 wget
>
> This is not a full listing, so one cannot assume it is a deadlock.

OK. Usually that listing doesn't show anything interesting in this sort
of lockup. I'll post the full ps output next time (which will surely be
rather soon).

>
>> db> show lockchain Giant
>> thread -3420549 (pid 434, ) ??? (0xc099cb0c)
>
> You would use 'show lock' or perhaps 'show turnstile' with specific lock
> variables.  'show lockchain' needs a TID or PID.

OK. As for 'show turnstile', it showed nothing at all, so I omitted it.

>
>> db> show allpcpu
>> cpuid        = 0
>> curthread    = 0xc7cfec80: pid 18 "swi4: clock sio"
>>
>> cpuid        = 1
>> curthread    = 0xc99f9960: pid 70299 "crond"
>>
>> cpuid        = 2
>> curthread    = 0xc99f9af0: pid 70303 "chrsh"
>>
>> cpuid        = 3
>> curthread    = 0xd087d320: pid 69700 "sh"
>>
>> cpuid        = 4
>> curthread    = 0xc98f84b0: pid 69604 "httpd"
>>
>> cpuid        = 5
>> curthread    = 0xcaebe190: pid 69598 "httpd"
>>
>> cpuid        = 6
>> curthread    = 0xc7cfe960: pid 27 "irq17: bce1 aacu0"
>>
>> cpuid        = 7
>> curthread    = 0xc837fe10: pid 69711 "arcconf"
>
> This is far more useful output than the truncated 'ps'.  From this, all of
> the CPUs are busy (in at least some deadlocks, all the CPUs would be idle
> instead).  There are several deadlocks fixed since 6.2 that I am aware of,
> but this doesn't look like any of those.  I'm not sure why you aren't
> getting useful stack traces of running threads.

I'll include them next time. I thought they would be similar to the
'bt PID' output, so I simply left them out.

As for allpcpu, I often see a picture where one CPU runs the
"irq17: bce1 aacu0" thread while another runs arcconf. I wonder whether
that might be a source of bad locking, races, or something else. The
arcconf utility issues ioctls that go into the aac/aacu(4) internals.

> Perhaps DDB in 6.2 doesn't know to look in stoppcbs[].  Hmm, looks like
> 6.2 only does that if you are using KDB_STOP_NMI.  Are you using that
> kernel option?  If not, you probably want to.

No, I'm not. Will that add any noticeable overhead on a running system?

>
> --
> John Baldwin
>

Thank you.

-- 
wbr, pluknet