From: Harald Schmalzbauer
To: Robert Watson
cc: current@freebsd.org
Date: Mon, 26 Jan 2004 19:59:18 +0100
Subject: Re: 5.2-rel NFS lockup and networking performance

On Monday 26 January 2004 19:08, Robert Watson wrote:
*SNIP*
>
> On the console, since it sounds like
> that's still running, it would be very interesting to see the output of:
>
>   while (1)
>     vmstat -i
>     sleep 1
>   end
>

This gives the following lines. Note the very constant values: they really were constant (vr0 stayed at 72 for at least 50 loops), no matter whether the network was hung or not.

interrupt              total    rate
irq0: clk            6144766     999
irq1: atkbd0               2       0
irq4: sio0             17539       2
irq7: ppc0                 1       0
irq8: rtc             786435     127
irq10: vr0            447325      72
irq11: atapci1        754942     122
irq13: npx0                1       0
irq15: ata1               32       0
Total                8151043    1326

interrupt              total    rate
irq0: clk            6145839     999
irq1: atkbd0               2       0
irq4: sio0             17577       2
irq7: ppc0                 1       0
irq8: rtc             786572     127
irq10: vr0            447336      72
irq11: atapci1        755510     122
irq13: npx0                1       0
irq15: ata1               32       0
Total                8152870    1326

interrupt              total    rate
irq0: clk            6146915     999
irq1: atkbd0               2       0
irq4: sio0             17615       2
irq7: ppc0                 1       0
irq8: rtc             786710     127
irq10: vr0            447342      72
irq11: atapci1        756160     123
irq13: npx0                1       0
irq15: ata1               32       0
Total                8154778^C

> And see what's going on with interrupts. Also, when it's "hung", could
> you do a stack trace of any nfsd processes lying around? It's as though

If I only knew how to do that. But as I mentioned, I still can't break into the debugger.

> we're hitting some edge case that causes the server to spin quite hard
> doing some sort of work, it's just not clear what it is. Speaking of
> spinning, general vmstat -w 1 would be interesting during the hang also to
> see what's going on with CPU.

Here are some lines from vmstat -w 1. (Note the lines with the zeros: that's when the ssh connection was locked!)
 procs    memory             page            disks      faults        cpu
 r b w    avm    fre  flt re pi po    fr sr ad4 ad6    in sy    cs us sy id
 0 0 0  44260 104876  121  0  0  0    28  0   0   0  1435  0   649  1  2 97
 0 0 0  44260 104876    0  0  0  0     0  0   0   0  1423  0   611  0  2 98
 0 0 0  44260 104876   12  0  0  0     0  0   0   0  1606  0  1040  0  3 97
 0 0 0  44260 199816    0  0  0  0 25020  0   0   0  7863  0 10519  0 59 41
 0 0 0  44260 191416   12  0  0  0     8  0   0   0 12194  0 16161  0 75 25
 0 0 0  44260 181388    0  0  0  0     1  0   0   0 14104  0 17630  0 66 34
 1 0 0  42364 173152    0  0  0  0    93  0   0   0 12191  0 14839  0 58 42
 0 0 0  42364 163248   94  0  0  0   106  0   0   0 13564  0 16778  1 70 29
 0 0 0  42364 153232    0  0  0  0     0  0   0   0 13939  0 16858  0 71 29
 0 0 0  42364 145568   94  0  0  0   114  0   0   0 11151  0 14119  3 54 43
 0 0 0  42364 136160   94  0  0  0   106  0   0   0 13192  0 16682  2 65 32
 0 4 0  42364 129808   94  0  0  0   110  0   0   0  9715  0 11998  2 52 46
 0 4 0  44260 129436  121  0  0  0    28  0   0   0  2655  0  3754  1 88 11
 0 4 0  42364 129808    0  0  0  0    93  0   0   0  2646  0  3760  0 81 19
 0 5 0  43696 129536   93  0  0  0    25  0   0   0  2620  0  3729  2 78 20
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2628  0  3807  0 76 24
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2640  0  3833  0 81 19
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2645  0  3829  0 81 19
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2647  0  3791  0 81 19
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2627  0  3790  0 84 16
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2633  0  3817  0 81 19
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2657  0  3843  0 81 19
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2637  0  3809  0 82 18
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2630  0  3790  0 82 18
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2644  0  3797  0 78 22
 0 5 0  43696 129536    2  0  0  0     0  0   0   0  2631  0  3813  0 84 16
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2636  0  3805  0 82 18
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2623  0  3772  0 81 19
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2634  0  3816  0 82 18
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2650  0  3836  0 84 16
 0 5 0  43696 129536    0  0  0  0     0  0   0   0  2628  0  3794  0 84 16

>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert@fledge.watson.org      Senior Research Scientist, McAfee Research
>
> > Thanks,
> >
> > -Harry
> >
> > > > Breaking to debugger is working on the console. (Which crashes my
> > > > /home each time) Is there a possibility to shut down the machine
> > > > "clean" after the ddb? Like I mentioned before, this is my production
> > > > fileserver :(
> > >
> > > Normally, assuming your machine isn't already hung, you can type in
> > > "cont" to continue, which should allow the system to continue normally.
> > > If the system was hung/generally broken when you entered DDB, you can
> > > try "call boot(0)" to see if you can get it to cleanly sync to disk,
> > > but whether that succeeds depends a lot on how hung/broken the kernel
> > > was already.
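A note on the "very constant" vmstat -i values earlier in this message: the rate column that vmstat -i prints is each counter's total divided by the time since boot, so with about 6,150 seconds of uptime (clk total / clk rate) it can barely move between one-second samples, hung or not. What the interrupts were doing during the hang shows up in the differences between successive totals instead. The Python sketch below computes those per-interval rates from two pasted snapshots. It is only a sketch: it assumes the name/total/rate layout shown above and kern.hz = 1000 (inferred from the ~999/s clk rate), and parse_vmstat_i and interrupt_rates are hypothetical helpers, not part of any FreeBSD tool.

```python
"""Per-interval interrupt rates from two pasted `vmstat -i` snapshots.

Sketch only: assumes the name/total/rate layout quoted in this thread
and a clock interrupt rate of HZ ticks per second.
"""

HZ = 1000  # assumption: kern.hz = 1000, inferred from clk's ~999/s rate


def parse_vmstat_i(snapshot: str) -> dict[str, int]:
    """Map each interrupt source (e.g. 'irq10: vr0') to its running total."""
    totals: dict[str, int] = {}
    for line in snapshot.strip().splitlines():
        fields = line.split()
        # Skip the column header, the summary line and short lines.
        if len(fields) < 3 or fields[0] in ("interrupt", "Total"):
            continue
        # The last two fields are total and rate; the rest is the name.
        totals[" ".join(fields[:-2])] = int(fields[-2])
    return totals


def interrupt_rates(before: str, after: str,
                    clock: str = "irq0: clk") -> dict[str, float]:
    """Interrupts per second for every source present in both snapshots.

    Elapsed time is taken from the clock interrupt counter rather than
    from the nominal `sleep 1`, since each loop iteration runs a bit longer.
    """
    a, b = parse_vmstat_i(before), parse_vmstat_i(after)
    seconds = (b[clock] - a[clock]) / HZ
    if seconds <= 0:
        raise ValueError("snapshots must be given in chronological order")
    return {name: (b[name] - a[name]) / seconds for name in a if name in b}


# Excerpt of the first two snapshots quoted earlier in this message.
SNAP1 = """
interrupt              total    rate
irq0: clk            6144766     999
irq8: rtc             786435     127
irq10: vr0            447325      72
irq11: atapci1        754942     122
Total                8151043    1326
"""

SNAP2 = """
interrupt              total    rate
irq0: clk            6145839     999
irq8: rtc             786572     127
irq10: vr0            447336      72
irq11: atapci1        755510     122
Total                8152870    1326
"""

if __name__ == "__main__":
    for name, rate in sorted(interrupt_rates(SNAP1, SNAP2).items()):
        print(f"{name:16} {rate:8.1f}/s")
```

Run as a script, it prints one line per interrupt source. On this excerpt, vr0 took 11 interrupts over 1073 clock ticks, i.e. roughly 10 per second under the hz = 1000 assumption, compared with the 72/s lifetime average in the rate column. The clock counter is used as the time base because each pass through the csh loop takes somewhat longer than the nominal one second.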