Date: Wed, 18 Feb 2004 10:36:11 +1030
From: Greg 'groggy' Lehey
To: FreeBSD current users
cc: Andrew Rutherford
Subject: Soft updates problems with 5.2.1-RC1
Message-ID: <20040218000611.GC64477@wantadilla.lemis.com>
Organization: The FreeBSD Project
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.FreeBSD.org/

A couple of days ago, a large company here in Adelaide installed 5.2.1-RC1 on their main Internet gateway :-( They're having problems, which I'm looking at in conjunction with Andrew Rutherford (copied).
Looking for the problems is complicated by the fact that the machine has to be kept running. I'm posting what we've seen so far, in case this rings bells with anybody else.

The main symptom is that around midday, when they get some large mail messages in and the system load rises, natd stops working. They can sometimes get it restarted by reloading the firewall rules, but sometimes they have to restart the processes. This causes massive loss of connections, of course.

At the same time they were getting console messages saying "backtrace". The backtrace itself was going to the console on the KVM, of course, in an unattended server room. We sent somebody in yesterday lunchtime, and he reported:

  backtrace(c08e24f8,2,cb89fd08,0,22) at backtrace+0x17
  getdirtybuf(d48c2bbc,0,1,cb89fd08,1) at getdirtybuf+0x30
  flush_inodedep_deps(c6926000,13c78d,d48c2c10,c062f993,d48c2c40) at flush_inodedep_deps+0xa3
  softdep_sync_metadata(d48c2a8,0,c086875d,124,0) at softdep_sync_metadata+0x87
  ffs_fsync(d48c2ca8,0,c085aa45,beb,0) at ffs_fsync+0x3b9
  fsync(c74f9c80,d48cd14,c086f793,3ee,1) at fsync+0x1d
  syscall(2f,2f,2f,80b59e1) at syscall+0x2a0
  Xint0x80_syscall() at Xint0x80_syscall+0x1d
  ...syscall(95) eip=0x282909af,esp=0xbfbfb2cc,ebp=0xbfbfbba8

This is handwritten, but it points about as far from ipfw and natd as you could imagine. Looking at the code, what it's really telling us is that the buffer header getdirtybuf() finds through its bpp parameter has a NULL b_vp (vnode pointer):

  /*
   * XXX This code and the code that calls it need to be reviewed to
   * verify its use of the vnode interlock.
   */
  for (;;) {
          if ((bp = *bpp) == NULL)
                  return (0);
          if (bp->b_vp == NULL)
                  backtrace();
          ...

Given the comment, it looks like the vnode interlock is currently not being used correctly. Based on the fact that this happens when big mail messages are being received, we've guessed that the file system in question is /var, and we've turned off soft updates there.
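For anyone who wants to see the failing check in isolation, here is a minimal user-space sketch of the two tests at the top of getdirtybuf() as quoted above. It is not kernel code: struct buf and struct vnode are hypothetical cut-down stand-ins that keep only b_vp, and check_dirtybuf() and its result codes are invented for illustration. The for (;;) retry loop and whatever the real function does after calling backtrace() are not modelled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; only b_vp is modelled. */
struct vnode {
	int v_dummy;
};

struct buf {
	struct vnode *b_vp;	/* vnode the buffer belongs to; NULL is the bad case */
};

/* Result codes for this sketch only; they are not kernel values. */
enum dirtybuf_check {
	DIRTYBUF_NONE = 0,	/* *bpp was NULL: nothing to flush */
	DIRTYBUF_NO_VNODE = -1,	/* buffer present, but b_vp == NULL */
	DIRTYBUF_OK = 1		/* buffer present and attached to a vnode */
};

/*
 * Mirror the two tests quoted from getdirtybuf().  Where the kernel
 * calls backtrace(), this sketch prints a message and reports the
 * condition to the caller instead.
 */
static enum dirtybuf_check
check_dirtybuf(struct buf **bpp)
{
	struct buf *bp;

	assert(bpp != NULL);	/* the pointer to the buffer pointer must exist */
	if ((bp = *bpp) == NULL)
		return (DIRTYBUF_NONE);
	if (bp->b_vp == NULL) {
		fprintf(stderr, "buffer %p has NULL b_vp\n", (void *)bp);
		return (DIRTYBUF_NO_VNODE);
	}
	return (DIRTYBUF_OK);
}
```

Called with a pointer to NULL it returns DIRTYBUF_NONE; called with a buffer whose b_vp is NULL it prints a message on stderr and returns DIRTYBUF_NO_VNODE, which corresponds to the case that produced the backtrace above.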
We're both effectively out of town for the rest of the week, and we'll continue looking after that, but if anybody has any thoughts, we'd be grateful.

Greg
--
See complete headers for address and phone numbers.