Date: Sun, 15 Aug 2004 17:19:35 -0400 (EDT)
From: Robert Watson
To: John Polstra
cc: threads@freebsd.org
cc: mbr@freebsd.org
Subject: RE: thread-unsafe syslog code in libc?

On Sun, 15 Aug 2004, John Polstra wrote:

> On 15-Aug-2004 Robert Watson wrote:
> > On Sun, 15 Aug 2004, John Polstra wrote:
> >
> >> The above is only to handle an unusual error case.
> >>
> >> There is some thread-unsafeness here, but it doesn't look like it would
> >> matter under normal conditions.
> >
> > So maybe we're dealing with a user space race where multiple threads
> > attempt to do a first syslog in parallel?
>
> Probably not that.  You said it was a simultaneous connect() and
> close(), right?  The close is only done in disconnectlog() and
> closelog().
> The former is only called in unusual error cases, and the
> latter is called by applications.  So I guess one culprit could be a
> first syslog call in one thread and a closelog call in another thread.
>
> Or, maybe the system ran out of mbufs and the send() did fail, causing
> disconnectlog to be used and exercising the race.  An out of mbufs
> condition might also contribute to the kernel panic you mentioned.

The race in question was one where we failed to protect against namei() in
connect() possibly sleeping during a lookup and a close() on the file
descriptor during that period disconnecting the PCB from the socket.  When
connect() woke up again, it would try to dereference the PCB and cause a
page fault.  The problem is a larger issue concerning how we want to
handle file descriptors, etc, but it was triggered by odd use of a file
descriptor in user space that is also suggestive of a user space race.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
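
To illustrate the user space side of this (a first syslog() in one thread
overlapping a closelog() in another), here is a minimal sketch, assuming
the application can route all of its logging through its own wrappers.
The names log_lock, locked_syslog() and locked_closelog() are hypothetical
and not part of libc.  The sketch only serializes callers in user space;
it does not address the kernel-side connect()/close() race described
above.

```c
#include <pthread.h>
#include <syslog.h>

/* Hypothetical application-level lock; libc does not provide this. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Log a preformatted message while holding log_lock, so that the
 * implicit connect done by a first syslog() cannot overlap a close
 * issued through locked_closelog().  Returns 0 on success or the
 * pthread error code if the lock could not be taken or released.
 */
int
locked_syslog(int priority, const char *msg)
{
	int rc;

	rc = pthread_mutex_lock(&log_lock);
	if (rc != 0)
		return (rc);
	syslog(priority, "%s", msg);
	return (pthread_mutex_unlock(&log_lock));
}

/*
 * Close the log descriptor under the same lock.  Returns 0 on success
 * or the pthread error code.
 */
int
locked_closelog(void)
{
	int rc;

	rc = pthread_mutex_lock(&log_lock);
	if (rc != 0)
		return (rc);
	closelog();
	return (pthread_mutex_unlock(&log_lock));
}
```

This only helps if every thread uses the wrappers; any direct call to
syslog() or closelog() elsewhere in the program bypasses the lock.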