From owner-freebsd-arch@FreeBSD.ORG Fri Jul 1 15:00:01 2005
Date: Fri, 1 Jul 2005 16:04:23 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Peter Edwards
Cc: arch@freebsd.org
Subject: Re: ktrace and KTR_DROP
Message-ID: <20050701155757.A36905@fledge.watson.org>
In-Reply-To: <20050701132104.GA95135@freefall.freebsd.org>
References: <20050701132104.GA95135@freefall.freebsd.org>
List-Id: Discussion related to FreeBSD architecture

On Fri, 1 Jul 2005, Peter Edwards wrote:

> Ever since the introduction of a separate ktrace worker thread for
> writing output, there's the distinct possibility that ktrace output
> will drop requests.  For some processes, it's actually inevitable: as
> long as the traced processes can sustain a rate of generating ktrace
> events faster than the ktrace thread can write them, you'll eventually
> run out of ktrace requests.
>
> I'd like to propose that rather than just dropping the request on the
> floor, we at least configurably allow ktraced threads to block until
> there are resources available to satisfy their requests.

There are two benefits to the current ktrace dispatch model:

(1) It avoids untimely sleeping in the execution paths of threads that
    are being traced.

(2) It allows the traced thread to run ahead asynchronously, hopefully
    impacting performance less.

One of the things I've been thinking for a few years is that I actually
preferred the old model, where processes (now threads) would hang a
"current record" off of their process (now thread) structure, and fill
it in as they went along.  The upsides of this are exactly the downsides
of the current model: you don't allow fully asynchronous execution of
the threads with respect to queueing the records to disk, so you don't
run into "drop" scenarios; instead, you slow down the process.  The
downsides are, likewise, the converse.

In the audit code, we pull from a common record queue, but we allocate
the record when the system call starts for each process -- if there
aren't records available (or various other reliability-related
conditions fail, such as adequate disk space), we stall the thread
entering the kernel until we can satisfy its record allocation
requirements.

There are two cases where I really run into problems with the current
model:

(1) When I'm interacting with a slow file system, such as NFS over
    100mbps, I will always lose records, because it doesn't take long
    for the process to get well ahead of the write-behind.

(2) When I trace more than one process at a time, the volume of records
    overwhelms the write-behind.

Write coalescing/etc. is already provided "for free" by pushing the
writes down into the file system, so other than slowing down the traced
process a little, I think we don't lose much by moving back to this
model.
And if we pre-commit the record storage on system call entry (with the
exception of paths, which generally require potential sleeps anyway), we
probably won't hurt performance all that much, and we avoid sleeping in
bad places.

Robert N M Watson