Date: Wed, 6 Aug 2003 11:55:30 -0700
From: Marcel Moolenaar
To: threads@FreeBSD.org
Message-ID: <20030806185530.GA893@athlon.pn.xcllnt.net>
Subject: KSE/ia64: a quick update

Gang,

Given the panics that Daniel is having on pluto1 it's probably a
good idea to fill people in on the status of KSE/ia64:

The groundwork is finished. In practical terms this means that
I/O bound threads work to their fullest extent.

There's just one tiny little annoying and complicated thing:
thread_userret(), and consequently thread_export_context(), can
be called for interrupts, traps and faults as well. Since syscalls
are not implemented as traps, we have two distinct paths into (and
out from) the kernel. One (the syscall) is synchronous WRT program
execution and the other (interrupts) is asynchronous. Synchronous
contexts don't have scratch registers in them. Asynchronous contexts
need to have them. This is not the hard problem: just add some flags
to indicate what parts of the context are valid and thus should be
restored and we're ok.

The problem is when we preempt an interrupted thread, export the
context to the UTS and do an upcall. We end up having an async
context in userland. I'm not sure at this time what we should do
with it. We have the following options:

o  Extend _ia64_restore_context() so that libkse can restore async
   contexts. The downside is that it will very likely cause a
   disabled high FP trap, which results in the process having the
   high FP registers enabled. A performance hit. (see also below)

o  Have _ia64_restore_context() call setcontext() for async contexts
   and do the work in the kernel. Restoring the high FP will not
   result in the enablement of the high FP registers, because we
   can restore them to the PCB. They will be loaded into the CPU
   when there's a need for them (which may be never).

Both cases have the problem that we're using a synchronous method
(the call/ret sequence) to restore an async context. I'm not sure
how ugly it gets to change the return path and mimic an interrupt
return.

In short: The KSE framework works, as long as we don't preempt
threads. I'm not sure how to solve that exactly...

About the high FP:

On ia64 the FP registers are split in two (2) sets: low and high.
Both sets can be enabled and disabled independently from each other
and each set has a modified bit to keep track of usage. The low FP
registers are f0-f31 and are always enabled.
The high FP registers are f32-f127 and are disabled by default. We
use lazy context switching to save and restore these on a need to
have basis. When a process uses a high FP register and the set is
disabled, we take a trap, save the high FP registers currently on
the CPU and load the high FP registers of the process that trapped.
We then enable the high FP registers (for that process) and let it
continue. As long as there's only 1 process using the high FP
registers, there's no performance penalty when they are enabled.

Note that compilers will avoid using the high FP registers as much
as possible. AFAICT, none of the code in our source tree uses the
high FP registers. Hence: they should only be enabled when the
process is highly FP intensive.

FYI,

-- 
 Marcel Moolenaar	  USPA: A-39004		marcel@xcllnt.net
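
For readers who want the lazy high FP scheme above in concrete form, here is a rough userland model in C. It is only a sketch and not the kernel code: struct cpu_sim, struct proc_sim, highfp_trap(), highfp_write() and highfp_read() are all made-up names, the "trap" is an ordinary function call, and the registers are modelled as plain doubles. It shows just the ownership hand-off: a disabled set traps on first use, the trap saves the current owner's f32-f127 and loads the trapping process's copy, and later accesses by the same process cost nothing extra.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HIGHFP_FIRST 32   /* f32 is the first high FP register */
#define HIGHFP_COUNT 96   /* f32-f127 */

/* Hypothetical per-process state; highfp_save stands in for the PCB. */
struct proc_sim {
    double highfp_save[HIGHFP_COUNT];
    bool   highfp_enabled;          /* disabled by default (zero-init) */
};

/* Hypothetical model of a single CPU. */
struct cpu_sim {
    double           highfp[HIGHFP_COUNT]; /* registers live on the CPU */
    struct proc_sim *highfp_owner;         /* whose values they hold */
    unsigned         traps;                /* disabled-high-FP traps taken */
};

/* Model of the disabled high FP trap: save the old owner, load the new. */
static void highfp_trap(struct cpu_sim *cpu, struct proc_sim *p)
{
    cpu->traps++;
    if (cpu->highfp_owner != NULL && cpu->highfp_owner != p) {
        /* Save the registers currently on the CPU for their owner. */
        memcpy(cpu->highfp_owner->highfp_save, cpu->highfp,
            sizeof(cpu->highfp));
        cpu->highfp_owner->highfp_enabled = false;
    }
    if (cpu->highfp_owner != p) {
        /* Load the registers of the process that trapped. */
        memcpy(cpu->highfp, p->highfp_save, sizeof(cpu->highfp));
        cpu->highfp_owner = p;
    }
    p->highfp_enabled = true;       /* let it continue without traps */
}

/* Write register f<reg>, trapping first if the set is disabled. */
static void highfp_write(struct cpu_sim *cpu, struct proc_sim *p,
    int reg, double value)
{
    assert(reg >= HIGHFP_FIRST && reg < HIGHFP_FIRST + HIGHFP_COUNT);
    if (!p->highfp_enabled)
        highfp_trap(cpu, p);
    cpu->highfp[reg - HIGHFP_FIRST] = value;
}

/* Read register f<reg>, trapping first if the set is disabled. */
static double highfp_read(struct cpu_sim *cpu, struct proc_sim *p, int reg)
{
    assert(reg >= HIGHFP_FIRST && reg < HIGHFP_FIRST + HIGHFP_COUNT);
    if (!p->highfp_enabled)
        highfp_trap(cpu, p);
    return cpu->highfp[reg - HIGHFP_FIRST];
}
```

With a single process touching f32-f127 the trap count stays at one no matter how often it uses them; a second user forces a save and load on every hand-off, which is where the cost would come from.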