From: Jens Krieg
To: freebsd-hackers@freebsd.org
Date: Tue, 28 Jan 2014 14:07:08 +0100
Subject: ULE locking mechanism

Hello,

we are currently working on a project for our university. Our goal is to implement a simple round-robin scheduler for FreeBSD 9.2 on a single-core machine.

So far we have removed most of the functionality of the ULE scheduler except the functions that are called from outside. The system successfully boots to user land with our RR scheduler managing threads in a list-based run queue. Further, it is possible to interact with the system using the shell.

The next step is to replace the locking mechanism of the ULE scheduler.
Therefore, we replaced the scheduler-dependent thread_lock/thread_unlock functions by simply disabling/enabling interrupts. With this modification the kernel works fine until we reach user land; then the system crashes.

The error occurs in the init user process (init_main.c:start_init:685). We found out that the page fault is triggered while executing the subyte function for the first time. See the error description below (unfortunately, it does not show up in the backtrace).

We compared the ULE scheduler with our RR implementation, and it appears that the parameters passed to subyte as well as the register values are identical. We assume that whatever caused the error is related to the thread locking replacement.

Every time the kernel wants to modify thread data, the corresponding thread is locked to prevent any interference by other threads. Since we are using a single-core machine, why isn't it sufficient to simply disable interrupts while modifying thread data? Could you provide us with detailed information about the locking mechanism in FreeBSD and also answer the following questions, please?

What is the purpose of thread_lock/thread_unlock besides protecting thread data?

How does the TDQ LOCK work and how is it related to a thread LOCK?
 - all thread LOCKs of the threads located in the run queue point to the TDQ LOCK, and
 - the TDQ LOCK points to the currently running thread
 - on context switching, the current thread passes the TDQ LOCK to the newly chosen thread
 - Could you explain the idea behind that locking concept, please?

Is there anything we should take care of in our own lock implementation?
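The lock-pointer scheme described in the questions above can be modelled in a few lines of userland C. This is a minimal sketch under stated assumptions, not FreeBSD code: `struct spin_mtx`, `struct sim_thread`, `sim_thread_lock()`, `sim_thread_unlock()`, `sim_thread_lock_set()`, `sim_switch()` and `rq_lock` are hypothetical names, loosely modelled on `td_lock`, `thread_lock()`, `thread_lock_set()` and the TDQ lock. It illustrates three ideas: a thread's lock is a pointer to whichever spin mutex currently protects it (here the run-queue lock); locking a thread must re-check that pointer after acquiring, because it can be redirected while the caller spins; and a context switch is entered with the run-queue lock held, which the incoming thread then releases.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin mutex standing in for a kernel spin mtx. */
struct spin_mtx {
	atomic_flag held;
};

void spin_lock(struct spin_mtx *m)
{
	/* Busy-wait until we are the one who set the flag. */
	while (atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire))
		;
}

void spin_unlock(struct spin_mtx *m)
{
	atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/* A thread's lock is a pointer to whichever mutex protects it (cf. td_lock). */
struct sim_thread {
	_Atomic(struct spin_mtx *) lock;
};

/* Hypothetical run-queue lock (cf. the TDQ lock); runnable threads point here. */
static struct spin_mtx rq_lock = { ATOMIC_FLAG_INIT };

void sim_thread_init(struct sim_thread *td, struct spin_mtx *m)
{
	atomic_init(&td->lock, m);
}

/* Acquire the lock td currently points to; retry if the pointer moved meanwhile. */
struct spin_mtx *sim_thread_lock(struct sim_thread *td)
{
	for (;;) {
		struct spin_mtx *m = atomic_load(&td->lock);
		spin_lock(m);
		if (m == atomic_load(&td->lock))
			return m;       /* still the right lock: td is protected */
		spin_unlock(m);         /* pointer was redirected while we spun */
	}
}

void sim_thread_unlock(struct sim_thread *td)
{
	spin_unlock(atomic_load(&td->lock));
}

/* Re-point td at newm (already held by the caller) and drop the old lock
 * (cf. thread_lock_set()). */
void sim_thread_lock_set(struct sim_thread *td, struct spin_mtx *newm)
{
	struct spin_mtx *old = atomic_load(&td->lock);
	atomic_store(&td->lock, newm);
	spin_unlock(old);
}

/*
 * Switch from cur to next, entered with rq_lock held via sim_thread_lock(cur).
 * rq_lock is NOT released here: it stays held across the switch, and the
 * incoming thread releases it once it runs, so ownership is handed over.
 */
struct sim_thread *sim_switch(struct sim_thread *cur, struct sim_thread *next)
{
	assert(atomic_load(&cur->lock) == &rq_lock);
	assert(atomic_load(&next->lock) == &rq_lock);
	/* A real kernel would save cur's registers and load next's here. */
	return next;    /* caller, now acting as next, calls sim_thread_unlock(next) */
}
```

Because every runnable thread shares one mutex, locking any of them also locks the run queue, which is what lets the switch hand ownership from the outgoing to the incoming thread without an unlock/lock gap. The real ULE code additionally parks the outgoing thread on a transient `blocked_lock` during `cpu_switch()`; the sketch leaves that out.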
Kind regards,
Jens Krieg


start_init: trying /sbin/init


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x7fffffffefff
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff808ab119
stack pointer           = 0x28:0xffffff800020db30
frame pointer           = 0x28:0xffffff800020dbe0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1 (kernel)
trap number             = 12
panic: page fault
KDB: stack backtrace:
#0 0xffffffff806e19cf at kdb_backtrace+0x5f
#1 0xffffffff806b2ddb at panic+0x15b
#2 0xffffffff808ac797 at trap_fatal+0x267
#3 0xffffffff808accfc at trap_pfault+0x40c
#4 0xffffffff808ad0ca at trap+0x37a
#5 0xffffffff8089839f at calltrap+0x8
#6 0xffffffff80687c4d at fork_exit+0x9d
#7 0xffffffff808988ce at fork_trampoline+0xe
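Regarding the interrupt-disabling replacement for thread_lock/thread_unlock described earlier: one property such a replacement has to preserve is nesting, since kernel code may take one spin lock while already holding another. The sketch below is a userland simulation under stated assumptions: the `intr_enabled` flag stands in for the CPU's interrupt-enable bit, and `naive_lock()`, `nested_lock()` and the other names are hypothetical. On amd64, FreeBSD's spinlock_enter()/spinlock_exit() follow the second pattern, keeping a per-thread nesting count and saving the interrupt state only on the outermost entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt-enable flag (hypothetical stand-in for RFLAGS.IF). */
static bool intr_enabled = true;

/* Disable interrupts and report whether they were enabled before. */
static bool sim_intr_disable(void)
{
	bool was_enabled = intr_enabled;
	intr_enabled = false;
	return was_enabled;
}

/* Per-thread nesting state, cf. md_spinlock_count / md_saved_flags on amd64. */
struct nest_state {
	int  count;     /* number of lock sections currently open */
	bool saved;     /* interrupt state on entry to the outermost one */
};

/* One CPU, one current thread: a single global instance suffices here. */
static struct nest_state curthread_nest;

/* Naive replacement: every unlock unconditionally re-enables interrupts. */
void naive_lock(void)
{
	(void)sim_intr_disable();
}

void naive_unlock(void)
{
	intr_enabled = true;
}

/* Nesting-aware replacement: only the outermost unlock restores saved state. */
void nested_lock(void)
{
	if (curthread_nest.count == 0)
		curthread_nest.saved = sim_intr_disable();
	curthread_nest.count++;
}

void nested_unlock(void)
{
	assert(curthread_nest.count > 0);
	if (--curthread_nest.count == 0)
		intr_enabled = curthread_nest.saved;
}
```

With the naive pair, the inner unlock turns interrupts back on while the outer section is still running; the nesting-aware pair keeps them off until the outermost unlock and then restores whatever state was in effect on entry. In a kernel the count has to live in the thread structure, because a thread can be switched out while holding its thread lock; FreeBSD's amd64 cpu_fork() likewise starts a new thread with its count already at one, so that the thread_unlock() in fork_exit() balances it.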