From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Jens Krieg
Subject: Re: ULE locking mechanism
Date: Tue, 11 Feb 2014 13:25:24 -0500
Message-Id: <201402111325.24523.jhb@freebsd.org>

On Tuesday, January 28, 2014 8:07:08 am Jens Krieg wrote:
> Hello,
>
> we are currently working on project for our university. Our goal is to
> implement a simple round robin scheduler for FreeBSD 9.2 on a single core
> machine.
> So far we removed most of the functionality of the ULE scheduler except the
> functions that are called from outside.
> The system successfully boots to user land with our RR scheduler managing
> thread in a list based run queue. Further, it is possible to interact with
> the system using the shell.
>
> The next step is to replace the locking mechanism of the ULE scheduler.
> Therefore, we replaced the scheduling dependent thread_lock/thread_unlock
> functions by simply disabling/enabling the interrupts. With this
> modification the kernel works fine until we hit the user land then the
> system crashes.
> The error occurs in the init user process (init_main.c:start_init:685). We
> found out that the page fault is triggered while executing the subyte
> function for the first time. See the error description below (unfortunately
> not shown in backtrace).
> We compared the ULE scheduler with our RR implementation and it appears,
> that the parameters passed to subyte as well as the register values are
> identical. We assume, that whatever caused the error is related to the
> thread locking replacement.
>
> Every time the kernel want to modify thread data the corresponding thread
> is locked to prevent any interference by other threads. Since we are using
> a single core machine why isn't it sufficient to simply disable interrupt
> while modifying thread data. Could you provide us with detailed information
> about the locking mechanism in FreeBSD and also answer the following
> questions, please.
>
> What is the purpose of thread_lock/thread_unlock besides protecting thread
> data?
> How does the TDQ LOCK works and how is it related to a thread LOCK?
>  - all thread LOCKs of the thread located in the run queue pointing to the
>    TDQ LOCK, and
>  - the TDQ LOCK points to the currently running thread
>  - on context switching the current thread passes the TDQ LOCK to the new
>    chosen thread
>  - Could you explain the idea behind that locking concept, please?
> Any suggestions we shall care about in our own lock implementation?
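
For reference, td_lock is a pointer to whichever spin mutex currently
covers the thread, which is the arrangement the questions above describe.
A minimal user-space sketch of that idea follows. It assumes pthread
mutexes in place of kernel spin mutexes, and every sketch_* name is made up
for illustration; none of this is kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a container's mutex (run queue, turnstile, ...). */
struct sketch_lock {
	pthread_mutex_t mtx;
};

/* Hypothetical stand-in for struct thread: only the lock pointer is modelled. */
struct sketch_thread {
	_Atomic(struct sketch_lock *) td_lock;
};

/* Lock whatever currently covers the thread; retry if it moved meanwhile. */
static struct sketch_lock *
sketch_thread_lock(struct sketch_thread *td)
{
	for (;;) {
		struct sketch_lock *l = atomic_load(&td->td_lock);

		pthread_mutex_lock(&l->mtx);
		if (l == atomic_load(&td->td_lock))
			return (l);
		/* The thread changed containers while we waited. */
		pthread_mutex_unlock(&l->mtx);
	}
}

/* Release the lock that currently covers the thread. */
static void
sketch_thread_unlock(struct sketch_thread *td)
{
	struct sketch_lock *l = atomic_load(&td->td_lock);

	pthread_mutex_unlock(&l->mtx);
}

/*
 * Hand the thread to a new container.  The caller holds both the old and
 * the new lock; the old one is released before returning.
 */
static void
sketch_thread_lock_set(struct sketch_thread *td, struct sketch_lock *new_lock)
{
	struct sketch_lock *old = atomic_load(&td->td_lock);

	assert(new_lock != NULL && new_lock != old);
	atomic_store(&td->td_lock, new_lock);
	pthread_mutex_unlock(&old->mtx);
}

/* Single-threaded walk-through: run queue -> turnstile.  Returns 0 if ok. */
int
sketch_demo(void)
{
	struct sketch_lock runq, ts;
	struct sketch_thread td;

	pthread_mutex_init(&runq.mtx, NULL);
	pthread_mutex_init(&ts.mtx, NULL);
	atomic_init(&td.td_lock, &runq);

	if (sketch_thread_lock(&td) != &runq)
		return (1);
	pthread_mutex_lock(&ts.mtx);
	sketch_thread_lock_set(&td, &ts);	/* runq released, ts still held */
	sketch_thread_unlock(&td);		/* releases ts */

	if (sketch_thread_lock(&td) != &ts)
		return (2);
	sketch_thread_unlock(&td);

	/* The run queue lock must be free again after the hand-off. */
	if (pthread_mutex_trylock(&runq.mtx) != 0)
		return (3);
	pthread_mutex_unlock(&runq.mtx);
	return (0);
}
```

The point of the retry loop is that a caller never needs to know which
container a thread is in: it locks whatever td_lock names and re-checks the
pointer afterwards.  Calling sketch_demo() from a small main() exercises
one hand-off from a run queue lock to a turnstile lock.
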

thread_lock is quite intertwined with other locks.  E.g. when a thread is
blocked on a turnstile, thread_lock() for that thread locks the 'ts_lock'
spin mutex for that turnstile.  If you want to replace thread_lock, you need
to change all the locks that td_lock can be to use your new primitive.  You'd
probably have an easier time just changing how mtx_lock_spin() works.  (In
fact, if you just disable 'options SMP', the stock kernel turns
mtx_lock_spin() into a function that just disables interrupts.)

For your core dump, the first step would be to use gdb to map that address to
a file line.  For example, you can just do 'l *fork_exit+0x9d', or you can
do 'l *' followed by the address from the trap message.  Looking at that can
probably tell you why you panicked.

-- 
John Baldwin