Date: Tue, 11 Feb 2014 13:25:24 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Cc: Jens Krieg <jkrieg@mailbox.tu-berlin.de>
Subject: Re: ULE locking mechanism
Message-ID: <201402111325.24523.jhb@freebsd.org>
In-Reply-To: <FD4193F4-FA47-4D77-BC1F-23749D9B7E5F@mailbox.tu-berlin.de>
References: <FD4193F4-FA47-4D77-BC1F-23749D9B7E5F@mailbox.tu-berlin.de>
On Tuesday, January 28, 2014 8:07:08 am Jens Krieg wrote:
> Hello,
>
> we are currently working on a project for our university. Our goal is to implement a simple round-robin scheduler for FreeBSD 9.2 on a single-core machine.
> So far we have removed most of the functionality of the ULE scheduler except the functions that are called from outside. The system successfully boots to userland with our RR scheduler managing threads in a list-based run queue. Further, it is possible to interact with the system using the shell.
>
> The next step is to replace the locking mechanism of the ULE scheduler. Therefore, we replaced the scheduler-dependent thread_lock/thread_unlock functions with simply disabling/enabling interrupts. With this modification the kernel works fine until we hit userland; then the system crashes.
> The error occurs in the init user process (init_main.c:start_init:685). We found out that the page fault is triggered while executing the subyte function for the first time. See the error description below (unfortunately not shown in the backtrace).
> We compared the ULE scheduler with our RR implementation and it appears that the parameters passed to subyte as well as the register values are identical. We assume that whatever caused the error is related to the thread locking replacement.
>
> Every time the kernel wants to modify thread data, the corresponding thread is locked to prevent any interference by other threads. Since we are using a single-core machine, why isn't it sufficient to simply disable interrupts while modifying thread data? Could you provide us with detailed information about the locking mechanism in FreeBSD and also answer the following questions, please.
>
> What is the purpose of thread_lock/thread_unlock besides protecting thread data?
> How does the TDQ LOCK work and how is it related to a thread LOCK?
> - all thread LOCKs of the threads located in the run queue point to the TDQ LOCK, and
> - the TDQ LOCK points to the currently running thread
> - on context switching, the current thread passes the TDQ LOCK to the newly chosen thread
> - Could you explain the idea behind that locking concept, please?
> Any suggestions we should take care of in our own lock implementation?

thread_lock is quite intertwined with other locks. E.g. when a thread is blocked on a turnstile, thread_lock() for that thread locks the 'ts_lock' spin mutex for that turnstile. If you want to replace thread lock, you need to change all the locks that td_lock can be to use your new primitive. You'd probably have an easier time just changing how mtx_lock_spin() works. (In fact, if you just disable 'options SMP', the stock kernel turns mtx_lock_spin() into a function that just disables interrupts.)

For your core dump, the first step would be to use gdb to map that address to a source file and line. For example, you can just do 'l *fork_exit+0x9d', or you can do 'l *<instruction pointer>' where you use the value from the trap message. Looking at that can probably tell you why you panic'd.

--
John Baldwin
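The lock-pointer idea in the question (each thread's lock "points to" the TDQ lock) and the turnstile example in the reply can be modeled in user space: td_lock is a pointer to whichever spin lock currently protects the thread, thread_lock() must re-check that pointer after acquiring, and the pointer is only re-aimed while both the old and new locks are held. This is a sketch under assumptions: the struct layouts, the *_sketch names, and the atomic_flag spin lock are hypothetical, loosely following the retry loop in thread_lock() and the behavior of thread_lock_set() in FreeBSD's kern_mutex.c. It does not model how sched_switch() hands ownership of the TDQ lock to the next thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a FreeBSD spin mutex. */
struct spin {
    atomic_flag held;
};

static void spin_lock(struct spin *s) {
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        ; /* busy-wait */
}

static void spin_unlock(struct spin *s) {
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}

/* Hypothetical thread: td_lock names whichever lock protects it right now. */
struct thread {
    _Atomic(struct spin *) td_lock;
};

/* Lock whatever td_lock points at; if it was re-pointed while we waited, retry. */
static struct spin *thread_lock_sketch(struct thread *td) {
    for (;;) {
        struct spin *m = atomic_load(&td->td_lock);
        spin_lock(m);
        if (m == atomic_load(&td->td_lock))
            return m;           /* we hold the lock that currently protects td */
        spin_unlock(m);         /* stale pointer: drop it and try again */
    }
}

static void thread_unlock_sketch(struct thread *td) {
    spin_unlock(atomic_load(&td->td_lock));
}

/* Re-point td_lock. Caller holds both the old lock and newlock;
 * the old lock is released, newlock stays held. */
static void thread_lock_set_sketch(struct thread *td, struct spin *newlock) {
    struct spin *old = atomic_load(&td->td_lock);
    atomic_store(&td->td_lock, newlock);
    spin_unlock(old);
}

static struct spin tdq_lock = { ATOMIC_FLAG_INIT };  /* run-queue lock */
static struct spin ts_lock = { ATOMIC_FLAG_INIT };   /* one turnstile's lock */

/* Thread A starts on the run queue, then blocks on a turnstile.
 * Returns the lock that protects A afterwards. */
static struct spin *demo_block_on_turnstile(void) {
    struct thread a;
    atomic_init(&a.td_lock, &tdq_lock);

    struct spin *held = thread_lock_sketch(&a);   /* takes tdq_lock */
    assert(held == &tdq_lock);
    spin_lock(&ts_lock);                          /* acquire the new lock first */
    thread_lock_set_sketch(&a, &ts_lock);         /* A now covered by ts_lock; tdq_lock freed */
    thread_unlock_sketch(&a);                     /* drops ts_lock */

    held = thread_lock_sketch(&a);                /* now takes ts_lock, not tdq_lock */
    thread_unlock_sketch(&a);
    return held;
}
```

demo_block_on_turnstile() returns &ts_lock: after the handoff, locking thread A acquires the turnstile's lock rather than the run-queue lock. This is why replacing thread_lock alone does not change the primitive that actually protects a blocked thread.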
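On a uniprocessor kernel, a spin lock whose only job is to disable interrupts still has to handle nesting, because spin locks frequently nest (e.g. a thread lock taken while another spin mutex is held). The minimal user-space sketch below is loosely modeled on FreeBSD's spinlock_enter()/spinlock_exit(), which mtx_lock_spin()/mtx_unlock_spin() rely on: interrupts are disabled and their prior state saved only at the outermost acquire, and restored only at the last release. The simulated interrupt flag and all function names are hypothetical, globals stand in for FreeBSD's per-thread state, and the critical section that the real spinlock_enter() also enters to block preemption is omitted. That the question's unconditional enable/disable caused the crash is an assumption the thread does not confirm; the sketch only shows why that pattern is fragile.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag (hypothetical; a real kernel
 * would use machine-dependent interrupt disable/restore routines). */
static bool intr_enabled = true;

static bool intr_disable_sim(void) {
    bool was = intr_enabled;
    intr_enabled = false;
    return was;
}

static void intr_restore_sim(bool was) {
    intr_enabled = was;
}

/* Nesting depth and saved interrupt state. FreeBSD keeps equivalents
 * per thread; globals are used here for brevity. */
static int spin_depth;
static bool saved_intr;

/* Outermost acquire disables interrupts and remembers the prior state. */
static void up_spin_lock(void) {
    if (spin_depth == 0)
        saved_intr = intr_disable_sim();
    spin_depth++;
}

/* Only the outermost release restores the state saved at acquire time. */
static void up_spin_unlock(void) {
    assert(spin_depth > 0);
    if (--spin_depth == 0)
        intr_restore_sim(saved_intr);
}

/* The naive replacement described in the question: always disable on
 * lock, always enable on unlock, with no nesting count. */
static void naive_lock(void) { intr_enabled = false; }
static void naive_unlock(void) { intr_enabled = true; }
```

With up_spin_lock(), an inner unlock leaves interrupts disabled until the outer lock is released, and interrupts that were already off on entry stay off. With naive_lock(), taking two locks and releasing the inner one re-enables interrupts while the outer lock is still held.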