Date: Thu, 26 Aug 2010 13:03:38 -0700 From: mdf@FreeBSD.org To: freebsd-current@freebsd.org Cc: Jeff Roberson <jroberson@jroberson.net> Subject: sched_pin() bug in SCHED_ULE Message-ID: <AANLkTin6V9pc3d7ifyOmR2V-H5-g_AQFji-LbZzWfAKM@mail.gmail.com>
next in thread | raw e-mail | index | archive | help
Back at the beginning of August I posted about issues with sched_pin() and witness_warn(): http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html After a lot of debugging I think I've basically found the issue. I found this bug on stable/7, but looking at the commit log for CURRENT I don't see any reason the bug doesn't exist there. This analysis is specific to amd64/i386 but the problem is likely to exist on most arches. The basic problem is that in both tdq_move() and sched_setcpu() can manipulate the ts->ts_cpu variable of another process, which may be running. This means that a process running on CPU N may be set to run on CPU M the next context switch. Then, that process may call sched_pin(), then grab a PCPU variable. An IPI_PREEMPT can come in, causing the thread to call mi_switch() in ipi_bitmap_handler(). sched_switch() will then notice that ts->ts_cpu is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the thread to the intended CPU. Thus after sched_pin() and potentially before using any PCPU variable the process can get migrated to another CPU, and now it is using a PCPU variable for the wrong processor. Given that ts_cpu can be set by other threads, it doesn't seem worth checking at sched_pin time whether ts_cpu is not the same as td_oncpu, because to make the check would require taking the thread_lock. Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before calling sched_switch_migrate(). However, sched_pin() is also used by sched_bind(9) to force the thread to stay on the intended cpu. I would handle this by adding a TSF_BINDING state that is different from TSF_BOUND. The thing that has me wondering whether this is all correct is that I don't see any checks in tdq_move() or sched_setcpu() for either TSF_BOUND or THREAD_CAN_MIGRATE(). Perhaps that logic is managed in the various calling functions; in that case I would propose adding asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked TSF_BINDING. Does this analysis seem correct? Thanks, matthew
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AANLkTin6V9pc3d7ifyOmR2V-H5-g_AQFji-LbZzWfAKM>