Date: Thu, 13 Feb 2014 15:57:33 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: can the scheduler decide to schedule an interrupted but runnable thread on another CPU core? What are the implications for code?
Message-ID: <CAJ-Vmo=7Nz1jqXy%2BrTQ7u9_ZP7jeFOKUJxU1O51tYJjvTUmWTg@mail.gmail.com>
Hi,

Whilst digging into collisions in the flowtable code I discovered that a bunch of them are due to some of the flowtable_lookup() code running on a different CPU - the contention on the l2/l3 lookup lock(s) was enough to block things so they'd get an obvious chance to be migrated.

So this led me to wonder whether, in a fully preemptive kernel, a running kernel thread would stay on the current CPU until it hit a very specific subset of things (exited to userland, hit a lock, etc.)

Apparently (according to kan and rwatson) this is how we define fully preemptive - a thread is not just interruptible at almost any point, but the running task may be interrupted and rescheduled on a different CPU outside of specific critical sections.

This means that for the flowtable case, the current setup (without atomics to maintain the lists) can only work if all threads doing work with the flowtable structures (i.e., lookup, insert, purge) are CPU pinned. Otherwise we may have the situation where:

sequentially:

* lookup occurs on CPU A;
* lookup succeeds on CPU A for some almost-expired entry;
* preemption occurs, and it gets scheduled to CPU B;

then simultaneously:

* CPU A's flowtable purge code runs, and decides to purge entries including the current one;
* the code now running on CPU B has an item from the CPU A flowtable, and dereferences it as it's being freed, leading to potential badness.

Now, it's a ridiculously small window of opportunity, but I'd rather the code be written to be correct and mostly-fast versus very fast and potentially exploding. I'm sure those in operations would agree. :-)

So, my questions:

* is this actually how fully preemptive kernels _may_ behave?
* I believe there's a difference between what 4BSD and ULE will do here - is this the case? What are the scheduler behaviours?
* are there any other areas in the kernel that rely on pcpu uma zones / curcpu indexes for things outside of sched_pin()?

Thanks,

-a
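
To make the pinning requirement above concrete, here is a minimal userland sketch in C. It is not the flowtable code and not kernel code: sim_sched_pin(), sim_sched_unpin(), sim_try_migrate(), sim_curcpu, and the flow_* names are all hypothetical stand-ins that only model the idea of sched_pin(), sched_unpin(), and curcpu. The point it illustrates is that a lookup which indexes a per-CPU table has to stay on that CPU for as long as it touches the entry, and that copying the value out before unpinning avoids carrying a pointer into another CPU's table.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU    2
#define NBUCKET 4

struct flow_entry {
    unsigned key;
    int      value;
    int      in_use;
};

/* One table per CPU, loosely modelling a per-CPU flowtable (assumption). */
static struct flow_entry tables[NCPU][NBUCKET];

/* Userland stand-ins for curcpu and the thread's pin count (assumptions). */
static int sim_curcpu;
static int sim_pin_count;   /* > 0 means the thread may not migrate */

static void sim_sched_pin(void)
{
    sim_pin_count++;
}

static void sim_sched_unpin(void)
{
    assert(sim_pin_count > 0);
    sim_pin_count--;
}

/* Simulated scheduler decision: migrate unless pinned. Returns 1 if moved. */
static int sim_try_migrate(int cpu)
{
    if (sim_pin_count > 0)
        return 0;
    sim_curcpu = cpu;
    return 1;
}

/* Insert into the table that belongs to the given CPU. */
static void flow_insert(int cpu, unsigned key, int value)
{
    struct flow_entry *e = &tables[cpu][key % NBUCKET];

    e->key = key;
    e->value = value;
    e->in_use = 1;
}

/*
 * Look up a key in the current CPU's table. The thread stays pinned for
 * the whole time it holds a pointer into the table, and the value is
 * copied out before unpinning, so no per-CPU pointer escapes.
 * Returns 1 on a hit, 0 on a miss.
 */
static int flow_lookup_pinned(unsigned key, int *out)
{
    struct flow_entry *e;
    int found = 0;

    sim_sched_pin();
    e = &tables[sim_curcpu][key % NBUCKET];
    if (e->in_use && e->key == key) {
        *out = e->value;
        found = 1;
    }
    sim_sched_unpin();
    return found;
}
```

In this model a migration request that arrives while the pin count is non-zero is refused, which is the property the lookup, insert, and purge paths would all need. Whether pinning alone is the right fix for the real flowtable code, as opposed to atomics or a lock around the lists, is exactly the open question.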