Date: Tue, 21 Jul 2009 14:59:36 +0400
From: Kamigishi Rei <spambox@haruhiism.net>
To: freebsd-current@FreeBSD.org
Cc: Lawrence Stewart <lstewart@freebsd.org>
Subject: [follow-up] Fatal trap 12 in r195146+ in netisr_queue_internal
Message-ID: <4A659F98.2060007@haruhiism.net>

Hello, hope you're having a good day,

I've been researching the issue I mentioned in my last message in the "r194546 amd64: kernel panic in tcp_sack.c" thread since July 07, and here are some of the findings:

The fatal trap triggers inside mtx_lock_sleep() during a dereference of a pointer (owner, which points to the struct thread at m->mtx_lock & ~MTX_FLAGMASK). The code goes like this (shortened):

    v = m->mtx_lock;
    if (v == MTX_UNOWNED) {
            turnstile_cancel(ts);
            continue;
    }
    owner = (struct thread *)(v & ~MTX_FLAGMASK);
    if (TD_IS_RUNNING(owner)) {
            turnstile_cancel(ts);
            continue;
    }

Everything goes fine until - under heavy load on an interface, usually - we reach a point where:

1. m->mtx_lock is 4 (== MTX_UNOWNED).
2. v is assigned mtx_lock's value (4 == MTX_UNOWNED).
3. The condition (v == MTX_UNOWNED) fails.
4. owner is assigned an address from v.
5. The dereference fails, as v has a bogus value which is not inside kernel address space.

The only affected variable is v; I've added temporary variables around it (i.e. uint64ptr_t foo1, v, foo2;) and those variables are not altered, even though v has moved 64 bits further inside the stack.

The variable is not altered only at that point; by adding debugging lines along the code I've seen multiple cases of v and mtx_lock being changed during the execution of mtx_lock_sleep(). Moreover, my own test variables were changing inside it. I had the following structure for tests:

1. At the start of the function, foo1 = 0.
2. Before lock_profile_obtain_lock_failed, foo1 = 1.
3. After lock_profile_obtain_lock_failed, foo1 = 2.
4. Before the (v == MTX_UNOWNED) conditional, foo1 = 3.

During tests, foo1 changed values inside this range (0..3) several times; during heavy lo0/em0 local traffic load, these conditionals failed multiple (up to 100) times in 2-5 seconds. v gets changed like that as well, but in 99.99% of cases it gets assigned a value that references the kernel memory area, so the dereference works.
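
For reference, the check sequence above can be reproduced in userland with a minimal sketch like the one below. It is not the kernel code: SK_UNOWNED, SK_FLAGMASK, struct sk_thread, struct sk_mtx and sk_classify() are hypothetical stand-ins, and the flag mask value is an assumption made for the sketch. It only illustrates the expected behaviour: the lock word is read once into a local, and every later decision uses that snapshot, so a snapshot equal to the unowned value should never reach the owner dereference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MTX_UNOWNED and MTX_FLAGMASK.
 * 4 matches the value quoted above; the mask is an assumption. */
#define SK_UNOWNED  ((uintptr_t)0x4)
#define SK_FLAGMASK ((uintptr_t)0x7)

/* Stand-in for struct thread; aligned so the low 3 bits of its
 * address are free to carry flags in the lock word. */
struct sk_thread {
	_Alignas(8) int running;
};

/* Stand-in for struct mtx: one word holding owner pointer | flags. */
struct sk_mtx {
	volatile uintptr_t lock;
};

enum sk_action {
	SK_RETRY_UNOWNED,	/* lock was released, retry the acquire */
	SK_RETRY_RUNNING,	/* owner is on a CPU, spin and retry */
	SK_BLOCK		/* owner is off CPU, block on the lock */
};

/* Take one snapshot of the lock word and classify it. If snap is
 * not NULL, the snapshot is stored there for inspection. */
static enum sk_action
sk_classify(const struct sk_mtx *m, uintptr_t *snap)
{
	uintptr_t v = m->lock;	/* single read; only v is used below */
	struct sk_thread *owner;

	if (snap != NULL)
		*snap = v;
	if (v == SK_UNOWNED)
		return (SK_RETRY_UNOWNED);
	/* Strip the flag bits to recover the owner pointer. */
	owner = (struct sk_thread *)(v & ~SK_FLAGMASK);
	if (owner->running)
		return (SK_RETRY_RUNNING);
	return (SK_BLOCK);
}
```

With the lock word set to SK_UNOWNED, sk_classify() returns SK_RETRY_UNOWNED without touching owner; with the address of a struct sk_thread OR'ed with a flag bit, it returns SK_RETRY_RUNNING or SK_BLOCK depending on the running field. What I'm seeing in the kernel is the equivalent of the first case falling through to the dereference.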

Is this behaviour (variables changing their value inside a single function call) correct?

--
Kamigishi Rei
KREI-RIPE