Date: Thu, 17 May 2018 14:15:54 -0400
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Mark Johnston <markj@freebsd.org>, Conrad Meyer <cem@freebsd.org>
Cc: src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r333703 - head/sys/vm
Message-ID: <1b4d6c8c-82a9-1953-7283-fe21158cfd5f@cs.duke.edu>
In-Reply-To: <20180517180752.GA5515@raichu>
References: <201805170427.w4H4R8lv058775@repo.freebsd.org> <CAG6CVpWpQgJOyXpN3z+7xBu2+Q_gRvXDfhVLvrKfUp7EgFzvDQ@mail.gmail.com> <20180517180752.GA5515@raichu>
On 05/17/18 14:07, Mark Johnston wrote:
> On Thu, May 17, 2018 at 10:07:34AM -0700, Conrad Meyer wrote:
>> On Wed, May 16, 2018 at 9:27 PM, Mark Johnston <markj@freebsd.org> wrote:
>>> Author: markj
>>> Date: Thu May 17 04:27:08 2018
>>> New Revision: 333703
>>> URL: https://svnweb.freebsd.org/changeset/base/333703
>>>
>>> Log:
>>>   Fix a race in vm_page_pagequeue_lockptr().
>>>
>>>   The value of m->queue must be cached after comparing it with PQ_NONE,
>>>   since it may be concurrently changing.
>>>
>>>   Reported by:  glebius
>>
>> What were the symptoms of this issue? The test plan in the linked
>> phabricator revision says:
>>
>> "Gleb reported seeing panics as a result of the use of a bogus index
>> into the pagequeue array, and also reported that this patch fixed the
>> panics."
>>
>> So an attempt to lock pagequeues[PQ_NONE=255].pq_mutex, which is
>> either something later in the vm_domain object, or bogus memory? One
>> of the mtx asserts trips?
>
> I think it was "mtx_lock() of spin mutex"; I didn't get a lot of
> details.
>
> I failed to note in the commit message that this race was introduced in
> r332974.
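The commit log's point, that m->queue must be loaded once and the cached copy used for both the PQ_NONE check and the array index, can be illustrated with a small C11 model. This is a hypothetical sketch, not the FreeBSD code: struct toy_page, NQUEUES, queue_index_racy(), queue_index_fixed(), dequeue_now() and the interleave hook are invented names, and the concurrent writer is simulated deterministically by a callback run between the loads rather than by a second thread.

```c
/*
 * Hypothetical, simplified model of the double-read race described
 * in the r333703 log. Not the FreeBSD implementation; all names
 * except PQ_NONE are invented for illustration.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PQ_NONE 255   /* "on no queue" sentinel, as in the commit log */
#define NQUEUES 4     /* hypothetical number of page queues */

struct toy_page {
	/* Assumed invariant: holds PQ_NONE or a valid index < NQUEUES. */
	_Atomic uint8_t queue;
};

/* Simulates a concurrent writer that runs between two loads. */
typedef void (*interleave_fn)(struct toy_page *);

/*
 * Buggy pattern: compare one load against PQ_NONE, then load again
 * to form the index. The second load can observe PQ_NONE (255).
 * Returns the index that would be used, or -1 for "no queue".
 */
static int
queue_index_racy(struct toy_page *m, interleave_fn race)
{
	if (atomic_load(&m->queue) == PQ_NONE)
		return (-1);
	if (race != NULL)
		race(m);
	return (atomic_load(&m->queue));	/* may now be 255 */
}

/*
 * Fixed pattern: cache a single load and use that copy for both the
 * comparison and the index.
 */
static int
queue_index_fixed(struct toy_page *m, interleave_fn race)
{
	uint8_t queue;

	if ((queue = atomic_load(&m->queue)) == PQ_NONE)
		return (-1);
	if (race != NULL)
		race(m);
	return (queue);		/* possibly stale, but never PQ_NONE */
}

/* Simulated concurrent dequeue: the page leaves its queue. */
static void
dequeue_now(struct toy_page *m)
{
	atomic_store(&m->queue, PQ_NONE);
}
```

With the racy variant, a dequeue that lands between the two loads yields index 255, i.e. the out-of-range pagequeues[PQ_NONE] access Conrad describes. The fixed variant returns a possibly stale but in-range index; the sketch does not model the queue locks, so how a real caller handles a stale queue after taking the lock is outside its scope.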

The most common stack was:

panic: mtx_lock() of spin mutex (null) @ /data/ocafirmware.alt/FreeBSD/sys/vm/vm_page.c:3344
cpuid = 4
time = 1526415167
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe158af62380
vpanic() at vpanic+0x1a3/frame 0xfffffe158af623e0
doadump() at doadump/frame 0xfffffe158af62460
__mtx_lock_flags() at __mtx_lock_flags+0x11a/frame 0xfffffe158af624a0
vm_page_dequeue() at vm_page_dequeue+0x8a/frame 0xfffffe158af624e0
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x2cb/frame 0xfffffe158af62560
vm_page_grab_pages() at vm_page_grab_pages+0x274/frame 0xfffffe158af62610
vn_sendfile() at vn_sendfile+0x83a/frame 0xfffffe158af628e0
[Tue May 15 20:12:48 2018]sys_sendfile() at sys_sendfile+0x119/frame 0xfffffe158af62980
amd64_syscall() at amd64_syscall+0x298/frame 0xfffffe158af62ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe158af62ab0

I once saw one like this:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8088bf74
stack pointer           = 0x28:0xfffffe55af7712e0
frame pointer           = 0x28:0xfffffe55af771330
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq446: mlx5_core0)
[Mon May 14 04:45:10 2018]trap number           = 9
panic: general protection fault
cpuid = 0
time = 1526273109
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe55af770ff0
vpanic() at vpanic+0x1a3/frame 0xfffffe55af771050
panic() at panic+0x43/frame 0xfffffe55af7710b0
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe55af771100
trap() at trap+0x6d/frame 0xfffffe55af771210
[Mon May 14 04:45:10 2018]calltrap() at calltrap+0x8/frame 0xfffffe55af771210
--- trap 0x9, rip = 0xffffffff8088bf74, rsp = 0xfffffe55af7712e0, rbp = 0xfffffe55af771330 ---
vm_pqbatch_submit_page() at vm_pqbatch_submit_page+0x144/frame 0xfffffe55af771330
sendfile_free_page() at sendfile_free_page+0x10e/frame 0xfffffe55af771360
sendfile_free_mext_pg() at sendfile_free_mext_pg+0xb7/frame 0xfffffe55af7713b0
mb_free_ext() at mb_free_ext+0x103/frame 0xfffffe55af7713e0
m_freem() at m_freem+0x48/frame 0xfffffe55af771400
tcp_do_segment() at tcp_do_segment+0x1647/frame 0xfffffe55af771500
tcp_input_with_port() at tcp_input_with_port+0xfcc/frame 0xfffffe55af771650
tcp_input() at tcp_input+0xb/frame 0xfffffe55af771660
[Mon May 14 04:45:10 2018]ip_input() at ip_input+0xe9/frame 0xfffffe55af7716c0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe55af771710
ether_demux() at ether_demux+0x140/frame 0xfffffe55af771740
ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe55af7717a0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe55af7717f0
ether_input() at ether_input+0x26/frame 0xfffffe55af771810
tcp_lro_flush_all() at tcp_lro_flush_all+0xf2/frame 0xfffffe55af771850
mlx5e_rx_cq_comp() at mlx5e_rx_cq_comp+0x5e5/frame 0xfffffe55af771950
mlx5_cq_completion() at mlx5_cq_completion+0x73/frame 0xfffffe55af771990
<...>

Thanks again for fixing it so quickly!

Drew