Date: Wed, 21 Sep 2016 00:15:17 +0300
From: Konstantin Belousov
To: Slawa Olhovchenkov
Cc: John Baldwin, freebsd-stable@freebsd.org
Subject: Re: nginx and FreeBSD11
Message-ID: <20160920211517.GJ38409@kib.kiev.ua>

On Tue, Sep 20, 2016 at 11:38:54PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:
>
> > On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
> > > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
> > >
> > > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> > > >
> > > > > > > If this panics, then vmspace_switch_aio() is not working for
> > > > > > > some reason.
> > > > > >
> > > > > > I tried the following DTrace script:
> > > > > > ====
> > > > > > #pragma D option dynvarsize=64m
> > > > > >
> > > > > > int req[struct vmspace *, void *];
> > > > > > self int trace;
> > > > > >
> > > > > > syscall:freebsd:aio_read:entry
> > > > > > {
> > > > > >     this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
> > > > > >     req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = curthread->td_proc->p_pid;
> > > > > > }
> > > > > >
> > > > > > fbt:kernel:aio_process_rw:entry
> > > > > > {
> > > > > >     self->job = args[0];
> > > > > >     self->trace = 1;
> > > > > > }
> > > > > >
> > > > > > fbt:kernel:aio_process_rw:return
> > > > > > /self->trace/
> > > > > > {
> > > > > >     req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
> > > > > >     self->job = 0;
> > > > > >     self->trace = 0;
> > > > > > }
> > > > > >
> > > > > > fbt:kernel:vn_io_fault:entry
> > > > > > /self->trace && !req[curthread->td_proc->p_vmspace, args[1]->uio_iov[0].iov_base]/
> > > > > > {
> > > > > >     this->buf = args[1]->uio_iov[0].iov_base;
> > > > > >     printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, curthread->td_proc->p_vmspace, this->buf, req[curthread->td_proc->p_vmspace, this->buf]);
> > > > > > }
> > > > > > ===
> > > > > >
> > > > > > And I didn't get any messages near the nginx core dump.
> > > > > > What can I check next?
> > > > > > Maybe check the context/address space switch for the kernel process?
> > > > >
> > > > > Which CPU are you using?
> > > >
> > > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
> > Is this Sandy Bridge?
>
> Sandy Bridge EP
>
> > Show me the first 100 lines of the verbose dmesg.
>
> In a day or two, after this test run ends; I need to enable verbose.
>
> > I want to see the CPU features lines. In particular, does your CPU support
> > the INVPCID feature?
>
> CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
>   Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7
>   Features=0xbfebfbff
>   Features2=0x1fbee3ff
>   AMD Features=0x2c100800
>   AMD Features2=0x1
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>   TSC: P-state invariant, performance statistics
>
> I don't see this feature before E5 v3:
>
> CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
>   Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
>   Features=0xbfebfbff
>   Features2=0x7fbee3ff
>   AMD Features=0x2c100800
>   AMD Features2=0x1
>   Structured Extended Features=0x281
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
>
> (11.0 is not running on this CPU)
Ok.

>
> CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
>   Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2
>   Features=0xbfebfbff
>   Features2=0x7ffefbff
>   AMD Features=0x2c100800
>   AMD Features2=0x21
>   Structured Extended Features=0x37ab
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
>
> (11.0 runs without this issue)
Do you mean that a similarly configured nginx+aio does not demonstrate
the corruption on this machine?

>
> > Also you may show me the 'sysctl vm.pmap' output.
>
> # sysctl vm.pmap
> vm.pmap.pdpe.demotions: 3
> vm.pmap.pde.promotions: 172495
> vm.pmap.pde.p_failures: 2119294
> vm.pmap.pde.mappings: 1927
> vm.pmap.pde.demotions: 126192
> vm.pmap.pcid_save_cnt: 0
> vm.pmap.invpcid_works: 0
> vm.pmap.pcid_enabled: 0
> vm.pmap.pg_ps_enabled: 1
> vm.pmap.pat_works: 1
>
> This is after setting vm.pmap.pcid_enabled=0 in loader.conf
>
> > > >
> > > > > Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
> > > > > loader prompt or loader.conf)? (Wondering if pmap_activate() is somehow not switching)
> > >
> > > I need some more time to test (a day or two), but for now this looks like
> > > a workaround/solution: 12h runtime and peak hour without an nginx crash.
> > > (vm.pmap.pcid_enabled=0 in loader.conf).
> >
> > Please try this variation of the previous patch.
>
> and remove vm.pmap.pcid_enabled=0?
Definitely.

>
> > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > index a23468e..f754652 100644
> > --- a/sys/vm/vm_map.c
> > +++ b/sys/vm/vm_map.c
> > @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> >          if (oldvm == newvm)
> >                  return;
> >
> > +        spinlock_enter();
> >          /*
> >           * Point to the new address space and refer to it.
> >           */
> > @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> >
> >          /* Activate the new mapping. */
> >          pmap_activate(curthread);
> > +        spinlock_exit();
> >
> >          /* Remove the daemon's reference to the old address space. */
> >          KASSERT(oldvm->vm_refcnt > 1,