From owner-svn-src-head@freebsd.org Fri Nov 18 13:22:20 2016
Return-Path:
Delivered-To: svn-src-head@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7502DC4532C;
 Fri, 18 Nov 2016 13:22:20 +0000 (UTC) (envelope-from rb743@hermes.cam.ac.uk)
Received: from ppsw-41.csi.cam.ac.uk (ppsw-41.csi.cam.ac.uk [131.111.8.141])
 by mx1.freebsd.org (Postfix) with ESMTP id 3F336F9A;
 Fri, 18 Nov 2016 13:22:19 +0000 (UTC) (envelope-from rb743@hermes.cam.ac.uk)
X-Cam-AntiVirus: no malware found
X-Cam-ScannerInfo: http://www.cam.ac.uk/cs/email/scanner/
Received: from sc1.bsdpad.com ([163.172.212.18]:42312)
 by ppsw-41.csi.cam.ac.uk (smtp.hermes.cam.ac.uk [131.111.8.159]:587)
 with esmtpsa (LOGIN:rb743) (TLSv1:ECDHE-RSA-AES256-SHA:256)
 id 1c7j7T-0008Mk-QM (Exim 4.86_36-e07b163) (return-path );
 Fri, 18 Nov 2016 13:22:19 +0000
Date: Fri, 18 Nov 2016 13:21:28 +0000
From: Ruslan Bukin
To: Konstantin Belousov
Cc: Alan Cox, Alan Cox, src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: Re: svn commit: r308691 - in head/sys: cddl/compat/opensolaris/sys
 cddl/contrib/opensolaris/uts/common/fs/zfs fs/tmpfs kern vm
Message-ID: <20161118132128.GA43507@bsdpad.com>
References: <201611151822.uAFIMoj2092581@repo.freebsd.org>
 <20161116133718.GA10251@bsdpad.com> <20161116165343.GX54029@kib.kiev.ua>
 <20161116165939.GA12498@bsdpad.com> <20161116175210.GA13203@bsdpad.com>
 <9047aad0-0713-5d7a-f92e-6f931642bb27@rice.edu>
 <20161118102235.GA40554@bsdpad.com> <20161118103728.GE54029@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20161118103728.GE54029@kib.kiev.ua>
User-Agent: Mutt/1.6.1 (2016-04-27)
Sender: "R. Bukin"
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
X-List-Received-Date: Fri, 18 Nov 2016 13:22:20 -0000

On Fri, Nov 18, 2016 at 12:37:28PM +0200, Konstantin Belousov wrote:
> On Fri, Nov 18, 2016 at 10:22:35AM +0000, Ruslan Bukin wrote:
> > On Thu, Nov 17, 2016 at 10:51:40AM -0600, Alan Cox wrote:
> > > On 11/16/2016 11:52, Ruslan Bukin wrote:
> > > > On Wed, Nov 16, 2016 at 04:59:39PM +0000, Ruslan Bukin wrote:
> > > >> On Wed, Nov 16, 2016 at 06:53:43PM +0200, Konstantin Belousov wrote:
> > > >>> On Wed, Nov 16, 2016 at 01:37:18PM +0000, Ruslan Bukin wrote:
> > > >>>> I have a panic with this on RISC-V. Any ideas?
> > > >>> How did you check that the revision you replied to causes the
> > > >>> problem? Note that the backtrace below is not reasonable.
> > > >> I reverted this commit as follows and rebuilt the kernel:
> > > >> git show 2fa36073055134deb2df39c7ca46264cfc313d77 | patch -p1 -R
> > > >>
> > > >> So the problem is reproducible on dual-core with a 32MB mdroot.
> > > >>
> > > > I just found another interesting behavior, depending on the amount
> > > > of physical memory:
> > > > 700m - panic
> > > > 800m - works fine
> > > > 1024m - panic
> > >
> > > I think that this behavior is not inconsistent with your report of the
> > > system crashing if you enabled two cores but not one. Specifically,
> > > changing the number of active cores will slightly affect the amount of
> > > memory that is allocated during initialization.
> > >
> > > There is nothing unusual in the sysctl output that you sent out.
> > >
> > > I have two suggestions. Try these in order.
> > >
> > > 1. r308691 reduced the size of struct vm_object. Try undoing the one
> > > snippet that reduced the vm object size and see if that makes a difference.
> > >
> > > @@ -118,7 +118,6 @@
> > >         vm_ooffset_t backing_object_offset; /* Offset in backing object */
> > >         TAILQ_ENTRY(vm_object) pager_object_list; /* list of all objects of this pager type */
> > >         LIST_HEAD(, vm_reserv) rvq;             /* list of reservations */
> > > -       struct vm_radix cache;  /* (o + f) root of the cache page radix trie */
> > >         void *handle;
> > >         union {
> > >                 /*
> > >
> > > 2. I'd like to know if vm_page_scan_contig() is being called.
> > >
> > > Finally, to simplify the situation a little, I would suggest that you
> > > disable superpage reservations in vmparam.h. You have no need for them.
> > >
> > I made another merge from svn-head and the problem disappeared for 700m
> > and 1024m of physical memory, but now I am able to reproduce it with
> > 900m of physical memory.
> >
> > Restoring 'struct vm_radix cache' in struct vm_object gives no change
> > in behavior.
> >
> > Adding a panic() call to vm_page_scan_contig() gives the original panic
> > (so vm_page_scan_contig() is not called); it looks like the size of the
> > function changes, and that unhides the original problem.
> >
> > Disabling superpage reservations changes the behavior and gives the
> > same panic on a 1024m boot.
> >
> > Finally, if I comment out the ruxagg() call in kern_resource.c, then I
> > can't reproduce the problem any more with any amount of memory in any
> > setup:
> >
> > --- a/sys/kern/kern_resource.c
> > +++ b/sys/kern/kern_resource.c
> > @@ -1063,7 +1063,7 @@ rufetch(struct proc *p, struct rusage *ru)
> >         *ru = p->p_ru;
> >         if (p->p_numthreads > 0) {
> >                 FOREACH_THREAD_IN_PROC(p, td) {
> > -                       ruxagg(p, td);
> > +                       //ruxagg(p, td);
> >                         rucollect(ru, &td->td_ru);
> >                 }
> >         }
> >
> > I found this patch in my early RISC-V development directory, so it
> > looks like the problem has existed for the whole life of freebsd/riscv,
> > but was hidden until now.
> >
> If you comment out the rufetch() call in proc0_post(), does the problem go
> away as well?

Yes, it goes away as well (with sys/kern/kern_resource.c reverted back).
--- a/sys/kern/init_main.c
+++ b/sys/kern/init_main.c
@@ -591,7 +591,7 @@ proc0_post(void *dummy __unused)
 {
        struct timespec ts;
        struct proc *p;
-       struct rusage ru;
+       //struct rusage ru;
        struct thread *td;
 
        /*
@@ -602,7 +602,7 @@ proc0_post(void *dummy __unused)
        FOREACH_PROC_IN_SYSTEM(p) {
                microuptime(&p->p_stats->p_start);
                PROC_STATLOCK(p);
-               rufetch(p, &ru);        /* Clears thread stats */
+               //rufetch(p, &ru);      /* Clears thread stats */
                PROC_STATUNLOCK(p);
                p->p_rux.rux_runtime = 0;
                p->p_rux.rux_uticks = 0;

> I suggest starting with fixing the backtrace anyway, because the backtrace
> you posted is wrong.

Yeah, I see. It shows:

rufetch() at exec_shell_imgact+0x1204

instead of:

rufetch() at proc0_post()+0x88

BTW, the problem also goes away when the Spike simulator runs in debug mode
(when it compares the PC with some value on each cycle, for example). I
suppose it would go away on slower machines as well. I will debug more.

Ruslan