From owner-freebsd-current@freebsd.org Fri May 22 23:46:30 2020 Return-Path: Delivered-To: freebsd-current@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 7E6822F10D9 for ; Fri, 22 May 2020 23:46:30 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from CAN01-QB1-obe.outbound.protection.outlook.com (mail-eopbgr660086.outbound.protection.outlook.com [40.107.66.86]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "GlobalSign Organization Validation CA - SHA256 - G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49TNRm740Lz44DM; Fri, 22 May 2020 23:46:28 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nIy6wrsfLTrMuQtHT9thI/U2LIRvMbHEIEkQja996ahNCa3Qd1ipXm6imClsbsVz9ImNt3gXwsfrHAE9UNgnRwGVe6Sroc2ii6J7Kox18EP6mu4bvMYEIw406tdbp125+G8ouVHG6yDbiM4Oiva1u2w1cyLiyQkaf+/uU7PTYFQWTO49kOfhKTAlqSf/2t97TI9GTkWVyRk490hgbRVNK2JTEKoyWrixLAhO8Jd6vG9fDiQGqNgkQtUAXm85RwIp7J00uT/dxdBXLo2dGw1FQ5HA9B0a+bYjEUqEIDi6oLrL7LzaTIvsSwDapQTvHE7b1isGYPRdNkoPVTY7T75YIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Z+OKVYPA6GaJpCTmnIxItoA3P/I0iVSarNhzD0Fjwg4=; b=TWm800ZHX12hi0DHY84ZGj2QRDl/KTpmCiO3yV2D95UCwvd3n8U7ot+4OA7RFuoNMSHAQ0l9jiJh8Q7X8USDfejQv+Q+Dt6E0qcy/TzLp/WDhBQBJGUllnUuoQHTKmt/5JxCS24HdxTCpgX4vNmQqvP9FcCIiDjUQjdbkfeadBO959LrLSxe6mGrGMug1Sf2m1jRZBfciFxQMyvtLg1R0wSZTFORAu4ZXInp5ikxQf5LXgTH83oeX3LaNV5YJ6VPSBaMdFUcivB0kRoVwOcQiSnGQ7G6FBcJM/a/be0rczxMCAt8Gwg/c9PF3sAmj4QnUCBoRD0mdj501eRpPgvR7A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=uoguelph.ca; dmarc=pass action=none header.from=uoguelph.ca; dkim=pass header.d=uoguelph.ca; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=uoguelph.ca; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Z+OKVYPA6GaJpCTmnIxItoA3P/I0iVSarNhzD0Fjwg4=; b=bPWDL5d1mz3jO/eDi067wczj7DADmSDpsyr+DJVKnI9C9hIzgALbDz9qE2AtyQAHTvxJXJTGApl0aYuJM1kIj4ST+JDTyhQGnn46S/Yd9XjpkBlX3QbG4r9GIeIYlaWxMx/0EFU+PEhJ96cY1kFycqbvDiFnYrrRr+gZoCA407SvQ7mFucil6v+6eeR/fCe4Wb/3DQD0bEk0PWEO/rEfCnEIl/NJgcxJaVcXdi31U62IFq4V17Y65lURRH/HA99cqO18noPN3aN5hcqUptvOKW7HIHY8Jzq9dhCtEGUjtZlLME7LOmPF0jcpFaWhJYA4E/88UzSWwSqwnh2zbuN9dw== Received: from QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c00:32::26) by QB1PR01MB3761.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c00:32::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3021.23; Fri, 22 May 2020 23:46:26 +0000 Received: from QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM ([fe80::dd96:945c:b6ee:ffa2]) by QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM ([fe80::dd96:945c:b6ee:ffa2%6]) with mapi id 15.20.3021.027; Fri, 22 May 2020 23:46:26 +0000 From: Rick Macklem To: Konstantin Belousov , Ryan Libby CC: "freebsd-current@FreeBSD.org" , Konstantin Belousov Subject: Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc Thread-Topic: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc Thread-Index: AQHWLwsKoH8lF1xuI0WPckyCNZ/JGaiyHHkAgAA2qQCAAlx9bA== Date: Fri, 22 May 2020 23:46:26 +0000 Message-ID: References: , <20200521101428.GC64045@kib.kiev.ua> In-Reply-To: <20200521101428.GC64045@kib.kiev.ua> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: dd4bd9f5-9131-4432-c815-08d7feaa57c9 x-ms-traffictypediagnostic: QB1PR01MB3761: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-forefront-prvs: 04111BAC64 x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: 9uhzkR95RMIl7kHZQYTb16TnNT5b2qD7vzmAHYfrIuCi/yi1LjwnT83DfFs7IGBXOyKXBSHOt+No3tFgHbERqYrsLniVl+rvuwSdmpbH5PPDHhHEDCh7VB3muDmjAXczPahLfHCHCi1Fp/7W37Uf4nPSzVWMiBxrbMI933z2PhvIeBXl5JeOZ/tOlj7NEsEwjVVdDF5EEW1T4vrKXBPon5axuen73RkZBrcSJ98i7iC13k/9WN3P9yspEPBjp6ZxWGI7jhCdumCjtaGtK/A5LYnMTuS7YKLut1oZYf78dFFw6kQ3NAHpJKtacl5twDyg x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM; PTR:; CAT:NONE; SFTY:; SFS:(396003)(376002)(136003)(39860400002)(346002)(366004)(66476007)(5660300002)(478600001)(2906002)(33656002)(86362001)(71200400001)(66946007)(76116006)(8676002)(6506007)(186003)(53546011)(8936002)(7696005)(4326008)(52536014)(786003)(66556008)(9686003)(66446008)(55016002)(316002)(54906003)(110136005)(64756008); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata: RoRzeKUSTzoyYkjAdB1+ND0C7UfsYN7ibrUvRCDe9FzoHS1YdKNhc1UgIiw68nhgZZwMHx9x3zHAMtSck0z3Av2wIn1MzNcbSEUeTTWZxYBNWlT5nZiiv0xgwU42ylZl6p1Eixcn6Dnw10aAco6aFF8dj0EEcLNE9H1wdA4xEHrnTnsbyKX8LIJrvRrjBeGw1YoYOZW0SXBhgfspRI/PunHu15fRvf+WW1Vwt6pijJ7G9BnGwa3rn/otE1gz3npmdAyHItzeRn1cZu1F3cDjUGrsEWVl5SU8lttg2SHEXY38mFXk2W0jLLtY6qnByW5sCnfJUxrSSqVDGM02b92n/rQMLNs4kFs1ohW3SJ8z+WyeRVBZEozpnTmLp+FfuIr1PUiWYe3KEcpOc39IgKsUlXMWN9NMEWvyCxv/5UTPNZX+tJDhKttwfpMXVRAdobiaYCyX0+gPMWcVOJj1d9cXkbnd4wL/KU0GR+En1NU8AEoQbSkP3P1U9UGBK70BlRHTOO6dvX8FWaoC+BJYxEKIzK7QPlOEL0dpvt38lVtCqL4= x-ms-exchange-transport-forked: True Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: uoguelph.ca X-MS-Exchange-CrossTenant-Network-Message-Id: dd4bd9f5-9131-4432-c815-08d7feaa57c9 X-MS-Exchange-CrossTenant-originalarrivaltime: 22 May 2020 23:46:26.5462 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: be62a12b-2cad-49a1-a5fa-85f4f3156a7d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: 7K3XcsELMOnE4iLATLr2y/nX8nZhMfVGK+MqcusV9cVkjUeQu+7j5nvMMLWqwU6q8zwpdR7wdz3ObeNWjdUn1A== X-MS-Exchange-Transport-CrossTenantHeadersStamped: QB1PR01MB3761 X-Rspamd-Queue-Id: 49TNRm740Lz44DM X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=uoguelph.ca header.s=selector1 header.b=bPWDL5d1; dmarc=none; spf=pass (mx1.freebsd.org: domain of rmacklem@uoguelph.ca designates 40.107.66.86 as permitted sender) smtp.mailfrom=rmacklem@uoguelph.ca X-Spamd-Result: default: False [-5.18 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; NEURAL_HAM_MEDIUM(-0.99)[-0.994]; R_DKIM_ALLOW(-0.20)[uoguelph.ca:s=selector1]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:40.107.0.0/16]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[uoguelph.ca]; NEURAL_HAM_LONG(-0.99)[-0.987]; DWL_DNSWL_LOW(-1.00)[uoguelph.ca:dkim]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[uoguelph.ca:+]; NEURAL_HAM_SHORT(-0.70)[-0.702]; RCVD_IN_DNSWL_NONE(0.00)[40.107.66.86:from]; FREEMAIL_TO(0.00)[gmail.com,freebsd.org]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:8075, ipnet:40.64.0.0/10, country:US]; ARC_ALLOW(-1.00)[microsoft.com:s=arcselector9901:i=1]; RWL_MAILSPIKE_POSSIBLE(0.00)[40.107.66.86:from] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 May 2020 23:46:30 -0000 Konstantin Belousov wrote:=0A= >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:=0A= >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem wrot= e:=0A= >> >=0A= >> > Hi,=0A= >> >=0A= >> > Since I hadn't upgraded a kernel through the winter, it took me a whil= e=0A= >> > to bisect this, but r358252 seems to be the culprit.=0A= >> >=0A= >> > If I do a kernel build over NFS using my not so big Pentium 4 (single = core,=0A= >> > 1.25Gbytes RAM, i386), about every second attempt will hang.=0A= >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.= =0A= >> > If I revert to r358251, I cannot reproduce this.=0A= >> >=0A= >> > Any ideas?=0A= >> >=0A= >> > I can easily test any change you might suggest to see if it fixes the= =0A= >> > problem.=0A= >> >=0A= >> > If you want more debug info, let me know, since I can easily=0A= >> > reproduce it.=0A= >> >=0A= >> > Thanks, rick=0A= >>=0A= >> Nothing obvious to me. I can maybe try a repro on a VM...=0A= >>=0A= >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.= =0A= >>=0A= >> "btalloc" is "We're either out of address space or lost a fill race."=0A= >=0A= >Yes, I would be not surprised to be out of something on 1G i386 machine.= =0A= >Please also add 'show alllocks'.=0A= Ok, I used an up to date head kernel and it took longer to reproduce a hang= .=0A= This time, none of the processes are stuck on "btalloc".=0A= I'll try and give you most of the above, but since I have to type it in by = hand=0A= from the screen, I might not get it all. (I'm no real typist;-)=0A= > show alllocks=0A= exclusive lockmgr ufs (ufs) r =3D 0 locked @ kern/vfs_subr.c: 3259=0A= exclusive lockmgr nfs (nfs) r =3D 0 locked @ kern/vfs_lookup.c:737=0A= exclusive sleep mutex kernel area domain (kernel arena domain) r =3D 0 lock= ed @ kern/subr_vmem.c:1343=0A= exclusive lockmgr bufwait (bufwait) r =3D 0 locked @ kern/vfs_bio.c:1663=0A= exclusive lockmgr ufs (ufs) r =3D 0 locked @ kern/vfs_subr.c:2930=0A= exclusive lockmgr syncer (syncer) r =3D 0 locked @ kern/vfs_subr.c:2474=0A= Process 12 (intr) thread 0x.. (1000008)=0A= exclusive sleep mutex Giant (Giant) r =3D 0 locked @ kern/kern_intr.c:1152= =0A= =0A= > ps=0A= - Not going to list them all, but here are the ones that seem interesting..= .=0A= 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]=0A= 16 0 0 0 DL (threaded) [bufdaemon]=0A= 100069 D qsleep [bufdaemon]=0A= 100074 D - [bufspacedaemon-0]=0A= 100084 D sdflush 0x11923284 [/ worker]=0A= - and more of these for the other UFS file systems=0A= 9 0 0 0 DL psleep 0x1e2f830 [vmdaemon]=0A= 8 0 0 0 DL (threaded) [pagedaemon]=0A= 100067 D psleep 0x1e2e95c [dom0]=0A= 100072 D launds 0x1e2e968 [laundry: dom0]=0A= 100073 D umarcl 0x12cc720 [uma]=0A= =85 a bunch of usb and cam ones=0A= 100025 D - 0x1b2ee40 [doneq0]=0A= =85=0A= 12 0 0 0 RL (threaded) [intr]=0A= 100007 I [swi6: task queue]=0A= 100008 Run CPU 0 [swi6: Giant taskq]=0A= =85=0A= 100000 D swapin 0x1d96dfc [swapper]=0A= - and a bunch more in D state.=0A= Does this mean the swapper was trying to swap in?=0A= =0A= > acttrace=0A= - just shows the keyboard=0A= kdb_enter() at kdb_enter+0x35/frame=0A= vt_kbdevent() at vt_kdbevent+0x329/frame=0A= kdbmux_intr() at kbdmux_intr+0x19/frame=0A= taskqueue_run_locked() at taskqueue_run_locked+0x175/frame=0A= taskqueue_run() at taskqueue_run+0x44/frame=0A= taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame=0A= ithread_loop() at ithread_loop+0x237/frame=0A= fork_exit() at fork_exit+0x6c/frame=0A= fork_trampoline() at 0x../frame=0A= =0A= > show all vmem=0A= vmem 0x.. 'transient arena'=0A= quantum: 4096=0A= size: 23592960=0A= inuse: 0=0A= free: 23592960=0A= busy tags: 0=0A= free tags: 2=0A= inuse size free size=0A= 16777216 0 0 1 23592960=0A= vmem 0x.. 'buffer arena'=0A= quantum: 4096=0A= size: 94683136=0A= inuse: 94502912=0A= free: 180224=0A= busy tags: 1463=0A= free tags: 3=0A= inuse size free size=0A= 16384 2 32768 1 16384=0A= 32768 39 1277952 1 32768=0A= 65536 1422 93192192 0 0=0A= 131072 0 0 1 131072=0A= vmem 0x.. 'i386trampoline'=0A= quantum: 1=0A= size: 24576=0A= inuse: 20860=0A= free: 3716=0A= busy tags: 9=0A= free tags: 3=0A= inuse size free size=0A= 32 1 48 1 52=0A= 64 2 208 0 0=0A= 128 2 280 0 0=0A= 2048 1 2048 1 3664=0A= 4096 2 8192 0 0=0A= 8192 1 10084 0 0=0A= vmem 0x.. 'kernel rwx arena'=0A= quantum: 4096=0A= size: 0=0A= inuse: 0=0A= free: 0=0A= busy tags: 0=0A= free tags: 0=0A= vmem 0x.. 'kernel area dom'=0A= quantum: 4096=0A= size: 56623104=0A= inuse: 56582144=0A= free: 40960=0A= busy tags: 11224=0A= free tags: 3=0A= inuse size free size=0A= 4096 11091 45428736 0 0=0A= 8192 63 516096 0 0=0A= 16384 12 196608 0 0=0A= 32768 6 196608 0 0=0A= 40960 2 81920 1 40960=0A= 65536 16 1048576 0 0=0A= 94208 1 94208 0 0=0A= 110592 1 110592 0 0=0A= 131072 15 2441216 0 0=0A= 262144 15 3997696 0 0=0A= 524288 1 524288 0 0=0A= 1048576 1 1945600 0 0=0A= vmem 0x.. 'kernel arena'=0A= quantum: 4096=0A= size: 390070272=0A= inuse: 386613248=0A= free: 3457024=0A= busy tags: 873=0A= free tags: 3=0A= inuse size free size=0A= 4096 35 143360 1 4096=0A= 8192 2 16384 2 16384=0A= 12288 1 12288 0 0=0A= 16384 30 491520 0 0=0A= 20480 140 2867200 0 0=0A= 65536 1 65536 0 0=0A= 131072 631 82706432 0 0=0A= 1048576 0 0 1 1339392=0A= 2097152 27 56623104 1 2097152=0A= 8388608 1 13774848 0 0=0A= 16777216 3 74883072 0 0=0A= 33554432 1 36753408 0 0=0A= 67108864 1 118276096 0 0=0A= =0A= > alltrace=0A= - I can't face typing too much more, but I'll put a few=0A= here that look interesting=0A= =0A= - for csh=0A= sched_switch()=0A= mi_switch()=0A= kern_yield()=0A= getblkx()=0A= breadn_flags()=0A= ffs_update()=0A= ufs_inactive()=0A= VOP_INACTIVE()=0A= vinactivef()=0A= vput_final()=0A= vm_object_deallocate()=0A= vm_map_process_deferred()=0A= kern_munmap()=0A= sys_munmap()=0A= =0A= - For cc=0A= sched_switch()=0A= mi_switch()=0A= sleepq_switch()=0A= sleepq_timedwait()=0A= _sleep()=0A= pause_sbt()=0A= vmem_bt_alloc()=0A= keg_alloc_slab()=0A= zone_import()=0A= cache_alloc()=0A= cache_alloc_retry()=0A= uma_zalloc_arg()=0A= bt_fill()=0A= vmem_xalloc()=0A= vmem_alloc()=0A= kmem_alloc()=0A= kmem_malloc_domainset()=0A= page_alloc()=0A= keg_alloc_slab()=0A= zone_import()=0A= cache_alloc()=0A= cache_alloc_retry()=0A= uma_zalloc_arg()=0A= nfscl_nget()=0A= nfs_create()=0A= vop_sigdefer()=0A= nfs_vnodeops_bypass()=0A= VOP_CREATE_APV()=0A= vn_open_cred()=0A= vn_open()=0A= kern_openat()=0A= sys_openat()=0A= =0A= Then there are a bunch of these... for cc, make=0A= sched_switch()=0A= mi_switch()=0A= sleepq_switch()=0A= sleepq_catch_signals()=0A= sleepq_wait_sig()=0A= kern_wait6()=0A= sys_wait4()=0A= =0A= - for vnlru=0A= sched_switch()=0A= mi_switch()=0A= sleepq_switch()=0A= sleepq_timedwait()=0A= _sleep()=0A= vnlru_proc()=0A= fork_exit()=0A= fork_trampoline()=0A= =0A= - for syncer=0A= sched_switch()=0A= mi_switch()=0A= critical_exit_preempt()=0A= intr_event_handle()=0A= intr_execute_handlers()=0A= lapic_handle_intr()=0A= Xapic_isr1()=0A= - interrupt=0A= memset()=0A= cache_alloc()=0A= cache_alloc_retry()=0A= uma_zalloc_arg()=0A= vmem_xalloc()=0A= vmem_bt_alloc()=0A= keg_alloc_slab()=0A= zone_import()=0A= cache_alloc()=0A= cache_alloc_retry()=0A= uma_zalloc_arg()=0A= bt_fill()=0A= vmem_xalloc()=0A= vmem_alloc()=0A= bufkva_alloc()=0A= getnewbuf()=0A= getblkx()=0A= breadn_flags()=0A= ffs_update()=0A= ffs_sync()=0A= sync_fsync()=0A= VOP_FSYNC_APV()=0A= sched_sync()=0A= fork_exit()=0A= fork_trampoline()=0A= =0A= - For bufdaemon (a bunch of these)=0A= sched_switch()=0A= mi_switch()=0A= sleepq_switch()=0A= sleepq_timedwait()=0A= _sleep()=0A= buf_daemon()=0A= fork_exit()=0A= fork_trampoline()=0A= =0A= vmdaemon and pagedaemon are basically just like above,=0A= sleeping in=0A= vm_daemon()=0A= or=0A= vm_pageout_worker()=0A= or=0A= vm_pageout_laundry_worker()=0A= or=0A= uma_reclaim_worker()=0A= =0A= That's all the typing I can take right now.=0A= I can probably make this happen again if you want more specific stuff.=0A= =0A= rick=0A= =0A= =0A= =0A= =0A=