Date: Wed, 31 May 2017 23:31:20 +0500
From: Alexander Morozov <alex@nixd.org>
To: Freebsd fs <freebsd-fs@freebsd.org>
Subject: Re: VFS vn_lock function makes system unresponsive when calling vn_fullpath
Message-ID: <d4546ec44e24b2f385d9d9cfb1698cc5@nixd.org>
In-Reply-To: <2d76399889bb95b75cb6b054c4c68116@nixd.org>
References: <38666c423c33a5e1009c106c23aeb218@nixd.org> <CAGudoHGi5gjUERGm-HjMpYsp=M+44J0hdCcW1QhdQ40rnZM38Q@mail.gmail.com> <2d76399889bb95b75cb6b054c4c68116@nixd.org>
Hi Mateusz,

Thank you for your response.

>> Hard to say off hand, but so far it looks like the vnode is already exclusively locked

It seems so. I added "error = vn_lock(dvp, LK_SHARED | LK_SLEEPFAIL);" to the kernel module just before the vn_fullpath() call, and vn_lock returned the error code: EDEADLK 11 /* Resource deadlock avoided */.

>> Are you running the kernel with DEBUG_VFS_LOCKS?

Not at the moment. I only suspected that something was wrong with the vnode, and I hoped that the problem was located somewhere in the namecache lookup. I will run a kernel with DEBUG_VFS_LOCKS later.

>> What is the purpose of this module in the first place?

It is a clone of Apple's Sandbox (Seatbelt, or whatever it was called). The module provides mandatory (by choice) integrity control of executable files, with the option of applying additional constraints to an executable. A test version of the userland software is almost ready. The tinyscheme scripts are converted to binary schemes (as the Apple sandbox does, but in a different way, to avoid problems with Apple). Rules can be loaded into the module, and testing against mpo_kld_check_load was successful.

This is partly research and partly my personal "security" tool, which ideally should apply additional constraints to some programs and warn the admin/security team in real time when an application attempts to leave the sandbox, that is, to do something it was prohibited from doing (e.g. access /etc/rc.conf or /dev/kmem even as root, or execute a shell). An example would be php suddenly performing a kldload from /tmp. (I know that this is more of a "WinNT" approach, like Agnitum Outpost with proactive security. For this reason I am not advertising it a lot, and I will probably keep the source code available online.)

I need this kernel module because I am running some closed-source programs on the server which are "black boxes" and cannot be trusted at all. And for other reasons... Three developers (including me) are working on the project in their spare time.

>> regular vnode -> path resolution is not guaranteed to work

Yes, I remember we discussed this problem a bit when I was studying at university. But for development it was more than enough.

>> For instance someone else can be modifying the directory tree as you translate

Yes, this is also possible. The plan is to implement the other MAC procedures and then decide what to do about vn_fullpath and vnodes in general.

My intention was to confirm that vn_fullpath() is a dead end and that the FreeBSD developers know of the potential deadlock problem, and to let others know where to look for a solution should they experience the same problem.

--
Kind Regards,
Alexander Morozov

Mateusz Guzik wrote on 2017-05-31 00:27:

> On Tue, May 30, 2017 at 4:42 PM, Alexander Morozov <alex@nixd.org> wrote:
>
>> Hi,
>>
>> At the moment I am developing a kernel module based on the MAC Framework which invokes vn_fullpath() from vfs_cache.c.
>>
>> Here is the thing:
>> When the MAC Framework mpo_vnode_check_write procedure is called, the kernel module tries to retrieve the on-disk path for curthread and the path of the file to which curthread is attempting to write. When execution reaches vn_fullpath() for the resolution of the file's path, the terminal window 'freezes' (actually the whole system stops responding; the other ttys stop responding after login credentials are entered).
>>
>> For instance, after loading and initializing the MAC kernel module, I open an existing text file in vi to edit it. After inserting some random text I press the ESC key and the terminal window 'freezes' (at that moment mpo_vnode_check_write is called). The struct vnode *vp passed to the MAC procedure is valid (not NULL) and its type (enum vtype) is VREG.
>>
>> I have investigated this issue and found the following:
>> vn_fullpath() calls vn_fullpath1(), which later invokes vn_vptocnp().
>> Retrieving the location for curthread is successful (the full path is returned).
>> But when the kernel module attempts to retrieve the path for vp (the argument of mpo_vnode_check_write), the call vn_lock(*vp, LK_SHARED | LK_RETRY) (line 2191) in vn_vptocnp() never returns.
>
> Hard to say off hand, but so far it looks like the vnode is already exclusively locked and now the kernel deadlocks itself by locking it in shared mode. You can easily inspect the state in ddb with show lockedvnods.
>
> Are you running the kernel with DEBUG_VFS_LOCKS?
>
> What is the purpose of this module in the first place?
>
> regular vnode -> path resolution is not guaranteed to work. While it will work most of the time, it is inherently racy and problematic in the presence of multiple hardlinks. For instance someone else can be modifying the directory tree as you translate back and trick you into thinking the vnode represents a different file. Even if this was not the case, a translation of this sort on each write would be a performance killer.
>
> The only possibly working approach I see would attach metadata to the vnode after lookup and then use it.
>
>> Below I have copied and pasted the code which performs the path resolution, and the MAC procedure handler:
>>
>> static int
>> rw_retreive_data(struct thread * td, struct vnode *dvp, char ** rpath, char ** curpath, struct sandbox_rule_app ** rule_ptr)
>> {
>> [snip]
>>     error = vn_fullpath(td, td->td_proc->p_textvp, curpath, &curfreepath);
>> [snip]
>> }
>>
>> int sandbox_vnode_check_write(struct ucred *active_cred,
>>     struct ucred *file_cred,
>>     struct vnode *vp,
>>     struct label *vplabel)
>> {
>>     IS_MODULE_INITED(0)
>>     ASSERT_NULL_R(vp, 0);
>>
>>     struct sandbox_rule_app * rule_ptr = NULL;
>>     char * rpath = "-";
>>     char * curpath = "-";
>>     int error = 0;
>>
>>     RWLOCK_BLOCK(&sandbox_rules_lock, RWLOCK_READ)
>>     {
>>
>>         error = rw_retreive_data(curthread, vp, &rpath, &curpath, &rule_ptr);
>
> If this is using rwlock the code is additionally wrong, as vn_fullpath can induce unbounded sleep, while the lock at hand only supports bounded sleep.
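
The self-deadlock described in this thread (a thread that already holds a lock exclusively asks for the same lock again in shared mode) can be illustrated with a small userland sketch in C. This is a toy model under stated assumptions, not the kernel's lockmgr: struct toy_lock, toy_lock_excl(), toy_lock_shared() and toy_check_write() are hypothetical names invented for the illustration, and thread identities are plain integers. It assumes, as suggested above, that the write path already holds the vnode lock exclusively when the MAC hook runs.

```c
#include <assert.h>
#include <errno.h>

/* Toy model only: NOT the kernel's lockmgr. Thread ids are positive ints. */
struct toy_lock {
	int excl_owner;	/* id of the exclusive holder, 0 if none */
	int sharers;	/* number of shared holders */
};

/* Take the lock exclusively; reports an error instead of sleeping. */
static int
toy_lock_excl(struct toy_lock *lk, int self)
{
	if (lk->excl_owner == self)
		return (EDEADLK);	/* we already own it */
	if (lk->excl_owner != 0 || lk->sharers != 0)
		return (EBUSY);		/* a real lock would sleep here */
	lk->excl_owner = self;
	return (0);
}

/* Take the lock shared; reports an error instead of sleeping. */
static int
toy_lock_shared(struct toy_lock *lk, int self)
{
	if (lk->excl_owner == self)
		return (EDEADLK);	/* we would wait for ourselves forever */
	if (lk->excl_owner != 0)
		return (EBUSY);		/* a real lock would sleep here */
	lk->sharers++;
	return (0);
}

/*
 * Mirrors the scenario in the thread: the write path takes the lock
 * exclusively, then the hook asks for the same lock in shared mode.
 */
static int
toy_check_write(void)
{
	struct toy_lock vnode_lock = { 0, 0 };
	const int self = 1;
	int error;

	error = toy_lock_excl(&vnode_lock, self);
	assert(error == 0);	/* the lock was free, so this cannot fail */
	return (toy_lock_shared(&vnode_lock, self));
}
```

In this model toy_check_write() returns EDEADLK, the same error the author reports from vn_lock() with LK_SLEEPFAIL. A blocking request in the same situation has nothing to wake it up, because the only thread that could release the lock is the one waiting for it.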
> --
> Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-fs@freebsd.org Wed May 31 20:04:45 2017
From: kc atgb <kisscoolandthegangbang@hotmail.fr>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: Problem with zpool remove of log device
Date: Wed, 31 May 2017 20:04:42 +0000
Message-ID: <DBXPR05MB157DDB6C2FD923C8E442740A0F10@DBXPR05MB157.eurprd05.prod.outlook.com>
References: <9188a169-cd81-f64d-6b9e-0e3c6b4af1bb@wasikowski.net>
In-Reply-To: <9188a169-cd81-f64d-6b9e-0e3c6b4af1bb@wasikowski.net>

On Fri, 26 May 2017 09:40:06 +0000, Łukasz Wąsikowski <lukasz@wasikowski.net> wrote:

> Hi,
>
> I can't remove a log device from the pool: the operation ends OK, but the log device
> is still in the pool (bug?).
>
> # uname -a
> FreeBSD xxx.yyy.com 11.0-STABLE FreeBSD 11.0-STABLE #0 r316543: Thu Apr 6 08:22:43 CEST 2017 root@xxx.yyy.com:/usr/obj/usr/src/sys/YYY amd64
>
> # zpool status tank
>   pool: tank
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured
>         pool.
>   scan: scrub repaired 0 in 22h21m with 0 errors on Thu May 25 02:26:36 2017
> config:
>
>         NAME                   STATE     READ WRITE CKSUM
>         tank                   ONLINE       0     0     0
>           mirror-0             ONLINE       0     0     0
>             ada2p3             ONLINE       0     0     0
>             ada3p3             ONLINE       0     0     0
>         logs
>           mirror-1             ONLINE       0     0     0
>             gpt/tankssdzil0    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/tankssdzil1    ONLINE       0     0     0  block size: 512B configured, 4096B native
>
> errors: No known data errors
>
> When I try to remove the log device, the operation ends without errors:
>
> # zpool remove tank mirror-1; echo $?
> 0
>
> But the log device is still there:
>
> # zpool status tank
>   pool: tank
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured
>         pool.
>   scan: scrub repaired 0 in 22h21m with 0 errors on Thu May 25 02:26:36 2017
> config:
>
>         NAME                   STATE     READ WRITE CKSUM
>         tank                   ONLINE       0     0     0
>           mirror-0             ONLINE       0     0     0
>             ada2p3             ONLINE       0     0     0
>             ada3p3             ONLINE       0     0     0
>         logs
>           mirror-1             ONLINE       0     0     0
>             gpt/tankssdzil0    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/tankssdzil1    ONLINE       0     0     0  block size: 512B configured, 4096B native
>
> errors: No known data errors
>
>
> I'd like to remove it - how should I proceed?
>
>

Hi,

I had a problem like this not so long ago.

I thought maybe it was something with my installation, because I have an older version of FreeBSD and, with it, an older version of zfs.

I found some similar cases, but for openzfs on linux and, if I'm not wrong, opensolaris. I know now that I am not the only one on FreeBSD.

It was something to do with zfs thinking that writes had not been synced from the log to the disks, so it could not remove the device.

What does your zpool iostat -v output look like after you offline the log device(s)? Is there any persistent data shown as allocated?

K.