Date: Mon, 01 Jan 2024 21:55:10 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 276002] nfscl: data corruption using both copy_file_range and mmap'd I/O
Message-ID: <bug-276002-3630-G67iwNCGGO@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-276002-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-276002-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276002

Rick Macklem <rmacklem@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |geoffrey@dommett.com

--- Comment #34 from Rick Macklem <rmacklem@FreeBSD.org> ---
Ok, here is my understanding of what currently can happen.
Hopefully Kostik will correct me if I have this wrong.

#1 - File is open(2)'d.
#2 - A byte range (let's say the 1st 100Mbytes) is mmap(2)'d into the
     address space.
#3 - Some addresses within this address space are modified by the
     process, dirtying the corresponding pages.
#4 - File is read(2) sequentially.

Now, when #4 happens, there will be read-aheads done by the nfsiod
threads. These simply do Read RPCs against the NFS server to read the
byte ranges of the file into the buffer cache blocks. They are done
asynchronously and without any vnode lock.
--> At this time, I do not see anything that stops these read-aheads
    from filling the buffer cache blocks/pages from the NFS server's
    now stale data.

Now, I thought adding an msync(2) with MS_SYNC between #3 and #4 would
be sufficient to cause the pages dirtied by #3 to be written to the
NFS server (via VOP_PUTPAGES(), which is ncl_putpages()).
I believe that an fsync(2) between #3 and #4 will also write the
dirtied pages to the NFS server.

Without either an msync(2) or fsync(2) between #3 and #4, what could
be done to make this work?
- Don't do read-ahead. This would be a major performance hit and is
  imho a non-starter.
- Don't do read-ahead when a file is mmap(2)'d. This sounds better,
  since it will be a rare case that a file will be both mmap(2)'d and
  read via read(2) syscalls.
  --> To do this, the NFS client needs to know if the file has been
      mmap(2)'d. A flag could be set on the vnode when the file is
      mmap(2)'d and that flag can be checked by the NFS client.
  --> The problem is when can the flag be cleared?
      My recollection from a previous round of discussing this
      is...not until all the process(es) that mmap(2)'d the file exit.
      (I cannot recall if the vnode's v_usecount going to 0 is
      sufficient.)
- Having some way that the nfsiod threads can check to see if there
  are dirty pages related to the buffer cache block and write those
  back to the NFS server before doing the read. (Recall that the
  buffer cache block will be quite a few pages, typically 128K to
  1Mbyte in size.)
  --> This could be done by having the nfsiod thread LK_EXCLUSIVE lock
      the vnode, but that would be a major performance hit, as well.

That's as far as I've gotten in previous discussions about this.

Note that this PR started with a specific problem related to
copy_file_range(2) and that has been fixed (or kib@'s patch will fix
it when committed).
The more general case as above, well??

-- 
You are receiving this mail because:
You are the assignee for the bug.
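
Steps #1 to #4 from the comment, with the msync(2) call placed between
#3 and #4, can be sketched in userland C roughly as follows. This is an
illustrative sketch only, not code from the PR or from the NFS client:
the helper name mmap_modify_then_read(), its return convention, the
fill byte 'A' and the 4096-byte read buffer are all assumptions made
for the example. It only shows where the flush sits in the sequence;
it does not by itself demonstrate or rule out the stale read-ahead
case described above.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the PR): returns 0 if read(2) sees the
 * bytes stored through the mapping, 1 on a mismatch, -1 on a syscall
 * error.  len must be greater than 0.
 */
int
mmap_modify_then_read(const char *path, size_t len)
{
	char buf[4096];
	char *map;
	size_t off;
	ssize_t n, i;
	int fd, ret;

	/* #1 - open(2) the file and give it a known size. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return (-1);
	}

	/* #2 - mmap(2) the byte range, shared so stores reach the file. */
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	/* #3 - dirty the pages through the mapping. */
	memset(map, 'A', len);

	/* Between #3 and #4: synchronously flush the dirty pages. */
	ret = (msync(map, len, MS_SYNC) == 0) ? 0 : -1;

	/* #4 - read(2) the file sequentially (offset is still 0). */
	off = 0;
	while (ret == 0 && off < len) {
		n = read(fd, buf, sizeof(buf));
		if (n <= 0) {
			ret = -1;	/* error or unexpected EOF */
			break;
		}
		for (i = 0; i < n; i++) {
			if (buf[i] != 'A') {
				ret = 1;	/* read(2) saw other data */
				break;
			}
		}
		off += (size_t)n;
	}

	munmap(map, len);
	close(fd);
	return (ret);
}
```

Calling mmap_modify_then_read() on a path under an NFS mount with a
length of several megabytes is the case the comment is about. On a
local file system a return of 0 is expected. Replacing the msync(2)
call with fsync(fd) gives the other variant mentioned in the comment.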