From owner-svn-src-all@FreeBSD.ORG Sat Mar 3 06:01:39 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3AD6B1065672; Sat, 3 Mar 2012 06:01:39 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 82D0C8FC08; Sat, 3 Mar 2012 06:01:38 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmwAACmzUU+DaFvO/2dsb2JhbABDhTKdMJJigX0BAQQBI1YFFg4GBAICDRkCWQaIFAULpyKRc4EvjgqBFgSIUIxukxM X-IronPort-AV: E=Sophos;i="4.73,524,1325480400"; d="scan'208";a="158854064" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 03 Mar 2012 01:01:32 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4E575B414D; Sat, 3 Mar 2012 01:01:32 -0500 (EST) Date: Sat, 3 Mar 2012 01:01:32 -0500 (EST) From: Rick Macklem To: John Baldwin Message-ID: <1978600220.298005.1330754492306.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201203021153.06614.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Linux)/6.0.10_GA_2692) Cc: svn-src-head@freebsd.org, Peter Holm , svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r226967 - head/sys/ufs/ufs X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Mar 2012 06:01:39 -0000 John Baldwin wrote: > On Friday, March 02, 2012 8:29:21 am Peter Holm wrote: > > On Thu, Mar 01, 2012 at 04:47:41PM -0500, John Baldwin wrote: > > > On Monday, October 31, 2011 11:01:47 am Peter Holm wrote: > > > > Author: pho > > > > Date: Mon Oct 31 15:01:47 2011 > > > > New Revision: 226967 > > > > URL: http://svn.freebsd.org/changeset/base/226967 > > > > > > > > Log: > > > > The kern_renameat() looks up the fvp using the DELETE flag, > > > > which causes > > > > the removal of the name cache entry for fvp. > > > > > > > > Reported by: Anton Yuzhaninov > > > > In collaboration with: kib > > > > MFC after: 1 week > > > > > > > > Modified: > > > > head/sys/ufs/ufs/ufs_vnops.c > > > > > > So I ran into this at work recently, and even this fix applied I > > > was still > > > seeing rename()'s that were seemingly not taking effect. After > > > getting some > > > extra KTR traces, I figured out that the same purge needs to be > > > applied to the > > > destination vnode. Specifically, the issue I ran into was that was > > > renaming > > > 'foo' to 'bar', but lookups for 'bar' were still returning the old > > > file. The > > > reason was that a lookup after the namei(RENAME) of the > > > destination while > > > ufs_rename() had its locks dropped was readding the name cache > > > entry for > > > 'bar', and then a cache_lookup() of 'bar' would return the old > > > vnode as long > > > as that vnode was valid (e.g. if it had a link in another > > > location, or other > > > processes had an open file descriptor for it). I'm currently > > > testing the > > > patch below: > > > > > > > I now have a scenario that fails, but not quite the same way you > > describe. > > > > It looks like this: > > > > touch file1 > > echo xxx > file2 > > rename(file1, file2) > > > > A different process performs stat() on both files in a tight loop. > > > > Once in a while I observe that a stat() of file2, after the rename, > > returns a link count of zero. Size is zero as expected, but the > > inode > > number of file2 is unchanged. > Peter, were you doing a stat() using the file name, or an fstat()? (Using stat() with afile name might explain it, maybe??) > Hmm, that is surprising. I would not expect inconsistent stat info. I > have no explanation for why that would happen. I do not have a > simplified > test program, just a specific workload at work. In this case it's > workflow > is more like this: > > fd = flopen(file1, O_CREAT); > fstat(fd); > if (st.st_size == 0) { > fd2 = open(file1.temp, O_CREAT | O_EXLOCK); > fd3 = open(someotherfile); > copy_data(fd3, fdf2); > close(fd3); > rename(file1.temp, file1); > close(fd); > fd = fd2; > } > link(file1, uniquedir/file1); > close(fd); > > /* Use uniquedir/file1, and unlink it when done. */ > > What I observed was that sometimes uniquedir/file1 would end up > referencing > the empty file created by flopen() after the rename() rather than > linking > to the file created when file1.temp was created. > > > I've been running the same test with your patch and not observed > > this > > problem. This on UFS2 with SU enabled. > > Hmm, I wish I could explain explain your odd result above in terms of > this > bug, but the results from stat() should always be consistent > (VOP_GETATTR() > can't switch vnodes mid-stream as it were). > > BTW, note that in my case where I had multiple processes all doing the > same > loop, in the edge case, another process always had the file open (and > was > blocked in flock() in flopen()) when the rename() happened, so that > prevented > the vnode from going away. This is important as otherwise the use > count would > drop to zero and marked inactive which removes all references to it > from the > name cache. In my case the flock() in flopen() and the fact that the > "first" > process held the flock until after the rename and call to link() made > the > race more likely to trigger. > > Hmm, perhaps one way to do this would be: > > touch file.always (save its i-node) > fork worker process that just continually does a 'stat file1' in a > loop > main process: > ln file.always file1 > touch file2 > rename file2 file1 > stat file1 > complain if file1 has the saved i-node > > Hmm, I just whipped up something to do this and it fails early and > often on > an unpatched kernel, but does not with my patch. > > #include > #include > #include > #include > #include > #include > #include > #include > > static char *always, *file1, *file2; > static ino_t always_ino; > > static void > usage(void) > { > fprintf(stderr, "Usage: rename_race \n"); > exit(1); > } > > static void > child(void) > { > struct stat sb; > > /* Exit as soon as our parent exits. */ > while (getppid() != 1) { > stat(file1, &sb); > } > exit(0); > } > > static void > create_file(const char *path) > { > int fd; > > fd = open(path, O_CREAT, 0666); > if (fd < 0) > err(1, "open(%s)", path); > close(fd); > } > > int > main(int ac, char **av) > { > struct stat sb, sb2; > pid_t pid; > > if (ac != 2) > usage(); > if (stat(av[1], &sb) != 0) > err(1, "stat(%s)", av[1]); > if (!S_ISDIR(sb.st_mode)) > errx(1, "%s not a directory", av[1]); > > asprintf(&always, "%s/file.always", av[1]); > asprintf(&file1, "%s/file1", av[1]); > asprintf(&file2, "%s/file2", av[1]); > > create_file(always); > if (stat(always, &sb) != 0) > err(1, "stat(%s)", always); > always_ino = sb.st_ino; > > pid = fork(); > if (pid < 0) > err(1, "fork"); > if (pid == 0) > child(); > for (;;) { > if (unlink(file1) < 0 && errno != ENOENT) > err(1, "unlink(%s)", file1); > if (link(always, file1) < 0) > err(1, "link(%s, %s)", always, file1); > create_file(file2); > if (stat(file2, &sb2) < 0) > err(1, "stat(%s)", file2); > if (rename(file2, file1) < 0) > err(1, "rename(%s, %s)", file2, file1); > if (stat(file1, &sb) < 0) > err(1, "stat(%s)", file1); > if (sb.st_ino != sb2.st_ino || > sb.st_ino == always_ino) > printf("Bad stat: always: %d file1: %d (should be %d)\n", > always_ino, sb.st_ino, sb2.st_ino); > } > return (0); > } > > -- > John Baldwin