Date:      Wed, 09 May 2018 04:25:36 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 228087] F_SETLK randomly fails on NFS4 in threaded operation in MySQL
Message-ID:  <bug-228087-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228087

            Bug ID: 228087
           Summary: F_SETLK randomly fails on NFS4 in threaded operation
                    in MySQL
           Product: Base System
           Version: 11.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: barry.boes@acciodata.com

Tried in 10.4, 11.1-RELEASE, 11.1-STABLE, and 11.2-PRERELEASE client and
server.  Currently client and server are 11.2-PRERELEASE.

Ktrace shows the following :

 66181 mysqld   CALL  close(0x30)
 66181 mysqld   RET   openat 48/0x30
 66181 mysqld   CALL  fcntl(0x30,F_SETLK,0x7fffdd3e5cc0)
 66181 mysqld   RET   close 0
 66181 mysqld   RET   fcntl -1 errno 13 Permission denied


Examining a full trace, the files being locked are never locked twice by MySQL
or locked by another process.  The file closed in the first line is a
different file than the one opened in the second line.  MySQL does this same
operation tens or hundreds of thousands of times successfully, then fails on
one.  From all of the trace data that I've been able to gather, the fcntl
works 100% of the time if the close returns before another thread calls open
and F_SETLK, and fails 100% of the time when the F_SETLK completes before the
close returns in another thread.
    Observation affects the results.  Failure occurs tens to hundreds of times
more rapidly when not tracing the process.
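
The access pattern described above can be approximated with a small test
program.  This is only a minimal sketch, not MySQL's code: the function name
run_lock_race, the file naming scheme, and the choice of a whole-file write
lock are all assumptions made for illustration.  Each thread repeatedly opens
its own file, takes a lock with F_SETLK, and closes it, so that a close in one
thread can overlap an open and F_SETLK on a different file in another thread.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Per-thread state; all names here are hypothetical. */
struct worker {
    const char *dir;    /* directory to create the test file in */
    int id;             /* thread index, used to build a unique file name */
    int iterations;     /* number of open/lock/close cycles */
    int failures;       /* F_SETLK calls that returned -1 */
    int errors;         /* open() calls that failed */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    char path[1024];

    /* Each thread locks only its own file, so no lock is ever contended. */
    snprintf(path, sizeof(path), "%s/lockrace.%ld.%d",
             w->dir, (long)getpid(), w->id);

    for (int i = 0; i < w->iterations; i++) {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) {
            w->errors++;
            continue;
        }

        struct flock fl;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;     /* assumed lock type: exclusive */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;            /* 0 means "to end of file" */

        if (fcntl(fd, F_SETLK, &fl) == -1) {
            w->failures++;
            perror("fcntl(F_SETLK)");
        }
        close(fd);               /* closing the descriptor drops the lock */
    }
    unlink(path);
    return NULL;
}

/* Returns the number of failed F_SETLK calls, or -1 on a setup error. */
int run_lock_race(const char *dir, int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];
    int started = 0, failures = 0, errors = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        workers[i] = (struct worker){ dir, i, iterations, 0, 0 };
        if (pthread_create(&tids[i], NULL, worker_main, &workers[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        failures += workers[i].failures;
        errors += workers[i].errors;
    }
    if (started != nthreads || errors > 0)
        return -1;
    return failures;
}
```

Calling run_lock_race() with a directory on the NFS4 mount, several threads,
and a large iteration count would be one way to look for the failure outside
of MySQL; a non-zero return value means at least one F_SETLK call failed.
Whether this sketch actually triggers the problem has not been verified.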

The higher the network latency, the more likely it is to happen.  With a
latency of 200 us, it happens in seconds on a loaded server.  With a latency
of 100 us, it happens in tens of seconds.  With a latency of 20 us it happens
rarely, and below 15 us I have yet to see this failure.

No kernel messages are logged.  I have duplicated the problem on a variety of
hardware, from 28-core Supermicro motherboards with ECC memory and E5-2XXX v4
CPUs to laptops with i3, i5, or i7 CPUs.

The filesystem setup is as follows:

server: ZFS on 11.2-PRERELEASE configured for very low latency (optimized
SSDs and persistent write caches or sync=disabled).

The filesystem is either a base ZFS filesystem or a clone of a snapshot (for
easy testing, it happens on either).

The client mounts the server system via NFS4 and also runs 11.2-PRERELEASE.
Tested with 100 Mb, gigabit, 50 gigabit, and 100 gigabit NICs.

-- 
You are receiving this mail because:
You are the assignee for the bug.


