From: Rick Macklem
To: Slawa Olhovchenkov
CC: Eugene Grosbein, Michael Sinatra, "freebsd-net@freebsd.org"
Subject: Re: NFSv4 stuck
Date: Fri, 13 Jan 2017 23:02:22 +0000
In-Reply-To: <20170112232016.GM30374@zxy.spb.ru>
List-Id: Networking and TCP/IP with FreeBSD

Slawa Olhovchenkov wrote:
[stuff snipped]
>> >
>> >What data? In may case no data.
You have a file system with no files in it. (It is file data I am
referring to.) Admittedly a read-only file system won't get corrupted,
but you will still have trouble reading files, since NFSv4 requires that
they be Open'd before reading.

>> Certain NFSv4 operations (such as open and byte range locking) are strictly ordered using a
>> seqid#. If you fail an RPC in progress (via a soft timeout or intr via a signal) then this seqid gets
>> out of sync between client and server and your mount is badly broken.
>
>Mount can be droped? Automatic forced unmount?
>Or application can be manual killed for manual unmount?
>This is will be perfect for me. This is will be best that current behavior.
Well, since recently written data could be lost, I can't see this ever
being automatic. The manual "umount -f" should work, but only if a
"umount" without "-f" has not already been done. (The latter gets stuck
in the kernel, usually after locking the mounted-on vnode, and that
blocks the subsequent "umount -f".)
Someday, I plan on adding a new option to "umount" that goes directly to
NFS (via the nfssvc(2) syscall) to force a dismount, but I haven't gotten
around to doing it. Until then, it's "umount -f" or reboot.
And please don't use the "soft,intr" options. They won't usually help and
will break the mount for opening files sooner or later.

>
>> I do not believe this caused your hang though, since processes were sleeping on rpccon, which
>> means they were trying to do a new TCP connection to the server unsuccessfully.
>> - Which normally indicates a problem with your underlying network fabric.
>
>Network can fail always, at any time.
>This should not cause a blockage of the system.
Would you expect a local file system to keep working when the JBOD
interface to a drive is broken?
For NFS, a broken network means "can't talk to the file system", just
like a broken JBOD interface to a file system's drive would.
For NFS to work well, you want the most reliable network fabric possible.

Once the network is fixed, it should again be possible for the mount to
work. (The processes in "rpccon" are trying to create a new TCP
connection, and when they succeed the mount point should start working
again.)

rick
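
To illustrate the seqid point above, here is a minimal toy model in
Python. It is only a sketch under simplifying assumptions, not the actual
FreeBSD client or server code: ToyServer, ToyClient and the status strings
are made-up names, and a real server keeps far more state (for example
the cached reply it resends on a replay). It shows two ways an abandoned
RPC can leave the client and server disagreeing about the next seqid.

```python
class ToyServer:
    """Hypothetical stand-in for the server's seqid check (not real NFS code)."""

    def __init__(self):
        self.last_seqid = 0  # seqid of the last request the server processed

    def handle(self, seqid):
        if seqid == self.last_seqid + 1:
            # Next request in order: process it and remember its seqid.
            self.last_seqid = seqid
            return "OK"
        if seqid == self.last_seqid:
            # Same seqid as before: treated as a retransmission of the old request.
            return "REPLAY"
        return "BAD_SEQID"


class ToyClient:
    """Hypothetical client that may abandon an RPC (soft timeout or signal)."""

    def __init__(self, server):
        self.server = server
        self.next_seqid = 1

    def op(self, abandoned=False, reached_server=True, assume_done=False):
        status = None
        if reached_server:
            status = self.server.handle(self.next_seqid)
        if abandoned:
            # The client never sees a reply, so it has to guess whether the
            # server processed the request.
            if assume_done:
                self.next_seqid += 1
            return "ABANDONED"
        if status == "OK":
            self.next_seqid += 1
        return status


def scenario_reply_lost():
    """Server processed the RPC, but the client gave up and reused the seqid."""
    client = ToyClient(ToyServer())
    return [
        client.op(),
        client.op(abandoned=True, reached_server=True, assume_done=False),
        client.op(),  # a new operation, but the server sees an old seqid
    ]


def scenario_request_lost():
    """RPC never reached the server, but the client advanced its seqid anyway."""
    client = ToyClient(ToyServer())
    return [
        client.op(),
        client.op(abandoned=True, reached_server=False, assume_done=True),
        client.op(),  # the server sees a seqid that skips one
    ]


if __name__ == "__main__":
    print(scenario_reply_lost())
    print(scenario_request_lost())
```

In both scenarios the third operation is a new request, yet the toy
server either mistakes it for a replay or rejects it outright. Once the
client gives up on an RPC it cannot tell which case it is in, which is
the risk with "soft,intr" on an NFSv4 mount.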