From owner-freebsd-scsi@FreeBSD.ORG Sun Jun 14 21:50:37 2015
Message-ID: <557DA8C0.1020209@mellanox.com>
Date: Sun, 14 Jun 2015 19:16:00 +0300
From: Max Gurtovoy
To: Sagi Grimberg, Oren Duer, Hans Petter Selasky
Subject: gmultipath HA over iscsi/iser
X-BeenThere: freebsd-scsi@freebsd.org
List-Id: SCSI subsystem

Hello,

lately I was testing HA using gmultipath utility over iSCSI/iSER
devices. I'm working on the 11-CURRENT code base.
I created one LUN on the target side and connected to it via two different physical ports from the initiator side.
On the initiator side I see /dev/da0 and /dev/da1.
I created a multipath device using:

    gmultipath label dm0 /dev/da0 /dev/da1

Now I have a new device, /dev/multipath/dm0.
I set kern.iscsi.fail_on_disconnection=1 (to fail I/O fast).

Issue 1:
-------------
I can't run simple fio/dd traffic over /dev/da0 or /dev/da1. Traffic is only possible through the multipath device dm0. Is this by design?
In the Linux implementation we can run traffic on both the block devices and the multipath devices.

Issue 2:
--------------
I ran fio traffic over the multipath device dm0 on the initiator side while toggling the ports in a loop:

Port 1 down --> sleep 2 mins (iSCSI/iSER device tries to reconnect, with no success) --> port 1 up --> sleep 5 mins (iSCSI/iSER device reconnects successfully)
Port 2 down --> sleep 2 mins (iSCSI/iSER device tries to reconnect, with no success) --> port 2 up --> sleep 5 mins (iSCSI/iSER device reconnects successfully)

The expected result is that when port N is down, the traffic moves to the available port and continues successfully.
I ran this test for many hours and the traffic FAILED (even though there was at least one suitable path between initiator and target).

log:

# gmultipath status
Name               Status   Components
multipath/dm_tcp   OPTIMAL  da0 (ACTIVE)
                            da1 (PASSIVE)
multipath/dm_iser  OPTIMAL  da2 (ACTIVE)
                            da3 (PASSIVE)

# fio ..... (over /dev/multipath/dm_iser or /dev/multipath/dm_tcp)
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
task1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=8
...
task1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=8
fio-2.1.3
Starting 8 threads
fio: pid=101071, err=6/file:filesetup.c:575, func=open(/dev/multipath/dm_tcp),error=Device not configured

task1: (groupid=0, jobs=8): err= 6 (file:filesetup.c:575, func=open(/dev/multipath/dm_tcp), error=Device not configured): pid=101071: Thu Jun 11 17:25:47 2015
  read : io=296400MB, bw=32122KB/s, iops=8030, runt=9448911msec
    clat (usec): min=131, max=5541.8K, avg=504.40, stdev=5660.23
     lat (usec): min=132, max=5541.8K, avg=504.55, stdev=5660.23
    clat percentiles (usec):
     |  1.00th=[  251],  5.00th=[  298], 10.00th=[  330], 20.00th=[  370],
     | 30.00th=[  406], 40.00th=[  446], 50.00th=[  478], 60.00th=[  510],
     | 70.00th=[  540], 80.00th=[  580], 90.00th=[  644], 95.00th=[  700],
     | 99.00th=[ 1448], 99.50th=[ 1704], 99.90th=[ 1976], 99.95th=[ 2064],
     | 99.99th=[ 2256]
    bw (KB /s): min= 2, max= 5576, per=12.64%, avg=4060.97, stdev=352.37
  write: io=295596MB, bw=32034KB/s, iops=8008, runt=9448911msec
    clat (usec): min=125, max=5541.8K, avg=490.13, stdev=5143.96
     lat (usec): min=125, max=5541.8K, avg=490.41, stdev=5143.96
    clat percentiles (usec):
     |  1.00th=[  239],  5.00th=[  282], 10.00th=[  310], 20.00th=[  354],
     | 30.00th=[  390], 40.00th=[  426], 50.00th=[  466], 60.00th=[  502],
     | 70.00th=[  532], 80.00th=[  572], 90.00th=[  628], 95.00th=[  692],
     | 99.00th=[ 1432], 99.50th=[ 1688], 99.90th=[ 1960], 99.95th=[ 2040],
     | 99.99th=[ 2256]
    bw (KB /s): min= 3, max= 5512, per=12.64%, avg=4049.74, stdev=355.11
    lat (usec) : 250=1.29%, 500=56.84%, 750=38.78%, 1000=0.94%
    lat (msec) : 2=2.08%, 4=0.07%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, >=2000=0.01%
  cpu          : usr=0.61%, sys=4.33%, ctx=151634083, majf=0, minf=3
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=75878522/w=75672554/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=296400MB, aggrb=32121KB/s, minb=32121KB/s, maxb=32121KB/s, mint=9448911msec, maxt=9448911msec
  WRITE: io=295596MB, aggrb=32034KB/s, minb=32034KB/s, maxb=32034KB/s, mint=9448911msec, maxt=9448911msec

# gmultipath status
Name               Status    Components
multipath/dm_tcp   DEGRADED  da1 (ACTIVE)
multipath/dm_iser  DEGRADED  da3 (ACTIVE)

We can see that there are active paths to the multipath devices, but the traffic still failed.

Any suggestions? Has anyone seen this before?

Thanks,
Max Gurtovoy.
Mellanox Technologies.
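
For anyone trying to reproduce Issue 2, the port-toggling loop described above could be scripted roughly as follows. This is only a sketch under assumptions, not the script that was actually used: it assumes "port down" means `ifconfig <interface> down` on the initiator, and the helper names (`run`, `toggle_port`, `toggle_loop`) and the `DRY_RUN`, `DOWN_SECS` and `UP_SECS` variables are made up for illustration. The 2-minute and 5-minute waits and the `gmultipath status` check are taken from the description above.

```shell
#!/bin/sh
# Sketch of the Issue 2 port-toggling loop (assumptions, see lead-in):
# "port down" is assumed to be `ifconfig <if> down` on the initiator.

DOWN_SECS=${DOWN_SECS:-120}   # 2 minutes with the port down
UP_SECS=${UP_SECS:-300}       # 5 minutes for the session to reconnect

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take one port down, wait, bring it back up, wait for reconnection.
toggle_port() {
    run ifconfig "$1" down
    run sleep "$DOWN_SECS"
    run ifconfig "$1" up
    run sleep "$UP_SECS"
}

# toggle_loop ROUNDS PORT...: toggle each port in turn, ROUNDS times,
# printing the multipath state after every toggle.
toggle_loop() {
    rounds=$1
    shift
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for port in "$@"; do
            toggle_port "$port"
            run gmultipath status
        done
        i=$((i + 1))
    done
}
```

With the functions loaded, something like `DRY_RUN=1; toggle_loop 1 portA portB` prints the command sequence without touching the network; `portA` and `portB` are placeholders for the real interface names, and fio would be started separately against the multipath device.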