From: John Johnstone
Date: Fri, 27 Mar 2020 12:53:34 -0400
To: freebsd-questions@freebsd.org
Subject: Re: drive selection for disk arrays
In-Reply-To: <20200327104555.1d6d7cd9.freebsd@edvax.de>

On 3/27/20 5:45 AM, Polytropon wrote:

> When a drive _reports_ bad sectors, at least in the past
> it was an indication that it already _has_ lots of them.
> The drive's firmware will remap bad sectors to spare
> sectors, so "no error" so far. When errors are being
> reported "upwards" ("read error" or "write error"
> visible to the OS), it's a sign that the disk has run
> out of spare sectors, and the firmware cannot silently
> remap _new_ bad sectors...
>
> Is this still the case with modern drives?

Yes. And this ties in with the distinction made earlier between when
an error occurs and when it is noticed or reported.

> How transparently can ZFS handle drive errors when the
> drives only report the "top results" (i. e., cannot cope
> with bad sectors internally anymore)? Do SMART tools help
> here, for example, by reading certain firmware-provided
> values that indicate how many sectors _actually_ have
> been marked as "bad sector", remapped internally, and
> _not_ reported to the controller / disk I/O subsystem /
> filesystem yet? This should be a good indicator of "will
> fail soon", so a replacement can be done while no data
> loss or other problems appears.

Smartmontools definitely helps with this. The periodic task
/usr/local/etc/periodic/daily/smart exists exactly for staying on top
of it. Monitoring drives more effectively makes data loss less likely.
Perhaps the multiple drives that failed over a short period of time and
caused data loss had encountered recoverable errors months, possibly
years, before the unrecoverable errors occurred, but those recoverable
errors were not handled as well as they could have been by firmware or
software.

The handling of disk errors is an inherently complicated topic and
there is not much time available to discuss it.
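
As a rough illustration of the kind of firmware-provided counters
mentioned above, here is a minimal Python sketch that pulls the raw
values out of the attribute table printed by `smartctl -A` for ATA
drives. The watched attribute names are the ones commonly shown by
smartmontools, but which attributes a given drive exposes varies by
vendor, so treat that list as an assumption. The SAMPLE text is made up
for the example and is not output from a real drive, and the function
names are mine, not part of smartmontools.

```python
import re

# Attributes commonly associated with remapped or suspect sectors.
# Assumption: the drive reports them under these names.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

# Made-up sample in the layout of `smartctl -A` output (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns;
        # the header line and any other text are skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def early_warnings(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is non-zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    for name, raw in early_warnings(parse_smart_attributes(SAMPLE)).items():
        print(f"{name}: raw value {raw}")
```

On a real system the text would come from something like
`smartctl -A /dev/ada0` (the device name is just an example). A
non-zero raw value is only a hint that the drive has already been
remapping sectors, not a verdict; the daily periodic task remains the
simpler way to keep an eye on this.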

One thing to keep in mind is that the behavior of systems with generic
RAID controllers / HBAs and disks is substantially different from that
of systems with proprietary controllers / HBAs and disks. The value of
proprietary hardware can be debated, but when it comes to server-class
systems, Dell, HP, IBM, Lenovo, etc., and the suppliers they use, go to
great lengths in their controller / HBA / disk firmware design to avoid
failure scenarios that can cause data loss. SMART technology does allow
drives to keep track of various types of errors and to notify hosts
before data loss. The proprietary controllers build on top of this with
their own design. Their PF and PFA designations are part of this.

When people have seen multiple disk failures over a short time period,
was that with generic hardware or proprietary? This has to be
considered in order to properly understand what those reports mean. In
my opinion the relative importance and likelihood of a multiple-disk
failure is different for a generic hardware user compared to one using
proprietary hardware. No method is perfect but there are differences.
SATA vs SAS is an aspect too.

Diversity is also a point for SSDs. See this bulletin:

  Bulletin: HPE SAS Solid State Drives - Critical Firmware Upgrade
  Required for Certain HPE SAS Solid State Drive Models to Prevent
  Drive Failure at 40,000 Hours of Operation

which doesn't seem to be specific to HPE.

- John J.