Date:      Fri, 27 Mar 2020 10:49:58 +0300
From:      Artem Kuchin <artem@artem.ru>
To:        freebsd-fs@freebsd.org
Subject:   Recovering bad sectors and smartctl no lba in error report
Message-ID:  <345b7285-958b-ef52-70a9-084872cf7409@artem.ru>

Hello!

One of my RAID 1 disks went a little 'woohoo' and I got at least one
read error on the swap partition.

I've disabled swap altogether (and it actually made everything better)
and have run a smartctl self-test.

Here is the output: https://artem.ru/ada2.txt

I will describe my logic step by step, and closer to the end I will have
questions. You can skip to the QUESTIONS section at the end :)


What's strange is that

  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0

So, sectors are in a pending (read error) state, but Offline_Uncorrectable is 0. Okay, now the test
results:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     46183         -
# 2  Extended offline    Completed: read failure       20%     46181         -
# 3  Short offline       Completed without error       00%     46170         -

As you can see, NO LBA/sector is specified.

From the log:

Error 5 occurred at disk power-on lifetime: 46151 hours (1922 days + 23 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 a0 08 de 3e 0b  Error: UNC at LBA = 0x0b3ede08 = 188669448

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   60 80 48 e8 84 4e 40 00      10:43:18.103  READ FPDMA QUEUED
   61 08 40 48 04 21 40 00      10:43:18.103  WRITE FPDMA QUEUED
   60 40 38 e8 94 32 40 00      10:43:18.103  READ FPDMA QUEUED
   61 08 30 20 b9 ef 40 00      10:43:18.103  WRITE FPDMA QUEUED
   61 30 28 68 22 03 40 00      10:43:18.103  WRITE FPDMA QUEUED

And 188669448 is the only LBA mentioned in the log.
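
For reference, the LBA that smartctl prints on the "Error: UNC" line can be rebuilt by hand from the register bytes in the same row. This is only a minimal sketch, assuming the 28-bit register layout of this summary error log (SN = LBA bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27). The helper name lba28_from_regs is made up for illustration and is not part of smartctl.

```sh
#!/bin/sh
# Rebuild a 28-bit LBA from the SN, CL, CH and DH register bytes.
# Arguments are two-digit hex values without the 0x prefix, in the
# order they appear in the smartctl error log row.
# lba28_from_regs is a hypothetical helper name, not a smartctl feature.
lba28_from_regs() {
    sn=$1; cl=$2; ch=$3; dh=$4
    # Only the low 4 bits of DH carry LBA bits 24..27.
    echo $(( ((0x$dh & 0x0f) << 24) | (0x$ch << 16) | (0x$cl << 8) | 0x$sn ))
}

# Register row from the log above: SN=08 CL=de CH=3e DH=0b
lba28_from_regs 08 de 3e 0b    # prints 188669448
```

The result agrees with the decimal LBA that smartctl reports for error 5.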

So, my logic is the following:

This HDD has "Sector Sizes:     512 bytes logical, 4096 bytes physical"

So, LBA/(4096/512) = physical sector number.
So what I need is to write the whole physical sector (8 LBAs) to trigger
sector reallocation.
Like doing a simple:

    dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=CALCULATED_VALUE

and then doing an fsync to make it really write to the HDD. However, I need
to know what file is damaged. So, now to the questions.
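
The arithmetic above can be sketched in sh. This assumes that the physical 4096-byte sectors are aligned to LBA 0 (no alignment offset) and that the seek value is meant for the whole-disk device, not a partition. The helper names phys_sector and print_dd are hypothetical, and the script only prints the dd command instead of running it, since running it overwrites that sector with zeros.

```sh
#!/bin/sh
# Map a logical LBA to the index of the physical sector that contains it.
# Defaults match this disk: 512-byte logical, 4096-byte physical sectors.
# phys_sector and print_dd are hypothetical helper names.
phys_sector() {
    lba=$1
    logical=${2:-512}
    physical=${3:-4096}
    # Integer division: 8 consecutive LBAs share one physical sector.
    echo $(( lba / (physical / logical) ))
}

# Print (do NOT execute) the dd command that would zero that physical sector.
print_dd() {
    dev=$1
    lba=$2
    echo "dd if=/dev/zero of=$dev bs=4096 count=1 seek=$(phys_sector "$lba")"
}

print_dd /dev/ada2 188669448
# prints: dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=23583681
```

Since 188669448 is exactly divisible by 8, this LBA would be the first of the 8 LBAs in physical sector 23583681 under the alignment assumption above.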
QUESTIONS:
1) Why does the SMART report not show an LBA in the self-test result table?
2) Is my logic correct?
3) How do I find which file is using this LBA/sector?
4) I see that there are 8 pending sectors. Are these physical sectors or LBAs? If LBAs, then okay, that matches one physical
sector, but if they are physical sectors, then how do I get a list of them?


Artem



