Topic: Is this normal, or is the hard drive about to die? (Read 17881 times)

kaligari · « **Reply #15:** 03.11.2019 11:34:06 »

Quote from: Kalt on 23.10.2019 19:53:39

Just grab the fresh V5.03 from the official site.

Great that Victoria HDD has finally been updated after several years. A question for the experts: there is a program called HDD Regenerator. Can it really cure bad blocks? I've read different opinions: people here seem to advise against it, and on various forums they write that it can kill a drive.

Koi · « **Reply #16:** 03.11.2019 13:26:24 »

From the wiki...
Causes of bad blocks:
Gradual wear of the platters' magnetic coating
Tiny dust particles that have seeped through the filter
Mechanical damage from impacts

https://forum.altlinux.org/index.php?topic=34370.msg250360#msg250360

Mr.Madguy · « **Reply #17:** 03.11.2019 15:28:11 »

Quote from: kaligari on 03.11.2019 11:34:06

Great that Victoria HDD has finally been updated after several years. A question for the experts: there is a program called HDD Regenerator. Can it really cure bad blocks? I've read different opinions: people here seem to advise against it, and on various forums they write that it can kill a drive.

Victoria can do this too: just select the Refresh option while testing the drive. This operation can kill a drive only if it is already dying anyway, and in that case it is better to try copying the data off with ddrescue first. The problem is that, starting with Windows 7, this option does not work if the drive contains a boot record. You could of course try the DOS version, but it has not been updated for a long time. Too bad there is no Linux version; everything is much simpler there: make a bootable flash drive and you're all set.

And yes, the problem seems to be solved. On another forum I finally got an exhaustive answer: these attributes really do not reflect the drive's health, which tends to confuse owners of these drives.

Spoiler

Seagate's Seek Error Rate, Raw Read Error Rate and Hardware ECC Recovered SMART attributes

09-14-2011 11:44 PM - last edited on 09-15-2011 01:28 AM
Seagate's Seek Error Rate, Raw Read Error Rate, and Hardware ECC Recovered SMART attributes create a lot of anxiety amongst Seagate users. This is because the raw values are typically very high, and the normalised values (Current / Worst / Threshold) are usually quite low. Despite this, the numbers in most cases are perfectly OK.

The anxiety arises because we intuitively expect that the normalised values should reflect a "health" score, with 100 being the ideal value. Similarly, we would expect that the raw values should reflect an error count, in which case a value of 0 would be most desirable. However, Seagate calculates and applies these attribute values in a counterintuitive way.

In fact the normalised values of Seagate's Seek Error Rate, Raw Read Error Rate, and Hardware ECC Recovered attributes are logarithmic, not linear, and the raw values are sector counts or seek counts, not error counts.

Seagate's SMART documentation is not publicly available. The following information has not been gleaned from any official source, but is based on my own testing and observation, and on testing by others. Therefore it may contain errors.

Seek Error Rate

The raw value of each SMART attribute occupies 48 bits. Seagate's Seek Error Rate attribute consists of two parts -- a 16-bit count of seek errors in the uppermost 4 nibbles, and a 32-bit count of seeks in the lowermost 8 nibbles. In order to see these data, we will need a SMART utility that reports all 48 bits, preferably in hexadecimal. Two such utilities are HD Sentinel and HDDScan.
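To make the split concrete, here is a minimal POSIX sh sketch. It assumes the 48-bit raw value has been copied as a 12-digit hex string from a utility such as HD Sentinel or HDDScan, that the shell does 64-bit arithmetic, and that the 16/32-bit layout described above is right; `ser_split` is a hypothetical helper name.

```bash
# Hypothetical helper: split a Seagate Seek Error Rate raw value (12 hex digits)
# into the two fields described above. Relies on 64-bit shell arithmetic.
ser_split() {
  local raw errors seeks
  raw=$(( 0x$1 ))
  errors=$(( ($raw >> 32) & 0xFFFF ))   # uppermost 4 nibbles: seek errors
  seeks=$(( $raw & 0xFFFFFFFF ))        # lowermost 8 nibbles: seeks
  echo "$errors $seeks"
}
```

For example, `ser_split 052E0E3000EC` prints `1326 238026988`, the same figures as in the 13GB example in this post.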

I believe the relationship between the raw and normalised values of the SER attribute is given by ...

normalised SER = -10 log (lifetime seek errors / lifetime seeks)

In the above formula, if the drive has recorded no errors, then we would still need to set the number of errors to 1, otherwise the result would be undefined (log 0 has no finite value).
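The formula, including the substitution of 1 for a zero error count, can be sketched in a few lines of sh plus awk. This is only an illustration of the author's hypothesis, not Seagate's documented method; `ser_normalised` is a hypothetical name.

```bash
# Hypothetical helper: normalised SER = -10 * log10(errors / seeks).
# A zero error count is replaced by 1, as the text suggests, to avoid log(0).
ser_normalised() {
  local errors=$1 seeks=$2
  if [ "$errors" -eq 0 ]; then errors=1; fi
  # awk's log() is the natural logarithm, so divide by log(10) to get log10
  awk -v e="$errors" -v s="$seeks" \
    'BEGIN { printf "%.2f\n", -10 * log(e / s) / log(10) }'
}
```

`ser_normalised 1326 238026988` prints `52.54` and `ser_normalised 0 92317356` prints `79.65`, reproducing the two worked examples in this post.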

The following table correlates the normalised SER against the actual error rate:
90 = <= 1 error per 1000 million seeks
80 = <= 1 error per 100 million
70 = <= 1 error per 10 million
60 = <= 1 error per million
50 = 10 errors per million
40 = 100 errors per million
30 = 1000 errors per million
20 = 10 errors per thousand
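The table follows from inverting the formula: a normalised value V corresponds to roughly one error per 10^(V/10) seeks. A small sketch under the same assumptions (`ser_seeks_per_error` is a hypothetical name):

```bash
# Hypothetical helper: invert normalised SER = -10 * log10(errors / seeks)
# to get the average number of seeks per seek error.
ser_seeks_per_error() {
  awk -v v="$1" 'BEGIN { printf "%.0f\n", 10 ^ (v / 10) }'
}
```

For instance, `ser_seeks_per_error 60` prints `1000000` (1 error per million seeks) and `ser_seeks_per_error 50` prints `100000` (10 errors per million), matching the table rows.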

A drive that has not yet recorded 1 million seeks will show 100 and 253 for the Current and Worst values. I believe this is because the data are not considered to be statistically significant until the drive has recorded 1 million seeks. When this target is reached, the values drop to 60 and 60, assuming there have been no errors.

By way of example, here are the SMART data for my 13GB Seagate HDD:
http://www.users.on.net/~fzabkar/SmartUDM/13GB.RPT
Attribute        ID  Threshold  Value  Worst  Raw
===============================================================
Seek Error Rate   7         30     53     38  052E0E3000EC

The number of lifetime seek errors = 0x052E (uppermost 4 nibbles) = 1326

The number of lifetime seeks = 0x0E3000EC (lowermost 8 nibbles) = 238 026 988

Using Google's calculator ...

0x052E = 1326
0x0E3000EC = 238 026 988

http://www.google.com/search?q=0x052E+in+decimal
http://www.google.com/search?q=0x0E3000EC+in+decimal

Applying the formula ...

normalised SER = -10 log (0x052E / 0x0E3000EC)

http://www.google.com/search?q=-10+log+(0x052E+/+0x0E3000EC)

... we get a result of 52.54.

Here is a second example:
http://www.users.on.net/~fzabkar/SmartUDM/120GB.RPT
Attribute        ID  Threshold  Value  Worst  Raw
===============================================================
Seek Error Rate   7         30     79     60  00000580A6AC

The above drive is in fact error free. It has recorded 0x0580A6AC seeks (= 92 million) without error.

Applying the formula ...

normalised SER = -10 log (1 / 0x0580A6AC)

... we get a result of 79.65

Note that we have used 1 instead of 0 for the error count (because log 0 is undefined).

Raw Read Error Rate and Hardware ECC Recovered

The raw values of the RRER and HER attributes represent a sector count, not an error count. This figure rolls over to 0 once the count reaches about 250 million. I suspect that the drive records the total number of errors in each block of 250 million sectors, and then recalculates the normalised values of each attribute accordingly. This means that RRER and HER would be updated according to a rolling average rather than on a lifetime basis. I'm almost certain that the normalised values are also logarithmic, but I'm not sure how they are calculated. The above figure of 250 million sectors applies to the 7200.11 and DiamondMax 22 models, but may not apply to all.

While writing this article I came upon a Seagate document entitled "Diagnostic Commands". It doesn't discuss SMART attributes, but it refers to "Error Recovery Usage Rate" and defines it as ...

Error Recovery Usage Rate =

-log10 {(Number of sectors in which controller invoked specified error recovery scheme)/[(Number of sectors transferred) * (512 bytes/sector) * (8 bits/byte)]}

This lends support to my Seek Error Rate formula, and suggests that the RRER and HER attributes may be calculated similarly.
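For comparison with the SER formula, the quoted Seagate definition can be sketched directly. Note that it has no factor of 10. Treating a zero recovery count as 1 is an assumption borrowed from the SER example, not something the Seagate document states; `erur` is a hypothetical name.

```bash
# Hypothetical helper implementing the quoted formula:
#   -log10( recovery_sectors / (sectors_transferred * 512 * 8) )
# Unlike the SER formula, there is no factor of 10 here.
erur() {
  local recovered=$1 transferred=$2
  # Assumption: use 1 instead of 0, as in the SER example, to avoid log(0)
  if [ "$recovered" -eq 0 ]; then recovered=1; fi
  awk -v r="$recovered" -v t="$transferred" \
    'BEGIN { printf "%.2f\n", -log(r / (t * 512 * 8)) / log(10) }'
}
```

With one recovered sector in 250 million transferred, `erur 1 250000000` prints `12.01`; multiplying by 10 puts it on the scale of the normalised attributes.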

In fact the document mentions (but does not discuss) 5 different error recovery schemes:

HARD = multiple retries invoked and failed
FIRM = multiple retries invoked
SOFT = 5 retries invoked
OTF = 1 retry invoked (On The Fly)
RAW = OTF ECC invoked

"On The Fly" means that errored data is corrected using the ECC bytes, without an additional access of the platters.

Based on the abovementioned Error Recovery Usage Rate formula, I now postulate that the normalised value of the Raw Read Error Rate attribute could be calculated as follows:

normalised RRER = -10 log (number of errored sectors / total bits transferred)

The total number of bits is ...

(250 million sectors) x (512 bytes/sector) x (8 bits/byte) = 1.024 x 10^12

It seems to me that it makes more sense to use a round figure, say 10^12.

If we now let the number of errors equal 0 (or 1), then we have ...

max normalised RRER = -10 log (1 / 10^12) = 120

Similarly, if we let the number of errors equal 250 million (ie every sector is errored), then we have ...

min normalised RRER = -10 log (1 / 4096) = 36

Therefore, if my hypothesis is correct, we would expect that the threshold value of the RRER attribute would be 36, and its maximum possible value would be 120. In fact my Internet research tends to confirm a maximum of 120 for 7200.11 models, but the threshold figure is 34.
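The bounds above can be checked numerically. This sketch simply evaluates the postulated RRER formula for the assumed 250-million-sector window; `rrer` is a hypothetical name and the formula itself remains the author's hypothesis.

```bash
# Hypothetical helper for the postulated formula:
#   normalised RRER = -10 * log10(errored_sectors / total_bits)
rrer() {
  local errored=$1 bits=$2
  awk -v e="$errored" -v b="$bits" \
    'BEGIN { printf "%.2f\n", -10 * log(e / b) / log(10) }'
}

# Bits in the assumed 250-million-sector window: 250e6 * 512 * 8
bits=$(( 250000000 * 512 * 8 ))   # 1024000000000, i.e. 1.024 x 10^12
rrer 1 "$bits"           # best case, one errored sector: prints 120.10
rrer 250000000 "$bits"   # worst case, every sector errored: prints 36.12
rrer 1 1000000000000     # best case with the rounded 10^12 figure: prints 120.00
```

The unrounded window gives 120.10 rather than exactly 120, which is why the author prefers the round 10^12 figure for the maximum.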

FWIW, here are the numbers for my own Seagate drives:
Attribute               ID  Threshold  Value  Worst  Raw
================================================================
Raw Read Error Rate      1          6    114    100  00000386EBBA  (ST3320620A)
Raw Read Error Rate      1          6     64     62  00000AFD20E3  (ST3120026A)
Raw Read Error Rate      1         34     77     66  000007820F8F  (ST340016A)
Raw Read Error Rate      1          0     79     78  00000753BA8E  (ST313021A)

Hardware ECC recovered 195          0    100     63  00000C62F66E  (ST3320620A)
Hardware ECC recovered 195          0     64     62  00000AFD20E3  (ST3120026A)
Hardware ECC recovered 195          0     77     66  000007820F8F  (ST340016A)

http://www.users.on.net/~fzabkar/SmartUDM/320GB.RPT
http://www.users.on.net/~fzabkar/SmartUDM/120GB.RPT
http://www.users.on.net/~fzabkar/SmartUDM/40GB.RPT
http://www.users.on.net/~fzabkar/SmartUDM/13GB.RPT

References:

Here are several Usenet discussions where I have posted the results of my experiments:

Seagate - SMART Raw Read Error Rate test:
http://groups.google.com/group/comp.sys.ibm.pc.hardw…b6eb8aa2476f9ca...

SER, RRER, and HEC discussion:
http://groups.google.com/group/comp.sys.ibm.pc.hardw…54b8ad6d34549e9...

Seek Error Rate discussion:
http://groups.google.com/group/comp.sys.ibm.pc.hardw…87001db5c567fb9...

A report from a Seagate user regarding the RRER attribute:
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-…-500GB-S-M-A-R-...

HD Sentinel (DOS / Windows / Linux):
http://www.hdsentinel.com/

HDDScan for Windows:
http://hddscan.com/

Explanation of SMART attributes:

Speccyfighter · « **Reply #18:** 03.11.2019 16:35:18 »

Quote from: kaligari on 03.11.2019 11:34:06

Quote from: Kalt on 23.10.2019 19:53:39
Just grab the fresh V5.03 from the official site.

Great that Victoria HDD has finally been updated after several years. A question for the experts: there is a program called HDD Regenerator.

dd+whdd:
https://forum.altlinux.org/index.php?topic=13216.msg272847#msg272847
https://forum.altlinux.org/index.php?topic=13216.msg273716#msg273716

Quote from: kaligari on 03.11.2019 11:34:06

Can it really cure bad blocks?

Only if they are soft bads and the platters are not degrading for good (soft bads get fixed, hard bads get locked out):
Professional shell techniques. Fighting bad blocks

Quote from: kaligari on 03.11.2019 11:34:06

I've read different opinions: people here seem to advise against it, and on various forums they write that it can kill a drive.

Nonsense.
A drive gets killed when its system track (service area) gets destroyed.

But if the HDD is already worn out, it can die at any moment:
Hardware matters... Our strength is in our swim trunks.
https://forum.altlinux.org/index.php?topic=13216.msg268919#msg268919

That is, from the moment noticeable spindle vibration began, the HDD lived for about another year and a half:
https://forum.altlinux.org/index.php?topic=13216.msg268919#msg268919

Speccyfighter · « **Reply #19:** 03.11.2019 16:51:43 »

Quote from: Mr.Madguy on 03.11.2019 15:28:11

Quote from: kaligari on 03.11.2019 11:34:06
HDD Regenerator.
Victoria can do this too.

So why the heck bother with Linux, then, if in a critical situation the hardware needs Windows?

kaligari · « **Reply #20:** 03.11.2019 18:26:57 »

Quote from: Speccyfighter on 03.11.2019 16:35:18

dd+whdd

Thanks for the tip. I'll go test it on an old HDD; maybe I can fix something.

Mr.Madguy · « **Reply #21:** 03.11.2019 18:56:52 »

Does whdd have a function to rewrite or refresh data only when a timeout is exceeded? Why churn through healthy sectors for nothing? Besides, it is hard to work out by hand how to do an operation like this correctly: back up the data from LBA=x to LBA=y, overwrite it all with zeros, and then write it back.

But it won't help my old drive much anyway. This approach only works for soft bads. A hardware reallocation can only be undone through the data port, and that cannot be done without special equipment.
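The back-up / zero / write-back operation described in this post can be sketched with plain dd. This is a hypothetical helper, not a feature of whdd or Victoria, and it does not pick slow sectors by timeout: the LBA range has to come from a previous test log. It assumes GNU dd, 512-byte logical sectors and an unmounted disk, and it is destructive if interrupted, so try it on an image file first.

```bash
# Hypothetical helper, not a feature of whdd or Victoria: back up LBAs
# FIRST..LAST, overwrite them with zeros, then write the backup back.
# Assumes GNU dd and 512-byte logical sectors. If interrupted after
# step 2, the range is left zeroed, so try it on an image file first.
set -eu

refresh_range() {
  local dev=$1 first=$2 last=$3
  local count=$(( $last - $first + 1 ))
  local backup
  backup=$(mktemp)
  # 1. Save the range one sector per block; conv=noerror,sync continues past
  #    unreadable sectors and stores zeros in their place.
  dd if="$dev" of="$backup" bs=512 skip="$first" count="$count" \
     conv=noerror,sync status=none ||
    echo "warning: unreadable sectors were saved as zeros" >&2
  # 2. Overwrite the range with zeros (conv=notrunc keeps image files intact).
  dd if=/dev/zero of="$dev" bs=512 seek="$first" count="$count" \
     conv=notrunc,fsync status=none
  # 3. Write the saved copy back over the same range.
  dd if="$backup" of="$dev" bs=512 seek="$first" count="$count" \
     conv=notrunc,fsync status=none
  rm -f "$backup"
}
```

On a real disk the call would look like `refresh_range /dev/sdX FIRST LAST`, where /dev/sdX, FIRST and LAST are placeholders. Sectors that could not be read end up filled with zeros, which is the price of forcing the drive to rewrite them.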

Speccyfighter · « **Reply #22:** 03.11.2019 22:37:29 »

Quote from: Mr.Madguy on 03.11.2019 18:56:52

Does whdd have a function to rewrite or refresh data only when a timeout is exceeded?

In whdd, writing is only available in the Write zeros procedure (not counting Device copying):

Code:

# sed -n '20,22p' whdd_procedures.txt | sed 's/^ *//g' | sed 's/ *$//g'
Read test
Device copying
Write zeros

Code:

# sed -n '21,23p' whdd_write_zeros_help.txt | sed 's/^ *//g' | sed 's/ *$//g'
Fills device space with
zeros. Uses POSIX write()
call, in direct mode

Quote from: Mr.Madguy on 03.11.2019 18:56:52

A hardware reallocation can be undone ...

Why bother? If it's dead, it's dead...
On WDC drives, a sector first becomes a candidate for reallocation before it is actually reallocated, and that alone is a cause for concern:

Code:

# smartctl -A /dev/sda | awk '$1 == 197'
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

Turning a blind eye to this attribute is like becoming a physician and bolting every lock against a patient while the patient (the hard drive) is pounding (via SMART) on the doctor's door.
Roughly and figuratively speaking, before ending up in the morgue (among the reallocated sectors), the patient knocks on the physician's door (becomes a candidate for reallocation), provided, of course, that the physician (the sysadmin) is neither deaf nor blind. The same holds for any hard drive that reports SMART attribute 197.
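Keeping an eye on attribute 197 is easy to script. A minimal sketch, assuming smartctl's standard attribute table where RAW_VALUE is the 10th column; `pending_sectors` and `check_pending` are hypothetical helper names.

```bash
# Hypothetical helpers: read `smartctl -A` output on stdin and report
# attribute 197 (Current_Pending_Sector).
pending_sectors() {
  # RAW_VALUE is the 10th column of smartctl's standard attribute table
  awk '$1 == 197 { print $10 }'
}

check_pending() {
  local n
  n=$(pending_sectors)
  if [ "${n:-0}" -gt 0 ]; then
    echo "WARNING: $n sector(s) pending reallocation"
  else
    echo "OK: no pending sectors"
  fi
}
```

Typical use (as root): `smartctl -A /dev/sda | check_pending`. For the attribute line shown above, it prints `OK: no pending sectors`.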

Mr.Madguy · « **Reply #23:** 04.11.2019 20:24:05 »

Quote from: Speccyfighter on 03.11.2019 22:37:29

In whdd, writing is only available in the Write zeros procedure (not counting Device copying):

That's a pity. It's a very handy function. On the first pass you simply run the test. On the second pass you use the log to run it over the damaged area with the desired timeout and the Refresh option selected. That's all. But it doesn't work through the API: write protection.

Quote from: Speccyfighter on 03.11.2019 22:37:29

Why bother? If it's dead, it's dead...

I don't know. We have power outages here from time to time, and I suspect the bad sectors appeared because of that, since by the tests the overall condition of the drive seems fine. The bads appeared long ago and have stayed stable ever since; if the drive were dying, it would probably have died by now. Who knows how the controller decides what counts as a bad sector. Does it try to refresh them before burying them? I'd like to try refreshing them; maybe it will help. Because if that is the cause, a new drive will meet the same fate. After every power outage I always tried to run ScanDisk on the drive, and there were never any soft bads.

Community forum
ALT Linux