Hitachi 7k1000 1TB drive failures

Posted: Sun Feb 06, 2011 4:37 pm
by Red Squirrel
I've always used these drives in my servers as they are cheap and have good performance for their price.

I had 3 in RAID 5 and they ran great and are still running now. Maybe a month ago I added 2 more drives, 7k1000.c to be exact (honestly not sure what the others are, but they are a few years older). BOTH are failing. I ordered 3 more and added two of them so I could rebuild the RAID on them and pull out the others... the two new ones are failing too!

I am currently trying the 3rd new one. This time I'm going to give it more burn-in time before I rebuild the RAID onto it. So far so good...

Also one of the 3 newer ones only had one error, and it seems to have cleared but I still don't trust these drives anymore.

Here's some error output of a few of the drives:

Drive 1:

[root@borg rsbackup]# smartctl -a /dev/sde
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:    Hitachi HDS721010CLA332
Serial Number:    JP2940HD034DRC
Firmware Version: JP4OA3EA
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Feb  5 20:07:19 2011 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
    was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0) The previous self-test routine completed
    without error or no self-test has ever
    been run.
Total time to complete Offline
data collection:    (9988) seconds.
Offline data collection
capabilities:    (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new
    command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
    power-saving mode.
    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  1) minutes.
Extended self-test routine
recommended polling time:  ( 167) minutes.
SCT capabilities:        (0x003d) SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   096   096   016    Pre-fail  Always       -       262148
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       97
  3 Spin_Up_Time            0x0007   118   118   024    Pre-fail  Always       -       320 (Average 322)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   138   138   020    Pre-fail  Offline      -       31
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2252
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       23
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       24
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       24
194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       23 (Lifetime Min/Max 18/31)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       3

SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 1838 hours (76 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 09 67 0d ad 09

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 28 30 48 0d ad 40 00  20d+04:03:37.681  READ FPDMA QUEUED
  60 10 28 80 09 ad 40 00  20d+04:03:37.680  READ FPDMA QUEUED
  60 28 20 30 f4 ac 40 00  20d+04:03:37.680  READ FPDMA QUEUED
  60 10 00 70 51 10 40 00  20d+04:03:37.680  READ FPDMA QUEUED
  60 10 30 48 71 ed 40 00  20d+04:03:37.674  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.





Drive 2:

[root@borg rsbackup]# smartctl -a /dev/sdf
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:    Hitachi HDS721010CLA332
Serial Number:    JP2940HD032PMC
Firmware Version: JP4OA3EA
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Feb  5 20:08:10 2011 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
    was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0) The previous self-test routine completed
    without error or no self-test has ever
    been run.
Total time to complete Offline
data collection:    (10517) seconds.
Offline data collection
capabilities:    (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new
    command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
    power-saving mode.
    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  1) minutes.
Extended self-test routine
recommended polling time:  ( 175) minutes.
SCT capabilities:        (0x003d) SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       786447
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   120   120   024    Pre-fail  Always       -       301 (Average 329)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   140   140   020    Pre-fail  Offline      -       30
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2252
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0002   253   253   000    Old_age   Always       -       23 (Lifetime Min/Max 18/30)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Sure, it's normal for drives to fail every now and then, but 4 in a row?!?!

Do not buy!

Archived topic from Iceteks, old topic ID:5389, old post ID:39873

Posted: Thu Feb 10, 2011 6:19 pm
by Red Squirrel
Got another one that failed. What pieces of crap.

Archived topic from Iceteks, old topic ID:5389, old post ID:39876

Posted: Fri Feb 11, 2011 11:15 am
by Triple6_wild

What about WD? They have been reliable for me, though I don't run them in a RAID, and for the price I wouldn't buy anything else. Not the quietest drives, but they survive some abuse for a few years.

Archived topic from Iceteks, old topic ID:5389, old post ID:39877

Posted: Fri Feb 11, 2011 8:10 pm
by Red Squirrel
Yeah I'm leaning towards WD or Samsung for my next drives.

Though I'm starting to wonder if I have another issue. The last drive that dropped out I put in another slot and it rebuilt the array with no new errors. I think I might have a more serious problem brewing on my server, which sucks as I don't really want to be trying to troubleshoot some weird I/O issues on a critical server.
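
One rough way to check the "is it the drive or the server" question against the smartctl dumps earlier in the thread is to look at which raw counters are non-zero. The sketch below is Python and only a rule of thumb: grouping attributes 5/196/197/198 as "media" and 199 as "link" is an assumption for illustration, and the function names are made up here. It is not a diagnosis.

```python
import re

# Assumed rule-of-thumb grouping of SMART attribute IDs (illustrative only).
MEDIA_IDS = (5, 196, 197, 198)  # reallocated, pending, uncorrectable sectors
LINK_IDS = (199,)               # UDMA CRC errors on the controller/drive link


def parse_attributes(text):
    """Map attribute ID -> leading integer of RAW_VALUE from smartctl -a text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows have 10 columns, a numeric ID and a hex FLAG field.
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            match = re.match(r"\d+", parts[9])
            if match:
                attrs[int(parts[0])] = int(match.group())
    return attrs


def classify(attrs):
    """Return 'media', 'link' or 'clean' depending on which counters are set."""
    if any(attrs.get(i, 0) > 0 for i in MEDIA_IDS):
        return "media"
    if any(attrs.get(i, 0) > 0 for i in LINK_IDS):
        return "link"
    return "clean"


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): smartctl -a /dev/sde > sde.txt
    #                                 python check.py < sde.txt
    print(classify(parse_attributes(sys.stdin.read())))
```

Fed the /dev/sde dump, this gives "link" (no reallocated or pending sectors, 3 CRC errors). Fed the /dev/sdf dump, it gives "clean". Neither looks like a dying disk surface, which fits the drive rebuilding fine in another slot.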

Archived topic from Iceteks, old topic ID:5389, old post ID:39878

Posted: Fri Feb 11, 2011 9:48 pm
by Triple6_wild

Wouldn't even know how to troubleshoot an I/O issue, way above me :lol:
But anyways, ya, I would have to agree that the odds of 5 drives failing that quickly are low, even if the company was running horrible quality control.

Only odd thing is you said you had 3 older drives in there also and they are fine, right? Have you tried re-arranging the drives so the older ones are in the slots causing the failures?
I'm 99.9% sure you have backups, so if you're gonna lose another drive testing, at least have the older ones take the hit / prove it's the new drives that are bad.

Never used Hitachi though, so I dunno anything about their quality, but like I said I've got WD in my PS3 and PC right now. My external is also WD. Horrible noises started 2-3 months after buying the external, but it was running in a very damp/dirty basement for most of its life. It's at least 3 years old now, has been moved a lot and dropped a few times, and was never really cleaned of dirt, but it still runs very reliably. Really, the noises coming out of it sometimes would make you cringe/panic/backup, but it's been doing it for years and has never lost a file.

Archived topic from Iceteks, old topic ID:5389, old post ID:39879