Objective 5.2: Given a scenario, troubleshoot drive and RAID issues
Cert: CompTIA A+ Core 1 (220-1201) V15
Domain: 5.0 Hardware and Network Troubleshooting
Weight: ~28% of Core 1 (largest domain)
Depth: Given a scenario, troubleshoot. Recognize symptoms and apply remediation for storage and RAID issues.
What this objective tests
You should recognize storage symptoms (LED indicators, audible signs of failure, performance and capacity errors, RAID-specific alerts) and know the right next steps for each.
Key facts
LED status indicators:
- Most drive bays and RAID controllers have LEDs that signal drive activity and faults. Green/blue typically means OK or active; amber/red means fault, missing, or rebuilding. Always check the LED state before opening the chassis.
Grinding or clicking sounds:
- Grinding. Usually mechanical wear in an HDD spindle or actuator. Imminent failure.
- Clicking (the "click of death"). HDD read/write head repeatedly failing to find tracks. Imminent failure. Back up data immediately.
Boot symptoms:
- Bootable device not found. The BIOS/UEFI firmware cannot find a device with a valid bootloader. Could be a missing or disconnected drive, drive failure, corrupted boot sector, or wrong boot order.
- Data loss / corruption. Files missing or corrupted. Could be drive failure, filesystem corruption, malware, or human error.
RAID-specific symptoms:
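- RAID failure. A drive in the array has failed. RAID 1/5/6/10 continue operating in degraded mode. A two-drive RAID 1 and RAID 5 have no further fault tolerance until the rebuild completes. RAID 6 can survive one more failure. RAID 10 can survive another failure only if it is not in the same mirror pair. RAID 0 has no redundancy, so a single failed drive loses the whole array.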
- Array missing. The entire RAID array is no longer visible. Could be controller failure, multiple-drive failure beyond the array's tolerance, or RAID configuration loss.
- Audible alarms. Many RAID controllers and NAS enclosures sound an alarm (a continuous tone or repeating beeps) when a drive fails or the array degrades. Acknowledge and investigate immediately; the alarm itself is not the failure.
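The degraded-mode behavior above can be sketched as a small lookup. This is an illustrative sketch, not a controller API: `GUARANTEED_TOLERANCE` and `array_state` are hypothetical names, RAID 1 is assumed to be a two-drive mirror, and RAID 10 is scored by its worst case (both failures in one mirror pair).

```python
# Guaranteed number of drive failures each standard RAID level survives.
# Assumptions: RAID 1 is a two-drive mirror; RAID 10 is scored by its worst
# case, although it can survive more failures if they hit different pairs.
GUARANTEED_TOLERANCE = {0: 0, 1: 1, 5: 1, 6: 2, 10: 1}


def array_state(raid_level: int, failed_drives: int) -> str:
    """Classify an array as healthy, degraded, or failed (worst case)."""
    tolerance = GUARANTEED_TOLERANCE[raid_level]
    if failed_drives == 0:
        return "healthy"
    if failed_drives > tolerance:
        return "failed"
    remaining = tolerance - failed_drives
    return f"degraded, {remaining} more failure(s) tolerated"


if __name__ == "__main__":
    for level in (0, 1, 5, 6, 10):
        print(f"RAID {level} with one failed drive: {array_state(level, 1)}")
```

With one failed drive, RAID 5 reports no remaining tolerance while RAID 6 still tolerates one more failure, which is why a degraded RAID 5 array needs the most urgent attention.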
Health monitoring:
- S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). Drives report internal health metrics: reallocated sectors, read errors, temperature, power-on hours. A S.M.A.R.T. warning is a heads-up that the drive is likely to fail; back up and replace it before it fails completely.
- Extended read/write times. Operations that should be fast are now slow. Often caused by the drive retrying reads and remapping bad sectors, which tends to precede total failure.
- Low performance IOPS. Specifically relevant for SSDs and storage arrays where IOPS (input/output operations per second) is the key metric. Drops indicate failing drives, controller issues, or array stress.
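A minimal sketch of how the S.M.A.R.T. metrics above might be triaged. The attribute names follow what smartctl prints for ATA drives; the watch list, the 55 °C threshold, and the function name `smart_warnings` are assumptions for illustration, not vendor guidance.

```python
# Hypothetical watch list: attribute names as smartctl prints them for ATA
# drives. Any nonzero raw value is treated as a warning in this sketch.
WATCHED_ATTRIBUTES = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)
MAX_TEMPERATURE_C = 55  # assumed threshold; check the drive's datasheet


def smart_warnings(raw_values: dict[str, int]) -> list[str]:
    """Return warnings for a dict of attribute name -> raw value.

    Temperature is expected as plain degrees Celsius; some drives pack
    extra data into the raw field, which this sketch does not decode.
    """
    warnings = []
    for name in WATCHED_ATTRIBUTES:
        count = raw_values.get(name, 0)
        if count > 0:
            warnings.append(f"{name} = {count}")
    temperature = raw_values.get("Temperature_Celsius")
    if temperature is not None and temperature > MAX_TEMPERATURE_C:
        warnings.append(f"Temperature_Celsius = {temperature}")
    return warnings


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(smart_warnings({"Reallocated_Sector_Ct": 8, "Temperature_Celsius": 41}))
```

An empty list means none of the watched attributes tripped. It does not prove the drive is healthy.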
Missing drives in OS:
- A drive shows in BIOS but not in the OS, or vice versa. Could be filesystem damage, cable issues, drive controller failure, or a permission/mount issue (for example, a disk that is uninitialized or has no drive letter assigned).
Diagnostic approach
- Listen for grinding or clicking. If you hear either, stop routine use of the drive and back up or image its data before any other troubleshooting.
- Check LEDs on the drive bay or RAID controller.
- Read S.M.A.R.T. data via OS tools (CrystalDiskInfo on Windows, smartctl from smartmontools on Linux/macOS) or the RAID controller's management interface.
- Check BIOS to see if the drive is detected.
- Check OS Disk Management or equivalent for partition and filesystem status.
- For RAID: check the controller's UI for array status, drive status, rebuild progress.
- For data recovery: stop using the drive if possible. Forensic recovery is easier on a quiesced drive than on one still being written.
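The S.M.A.R.T. step in the list above can be scripted. This sketch assumes smartmontools 7.0 or later (which added `--json` output), an ATA/SATA drive, and sufficient privileges. The function names and the device path in the usage note are illustrative. NVMe drives report health under different JSON keys and are not handled here.

```python
import json
import subprocess


def read_smart_report(device: str) -> dict:
    """Run smartctl and return its JSON report as a dict.

    smartctl's exit status is a bitmask that can be nonzero even when a
    report was produced, so the return code is not treated as an error.
    """
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


def summarize(report: dict) -> dict:
    """Pull the overall verdict and raw attribute values out of a report."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    raw_values = {row["name"]: row["raw"]["value"] for row in table}
    return {"passed": passed, "raw_values": raw_values}
```

On a Linux host this might be called as `summarize(read_smart_report("/dev/sda"))`, typically as root. A `passed` value of `None` means the report carried no overall verdict.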
Common gotchas
- Don't rebuild a RAID without backups. Rebuilding stresses the surviving drives, and if a second drive is close to failure, the rebuild can push it over. Verify that backups are current and restorable before rebuilding.
- S.M.A.R.T. is not perfect. A drive can fail without S.M.A.R.T. warnings. Conversely, S.M.A.R.T. can flag drives that keep running for months. Use it as one signal, not the only signal.
- Boot order vs missing drive. A "bootable device not found" message can mean the BIOS boot order changed (new USB plugged in first) rather than the drive failing.
- Loose SATA cable. Common cause of intermittent drive issues. Reseat both ends.
- Cloning a failing drive. Use imaging tools that tolerate read errors (ddrescue, vendor utilities). Many standard cloning tools abort or stall when they hit bad sectors.
- RAID 5 rebuild risk on large drives. A rebuild on multi-TB drives can take many hours to days, especially while the array is still serving users. A second drive failure or an unrecoverable read error during the rebuild loses the whole array. Consider RAID 6 or RAID 10 for large drives.
- Array missing after controller swap. Different RAID controllers store metadata differently. Swapping a controller across vendors can lose the array configuration even though all drives are intact.
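The RAID 5 rebuild window noted above is simple arithmetic: drive capacity divided by sustained rebuild rate. The sketch below uses a hypothetical helper, `rebuild_hours`, and assumed rates. Real rebuild speed depends on the controller, the drives, and how busy the array is.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate hours to rebuild one drive at a sustained rate.

    Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    """
    seconds = capacity_tb * 1_000_000 / rebuild_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Assumed figures for illustration: a 16 TB drive rebuilt at 100 MB/s
    # on an idle array versus 30 MB/s while the array is serving users.
    print(f"idle: {rebuild_hours(16, 100):.1f} h")
    print(f"busy: {rebuild_hours(16, 30):.1f} h")
```

At the assumed 100 MB/s a 16 TB drive takes roughly 44 hours to rebuild, and at 30 MB/s it takes about six days. For that whole window a RAID 5 array has no redundancy.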
Real-world context
For SMB Revtek customers:
- Backups first. RAID is for availability. Backup is for data protection. These are different concerns.
- Monitor S.M.A.R.T. on every drive (especially NAS, file servers). Replace warning drives proactively.
- Have spare drives on the shelf for any production RAID. Rebuild as soon as a drive fails.
- Test backup restores quarterly. A backup you cannot restore is no backup at all.
Common helpdesk calls:
- "My computer is slow at saving files." Check drive health. Often a dying HDD remapping sectors.
- "It says no boot device." Check BIOS, drive presence, boot order, cables.
- "The server alarm is going off." Most likely a drive in the array has failed. Acknowledge the alarm, confirm the failed drive in the controller UI, replace it, and monitor the rebuild.
- "I deleted a file by accident." Check Recycle Bin, then backups. Never run recovery tools against a failing drive without imaging it first.
Sources
- [CompTIA A+ 220-1201 Exam Objectives Version 4.0, Section 5.2](../../../../../../30-RevyTechJourney/CompTIA%20A%2B%20220-1201%20Exam%20Objectives%20%284.0%29.pdf)
- [Wikipedia: S.M.A.R.T.](https://en.wikipedia.org/wiki/S.M.A.R.T.)
- [Wikipedia: Standard RAID levels](https://en.wikipedia.org/wiki/Standard_RAID_levels)
- [Wikipedia: Hard disk drive failure](https://en.wikipedia.org/wiki/Hard_disk_drive_failure)
- [Microsoft Learn: Storage troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/windows-server/backup-and-storage/storage-resources)
