In September 2016, Samsung issued one of the largest consumer electronics recalls in history: 2.5 million Galaxy Note 7 devices were pulled from shelves worldwide. The phones were catching fire—in pockets, on nightstands, aboard aircraft. The FAA banned them from flights entirely.
The recall cost Samsung an estimated $5.3 billion directly, with brand damage adding billions more. The root cause? Two separate battery design flaws that weren't caught before mass production.
These were failure modes that systematic analysis should have identified.
How Batteries Became Bombs
Investigation revealed two distinct failure modes:
- Supplier A defect: The battery casing was too small at the upper-right corner of the cell, which crimped the electrodes and caused internal short circuits
- Supplier B defect: A welding defect created protrusions that punctured the separator between electrodes
Both conditions led to the same effect: thermal runaway. The batteries would overheat, swell, and ignite.
What makes this case instructive is that neither defect was mysterious or unpredictable:
- The physics of lithium-ion batteries are well understood
- The conditions for thermal runaway are documented
- The failure modes existed in engineering knowledge
They just weren't systematically examined in the context of this specific design under these specific manufacturing conditions.
FMEA: The Methodology That Could Have Changed Everything
Failure Mode and Effects Analysis (FMEA) originated in the 1940s when the U.S. military needed to ensure reliability in increasingly complex systems. NASA adopted it after early spacecraft failures made the consequences of unexamined risks painfully clear. The automotive and aerospace industries now mandate FMEA for safety-critical components.
The methodology is straightforward in principle: for every function a component must perform, identify how that function can fail, assess the consequences, and ensure adequate controls exist.
Traditional FMEA uses the Risk Priority Number (RPN)—Severity times Occurrence times Detection, each rated 1-10. Modern approaches like AIAG-VDA refine this with the Action Priority system and a structured seven-step process:
- Planning and Preparation – Define the analysis scope and assemble the right team
- Structure Analysis – Map the system architecture and component relationships
- Function Analysis – Document what each component must do
- Failure Analysis – Identify potential failure modes for each function
- Risk Analysis – Assess severity, occurrence likelihood, and detection capability
- Optimization – Implement controls for high-priority risks
- Results Documentation – Create traceable records for audits and continuous improvement
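The RPN arithmetic described above can be sketched in a few lines. This is a minimal illustration, not part of any FMEA standard or tool: the function name `rpn` and the two example rating sets are assumptions made up for this sketch. It also shows one commonly cited reason modern approaches moved toward Action Priority: the product treats all three ratings as interchangeable, so a severe but rare risk and a trivial but frequent one can land on the same score.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: Severity x Occurrence x Detection, each rated 1-10."""
    for name, rating in (("severity", severity),
                         ("occurrence", occurrence),
                         ("detection", detection)):
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {rating}")
    return severity * occurrence * detection


# Two hypothetical rating sets with very different severity give the same score.
safety_critical = rpn(severity=10, occurrence=2, detection=5)
nuisance = rpn(severity=2, occurrence=10, detection=5)
print(safety_critical, nuisance)  # 100 100
```

A team sorting purely by RPN would treat these two items as equal priority, even though only one of them can injure someone.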
Applying FMEA to the Note 7 Battery Design
Let's walk through how this methodology would have approached the Note 7's battery.
Structure Analysis
The battery cell interfaces with:
- Phone chassis
- Charging circuitry
- Thermal management system
- External environment
The cell itself contains electrodes, separator, electrolyte, and casing.
Function Analysis
- The cell must store and release electrical energy safely
- The separator must prevent electrode contact while allowing ion flow
- The casing must contain cell expansion during normal operation
Failure Analysis
This is where the Note 7's problems would surface:
- Failure Mode: Manufacturing variation causes electrode proximity to cell edge
- Potential Effect: Internal short circuit during charging or physical stress
- Cause: Insufficient design margin for manufacturing tolerances
- Failure Mode: Welding defects create protrusions penetrating separator
- Potential Effect: Internal short circuit leading to thermal runaway
- Cause: Process variation in electrode tab welding
Risk Assessment
Both failure modes would rate as highly concerning:
- Severity: 10 (fire risk, personal injury)
- Occurrence: 4-5 (depending on process capability data)
- Detection: 7-8 (X-ray inspection possible but not routinely performed on all units)
RPN scores of 280-400 demand immediate action. The optimization step would have prescribed additional design margin, enhanced incoming inspection, and destructive testing protocols that might have caught these issues before 2.5 million units shipped.
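The 280-400 range follows directly from multiplying the rating ranges above. As a quick check, the sketch below enumerates every combination of the listed ratings; the variable names are illustrative only, and the ratings themselves are the article's estimates, not measured data.

```python
from itertools import product

severity = [10]      # fire risk, personal injury
occurrence = [4, 5]  # depends on process capability data
detection = [7, 8]   # X-ray inspection possible but not routine

# Every Severity x Occurrence x Detection combination for the ratings above.
scores = sorted(s * o * d for s, o, d in product(severity, occurrence, detection))
print(scores)                    # [280, 320, 350, 400]
print(min(scores), max(scores))  # 280 400
```

Even the most optimistic combination (occurrence 4, detection 7) stays at 280, because the severity rating of 10 does not move.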
Electronics Failure Modes Worth Systematic Analysis
Electrostatic Discharge (ESD) Sensitivity
- Failure Mode: ESD event during handling damages IC junction
- Effect: Latent defect causing field failure weeks or months later
- Current Controls: ESD-safe handling procedures
- Detection Rating: 8 (damage may not appear in functional test)
- Action: ESD simulation testing, design hardening for critical paths
Solder Joint Reliability Under Thermal Cycling
- Failure Mode: Repeated temperature cycling causes solder fatigue
- Effect: Intermittent connection progressing to open circuit
- Current Controls: Accelerated life testing (limited cycles)
- Detection Rating: 6 (failures may take months to manifest)
- Action: Enhanced thermal cycling tests, design margin for coefficient of thermal expansion mismatch
Component Derating Violations
- Failure Mode: Component operated beyond safe derating limits
- Effect: Accelerated aging, parametric drift, premature failure
- Current Controls: Design review checklist
- Detection Rating: 5 (violations may not cause immediate failure)
- Action: Automated derating verification in design tools
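As a rough idea of what automated derating verification could look like, the sketch below flags parts whose worst-case applied voltage exceeds a fixed fraction of the rated voltage. Everything here is an assumption for illustration: the `Part` fields, the `derating_violations` helper, the 80% limit, and the sample parts are hypothetical, and real derating rules vary by component type, temperature, and company guidelines.

```python
from dataclasses import dataclass


@dataclass
class Part:
    ref: str                # reference designator, e.g. "C12"
    rated_voltage: float    # datasheet maximum, in volts
    applied_voltage: float  # worst-case voltage in the design, in volts


def derating_violations(parts, max_stress_ratio=0.8):
    """Return parts whose applied/rated voltage ratio exceeds the limit."""
    return [p for p in parts
            if p.applied_voltage / p.rated_voltage > max_stress_ratio]


# Hypothetical bill of materials; the 0.8 limit is an assumed guideline.
bom = [
    Part("C12", rated_voltage=6.3, applied_voltage=5.0),   # ratio ~0.79, passes
    Part("C13", rated_voltage=6.3, applied_voltage=5.5),   # ratio ~0.87, flagged
    Part("C20", rated_voltage=16.0, applied_voltage=5.0),  # ratio ~0.31, passes
]
for part in derating_violations(bom):
    print(f"{part.ref}: {part.applied_voltage / part.rated_voltage:.0%} of rating")
# C13: 87% of rating
```

Run against every design revision, a check like this turns a checklist item into a repeatable control, which is what lowers the detection rating in the FMEA.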
Moisture Ingress in Sealed Assemblies
- Failure Mode: Moisture penetrates sealing over product lifetime
- Effect: Corrosion, dendrite growth, electrical shorts
- Current Controls: Initial seal testing
- Detection Rating: 7 (failures develop gradually)
- Action: Highly accelerated stress testing (HAST), improved seal designs
Embedding FMEA in Electronics Development
For consumer electronics teams operating on compressed schedules, FMEA can feel like overhead. The pressure to ship creates temptation to skip "paperwork." Samsung's engineers were certainly under pressure to meet launch dates.
But the methodology doesn't have to be burdensome. Effective implementation focuses FMEA where it matters most:
During Architecture Definition
Before detailed design, identify the highest-risk subsystems:
- Power management
- Thermal systems
- Mechanical interfaces
- RF components
These deserve early failure analysis.
At Design Reviews
Each milestone should include FMEA updates:
- New failure modes identified?
- Detection mechanisms implemented?
- Occurrence estimates validated with test data?
Before Production Transfer
The transition from prototype to mass production introduces new failure modes:
- Manufacturing process variations
- Supplier component variations
- Test coverage gaps
All warrant analysis.
After Field Returns
Real-world failures update your understanding. Every return is data:
- Does it represent a failure mode you identified?
- A new mode to add?
- An occurrence rate to revise?
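Those three questions can be answered mechanically once returns are diagnosed. The sketch below is a minimal illustration with made-up data: the failure mode identifiers, the return log, and the shipped-unit count are all hypothetical, and a real process would map the observed rates back onto the team's occurrence rating scale.

```python
from collections import Counter

# Hypothetical failure mode IDs already present in the FMEA.
known_modes = {"solder_fatigue", "esd_latent_damage", "moisture_ingress"}

# Hypothetical field-return log: one diagnosed failure mode per returned unit.
returns = ["solder_fatigue", "moisture_ingress", "solder_fatigue", "connector_wear"]
units_shipped = 10_000  # hypothetical

counts = Counter(returns)
new_modes = sorted(set(counts) - known_modes)  # modes missing from the FMEA
observed_rate = {mode: n / units_shipped for mode, n in counts.items()}

print(new_modes)                        # ['connector_wear']
print(observed_rate["solder_fatigue"])  # 0.0002
```

Any entry in `new_modes` becomes a new FMEA row, while the observed rates are evidence for revising existing occurrence ratings.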
The Automotive Standard Your Industry Should Learn From
Automotive electronics suppliers have refined FMEA through decades of practice. Components going into vehicles must meet the IATF 16949 quality standard, which requires systematic FMEA. The result is high reliability: modern vehicles pack in large amounts of computing hardware, yet operate for years with few electronics failures.
Consumer electronics can adopt these practices without automotive bureaucracy. The core discipline—systematically identifying failure modes, assessing risks, implementing controls—applies regardless of industry.
The Galaxy Note 7 debacle cost Samsung billions and damaged a premium brand built over decades. Rigorous FMEA practice costs a fraction of that and could plausibly have surfaced both battery failure modes before they reached customers.
The choice seems obvious in retrospect. It should be equally obvious in prospect.
NirmIQ Team
The NirmIQ team shares insights on requirements management, FMEA, and safety-critical systems engineering.