Disaster Insurance Coverage Gaps
A Python-driven analysis using EM‑DAT to study how much economic loss from natural disasters is actually insured, which regions are most under‑covered, and how extreme losses behave in the tail.
Abstract
This project examines recent natural disasters to see where economic losses are insured and where major protection gaps remain. Using EM‑DAT records for 1,985 events worldwide between 2020 and 2025, including 530 disasters with damage data and 52 with both total and insured losses, it applies descriptive statistics, hypothesis tests, portfolio risk measures and simple machine‑learning models to study patterns over time, across regions and by disaster type.
Results show that storms and floods dominate both event counts and total losses, while insurance data is mainly recorded for very large disasters in the Americas and parts of Asia. Many frequent hazards – especially floods and droughts in lower‑income regions – have little or no recorded coverage. Loss‑distribution fitting, Value at Risk and a simple reinsurance layer indicate that a small number of extreme events drive most of the portfolio loss, suggesting that available insurance data understates true disaster risk and that institutions need to account for missing data and heavy‑tailed losses when planning protection and capital.
What I did
- Data cleaning & EDA: Cleaned EM‑DAT fields (naming, units, filters for 2010–2025 natural disasters), constructed coverage‑gap features (insured vs total, uninsured share), and explored distributions by region and disaster type with summary tables and plots.
- Statistical analysis & hypothesis testing: Ran correlation, ANOVA, chi‑square, and t‑tests on damage, human impact, and coverage ratios across regions and hazard types to see where insurance participation is significantly different and how biased the recorded data are.
- Geospatial & temporal analysis: Built time‑series, seasonal, and country/region hotspot views to map how coverage gaps evolve over time and across space, highlighting lower‑income areas with frequent events but limited insurance data.
- Predicting insurance reporting with ML: Trained and compared multiple machine‑learning models (random forest, XGBoost, logistic regression, SVM, naive Bayes) to predict when an event is likely to have insured‑loss information recorded, using event size, deaths, year, region, and hazard type as features.
- Catastrophic loss distribution & risk analysis: Fitted loss distributions, computed Value at Risk and Expected Shortfall at several confidence levels, and analysed a basic excess‑of‑loss reinsurance layer and regional/hazard concentration to see how extreme events shape portfolio and systemic risk.
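The coverage‑gap features from the first bullet can be sketched in pandas. The column names and values below are hypothetical stand‑ins for the cleaned EM‑DAT fields, not the dataset's actual schema:

```python
import pandas as pd

# Toy stand-in for cleaned EM-DAT rows; the real EM-DAT field names differ,
# so these column names and values are hypothetical.
events = pd.DataFrame({
    "region": ["Americas", "Asia", "Africa", "Americas"],
    "total_damage_usd": [50e9, 12e9, 3e9, 8e9],
    "insured_loss_usd": [30e9, 2e9, None, 1e9],
})

# Coverage-gap features: insured and uninsured shares of total damage,
# plus a flag for whether insured-loss data were recorded at all.
events["insured_share"] = events["insured_loss_usd"] / events["total_damage_usd"]
events["uninsured_share"] = 1 - events["insured_share"]
events["has_insured_data"] = events["insured_loss_usd"].notna()

print(events[["region", "insured_share", "uninsured_share", "has_insured_data"]])
```

Keeping the "insured data recorded at all" flag separate from the share matters here, because missing insured losses are themselves a signal rather than a zero.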
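The hypothesis tests in the second bullet can be sketched with `scipy.stats`. The samples and contingency counts below are synthetic illustrations, not EM‑DAT values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic log-damage samples per region (illustrative, not EM-DAT values).
americas = rng.normal(9.0, 1.0, 60)
asia = rng.normal(8.5, 1.0, 60)
africa = rng.normal(7.8, 1.2, 60)

# One-way ANOVA: do mean log damages differ across regions?
f_stat, p_anova = stats.f_oneway(americas, asia, africa)

# Chi-square: is having insured-loss data recorded independent of region?
# Rows = regions; columns = [recorded, not recorded] (made-up counts).
table = np.array([[25, 35], [10, 50], [2, 58]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Welch t-test: compare two regions without assuming equal variances.
t_stat, p_t = stats.ttest_ind(americas, africa, equal_var=False)

print(f"ANOVA p={p_anova:.3g}, chi-square p={p_chi2:.3g}, t-test p={p_t:.3g}")
```

The chi‑square on the recorded/not‑recorded table is what quantifies the reporting bias: a small p‑value means insurance reporting is not independent of region.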
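The temporal and hotspot views in the third bullet reduce to groupby aggregations. The event table below is a hypothetical illustration, not actual EM‑DAT records:

```python
import pandas as pd

# Hypothetical cleaned event table (values are illustrative, not EM-DAT records).
events = pd.DataFrame({
    "year":   [2020, 2020, 2021, 2021, 2022, 2022],
    "region": ["Africa", "Americas", "Africa", "Asia", "Americas", "Africa"],
    "total_damage_usd": [1e9, 40e9, 2e9, 10e9, 60e9, 1.5e9],
    "insured_loss_usd": [0.0, 25e9, 0.0, 1e9, 35e9, 0.0],
})

# Yearly time series of total vs insured losses and the implied coverage ratio.
yearly = events.groupby("year")[["total_damage_usd", "insured_loss_usd"]].sum()
yearly["coverage_ratio"] = yearly["insured_loss_usd"] / yearly["total_damage_usd"]

# Regional hotspot view: event frequency vs recorded insured share.
regional = events.groupby("region")[["total_damage_usd", "insured_loss_usd"]].sum()
regional["coverage_ratio"] = regional["insured_loss_usd"] / regional["total_damage_usd"]
regional["n_events"] = events.groupby("region").size()

print(yearly)
print(regional.sort_values("coverage_ratio"))
```

Sorting regions by coverage ratio while keeping the event count alongside is what surfaces the pattern described above: areas with frequent events but near‑zero recorded insurance.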
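The model comparison in the fourth bullet can be sketched with scikit‑learn (two of the five models, for brevity). Everything here is a synthetic assumption: the events, the feature set, and the labelling rule that larger events are more likely to have insured‑loss data recorded:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 600

# Synthetic events; the assumed ground truth is that bigger events are more
# likely to have insured-loss data recorded (a stand-in for the EM-DAT pattern).
log_damage = rng.normal(8.0, 1.5, n)
deaths = rng.poisson(20, n)
year = rng.integers(2020, 2026, n)
region = rng.integers(0, 4, n)
p_recorded = 1 / (1 + np.exp(-(log_damage - 8.5)))
has_insured = rng.random(n) < p_recorded

X = np.column_stack([log_damage, deaths, year, region])
X_tr, X_te, y_tr, y_te = train_test_split(X, has_insured, random_state=0)

for name, model in [
    ("logistic regression", make_pipeline(StandardScaler(), LogisticRegression())),
    ("random forest", RandomForestClassifier(random_state=0)),
]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy={acc:.2f}")
```

The scaler in front of the logistic regression keeps raw-scale features like `year` from dominating the optimisation; tree ensembles do not need it.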
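The risk measures and excess‑of‑loss layer in the last bullet can be sketched as below. The lognormal parameters, attachment point, and limit are illustrative assumptions, not the values fitted or used in the project:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated portfolio losses from a heavy-tailed lognormal (parameters are
# illustrative, not the ones fitted to EM-DAT; units: USD billions).
losses = rng.lognormal(mean=3.0, sigma=1.2, size=100_000)

def var_es(losses, level):
    """Empirical Value at Risk and Expected Shortfall at a confidence level."""
    var = np.quantile(losses, level)
    es = losses[losses >= var].mean()  # mean of losses in the tail beyond VaR
    return var, es

var99, es99 = var_es(losses, 0.99)

# Basic excess-of-loss layer: the reinsurer pays losses above the attachment
# point, capped at the layer limit; the cedent retains everything else.
attachment, limit = 100.0, 200.0
ceded = np.clip(losses - attachment, 0.0, limit)
retained = losses - ceded

print(f"VaR 99%={var99:.1f}, ES 99%={es99:.1f}, mean ceded={ceded.mean():.2f}")
```

Expected Shortfall always sits at or above VaR at the same level, and the gap between them is one way to see how heavy the tail is, which is the point the findings below make about a few extreme events driving most of the portfolio loss.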
Key findings
- Global disaster losses over 2020–2025 total roughly one trillion USD, with an annual average near USD 175–180 billion, while insured losses are far smaller, leaving a persistent coverage gap.
- Losses are heavily concentrated: a few regions account for the majority of global damage, which matters for portfolio design and systemic‑risk assessment.
- Even though average annual losses are a small fraction of global GDP, tail events at high VaR levels can still create regional credit stress, funding pressure and the need for extra capital buffers.
What this shows about me
- Can take a messy real‑world risk dataset and build a full analysis pipeline: cleaning, EDA, statistical tests, geospatial views and ML models.
- Understands insurance‑specific ideas like coverage gaps, portfolio concentration, catastrophic tails, VaR and Expected Shortfall, and can implement them in Python.
- Can compress a long technical notebook into a clear abstract, methods list and key findings for risk or insurance stakeholders.