MML - Flood
Flooding is one of the most damaging natural disasters we face, and with climate change the risks are only growing. In our latest research, we’ve developed a machine learning model that combines different types of data, such as geographic information and past disaster records, to predict where floods are likely to happen over the next 1 to 5 years. By using text and numbers together, our model outperforms basic baselines, reaching a ROC AUC of up to 0.77. It’s a step forward in using AI to plan ahead and reduce the impact of natural disasters.
Nicole Zhang & Zein Mukhanov
Why Floods Are Getting Harder to Predict — and Why That Matters
In 2022, floods submerged a third of Pakistan, affecting 33 million people. Climate change is making floods more frequent, more severe, and harder to predict. But what if artificial intelligence could help us see them coming… years in advance?
Our team built a machine learning model that does just that — using weather data, disaster history, and even geography pages from Wikipedia. Here's how it works.
Current Models Can't See That Far Ahead
Most flood forecasting tools today are like weather apps — great for the next few days, but blind to what happens years from now. Traditional models rely on physical simulations that break down when pushed too far into the future.
Long-term planning — for cities, governments, and global risk managers — needs better tools. That’s where machine learning comes in.
Teaching AI to Understand Flood Risk
We trained a machine learning model (XGBoost + DistilBERT) to predict whether a flood will happen in a certain area over the next 1–5 years. But we didn’t just feed it numbers — we gave it multiple ways to learn:
1. Historical Data: Past floods from 1960–2018 (GDIS + EM-DAT datasets).
2. Weather Data: Global reanalysis maps of temperature, wind, pressure, snow, and more.
3. Geography Text: Location descriptions pulled from Wikipedia (“Geography” sections).
We divided the globe into 1° x 1° grid cells (roughly 111 km on a side at the equator, narrower toward the poles) and trained the model to answer one question for each cell: will this cell flood within the next 1 to 5 years?
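To make the grid idea concrete, here is a minimal sketch of how a point could be mapped to a 1° x 1° cell and how a yes/no flood label could be built for it. The function names, the row/column numbering, and the (lat, lon, year) event format are our illustrative assumptions, not the exact pipeline behind the results.

```python
import math

def grid_cell(lat, lon):
    """Return the (row, col) of the 1 x 1 degree cell containing a point.

    Assumed numbering: rows count up from the South Pole (0..179),
    columns count east from 180W (0..359).
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be in [-90, 90]")
    lon = (lon + 180.0) % 360.0                  # wrap longitude into [0, 360)
    row = min(int(math.floor(lat + 90.0)), 179)  # lat = 90 falls in the top row
    col = int(math.floor(lon))
    return row, col

def cell_width_km(lat):
    """Approximate east-west width of a 1-degree cell at a given latitude."""
    return 111.32 * math.cos(math.radians(lat))

def flood_label(cell, year, horizon, flood_events):
    """Return 1 if the cell has a recorded flood in (year, year + horizon], else 0.

    flood_events: iterable of (lat, lon, event_year) tuples (hypothetical format).
    """
    for lat, lon, event_year in flood_events:
        if grid_cell(lat, lon) == cell and year < event_year <= year + horizon:
            return 1
    return 0

# Toy example with one made-up flood record.
events = [(30.2, 67.0, 2010)]
cell = grid_cell(30.2, 67.0)
label_5yr = flood_label(cell, 2008, 5, events)   # flood falls inside the window
label_1yr = flood_label(cell, 2008, 1, events)   # flood falls outside the window
```

The width helper shows why "about 100 km" is only a rule of thumb: a cell at 60° latitude is about half as wide as one at the equator.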
What Makes This Approach Different?
Most flood models are local — built for one region or river. Ours works globally, across 2,800+ locations.
Most models predict days in advance. Ours looks years ahead (1 to 5 years).
Most models use only numbers. We added text — a key innovation.
Our multimodal approach helps the AI think like a planner — using both statistics and descriptions — not just data points.
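One simple way to combine text and numbers is to turn the description into a vector and append it to the numeric features, so a tabular model sees a single row per grid cell. The sketch below assumes that concatenation strategy and uses a toy hashed bag-of-words as a stand-in for a DistilBERT embedding; the function names and feature values are hypothetical, and this is not necessarily how our model fuses the two.

```python
import zlib

def text_vector(text, dim=8):
    """Toy stand-in for a text embedding: hashed bag-of-words counts.

    A real pipeline would use a language model embedding here; this
    version only keeps the example self-contained and deterministic.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives the same bucket on every run, unlike built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec

def fuse(numeric_features, text, dim=8):
    """Concatenate numeric features and the text vector into one feature row."""
    return list(numeric_features) + text_vector(text, dim)

# Made-up numeric features followed by a made-up geography description.
row = fuse([0.42, 1013.2, 3.0], "Low-lying river delta with monsoon rainfall")
```

The resulting row has 3 numeric values followed by 8 text-derived values, and could be passed to any tabular classifier.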
Results: Can AI Actually Do This?
Yes — and it’s surprisingly good.
Our best model reached a ROC AUC of up to 0.77 (0.5 is random guessing, 1.0 is perfect), which measures how well it separates flooded from non-flooded cases.
This beats basic baselines and shows that combining text + stats really helps.
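For readers new to the metric, ROC AUC can be read as the probability that a randomly chosen flooded case gets a higher score than a randomly chosen non-flooded one. Here is a minimal sketch of that definition on made-up labels and scores (not our actual results); in practice a library routine would normally be used instead.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = flooded, 0 = not flooded.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```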
It's Not Perfect Yet
Some parts of the world don’t have enough reliable data.
Wikipedia doesn’t cover every location equally.
Our weather data lags by a few months — not ideal for real-time work.
We’re working on these issues, and we’re exploring ways to make the model more accurate with more data, better features, and even AI-generated features.
We’re also working on a web app that lets you explore flood risk globally.
What's Next?
Imagine if cities, insurers, or governments could know which regions are likely to flood — not tomorrow, but next year.
Our model won’t stop the floods, but it can help us prepare. And that’s one step closer to resilience in a changing climate.