
MML - Drought

Droughts are silent killers: slow-moving but devastating. Now, AI may help predict them years in advance by combining climate statistics with geographic clues from sources like Wikipedia, outperforming a statistics-only baseline by roughly 23 percentage points. This breakthrough could buy crucial time to protect crops, economies, and lives before the water runs dry.

Authors: Nicole Zhang & Zein Mukhanov
Read time: 5 min

Droughts Are Getting Worse - Why This Matters

Over 2 million people have died from weather-related disasters since 1970. The number one killer? Droughts. They may not grab headlines like floods or wildfires, but they creep in slowly and leave behind trillions in economic damage, starvation, displacement, and ecological collapse.

And things are getting worse. According to the latest WMO report, more than half the world's landmass is seeing increasing drought trends, particularly across Africa, South Asia, and South America. This is tracked using SPEI-3, the Standardized Precipitation Evapotranspiration Index computed over a 3-month window, which tells us how wet or dry a place has been over that period. From 1990 to 2025, the data shows deepening reds (meaning worsening dryness) over the world's most vulnerable regions.

[Figure: SPEI-3 drought trends, 1990 to 2025]

So the big question becomes: Can we see it coming? Not days or weeks ahead, but years?

Our Approach: A Smarter Forecast with AI

Traditional models use physical simulations—think rainfall, temperature, river flow—to predict droughts. These are great for short-term forecasts (days to weeks), and decent for seasonal outlooks (months ahead). But they fall short when we ask, "What about next year?"

That's where multimodal machine learning (ML) comes in.

What is Multimodal ML?

In short: it means the AI uses more than one type of data. In our case, we're using:

  • Statistical climate data: How many floods, storms, or droughts have happened in a location? How much economic damage did they cause?
  • Textual knowledge: We scrape geography information from Wikipedia (yup, seriously!). These descriptions often encode insights like elevation, climate zones, and proximity to water—all in natural language.

The model combines these two very different “modalities” to create what we call embeddings—mathematical summaries of meaning—and uses them to forecast drought risk years into the future.

How It Works: From Raw Data to Risk Maps

We divide the Earth into a grid, with each square about 100 km by 100 km. For each grid cell, we do the following:

Step 1: Statistical Feature Engineering

We gather year-by-year summaries from disaster databases (like EM-DAT and GDIS): how many floods happened, how severe were they, what were the costs, etc. These numbers form a kind of “disaster fingerprint” for each location.
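As a rough illustration of this step, the sketch below assigns event records to grid cells and then builds per-cell, per-year summaries. Everything specific here is an assumption made for the example: the 1° by 1° cell size (about 111 km north to south, narrower east to west away from the equator), the `grid_cell` and `disaster_fingerprint` helpers, the column names, and the toy records, which are made up and do not follow the EM-DAT or GDIS schemas.

```python
import math

import pandas as pd


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a latitude/longitude pair to a (row, col) index on a regular grid.

    Assumption: cells are cell_deg degrees on each side, which only
    approximates a 100 km square.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    n_rows = int(round(180.0 / cell_deg))
    n_cols = int(round(360.0 / cell_deg))
    # Clamp so that lat == 90 or lon == 180 falls into the last cell.
    row = min(int(math.floor((lat + 90.0) / cell_deg)), n_rows - 1)
    col = min(int(math.floor((lon + 180.0) / cell_deg)), n_cols - 1)
    return row, col


# Toy, made-up event records (hypothetical column names).
events = pd.DataFrame({
    "lat": [30.27, 30.60, 30.27, -1.29],
    "lon": [-97.74, -97.10, -97.74, 36.82],
    "year": [2001, 2001, 2002, 2001],
    "hazard": ["flood", "drought", "flood", "storm"],
    "damage_usd": [1.0e6, 5.0e5, 2.0e6, 3.0e5],
})


def disaster_fingerprint(events):
    """Count events per hazard type and sum damages per grid cell and year."""
    cells = [grid_cell(lat, lon) for lat, lon in zip(events["lat"], events["lon"])]
    tagged = events.assign(
        cell_row=[c[0] for c in cells],
        cell_col=[c[1] for c in cells],
    )
    keys = ["cell_row", "cell_col", "year"]
    counts = (
        tagged.groupby(keys + ["hazard"]).size()
        .unstack("hazard", fill_value=0)
        .add_prefix("n_")
    )
    damage = tagged.groupby(keys)["damage_usd"].sum().rename("total_damage_usd")
    return counts.join(damage).reset_index()


features = disaster_fingerprint(events)
print(features)
```

Each row of `features` is one cell-year, with one count column per hazard type plus a damage total. A real pipeline would likely add severity measures and handle cell-years with no recorded events.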

Step 2: Geographic Text Embedding

Using Wikipedia pages, we extract paragraphs describing the geography of each region: its terrain, rivers, elevation, and so on. This text is processed with DistilBERT, a compact, distilled version of the BERT language model, to turn free text into numbers the model can understand.
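One plausible way to get a fixed-length vector per region is sketched below using the Hugging Face `transformers` library. The article does not say which checkpoint or pooling strategy is used, so the `distilbert-base-uncased` checkpoint, the mean pooling over tokens, and the helper names `mean_pool` and `embed_texts` are all assumptions for illustration.

```python
import numpy as np


def mean_pool(token_vectors, attention_mask):
    """Average token vectors per text, ignoring padded positions.

    token_vectors: array of shape (batch, tokens, hidden)
    attention_mask: array of shape (batch, tokens), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_vectors.dtype)
    summed = (token_vectors * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def embed_texts(texts, model_name="distilbert-base-uncased"):
    """Return one embedding per text (assumed checkpoint and pooling)."""
    # Imported lazily so mean_pool can be used without these heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden)
    return mean_pool(hidden.numpy(), batch["attention_mask"].numpy())
```

Calling `embed_texts(["A semi-arid plateau crossed by seasonal rivers."])` would download the checkpoint on first use and return an array with one row per input text.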

Step 3: Fusion & Forecasting

We merge the text and numeric features into a single long vector (a fusion embedding) and feed it into XGBoost, a gradient-boosted decision tree model. It learns patterns from the past to answer: Will this region face a drought in the next 1, 2, or 5 years?
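A minimal sketch of the fusion step, assuming simple concatenation of the two feature blocks. The helper names `fuse_features` and `train_drought_classifier` and the hyperparameter values are illustrative choices, not the settings used in the study.

```python
import numpy as np


def fuse_features(stat_features, text_embeddings):
    """Concatenate per-cell statistical features and text embeddings row by row."""
    stat_features = np.asarray(stat_features, dtype=np.float32)
    text_embeddings = np.asarray(text_embeddings, dtype=np.float32)
    if stat_features.shape[0] != text_embeddings.shape[0]:
        raise ValueError("both inputs need one row per grid cell")
    return np.hstack([stat_features, text_embeddings])


def train_drought_classifier(X, y):
    """Fit a binary drought classifier on fused features (assumed hyperparameters)."""
    # Imported lazily so fuse_features works without xgboost installed.
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X, y)
    return model
```

With a fitted model, `model.predict_proba(X_new)[:, 1]` gives a drought probability per cell. One simple design, again an assumption, is to train a separate classifier for each horizon (1, 2, and 5 years), each with its own labels.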

What We Found: Better Together

Our experiments showed that combining text + stats beat using either alone. Here's the performance comparison:

Years Ahead | Statistical Only | Multimodal (Text + Stats) | ROC AUC Gain
1 Year      | 54.4%            | 77.2%                     | +23%
2 Years     | 53.4%            | 76.4%                     | +23%
5 Years     | 53.9%            | 76.7%                     | +23%
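The column label suggests these scores are ROC AUC values. For readers new to the metric, ROC AUC is the probability that a randomly chosen drought case gets a higher predicted score than a randomly chosen non-drought case, so 50% is what random guessing achieves. The small function below (a hypothetical `roc_auc` helper, written for clarity rather than speed) computes it by direct pairwise comparison.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Read this way, a statistics-only score near 54% is only slightly better than chance, while a score near 77% means the model ranks drought cases above non-drought cases about three times out of four.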

That's a big win. But there’s more: we used fine-tuning and transfer learning on the language model to make it even better at recognizing what kinds of geography are most drought-prone.

💡 Concept You Should Understand

Transfer learning means taking a model trained on one task (like reading Wikipedia) and tweaking it to do a new one (like spotting drought-prone areas). This way, we get the power of a big model without needing tons of new data.

Why This Is a Big Deal

  • Agriculture: Predicting where water shortages may strike years ahead helps with crop planning, insurance pricing, and food security.
  • Sovereign Risk: Governments of drought-prone nations may face debt crises when agricultural output collapses. These models could feed into country credit ratings or sovereign bond pricing.
  • Insurance: Drought-related losses in crops or water resources could be factored into catastrophe models used in reinsurance and risk pooling.

And because the approach is generalizable, you could adapt it to other hazards such as wildfire, since those problems also hinge on forward-looking risk estimated from sparse, noisy data.

Seasonality Tip: Sine/Cosine Encoding

Droughts are seasonal—but months like December and January are close in time even though they're far apart numerically (12 and 1). To fix this, we use sine and cosine transformations to encode months in a circular way. This helps the model better "feel" time in a natural cycle.

Formula:

month_angle = 2π * (month - 1) / 12

month_sin = sin(month_angle)

month_cos = cos(month_angle)
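The formula translates directly into a few lines of Python. The helper names `encode_month` and `month_distance` are illustrative; the second one is included only to show that December and January end up as close together as any other pair of neighboring months.

```python
import math


def encode_month(month):
    """Return (month_sin, month_cos) for a month number from 1 to 12."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)


def month_distance(a, b):
    """Euclidean distance between two months in the encoded space."""
    return math.dist(encode_month(a), encode_month(b))


print(month_distance(12, 1))  # December to January
print(month_distance(1, 2))   # January to February, the same distance
print(month_distance(1, 7))   # January to July, opposite sides of the circle
```

Both values are needed: sine alone cannot tell apart months that mirror each other on the circle (February and June share the same sine), while the sine and cosine pair identifies every month uniquely.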


It’s a subtle touch, but hugely important in climate modeling.

Cool Stuff & Take Action

  1. Open Datasets: You can explore EM-DAT, GDIS, and Wikipedia-based scraping techniques.
  2. Live Examples: The Texas flood crisis of 2025 underscores how important early warnings are. Imagine if a model like this one had been available three years earlier!
  3. Hack for Good: Look into NASA's Space Apps Challenge or the Climate Change AI community.
  4. Start Simple: Even basic logistic regression with SPEI-3 values can tell you about drought dynamics.
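To make the "Start Simple" suggestion concrete, here is a minimal sketch of logistic regression on SPEI-3 values, fitted with plain gradient descent in NumPy. Several things are assumptions for the example: the SPEI-3 series is simulated (an autocorrelated random series, not real data), drought is defined as SPEI-3 at or below -1, and the task is predicting next month's drought status from this month's value. With real data you would swap in an actual SPEI-3 series and probably use a library implementation.

```python
import numpy as np


def simulate_spei(n_months=600, phi=0.8, seed=0):
    """Simulate a synthetic, autocorrelated SPEI-like series (not real data)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n_months)
    spei = np.zeros(n_months)
    for t in range(1, n_months):
        spei[t] = phi * spei[t - 1] + noise[t]
    return spei


def predict_proba(x, w, b):
    """Logistic model: probability of drought given a SPEI-3 value."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit a one-feature logistic regression by gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        p = predict_proba(x, w, b)
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b


spei = simulate_spei()
x = spei[:-1]                         # this month's SPEI-3
y = (spei[1:] <= -1.0).astype(float)  # assumed label: drought next month
w, b = fit_logistic(x, y)

for value in (-2.0, 0.0, 2.0):
    print(value, round(float(predict_proba(value, w, b)), 3))
```

On this synthetic series the fitted weight is negative, so drier current conditions imply a higher chance of drought next month. That persistence is the kind of basic drought dynamic a simple model can already reveal.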

A Final Thought

In Parable of the Sower, Octavia Butler imagines a world ravaged by drought, where water is worth more than gold, and entire communities are torn apart by desperation. It’s fiction—but it’s feeling less fictional by the year.

If we don’t invest in early warning systems, climate adaptation, and long-range predictive tools—like the one we’re building—we risk slipping into exactly that kind of future. One where we react too late, or not at all.

Machine learning won’t fix climate change on its own. But it can buy us time. And time is everything.

"We can't make it rain, but we can see the drought coming, and act before the well runs dry."

– MML Lab Members
Tags: Drought Prediction, Machine Learning, Climate Change