From research model to crisis maps at planetary scale
Turning a research model into a production data pipeline that maps how people move during a crisis — for the relief teams responding on the ground.
| Client | A major social-media company's "data-for-good" team, building map-based products that NGOs and disaster-relief organizations use in the field. Anonymized at the client's request. |
|---|---|
| My role | Lead Data Engineer on the research-to-product effort. |
| Engagement | Project-based, ~4 months. |
| Core stack | Airflow, Hive, Spark, Presto; spatial tiling (quadkeys) over a map product built on OpenStreetMap. |
Context
The team builds map-based products that NGOs and disaster-relief organizations rely on to respond to crises around the world. It was also sitting on an enormous, untapped stream of telemetry: how mobile devices connect to cell towers globally, and how those connections shift and drop. Turned into population-movement signals, that data could reveal something hard to see any other way, namely how people actually move during a crisis.
The challenge
The data existed; the insight didn't. The team held a vast volume of raw connectivity telemetry and a promising statistical model from its researchers for turning it into movement signals, but that model had never been built into a reliable, scalable production pipeline. Planetary-scale telemetry sat unanalyzed, with nothing to convert it into daily, map-ready movement metrics that a relief worker could open and act on. And the people who would act on the output worked in disaster relief: they needed a map of where people were moving, not a notebook.
None of this was a quick fix. Extracting trustworthy signal from location telemetry at planetary scale is hard on its own. Doing it in a privacy-safe, production-grade pipeline that runs on a schedule, feeds a live product, and stays faithful to a researcher's model is harder still. And the team's strength was research, not scaling distributed data systems.
The approach
Understand the data and the research
I sampled the telemetry to grasp its shape at scale, then worked through the researchers' statistical model and back-tested it against past crises, confirming the movement signals it produced matched what actually happened on the ground.
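The write-up doesn't describe the back-testing procedure itself. One simple framing, assumed here purely for illustration, takes a tile's daily z-scores around a known crisis date and asks two questions: did the signal cross a threshold within a few days after the event, and how often did it cross before the event (a rough false-alarm count)? The function name, the 7-day window, and the threshold of 3 are hypothetical choices, not the team's actual criteria.

```python
from datetime import date, timedelta


def backtest_event(
    daily_z: dict[date, float],
    event_day: date,
    window_days: int = 7,
    threshold: float = 3.0,
) -> tuple[bool, int]:
    """Hypothetical back-test for one tile and one known crisis.

    Returns (hit, false_alarms): hit is True if |z| reaches the threshold on
    any day from event_day through event_day + window_days; false_alarms
    counts days before event_day on which |z| already reached it.
    """
    window_end = event_day + timedelta(days=window_days)
    hit = any(
        abs(z) >= threshold
        for day, z in daily_z.items()
        if event_day <= day <= window_end
    )
    false_alarms = sum(
        1 for day, z in daily_z.items() if day < event_day and abs(z) >= threshold
    )
    return hit, false_alarms


# Illustrative series: quiet days, then a sharp drop two days after the event.
series = {date(2020, 3, d): 0.5 for d in range(1, 15)}
series[date(2020, 3, 7)] = -4.2
print(backtest_event(series, event_day=date(2020, 3, 5)))  # (True, 0)
```

Running the same check across many past events and tiles gives a quick read on whether the model fires when it should and stays quiet otherwise.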
Engineer the model into a pipeline
I turned the validated model into net-new ETL: spatial aggregation over a standard map-tiling scheme (quadkeys) on top of OpenStreetMap, with the heavy computation designed to stay efficient rather than crunch for half a day. I tested Presto against Spark and moved key steps to Presto to cut the wall-clock time for data to land.
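Quadkeys come from the Bing Maps tile system, which splits a Web Mercator map into a quadtree: each character of a quadkey picks one of four child tiles, so a tile's key is a prefix of every tile nested inside it. That makes rolling fine tiles up to coarser ones a plain string truncation. Below is a minimal sketch of mapping a latitude/longitude to its quadkey, assuming each record has already been reduced to a point; the function names and zoom levels are illustrative, not the pipeline's actual code.

```python
import math

MAX_LAT = 85.05112878  # Web Mercator latitude limit used by the Bing Maps tile system


def tile_xy_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Interleave the bits of tile X/Y into a base-4 quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1  # east half of the parent tile
        if tile_y & mask:
            digit += 2  # south half of the parent tile
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat: float, lon: float, level: int) -> str:
    """Map a WGS84 point to the quadkey of the tile containing it at `level`."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lon = max(-180.0, min(180.0, lon))
    # Normalized Web Mercator coordinates in [0, 1]; y grows southward.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << level  # tiles per axis at this zoom level
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_xy_to_quadkey(tile_x, tile_y, level)


fine = latlon_to_quadkey(-1.2921, 36.8219, 16)  # a point in Nairobi, level 16
coarse = fine[:10]  # its level-10 parent tile, found by prefix
```

Because parents are prefixes, a query engine can aggregate once at a fine level and derive coarser map layers with a `GROUP BY` on a substring, rather than re-projecting coordinates.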
Validate, then wire it to the product
I validated the z-scores the pipeline generated against independent crisis data drawn from other sources, confirming the approach held. Then I added data-quality and monitoring alerts and partnered with front-end engineers to hook the daily output into the user-facing maps.
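The write-up doesn't define the baseline behind the z-scores. A common formulation, assumed here, scores a tile's value today against the mean μ and sample standard deviation σ of that tile's values over a baseline window: z = (x - μ) / σ. If the metric is a count of active devices, a large negative z would suggest people leaving a tile and a large positive z an influx. The function names and tile keys below are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional


def zscore_vs_baseline(baseline: list[float], value: float) -> Optional[float]:
    """z = (value - mean) / stdev of the tile's baseline window.

    Returns None when the baseline has no spread, since z is undefined there.
    """
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two observations")
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        return None
    return (value - mean(baseline)) / sigma


def tile_zscores(
    history: dict[str, list[float]], today: dict[str, float]
) -> dict[str, Optional[float]]:
    """Score every tile observed today that also has a baseline."""
    return {
        tile: zscore_vs_baseline(history[tile], value)
        for tile, value in today.items()
        if tile in history
    }


history = {"1202102332": [100, 110, 90, 100, 100]}
today = {"1202102332": 65, "3000000000": 12}
print(tile_zscores(history, today))  # the tile without a baseline is skipped
```

Scoring each tile against its own history, rather than a global average, keeps dense cities and sparse rural tiles on a comparable scale.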
What I built
A privacy-safe production data pipeline that runs alongside the team's existing infrastructure and feeds a live, map-based product:
- Connectivity telemetry aggregated into daily movement signals using spatial tiling (quadkeys) over an OSM-based map, tuned so planetary-scale jobs ran efficiently.
- Presto adopted over Spark for the steps that mattered, cutting the wall-clock time for data to land.
- Movement anomalies expressed as z-scores against a baseline and validated against independent crisis data, so the signal was real rather than noise.
- Data masked, stripped of PII, and aggregated to the tile level, so individuals were never identifiable in the output.
- Net-new ETL orchestrated in Airflow with data-quality and monitoring alerts, triggered alongside the existing DAGs and wired into the front-end maps with the product engineers.
Rather than replace anything, the pipeline ran in parallel to the main infrastructure, reusing common workflow steps in the DAG and triggering on the same schedule as the existing jobs.
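The privacy step in the list above (data masked, stripped of PII, and aggregated to the tile level) can be sketched as follows, assuming records arrive as (device identifier, day, fine-grained quadkey) tuples. The keyed hash (HMAC-SHA256), the function names, and the record shape are assumptions for illustration, not the pipeline's actual masking scheme. Identifiers are pseudonymized immediately, used only to count distinct devices per tile per day, and never appear in the output.

```python
import hashlib
import hmac
from collections import defaultdict


def mask_id(raw_id: str, key: bytes) -> str:
    """Keyed hash: stable token per device for this key; raw ID is not kept."""
    return hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_to_tiles(
    records: list[tuple[str, str, str]], level: int, key: bytes
) -> dict[tuple[str, str], int]:
    """Count distinct masked devices per (day, tile); output holds counts only.

    Assumes each record's quadkey is at least `level` characters long, so the
    prefix of that length is the parent tile at the target zoom level.
    """
    devices: dict[tuple[str, str], set[str]] = defaultdict(set)
    for raw_id, day, quadkey in records:
        tile = quadkey[:level]  # roll the fine tile up to its parent by prefix
        devices[(day, tile)].add(mask_id(raw_id, key))
    return {cell: len(tokens) for cell, tokens in devices.items()}


records = [
    ("dev-1", "2020-03-05", "1202102332221212"),
    ("dev-1", "2020-03-05", "1202102332221213"),  # same device, same parent tile
    ("dev-2", "2020-03-05", "1202102332000000"),
    ("dev-3", "2020-03-06", "3000000000000000"),
]
print(aggregate_to_tiles(records, level=10, key=b"demo-key"))
```

Counting distinct masked identifiers, rather than rows, keeps a single chatty device from inflating a tile's count.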
Results & impact
The work shipped a novel research idea as a production data product: 30 new crisis maps for 30 crisis events, each turning raw telemetry into a map relief teams could act on, and 80K+ downloads of those maps by NGOs and disaster-relief organizations. It gave responders visibility on the ground where there had been none, especially across parts of Africa, where movement during a crisis is otherwise very hard to track. Teams could see where people were actually going and allocate resources accordingly.
I led the data engineering end to end: sampling raw telemetry, building the model into ETL, and delivering a validated, monitored pipeline that fed a live product used by non-technical responders.
Why it matters
The domain was crisis response, but the capability is industry-agnostic: take a promising model and engineer it into a reliable, privacy-safe data product that plugs into existing infrastructure and lands in the hands of non-technical people. That is what turning AI from a research demo into a dependable business workflow requires.