Data Engineering

Real Estate Development Signal Aggregator

H3 hexagonal spatial scoring system that aggregates permits, rezoning, subdivisions, and property transfers to identify land development signals — with velocity tracking, spike alerts, and spatial adjacency scoring.

Python PostgreSQL H3 Docker FastAPI

Problem

Real estate developers and land speculators make decisions based on development signals — building permits filed, rezonings approved, subdivisions platted, properties changing hands. But these signals are scattered across county GIS portals, permit databases, and deed transfer records. By the time a developer manually notices a cluster of activity in an area, the opportunity window has often closed. The developers who win are the ones who see the pattern first.

The raw data exists publicly, but aggregating it, scoring it spatially, and tracking changes over time requires infrastructure that most small and mid-sized developers don’t have. They either check county websites by hand or pay for expensive commercial platforms that cover broad markets but miss hyperlocal signals.

Solution

Built a development signal aggregator that uses Uber’s H3 hexagonal grid system to score parcels based on nearby development activity. An ETL pipeline pulls permits, rezonings, subdivisions, and property transfers from county data sources daily, geocodes them, and maps each event to H3 hexagons. Every hex cell accumulates a composite score based on the volume and recency of development signals within it and its spatial neighbors.
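As a rough illustration of the "volume and recency" scoring described above, the sketch below groups events by hex and weights each one by an exponential decay on its age. The signal weights, the 90-day half-life, and the `Event` type are assumptions made for this example, not values from the system. Hex IDs are treated as opaque strings; in practice they could come from the `h3` library (for example `latlng_to_cell` in h3-py v4).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical weights per signal type; the real weights are not documented here.
SIGNAL_WEIGHTS = {"permit": 1.0, "rezoning": 3.0, "subdivision": 2.0, "transfer": 0.5}
HALF_LIFE_DAYS = 90.0  # assumed: an event loses half its weight every 90 days


@dataclass(frozen=True)
class Event:
    hex_id: str     # H3 cell ID, treated as an opaque string
    kind: str       # one of the keys in SIGNAL_WEIGHTS
    filed_on: date


def hex_scores(events, as_of):
    """Sum recency-decayed signal weights per hex cell."""
    scores = defaultdict(float)
    for ev in events:
        age_days = (as_of - ev.filed_on).days
        if age_days < 0:
            continue  # ignore events dated after the scoring date
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        scores[ev.hex_id] += SIGNAL_WEIGHTS[ev.kind] * decay
    return dict(scores)


events = [
    Event("hex_a", "rezoning", date(2024, 6, 1)),  # filed today: full weight
    Event("hex_a", "permit", date(2024, 3, 3)),    # 90 days old: half weight
    Event("hex_b", "transfer", date(2024, 6, 1)),
]
scores = hex_scores(events, as_of=date(2024, 6, 1))
print(scores)  # {'hex_a': 3.5, 'hex_b': 0.5}
```

A decay function keeps old activity from dominating the score without needing a hard cutoff date.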

The system tracks score history over time, calculates velocity (how fast a hex’s score is changing), and fires alerts when scores spike — indicating a sudden burst of development activity in an area that was previously quiet.

Architecture

County Data Sources
    ├── Wake County Permits
    ├── Rezoning Applications
    ├── Subdivision Plats
    └── Property Transfers
            │
            ▼
    ETL Pipeline (daily at 2 AM)
    Geocode → H3 Hex Assignment → Score Calculation
                                        │
                                        ▼
                                   PostgreSQL
                                        │
                    ┌───────────────────┼───────────────────┐
                    │                   │                   │
            Score History      Velocity Tracking     Spike Alerts
            (per hex, daily)   (rate of change)      (threshold-based)
                    │                   │                   │
                    └───────────────────┼───────────────────┘
                                        │
                                        ▼
                                     FastAPI
                                        │
                                        ▼
                                CSV Export / API

H3’s hexagonal grid is key to the spatial scoring. On a square grid, diagonal neighbors sit farther from the center than edge neighbors. On a hexagonal grid, all six adjacent cells are roughly the same distance from the center. This keeps adjacency scoring simple: activity in a neighboring hex contributes the same amount to a parcel’s score regardless of which direction it’s in.
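To make the distance argument concrete, the sketch below compares center-to-center distances on an idealized flat hexagonal grid with those on a square grid that counts diagonal neighbors. It uses plain axial coordinates as a stand-in. Real H3 cells sit on a sphere, so their neighbor distances are only approximately equal.

```python
import math

# Axial-coordinate offsets of the six neighbors of a hexagon (pointy-top layout).
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# Offsets of the eight neighbors of a square cell, diagonals included.
SQUARE_NEIGHBORS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def hex_center(q, r, size=1.0):
    """Convert axial coordinates to the x, y center of a pointy-top hexagon."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


hex_distances = {round(math.hypot(*hex_center(q, r)), 6) for q, r in HEX_NEIGHBORS}
square_distances = {round(math.hypot(dx, dy), 6) for dx, dy in SQUARE_NEIGHBORS}

print(hex_distances)     # one distinct distance: sqrt(3) times the hex size
print(square_distances)  # two distinct distances: 1 and sqrt(2)
```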

Key Decisions

H3 hexagonal indexing over arbitrary geofencing. Traditional approaches define neighborhoods or zones manually, which breaks when development patterns don’t follow political boundaries. H3 cells are hierarchical (the same area can be indexed at a coarser or finer resolution), tile the map without gaps, and have near-uniform adjacency. A hex’s score incorporates activity from all six neighbors equally, creating a natural spatial smoothing effect that reveals development corridors.
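A minimal sketch of the adjacency idea: each hex's smoothed score is its own raw score plus an equally weighted share of its neighbors' raw scores. The `neighbor_weight` of 0.5 and the function names are assumptions. Neighbor lookup is passed in as a function so the example runs without H3 installed; with h3-py v4, `h3.grid_disk(cell, 1)` minus the cell itself could fill that role.

```python
def axial_neighbors(cell):
    """Six neighbors of a cell on a flat axial hex grid (stand-in for an H3 lookup)."""
    q, r = cell
    return [(q + 1, r), (q + 1, r - 1), (q, r - 1), (q - 1, r), (q - 1, r + 1), (q, r + 1)]


def smoothed_scores(raw, neighbors_of, neighbor_weight=0.5):
    """Add an equal share of each neighbor's raw score to every cell.

    Cells with no events of their own still get a score if a neighbor is active.
    """
    cells = set(raw)
    for cell in raw:
        cells.update(neighbors_of(cell))
    return {
        cell: raw.get(cell, 0.0)
        + neighbor_weight * sum(raw.get(n, 0.0) for n in neighbors_of(cell))
        for cell in cells
    }


raw = {(0, 0): 4.0, (1, 0): 2.0}
smooth = smoothed_scores(raw, axial_neighbors)
print(smooth[(0, 0)])   # 5.0: own score 4.0 plus half of neighbor (1, 0)
print(smooth[(-1, 0)])  # 2.0: no events of its own, half of neighbor (0, 0)
```

Because every direction gets the same weight, a run of active hexes lifts the cells along it, which is how a corridor becomes visible.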

Velocity over absolute scores. A hex with a high absolute score might just be in a mature, built-out area — lots of historical permits but no new opportunity. Velocity tracking (score change over time) surfaces the interesting signal: areas that are accelerating. A hex that went from score 5 to score 25 in 30 days is more actionable than one that’s been at 50 for two years.
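One way to express velocity is the change in score over a fixed lookback window. The sketch below assumes each hex's history is available as a mapping from snapshot date to score and uses a 30-day window. Both the data shape and the function name are illustrative.

```python
from datetime import date, timedelta


def velocity(history, as_of, window_days=30):
    """Score change per day over the window ending at `as_of`.

    `history` maps date -> score for a single hex. Returns None when either
    endpoint snapshot is missing, rather than guessing a value.
    """
    start = as_of - timedelta(days=window_days)
    if as_of not in history or start not in history:
        return None
    return (history[as_of] - history[start]) / window_days


today = date(2024, 6, 30)
accelerating = {date(2024, 5, 31): 5.0, today: 25.0}
mature = {date(2024, 5, 31): 50.0, today: 50.0}

print(velocity(accelerating, today))  # (25 - 5) / 30, about 0.67 points per day
print(velocity(mature, today))        # 0.0
```

The mature hex has the higher absolute score but zero velocity, so ranking by velocity puts the accelerating hex first.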

Spike alerts with configurable thresholds. Not every score increase matters. Spike detection uses configurable thresholds — both absolute (score exceeded X) and velocity-based (score increased by Y% in Z days). This lets users tune sensitivity based on their investment thesis: aggressive speculators want early signals, conservative developers want confirmed trends.
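The two threshold types could be combined along the following lines. This is a sketch under assumptions: `SpikeRule`, its field names, and the sample threshold values are hypothetical, and the history is again a date-to-score mapping for one hex.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SpikeRule:
    min_score: float       # absolute trigger: score exceeded X
    min_pct_change: float  # velocity trigger: score rose by Y percent...
    window_days: int       # ...within Z days


def pct_change(previous, current):
    """Percent change, treating any rise from zero as an unbounded increase."""
    if previous <= 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous * 100.0


def spike_reasons(history, as_of, rule):
    """Return the list of triggers that fired for one hex on `as_of`."""
    current = history[as_of]
    previous = history.get(as_of - timedelta(days=rule.window_days))
    reasons = []
    if current > rule.min_score:
        reasons.append("absolute")
    if previous is not None and pct_change(previous, current) >= rule.min_pct_change:
        reasons.append("velocity")
    return reasons


today = date(2024, 6, 30)
history = {date(2024, 5, 31): 5.0, today: 25.0}
aggressive = SpikeRule(min_score=20, min_pct_change=50, window_days=30)
conservative = SpikeRule(min_score=40, min_pct_change=500, window_days=30)

print(spike_reasons(history, today, aggressive))    # ['absolute', 'velocity']
print(spike_reasons(history, today, conservative))  # []
```

The same history produces an alert under the aggressive rule and none under the conservative one, which is the tuning behavior described above.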

Daily automated ETL at 2 AM. County data sources update overnight. Running the ETL at 2 AM catches fresh data before business hours. The pipeline is idempotent — re-running it doesn’t create duplicates. Failed runs log errors and retry on the next cycle.
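Idempotency of this kind is commonly achieved by giving every source record a natural key and letting the database reject repeats. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs anywhere. The `events` table, its columns, and the sample record ID are hypothetical. The `ON CONFLICT ... DO NOTHING` clause is also valid PostgreSQL syntax, although a PostgreSQL driver such as psycopg uses `%s` placeholders rather than `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        source     TEXT NOT NULL,   -- e.g. 'permits'
        source_id  TEXT NOT NULL,   -- the county's own record identifier
        hex_id     TEXT NOT NULL,
        filed_on   TEXT NOT NULL,
        PRIMARY KEY (source, source_id)
    )
    """
)


def load(conn, rows):
    """Insert rows, skipping any whose (source, source_id) already exists."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (source, source_id, hex_id, filed_on) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT (source, source_id) DO NOTHING",
            rows,
        )


batch = [("permits", "P-0001", "hex_a", "2024-06-01")]
load(conn, batch)
load(conn, batch)  # simulated re-run of the same pipeline day
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1
```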

Results

  • H3 hexagonal spatial scoring across Wake County
  • Daily ETL pipeline pulling permits, rezonings, subdivisions, and property transfers
  • Score history with per-hex daily snapshots
  • Velocity tracking showing acceleration/deceleration of development activity
  • Spike alerts for sudden activity bursts in previously quiet areas
  • Spatial adjacency scoring — activity near a parcel raises its score
  • CSV export for analysis in external tools
  • Full Docker stack with automated daily ETL

How This Scales

  • Multi-county expansion — Extend ETL pipelines to Durham, Orange, Johnston, and other Triangle-area counties. Same H3 scoring engine, new data source adapters.
  • NCDOT and utility data — Incorporate road widening projects, sewer extension permits, and water/power infrastructure investments as additional development signals. Infrastructure precedes development.
  • SaaS productization — Web dashboard with hex map visualization, watchlist alerts, and custom threshold configuration. Target pricing at $199/mo for individual developers, higher for firms.
  • Historical backtesting — Validate the scoring model against historical development outcomes. Did areas that spiked in 2022 actually see construction in 2024?
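
A backtest along the lines of the last item could start as simply as comparing the set of hexes that spiked in one period with the set that later saw construction. The sketch below computes precision and recall from two such sets. The hex IDs are placeholders for illustration, not real results.

```python
def precision_recall(alerted, built):
    """Precision: share of alerted hexes that were later built on.
    Recall: share of built-on hexes that had been alerted."""
    hits = len(alerted & built)
    precision = hits / len(alerted) if alerted else 0.0
    recall = hits / len(built) if built else 0.0
    return precision, recall


alerted_2022 = {"hex_a", "hex_b", "hex_c", "hex_d"}  # placeholder IDs
built_2024 = {"hex_a", "hex_b", "hex_e"}             # placeholder IDs
precision, recall = precision_recall(alerted_2022, built_2024)
print(precision, recall)
```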

Tech Stack

  • ETL: Python, scheduled daily at 2 AM
  • Spatial: H3 hexagonal indexing, geocoding
  • Database: PostgreSQL (score history, velocity, alerts)
  • API: FastAPI (score queries, CSV export)
  • Deployment: Docker Compose
