Philly Stat 360 · Residential Vacancy Risk Model

Finding the vacant homes the city hasn't found yet.

A machine learning model that scores all 520,000 Philadelphia parcels for vacancy risk — surfacing properties likely to be vacant that don't appear in current city records.

Explore the dashboard Read the methodology

0.940

AUC: area under the ROC curve, a measure of how well the model separates vacant from occupied parcels

84.0%

Sensitivity: share of actually vacant parcels the model correctly flags

89.8%

Specificity: share of occupied parcels the model correctly leaves unflagged

~436k

Parcels scored: every residential parcel in Philadelphia receives a calibrated probability score
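For readers who want to see what the three headline metrics measure, here is a minimal sketch in plain Python. The labels, scores, and 0.5 flagging threshold are toy values invented for illustration. They are not the model's data or its actual threshold.

```python
def sensitivity_specificity(labels, flags):
    """labels and flags are parallel 0/1 sequences (1 = vacant / flagged)."""
    pairs = list(zip(labels, flags))
    tp = sum(1 for y, f in pairs if y == 1 and f == 1)  # vacant, flagged
    fn = sum(1 for y, f in pairs if y == 1 and f == 0)  # vacant, missed
    tn = sum(1 for y, f in pairs if y == 0 and f == 0)  # occupied, unflagged
    fp = sum(1 for y, f in pairs if y == 0 and f == 1)  # occupied, flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Chance that a random vacant parcel outscores a random occupied one.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve; it is quadratic in the number of parcels, so it is
    only suitable for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.2, 0.8, 0.3, 0.1, 0.1, 0.05]
flags = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(labels, flags)
model_auc = auc(labels, scores)
print(sens, spec, model_auc)
```

Sensitivity and specificity depend on where the flagging threshold is set. AUC does not, which is why it is reported separately.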

The problem

Philadelphia has a vacancy problem. The records don't show all of it.

Vacant properties are one of the most visible signs of disinvestment in a neighborhood. They attract illegal dumping, reduce property values for surrounding owners, create fire hazards, and signal to residents that a block is being left behind.

The city's official vacancy count, the Vacant Property Indicator, is compiled from Licenses and Inspections (L&I) records, and it has a known gap. A building can sit empty for years before an inspector flags it or a neighbor files a complaint. The data reflects enforcement history, not ground truth.

That gap matters, because L&I can't inspect what it doesn't know about. Community development organizations, housing courts, and city planners deciding where to direct resources end up working from an incomplete picture.

This model was built to close part of that gap. It combines dozens of signals from public administrative data: code violation history, clean and seal actions, unsafe and imminently dangerous orders, business license records, building permits, parcel characteristics from OPA, and deed transfer history. The result is a probability score for every residential parcel in the city, and higher scores mean a property looks more like other properties that turned out to be vacant.

The score is not a final determination of vacancy, not a lien or seizure trigger, and not a substitute for field judgment. Its purpose is to give the people doing the work a calibrated starting point: a prioritized list of addresses worth a second look, based on data rather than chance or proximity to the last complaint.

How it works

From raw administrative data to a ranked list of addresses.

Data assembly

Six city datasets (Violations, Real Estate Transfer, OPA Properties, Spatial Lag, Clean & Seal, and Business Licenses) are joined at the parcel level.
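A parcel-level join of this kind can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code. The field names (`parcel_id`, `category`, `status`), the sample records, and the `assemble` helper are all hypothetical. The sketch collapses each event table to a per-parcel count and left-joins it onto the parcel list, so parcels with no events keep a row with a count of zero.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative, not the city's schema
opa_properties = [
    {"parcel_id": "P1", "category": "single_family"},
    {"parcel_id": "P2", "category": "multi_family"},
]
violations = [
    {"parcel_id": "P1", "status": "open"},
    {"parcel_id": "P1", "status": "closed"},
]
clean_and_seal = [{"parcel_id": "P1"}]


def count_by_parcel(records):
    """Collapse an event table to {parcel_id: number of events}."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec["parcel_id"]] += 1
    return counts


def assemble(parcels, **event_tables):
    """Left-join event counts onto the parcel list, one row per parcel."""
    counts = {name: count_by_parcel(recs) for name, recs in event_tables.items()}
    rows = []
    for parcel in parcels:
        row = dict(parcel)
        for name, table in counts.items():
            # Parcels absent from an event table get a count of zero
            row[f"n_{name}"] = table.get(parcel["parcel_id"], 0)
        rows.append(row)
    return rows


rows = assemble(opa_properties, violations=violations, clean_and_seal=clean_and_seal)
for row in rows:
    print(row)
```

Keeping one row per parcel, including parcels with no recorded events, matters here because an absence of records is itself information the model can use.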

Feature engineering

Raw fields are transformed into 34 predictive signals.

Model training

The pipeline trains four base learners (logistic regression, random forest, XGBoost, and LightGBM) on 34 features across 352K residential parcels. The final model blends the calibrated logistic regression and random forest into a 50/50 ensemble, validated with ZIP- and tract-grouped spatial cross-validation.
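Two ideas in this step are easy to show in miniature: grouped cross-validation, where every parcel in a given ZIP or tract lands on the same side of each split, and a 50/50 probability blend. The sketch below uses plain Python and toy values. The `grouped_folds` and `blend` helpers and the greedy fold assignment are assumptions made for illustration, not the pipeline's implementation.

```python
from collections import defaultdict


def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) so no group appears on both sides."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest parcels
    folds = [[] for _ in range(n_folds)]
    for g in sorted(members, key=lambda k: (-len(members[k]), k)):
        min(folds, key=len).extend(members[g])
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, test


def blend(p_logit, p_forest, weight=0.5):
    """Weighted average of two calibrated probability lists."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_logit, p_forest)]


# Toy ZIP codes per parcel, invented for illustration
zips = ["19121", "19121", "19132", "19132", "19132", "19104"]
splits = list(grouped_folds(zips, n_folds=3))
for train, test in splits:
    print("train", train, "test", test)

# Toy calibrated probabilities from the two blended learners
blended = blend([0.2, 0.8], [0.4, 0.6])
print(blended)
```

Grouping by geography guards against an optimistic validation score: neighboring parcels tend to resemble each other, so a random split would let the model be tested on parcels next door to ones it trained on.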

Probability scoring

The ensemble outputs a calibrated probability of vacancy for every residential parcel, expressed as a 0 to 100 risk score, a top-one-percent flag, and a five-tier rank bucket for dashboard display.
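The three display fields can be derived from the probability along these lines. This is a sketch under assumptions: the page does not say how the five tiers are cut, so equal-size rank groups are assumed here, with tier 1 as the highest risk. The `score_parcels` helper and the toy probabilities are hypothetical.

```python
import math


def score_parcels(probs, n_tiers=5, top_share=0.01):
    """Return one dict per parcel: 0-100 score, top-share flag, tier."""
    n = len(probs)
    # Rank from highest to lowest probability; ties keep input order
    order = sorted(range(n), key=lambda i: -probs[i])
    n_top = max(1, math.ceil(n * top_share))  # always flag at least one
    out = [None] * n
    for rank, i in enumerate(order):
        out[i] = {
            "risk_score": round(probs[i] * 100, 1),
            "top_flag": rank < n_top,
            # Assumed tiering: equal-size rank groups, 1 = highest risk
            "tier": rank * n_tiers // n + 1,
        }
    return out


# Toy calibrated probabilities, invented for illustration
probs = [0.05, 0.91, 0.12, 0.40, 0.33, 0.08, 0.75, 0.02, 0.60, 0.21]
scored = score_parcels(probs)
for p, row in zip(probs, scored):
    print(p, row)
```

The risk score is a rescaling of the probability, while the flag and tier depend only on a parcel's rank among all scored parcels.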

Two ways in

Start with the data. Go deep with the methodology.

The model produces two finished artifacts. Use the interactive dashboard to explore parcel-level scores across the city. Read the methodology report to understand how the model was built and validated, and what its limitations are.

Interactive tool

Vacancy Risk Dashboard

An interactive map and table showing predicted vacancy risk scores for every residential parcel in Philadelphia. Filter by neighborhood, risk tier, or parcel type.

Parcel-level risk scores mapped across Philadelphia
Filter by risk tier
Overlay with existing L&I vacancy designations
Export filtered results as CSV

Open dashboard

Technical report

Model Methodology & Validation

A full account of how the model was built: data sources, feature engineering, training approach, validation strategy, equity audit, and known limitations.

Data sources: L&I, OPA, Revenue
Ensemble model with calibrated probabilities
Spatial cross-validation across Philadelphia districts
Equity audit by income quartile

Read the report

About Philly Stat 360

Philly Stat 360 is the City of Philadelphia's performance management initiative. We track how city government is doing — across every department, in plain language — and publish the results for every resident to see.

This vacancy risk model is part of a broader effort to use data to make city services more proactive — finding problems before they become crises, and doing it fairly.

Get in touch

General questions: innovation@phila.gov
Data & methodology: Read the full methodology report
Interactive data: Open the vacancy risk dashboard
City standards: standards.phila.gov