C.Eng · Builder · Massachusetts Institute of Technology

Methodology of record · v0.6.5 · Updated 2018-10-20 01:46 UTC

Platt: New York Clean-Energy Site-Screening Methodology

Executive summary

Platt fuses the public datasets that govern clean-energy siting in New York into one geospatial database and computes a deterministic 0–100 screening score for every parcel in the state. The inputs span utility hosting capacity, interconnection-queue pressure, parcel geometry, terrain, environmental constraints, and building footprints. The live model, v0.6.5, scores 2,594,588 parcels across sixteen counties: the five New York City boroughs plus Onondaga, Albany, Monroe, Erie, Westchester, St. Lawrence, Oneida, Orange, Chautauqua, Oswego, and Broome. The footprint grows county by county under the breadth program (spec 047, DL-049, DL-050), ranked by queued community-DG demand, with the same engine and data rules per county. Every score row carries the per-feeder grid intelligence behind it, cites its sources, and reproduces exactly on re-run. A developer shortlists viable sites in seconds. A lender gets a source-cited screen for diligence.

The validation verdict bounds every claim in this document. Against real build outcomes the model fails its pre-registered gate: within-county lift@50 = 1.18, 95% CI [0.863, 1.476]. The lower bound sits below parity (1.0). The model also loses to simple size baselines: its CI upper bound (1.476) sits below every size baseline's point estimate (building count 2.000, acreage 1.912, roof area 1.906). Once demand is de-confounded, the screen sits at parity with random (1.082 vs 1.038). §2 explains why no model could currently clear that bar from New York's data. Platt is a site screen. The path to a calibrated predictive claim is the forward-accrual program in §6.3.

The number itself is a relative suitability index that ranks parcels by interconnection and land conditions. It is not a probability (a 70 is not "70% likely"), and it has not been validated as a predictor of which parcels will actually be built. Earlier documents and some persisted API fields call this number the "feasibility score"; that is legacy naming, retained for compatibility only. This document calls it a screening score throughout, because §2 and §6.3 establish that feasibility itself is not identifiable from the available data.

What the validation does support is internal validity. The shortlist is robust to ±25% weight perturbation, and the physically meaningful inputs drive the ranking. These results need no outcome data and hold across the footprint; last re-run and holding on the v0.5.1 snapshot (§6.1), not yet re-run on the v0.6.x snapshots.

The rest of this document specifies the complete engine (every weight, threshold, and value function) and reports the validation results.

Product overview

The methodology below runs as a working product. The three screens are from the production build, captured 2026-06-11.

Platt atlas screenshot: an Onondaga County street-level map with parcel footprints shaded by screening score, a substation marker, a score legend, and a ranked parcel table beside the map — The production atlas over Onondaga County at street zoom: parcel footprints are shaded by screening score, the substation marker comes from the same hosting-capacity record the score reads, and the table beside the map ranks the same parcels by score.

Platt parcel page screenshot: a rooftop parcel scoring 100 with high confidence, showing the score path, the eligibility-gate verdict, and the parcel location map — A parcel page for a 2.1-acre Onondaga rooftop site: the score path panel carries the value arithmetic, the gate panel states why the parcel passed and why ground-mount is ineligible at this acreage, and the headline score sits beside the confidence tier and the screening disclaimer.

Platt diligence report screenshot: a generated report with summary, site, grid position, interconnection history, and cost evidence sections, each sentence carrying inline citations — The generated diligence report for a Salina warehouse parcel: every claim about the site, its grid position, the feeder's interconnection history, and the cost evidence carries an inline citation back to a named source.

1. What Platt is, and the problem it solves

Siting a distribution-scale clean-energy project in New York (community solar, battery storage, or solar-plus-storage) is slow, expensive, and data-fragmented. A developer's first question for any parcel is is this worth pursuing?, and answering it means assembling facts that no single public source holds: how much distribution hosting capacity the nearest usable feeder has, how far that feeder is, how congested its interconnection queue is, whether the parcel sits in a Con Edison secondary-network area where customer-sited export is constrained by default, how steep the land is, how much of it survives wetlands and floodplain, and how large a usable rooftop it carries. Those facts live across six investor-owned utilities, several state and federal regulators, and dozens of GIS layers that share no keys, units, or refresh cadences. Done by hand, one parcel at a time, this is days to weeks of GIS work.

Platt does it once, for the whole state. We fuse these public datasets into a single geospatial database and compute a deterministic 0–100 screening score for every parcel: today 2,594,588 parcels across sixteen counties, the five New York City boroughs (Bronx, Kings, New York, Queens, Richmond) plus Onondaga, Albany, Monroe, Erie, Westchester, St. Lawrence, Oneida, Orange, Chautauqua, Oswego, and Broome. A developer shortlists viable sites in seconds instead of weeks; a lender gets an auditable, source-cited screen for diligence. Every score row carries its exact inputs, with citations for the parcel, environmental-flag, substation, and hosting-capacity sources (§5), and the score is fully reproducible: re-running the engine on the same data snapshot returns the same number, to the decimal.

The score is a screen. §2 defines what it measures and its limits.

2. What the score measures

The 0–100 score is a deterministic site screen: a relative suitability index that ranks parcels by interconnection and land conditions for a distribution-scale solar project. It is not a probability. A 70 does not mean "70% likely to be built" or "70% likely to be feasible." It means the parcel ranks above one scoring 60 on the modeled suitability dimensions, under the weights and value functions documented in §3. The framing is borrowed from presence-only species-distribution modeling, where a model yields a relative suitability surface and the absolute prevalence is, by construction, not recoverable from the data.

The live model, v0.6.5, ships labeled improved, not-yet-validated interim. It is a measured improvement over its predecessors and passes the internal-validity tests of §6.1 (last re-run on the v0.5.1 snapshot), but it has not been shown to predict real build outcomes. The reason is structural, and it shapes every design choice that follows.

Why feasibility is not identifiable from "where solar got built." The only positive examples available are completed community-distributed-generation installs (call them CDG installs). This is a presence-only record (we observe successes and no labeled set of true negatives), and it carries two structural problems:

A survivorship and demand confound. A completed install reflects, roughly, P(build) ≈ feasibility × developer demand × market access × timing. Feasibility is only one factor in that product. "Where solar got built" therefore encodes where developers chose to build and succeeded, which is driven heavily by demand and market access. Feasibility and demand are not separately identifiable from successes alone.
A selection (collider) problem. The natural way to obtain negatives is to look at projects that applied to interconnect but did not energize. But application is itself a choice that sits downstream of both feasibility and demand. Conditioning on it does not close the confound; it opens a spurious feasibility–demand association (a Berkson-type selection effect). That induced association is plausibly negative, so it biases an outcome test against the model. But the population question "is this site feasible?" is not recoverable from applicant-conditioned, positives-only data.

When we de-confound for demand using a target-group-background design (we compare installs not against all land but against a background of plausible candidate parcels, here parcels with a usable roof ≥ 200 m², so that "demand lives where buildings are" is differenced out), the screen performs at parity with random (model within-county lift ≈ 1.082; random ≈ 1.038), and on the raw outcome test it loses to simple size baselines (§6.2). Platt is a site screen. It has no validated predictive claim.

Figure 4 · What a score means: relative suitability bands

The 0–100 output is a RELATIVE ranking, not a probability: a parcel at 78 ranks above one at 62, but neither number is a likelihood of being built. These bands are reading aids for the index, not calibrated outcome rates.

Figure 5 · Why feasibility is not identifiable from “where solar got built”

A completed install reflects feasibility × developer demand × market access jointly, and we observe only the successes: a presence-only record with no labeled negatives. So a density test of installs rewards whatever drives demand (parcel size, urban access), not feasibility, and the model honestly sits at parity with those size proxies (§6.2). Drawing negatives from interconnection applicants does not help: application is a collider downstream of both feasibility and demand, so conditioning on it opens a spurious path (Berkson) rather than closing the confound.

Modeling approach: why there is no trained model

Platt is a deterministic weighted-and-gated screen. There is no fitted machine-learning model behind it. We did not train a classifier on "where solar got built," and we did not fit a probability calibration. Two reasons, in order of importance:

Identifiability. As above, the only labels available are presence-only, applicant- conditioned, and demand-confounded. A model trained on them would learn where developers chose to build, which is demand and market access. It would not learn feasibility. Model capacity does not repair a label that fails to identify the target: a flexible learner would encode the confound in its output. We do not fit to those labels until cleaner ones exist.
Auditability. Year-two users are project-finance lenders and underwriters. Lenders can audit a screen whose every weight, threshold, and value function is inspectable and reproducible. They cannot audit a black-box probability. The screen also fails safe: a missing input resolves to a neutral discount of 1.0.

Calibration is deferred and gated. Once the forward-accrual program (§6.3) yields a point-in-time, less-confounded outcome set over an expanded footprint, the next step is to fit a monotone calibration from the score to an empirical energization rate: beta or Platt scaling while the resolved set is small, and isotonic regression once it is not (§6.3). Only then does the output become a calibrated probability rather than a relative index, and only if the identification assumptions hold.

3. The four-stage model

The model is gated additive-multiplicative: a hard eligibility gate (Stage 0) is checked before any score is computed, a weighted additive core then scores the surviving parcels, and every cost or friction is applied as a multiplicative discount. Nothing is ever subtracted. A constraint can dampen a score, but a high score can never "pay it back." Since the land/roof lens split (DL-067) the live headline is the GROUND-MOUNT (land) result, and on the live v0.6.5 population the gate excludes 2,505,689 of 2,594,588 parcels, leaving 88,899 land-eligible: below_min_viable_ground_mount (2,322,490), slope (121,059), undevelopable_land (49,181, the wetland/floodplain-dominant union, DL-077), duplicate_parcel_record (11,582, the DL-053 integrity dedup re-applied in v0.6.5, §7), open_water (525), no_hosting_capacity_within_range (507), and critical_habitat (345). Since v0.6.0 the gate binds materially, where the original additive model excluded almost nothing and the gating was carried by the multiplicative discounts. This architecture is the central correction over the original additive model (§8).

Two project archetypes are scored for every parcel: ground_mount and rooftop. The headline score is that of the first eligible archetype in preference order, ground-mount first, then rooftop. The higher-scoring of the two does not win. The archetypes renormalize different weight vectors onto non-comparable scales (rooftop weights three inputs, ground-mount four), so taking the maximum would let a renormalized rooftop score spuriously out-rank a ground- mount composite on a large rural parcel that is in fact a ground-mount site. Both full breakdowns are always retained alongside the headline. If neither archetype is eligible, the parcel failed a shared hard gate, and the headline row reports the rooftop result's exclusion reason.

Figure 6 · The four-stage non-compensatory pipeline

00GATE

Eligibility gate

Exclude with a stated reason

01SCORE

Graded suitability

(Σ wᵢ·vᵢ)·Π dⱼ

02SIGNALS

Economic & interconnection

Surfaced alongside

03CONFIDENCE

Confidence label

high / medium / low

Hard disqualifiers fire in Stage 0 before any score; survivors are graded multiplicatively in Stage 1; Stages 2-3 surface economic signals and a data-quality confidence label.

Stage 0: Eligibility gate (hard exclusions)

The gate returns (eligible, exclusion_reason, flags). A disqualified parcel is excluded with a stated reason. It is never assigned a small positive score. The gates fire in order; thresholds are exact and apply to both archetypes unless noted. (On the v0.6.0 population these gates exclude 171,391 of 2,594,588 parcels; the gate now binds. See the note below.)

#	Condition	Exclusion reason	Applies to	Rationale
1	`available_hc_mw` is null or ≤ 0	`no_hosting_capacity`	both	A parcel whose nearest usable feeder has no PV hosting capacity cannot interconnect.
2	`dist_to_feeder_m` is null or > 15,000 m	`no_hosting_capacity_within_range`	both	The nearest-feeder join has no distance cap, so it always finds some feeder with PV > 0. Beyond 15 km that feeder is effectively unreachable; we exclude rather than credit headroom the parcel cannot reach.
3	`in_critical_habitat` is true	`critical_habitat`	both	Federal designated critical habitat is a hard exclusion.
4	`slope_mean_pct` is not null and ≥ 15.0%	`slope`	ground-mount only	15% is the permissive edge of the developer range (grade-cost onset is ~5–10%; 15% is a common special-permit trigger in model solar zoning). A hard cut atop the graded slope penalty.
5	`net_buildable_acres` is null or < 8.0 acres	`below_min_viable_ground_mount`	ground-mount only	8 net acres is about 1 MW-AC at the NREL land-use benchmark (roughly 7 to 8 ac/MW-AC), near the minimum-viable ground-mount project size.
6	no measured building (`max_usable_roof_m2` null or < 10 m²) AND (assessment class is vacant land (3xx) OR ≥ 90% wetland)	`no_rooftop_host`	rooftop only	A rooftop install needs a building, but the free footprint layer misses ~5.8% of parcels (150,583 of 2,594,588 flagged `roof_footprint_missing` in the live v0.6.2 snapshot; Overture, v0.6.1). Exclude only on positive evidence of no building (vacant class or near-total wetland); an improved-class parcel with a missing footprint is kept and flagged `roof_footprint_missing`. v0.6.0 (DL-051).
7	largest CONTIGUOUS buildable piece < 8.0 acres	`below_min_viable_ground_mount`	ground-mount only	8 acres of scattered slivers is not a viable array. Measured geometrically (polygon minus buildings/wetlands/floodplain/open water, largest connected component) for parcels that pass the scalar gate. v0.6.0 (DL-051); open water added to the cut in v0.6.4 (DL-075).
8	`max(open_water_pct, undevelopable_pct)` > 90% AND no measured contiguous dry block ≥ the archetype floor	`open_water` (open water itself > 90%) or `undevelopable_land` (union-triggered)	ground-mount + storage	A parcel that is predominantly water/wetland/floodplain has no dry land to site on. Keys on the deduped undevelopable UNION (DL-075) so cross-layer water dominance is caught; a MEASURED contiguous dry block at or above the floor (8 ac ground-mount / 2 ac storage) overrides the kill and the parcel is judged on that block (DL-077). Rooftop is exempt (a dry shoreline building is a legitimate host; gate 6 catches buildingless water parcels).

The 15 km eligibility cap is generous, and it diverges from the grid- proximity value function, which already decays to zero by 5 km (§3.1). A parcel 5–15 km from its feeder therefore scores low but stays eligible: it could in principle justify a costly line extension, and a hard cut there would be a false negative.

A non-gating flag, coned_network_export_constrained, is appended for any parcel whose centroid falls inside a Con Edison secondary-network area. The parcel stays eligible; the export economics are priced as a discount in Stage 1.

net_buildable_acres computation (v0.6.0, DL-051). Net buildable acres starts from the measured GIS polygon area, subtracts the building footprint, then removes the wetland and floodplain coverage. The assessment-roll deed acreage is not the area base:

gis_acres  = ST_Area(geometry::geography) / 4046.8564     # geodesic polygon acres
roof_acres = parcels_roof.roof_area_m2     / 4046.8564     # measured building footprint
base       = gis_acres if gis_acres > 0 else total_acres   # deed only as a no-geometry fallback

if base is null or ≤ 0:        net_buildable_acres = null
elif in_critical_habitat:      net_buildable_acres = 0.0
else:
    after_roof = max(0, base − roof_acres)
    net_buildable_acres = max(0, after_roof × (1 − clamp01((wetlands_pct + floodplain_pct) / 100)))

The area base is the surveyed polygon, not the roll acreage, because the roll value is a legal and tax attribute that carries data-entry typos (a 0.26-acre polygon recorded as 75 acres); the polygon is the authoritative spatial measure (IAAO; NREL reV; GLAES). The building footprint is subtracted because a parcel's structures are not developable ground. Slope is no longer applied in this term: it is a hard ground-mount gate at 15% (gate 4) plus a graded scoring penalty (§3.2), so a slope-halving here was a double count. No packing or ground-coverage fraction is applied, because the 7.5 to 8 ac/MW-AC sizing downstream already embeds it (Ong et al. 2013, NREL/TP-6A20-56290). A non-gating QC flag, acreage_roll_gis_mismatch, marks parcels whose roll acreage diverges from the polygon beyond the IAAO size-band tolerance.

A note on what the gate currently does (v0.6.0). The gate now binds: it excludes 171,391 of 2,594,588 parcels — no_rooftop_host 170,539 (no measured building, and the roll calls it vacant land or it is near-total wetland/water), no_hosting_capacity_within_range 507, and critical_habitat 345. The two-signal rooftop rule deliberately keeps real buildings the free footprint layer misses: 150,583 parcels with no footprint but a building-use class stay eligible, flagged roof_footprint_missing in the live v0.6.2 snapshot — 5.80% of the 2,594,588 total (down from 206,056 since v0.6.1 unified footprints to Overture and closed much of the gap, DL-052). A separate contiguity pass demotes 4,794 ground-mount parcels whose buildable land is fragmented below a contiguous 8 acres. The cost is just 2 of the 500 known install-parcels (0.4%) now gate false-negatives, from building-footprint-layer gaps. See §7.

Stage 1: Graded suitability (the core formula)

For a surviving parcel, the suitability of an archetype is the weighted sum of value functions, multiplied by the product of discounts, clamped to [0, 1] and scaled to [0, 100]:

suitability = clamp01( ( Σᵢ weightᵢ · valueᵢ )  ·  Πⱼ discountⱼ )
score       = round( suitability × 100.0 , 1 )

A missing discount input resolves to 1.0, the multiplicative neutral.

3.1 Weights (each archetype's weights sum to 1.0)

Input	Ground-mount weight	Rooftop weight
Hosting-capacity headroom (`v_hc_headroom`)	0.40	0.40
Grid proximity (`v_grid_proximity`)	0.30	0.30
Buildable area (`v_buildable`)	0.20	n/a
Roof area, usable (`v_roof_area`)	n/a	0.30
Slope / terrain (`v_slope`)	0.10	n/a

The weights encode one domain principle: for distribution-scale solar, grid access dominates land. Hosting-capacity headroom and grid proximity together carry 0.70 of the weight in both archetypes. Rooftop has no terrain or buildable-land term; the 0.30 that ground-mount splits across buildable (0.20) and slope (0.10) is repurposed entirely to usable roof area (0.30). The weights are engineering judgments without empirical calibration; §6.1 reports the test showing the ranking is robust to them.

Figure 7 · Input weights by archetype

Grid access dominates land: hosting-capacity headroom + grid proximity carry 0.70 in both archetypes. Rooftop repurposes ground-mount’s buildable + slope share entirely to usable roof area. Weights are uncalibrated engineering judgments.

3.2 Value functions `vᵢ : input → [0, 1]`

Every value function is a continuous, monotone, saturating piecewise-linear interpolation over a fixed set of anchor points: below the first anchor the value equals the first anchor's y; above the last it equals the last anchor's y; in between it is linear. There are no step-band discontinuities. Anchors are stated as (input, value) pairs.

v_hc_headroom: available hosting capacity in MW. Null or ≤ 0 → 0.0.

MW headroom	0.0	1.0	3.0	5.0
value	0.00	0.40	0.75	1.00

The value saturates at 5 MW because 5 MW-AC is the distribution-level interconnection ceiling for this project class under New York's Standardized Interconnection Requirements (effective August 1, 2025). A CDG project cannot exceed it, so headroom beyond 5 MW carries zero marginal value. The ceiling comes from regulation; grid physics may allow more.

v_grid_proximity: straight-line distance from the parcel centroid to the nearest usable feeder line, in metres. Null → 0.0.

metres	0	250	750	1,500	3,000	5,000
value	1.00	0.95	0.80	0.45	0.10	0.00

The curve decays to ~0 by 3 km and fully to 0 by 5 km, anchored to interconnection-cost economics rather than raw distance: a sub-5 MW CDG project can typically justify a line extension of ~1–3 km; beyond ~3 km the upgrade cost is usually fatal. The applicant bears that cost, and no rule caps an individual applicant's share. Two points of precision (critique C5): the interconnection cost-sharing mechanism the PSC put in place across 2021 and 2022 (Case 20-E-0543; compliance filings approved April 14, 2022) caps a utility's unrecovered upgrade costs at 2% of its yearly capital investment budget, per secondary summaries of the proceeding (paragraph citation pending verification); that cap protects the utility's other customers and gives the applicant no relief. Separately, NYSEIA's Cost-Sharing 2.0 petition (Case 24-E-0621, open as of June 2026, comments through May 2026) seeks a cap at 115% of the CESIR estimate, binding for 36 months. An adopted applicant-side cap would shift the cost distribution that any future cost model trains on (§6.3).

Con Edison secondary-network override. Inside a Con Edison secondary-network area, grid proximity is not computed from feeder distance; it is set to a fixed PROX_NETWORK_DEFAULT = 0.95. The published radial-feeder layer barely maps the dense network core, so the measured distance to the nearest mapped radial feeder there is a coverage artifact (a network-core parcel can be kilometres from a mapped radial even though the grid runs under the street). 0.95 represents excellent-but-not-perfect access. The export economics of network areas are priced separately as a discount (§3.3), so this is not double-counting, and the engine logs which path fired (network-default vs. distance-based) on every parcel.

v_roof_area: usable roof area, in m², of the parcel's single largest building (building footprint × a usable-area derate: 0.60 for flat commercial roofs, 0.25 for pitched residential). A 10 m² minimum footprint applies at ingest across the single statewide Overture source (v0.6.1, DL-052). Null or ≤ 0 → 0.0. Rooftop only. The derate is currently selected from the parcel's property class (NY class code beginning "2" → residential 0.25; otherwise 0.60) and applied to the largest building, rather than from each building's own roof geometry, a proxy that mis-derates genuinely mixed-use lots (§7).

usable m²	0	200	500	1,000	2,500	5,000
value	0.00	0.30	0.55	0.75	0.92	1.00

The curve is monotone and saturating; a linear map would merely reproduce the raw-area baseline. As a rule of thumb: ~200 m² ≈ a 25 kW system → 0.30; ~1,000 m² ≈ 125 kW → 0.75; ~5,000 m² and up is MW-scale → 1.00 (at roughly 8 m²/kW of usable roof).

v_buildable: net buildable acres (from Stage 0). Not ground-mount, or null → 0.0. Ground-mount only.

net acres	0	8	25	40
value	0.00	0.25	0.70	1.00

The 8-acre and 40-acre anchors are set from the NREL land-use benchmark for utility-scale PV, roughly 7 to 8 acres per MW-AC (Ong et al. 2013, NREL/TP-6A20-56290), which places 8 acres near a 1 MW-AC minimum-viable project and 40 acres near a 5 MW-AC system. They are not drawn from the Model Solar Energy Local Law, which defines its tiers by capacity and on-site consumption and contains no capacity-to-acreage ratio. The 25-acre mid-tier is a derived interpolation point, about 3 MW-AC at the same benchmark, and carries no external citation.

v_slope: mean parcel slope in percent. Null → 0.5 (neutral). Ground-mount only.

slope %	0	5	10	15
value	1.00	0.85	0.55	0.00

Slope ≥ 15% is also a hard gate (Stage 0 #4), so the graded penalty operates on the 0–15% band and reaches 0 exactly at the gate threshold. The null default is 0.5 (neutral) rather than the 0.0 used for v_buildable/v_roof_area: a missing roof or missing parcel area is genuine evidence of absence (no roof, no land), whereas a missing slope reading is genuine ignorance about a terrain that is, on the New York parcels we score, usually mild. The default is neutral. Because slope is near-dead-weight on the current population (§6.1), this choice barely moves the ranking either way.

Figure 8 · Value functions vᵢ : input → [0,1]

Each input maps through a continuous, monotone, saturating piecewise-linear curve over fixed regulatory/economic anchor points. No step discontinuities.

3.3 Discounts `dⱼ : input → (0, 1]` (multiplicative friction)

The five live discounts apply multiplicatively, in the order [farmland, flood/wetland, queue, ConEd- network, land-use] (the fifth, discount_land_use, went LIVE with the spec-084 flip, DL-068; see below). Each is a friction in (0, 1]; a missing input resolves to 1.0. None is a hard gate.

Discount	Formula	Magnitude	Notes
`discount_farmland`	`1.0 − 0.15 × clamp01(prime_farmland_pct / 100)`	0.15	Prime/statewide-important farmland triggers a state notice-of-intent and mitigation-payment process. That process blocks no permit, so the model applies a mild discount and no gate. Null → 1.0.
`discount_flood_wetland`	rooftop: `1.0 − 0.30 × clamp01((floodplain_pct + wetlands_pct) / 100)`; ground-mount: `1.0`	0.30 (rooftop only)	Archetype-aware.¹ Ground-mount is neutral here because the usable-area loss is already owned by net buildable acres (§3.1); applying it twice would double-count the same geography. Rooftop has no buildable-area term, so this is its sole flood/wetland channel. Null coverage → 0%.
`discount_queue`	`1.0 − 0.60 × clamp01(queue_saturation)`	0.60	`queue_saturation = queued_mw / available_hc_mw`. Published hosting capacity is headroom before queued DG, so a feeder with a large queue relative to its headroom is heavily discounted. Null → 1.0.
`discount_coned_network`	constant by archetype	ground-mount 0.15; rooftop 0.85	Con Edison area networks are no-export by default. For ground-mount (always export-scale community solar) this is a near-gate (0.15); for rooftop it is softer (0.85), reflecting that small net-metered systems are only lightly constrained while > 50 kW-AC export systems bind. Not in a network → 1.0.

One input-join detail on discount_queue: 1,328 of 4,766 feeders (28%) publish no queued-DG figure, and the live input join coalesces that absence to a saturation of 0 rather than null. The discount is identical either way (1.0), but the displayed queue_saturation does not distinguish an empty queue from unpublished queue data.

Land-use acquisition friction and open-water mask (spec 084; first flipped 2026-06-27 as the PARTIAL v0.6.3, DL-068/070; fully corrected through DL-071..077 and LIVE as v0.6.5 since 2026-07-12 — live view = feasibility_scores_v2026_07_12, rollback feasibility_scores_v2026_06_27). A fifth multiplicative discount, discount_land_use, GROUND-MOUNT only and archetype-aware (the rooftop C&I lens stays at 1.0, mirroring the flood/wetland archetype rule), prices how hard a parcel's land use is to acquire. It is graded, not a block, so a high grid score can never pay back an unacquirable use, and a redevelopable use (golf, forest) is dampened rather than erased. The five tiers are keyed on the numeric property_class_code only (the corrupt property_class_desc is never read, §7), and a null, blank, or unrecognized class fails safe to the "none" tier (1.00, no penalty). The factors live in a named LAND_USE_TIERS table so the read layer (the workbench lens) can re-weight them.

Statewide reach (DL-070 defect 1, CLOSED in v0.6.4). The v0.6.3 flip silently bypassed the friction across all five NYC boroughs: 854,092 parcels carry 2-digit NYC-DCP land-use codes that the keying fail-safe mapped to the no-penalty "none" tier. v0.6.4 keys the NYC codes directly (DL-070 fix), layers the PAD-US protected-lands overlay (DL-072, working-lands easements excluded per DL-073, rec/edu easements kept protected per DL-075) and the NYC PLUTO structured-ownership overlay (DL-074) via a most-restrictive-tier rule with a landuse_source_conflict flag on disagreement. Live result: NYC parcels scoring ≥ 85 fell from 61 (including 10 cemeteries and 9 NYC-Parks properties) to 2, and every borough's top list is private/acquirable land.

Tier	Factor	NY property classes
Effectively unavailable	0.05	cemetery, correctional, state/public forest, parks, conservation, exempt public land
Institutional / dense-occupied	0.30	schools, religious, health, government (6xx), apartments / hotel (411/414/415), trailer parks (416), public-service utilities (8xx) except landfill
Redevelopable with effort	0.60	recreation (golf, marina, camps, 5xx), mining/quarry (720), private forest (91x/92x)
Owner-occupied, leasable field	0.90	all residential (2xx)
No friction	1.00	agricultural (1xx), vacant (3xx), commercial land (other 4xx), industrial (other 7xx), landfill (852)

Open water is treated as a physical impossibility, not a friction: the Stage-0 net_buildable_acres computation subtracts the deduped undevelopable UNION (wetland + floodplain + NHD open water, DL-071/073), and a parcel whose union clears 90% is hard-excluded unless a MEASURED contiguous dry block at or above the archetype floor rescues it (gate 8, DL-075/077). The v0.6.3 mask was inert (DL-070 defect 2: the NHD polygons were ingested but never intersected); v0.6.4 wired the intersection statewide. Live: 49,706 water/undevelopable exclusions, labeled truthfully since v0.6.5 (DL-077) — open_water (525) only where open water itself clears 90%, undevelopable_land (49,181) for the wetland/floodplain-dominant union (the v0.6.4 label told 49,208 dry floodplain parcels they were under water). Of the 50 v0.6.4 union kills, 27 had a contiguous dry block over the 8-ac floor and are restored with buildable capped at the measured block, including the DL-075 motivating state lake (81.9 on 8.36 contiguous ac).

Within the parcels it reaches, this is a one-variable change: the eligible land population is unchanged at 103,984 (the friction dampens, it never gates), the value boxes are unchanged at 805,528, and the non-frictioned ("none"-tier) parcels are byte-identical to live across all 2.6M rows except one parcel whose +1.4 traces to a discount_farmland input drift (DL-070; the friction did not move it). The friction cleaned the upstate shortlist: the score-≥-85 top drops from 4,945 to 2,227 with zero upstate cemeteries, parks, or institutions remaining, and the unavailable tier (2,748 sites) collapses to an average score of 1.9. The NYC bypass that let parks, cemeteries, and institutions surface at the top of the statewide ranking (DL-070) is CLOSED as of the v0.6.5 flip (DL-071..077; see the reach paragraph above). The model_version was re-labeled v0.6.3 at that first flip, since a new discount changes scores (DL-068); the live engine is v0.6.5 (§8).

With the Stage-0 gate currently non-binding, the multiplicative structure carries the gating in practice: a severe discount dominates the additive core no matter how strong that core is. A ground-mount parcel with every value function at 1.0 (Σ wᵢ vᵢ = 1.0) that sits inside a Con Edison network area scores 1.0 × 0.15 = 0.15 → 15, regardless of how strong its grid and land are: the no-export constraint dominates.

Figure 9 · Multiplicative friction discounts dⱼ

FARMLAND

d = 1 − 0.15·pct

FLOOD + WETLAND

rooftop · 1−0.30·pct

QUEUE SAT.

d = 1 − 0.60·sat

CONED-NETWORK · CONSTANT dⱼ

missing input → 1.00  ·  dashed cap = no friction (1.0)

Friction multiplies, never subtracts; a missing input resolves to 1.0. Flood/wetland is a rooftop-only discount. For ground-mount the same coverage is counted once, via the net-buildable-acres reduction (DL-020), so it is neutral here. The ConEd-network discount is a near-gate for export-scale ground-mount (0.15) and soft for rooftop (0.85).

Figure 10 · How friction compounds: the multiplicative discount cascade

A worked rooftop parcel begins with a perfect value-sum (100) and loses a compounding share to each friction: prime farmland (×0.90), flood/wetland (×0.85), a saturated interconnection queue (×0.70), and a ConEd-network export limit (×0.85). It lands at a score of 45.5. Because the discounts multiply rather than subtract, no single strength can pay a constraint back, and a fatal one (ground-mount in a network area, ×0.15) collapses the score on its own.

Land-use / acquirability: the multi-source determination and the expansion playbook

Whether a parcel can be acquired and built for solar is the single signal that has caused the most defects in this model (DL-068 through DL-073). The reason is structural. "Land use" is not one trustworthy field; it is a determination assembled from several imperfect, jurisdiction-specific sources, and no single source is authoritative. Getting it right is a method rather than a lookup, and the method has to port to the next county and the next state.

The principle. Combine multiple structured signals per parcel, rank them by reliability for that jurisdiction, take the most-restrictive result (a min over the friction rates, so a stronger signal can tighten but never loosen), and flag disagreements rather than silently resolving them. Owner-name text is never a primary signal; it is too brittle (free text, abbreviations, indirect ownership). Structured fields and authoritative datasets carry the determination.

The sources, and their reliability by jurisdiction.

Signal	Source	Reliability	Failure mode
Assessment land-use class	NYS 3-digit class (upstate); NYC 2-digit DCP land-use	Upstate codes reliable (descriptions corrupt). NYC coarse (09 lumps parks/cemeteries/golf) and institutions mis-coded as vacant.	Coarse buckets; mis-coding
Protected lands	USGS PAD-US (Fee + strict easements; working-lands easements excluded)	Good for public fee land. GAP status is not an acquirability filter.	Over-tag of productive land
NYC structured ownership/use	NYC PLUTO OwnerType + BldgClass (joined by BBL = `print_key`)	Authoritative for NYC public/institutional ownership; catches the assessment mis-coding.	NYC-only

The determination is effective_land_use_tier (pipelines/score/components_v2.py): the most-restrictive of the class-code tier, PAD-US-protected becoming unavailable, and the PLUTO institutional/public tier. When the structured signal is stricter than the class code (a school the assessment calls "vacant"), the engine records a landuse_source_conflict flag instead of overriding silently. It is archetype-aware (the rooftop lens keeps its own building-aware logic) and fail-safe (a parcel with no overlay signal returns the class tier unchanged).

Failure-mode catalog (what this run taught us, so the next jurisdiction inherits it).

Coarse-code bypass. A fail-safe that returns "no friction" for any unrecognized code silently exempts a whole classification scheme. NYC's 2-digit codes hit the 3-digit fail-safe and the boroughs' parks and institutions scored at the top (DL-070). Check on every new jurisdiction: does every code reach a real branch, or does it fall through?
Institutional mis-coding. Assessment rolls mis-classify institution-owned land (a school coded "vacant"). The reliable catch is a structured ownership or exemption source. The class code misses it and an owner-name regex is too brittle (DL-073 D3).
Protected-land over-tag, and the over-reach of its own fix. A national protected-lands layer over-tags productive land. PAD-US agricultural and forest-stewardship easements keep land in farming or forestry, which is solar-compatible, so treating them as unavailable stripped prime inventory (DL-073 D1). GAP status encodes biodiversity intent, not acquirability (DL-072). The fix itself can over-reach: excluding easement types by name, the first cut also dropped 'Recreation or Education Easement', which is a public conservation restriction (land trusts, Audubon sanctuaries) and not working land, so it wrongly freed protected parcels (DL-075). Enumerate the exact types and confirm each one is genuinely productive and acquirable before excluding it.
Layer double-count. Summing overlapping exclusion percentages (wetland, floodplain, open water) over-subtracts where the layers overlap. Use the spatial union of the geometry, floored at the largest single member so it can never read below any one layer (DL-071 D2).
Verification blindness. Confirm-it-works verification is structurally blind to a whole-segment bypass and to a feature that never ran. Adversarial assume-it-is-wrong verification (independent skeptics trying to break each claim) is mandatory before any flip; it has caught a real defect on every snapshot in this run.
Concurrency hazard. The score and environmental recomputes are not concurrency-safe (the parcel-id list refresh races). Run them per-jurisdiction sequentially, never in parallel.
Single-parcel-recompute corruption. Recomputing one parcel while its source table is mid-rebuild can silently corrupt a flag (it produced 0% farmland where 9.8% was correct, DL-070). Spot-verify any recompute against a fresh re-compute on a sample before trusting it.
Un-wired safeguard rot, and the gap a fix exposes. A safeguard built as a standalone post-pass, not wired into the score pipeline, silently stops running when a later run forgets to invoke it. The contiguity cap (DL-051), which demotes a parcel whose dry land does not form a contiguous block, ran for one snapshot then fell out of the re-score sequence for the next five, including the live one, so water-fragmented parcels passed the size gate (DL-075). Wire a safeguard into the pipeline so it cannot be skipped, or it rots. And a fix that removes a conservative error can expose a latent gap the error was masking: de-double-counting the buildable union re-admitted water-fragmented parcels the old over-counting had hidden, surfacing the dropped contiguity cap. After removing a conservative bias, re-audit the cases it used to suppress.

Expansion playbook (onboarding a new county or state).

Land-use source. Identify the jurisdiction's land-use classification, its coding scheme, and its known mis-coding modes. Confirm every code reaches a real branch in land_use_tier, with no silent fall-through.
Structured ownership/use. Find the authoritative structured ownership or exemption source (the institutional signal) and the key it joins on. Owner-name text does not qualify.
Protected land. Use PAD-US for the jurisdiction, exclude working-lands easements, and never gate on GAP status.
Precedence and conflict flag. Wire the new signals into effective_land_use_tier as additional min candidates and emit a conflict flag when they disagree with the class code.
Validation gate. Before trusting the jurisdiction, sample N parcels (start at about 50) against ground truth (imagery and the actual use), measure the land-use error rate, and record it. The jurisdiction is not trusted until this gate is run.
Adversarial audit. Run the full invariant-plus-adversary audit against the new snapshot and require a clean verdict before flipping it live. The audit is the gate, not the spot-check.

Stage 2: Economic & interconnection-risk signals (pass-through)

Stage 2 surfaces economic and interconnection-risk signals alongside the score (for ranking and diligence) rather than folding them into the single number, because they live on scales not commensurable with a 0–1 suitability. Two signal fields ride on every score row. queue_saturation also drives discount_queue in Stage 1, so it is a real scored input as well as a displayed metric; in the live join, a feeder with no published queued-DG figure surfaces as saturation 0 (§3.3). Value-stack eligibility (VDER/LSRV: New York's Value of Distributed Energy Resources tariff and its Locational System Relief Value adder) is currently unpopulated: the field exists in the schema and the API but is false for every parcel, pending an LSRV-area ingest.

A size-aware interconnection-risk band is also implemented, as a coarse size-only first cut.² It maps a parcel's buildable capacity (ground-mount ≈ net buildable acres ÷ ~8 ac/MW-AC; rooftop ≈ usable roof area ÷ ~8 m²/kW) onto three bands: low below 300 kW-AC, moderate from 300 to 500 kW-AC, and high at or above 500 kW-AC. These breakpoints predate the verified mapping of New York's SIR review tracks (50 kW-AC or less, the simplified process; above 50 up to 300 kW-AC, the expedited path with a UL 1741 supplement SA inverter; above that toward the 5 MW-AC ceiling, full review with a Coordinated Electric System Interconnection Review (CESIR) likely)³ and diverge from the regulatory bright lines at 50 and 300 kW-AC; a code fix is queued for the next re-score. Unknown size → null; the band has no favorable default. A parcel has no intrinsic project size, so the band is a proxy. The optional enrichment with feeder-level CESIR study-cost percentiles depends on further data ingest (§7).

Stage 3: Confidence / data readiness (`high` / `medium` / `low`)

This label is a data-readiness (fitness-for-use) rating: it reports whether a parcel's inputs are fresh, complete, and in-scope. It is not a probability, it is not calibrated to any outcome, and it does not measure site quality (a low-scoring parcel can be high-readiness, and vice versa). It stays orthogonal to the 0–100 score and is never multiplied into it.⁴ It is gated purely on data quality, using these thresholds:

Threshold	Value	Meaning
`_HC_FRESH_DAYS`	220 days	≈ 7 months, within the semi-annual hosting-capacity refresh cycle.
`_HC_STALE_DAYS`	400 days	> ~13 months: the feeder has missed a refresh.
`_CLOSE_FEEDER_M`	2,000 m	A confidently close, well-mapped feeder.
`_FAR_FEEDER_M`	10,000 m	A feeder far enough that the linkage is doubtful.

Rules. Return low if any of: available_hc_mw, dist_to_feeder_m, or total_acres is null; the parcel is in a Con Edison network area (network hosting capacity is published per area rather than per feeder); the hosting-capacity age exceeds 400 days; or the feeder distance exceeds 10,000 m. Otherwise return high only if all of: the hosting capacity is fresh (≤ 220 days) and the feeder is close (≤ 2,000 m) and slope is present. Anything else is medium.

On the live snapshot the three tiers are approximately 61% high, 3% medium, and 36% low (they sum to 100%). The large high share is up from 497,929 parcels (≈ 24%) under the prior v0.4.0 rule. That rise is a definitional change; parcels did not become more suitable. The prior rule capped otherwise-ready parcels at medium whenever their feeder was not reconciled to a canonical substation, a linkage the score never uses.⁵ Removing that requirement reveals that most parcels have fresh, complete, in-scope inputs. The 36% low tier is driven by Con Edison network membership (per-area hosting capacity), null inputs, and stale hosting-capacity dates. The 220-day and 2-km breakpoints are not tuned to rebalance the tier distribution.⁶ The thin medium band (≈ 3%) is the near-binary in-between; a queued per-flag panel will surface which gate is borderline rather than collapse it to one word.

Because this label keys on whether inputs are fresh, it is only as honest as the input data underneath it. Input freshness is now tracked: docs/build-status-and-data-registry.md records every dataset's source, vintage, and refresh path, and a weekly schema-drift tripwire plus a committed freshness manifest watch for drift (DL-063).

The value layer (a separate number, live as of 2026-06-23, DL-064)

Beside the 0-to-100 score, every parcel now carries a value stack in dollars per watt: a value_stack JSON breakdown plus a value_stack_dollar_per_w total. This is a second, separate number. It answers what is this site worth (the incentive dollars a project can claim), where the score answers can you build here. The two are kept apart by design and are never blended: the value adders drive where developers choose to build, so folding them into the screen, or validating the screen against them, would be circular (DL-054). The value stack is never an input to the score, the gate, or any validation metric.

The score is unchanged. The value layer ships score-neutral. The live feasibility_scores view was repointed from feasibility_scores_v2026_06_20 to feasibility_scores_v2026_06_24, a full clone of the prior snapshot with two appended columns and nothing else. The score, eligible, exclusion_reason, and model_version columns are identical row-for-row (0 of 2,594,588 rows differ); the eligible count holds at 2,416,262; the model version stays v0.6.2; and the within-county lift@50 gate stays 1.18, 95% CI [0.863, 1.476], FAIL. The "Current validation status" table (§6) is unchanged. Rollback is a one-line repoint back to feasibility_scores_v2026_06_20. (Those figures describe the DL-064 flip as executed; the live view has since advanced through the lens split and the 084 corrections to v0.6.5, §8. The value layer itself stayed byte-identical through every flip, verified by MD5 at each one, until the DL-083 territory re-key deliberately rewrote it in place on 2026-07-18 — still score-neutral, see "The territory signal" below.)

The adders. Each adder is a published dollar figure looked up from a program schedule, never a calibrated weight. The as-of source is the NY-Sun Program Manual v20 (Sept 2025), with the 2026-03-04 update (pipelines/score/value_adders.py). The live adders, all in dollars per watt:

Adder	Value	Applies when
Community Adder — Con Edison Tranche 1	$0.20/W	DAC-eligible parcel in Con Edison service territory (the only open Community Adder block)
Community Adder — Upstate	null (closed)	The Upstate block is fully allocated; null marks "no open block", distinct from $0.00
Inclusive Community Solar Adder (ICSA) — Con Edison standard / Community Benefit	$0.30/W	DAC-eligible parcel in Con Edison service territory
ICSA — Upstate base	$0.20/W	DAC-eligible parcel in one of the five NY-Sun upstate IOU territories (the `no_cc_no_ca` tier)
Brownfield / Landfill Adder (combined)	$0.15/W	Parcel on a brownfield or NYSDEC-closed landfill site, inside a NY-Sun region

The adders stack. A Con Edison parcel that is DAC-eligible and on a brownfield carries $0.20 + $0.30 + $0.15 = $0.65/W; an upstate DAC parcel carries $0.20/W; a brownfield-only parcel in a NY-Sun region carries $0.15/W; a parcel with no qualifying flag, and any parcel outside the NY-Sun footprint, carries no value box (NULL). On the sixteen live counties the flag tables hold 788,886 DAC-eligible parcels (the corrected 1,736-designated-tract DAC layer from DL-062, not the earlier 4,918-tract bug) and 30,100 brownfield-flagged parcels (414 of them landfill); 782,193 parcels carry a value box (DL-083 re-run, 2026-07-18).

The territory signal (DL-083, 2026-07-18). The region split above keys off the parcel's electric SERVICE TERRITORY: a persistent parcel_territory_flags table assigns each parcel the DPS statewide-layer territory containing its centroid (2,591,686 of 2,594,588 assigned; the 2,902 centroids outside every DPS polygon read "unassigned" and claim nothing). Through 2026-07-17 the split keyed off the ConEd secondary-network proxy (in_coned_network), which under-priced ~115-125k ConEd-territory parcels (all of Staten Island; the proxy's M&S plates exist only in four boroughs plus Westchester) and granted PSEG-LI (Rockaways) and municipal-enclave parcels upstate credits they cannot claim (DL-081/DL-082). The 2026-07-18 re-run in place rewrote 782,193 boxes and removed 23,335 phantom-credit boxes; the score checksum (score, eligible, exclusion reason, model version over all 2,594,588 rows) is identical before and after. PSEG-Long Island runs its own program and municipal utilities are outside NY-Sun, so parcels in those territories carry no NY-Sun dollar; the parcel page states the reason. The proxy keeps its separate score-side secondary-network semantics (dampening, export-constraint, confidence), which are correct and untouched.

Provisional caveats (carried honestly in the box).

The Con Edison Tranche-1 Community Adder is hardcoded at $0.20/W. It should be a live remaining-MW pull (the 100 MW block can fully subscribe and close); until that pull is wired it is marked provisional.
The Con Edison additional-environmental-justice ICSA tier ($0.40/W) is unconfirmed, so the value pass resolves to the $0.30/W standard tier and never auto-claims $0.40/W.
The Agricultural-District / Mineral-Soil-Group-1-4 mitigation dollar is not yet in the box. The MSG-layer ingest is the next task; until it lands, that ag-district dollar exposure (DL-054) is absent from the stack.

Still data-dark. Two signals from the DL-054 plan are not yet represented: the MSG-1-4 ag-district mitigation dollar (pending the layer ingest above) and solar-moratorium flags (pending founder sign-off; the only source is municipality-level, non-exhaustive, and carries no geometry, so it stays a flag, never a gate). A data-quality note: Broome county shows 15,733 brownfield-flagged parcels because NYSDEC remediation site 704038 is a single ~11,984-acre polygon that legitimately blankets thousands of parcels. That is authentic source geometry, not an over-count.

4. The two archetypes

Both archetypes are scored for every parcel; the headline is the first eligible one in the order ground-mount → rooftop.

Ground-mount is the primary land use: utility/community-scale and land-driven. It uses all four ground-mount inputs (hosting-capacity headroom, grid proximity, buildable area, slope) and is subject to the extra Stage-0 gates (slope ≥ 15% and net buildable < 8 ac). A parcel that clears the buildable-area floor is reported as ground-mount.
Rooftop is the community-DG fallback: roof-driven, for parcels too small for ground-mount. It drops slope and buildable land entirely (terrain is irrelevant to a roof) and adds usable roof area. It has no acreage floor, which keeps dense urban parcels (most of New York City) in scope: a small Manhattan lot with a large flat commercial roof can be a strong rooftop site even though it could never host a ground array.

Why first-eligible rather than higher score: the two archetypes renormalize different weight vectors onto non-comparable scales, so a rooftop's roof-area term can numerically out-rank a ground-mount composite on a large rural parcel that is, in reality, a ground-mount site. Preferring the first eligible archetype encodes the real-world precedence (if a parcel can host ground-mount, that is the project) and avoids a scale artifact. Because rooftop has no buildable-area gate, the vast majority of parcels in the dense-urban counties surface as rooftop headlines, which makes the overall ranking substantially a usable-roof-size ranking modulated by grid capacity (quantified in §6.1).

5. Data and provenance

Each score row cites its parcel, environmental-flag, substation, and hosting-capacity provenance; the remaining inputs (roof, network membership, slope, and the individual environmental layers) are documented at the dataset level here, and a per-county sources endpoint serves the full inventory. The table enumerates the datasets, what each provides, and its refresh cadence. Hosting capacity is assembled from all six New York investor-owned utilities into a single per-feeder layer of 4,766 feeders.

Dataset	Source	What it provides	Cadence
Hosting capacity (6 IOUs)	National Grid, Con Edison/CECONY, NYSEG, RG&E, Central Hudson, Orange & Rockland published HC layers	Per-feeder available PV hosting capacity (MW), queued DG (MW), feeder geometry, utility-named substation, HC refresh date	PV/storage HC semi-annual (≈ April / October); queued-DG monthly where published. Per-feeder as-of dates vary within and across utilities (the live layer spans 2025-04 to 2026-06); Stage 3 prices that staleness per parcel
Substations	National electric-substation point layer (ORNL/HIFLD)	Canonical substation identity, geometry, operator, max voltage, line count	Periodic (infrequent national refresh)
Interconnection queue	NY DPS Standardized Interconnection Requirements (SIR) monthly inventory: all six IOUs fully (NYSEG and RG&E publish a joint file), plus PSEG Long Island partially (energized Y/N only; no withdrawn status)	Per-application status (energized / active / withdrawn), estimated upgrade (CESIR) cost, technology, developer identity (5 of 6 utilities; National Grid redacts it), milestone dates	Monthly (publisher cadence); the live ingested snapshot is as-of 2026-05 (`sir_queue_v_2026_05`, 334,655 rows; PSEG-LI 2025-12). April was lost to a cron stall (DL-065) and is not re-pullable
Parcels	NY statewide tax-parcel service (county-direct for Monroe)	Geometry, total acres, property class, assessed value, owner/address, roll-year vintage	Per refresh
Slope	USGS 3DEP 1/3 arc-second DEM	Per-parcel mean slope (%), derived; the DEM itself is never stored	Per refresh (DEM updates infrequently)
Wetlands	USFWS National Wetlands Inventory (USGS-WIM service)	Per-parcel wetland coverage (%)	Per refresh
Floodplain	FEMA National Flood Hazard Layer (regulatory Special Flood Hazard Area)	Per-parcel 100-year floodplain coverage (%)	Per refresh
Critical habitat	USFWS designated critical habitat	Per-parcel critical-habitat intersection (boolean)	Per refresh
Prime farmland	USDA NRCS Soil Data Access (SSURGO)	Per-parcel prime-farmland coverage (%)	Per refresh
Network areas	Con Edison secondary/area-network polygons	Network-export-constraint membership flag	Per refresh
Buildings / roof	Overture Maps building footprints: a single uniform statewide source (release 2026-05-20.0, ODbL; v0.6.1/DL-052, replacing the prior NYC-Open-Data + Microsoft split)	Per-parcel usable roof area (m²), building count, largest-building footprint	Per refresh
Calibration positives / developer identity	NYSERDA CDG dataset (Complete + Pipeline rows; the contractor column carries developer identity)	Energized community-solar installs spatially joined to parcels for validation; also the only public source naming developers in National Grid territory	Source refreshes monthly; archived monthly by cron to dated snapshots

Five provenance details materially affect interpretation:

Hosting-capacity semantics. Published HC is headroom after connected DG but before queued DG. The per-feeder PV value we use (the maximum nodal HC on the feeder) therefore overstates immediately connectable capacity, which is why discount_queue exists. The queued-DG figure that feeds the discount comes from the utilities' HC layers. The SIR inventory plays a different role: it is the outcome and archival substrate (§6.3).
Feeder selection at score time. A parcel is matched to the nearest feeder line with PV > 0 (including feeders not reconciled to a canonical substation) via a planar nearest- neighbour search re-ranked by true geographic distance. We do not use the reconciled substation point for proximity; doing so once wrongly excluded a parcel 0.47 km from its feeder because that feeder's substation was 16 km away.⁷
Environmental coverage is now statewide-within-footprint. In the live v0.5.1 model the environmental layers (wetlands, floodplain, critical habitat, prime farmland) are populated across all ten scored counties, a correction over v0.5.0, in which they were ingested only over a three-county bounding box and defaulted to neutral elsewhere.⁸ The wetland source was re-ingested from the authoritative USGS-WIM National Wetlands Inventory service, with the per- parcel overlaps independently verified (§8). On the area base, v0.6.0 (DL-051) measures buildable acres from the GIS polygon and subtracts the measured building footprint, so buildable no longer approximates deed acres. The remaining gaps, stated in §7, are that prime farmland was re-ingested footprint-wide but has not been independently ground-truthed, and that no setback or internal-road subtraction is applied beyond the building footprint and the environmental overlap.
Roof-join coverage (v0.6.1, DL-052). Building footprints come from a single uniform statewide source, Overture (release 2026-05-20.0, a deduplicated conflation of Microsoft ML + OSM + Esri), replacing the prior NYC-Open-Data / Microsoft split. 2,247,820 of 2,594,588 parcels (86.6%) carry a parcels_roof row, up from 84.0%; the remainder have no building footprint (largely genuinely vacant land), and an improved-class parcel among them is flagged roof_footprint_missing rather than excluded (§3).
Monthly archival snapshots. Point-in-time snapshots of the SIR queue (5th of each month), all six hosting-capacity layers (6th), and the NYSERDA CDG dataset (8th) are archived monthly by scheduled jobs into dated tables. The published HC and SIR state is not reconstructable retroactively, and the archive begins June 2026; §6.3's temporal-validation panel depends on this cadence.

6. Validation program and results

The validation gate is pre-registered. Validation splits into two families: internal-validity legs that need no outcome data (and are therefore independent of the 10-county footprint), and outcome-validation legs that test the score against real build outcomes (and are fundamentally limited by the presence-only problem of §2).

Current validation status

Leg	Substrate	Status
Internal validity (rank stability, ablation)	v0.5.1 snapshot	Pass; re-run 2026-06-09 (§6.1)
Outcome gate (within-county lift@50)	v0.6.2 snapshot (frozen)	FAIL: 1.18, 95% CI [0.863, 1.476]; loses to size baselines; gate binds (§6.2). RETIRED as a pass/fail criterion (DL-077): the metric was computed on the pre-lens-split rooftop-fallback eligible set, two engine generations behind live (v0.6.5), and tests a demand-confounded construct the program's own pre-registration declares unidentifiable. It stays reported here as the honest historical record, not a gate to re-run.
TGB demand-de-confounded read	v0.4.0, post-hoc	Parity with random (1.082 vs 1.038); exploratory (§6.2)
CESIR cost corroboration	Complete-case SIR	Directionally supportive ($281 vs $112/kW); null under hostile imputation (§6.2)
Conditional-energization gate	SIR queue	Pre-registered; freeze STAGED 2026-07-12 (DL-078) and blocked by design on checklist item 1 — the first developer-conversation answer sets the lift@k operating point, then the freeze is a same-day action. No-peek rule in force (§6.3)⁹. The one confirmatory test designed to be passable; the next validation effort belongs here.

Footprint note: the scored footprint grew from ten counties to sixteen (St. Lawrence, Oneida, Orange, Chautauqua, Oswego, Broome; spec 047, DL-049, DL-050). Every verdict above remains stated on the frozen substrate it was computed on; the wider footprint changes the substrate only for future runs. The live engine is v0.6.5 (data-integrity and reach corrections, DL-070..077); none of those corrections claims predictive validity, so the posture is unchanged.

6.1 Internal validity: what we have shown

These legs touch no outcome data; they ask only whether the ranking is well-behaved. Both legs were re-run on the live v0.5.1 snapshot (feasibility_scores_v2026_06_06, 2026-06-09), frozen in metrics_rank_stability_v0_5_1.json / metrics_ablation_v0_5_1.json; the v0.4.0 runs remain frozen alongside and read the same way.

Leg 1: Rank-stability under weight perturbation. We re-score all eligible parcels from their stored value functions and discount product, perturbing every Stage-1 weight by × (1 ± φ) with φ = 0.25 (±25%), renormalizing to sum 1, across 500 seeded Monte-Carlo draws, and measure the retention of the published top-k shortlist (the fraction of the nominal top-k still present in the perturbed top-k). The recompute-vs-stored sanity check correlated 1.000. Results (n = 2,046,724 parcels; 500 draws; φ = 0.25):

Shortlist	k	Retention (median)	[p5, p95]
Top 0.1%	2,046	0.949	[0.887, 0.987]
Top 1%	20,467	0.937	[0.848, 0.976]
Top 5%	102,336	0.951	[0.860, 0.986]
Top 500 (fixed)	500	0.988	[0.506, 0.998]

The screening shortlist (top 0.1–5%) retains ~94–95% of its members under ±25% weight perturbation, so the exact weight values are not load-bearing. One nuance: the literal top-500 ordering has a wide tail (median 0.988, fifth percentile 0.506) because many rooftop parcels cluster at the score ceiling (v_grid_proximity = 0.95, v_hc_headroom = 1.0) and reshuffle under perturbation. The operative robust unit is the top 1–5% rather than the exact tip. This leg perturbs weights only; perturbing value-function anchors and discount magnitudes is a documented follow-up.

Leg 2: Component ablation. We zero each Stage-1 input's weight in turn, renormalize the survivors, re-score, and measure churn = 1 − Jaccard(nominal top-k, ablated top-k). High churn means the input drives the ranking; near-zero means it is near-dead-weight on current data. Sorted by top-1% churn (n = 2,046,724):

Archetype · input	Top-1% churn	Top-5% churn	Reading
rooftop · roof_area	0.949	0.642	Dominant driver of the rooftop-heavy shortlist.
rooftop · hc_headroom	0.260	0.464	Strong secondary.
ground_mount · buildable	0.217	0.018	Moderate.
ground_mount · hc_headroom	0.201	0.057	Moderate.
ground_mount · grid_proximity	0.161	0.031	Moderate.
rooftop · grid_proximity	0.153	0.235	Muted; see below.
ground_mount · slope	0.046	0.006	Near-dead-weight on the current population.

The rooftop ranking is substantially a usable-roof-size ranking modulated by grid capacity. Rooftop grid-proximity is muted because the Con Edison network override flattens proximity for the many NYC network-area parcels. Slope is near- dead-weight on the current low-weight, ground-mount-only population, but it stays because it is mechanism-correct. Our policy is to cut a component only when it is structurally inert, and slope is merely data-limited.

Neither leg touches outcome data, so both results are independent of the ten-county footprint. Shortlist robustness to weights does not extend to input-data corrections: the v0.5.1 environmental re-ingest retained only 6 of 50 of the prior ground-mount top-50 (§8),⁸ because the ground-mount ranking keys on the buildable-area term. The binding sensitivity is data provenance. Weight tuning barely matters next to it.

Figure 11 · Internal validity 1: rank-stability under ±25% weight perturbation

Across 500 seeded draws, the top 0.1–5% shortlist retains ~94–95% of its members under ±25% weight perturbation, so the exact weight values are not load-bearing. The literal top-500 tip is noisier.

Figure 12 · Internal validity 2: ablation (leave-one-out top-k churn = 1 − Jaccard)

Higher churn = the input drives the ranking. The rooftop shortlist is a usable-roof-size ranking modulated by grid capacity, exactly as designed; slope is near-dead-weight on the current population but kept as mechanism-correct.

6.2 Outcome validation: results

We tested the score against the 500 distinct parcels carrying a completed NYSERDA CDG install (spatially joined from ~1,402 statewide geocoded installs; all non-residential, median 83.9 kW-DC). The pre-registered primary endpoint is within-county lift at the top 50%: the ratio of the installs' density in the model's top-half to the population baseline, computed within county so that geography is not the thing being measured, with a feeder-cluster bootstrap and a gate on the 95% CI lower bound exceeding 1.0 (parity). The live result, with every comparison ranker on the same within-county axis:

Ranker	Within-county lift@50	Gate
Platt v0.5.1 model score (live)	1.156, 95% CI [0.836, 1.462]	FAIL: CI lower bound < 1.0
Building count (1-column sort)	2.000	beats model
Acreage (1-column sort)	1.912	beats model
Roof area (1-column sort)	1.905	beats model
Composite size	1.852	beats model
Random	0.996	n/a
Proximity (1-column sort)	0.472	n/a

The model's point estimate crosses parity (1.156 > 1.0) and it edges random (0.996), but the CI lower bound is below 1.0 (so the gate fails) and the score loses to trivial one-column size sorts: the model's CI upper bound (1.462) sits below every size baseline's point estimate (building count 2.000, acreage 1.912, roof area 1.905). The result is stable across versions (v0.4 was 1.144 [0.833, 1.448] and v0.5.0 was 1.144 [0.824, 1.437]), confirming that the v0.5.1 environmental correction improved the inputs and the shortlist (§8) without moving the measured lift (n = 500; 195 feeder clusters; the interval shifts slightly because the bootstrap is re-run per snapshot).

Decision-makers act on the top of a list, while the gate measures its top half, so we also report the finer operating points (critique C9).¹⁰ This is exploratory reporting on the same substrate; it adds no new gate (full bootstrap detail in docs/validation/metrics_v2026_06_09_liftk.json):

Ranker	Lift@1%	Lift@5%	Lift@10%
Platt v0.5.1 model score (live)	11.20 [7.71, 15.95]	6.36 [4.40, 8.34]	3.86 [2.43, 5.38]
Building count	13.20 [7.88, 20.69]	4.63 [3.20, 6.47]	3.40 [2.78, 4.16]
Acreage	22.40 [15.95, 31.53]	10.88 [8.86, 13.41]	7.56 [7.10, 8.18]
Roof area	26.41 [18.97, 36.97]	12.21 [10.62, 14.52]	7.90 [7.36, 8.53]
Random	0.80 [0.00, 1.49]	0.88 [0.53, 1.29]	0.98 [0.73, 1.28]

The reading is consistent with the gate. The model concentrates installs far above random at every operating point (CI lower bounds 7.7, 4.4, and 2.4), and it edges building count at @5% and @10% on the point estimate (overlapping CIs), but it sits below the pure size sorts (acreage, roof area) at every k. At the operating points developers act on, a one-column size sort beats the screen. (Roof area and building count are computed on the 462 of 500 installs carrying a parcels_roof row, 174 clusters, exactly as the frozen gate run treated missing values.)

Two further outcome angles point the same way:

A demand-de-confounded re-analysis. Using a target-group-background design (background = the 364 install-bearing parcels compared against parcels with usable roof ≥ 200 m², differencing out "demand lives where buildings are"), the model sat at parity with random (model within-county lift ≈ 1.082; random ≈ 1.038). This analysis is exploratory and post-hoc: it ran on v0.4.0, outside any frozen pre-registration.
An upgrade-cost corroboration. Withdrawn interconnection projects showed a median CESIR cost of $281/kW versus $112/kW for energized projects on a complete-case basis (directionally supportive), but this collapsed to a null under a hostile missing-data imputation, and the score- vs-cost link was underpowered (Spearman ≈ 0.00 across only 149 feeders). The sign is stable; the magnitude is not.

Two constraints cap all three outcome legs. The SIR queue spans all 62 New York counties (after county-name normalization) while scoring covers ten, so every score-requiring outcome leg is limited to the scoring footprint. And the available positives are demand-confounded, survivorship-biased, and positives-only: the presence-only / collider problem of §2. We did not tune the model to the test after the gate failed.

Summary: the screen loses to simple size baselines on raw outcome lift and sits at parity with random once demand is differenced out. It is not a demonstrated predictor. Feasibility is not identifiable from positives-only, applicant-conditioned outcomes. The gate fails because the data cannot yet support a predictive claim. The screen passes every internal-validity test in §6.1.

Figure 13 · Outcome gate: within-county lift@50 (fails the pre-registered gate)

The live model (1.156) beats random (0.996) but its 95% CI [0.836, 1.462] crosses parity, so the pre-registered gate fails. The model also loses to one-column size sorts (building count 2.00, acreage 1.912, roof 1.905). This is the presence-only confound of §2: installs cluster on large parcels for developer and market reasons, so a density test rewards measuring demand, not feasibility. The size sorts are not better feasibility models. Reported honestly, not tuned away.

Figure 14 · Score distribution: installs vs full population (live model)

Installs sit modestly above the population by median (58.1 vs 55.2), a weak signal consistent with the gate result. Bars mark p25–p90; the square marks the median.

Figure 15 · De-confounded for demand, the model sits at parity with chance

A target-group-background re-analysis (n = 364 install parcels, compared against a background of parcels with usable roof ≥ 200 m² to difference out “demand lives where buildings are”) puts the model’s within-county lift@50 (1.082) essentially level with random (1.038). The run is exploratory and post-hoc (v0.4.0, not a frozen pre-registration), so we read it as a direction, not a result: once demand is removed, the residual feasibility signal in the available data is near zero.

6.3 The path forward: conditional energization, not feasibility

The conclusion of §6.2 is that feasibility cannot be recovered from the data New York produces, and no quantity of forward data changes that, because the barrier is identification rather than sample size. We change the estimand.

The interconnection queue contains a labeled negative class: withdrawn applications. Among community-distributed-generation applications we observe energized projects, withdrawn projects, and active ones still pending. Conditioning on the fact that a developer pursued a site is not a defect here. It defines the population we predict in. A model trained on applicants and deployed on applicants estimates the correct conditional distribution, and the collider objection of §2, which is fatal to a causal claim about feasibility, does not bind a predictive model used inside its own conditioning regime. The estimand becomes the probability and timing of energization for a project already under development.

Two limits bound this; both were verified against the live data. First, the queue records no reason for withdrawal. We read the current inventory template column by column across utilities, and a withdrawn project carries only a status flag, with no field that separates an interconnection failure from a lost offtake or a financing collapse. The achievable model is therefore all-cause energization: the probability a pursued project energizes. Energization is the label, and the model makes no feasibility claim. Second, the model speaks only to the applicant population. Extending it to the millions of parcels no one has applied to develop is a selection shift; the model does not extend there. The broad 0-to-100 screen stays a separate breadth layer for prospecting; the conditional model is the precision layer that ranks sites a developer is already weighing.

The right method is competing-risks survival analysis, because the queue has two absorbing states that compete, energized and withdrawn, alongside a censored active state. A subdistribution-hazard model yields the cumulative incidence directly, the probability of energization by a given month, which is the quantity a developer or a lender wants. A cost regression sits beside it. Predicting the CESIR upgrade cost in dollars per kW from feeder state and project features is the number that most often kills New York distribution solar, and no public model predicts distribution-scale CESIR cost for New York.

Point-in-time feature snapshotting prevents label leakage. Today's hosting-capacity headroom on a feeder reflects whether past projects energized or withdrew, so joining current grid state to a past application's outcome would feed the model its answer. Every feature must be taken as of the application date. The monthly archival jobs (§5) exist to build this leak-proof panel: features as of the application date, joined to resolved outcomes. The hosting-capacity history exists only from June 2026 forward, so the panel accrues from that date. The model is then validated temporally, trained on everything resolved before a date and tested on what resolves after. That is the validation that mimics deployment.

The gate for that work is pre-registered as a numeric DRAFT artifact, docs/validation/conditional-energization-preregistration.md, with a no-peek rule in force: no estimation may run before the freeze commit hash is recorded.⁹ Three co-primary endpoints must all pass. E1, discrimination: Uno's IPCW time-dependent C-index at the 36-month horizon, 95% CI lower bound > 0.50 and point estimate ≥ 0.60. E2, calibration: the IPCW Brier score reported as an index of prediction accuracy, IPA ≥ 0.05 with CI lower bound > 0 against the covariate-free within-county Aalen-Johansen reference. E3, ranking: within-county lift@10 with CI lower bound > 1.0, point estimate ≥ 1.20, and a paired same-replicate bootstrap difference with CI lower bound > 0 against both the size baseline and the queue-position baseline. The substrate at draft is approximately 1,601 energized, 9,119 withdrawn, and 2,033 censored CDG applications, with PSEG Long Island excluded and a pre-registered Con Edison-excluded sensitivity. The lift@k operating point may be amended once, pre-freeze, from the developer-conversation answer; the freeze then requires founder sign-off. Until that gate is cleared, the product is a screen. Calibration, once the gate clears, will use whichever monotone method the sample supports: beta or Platt scaling while the resolved set is small, and isotonic regression once it is not.

7. Limitations

We group limitations by severity. The first tier changes how the current output should be used; the second is fixable with data and pipeline work already scoped; the third is inherent to a screen of this design and is retained by choice.

Figure 16 · The buildable-area right-skew: 95.7% fall below the ground-mount floor

Net buildable acres are severely right-skewed. Since v0.6.0 buildable is measured from the GIS polygon, so every parcel has a computed value (all 2,594,588): 90.0% carry under two buildable acres, and only about 2.4% reach the 8 to 25 acre range. A flat acreage minimum would erase almost the entire dataset, which is why the gate is archetype-aware: ground-mount requires at least 8 net acres in one contiguous piece, while rooftop has no floor and keeps dense urban parcels in scope.

Tier 1: Material (changes how to use the output today)

The score is a relative suitability index. It is not a probability, and it has not been validated as a predictor of build outcomes (§2, §6.2). This is the headline caveat and the reason it ships labeled improved, not-yet-validated interim.
Weights and value-function anchors are uncalibrated engineering judgments. The rank-stability result (§6.1) mitigates this without removing it: the exact weights are not load-bearing, but they have not been fit to outcomes.
The eligibility gate binds as of v0.6.0 (DL-051). Before v0.6.0 the gate was almost non-binding (172 of 2,449,973 parcels excluded, all in St. Lawrence for no_hosting_capacity_within_range). v0.6.0 changes this two ways: the rooftop archetype now requires a measured building, so buildingless parcels (open water, vacant land) are excluded with no_rooftop_host (170,539 parcels) only where there is positive evidence of no building (vacant class or near-total wetland), a contiguity check demotes 4,794 ground-mount parcels with fragmented buildable land, and the corrected ground-mount buildable (GIS polygon minus building footprint, not inflated deed acres) drops ground-mount eligibility from 106,734 to 99,280. Total Stage-0 exclusions rise from 852 to 171,391. Real buildings the free footprint layer misses (148,500 parcels as of v0.6.1, down from 206,056; DL-052) are kept and flagged roof_footprint_missing, not excluded. Prime farmland still enters only as a Stage-1 discount and cannot produce exclusions. The cost is just 2 of the 500 known installs (0.4%) now gate false-negatives, from building-footprint-layer gaps. The gate now demonstrably filters the population; its safety against real build outcomes remains untested (§6.2).
Parcel records carry duplicates and class/area errors the scorer cannot self-detect (partly corrected in v0.6.2, DL-053). The assessment roll's acreage is GIS-derived from the polygon, so a wrong or duplicated polygon is self-consistent and invisible to the per-parcel scorer; only geometry identity and the assessment class are independent signals. Three defects were found and three handled in v0.6.2 (a data-integrity pass, no score change): (1) ~12% of ground-mount-eligible parcels were byte-identical copies of one physical site (condo / co-op / trailer-park / subdivision sub-units each carrying the parent tract polygon; worst cluster 426 copies of one 87-acre parcel), inflating the candidate pool with ~2.5M phantom acres. v0.6.2 dedups to one representative per site (11,385 excluded duplicate_parcel_record). (2) 804 parcels in the small-residence codes (210/215/220) score ground-mount on >50 acres (class 240 "Rural Residence with Acreage" and 250 "Estate" are large by definition and are NOT flagged); some are real rural homesteads, some are source mis-classifications, so v0.6.2 flags class_area_mismatch rather than excluding (it cannot tell which without imagery). (3) ~2,083 ground-mount parcels carry a taxed structure but no footprint, so the roof is un-subtracted and buildable may be overstated; v0.6.2 flags roof_footprint_missing for ground-mount, mirroring rooftop. A fourth, not yet corrected: 3,033 NYC parcels (0.35%) get an attributed roof larger than their own lot because a building straddling lot lines is assigned wholesale to one parcel (the durable fix clips footprints to parcels). The remaining un-fixable residual is the wrong-polygon case, catchable only against imagery. The durable fix folds this normalization into the scorer.
Non-developable land uses no longer surface as top sites STATEWIDE (spec 084, corrected through DL-071..077, LIVE as v0.6.5 since 2026-07-12). Before spec 084 the eligibility gate tested only whether a panel could physically fit, never whether the land was acquirable, so cemeteries, public parks, golf courses, and institutional land passed the gate and scored as ground-mount sites next to real fields. A graded discount_land_use (§3.3), GROUND-MOUNT only and keyed on the numeric property class, now dampens those uses to the bottom of the ranking with a cited reason rather than blocking them. The fix is LIVE in feasibility_scores_v2026_07_12 (model v0.6.5); rollback is the one-line repoint to feasibility_scores_v2026_06_27. The first flip (v0.6.3, 2026-06-27) was a PARTIAL fix with three defects (DL-070), all since closed. First, the friction reached only the eleven upstate counties (3-digit NYS class codes), where the score-≥-85 top dropped from 4,945 to 2,227 with zero cemeteries, parks, or institutions remaining, while the five NYC boroughs' 2-digit NYC-DCP land-use codes (854,092 parcels) hit the keying fail-safe and were bypassed — closed in v0.6.4 by keying the NYC codes plus the PAD-US (DL-072/073/075) and NYC PLUTO (DL-074) overlays; NYC parcels ≥ 85 fell from 61 to 2. Second, the paired NHD open-water mask was inert everywhere (the 1,747 NHD polygons were ingested but never intersected) — closed in v0.6.4 (statewide intersection; 49,706 live water/undevelopable exclusions, labels split truthfully in v0.6.5, DL-077). Third, class 315 (underwater land) was mis-tiered to "none"/1.00 — closed in v0.6.4 (DL-070 fix). The full defect chain and fix history: docs/084-defects-and-fix-plan.md + DL-070..077.
The rooftop usable-area derate is selected at the parcel level rather than per building. The derate (0.60 flat / 0.25 pitched) is chosen from the parcel's property class and applied to its single largest building, and roof area is the dominant rooftop driver (ablation top-1% churn 0.95). For a mixed-use parcel (residential class, large flat commercial roof) this can mis-estimate usable roof by up to ~2.4× (the ratio of the flat to the pitched derate). The fix is per-building, class-aware derating;¹¹ the per-building roof signals exist only for NYC while the affected shortlisted parcels are upstate, so the durable fix is LiDAR-derived and is gated on outcome validity.¹²
Reported roof area is a roofprint and may overstate the true footprint. Upstate building footprints are satellite-traced roofprints; because usable roof area dominates the rooftop ranking, this systematic inflation may be as material as the derate approximation, and no height or roof-type source currently corrects it.¹²

Tier 2: Fixable with data / pipeline work (scoped)

The property_class_desc text field is unreliable; trust the numeric class code. Spot-checks (2026-06-20) found wrong descriptions filed under correct codes (e.g. "Used-car sale" under a one-family-residence code). The scorer and the class/area integrity flag use the numeric ORPTS code, which is clean, so scoring is unaffected; but the parcel page should render from the code (or a trusted code-to-label map), not this field. Flagged for a future release.
Coverage is sixteen New York counties. The breadth program (spec 047, DL-049, DL-050) grows the footprint toward 25 counties, ranked by queued community-DG MW; St. Lawrence, Oneida, Orange, Chautauqua, Oswego, and Broome have joined. Statewide remains a roadmap item; every score-requiring outcome test is currently capped by this footprint (§6.2), and every frozen gate metric was computed on the prior ten-county substrate.
Prime farmland was re-ingested footprint-wide in v0.5.1 but has not been independently ground-truthed.⁸ On buildable area, v0.6.0 (DL-051) closed the larger gap: net buildable acres is now the GIS polygon minus the measured building footprint minus the environmental overlap, so it no longer approximates deed acres. What remains is finer-grained: no setback or internal-road subtraction beyond the building footprint (reV-style structure, property-line, and road buffers need NY-parcel calibration), and the wetland-floodplain exclusion is approximated by summing the two polygon-relative percentages rather than computing a true geometric union.
Stage 2 is partially a stub. The LSRV value-stack flag is unpopulated: it is false for every parcel, pending an LSRV-area ingest, and exists today as schema and API plumbing only. Queue saturation is populated on every row, with unpublished queued-DG surfacing as 0 (§3.3). The size-aware interconnection-risk band is implemented as a size-only first cut;² its breakpoints diverge from the SIR bright lines pending a code fix (§3, Stage 2), and its optional feeder-level CESIR-cost enrichment remains data-gated.
Usable-roof-area has error sources the model does not capture: rooftop obstructions, fire-code setbacks, partial-polygon-vs-true-roof, tree/scan occlusion, footprint-vs-imagery temporal mismatch, and multi-building parcel linkage. Any of these can dominate usable-area error even with a correct flat/pitched derate, which is why a heavier roof fix is gated on outcome validity rather than pursued now.¹²
Substation reconciliation is incomplete. Feeder-to-substation name matching achieves a 52.8%–87.1% match rate across utilities, an any-tier match fraction (exact, whole-token subset, or 2 km nearest); it has not been validated as recall against a gold set. It affects only the data-readiness label and the substation citation. The grid score is untouched: it keys on the nearest feeder line, and unreconciled feeders are still scored.⁷
Hosting capacity is a periodic snapshot. A parcel's score reflects the last published grid state (headroom before queued DG, hence the queue discount); the data-readiness label is gated on its recency (§3, Stage 3).

Tier 3: Inherent / minor (retained by choice)

Slope is near-dead-weight on the current population (ablation top-1% churn 0.046). It stays because it is mechanism-correct, but it currently barely moves the ranking.
The buildable-area distribution is severely right-skewed. On the v0.5.1 data the 90th-percentile parcel has only ~0.95 buildable acres, and only ~18,224 parcels (~0.9%) have ≥ 25 usable acres, which is why the gate is archetype-aware (ground-mount ≥ 8 net ac; rooftop has no floor), so that dense urban parcels are not erased.
The Con Edison-network rooftop discount (0.85) is provisional and evidence-contradicted. The realized CDG population skews to larger non-residential systems (median ~84 kW-DC). The 0.85 softening assumed small net-metered systems, so the rooftop network discount is likely too lenient (it should probably sit closer to the ground-mount 0.15 than to 1.0). We have not yet re-fit it because doing so credibly depends on the same outcome data the gate awaits.

8. Version history: v0.1 → v0.6.5

The score model is versioned independently of any release tag. The current live engine reports v0.6.5; the legacy additive engine reports v0.1.0. The live view is repointed with a single statement and is instantly reversible: the v0.6.4 (engine artifact feasibility_scores_v2026_07_01), v0.6.3, v0.6.2, v0.6.1, v0.6.0, v0.5.1, v0.5.0, v0.4.0, v0.3.0, and v0.1.0 snapshots are all retained as rollback targets.

Current live: v0.6.5 (the 084 defect chain closed + integrity re-pass + truthful labels; LIVE, flipped 2026-07-12, DL-077). Built on the v0.6.4 re-score (feasibility_scores_v2026_07_01, the DL-075 fixes: NYC 2-digit land-use keying, statewide NHD open-water intersection, class 315, PAD-US working-lands/rec-edu calibration, NYC PLUTO overlay, contiguity pass re-wired — all verified fixed by the 2026-07-11 adversarial re-audit, DL-076). v0.6.5 adds: the DL-053 integrity dedup re-applied (11,582 duplicate-geometry eligibles demoted), the union water exclusion labeled truthfully (undevelopable_land vs open_water, DL-077), and a contiguity-aware water gate that restored 27 wrongly-killed parcels on their measured contiguous dry blocks. Live view feasibility_scores_v2026_07_12; land-eligible 88,899; value boxes byte-identical through every flip since DL-064. Rollback: feasibility_scores_v2026_06_27.

v0.6.3 (land-use acquisition friction + open-water mask, spec 084; flipped 2026-06-27, DL-068; PARTIAL fix, three defects per DL-070, all closed by v0.6.4/v0.6.5). A scoring-logic change, not a data patch. A fifth ground-mount-only multiplicative discount discount_land_use (§3.3) prices how hard a parcel's land use is to acquire, on five tiers keyed on the numeric property class (unavailable 0.05 / institutional 0.30 / redevelopable 0.60 / residential 0.90 / none 1.00), failing safe to 1.00 on an unknown class. An NHD open-water mask was specified to subtract pond and lake area from net buildable acres and hard-exclude parcels that are > 90% open water (open_water); the post-flip audit found it inert in the live snapshot (DL-070, below). Re-scored into feasibility_scores_v2026_06_27 (all 2,594,588 parcels, batch_v2 plus the score-neutral value_pass): land-eligible 103,984 and value boxes 805,528 are both byte-identical to the prior live (v2026_06_25; the friction dampens, it never gates), and the upstate score-≥-85 top cleaned from 4,945 to 2,227. LIVE: flipped on the founder's go 2026-06-27, the snapshot re-labeled model_version v0.6.3; the live view points at feasibility_scores_v2026_06_27. Rollback is the one-line repoint to feasibility_scores_v2026_06_25 (v0.6.2, DL-067).

084 is a PARTIAL fix (post-flip audit, DL-070). An adversarial re-audit of the live snapshot confirmed three open defects. (1) The land-use friction is silently bypassed across all five NYC boroughs: 854,092 parcels carry 2-digit NYC-DCP land-use codes, which the keying fail-safe (if len(c) < 3 ... return "none") maps to the no-penalty "none" tier, so the friction reaches only the eleven upstate counties and NYC parks (97.1), institutions (97.3), and cemeteries (95.9) still score at the 99th percentile undampened. (2) The NHD open-water mask is inert everywhere: open_water_pct = 0 on all 2,594,588 rows, 0 open_water exclusions, the 1,747 NHD polygons ingested but never intersected into the flag. (3) Class 315 (underwater land) is mis-tiered to "none"/1.00, while sibling water classes 971/972 correctly get 0.05. Live was NOT rolled back, because a partial fix is better-or-equal to pre-084 (upstate improved, NYC unchanged). The fix cycle ran DL-071..075 into the v0.6.4 re-score (feasibility_scores_v2026_07_01), the gating adversarial re-audit ran 2026-07-11 (DL-076: all prior defects fixed; three new findings), and the corrected v0.6.5 flipped live 2026-07-12 (DL-077). All three DL-070 defects are CLOSED in live.

v0.6.2 (parcel-record integrity pass, DL-053). NOT a scoring-logic change and NOT a score change: every parcel's 0-100 is byte-identical to v0.6.1 (verified, zero parcels differ). A post-score data-integrity layer (pipelines/score/integrity_pass.py) changes only which parcels surface: 11,385 duplicate parcel records (byte-identical polygons; condo / trailer-park / subdivision sub-units sharing one parent tract) dedupe to one representative per site and are excluded (duplicate_parcel_record); 804 class/area-mismatch and 2,083 ground-mount footprint-missing parcels are flagged, not excluded. Ground-mount eligibility 99,164 -> 87,779; total exclusions 171,391 -> 178,326. Applied to feasibility_scores_v2026_06_20. Gate re-run: within-county lift@50 1.18 [0.863, 1.476], still FAIL (a data-integrity change, not a predictive gain; metrics_v2026_06_20.json). Rollback = v2026_06_16 (v0.6.1).

v0.6.1 (footprint-source upgrade, DL-052). A data change, not a scoring-logic change (parallel to v0.5.1). Building footprints were unified to a single statewide source — Overture (release 2026-05-20.0, a deduplicated conflation of Microsoft ML + OSM + Esri) — replacing the NYC-Open-Data / Microsoft split. This closes the upstate coverage gap (statewide footprint coverage 84.0% -> 86.6%, +68,020 parcels; St. Lawrence 55 -> 69%, Albany 74 -> 83%) and removes the per-county source-quality artifact, so a parcel's score no longer depends on which dataset covered its county. Re-scored across all 16 counties (feasibility_scores_v2026_06_16). Gate re-run: within-county lift@50 1.18 [0.865, 1.469], still FAIL (a data change, not a predictive gain); roof_footprint_missing falls 206,056 -> 148,500 (metrics_v2026_06_16.json).

v0.6.0 (the land-reality correction, DL-051). A scoring-logic change, not a data patch. Ground-mount net buildable acres is now measured from the GIS polygon with the building footprint subtracted, instead of trusting the assessment-roll deed acreage, and the slope-halving in the area term is dropped (slope is already a hard gate plus a graded penalty). Rooftop eligibility now requires a measured building (no_rooftop_host), so open-water and vacant parcels no longer carry rooftop scores. The change needs no new data (polygon, building footprints, and environmental percentages were already stored) and rides one 16-county re-score into a fresh snapshot, flipped after verification. Population impact: ground-mount eligible 106,734 -> 99,280; rooftop eligible 2,487,002 -> 2,323,917; Stage-0 exclusions 852 -> 171,391 (170,539 the rooftop no-build gate, plus 4,794 ground-mount parcels demoted for fragmented buildable land). 205,318 real buildings the free footprint layer misses are kept and flagged, not excluded. The frozen within-county gate is 1.16 [0.84, 1.45], still FAIL (a correctness change, not a predictive gain), with 2 of 500 installs now footprint-gap false-negatives (§6; metrics_v2026_06_14.json).

v0.5.1 (live 2026-06-05 to 2026-06-14).¹³ That change was a data correction; the scoring logic is unchanged. The environmental layers were re-ingested (most consequentially a wetland source-swap to the authoritative USGS-WIM National Wetlands Inventory service) and per-parcel flags were recomputed across all ten counties. This corrects an env-blind footprint in v0.5.0 (the environmental tables were largely unpopulated when v0.5.0 was scored), moving ~11% of parcels. The same frozen gate returns within-county lift@50 = 1.156 [0.836, 1.462]: still failing, still below the size baselines (§6.2). The flagship ground-mount shortlist, however, changed materially (the live top-50 retained only 6 of the prior 50) because v0.5.0 had been surfacing parcels that are genuinely 50–60% wetland; the corrected list is clean. The correction was adversarially verified, which caught and forced the fix of an environmental under-count in one county before the change went live.⁸

v0.5.0 (live 2026-06-04 → 06-05). A correctness pass with three engine changes: the flood/wetland double-count resolved (the discount made archetype-aware);¹ the data-readiness label decoupled from substation-name reconciliation;⁵ and the Stage-2 interconnection-risk band implemented.² A regression run through the same frozen gate held the lift unchanged (1.144 [0.824, 1.437]); the fixes improved construct validity without moving the outcome correlation.

v0.4: fixing the anti-ranking finding. The first outcome-ranking validation found that v0.3 anti-ranked energized installs (they landed below the population median; lift@50 0.39, worse than random). The diagnosis identified two surfacing-blocking defects, fixed together in one re-score: (1) the Con Edison-network proximity artifact (the radial-feeder distance inside network areas was a coverage artifact suppressing strong urban parcels) was replaced by PROX_NETWORK_DEFAULT = 0.95; and (2) the rooftop archetype, which had no size input, gained v_roof_area (weight 0.30, from the buildable + slope share rooftop does not use). v0.4 then failed the pre-registered gate (1.144 [0.833, 1.448]) but passed both internal-validity legs, and was flipped live as a labeled improved, not-yet-validated interim.

v0.3: queue and network signals became real scored inputs (discount_queue and discount_coned_network began firing on live data), the grid-proximity curve was re-anchored to interconnection-cost economics (full decay by 5 km), and the network rooftop discount was softened from 0.60 to 0.85.

v0.2: the four-stage gated additive-multiplicative re-architecture that replaced v0.1's purely additive sum: the hard eligibility gate, the multiplicative (Σ wᵢ vᵢ) · Π dⱼ formula, continuous monotone value functions, multiplicative discounts with neutral 1.0 "unknown" defaults, the two-archetype structure with the first-eligible headline rule, and a confidence model in which high is reachable. A mid-cycle fix corrected feeder selection to use the nearest feeder line rather than the reconciled substation point, resolving ~1,489 wrong exclusions.

v0.1: the initial compensatory additive model (superseded). Eight inputs contributed signed points against fixed caps. It had four fatal defects that motivated the rewrite: disqualifiers were subtractable penalties (a high grid score could out-vote a fatal constraint); three inputs (queue, opposition, VDER) were frozen at favorable neutrals, so ~30 of 100 points carried no signal and inflated scores; the buildable term used coarse, mis-scaled step bands; and high confidence was structurally unreachable.

Document errata (post-release corrections to this document, after v0.5.1 shipped).

2026-06-06. Three regulatory anchors were corrected after verification against the primary SIR document: the SIR effective date (August 1, 2025; previously misstated as March 2025), the interconnection review tracks (≤ 50 kW-AC simplified; above 50 up to 300 kW-AC expedited with a UL 1741 supplement SA inverter, previously misstated as supplement SB; full review toward the 5 MW-AC ceiling above that), and the v_buildable acreage anchors (re-attributed from the Model Solar Energy Local Law, which carries no acreage ratio, to the NREL land-use benchmark, Ong et al. 2013). §6.3 was reframed to all-cause conditional energization in the same change.³
2026-06-09. The posture was corrected from "at parity with simple size baselines" (wrong: the model's CI upper bound sits below every size baseline's point estimate) to "loses to size baselines"; "feasibility score" was relabeled "screening score"; "non-compensatory" was relabeled "gated additive-multiplicative"; and the exploratory lift@1/5/10 table was added to §6.2.¹⁰
2026-06-09. The §6.3 conditional gate was pre-registered as a numeric DRAFT artifact with a no-peek rule,⁹ and an external verification round closed the cost-sharing (C5) and VDER-attribution (C6) items now reflected in §3.2 and the glossary.¹⁴
2026-06-11. The §3.2 cost-sharing description was restated on secondary-source support: the 2% cap on a utility's unrecovered upgrade costs is attributed to the cost-sharing proceeding (Case 20-E-0543; compliance filings approved April 14, 2022), and the order's paragraph citation is marked pending verification, since two attempts to pin it in the primary text failed. A document-wide voice pass landed in the same change (executive summary reordered, sentence structure varied, bolding reduced); no number, table, threshold, or section heading changed.

Figure 17 · The env correction: a cleaner flagship shortlist

Re-ingesting the wetlands data (USGS-WIM NWI) reshaped the top-50 ground-mount shortlist: the six sites that were ≥ 50% wetland are gone and the mean wetland coverage of the top 50 fell from 14.3% to 2.4%. Only 6 of the prior 50 survive, and the GM-eligible set tightened from 50,371 to 44,229. The correction removes false-positive wetland sites; it does not move the validated gate (§6.2).

Glossary

CDG (community distributed generation): New York's program for shared community-solar projects; the source of our completed-install positives.
Hosting capacity: the MW of new distributed generation a feeder can accept without grid upgrades, as published per-feeder by each utility.
Feeder: a distribution circuit; the unit at which hosting capacity and queue pressure are published and at which we match parcels to the grid.
Presence-only data: a dataset of observed successes (here, completed installs) with no labeled true negatives; absolute prevalence is not recoverable from it.
Survivorship / demand confound: completed installs reflect feasibility and developer demand, market access, and timing jointly; the four are not separable from successes alone.
Collider / Berkson selection effect: conditioning on a variable downstream of two causes (here, interconnection application) induces a spurious association between them; it opens, rather than closes, the feasibility–demand confound.
Target-group-background (TGB) design: a presence-only evaluation that compares positives against a background of plausible candidates (here, roofed parcels) rather than all land, to difference out demand.
Relative suitability index: a score that ranks parcels relative to one another; ordering is meaningful, absolute level is not a probability.
Lift@k / within-county lift@k: the density of true installs in the model's top-k% relative to a baseline; within-county computes it inside each county so the metric is not just measuring geography.
SIR: New York's Standardized Interconnection Requirements; the source of the monthly interconnection-queue inventory (including withdrawals).
CESIR is the Coordinated Electric System Interconnection Review, the full study (and cost) generally required for systems above 300 kW-AC, up to the 5 MW-AC ceiling.
VDER / LSRV is the Value of Distributed Energy Resources tariff and its Locational System Relief Value adder, the value-stack signals surfaced in Stage 2. VDER is a New York PSC construct (Phase One Order, March 9, 2017), implemented with NYSERDA. NYISO has no role in it.
UL 1741 (supplement SA) is the inverter certification the current SIR requires for the expedited path, which is open to systems above 50 kW-AC up to 300 kW-AC. Supplement SB, aligned to IEEE 1547-2018, is the newer revision; the expedited exception in the current SIR text cites SA.
DC:AC ratio: the ratio of a project's panel (DC) nameplate to its inverter (AC) rating (≈ 1.3 here); the regulatory ceilings are stated in AC.

Notes and references

This page is the public methodology of record, generated from the same document that governs the production engine. The references below are entries in Platt's internal, dated decision log and build specifications, which record when each choice was made and why; the full log, the frozen pre- registrations, and the source code are available to partners under diligence.

Flood/wetland discount made archetype-aware: neutral for ground-mount (where the usable- area loss is already counted in net buildable acres), retained for rooftop. ↩ ↩²
Stage-2 size-aware interconnection-risk band implemented as a size-only first cut with breakpoints at 300 kW-AC and 500 kW-AC. These predate the DL-030 SIR verification and diverge from the regulatory bright lines at 50 and 300 kW-AC; a code fix is queued (§3, Stage 2). ↩ ↩² ↩³
Regulatory anchors verified against the primary SIR document and corrected (2026-06-06): effective date August 1, 2025; review tracks ≤ 50 kW-AC simplified, above 50 up to 300 kW-AC expedited under UL 1741 supplement SA, full review toward the 5 MW-AC ceiling; the 8/40-acre anchors re-attributed to the NREL land-use benchmark (Ong et al. 2013). The Stage-2 band's 300/500 kW-AC breakpoints predate this verification; a code fix is queued. ↩ ↩²
"Confidence" reframed as a data-readiness (fitness-for-use) rating; a field rename and a per-flag readiness panel are queued. ↩
Data-readiness label decoupled from substation-name reconciliation (the score keys on the nearest feeder line; the reconciled substation plays no part in it). ↩ ↩²
Data-readiness research: cutpoints are not moved to rebalance the tier distribution. ↩
Feeder selection keys on the nearest feeder line (including unreconciled feeders) rather than the reconciled substation point. ↩ ↩²
Environmental cascade: NWI re-ingested via the USGS-WIM service, per-parcel flags recomputed across all ten counties, adversarially verified (an Erie under-count was caught and fixed before release). ↩ ↩² ↩³ ↩⁴
Conditional-energization gate pre-registered as a numeric DRAFT (docs/validation/conditional-energization-preregistration.md); the no-peek rule is in force until the freeze commit hash is recorded; companion resume/redirect/kill criteria are pre-specified in docs/re-evaluation-criteria.md. ↩ ↩² ↩³
Posture and naming corrections (2026-06-09): the loses-to-size-baselines posture, the screening-score relabel, the gated additive-multiplicative relabel, and the exploratory lift@1/5/10 table (docs/validation/metrics_v2026_06_09_liftk.json). ↩ ↩²
A per-building, class-aware roof derate (NYC roof-height/feature codes, upstate occupancy) was specified; the shipped pipeline used the parcel-class proxy pending the LiDAR fix. ↩
Roof fixes (per-building derate, roofprint inflation, and the broader usable-area error sources) deferred to a LiDAR-derived solution gated on outcome validity. ↩ ↩² ↩³
v0.5.1 flipped live 2026-06-05; one-line reversible rollback to the v0.5.0 snapshot. ↩
External verification round (2026-06-09): the 2% cost-sharing cap and the 115%-of-CESIR petition (Case 24-E-0621) in §3.2, and the VDER attribution (PSC Phase One Order, March 9, 2017) in the glossary. ↩