About

How we screen bills

How we match bills to the platform

Every bill goes through two complementary matchers — both transparent, both runnable on your laptop in under a second:

  1. Keyword matcher (precise). Each platform plank has a hand-picked keyword list. A word-boundary regex scans every bill's title + summary; a hit is recorded with the exact keyword that fired. This catches the obvious cases ("insider trading" in a bill → matches our plank against stock trading by officials) and is easy to audit — the keyword lists live in data/legislation/platform.json.
  2. TF-IDF cosine similarity (semantic). A standard text-similarity model (term-frequency × inverse-document-frequency) compares each bill against each plank's full description and keywords. Words are stemmed first, so "midwifery" matches "midwives". Hits above a similarity threshold are recorded along with the highest-contributing overlapping terms — those are what you see in the "shares terms" line on each card.

What we don't try to do automatically

Neither matcher knows whether a bill is good or bad from WTPPPA's perspective. They detect topical overlap — that this bill touches a plank in our platform. The alignment call (aligned / mixed / opposed) is a separate, optional editorial step recorded in data/legislation/manual_review.json. Bills without an editorial note show up as under review.

Why no AI scoring

We don't use a language model to decide alignment. Asking an AI "is this bill aligned with our platform?" produces confident-sounding output regardless of whether the model actually understood the bill — and we don't want to launder our editorial judgment through a black box. The math we do use (regex + TF-IDF) is open, deterministic, and reproducible from the same inputs.

The trade-off: surface-level vocabulary matching has obvious limits. The word "transparency" in a bill about parking-fee disclosure isn't the same as "government transparency" — that kind of false positive shows up as a low similarity score in the drill-down, and the editorial review step catches it.

The 5 pillars we check against

Why we only show bill titles

You'll notice every bill on this page is described by its legal title — usually something like "An Act amending the act of June 28, 1995, in section 12, further providing for ...". That isn't because we're being lazy. It's because Pennsylvania does not publish a plain-English summary of bills in any structured, machine-readable place.

We checked. OpenStates — the national clearinghouse that aggregates legislative data from all 50 states — has an "abstracts" field for exactly this purpose. For Pennsylvania bills, it's empty. The state doesn't supply it. The official palegis.us bill pages link to "Co-Sponsorship Memos" that do contain plain-English explanations, but those are PDF documents behind a JavaScript-only interface that returns 403 errors to anyone who isn't browsing with a normal web browser. They are technically public and practically inaccessible.

So a citizen who wants to know what a Pennsylvania bill actually does has to: find the bill on a site that doesn't work without JavaScript, click through to a PDF, and read several pages of cross-references to other laws. The information is "public," but the path to understanding it is hostile.

That's a transparency problem, not a software problem. Other states — Texas and California, for example — publish plain-English bill summaries as part of the legislative record itself. Pennsylvania could too, with one rule change. Until then, we work with what's available.

Update cadence

The bill list is refreshed weekly via a GitHub Actions cron. Editorial reviews are added as bills move through committee. Both the bill data and the platform definitions live in this repo's git history, so you can see exactly when each call was made.

A note on AI

We used Claude, an AI assistant from Anthropic, to help write code and draft copy for this dashboard. The data is not AI-generated — every number comes from a public PA source we cite.

The energy used during development was small — roughly comparable to streaming a few hours of video.

Open source

Everything is at github.com/wtp-pa/dashboards. Suggest a keyword, flag a missed bill, or argue with an editorial call — all welcome via GitHub Issues.

Found this useful? Tip the team.

These dashboards are built and maintained by volunteers. A few bucks helps keep them running and free.

Tip the team