About
How we screen bills
How we match bills to the platform
Every bill goes through two complementary matchers — both transparent, both runnable on your laptop in under a second:
- Keyword matcher (precise). Each platform plank has a hand-picked keyword list. A word-boundary regex scans every bill's title + summary; a hit is recorded with the exact keyword that fired. This catches the obvious cases ("insider trading" in a bill → matches our plank against stock trading by officials) and is easy to audit: the keyword lists live in data/legislation/platform.json.
- TF-IDF cosine similarity (semantic). A standard text-similarity model (term-frequency × inverse-document-frequency) compares each bill against each plank's full description and keywords. Words are stemmed first, so "midwifery" matches "midwives". Hits above a similarity threshold are recorded along with the highest-contributing overlapping terms, which are what you see in the "shares terms" line on each card.
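To make the keyword matcher concrete, here is a minimal sketch of a word-boundary scan. It is an illustration, not the code in the repo: the plank names, the keyword lists, and the function names are made up for the example, and the real lists are whatever data/legislation/platform.json contains.

```python
import re

# Hypothetical plank -> keyword lists, for illustration only.
# The real lists live in data/legislation/platform.json.
PLANK_KEYWORDS = {
    "ban-official-stock-trading": ["insider trading", "stock trading"],
    "government-transparency": ["open records", "right-to-know"],
}


def compile_keyword(keyword):
    # \b on both sides means the keyword must start and end on a word
    # boundary, so "tax" does not fire inside "taxonomy".
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def keyword_hits(title, summary, plank_keywords=PLANK_KEYWORDS):
    """Return (plank, keyword) pairs for every keyword found in the bill text."""
    text = f"{title} {summary}"
    hits = []
    for plank, keywords in plank_keywords.items():
        for keyword in keywords:
            if compile_keyword(keyword).search(text):
                # Record the exact keyword that fired so the hit can be audited.
                hits.append((plank, keyword))
    return hits
```

Calling `keyword_hits("An Act prohibiting insider trading by public officials", "")` returns a single hit that names both the plank and the keyword, which is the audit trail described above.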
What we don't try to do automatically
Neither matcher knows whether a bill is good or bad from WTPPPA's perspective. They detect topical overlap — that this bill touches a plank in our platform. The alignment call (aligned / mixed / opposed) is a separate, optional editorial step recorded in data/legislation/manual_review.json. Bills without an editorial note show up as under review.
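As a rough sketch of how the editorial step stays separate from matching, a lookup like the one below would give any bill without a recorded call the "under review" label. The JSON shape, the bill id, and the function name are assumptions made for the example; the actual layout of data/legislation/manual_review.json may differ.

```python
import json

# Hypothetical file shape: bill id -> {"alignment": ..., "note": ...}.
# The bill id and note are invented for illustration.
SAMPLE_REVIEWS = json.loads("""
{
  "HB 123": {"alignment": "aligned", "note": "Example editorial note."}
}
""")

VALID_CALLS = {"aligned", "mixed", "opposed"}


def alignment_label(bill_id, reviews):
    """Return the editorial call for a bill, or 'under review' if none exists."""
    entry = reviews.get(bill_id)
    if entry is None or entry.get("alignment") not in VALID_CALLS:
        # No editorial note (or an unrecognized value): the matchers alone
        # never decide alignment.
        return "under review"
    return entry["alignment"]
```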
Why no AI scoring
We don't use a language model to decide alignment. Asking an AI "is this bill aligned with our platform?" produces confident-sounding output regardless of whether the model actually understood the bill — and we don't want to launder our editorial judgment through a black box. The math we do use (regex + TF-IDF) is open, deterministic, and reproducible from the same inputs.
The trade-off: surface-level vocabulary matching has obvious limits. The word "transparency" in a bill about parking-fee disclosure isn't the same as "government transparency" — that kind of false positive shows up as a low similarity score in the drill-down, and the editorial review step catches it.
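The sketch below shows, under simplifying assumptions, how a TF-IDF cosine score separates a strong topical match from a one-word overlap like the parking-fee case. It is written from scratch with the standard library, skips the stemming step, uses a smoothed IDF chosen for the example, and the plank text and bill titles are invented, so treat it as an illustration of the technique and not as the dashboard's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens. Stemming is omitted in this sketch.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption of this sketch): rarer terms weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_shared_terms(a, b, k=3):
    # Overlapping terms ranked by their contribution to the dot product.
    shared = {term: a[term] * b[term] for term in a if term in b}
    return sorted(shared, key=shared.get, reverse=True)[:k]


# Invented plank description and bill titles, for illustration only.
plank = "government transparency open records public access to agency spending"
bills = [
    "An Act providing for public access to agency spending records",
    "An Act providing for transparency in municipal parking fee schedules",
]
vectors = tfidf_vectors([plank] + bills)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
```

Here the first bill shares several terms with the plank and scores well above the parking-fee bill, whose only overlap is the word "transparency". Both scores are nonzero, which is why a threshold plus editorial review is still needed.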
The 5 pillars we check against
- Community Outreach — strengthening community ties and civic participation. (4 planks)
- Health & The Environment — a clean, healthy, and free society. (6 planks)
- Constitutional Integrity — defending civil liberties and government accountability. (9 planks)
- Honest Government — restoring political integrity and accountability. (10 planks)
- Economic Opportunity — economic freedom and financial reform. (8 planks)
Why we only show bill titles
You'll notice every bill on this page is described by its legal title — usually something like "An Act amending the act of June 28, 1995, in section 12, further providing for ...". That isn't because we're being lazy. It's because Pennsylvania does not publish a plain-English summary of bills in any structured, machine-readable place.
We checked. OpenStates — the national clearinghouse that aggregates legislative data from all 50 states — has an "abstracts" field for exactly this purpose. For Pennsylvania bills, it's empty. The state doesn't supply it. The official palegis.us bill pages link to "Co-Sponsorship Memos" that do contain plain-English explanations, but those are PDF documents behind a JavaScript-only interface that returns 403 errors to anyone who isn't browsing with a normal web browser. They are technically public and practically inaccessible.
So a citizen who wants to know what a Pennsylvania bill actually does has to: find the bill on a site that doesn't work without JavaScript, click through to a PDF, and read several pages of cross-references to other laws. The information is "public," but the path to understanding it is hostile.
That's a transparency problem, not a software problem. Other states — Texas and California, for example — publish plain-English bill summaries as part of the legislative record itself. Pennsylvania could too, with one rule change. Until then, we work with what's available.
Update cadence
The bill list is refreshed weekly via a GitHub Actions cron. Editorial reviews are added as bills move through committee. Both the bill data and the platform definitions live in this repo's git history, so you can see exactly when each call was made.
A note on AI
We used Claude, an AI assistant from Anthropic, to help write code and draft copy for this dashboard. The data is not AI-generated — every number comes from a public PA source we cite.
The energy used during development was small — roughly comparable to streaming a few hours of video.
Open source
Everything is at github.com/wtp-pa/dashboards. Suggest a keyword, flag a missed bill, or argue with an editorial call — all welcome via GitHub Issues.