This page describes the data pipeline that carries data from source APIs through SQLite to published pages, and explains what is automated, what is generated, and what is reviewed.
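Every page on KnowThePerch is built from structured data, not manual research. The pipeline runs from the source APIs (eBird and IUCN) into a SQLite database, through locally generated prose and review, and finally to WordPress.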
The species database is seeded from the eBird taxonomy, maintained by the Cornell Lab of Ornithology and updated annually. It covers over 10,000 species worldwide. For each species we store: common name, scientific name, family, order, and eBird species code.
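A minimal sketch of how the five stored fields might be laid out in SQLite, using Python's built-in `sqlite3` module. The table name `species` and all column names are assumptions for illustration, not the site's actual schema. `INSERT OR REPLACE` is used so that re-seeding from a new annual taxonomy release updates existing rows instead of duplicating them.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions
# covering the five fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS species (
    species_code    TEXT PRIMARY KEY,  -- eBird species code, e.g. 'amerob'
    common_name     TEXT NOT NULL,
    scientific_name TEXT NOT NULL,
    family          TEXT NOT NULL,
    taxon_order     TEXT NOT NULL      -- taxonomic order ('order' is an SQL keyword)
)
"""

def seed_species(conn: sqlite3.Connection, rows: list) -> None:
    """Insert new species or overwrite existing ones keyed by species code."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO species VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
seed_species(conn, [
    ("amerob", "American Robin", "Turdus migratorius", "Turdidae", "Passeriformes"),
])
print(conn.execute(
    "SELECT common_name FROM species WHERE species_code = 'amerob'"
).fetchone())  # ('American Robin',)
```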
Frequency percentages come from the eBird Bar Charts API. They represent the proportion of eBird checklists from a region in a given month that include the species.
A frequency of 40% in June means that 40% of the checklists submitted from that region in June reported the species. The figures are aggregated from millions of checklists, so they describe observed reporting patterns rather than estimates or predictions.
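The arithmetic behind a frequency value is a simple ratio of checklists. The sketch below uses hypothetical counts chosen to reproduce the 40% June example; the function name is illustrative, and it assumes the monthly checklist totals for the region are already available.

```python
def frequency_pct(checklists_with_species: int, total_checklists: int) -> float:
    """Percentage of a region's checklists for a period that report the species."""
    if total_checklists <= 0:
        raise ValueError("total_checklists must be positive")
    return 100.0 * checklists_with_species / total_checklists

# Hypothetical counts: 1,200 June checklists, 480 of which include the species.
print(frequency_pct(480, 1200))  # 40.0
```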
Conservation status is sourced from the IUCN API and displayed on every species profile. Five categories are used:
Descriptive prose is generated by Qwen3 running locally. Each page type has a structured prompt template that injects verified data and requests specific sections.
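A minimal sketch of a structured prompt template, assuming Python's standard-library `string.Template`. The placeholder names, section headings, and wording are hypothetical, not the site's actual prompt, and the values in the usage call are illustrative only. Using `substitute()` rather than `safe_substitute()` means a missing data field raises `KeyError` instead of silently producing a prompt with a gap.

```python
from string import Template

# Hypothetical template: placeholders and section names are illustrative.
SPECIES_PROMPT = Template(
    "You are writing a species profile. Use ONLY the data below.\n"
    "Common name: $common_name\n"
    "Scientific name: $scientific_name\n"
    "Peak month: $peak_month ($peak_freq% of checklists)\n"
    "IUCN status: $iucn_status\n"
    "Write these sections: ## Overview, ## When to look\n"
    "Do not state any number that is not listed above."
)

def build_prompt(record: dict) -> str:
    # substitute() raises KeyError for any placeholder missing from record,
    # so a prompt is never built with an empty data slot.
    return SPECIES_PROMPT.substitute(record)

prompt = build_prompt({
    "common_name": "American Robin",
    "scientific_name": "Turdus migratorius",
    "peak_month": "June",   # illustrative value
    "peak_freq": 40,        # illustrative value
    "iucn_status": "Least Concern",
})
print(prompt)
```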
The model is explicitly prohibited from:
Inventing numerical claims · Contradicting injected data · Manufacturing sightings or observer counts · Asserting confidence the source layer does not support
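Prompt instructions alone do not guarantee compliance, so a rule such as "no invented numerical claims" can also be checked after generation. The sketch below is a hypothetical, deliberately crude check, not the site's actual validator: it extracts every number from the generated text and flags any that does not appear in the injected data. It compares numbers as strings, so `40` and `40.0` would not match.

```python
import re

# Matches integers and decimals such as "40" or "12.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def unsupported_numbers(generated: str, injected: dict) -> set:
    """Return numbers that appear in the generated text but nowhere in the injected data."""
    allowed = set()
    for value in injected.values():
        allowed.update(NUMBER.findall(str(value)))
    return set(NUMBER.findall(generated)) - allowed

injected = {"common_name": "American Robin", "peak_month": "June", "peak_freq": 40}
text = "Reports peak in June, when 40% of checklists include it; flocks of 12 are typical."
print(unsupported_numbers(text, injected))  # {'12'}
```

A non-empty result would send the section back for regeneration or manual review rather than publication.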
Generated sections are parsed, cached in SQLite, and reviewed before being pushed to WordPress.
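A sketch of the parse, cache, and review step, assuming the model marks sections with `## Heading` lines and that review status is tracked with a flag column in SQLite. The table, column, and function names are hypothetical. Because `INSERT OR REPLACE` rewrites the whole row, regenerating a section resets its `reviewed` flag to 0, so changed text must be reviewed again before the page can be pushed.

```python
import re
import sqlite3

# Hypothetical cache table; names and the 'reviewed' flag are illustrative.
CACHE_SCHEMA = """
CREATE TABLE IF NOT EXISTS generated_sections (
    page_id  TEXT NOT NULL,
    section  TEXT NOT NULL,
    body     TEXT NOT NULL,
    reviewed INTEGER NOT NULL DEFAULT 0,  -- 0 = pending, 1 = approved
    PRIMARY KEY (page_id, section)
)
"""

def parse_sections(text: str) -> dict[str, str]:
    """Split model output on '## Heading' lines into {heading: body}; text before the first heading is dropped."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^##\s+(.+)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}

def cache_sections(conn: sqlite3.Connection, page_id: str, text: str) -> None:
    """Store parsed sections; replaced rows fall back to reviewed = 0."""
    conn.execute(CACHE_SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO generated_sections (page_id, section, body) "
        "VALUES (?, ?, ?)",
        [(page_id, name, body) for name, body in parse_sections(text).items()],
    )
    conn.commit()

def ready_to_publish(conn: sqlite3.Connection, page_id: str) -> bool:
    """True only if the page has cached sections and all of them are approved."""
    total, pending = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(reviewed = 0), 0) "
        "FROM generated_sections WHERE page_id = ?",
        (page_id,),
    ).fetchone()
    return total > 0 and pending == 0

conn = sqlite3.connect(":memory:")
cache_sections(conn, "amerob", "## Overview\nOverview text.\n## When to look\nTiming text.")
print(ready_to_publish(conn, "amerob"))  # False: nothing reviewed yet
conn.execute("UPDATE generated_sections SET reviewed = 1 WHERE page_id = 'amerob'")
print(ready_to_publish(conn, "amerob"))  # True
```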
Before publication, every programmatic page passes through automated checks: