Methodology

How every number reaches the page.

This page traces the data pipeline from source APIs through SQLite to published pages, and explains what is automated, what is generated, and what is reviewed.

Pipeline

Data flows in one direction.

Every page on KnowThePerch is built from structured data, not manual research. The pipeline runs:

Source APIs (eBird, IUCN, Xeno-canto) → SQLite database → Prose generation (Qwen3, local) → PAGE_DATA assembly → WordPress publication

Species Data

eBird taxonomy is the foundation.

The species database is seeded from the eBird taxonomy, maintained by the Cornell Lab of Ornithology and updated annually. It covers over 10,000 species worldwide. For each species we store: common name, scientific name, family, order, and eBird species code.
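As a rough illustration, the five stored fields could map onto a single SQLite table. The table name, column names, and the `seed_species` helper below are assumptions made for this sketch, not the site's actual schema. The sample row is the American Robin with its eBird code `amerob`.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS species (
    species_code    TEXT PRIMARY KEY,  -- eBird species code
    common_name     TEXT NOT NULL,
    scientific_name TEXT NOT NULL,
    family          TEXT NOT NULL,
    bird_order      TEXT NOT NULL      -- "order" is an SQL keyword, so it is renamed
);
"""

def seed_species(conn, rows):
    """Create the table if needed and upsert one row per species."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO species VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
robin = ("amerob", "American Robin", "Turdus migratorius", "Turdidae", "Passeriformes")
seed_species(conn, [robin])

row = conn.execute(
    "SELECT common_name, family FROM species WHERE species_code = ?",
    ("amerob",),
).fetchone()
print(row)
```

Keying the table on the eBird species code means an annual taxonomy refresh can re-run the same upsert without creating duplicate rows.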

Observation Frequency

Real checklists from real birders.

Frequency percentages come from the eBird Bar Charts API. They represent the proportion of eBird checklists submitted from a region in a given month that report the species.

Example

A frequency of 40% in June means the bird was reported on 40% of the checklists submitted from that region in June. This reflects real-world patterns from millions of checklists, not estimates or predictions.
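The arithmetic behind a frequency figure can be sketched as follows. The site reads these values from eBird rather than computing them, so the function name and the sample checklists here are hypothetical and serve only to illustrate the definition.

```python
def checklist_frequency(checklists, species_code):
    """Return the percentage of checklists that report the species."""
    if not checklists:
        return None  # no checklists, so no frequency can be stated
    hits = sum(1 for species_on_list in checklists if species_code in species_on_list)
    return 100.0 * hits / len(checklists)

# Five made-up June checklists from one region; each is a set of species codes.
june = [
    {"amerob", "norcar"},
    {"amerob"},
    {"blujay"},
    {"norcar", "blujay"},
    {"houspa"},
]
print(checklist_frequency(june, "amerob"))  # 2 of 5 checklists -> 40.0
```

The denominator is the number of checklists, not the number of observers or individual birds, which is why one birder submitting many checklists counts many times.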

Conservation Status

IUCN Red List categories.

Conservation status is sourced from the IUCN API and displayed on every species profile. Five categories are used:

LC — Least Concern
Does not qualify as threatened or Near Threatened; typically widespread and abundant
NT — Near Threatened
Close to qualifying for a threatened category
VU — Vulnerable
High risk of extinction in the wild
EN — Endangered
Very high risk of extinction in the wild
CR — Critically Endangered
Extremely high risk of extinction in the wild

Content Generation

How prose is generated and constrained.

Descriptive prose is generated by Qwen3 running locally. Each page type has a structured prompt template that injects verified data and requests specific sections.

The model is explicitly prohibited from:

Inventing numerical claims
Contradicting injected data
Manufacturing sightings or observer counts
Asserting confidence the source layer does not support

Generated sections are parsed, cached in SQLite, and reviewed before being pushed to WordPress.
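A minimal sketch of how a prompt template might inject verified data is shown below. The template wording, the `build_prompt` helper, and the section names are all assumptions for illustration; the real prompts are not reproduced on this page.

```python
from string import Template

# Hypothetical template; the actual prompt wording is not published here.
SPECIES_PROMPT = Template(
    "Write the sections: $sections.\n"
    "Use only these verified facts:\n"
    "$facts\n"
    "Do not invent numbers, contradict the facts, or describe sightings."
)

def build_prompt(data, sections):
    """Inject verified fields into the template as '- key: value' lines."""
    facts = "\n".join(f"- {key}: {value}" for key, value in data.items())
    return SPECIES_PROMPT.substitute(sections=", ".join(sections), facts=facts)

prompt = build_prompt(
    {"common_name": "American Robin", "iucn_status": "LC"},
    ["Identification", "Habitat"],
)
print(prompt)
```

Keeping the facts in a dictionary that comes straight from the database means the same values can later be compared against the generated prose.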

Quality Gates

What a page must pass before publication.

Before publication, every programmatic page passes through automated checks:

Minimum word count per section met
All required data fields present (taxonomy, measurements, status)
No placeholder or template language in output
Prose does not contradict injected data fields
Internal links resolve to existing pages
IUCN status matches current API value
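Most of these checks can be expressed as one function that collects failures. The sketch below is illustrative only: the page structure, the 50-word threshold, and the placeholder patterns are assumptions, and the prose-versus-data contradiction check is left out because it needs more than simple string tests.

```python
import re

REQUIRED_FIELDS = ("taxonomy", "measurements", "status")
MIN_WORDS = 50  # assumed threshold; the page does not state the real one
PLACEHOLDER = re.compile(r"\{\{.*?\}\}|\bTODO\b|lorem ipsum", re.IGNORECASE)

def run_quality_gates(page, existing_slugs, current_iucn_status):
    """Return a list of failure messages; an empty list means the page passes."""
    failures = []
    for name, text in page["sections"].items():
        if len(text.split()) < MIN_WORDS:
            failures.append(f"section '{name}' is under {MIN_WORDS} words")
        if PLACEHOLDER.search(text):
            failures.append(f"section '{name}' contains placeholder text")
    for field in REQUIRED_FIELDS:
        if not page["data"].get(field):
            failures.append(f"missing data field '{field}'")
    for slug in page["links"]:
        if slug not in existing_slugs:
            failures.append(f"broken internal link '{slug}'")
    if page["data"].get("status") != current_iucn_status:
        failures.append("IUCN status is out of date")
    return failures

# Hypothetical page with an empty field and one dead link.
page = {
    "sections": {"Habitat": "word " * 60},
    "data": {"taxonomy": "Turdidae", "measurements": "", "status": "LC"},
    "links": ["american-robin", "missing-page"],
}
failures = run_quality_gates(page, {"american-robin"}, "LC")
for failure in failures:
    print(failure)
```

Returning every failure at once, rather than stopping at the first, lets a reviewer see all problems with a page in a single pass.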

Update Frequency

When data refreshes.

Species taxonomy (eBird): annually, with each eBird taxonomy release
Conservation status (IUCN): when IUCN publishes revised assessments
Observation frequency (eBird): quarterly
Audio recordings (Xeno-canto): on species profile creation or update