Methodology

How every number reaches the page.

This page traces the data pipeline from source APIs through SQLite to published pages, and explains what is automated, what is generated, and what is reviewed.

Every page on KnowThePerch is built from structured data, not manual research. The pipeline runs:

Source APIs (eBird, IUCN, Xeno-canto) → SQLite database → Prose generation (Qwen3, local) → PAGE_DATA assembly → WordPress publication

The species database is seeded from the eBird taxonomy, maintained by the Cornell Lab of Ornithology and updated annually. It covers over 10,000 species worldwide. For each species we store: common name, scientific name, family, order, and eBird species code.
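As an illustration of the stored fields, here is a minimal sketch of a species table in SQLite. The table name, column names, and the `seed_species` helper are assumptions made for this example, not the site's actual schema; the sample row is illustrative.

```python
import sqlite3

# Hypothetical schema: one row per species, keyed by eBird species code.
# "order" is an SQL keyword, so the column is named bird_order here.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS species ("
    " species_code TEXT PRIMARY KEY,"
    " common_name TEXT NOT NULL,"
    " scientific_name TEXT NOT NULL,"
    " family TEXT NOT NULL,"
    " bird_order TEXT NOT NULL)"
)


def seed_species(conn, rows):
    """Insert or update species rows and return the total row count."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO species "
        "(species_code, common_name, scientific_name, family, bird_order) "
        "VALUES (:species_code, :common_name, :scientific_name, :family, :bird_order)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM species").fetchone()[0]


# Illustrative sample row in an in-memory database.
sample = {
    "species_code": "norcar",
    "common_name": "Northern Cardinal",
    "scientific_name": "Cardinalis cardinalis",
    "family": "Cardinalidae",
    "bird_order": "Passeriformes",
}
conn = sqlite3.connect(":memory:")
count = seed_species(conn, [sample])
```

Keying on the species code means a re-run after a taxonomy release updates existing rows instead of duplicating them.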

Frequency percentages come from the eBird Bar Charts API. Each value is the proportion of eBird checklists submitted from a region in a given month that include a sighting of the species.

Example

A frequency of 40% in June means the species was reported on 40% of the checklists submitted from that region in June. It counts checklists, not individual observers. The figure reflects observed patterns across millions of checklists, not modeled estimates or predictions.
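The arithmetic behind a frequency value can be sketched as follows. The function name and the checklist counts are hypothetical; the pipeline takes its values from the source data, and this only illustrates the definition.

```python
def checklist_frequency(checklists_with_species: int, total_checklists: int) -> float:
    """Percentage of checklists in a region and month that report the species."""
    if total_checklists <= 0:
        raise ValueError("total_checklists must be positive")
    if not 0 <= checklists_with_species <= total_checklists:
        raise ValueError("checklists_with_species must be between 0 and total_checklists")
    return 100.0 * checklists_with_species / total_checklists


# Hypothetical counts: 200 of 500 June checklists include the species.
june_frequency = checklist_frequency(200, 500)  # 40.0
```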

Conservation status is sourced from the IUCN API and displayed on every species profile. Five categories are used:

LC — Least Concern

Evaluated and does not qualify for a threatened or Near Threatened category; includes widespread and abundant species

NT — Near Threatened

Close to qualifying for a threatened category

VU — Vulnerable

High risk of extinction in the wild

EN — Endangered

Very high risk of extinction in the wild

CR — Critically Endangered

Extremely high risk of extinction in the wild
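A small lookup table is one way to turn a status code into its display label. This is a sketch: the names `IUCN_LABELS`, `status_label`, and `is_threatened` are hypothetical, and it assumes the code arrives as a two-letter string.

```python
# The five categories displayed on species profiles.
IUCN_LABELS = {
    "LC": "Least Concern",
    "NT": "Near Threatened",
    "VU": "Vulnerable",
    "EN": "Endangered",
    "CR": "Critically Endangered",
}

# On the IUCN Red List, the threatened categories are VU, EN, and CR.
THREATENED = {"VU", "EN", "CR"}


def status_label(code: str) -> str:
    """Return a display string such as 'VU: Vulnerable'; reject unknown codes."""
    normalized = code.strip().upper()
    if normalized not in IUCN_LABELS:
        raise ValueError(f"unsupported IUCN code: {code!r}")
    return f"{normalized}: {IUCN_LABELS[normalized]}"


def is_threatened(code: str) -> bool:
    """True when the code is one of the three threatened categories."""
    return code.strip().upper() in THREATENED
```

Rejecting unknown codes, rather than displaying them raw, keeps an unexpected value from reaching a published page unnoticed.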

Descriptive prose is generated by Qwen3 running locally. Each page type has a structured prompt template that injects verified data and requests specific sections.

The model is explicitly prohibited from:

Inventing numerical claims
Contradicting injected data
Manufacturing sightings or observer counts
Asserting confidence the source layer does not support

Generated sections are parsed, cached in SQLite, and reviewed before being pushed to WordPress.
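The template-and-cache step might look roughly like the sketch below. The prompt wording, the `prose_cache` table, and both helper functions are assumptions for illustration; the call to the locally running model is deliberately left out.

```python
import sqlite3
from string import Template

# Hypothetical prompt template: verified facts are injected, rules are stated explicitly.
PROMPT = Template(
    "Write the '$section' section for $common_name ($scientific_name).\n"
    "Use only these verified facts:\n$facts\n"
    "Rules: do not invent numbers; do not contradict the facts above; "
    "do not invent sightings or observer counts; "
    "do not claim more certainty than the data supports."
)


def build_prompt(section, species, facts):
    """Fill the template with one species record and its verified data fields."""
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return PROMPT.substitute(
        section=section,
        common_name=species["common_name"],
        scientific_name=species["scientific_name"],
        facts=fact_lines,
    )


def cache_section(conn, species_code, section, body):
    """Store generated text; a new or regenerated row starts as unreviewed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prose_cache ("
        " species_code TEXT, section TEXT, body TEXT,"
        " reviewed INTEGER DEFAULT 0,"
        " PRIMARY KEY (species_code, section))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO prose_cache (species_code, section, body) VALUES (?, ?, ?)",
        (species_code, section, body),
    )
    conn.commit()


species = {"common_name": "Northern Cardinal", "scientific_name": "Cardinalis cardinalis"}
prompt = build_prompt("Overview", species, {"IUCN status": "LC", "Family": "Cardinalidae"})
conn = sqlite3.connect(":memory:")
cache_section(conn, "norcar", "Overview", "Generated text would go here.")
```

Because `INSERT OR REPLACE` rewrites the whole row, regenerating a section resets its `reviewed` flag to 0, so replaced text cannot inherit an earlier approval.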

Before publication, every programmatic page passes through automated checks:

Minimum word count per section met
All required data fields present (taxonomy, measurements, status)
No placeholder or template language in output
Prose does not contradict injected data fields
Internal links resolve to existing pages
IUCN status matches current API value
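Several of these checks can be expressed as one function that returns a list of failures. This is a sketch under assumptions: the page dictionary layout, the required field names, the placeholder patterns, and the word-count threshold are all hypothetical, and the contradiction check is omitted because it depends on how the prose is parsed.

```python
import re

# Hypothetical required fields and placeholder patterns.
REQUIRED_FIELDS = ("common_name", "scientific_name", "family", "iucn_status")
PLACEHOLDER = re.compile(r"\{\{.*?\}\}|\[(?:TODO|TBD)\]|lorem ipsum", re.IGNORECASE)


def run_checks(page, known_slugs, current_iucn, min_words=50):
    """Return a list of human-readable failures; an empty list means the page passes."""
    failures = []
    for name, text in page["sections"].items():
        if len(text.split()) < min_words:
            failures.append(f"section '{name}' has fewer than {min_words} words")
        if PLACEHOLDER.search(text):
            failures.append(f"section '{name}' contains placeholder text")
    for field in REQUIRED_FIELDS:
        if not page["data"].get(field):
            failures.append(f"missing data field '{field}'")
    for slug in page["links"]:
        if slug not in known_slugs:
            failures.append(f"internal link '{slug}' does not resolve")
    if page["data"].get("iucn_status") != current_iucn:
        failures.append("stored IUCN status differs from current API value")
    return failures


page = {
    "data": {
        "common_name": "Northern Cardinal",
        "scientific_name": "Cardinalis cardinalis",
        "family": "Cardinalidae",
        "iucn_status": "LC",
    },
    "sections": {"overview": "A {{summary}} goes here."},
    "links": ["northern-cardinal", "missing-page"],
}
failures = run_checks(page, known_slugs={"northern-cardinal"}, current_iucn="LC", min_words=3)
```

Collecting every failure, instead of stopping at the first, gives a reviewer the full picture of what blocks a page in a single pass.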

Each data source refreshes on its own schedule:

Species taxonomy (eBird): annually, with each eBird taxonomy release

Conservation status (IUCN): when IUCN publishes revised assessments

Observation frequency (eBird): quarterly

Audio recordings (Xeno-canto): on species profile creation or update
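The schedule mixes a time-based refresh (quarterly) with release-driven ones. A minimal sketch of both kinds of trigger follows; the function names, the 91-day approximation of a quarter, and the version strings are assumptions for illustration.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 91 days.
QUARTER = timedelta(days=91)


def frequency_refresh_due(last_refreshed: date, today: date) -> bool:
    """Time-based trigger: true once a quarter has passed since the last refresh."""
    return today - last_refreshed >= QUARTER


def release_changed(stored_version: str, latest_version: str) -> bool:
    """Release-based trigger: true when the upstream version differs from the stored one."""
    return stored_version != latest_version


# Hypothetical dates and version labels.
due = frequency_refresh_due(date(2025, 1, 1), today=date(2025, 4, 2))
not_due = frequency_refresh_due(date(2025, 1, 1), today=date(2025, 3, 1))
taxonomy_update = release_changed("2023", "2024")
```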


