- area: data loading
- issue: missing bronze data
- symptom: FileNotFoundError, empty data/bronze/
- fix: run validere bootstrap --year <YEAR>; check data/bronze/; run validere doctor
- issue: missing columns
- symptom: KeyError on column name
- fix: compare raw schema to expectations; check name and case; update validere/schemas.py
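
For the missing-columns case above, a small helper can separate columns that are truly absent from ones that differ only in case. This is a minimal sketch, not part of the project: `diff_columns` and the example column names are hypothetical, and the expected names would in practice come from validere/schemas.py.

```python
def diff_columns(actual, expected):
    """Compare actual column names to expected ones.

    Returns (missing, case_mismatches), where case_mismatches maps each
    expected name to the differently-cased name found in the data.
    """
    actual_set = set(actual)
    # Index the actual names by lowercase form to detect case-only differences.
    actual_by_lower = {name.lower(): name for name in actual}

    missing = []
    case_mismatches = {}
    for name in expected:
        if name in actual_set:
            continue  # exact match, nothing to report
        found = actual_by_lower.get(name.lower())
        if found is not None:
            case_mismatches[name] = found
        else:
            missing.append(name)
    return missing, case_mismatches


# Hypothetical column names, for illustration only.
missing, case_mismatches = diff_columns(
    actual=["Facility_ID", "month"],
    expected=["facility_id", "month", "gas_volume"],
)
print("missing:", missing)
print("case mismatches:", case_mismatches)
```

The function takes plain lists of names, so it can be fed the column list of whichever dataframe library loaded the raw file.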
- area: data quality
- issue: zero or negative production
- symptom: warnings, unexpected zeros
- fix: inspect source for sign or unit issues; confirm inactive facilities; review cleaning rules
- issue: missing facility links
- symptom: low well-to-facility match rate
- fix: check linkage fields and IDs; refresh infrastructure files; align ID formats
- issue: duplicate records
- symptom: duplicate facility-month rows
- fix: inspect raw data; confirm silver dedupe keys; adjust grouping if needed
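
To inspect duplicate facility-month rows before adjusting the silver dedupe keys, it can help to count how often each key occurs. The sketch below assumes rows are available as dictionaries and that the key fields are named `facility_id` and `month`; both names are hypothetical and should be replaced with the actual dedupe keys.

```python
from collections import Counter


def find_duplicate_keys(rows, key_fields=("facility_id", "month")):
    """Return {key_tuple: count} for every key that appears more than once.

    key_fields defaults to hypothetical names; pass the real dedupe keys.
    """
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative rows only, not real data.
rows = [
    {"facility_id": "F1", "month": "2023-01"},
    {"facility_id": "F1", "month": "2023-01"},
    {"facility_id": "F2", "month": "2023-01"},
]
print(find_duplicate_keys(rows))
```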
- area: validation
- issue: schema mismatch
- symptom: pyarrow.lib.ArrowInvalid
- fix: align types with validere/schemas.py; ensure required columns exist
- issue: emissions NaN or zero
- symptom: NaN or zero emissions with production
- fix: verify FLARE, VENT, and FUEL extraction; check facility type logic, the factors in domain/emissions/factors.py, and units
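
The NaN-or-zero emissions symptom can be narrowed down by listing the rows where production is positive but emissions are missing or zero. This is a sketch under stated assumptions: the `production` and `emissions` field names are hypothetical, and rows are taken to be plain dictionaries.

```python
import math


def flag_suspect_emissions(rows):
    """Return rows with positive production but NaN, None, or zero emissions.

    The "production" and "emissions" keys are hypothetical field names.
    """
    suspects = []
    for row in rows:
        production = row["production"]
        emissions = row["emissions"]
        # NaN production compares False against 0, so it is not flagged here.
        has_production = production is not None and production > 0
        emissions_missing = (
            emissions is None or math.isnan(emissions) or emissions == 0
        )
        if has_production and emissions_missing:
            suspects.append(row)
    return suspects


# Illustrative rows only, not real data.
rows = [
    {"facility_id": "F1", "production": 100.0, "emissions": float("nan")},
    {"facility_id": "F2", "production": 50.0, "emissions": 0.0},
    {"facility_id": "F3", "production": 0.0, "emissions": 0.0},
]
for row in flag_suspect_emissions(rows):
    print(row["facility_id"])
```

Rows with zero production and zero emissions are left out on purpose, since those are consistent with an inactive facility.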
- area: performance
- issue: slow pipeline
- symptom: long runtime relative to dataset size
- fix: check RAM and CPU; keep partitioning on; test smaller date range; profile slow steps
- issue: out of memory
- symptom: MemoryError, frozen machine
- fix: process by year or quarter; use more RAM; prefer Polars for large joins; drop unused intermediates
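
Processing by year or quarter, as suggested for the out-of-memory case, amounts to iterating over date windows and running one window at a time. The generator below is a minimal sketch of that idea; `quarter_ranges` is a hypothetical helper, and how each window is passed to the pipeline is left open.

```python
from datetime import date


def quarter_ranges(start_year, end_year):
    """Yield half-open (start, end) date windows, one per calendar quarter."""
    for year in range(start_year, end_year + 1):
        for quarter in range(4):
            first_month = 3 * quarter + 1
            start = date(year, first_month, 1)
            # The last quarter ends on January 1 of the following year.
            if quarter == 3:
                end = date(year + 1, 1, 1)
            else:
                end = date(year, first_month + 3, 1)
            yield start, end


for start, end in quarter_ranges(2023, 2023):
    # Replace the print with a load-and-process step for rows where
    # start <= row_date < end, releasing intermediates between windows.
    print(start.isoformat(), end.isoformat())
```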
- area: clustering
- issue: too few samples
- symptom: error that samples are fewer than clusters
- fix: relax filters; confirm input size; lower cluster count for small samples
- issue: poor clusters
- symptom: groups not meaningful
- fix: review features in layers/gold/features.py; standardize features; handle outliers; tune cluster parameters
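
Two of the clustering fixes above are easy to illustrate in isolation: lowering the cluster count for small samples, and standardizing a feature to zero mean and unit variance. Both helpers are hypothetical sketches using only the standard library; the actual feature pipeline in layers/gold/features.py may do this differently.

```python
from statistics import mean, pstdev


def capped_cluster_count(n_samples, requested):
    """Lower the requested cluster count so it never exceeds the sample count."""
    if n_samples < 1:
        raise ValueError("at least one sample is required")
    return min(requested, n_samples)


def standardize(values):
    """Z-score a feature using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


print(capped_cluster_count(n_samples=3, requested=5))
print(standardize([1.0, 2.0, 3.0]))
```

Standardizing matters for distance-based clustering because a feature measured in large units would otherwise dominate the distance calculation.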
- area: configuration
- issue: carbon price not applied
- symptom: financial score off versus expected
- fix: check the carbon price in validere/config.py, any env overrides, and the logic in domain/emissions/metrics.py
- issue: emission factors stale
- symptom: factor edits not reflected
- fix: update domain/emissions/factors.py; rerun from factor stage; avoid cached outputs
- area: outputs
- issue: missing files
- symptom: no parquet in data/gold/, missing plots
- fix: run validere doctor; confirm pipeline success; include viz targets; ensure output dir is writable
- issue: unexpected rankings
- symptom: rankings look incorrect
- fix: review weights in layers/gold/rankings.py; check score components, intensity percentiles, and aggregation logic
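
For the missing-files case above, a quick manual check is whether the output directory exists, is writable, and contains any parquet files. The sketch below is independent of validere doctor, whose checks are not described here; `check_outputs` is a hypothetical helper.

```python
import os
from pathlib import Path


def check_outputs(output_dir):
    """Report whether output_dir exists, is writable, and holds parquet files."""
    path = Path(output_dir)
    exists = path.is_dir()
    return {
        "exists": exists,
        "writable": exists and os.access(path, os.W_OK),
        "parquet_files": sorted(p.name for p in path.glob("*.parquet")) if exists else [],
    }


# data/gold/ is the output location named in the symptom above.
print(check_outputs("data/gold"))
```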
- area: environment
- issue: dependency install failure
- symptom: ModuleNotFoundError, install errors
- fix: run pixi install; verify Python version; review pixi.toml; if needed, run pixi clean && pixi install
- issue: graphviz missing
- symptom: graph rendering errors
- fix: install Graphviz via system package manager (for example, brew install graphviz)
- issue: quarto render error
- symptom: docs or diagrams not rendering
- fix: run quarto --version; check _quarto.yml; fix .qmd front matter; read Quarto logs
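
Several of the environment issues above come down to a command-line tool not being on PATH. A short check with the standard library can confirm that before digging further. The tool list is an assumption: `dot` is the Graphviz executable, and `pixi` and `quarto` are the commands used in the fixes above.

```python
import shutil


def find_missing_tools(tools=("pixi", "dot", "quarto")):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("not found on PATH:", ", ".join(missing))
else:
    print("all tools found")
```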
- Code documentation: See API Reference for API details
- Technical details: See Technical Handoff for implementation specifics