Decarbonization Opportunity Analysis

Executive summary

Note: At a glance
  • who: North American energy companies
    • this phase: 396 Alberta upstream and midstream operators (2023 Petrinex data)
  • what: identify which operators stand to benefit most from implementing carbon reduction strategies
  • why: rising carbon prices, regulation, and investor expectations turn emissions into a material financial exposure
  • how: an operator-level emissions and economics model built on Petrinex 2023 data, with BOE-normalized intensities and a Hamilton DAG pipeline designed to extend to GHGRP and NPRI
  • result: in the Alberta 2023 slice, about 9 Mt CO2e (around 25 percent of modeled emissions) are technically and economically addressable, worth well over 1 billion dollars per year at 2030 carbon prices

Alberta’s 2023 upstream and midstream emissions contain a 9 Mt CO2e reduction wedge that is both technically and economically attractive. Across 396 operators and about 780 million boe of production, this wedge represents roughly one quarter of modeled emissions and more than 1 billion dollars per year in avoided carbon charges at 2030 prices. A 500,000 tonne CO2e operator alone faces about 85 million dollars per year of carbon cost at 170 dollars per tonne. The only durable way to change that exposure is to change operations and investment.

This phase applies a consistent emissions and economics model to Alberta operators in 2023. It ranks operators by the financial benefit of decarbonization, based on emissions intensity, absolute scale, reduction pathways, and regulatory pressure. The pipeline is designed so that North American datasets such as GHGRP and NPRI can be added without altering the core structure, allowing the same approach to be reused at larger scale.

A follow-on 2022-2023 panel analysis uses the same machinery to examine realized behavior over time. It covers 798 operator-years (432 operators, 366 with data in both years). About 80 percent of the 366 operators present in both years increased emissions as production grew, and the correlation between 2022 opportunity scores and realized 2022->2023 reductions is strongly negative (r ~ -0.73). High-opportunity operators grew emissions more because they are large and expanded output during a growth period. This indicates that the scoring framework identifies the right “where to look” operators, but also that in growth environments production growth dominates efficiency gains.

Figure 1: Top 10 operators ranked by investment opportunity, showing NPV, payback period, and reduction potential (panels a-d).
Warning: Scope note

Intensities in this phase are calculated from emissions modeled on 2023 Petrinex reported activity volumes and production. They do not represent full lifecycle (well-to-battery) CO2 per boe; the focus is on emissions directly controlled by the operator. The 2022-2023 panel is used for descriptive validation and behavioral insight, not for production forecasting.

Context and objectives

The objective is to turn a vague North American decarbonization question into a concrete, data-driven answer that can drive capital decisions. The brief is deliberately open-ended: frame the problem, make defensible assumptions, build a solution using data and code, and connect it back to business value. The choice to start with Alberta 2023 reflects a tradeoff: depth and reliability in one jurisdiction before breadth.

Alberta is a good proving ground for three reasons:

  • it concentrates a large share of Canadian upstream production and oil sands output
  • it has a mix of asset types, including some of the highest intensity operations
  • Petrinex provides integrated facility-month data for production, activities, and infrastructure

This allows a single pipeline to ingest raw data, build a medallion stack, and produce operator level metrics. The intent is not to stop at Alberta, but to ensure that the machinery is robust before it is pointed at a wider North American universe.

In this framing, an operator “benefits” from carbon reduction when it can reduce a material, policy driven cost using technologies that fit its asset base and capital cycle. Four factors matter:

  • emissions_intensity:
    • higher intensity than peers signals an efficiency gap
  • absolute_scale:
    • medium to high emissions ensure that reductions are financially meaningful
  • reduction_pathways:
    • proven technologies with short or moderate paybacks make projects fundable
  • external_pressure:
    • carbon prices, benchmarks, thresholds, and investor expectations create urgency

Operators where these four align are prioritized in the rankings and in the business discussion.

Data and pipeline

The data pipeline is a Hamilton-based medallion stack that turns raw Petrinex tables into operator-level decision metrics. It is structured to be simple to rerun and simple to extend.

Data flow:

graph LR
  P[Petrinex API] --> B[Bronze<br/>Raw parquet]
  B --> S[Silver<br/>Cleaned Facility Production]
  S --> G[Gold<br/>Emissions and Metrics]
  G --> A[Analysis<br/>Scenarios and Risk]
  A --> V[Viz<br/>Figures and Exports]

Scope and coverage:

  • geography:
    • Alberta only, via Petrinex
  • period:
    • January to December 2023 for the main ranking slice
    • 2022-2023 for the validation panel
  • coverage (2023 slice):
    • about 780 million boe of production
    • 396 operators
    • around 2,800 facilities
    • approximately 98 percent of volumes mapped to operators
  • coverage (2022-2023 panel):
    • 798 operator-years
    • 432 unique operators
    • 366 operators present in both years (used for realized reduction analysis)

Layer responsibilities:

  • bronze:
    • read raw Petrinex volumetric, NGL, and infrastructure files
    • write partitioned parquet under data/bronze
  • silver:
    • build facility month production and activity fact tables
    • compute NGL production and facility flow edges
    • maintain a facility dimension with SCD2 history and operator BAID
  • gold:
    • compute facility level emissions and intensity
    • aggregate to operator year emissions and intensities
    • calculate operator level decision metrics and views for visualization
  • analysis:
    • run scenario, clustering, and risk modules on Gold outputs
    • construct a multi-year operator panel for validation and ML experiments
  • viz:
    • produce PNG figures and tables from standardized views
    • include a descriptive “opportunity vs realized reduction” scatter from the 2022-2023 panel
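The layer split above follows Hamilton's convention that each node is a plain Python function whose parameter names identify its upstream dependencies. The sketch below imitates that convention with a small hand-rolled resolver so it runs without Hamilton installed. The node names (`bronze_rows`, `silver_facility_month`, `gold_operator_totals`) and the `execute` helper are hypothetical stand-ins, not Hamilton's API or the project's actual nodes.

```python
import inspect
from typing import Any, Callable

# Hypothetical Hamilton-style nodes: each function name is a node, and each
# parameter name refers to an upstream node or an external input.
def bronze_rows(raw_records: list[dict]) -> list[dict]:
    # Bronze: keep raw records as delivered (parquet files in the real pipeline).
    return list(raw_records)

def silver_facility_month(bronze_rows: list[dict]) -> list[dict]:
    # Silver: drop rows without a facility_id and coerce volumes to float.
    return [{**r, "volume": float(r["volume"])} for r in bronze_rows if r.get("facility_id")]

def gold_operator_totals(silver_facility_month: list[dict]) -> dict[str, float]:
    # Gold: aggregate cleaned facility volumes to operator level.
    totals: dict[str, float] = {}
    for r in silver_facility_month:
        totals[r["operator_baid"]] = totals.get(r["operator_baid"], 0.0) + r["volume"]
    return totals

def execute(target: str, nodes: dict[str, Callable], inputs: dict[str, Any]) -> Any:
    """Tiny stand-in for a DAG driver: resolve dependencies by parameter name, memoized."""
    cache: dict[str, Any] = dict(inputs)

    def resolve(name: str) -> Any:
        if name not in cache:
            fn = nodes[name]
            cache[name] = fn(**{dep: resolve(dep) for dep in inspect.signature(fn).parameters})
        return cache[name]

    return resolve(target)

NODES = {f.__name__: f for f in (bronze_rows, silver_facility_month, gold_operator_totals)}

raw = [
    {"facility_id": "F1", "operator_baid": "A1", "volume": "10"},
    {"facility_id": "F2", "operator_baid": "A1", "volume": "5"},
    {"facility_id": None, "operator_baid": "A2", "volume": "7"},
]
print(execute("gold_operator_totals", NODES, {"raw_records": raw}))  # {'A1': 15.0}
```

Because dependencies are inferred from parameter names, adding a source such as GHGRP or NPRI would mean adding new functions rather than rewiring a driver, which is the extensibility property the pipeline relies on.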

Keys and grains:

  • facility_id:
    • derived from ReportingFacilityID and related fields
  • operator_baid and operator_name:
    • derived from Business Associate IDs and names
  • time:
    • production_month as a date, plus derived year and month
  • grains:
    • silver: facility_id x production_month
    • gold facility: facility_id x production_month
    • gold operator: operator_baid x year
    • panel: operator_baid x year across 2022-2023

The same keys will be used when GHGRP and NPRI are added, allowing new data sources to slot into the same pipeline.

Panel view and realized reductions (2022-2023)

In addition to the 2023 cross section, the pipeline builds an operator-year panel for 2022-2023. This panel is used to measure realized behavior, not just modeled potential.

Panel construction:

  • base:
    • operator level emissions and decision metrics for 2022 and 2023
  • panel:
    • stack both years into a single (operator_baid, year) table
  • labels:
    • for each operator and base year t, compute
      • E_total_t and E_total_t_plus_1
      • realized_reduction_t = E_total_t - E_total_t_plus_1
    • positive values = emissions fell
    • negative values = emissions rose
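The label definition above can be sketched in a few lines of plain Python. The panel rows and operator IDs are hypothetical; an operator without a following year (like `C3`) gets no label, which mirrors restricting the realized reduction analysis to operators present in both years.

```python
# Hypothetical panel rows: (operator_baid, year, E_total in tonnes CO2e).
panel = [
    ("A1", 2022, 1_000.0), ("A1", 2023, 1_200.0),
    ("B2", 2022, 500.0),   ("B2", 2023, 450.0),
    ("C3", 2023, 300.0),   # only present in 2023: no label
]

def realized_reductions(rows):
    """Return {(operator_baid, t): E_total_t - E_total_t_plus_1} where both years exist."""
    e_total = {(op, yr): e for op, yr, e in rows}
    labels = {}
    for (op, yr), e_t in e_total.items():
        e_next = e_total.get((op, yr + 1))
        if e_next is not None:
            # Positive = emissions fell; negative = emissions rose.
            labels[(op, yr)] = e_t - e_next
    return labels

print(realized_reductions(panel))
# {('A1', 2022): -200.0, ('B2', 2022): 50.0}
```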

High level results:

  • 366 operators have data in both 2022 and 2023
  • 80 percent increased emissions 2022->2023 (negative realized_reduction_t)
  • 20 percent reduced emissions
  • about 45 percent improved emissions intensity, but production growth swamped those gains in absolute terms
  • correlation between 2022 opportunity_score and realized_reduction_t is about -0.73
    • high-opportunity operators increased emissions more in this growth period
    • they are large oil sands and large portfolio players expanding production

The pipeline also maintains an offline ML path that can train panel-based models on this dataset. Training happens in separate scripts that write predictions and diagnostics to parquet and JSON. The Hamilton DAG then optionally loads those predictions through a single node and degrades gracefully when they are absent. In this report, all findings are based on descriptive statistics and correlations, not on ML forecasts.
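A minimal sketch of an optional loader node, assuming predictions are stored as JSON keyed by operator_baid (the parquet outputs would need a parquet reader instead). The function names and file path are hypothetical. Returning None instead of raising is what lets downstream nodes fall back to the descriptive metrics when training has not been run.

```python
import json
from pathlib import Path
from typing import Optional

def ml_predictions(predictions_path: str) -> Optional[dict]:
    """Hypothetical DAG node: load offline predictions if the file exists.

    Returns None when training has not been run, instead of raising, so the
    rest of the DAG still executes.
    """
    path = Path(predictions_path)
    if not path.is_file():
        return None
    return json.loads(path.read_text())

def predicted_reduction(operator_baid: str, ml_predictions: Optional[dict]) -> Optional[float]:
    """Downstream consumer: None means no ML signal, so callers use descriptive metrics."""
    if ml_predictions is None:
        return None
    return ml_predictions.get(operator_baid)

print(predicted_reduction("A1", ml_predictions("missing_predictions.json")))  # None
```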

Emissions model and metrics

Emissions are calculated bottom-up from facility activities across eight components, normalized by production, and then aggregated to the operator level. The model is transparent, and its emission factors are explicit in code.

Facility level:

  • inputs per facility_month:
    • production: gas, oil, condensate, water
    • gas activities: FLARE, VENT, FUEL volumes for gas
    • throughput: gas volumes produced and received
    • drivers: NGL mix and steam volumes where applicable
  • components:
    • E_vent, E_flare, E_fuel, E_proc, E_oil, E_water, E_steam, E_fug
  • core formula:
    • E_total = sum(Activity_i * EF_i) across the eight components
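The core formula is a sum of activity times emission factor over the eight components. The sketch below shows that structure only; the activity values and emission factors are placeholders, not the factors used in the model, and missing factors are treated as zero for brevity.

```python
# The eight components named in the text; activity units and emission
# factors (EF) below are placeholders, not the model's actual factors.
COMPONENTS = ("vent", "flare", "fuel", "proc", "oil", "water", "steam", "fug")

def facility_emissions(activity: dict[str, float], ef: dict[str, float]) -> dict[str, float]:
    """E_i = Activity_i * EF_i for each component, plus E_total = sum of E_i."""
    out = {f"E_{c}": activity.get(c, 0.0) * ef.get(c, 0.0) for c in COMPONENTS}
    out["E_total"] = sum(out.values())
    return out

# Hypothetical facility_month: only flare, fuel, and vent activity are non-zero.
ef = {"flare": 2.0, "fuel": 2.0, "vent": 20.0}  # placeholder t CO2e per activity unit
activity = {"flare": 10.0, "fuel": 100.0, "vent": 1.0}
result = facility_emissions(activity, ef)
print(result["E_total"])  # 240.0
```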

Conversion and intensity:

  • gas:
    • convert e3m3 to m3, then to BOE using standard factors
  • oil and condensate:
    • convert m3 to BOE using regulator factors
  • NGL:
    • convert components using energy based factors to BOE
  • total_boe:
    • sum across all products
  • intensity:
    • intensity_kg_per_boe = (E_total * 1000) / total_boe, with E_total in tonnes CO2e
    • intensity null for very small volumes
    • extreme values clipped for plotting; original intensity and clip flags kept for analysis
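A sketch of the conversion and intensity steps, assuming common industry conventions (1 bbl = 0.158987 m3, 6 mcf of gas per boe) rather than the exact regulator factors the model uses; NGL energy-based conversion is omitted. The null floor (`MIN_BOE`) and plotting clip (`CLIP_KG_PER_BOE`) values are hypothetical.

```python
# Assumed conversion constants (common industry conventions; the model's
# exact regulator factors and NGL energy-based factors are not shown).
M3_PER_BBL = 0.158987294928      # one oil barrel in cubic metres
FT3_PER_M3 = 1 / 0.028316846592  # ~35.31 cubic feet per cubic metre
MCF_PER_BOE = 6.0                # 6 mcf of gas ~ 1 boe on an energy basis
MIN_BOE = 100.0                  # hypothetical floor: intensity is null below this
CLIP_KG_PER_BOE = 500.0          # hypothetical plotting clip

def gas_e3m3_to_boe(gas_e3m3: float) -> float:
    m3 = gas_e3m3 * 1000.0             # e3m3 -> m3
    mcf = m3 * FT3_PER_M3 / 1000.0     # m3 -> ft3 -> mcf
    return mcf / MCF_PER_BOE

def liquid_m3_to_boe(liquid_m3: float) -> float:
    return liquid_m3 / M3_PER_BBL      # oil and condensate: m3 -> bbl (1 bbl = 1 boe)

def intensity(e_total_t: float, total_boe: float) -> dict:
    """kg CO2e per boe from tonnes; null for tiny volumes; clipped copy for plots."""
    if total_boe < MIN_BOE:
        return {"intensity_kg_per_boe": None, "intensity_plot": None, "clipped": False}
    value = e_total_t * 1000.0 / total_boe
    return {
        "intensity_kg_per_boe": value,                  # original value kept for analysis
        "intensity_plot": min(value, CLIP_KG_PER_BOE),  # clipped value for plotting
        "clipped": value > CLIP_KG_PER_BOE,             # flag so clipping stays visible
    }

total_boe = gas_e3m3_to_boe(1_000.0) + liquid_m3_to_boe(2_000.0)
print(round(total_boe), intensity(1_000.0, total_boe))
```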

Operator level:

  • group the facility_aggregated table by operator_baid and year
  • sum emissions and total_boe
  • recompute intensity_kg_per_boe from aggregated values
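Recomputing intensity from summed emissions and summed boe, rather than averaging facility intensities, weights each facility by its production. A small sketch with hypothetical rows:

```python
from collections import defaultdict

# Hypothetical facility-level rows: (operator_baid, year, E_total in t CO2e, total_boe).
facility_rows = [
    ("A1", 2023, 900.0, 10_000.0),   # 90 kg/boe, small volume
    ("A1", 2023, 100.0, 90_000.0),   # ~1.1 kg/boe, large volume
    ("B2", 2023, 400.0, 20_000.0),   # 20 kg/boe
]

def operator_year(rows):
    """Sum emissions and boe per (operator_baid, year), then recompute intensity."""
    sums = defaultdict(lambda: [0.0, 0.0])  # key -> [E_total, total_boe]
    for op, year, e_total, boe in rows:
        sums[(op, year)][0] += e_total
        sums[(op, year)][1] += boe
    return {
        key: {
            "E_total": e_total,
            "total_boe": boe,
            "intensity_kg_per_boe": e_total * 1000 / boe if boe > 0 else None,
        }
        for key, (e_total, boe) in sums.items()
    }

print(operator_year(facility_rows)[("A1", 2023)]["intensity_kg_per_boe"])  # 10.0
```

Here operator A1's intensity is 10 kg/boe; a simple mean of its two facility intensities would report roughly 46 kg/boe, overstating the intensity of an operator whose volume sits mostly in a low-intensity facility.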

Core parameters used in the model include:

  • CH4 global warming potential:
    • 28 times CO2 on a 100-year basis (IPCC AR5)
  • discount rate:
    • 10 percent for NPV calculations
  • carbon price path:
    • about 65 dollars per tonne in 2023, rising to 170 dollars per tonne in 2030
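A sketch of the price path and CH4 conversion, assuming the path rises linearly by 15 dollars per tonne per year between the two stated endpoints (which matches the federal benchmark schedule) and is held flat after 2030. The function names are hypothetical.

```python
GWP_CH4 = 28  # IPCC AR5, 100-year basis, as stated above

def carbon_price(year: int) -> float:
    """Assumed path in $/t: 65 in 2023 (and earlier), +15 per year to 170 in 2030, flat after."""
    if year <= 2023:
        return 65.0
    return min(65.0 + 15.0 * (year - 2023), 170.0)

def co2e_tonnes(co2_t: float, ch4_t: float) -> float:
    """Combine CO2 and CH4 masses into CO2e using the 100-year GWP."""
    return co2_t + GWP_CH4 * ch4_t

print([carbon_price(y) for y in range(2023, 2031)])
# [65.0, 80.0, 95.0, 110.0, 125.0, 140.0, 155.0, 170.0]
```

As a check against the executive summary, `500_000 * carbon_price(2030)` gives 85 million dollars per year, and a 300 kt/year operator moves from about 19.5 million dollars in 2023 to 51 million dollars in 2030.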

Decision metrics derived per operator_year:

  • npv_mm:
    • NPV of modeled reduction projects given the discount rate and price path
  • reduction_potential_kt:
    • addressable emissions in kilotonnes based on source mix and assumed abatement fractions
  • regulatory_risk_score:
    • composite of emissions scale, intensity, and venting/flaring relative to thresholds
  • payback_years:
    • simple payback = CAPEX / annual savings
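A minimal sketch of the npv_mm and payback_years definitions, assuming upfront capex, end-of-year cash flows, and savings equal to avoided carbon cost plus any optional fuel or recovered-gas savings. The project inputs are hypothetical, and the price dictionary repeats an assumed linear path to 170 dollars per tonne.

```python
def project_npv_mm(capex_mm: float, abated_t_per_yr: float, years: range,
                   price_by_year: dict[int, float], other_savings_mm: float = 0.0,
                   rate: float = 0.10) -> float:
    """NPV in $MM: capex paid up front, then avoided carbon cost plus any other
    annual savings (fuel, recovered gas) received at the end of each year."""
    npv = -capex_mm
    for t, year in enumerate(years, start=1):
        cash_mm = abated_t_per_yr * price_by_year[year] / 1e6 + other_savings_mm
        npv += cash_mm / (1.0 + rate) ** t
    return npv

def payback_years(capex_mm: float, annual_savings_mm: float) -> float:
    """Simple (undiscounted) payback = CAPEX / annual savings."""
    return capex_mm / annual_savings_mm

# Hypothetical project: $20MM capex, 50 kt/yr abated over 2024-2030.
prices = {y: 65.0 + 15.0 * (y - 2023) for y in range(2024, 2031)}
npv = project_npv_mm(20.0, 50_000, range(2024, 2031), prices)
first_year_savings_mm = 50_000 * prices[2024] / 1e6
print(round(npv, 1), payback_years(20.0, first_year_savings_mm))
```

Using first-year savings for payback is a conservative choice on a rising price path; the model may define annual savings differently.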

Composite scores:

  • investment_score:
    • based on the ratio of NPV to CAPEX, normalized to a 0-100 range
  • benefit_score:
    • uses a four part weighting:
      • intensity contribution: 35 percent
      • scale contribution: 25 percent
      • financial contribution: 30 percent
      • regulatory contribution: 10 percent
  • opportunity_score:
    • used in a subset of decision views with a 40 / 40 / 20 weighting for opportunity size, economics, and regulatory pressure
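The benefit_score weighting can be sketched as a weighted sum of components rescaled to 0-100. The weights are the ones listed above; the min-max normalization, the handling of constant columns, and the operator values are assumptions for illustration. Each component is oriented so that higher means more benefit (for example, higher intensity signals a larger efficiency gap).

```python
def minmax_0_100(values: list[float]) -> list[float]:
    """Rescale to 0-100; a constant column maps to 50 (an arbitrary choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [50.0] * len(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]

# Weights as stated in the report; the normalization is an assumption.
BENEFIT_WEIGHTS = {"intensity": 0.35, "scale": 0.25, "financial": 0.30, "regulatory": 0.10}

def weighted_score(features: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Weighted sum of 0-100 normalized components (higher = more benefit)."""
    normed = {k: minmax_0_100(features[k]) for k in weights}
    n = len(next(iter(normed.values())))
    return [sum(w * normed[k][i] for k, w in weights.items()) for i in range(n)]

# Three hypothetical operators: intensity kg/boe, emissions kt, npv_mm, risk score.
features = {
    "intensity": [80.0, 40.0, 20.0],
    "scale": [450.0, 200.0, 50.0],
    "financial": [65.0, 20.0, 5.0],
    "regulatory": [0.9, 0.5, 0.1],
}
scores = weighted_score(features, BENEFIT_WEIGHTS)
print([round(s, 1) for s in scores])
```

The same `weighted_score` helper would cover opportunity_score by passing three components with weights 0.4, 0.4, and 0.2.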

The 2022-2023 panel reuses these same metrics and scores. The realized reduction analysis then asks a behavioral question: given these scores in 2022, which operators actually reduced emissions in 2023, and which increased them?

Clustering and operator archetypes

Segmentation uses features that capture both emissions scale and emissions structure, not size alone. The goal is to group operators into intuitive archetypes that share similar profiles and levers.

Features used for clustering include:

  • production_boe
  • intensity_kgco2e_per_boe
  • gas_pct
  • flare_rate
  • vent_rate
  • flare_share
  • water_cut
  • ngl_intensity
  • facility_count

These features are standardized and fed into a k-means algorithm with an adaptive choice of cluster count. Rule-based labels are then assigned from the resulting centroids, such as:

  • high_intensity_sagd
  • large_sagd_producer
  • multi_facility_conventional
  • high_venting_conventional
  • large_gas_producer
  • portfolio_operator
  • thermal_heavy_oil
  • growth_stage_operator
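A sketch of the rule-based labeling step, applied to a centroid expressed in the original feature units. The thresholds and rule order are hypothetical and cover only a subset of the archetypes; they show the mechanism, not the pipeline's actual rules. Reading vent_rate as a vented share of gas throughput is also an assumption.

```python
def label_centroid(c: dict[str, float]) -> str:
    """Map a centroid (unstandardized units) to an archetype name.

    Hypothetical thresholds; rules are checked in order and the first match wins.
    """
    if c["intensity_kgco2e_per_boe"] >= 80:
        return "high_intensity_sagd"
    if c["intensity_kgco2e_per_boe"] >= 60 and c["production_boe"] >= 20e6:
        return "large_sagd_producer"
    if c["vent_rate"] >= 0.02:  # assumed: share of gas throughput vented
        return "high_venting_conventional"
    if c["gas_pct"] >= 0.7 and c["production_boe"] >= 10e6:
        return "large_gas_producer"
    if c["facility_count"] >= 50:
        return "multi_facility_conventional"
    return "growth_stage_operator"

centroid = {"intensity_kgco2e_per_boe": 35.0, "production_boe": 5e6,
            "vent_rate": 0.03, "gas_pct": 0.4, "facility_count": 120}
print(label_centroid(centroid))  # high_venting_conventional
```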

Representative metrics for these archetypes are:

archetype                    emissions_kt  intensity_kg_per_boe  benefit_score  npv_mm  payback_years  addressable_pct
high_intensity_sagd          450           70-90                 88             65      3.2            35
large_sagd_producer          380           70-80                 82             52      3.8            32
multi_facility_conventional  220           30-40                 79             38      2.9            42
high_venting_conventional    185           40-50                 81             28      2.4            51
large_gas_producer           160           20-30                 71             22      4.2            28
portfolio_operator           145           30-40                 76             19      3.5            38
thermal_heavy_oil            125           60-70                 74             16      4.8            29
growth_stage_operator        95            35-45                 68             12      3.1            44

Patterns:

  • high_intensity_sagd and large_sagd_producer:
    • high intensity and high scale
    • carbon costs in the tens of millions per year per operator at 2030 prices
    • main levers: steam and power (cogeneration, steam optimization, waste heat recovery)
  • multi_facility_conventional and high_venting_conventional:
    • moderate to high intensity across many facilities
    • main levers: standardized vent, flare, and fuel programs
  • growth_stage_operator:
    • mid sized and growing, often near thresholds
    • main levers: intensity management, sequencing of projects and growth

For high venting clusters, venting and flaring account for a large share of emissions:

  • sector level:
    • vent and flare total around 10-15 percent of modeled emissions
  • high vent clusters:
    • vent and flare can account for 30-50 percent of emissions

Venting and flaring projects typically sit in the 60-140 dollars per tonne marginal abatement cost (MAC) range, with paybacks of around 2-3 years. A typical vapor recovery unit (VRU) project on a 5 MMcf/d stream has a payback of roughly 1.5-2 years and a positive NPV under the current price path.
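One common way to compute a marginal abatement cost is to annualize capex with a capital recovery factor, add operating cost, subtract the value of recovered gas, and divide by tonnes abated; simple payback instead divides capex by net annual cash including avoided carbon charges. The sketch below follows that approach; it is not necessarily how the model computes MAC, and the example inputs are hypothetical rather than derived from the VRU figures above.

```python
def capital_recovery_factor(rate: float, years: int) -> float:
    """Factor that converts an upfront cost into an equal annual cost."""
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)

def mac_per_tonne(capex: float, opex_per_yr: float, gas_value_per_yr: float,
                  abated_t_per_yr: float, rate: float = 0.10, life_years: int = 10) -> float:
    """Levelized abatement cost in $/t CO2e, before any carbon price.
    A negative value means recovered gas alone pays for the project."""
    annual_cost = capex * capital_recovery_factor(rate, life_years) + opex_per_yr - gas_value_per_yr
    return annual_cost / abated_t_per_yr

def simple_payback(capex: float, avoided_carbon_per_yr: float,
                   gas_value_per_yr: float, opex_per_yr: float) -> float:
    """Years to recover capex from net annual cash (undiscounted)."""
    return capex / (avoided_carbon_per_yr + gas_value_per_yr - opex_per_yr)

# Hypothetical inputs in dollars per year unless noted; not the report's VRU case.
mac = mac_per_tonne(capex=1_500_000, opex_per_yr=100_000, gas_value_per_yr=300_000,
                    abated_t_per_yr=10_000)
payback = simple_payback(capex=1_500_000, avoided_carbon_per_yr=10_000 * 95.0,
                         gas_value_per_yr=300_000, opex_per_yr=100_000)
print(round(mac, 1), round(payback, 2))
```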

Risk, policy, and scenarios

Most of the value in this analysis is driven by policy and price, not sentiment, so scenarios around carbon and regulation are central. The model treats the policy environment explicitly.

Carbon price base path:

  • around 65 dollars per tonne in 2023
  • around 170 dollars per tonne by 2030

Under this path, a 300 kt/year operator faces gross carbon charges growing from roughly 20 million dollars per year to 50 million dollars per year. Alberta’s TIER regime defines how much of this manifests as net cost versus credit generation.

Thresholds and benchmarks:

  • below 100 kt:
    • lighter regime; reporting and compliance are limited
  • between 100 and 500 kt:
    • mandatory reporting and credit obligations
  • above 500 kt:
    • larger compliance footprint and higher scrutiny
  • illustrative benchmark paths:
    • conventional: mid 30s kg/boe trending to mid 20s
    • SAGD: high 70s trending to low 60s
    • gas: high teens trending to mid teens

Scenario set:

  • baseline:
    • current carbon price path and regulatory pressure
  • accelerated:
    • steeper price path and stronger enforcement
  • delayed:
    • softer price path and slower implementation
  • regulatory_shock:
    • baseline price path plus additional compliance cost and higher pressure

For each scenario, the model recomputes NPV, MAC, payback, and composite scores, then re-ranks operators. Operators that retain high ranks across scenarios are robust targets. Operators whose position improves mainly under higher prices or stronger regulation are more speculative.
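A sketch of the robustness check: rank operators under each scenario and keep those that stay in the top N everywhere. The scenario multipliers, the added compliance cost for regulatory_shock, and the one-year value proxy are all hypothetical; the model itself recomputes full NPV, MAC, payback, and composite scores.

```python
# Hypothetical scenario settings; the report's actual scenario paths are not shown.
PRICE_MULTIPLIER = {"baseline": 1.0, "accelerated": 1.3, "delayed": 0.7, "regulatory_shock": 1.0}
EXTRA_COST_PER_T = {"regulatory_shock": 20.0}  # hypothetical added compliance cost, $/t

def value_mm(abated_t: float, capex_mm: float, base_price: float, scenario: str) -> float:
    """Crude one-year value proxy in $MM: avoided cost at the scenario price minus capex."""
    price = base_price * PRICE_MULTIPLIER[scenario] + EXTRA_COST_PER_T.get(scenario, 0.0)
    return abated_t * price / 1e6 - capex_mm

def robust_targets(operators: dict[str, tuple[float, float]], base_price: float,
                   top_n: int) -> set[str]:
    """Operators that stay in the top_n under every scenario."""
    robust = set(operators)
    for scenario in PRICE_MULTIPLIER:
        ranked = sorted(operators,
                        key=lambda op: value_mm(*operators[op], base_price, scenario),
                        reverse=True)
        robust &= set(ranked[:top_n])
    return robust

# Hypothetical operators: (abated tonnes per year, capex in $MM).
ops = {"A": (500_000, 30.0), "B": (100_000, 2.0), "C": (300_000, 25.0), "D": (50_000, 1.0)}
print(robust_targets(ops, base_price=110.0, top_n=2))  # {'A'}
```

In this example, operator C enters the top two only under the accelerated and regulatory_shock scenarios, so it would be classed as speculative, while A is robust.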

A risk return scatter uses npv_mm and regulatory_risk_score, with bubble size scaled by total emissions and color by investment_score. It highlights:

  • high_value_low_risk:
    • obvious early candidates
  • high_value_high_risk:
    • high potential but sensitive to policy or execution
  • low_value_low_risk:
    • opportunistic
  • low_value_high_risk:
    • low priority
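A minimal sketch of the quadrant assignment, assuming median splits on npv_mm and regulatory_risk_score; the cut-off rule and operator values are hypothetical.

```python
from statistics import median

def quadrant(npv_mm: float, risk: float, npv_cut: float, risk_cut: float) -> str:
    value = "high_value" if npv_mm >= npv_cut else "low_value"
    exposure = "high_risk" if risk >= risk_cut else "low_risk"
    return f"{value}_{exposure}"

def classify(ops: dict[str, tuple[float, float]]) -> dict[str, str]:
    """ops maps operator -> (npv_mm, regulatory_risk_score); cut-offs are medians."""
    npv_cut = median(npv for npv, _ in ops.values())
    risk_cut = median(risk for _, risk in ops.values())
    return {op: quadrant(npv, risk, npv_cut, risk_cut) for op, (npv, risk) in ops.items()}

ops = {"A": (60.0, 30.0), "B": (50.0, 80.0), "C": (10.0, 20.0), "D": (5.0, 90.0)}
print(classify(ops))
# {'A': 'high_value_low_risk', 'B': 'high_value_high_risk', 'C': 'low_value_low_risk', 'D': 'low_value_high_risk'}
```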

Additional penalties for data quality, model uncertainty, and operational concentration adjust scores to reflect execution risk. The same scenario and risk logic can be applied once GHGRP and NPRI data are wired in.

Behavioral evidence from the 2022-2023 panel

The 2022-2023 panel links the scoring framework back to actual operator behavior. It does not change the 2023 ranking slice, but it sharpens how to use those rankings.

Key observations:

  • 366 operators have both 2022 and 2023 data
  • about 80 percent increased emissions in absolute terms
  • about 20 percent reduced emissions
  • roughly 45 percent improved intensity, but production growth dominated absolute outcomes
  • correlation between 2022 opportunity_score and realized_reduction_t is about -0.73
    • high-opportunity operators increased emissions more during this growth period
    • they grew production faster and from a higher base

Within this panel three behavioral segments show up clearly:

  • prime_targets:
    • high opportunity scores and actual reductions
    • skew to gas storage, midstream, and gas-heavy operators
    • proof points that large-scale reduction is achievable where assets and timing line up
  • hidden_gems:
    • low to mid opportunity scores but actual reductions
    • mainly smaller operators
    • may indicate low cost operational moves, divestments, or flexible field programs
  • all_talk_majors:
    • very high opportunity scores and very large emission increases
    • large oil sands and portfolio operators who grew 30-140 percent in production
    • they dominate the increase in absolute emissions over 2022-2023
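The three segments can be expressed as simple rules on base-year opportunity_score and realized_reduction_t. The score and increase thresholds below are hypothetical; the text does not state the cut-offs used.

```python
def behavioral_segment(score: float, realized_t: float, high: float = 70.0,
                       very_high: float = 85.0, big_increase_t: float = -100_000.0) -> str:
    """Hypothetical thresholds; realized_t > 0 means emissions fell 2022->2023."""
    if realized_t > 0:
        # Actual reducers, split by how strongly the framework flagged them.
        return "prime_targets" if score >= high else "hidden_gems"
    if score >= very_high and realized_t <= big_increase_t:
        # Very highly scored operators whose emissions rose sharply.
        return "all_talk_majors"
    return "unsegmented"

for score, realized in [(80.0, 5_000.0), (30.0, 100.0), (95.0, -500_000.0), (95.0, -10.0)]:
    print(score, realized, behavioral_segment(score, realized))
```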

The conclusion is simple:

  • the physics- and economics-based scoring framework correctly points at the big, inefficient operators
  • in a growth environment those same operators will often increase absolute emissions unless growth is explicitly constrained
  • behavioral evidence from the panel should be used to shape how engagement is framed:
    • majors: intensity and project portfolios, not “absolute 2030 cuts” in isolation
    • prime_targets and hidden_gems: early case studies and commercial wins

Any ML models trained on this panel are treated as exploratory. They run offline, write predictions and feature importance to parquet, and are loaded into the DAG only when needed. The descriptive panel and simple correlations already provide a strong reality check on the scoring logic.

Business implications and next steps

The analysis shows that decarbonization is a concentrated opportunity: a few operator types and a few emissions sources drive most of the value. In Alberta 2023, roughly 36 Mt CO2e of upstream and midstream emissions include about 9 Mt of addressable reductions. At 2030 prices, this wedge is worth over 1 billion dollars per year before fuel savings and recovered gas.

The 2022-2023 panel adds one important nuance: most of the recent emission increases came from a small set of large oil sands and portfolio operators who expanded production. At the same time, a mix of gas storage, midstream, and smaller conventional operators actually reduced emissions. These are natural early proof points, even if the strategic prize sits with the majors.

Priority operator profiles:

  • high_intensity_thermal:
    • large SAGD and thermal heavy oil operators
    • main levers: steam and power projects that reduce 20-30 percent of emissions on positive economics
  • large_conventional_portfolios:
    • multi facility conventional operators
    • main levers: standardized vent, flare, and fuel programs with short paybacks
  • growth_stage_near_threshold:
    • mid sized operators close to thresholds
    • main levers: design and timing of projects to influence benchmark and threshold outcomes
  • prime_targets and hidden_gems from the panel:
    • operators that already reduced between 2022 and 2023
    • main levers: scale what worked, codify patterns into repeatable playbooks

Recommended actions for a decarbonization partner:

  • target high vent and high flare sites first:
    • use venting_reduction_potential_kt and flaring_reduction_potential_kt to build a ranked facility list
    • propose VRUs, flare gas recovery, and simple fuel efficiency projects with paybacks under three years
  • use intensity as the anchor metric in discussions:
    • show where each operator sits versus peer quartiles
    • translate intensity gaps into implied carbon costs at 2030 prices
  • frame programs as portfolios:
    • assemble sequences of small, fast projects and larger, strategic ones
    • show how early wins can fund or de-risk later stages
  • connect to thresholds and credits:
    • quantify how projects alter reporting status and credit positions
    • for efficient operators, highlight the potential to move into net credit generation
  • use panel evidence to set expectations:
    • with growth, absolute emissions will often rise even as intensity improves
    • focus conversations with majors on “intensity plus growth guardrails” rather than pure absolute targets in isolation
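A sketch of the two intensity-anchored calculations: placing an operator in peer quartiles, and converting an intensity gap into an implied carbon cost at the assumed 2030 price of 170 dollars per tonne. The target intensity (for example the peer median) and the peer values are hypothetical inputs.

```python
from statistics import quantiles

def peer_quartile(intensity: float, peers: list[float]) -> int:
    """1 = lowest-intensity quartile (best), 4 = highest."""
    q1, q2, q3 = quantiles(peers, n=4)
    return 1 + (intensity > q1) + (intensity > q2) + (intensity > q3)

def implied_gap_cost(intensity: float, target: float, production_boe: float,
                     price_per_t: float = 170.0) -> float:
    """Annual carbon cost in dollars of the gap between intensity and a target
    (kg CO2e/boe); zero if the operator is already at or below the target."""
    gap_kg_per_boe = max(intensity - target, 0.0)
    return gap_kg_per_boe * production_boe / 1000 * price_per_t

peers = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]  # hypothetical peer intensities
print(peer_quartile(75.0, peers), implied_gap_cost(70.0, 50.0, 10_000_000))
```

For example, an operator at 70 kg/boe against a 50 kg/boe target on 10 million boe carries a 200 kt gap, or 34 million dollars per year at 170 dollars per tonne.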

Interactive Visualizations

The following interactive visualizations allow you to explore the data in detail. Hover over points to see operator names and key metrics, and use the interactive features to filter and examine the data.

Opportunity vs Realized Reduction

Figure 2: Comparison of opportunity scores against realized emissions reductions (2022->2023), panels a-c. Operators above the zero line reduced emissions; those below increased emissions.

Regulatory Threshold Analysis

Figure 3: Operators positioned relative to regulatory thresholds (100 kt and 500 kt CO2e), panels a-c. Operators above thresholds face additional compliance requirements.

Risk-Return Scatter

Figure 4: Risk-return analysis showing NPV vs regulatory risk score (panels a-c). Bubble size represents total emissions. Operators in the upper left (high NPV, low risk) are prime targets.

Production vs Intensity

Figure 5: Production volume vs emissions intensity (panels a-b). Operators are colored by archetype to show different operational profiles.
Figure 6: Top 20 operators ranked by venting and flaring reduction opportunities. Color intensity represents total reduction potential.
Figure 7: Production vs emissions intensity efficiency frontier. Operators in the lower right (high production, low intensity) are most efficient.
Figure 8: Operators ranked by emissions intensity (lowest to highest). Lower intensity indicates better efficiency.
Figure 9: Stacked decomposition of emissions by source for top operators. Shows the relative contribution of venting, flaring, fuel, processing, and other sources.

From a development point of view, this phase demonstrates that the Hamilton + medallion pipeline, emissions model, decision metrics, and 2-year panel cohere and can be run end to end on real data. The natural next step is to attach additional years and jurisdictions and let the same machinery answer the original North American question at full scale. More years will also turn the current exploratory ML work into a robust forecasting layer that can sit on top of the physics-based core, rather than replace it.

Pipeline architecture at a glance

Note: This section shows the core DAG layers that drive the analysis pipeline:

  1. Viz layer (top): decision outputs including rankings, scenarios, and visualizations
  2. Gold emissions (bottom left): facility emissions → operator-level emissions
  3. Gold rankings (bottom right): baseline rankings, composite scoring, and comparison logic

Figure 10: Visualization layer showing decision outputs, rankings, scenarios, and interactive figures.
Figure 11: Gold emissions layer from facilities with emissions to operator-level emissions.
Figure 12: Gold rankings layer showing baseline rankings, composite rankings, and comparison.