infrastructure.parquet_writer

Write DataFrames to partitioned Parquet files following Gold schema.

Classes

Name            Description
ParquetWriter   Writer for partitioned Parquet tables following Gold schema.

ParquetWriter

infrastructure.parquet_writer.ParquetWriter(base_dir: str | Path = 'data')

Writer for partitioned Parquet tables following Gold schema.

Methods

Name                   Description
read_table             Read Parquet table back into DataFrame.
write_scd2_dimension   Write SCD2 dimension table with historical tracking.
write_table            Write DataFrame to partitioned Parquet table.

read_table
infrastructure.parquet_writer.ParquetWriter.read_table(
    table_name: str,
    filters: list | None = None,
    columns: list[str] | None = None,
)

Read Parquet table back into DataFrame.
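The page does not state what form `filters` takes. One plausible reading, and it is only an assumption, is that it follows the pyarrow convention of `(column, op, value)` tuples that are ANDed together. The sketch below mimics that behaviour on plain dictionaries so the semantics of `filters` and `columns` are easy to see; `row_matches` and `select` are hypothetical helpers, not part of `ParquetWriter`.

```python
from __future__ import annotations

from typing import Any

# Comparison operators in the style of pyarrow filter tuples (assumed convention).
_OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def row_matches(row: dict[str, Any], filters: list[tuple[str, str, Any]] | None) -> bool:
    """Return True when the row satisfies every (column, op, value) tuple."""
    if not filters:
        return True
    return all(_OPS[op](row[col], value) for col, op, value in filters)


def select(rows, filters=None, columns=None):
    """Hypothetical stand-in for read_table: filter rows, then project columns."""
    kept = [r for r in rows if row_matches(r, filters)]
    if columns is None:
        return kept
    return [{c: r[c] for c in columns} for r in kept]


rows = [
    {"facility_id": "F1", "year": 2023, "volume": 10.0},
    {"facility_id": "F2", "year": 2024, "volume": 12.5},
    {"facility_id": "F1", "year": 2024, "volume": 11.0},
]

result = select(
    rows,
    filters=[("year", "==", 2024), ("facility_id", "in", ["F1"])],
    columns=["facility_id", "volume"],
)
print(result)
```

If the real method forwards `filters` to a Parquet reader, predicates on partition columns would typically let whole partitions be skipped rather than filtered row by row as done here.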

write_scd2_dimension
infrastructure.parquet_writer.ParquetWriter.write_scd2_dimension(
    df: pd.DataFrame,
    table_name: str,
    key_columns: list[str],
    attribute_columns: list[str],
    effective_date: datetime | None = None,
)

Write SCD2 dimension table with historical tracking.
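As a rough illustration of what SCD2 (slowly changing dimension, type 2) tracking usually means, the sketch below closes out the current version of a row when any attribute changes and appends a new version. It works on plain lists of dictionaries rather than DataFrames, and the tracking columns `valid_from`, `valid_to`, and `is_current` as well as the helper `apply_scd2` are assumptions chosen for the example; the columns that `write_scd2_dimension` actually maintains are not documented here.

```python
from __future__ import annotations

from datetime import datetime


def apply_scd2(history, incoming, key_columns, attribute_columns, effective_date=None):
    """Return a new history with changed rows closed out and new versions appended.

    valid_from, valid_to and is_current are hypothetical tracking columns.
    """
    effective_date = effective_date or datetime.now()
    result = [dict(row) for row in history]  # copy so the input is not mutated

    # Index the current version of each business key.
    current = {
        tuple(row[k] for k in key_columns): row
        for row in result
        if row["is_current"]
    }

    for new in incoming:
        key = tuple(new[k] for k in key_columns)
        old = current.get(key)
        if old is not None:
            if all(old[c] == new[c] for c in attribute_columns):
                continue  # attributes unchanged: keep the existing version
            # Attributes changed: close the previous version.
            old["valid_to"] = effective_date
            old["is_current"] = False
        # Append a new current version for new keys and for changed rows.
        version = {c: new[c] for c in [*key_columns, *attribute_columns]}
        version.update(valid_from=effective_date, valid_to=None, is_current=True)
        result.append(version)
        current[key] = version
    return result


history = [
    {
        "facility_id": "F1",
        "operator": "Acme",
        "valid_from": datetime(2023, 1, 1),
        "valid_to": None,
        "is_current": True,
    }
]
incoming = [
    {"facility_id": "F1", "operator": "Borealis"},  # changed attribute
    {"facility_id": "F2", "operator": "Acme"},      # new key
]

updated = apply_scd2(
    history,
    incoming,
    key_columns=["facility_id"],
    attribute_columns=["operator"],
    effective_date=datetime(2024, 1, 1),
)
```

Here `key_columns` identify the entity and `attribute_columns` are the fields whose changes trigger a new version, which matches the roles the parameter names suggest.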

write_table
infrastructure.parquet_writer.ParquetWriter.write_table(
    df: pd.DataFrame,
    table_name: str,
    mode: str = 'overwrite',
    validate_schema: bool = True,
    metadata: dict | None = None,
)

Write DataFrame to partitioned Parquet table.

Args:
    df: DataFrame to write.
    table_name: Table name (e.g., 'silver.vol_production').
    mode: Write mode ('overwrite' or 'append').
    validate_schema: Whether to validate the schema.
    metadata: Optional metadata dict.

Returns:
    Path to the output directory.
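The reference does not describe how a qualified table name such as 'silver.vol_production' maps to the returned output directory. The sketch below shows one possible mapping, assuming a `base_dir/<layer>/<table>` layout and the two documented write modes; `resolve_output_dir` and `VALID_MODES` are hypothetical names, and the real directory layout may differ.

```python
from __future__ import annotations

from pathlib import Path

# The two write modes named in the write_table documentation.
VALID_MODES = ("overwrite", "append")


def resolve_output_dir(base_dir: str | Path, table_name: str, mode: str = "overwrite") -> Path:
    """Map '<layer>.<table>' to base_dir/<layer>/<table> (assumed layout)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    layer, sep, name = table_name.partition(".")
    if not sep or not layer or not name:
        raise ValueError(f"expected '<layer>.<table>', got {table_name!r}")
    return Path(base_dir) / layer / name


out = resolve_output_dir("data", "silver.vol_production")
print(out)
```

Validating `mode` before touching the filesystem is a deliberate choice in this sketch: a typo such as 'overwite' then fails fast instead of silently falling back to a default.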

Functions

Name                         Description
write_dim_facility           Write silver.dim_facility SCD2 dimension.
write_emission_attribution   Write gold.emission_attribution table.
write_facility_aggregated    Write gold.facility_aggregated table.
write_facility_edges         Write silver.facility_edges_monthly table.
write_ngl_production         Write silver.ngl_production table.
write_production_monthly     Write silver.production_monthly table.
write_vol_production         Write silver.vol_production table.

write_dim_facility

infrastructure.parquet_writer.write_dim_facility(
    df: pd.DataFrame,
    base_dir: str = 'data',
    key_columns: list[str] | None = None,
    attribute_columns: list[str] | None = None,
)

Write silver.dim_facility SCD2 dimension.

write_emission_attribution

infrastructure.parquet_writer.write_emission_attribution(
    df: pd.DataFrame,
    base_dir: str = 'data',
    mode: str = 'overwrite',
)

Write gold.emission_attribution table.

write_facility_aggregated

infrastructure.parquet_writer.write_facility_aggregated(
    df: pd.DataFrame,
    base_dir: str = 'data',
    mode: str = 'overwrite',
)

Write gold.facility_aggregated table.

write_facility_edges

infrastructure.parquet_writer.write_facility_edges(
    df: pd.DataFrame,
    base_dir: str = 'data',
    mode: str = 'overwrite',
)

Write silver.facility_edges_monthly table.

write_ngl_production

infrastructure.parquet_writer.write_ngl_production(
    df: pd.DataFrame,
    base_dir: str = 'data',
    mode: str = 'overwrite',
)

Write silver.ngl_production table.

write_production_monthly

infrastructure.parquet_writer.write_production_monthly(
    df: pd.DataFrame,
    base_dir: str = 'data',
    mode: str = 'overwrite',
)

Write silver.production_monthly table.

write_vol_production

infrastructure.parquet_writer.write_vol_production(
    df: pd.DataFrame,
    base_dir: str = 'data',
    mode: str = 'overwrite',
)

Write silver.vol_production table.