infrastructure.parquet_writer
infrastructure.parquet_writer
Write DataFrames to partitioned Parquet files following Gold schema.
Classes
| Name | Description |
|---|---|
| ParquetWriter | Writer for partitioned Parquet tables following Gold schema. |
ParquetWriter
infrastructure.parquet_writer.ParquetWriter(base_dir: str | Path = 'data')Writer for partitioned Parquet tables following Gold schema.
Methods
| Name | Description |
|---|---|
| read_table | Read Parquet table back into DataFrame. |
| write_scd2_dimension | Write SCD2 dimension table with historical tracking. |
| write_table | Write DataFrame to partitioned Parquet table. |
read_table
infrastructure.parquet_writer.ParquetWriter.read_table(
table_name: str,
filters: list | None = None,
columns: list[str] | None = None,
)Read Parquet table back into DataFrame.
write_scd2_dimension
infrastructure.parquet_writer.ParquetWriter.write_scd2_dimension(
df: pd.DataFrame,
table_name: str,
key_columns: list[str],
attribute_columns: list[str],
effective_date: datetime | None = None,
)Write SCD2 dimension table with historical tracking.
write_table
infrastructure.parquet_writer.ParquetWriter.write_table(
df: pd.DataFrame,
table_name: str,
mode: str = 'overwrite',
validate_schema: bool = True,
metadata: dict | None = None,
)Write DataFrame to partitioned Parquet table.
Args: df: DataFrame to write table_name: Table name (e.g., ‘silver.vol_production’) mode: Write mode (‘overwrite’ or ‘append’) validate_schema: Whether to validate schema metadata: Optional metadata dict
Returns: Path to output directory
Functions
| Name | Description |
|---|---|
| write_dim_facility | Write silver.dim_facility SCD2 dimension. |
| write_emission_attribution | Write gold.emission_attribution table. |
| write_facility_aggregated | Write gold.facility_aggregated table. |
| write_facility_edges | Write silver.facility_edges_monthly table. |
| write_ngl_production | Write silver.ngl_production table. |
| write_production_monthly | Write silver.production_monthly table. |
| write_vol_production | Write silver.vol_production table. |
write_dim_facility
infrastructure.parquet_writer.write_dim_facility(
df: pd.DataFrame,
base_dir: str = 'data',
key_columns: list[str] | None = None,
attribute_columns: list[str] | None = None,
)Write silver.dim_facility SCD2 dimension.
write_emission_attribution
infrastructure.parquet_writer.write_emission_attribution(
df: pd.DataFrame,
base_dir: str = 'data',
mode: str = 'overwrite',
)Write gold.emission_attribution table.
write_facility_aggregated
infrastructure.parquet_writer.write_facility_aggregated(
df: pd.DataFrame,
base_dir: str = 'data',
mode: str = 'overwrite',
)Write gold.facility_aggregated table.
write_facility_edges
infrastructure.parquet_writer.write_facility_edges(
df: pd.DataFrame,
base_dir: str = 'data',
mode: str = 'overwrite',
)Write silver.facility_edges_monthly table.
write_ngl_production
infrastructure.parquet_writer.write_ngl_production(
df: pd.DataFrame,
base_dir: str = 'data',
mode: str = 'overwrite',
)Write silver.ngl_production table.
write_production_monthly
infrastructure.parquet_writer.write_production_monthly(
df: pd.DataFrame,
base_dir: str = 'data',
mode: str = 'overwrite',
)Write silver.production_monthly table.
write_vol_production
infrastructure.parquet_writer.write_vol_production(
df: pd.DataFrame,
base_dir: str = 'data',
mode: str = 'overwrite',
)Write silver.vol_production table.