schemas
Schema definitions for the 7 Parquet tables in the data warehouse.
Functions
| Name | Description |
|---|---|
| create_metadata_dict | Create standard metadata dict with dataset_id, source_url, and ingested_at. |
| get_partition_cols | Get partition columns for a table. |
| get_schema | Get PyArrow schema for a table. |
| get_table_info | Get comprehensive info about table schema, partitions, and SCD2 status. |
| validate_dataframe_schema | Validate DataFrame matches expected schema. |
create_metadata_dict
schemas.create_metadata_dict(dataset_id: str, source_url: str)

Create standard metadata dict with dataset_id, source_url, and ingested_at.
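The source does not show how this function is implemented, so the following is only a minimal sketch of what such a helper might look like. The name `create_metadata_dict_sketch`, the example URL, and the choice of a UTC ISO 8601 string for `ingested_at` are assumptions made for illustration; the real function may use a different timestamp type.

```python
from datetime import datetime, timezone


def create_metadata_dict_sketch(dataset_id: str, source_url: str) -> dict:
    """Hypothetical stand-in for schemas.create_metadata_dict."""
    return {
        "dataset_id": dataset_id,
        "source_url": source_url,
        # Assumption: ingestion time is stored as a UTC ISO 8601 string.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder arguments, not values taken from the real warehouse.
meta = create_metadata_dict_sketch("vol_production", "https://example.com/data.csv")
print(sorted(meta))
```

Recording the timestamp in UTC keeps `ingested_at` comparable across machines in different time zones, which is one reason a helper like this might centralize metadata creation.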
get_partition_cols
schemas.get_partition_cols(table_name: str)

Get partition columns for a table.

Raises: ValueError: If the table name is not found.
get_schema
schemas.get_schema(table_name: str)

Get PyArrow schema for a table.

Raises: ValueError: If the table name is not found.
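Both `get_partition_cols` and `get_schema` raise `ValueError` for an unknown table name. One way such a lookup could behave is sketched below with a plain dictionary. The registry `_PARTITION_COLS`, its partition column names, and the function `get_partition_cols_sketch` are hypothetical; only the table name `silver.vol_production` appears in this reference.

```python
# Hypothetical registry mapping table names to partition columns.
# The column names here are placeholders, not the real partitioning.
_PARTITION_COLS = {
    "silver.vol_production": ["year", "month"],
}


def get_partition_cols_sketch(table_name: str) -> list:
    """Hypothetical stand-in for schemas.get_partition_cols."""
    try:
        # Return a copy so callers cannot mutate the registry.
        return list(_PARTITION_COLS[table_name])
    except KeyError:
        known = ", ".join(sorted(_PARTITION_COLS))
        raise ValueError(
            f"Unknown table {table_name!r}; known tables: {known}"
        ) from None


try:
    get_partition_cols_sketch("silver.does_not_exist")
except ValueError as exc:
    print(f"lookup failed: {exc}")
```

Converting the `KeyError` into a `ValueError` that lists the known tables gives callers a more actionable message than a bare missing-key error.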
get_table_info
schemas.get_table_info(table_name: str)

Get comprehensive info about table schema, partitions, and SCD2 status.
validate_dataframe_schema
schemas.validate_dataframe_schema(df, table_name: str, strict: bool = False)

Validate that a DataFrame matches the expected schema.

Args:

- df: pandas DataFrame or PyArrow Table
- table_name: Table name (e.g., 'silver.vol_production')
- strict: If True, require an exact column match

Returns: True if valid

Raises: ValueError: If the schema is invalid
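The effect of `strict` can be illustrated on column names alone. This is a sketch under assumptions, not the library's implementation: it assumes that non-strict mode tolerates extra columns while still requiring every expected column, and it ignores data types entirely. The function name `validate_columns_sketch` and the example column names are hypothetical.

```python
def validate_columns_sketch(actual, expected, strict=False):
    """Hypothetical column-name check mirroring the documented contract:
    return True if valid, raise ValueError otherwise."""
    actual_set, expected_set = set(actual), set(expected)

    # Assumption: expected columns must always be present.
    missing = expected_set - actual_set
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")

    # Assumption: extra columns are rejected only in strict mode.
    extra = actual_set - expected_set
    if strict and extra:
        raise ValueError(f"Unexpected columns: {sorted(extra)}")

    return True


expected_cols = ["well_id", "volume"]             # placeholder schema
frame_cols = ["well_id", "volume", "debug_flag"]  # one extra column

print(validate_columns_sketch(frame_cols, expected_cols))  # non-strict passes
try:
    validate_columns_sketch(frame_cols, expected_cols, strict=True)
except ValueError as exc:
    print(f"strict mode rejected the frame: {exc}")
```

With a real pandas DataFrame, the `actual` argument would typically be `df.columns`; a PyArrow Table exposes the equivalent list as `table.column_names`.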