schemas

Schema definitions for the 7 Parquet tables in the data warehouse.

Functions

Name                       Description
create_metadata_dict       Create standard metadata dict with dataset_id, source_url, and ingested_at.
get_partition_cols         Get partition columns for a table.
get_schema                 Get PyArrow schema for a table.
get_table_info             Get comprehensive info about table schema, partitions, and SCD2 status.
validate_dataframe_schema  Validate DataFrame matches expected schema.

create_metadata_dict

schemas.create_metadata_dict(dataset_id: str, source_url: str)

Create standard metadata dict with dataset_id, source_url, and ingested_at.
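The reference does not show the function body or the format of ingested_at. As a minimal sketch, assuming the timestamp is a UTC ISO 8601 string taken at call time, a stand-in with the same signature could look like this (the dataset ID and URL are made-up placeholders):

```python
from datetime import datetime, timezone


def create_metadata_dict(dataset_id: str, source_url: str) -> dict:
    """Hypothetical stand-in for schemas.create_metadata_dict."""
    return {
        "dataset_id": dataset_id,
        "source_url": source_url,
        # Assumption: ingested_at is a timezone-aware UTC timestamp
        # serialized as an ISO 8601 string.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder arguments, not real dataset identifiers.
meta = create_metadata_dict("example_dataset", "https://example.com/data.csv")
print(sorted(meta))
```

Keeping the three keys in one helper means every ingested table carries the same provenance fields, whatever the actual timestamp format turns out to be.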

get_partition_cols

schemas.get_partition_cols(table_name: str)

Get partition columns for a table.

Raises: ValueError: If the table name is not found
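The lookup-or-raise behavior described above can be sketched as follows. The registry and its partition columns are hypothetical; only the table name 'silver.vol_production' appears elsewhere in this reference, and the real return type is not documented.

```python
# Hypothetical registry: the partition columns are invented for
# illustration and are not taken from the real module.
_PARTITION_COLS = {
    "silver.vol_production": ["year", "month"],
}


def get_partition_cols(table_name: str) -> list[str]:
    """Hypothetical stand-in for schemas.get_partition_cols."""
    try:
        # Return a copy so callers cannot mutate the registry.
        return list(_PARTITION_COLS[table_name])
    except KeyError:
        raise ValueError(f"Unknown table: {table_name!r}") from None


print(get_partition_cols("silver.vol_production"))

try:
    get_partition_cols("silver.does_not_exist")
except ValueError as exc:
    print(f"lookup failed: {exc}")
```

Raising ValueError rather than letting KeyError escape matches the documented contract, so callers only need to handle one exception type.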

get_schema

schemas.get_schema(table_name: str)

Get PyArrow schema for a table.

Raises: ValueError: If the table name is not found

get_table_info

schemas.get_table_info(table_name: str)

Get comprehensive info about table schema, partitions, and SCD2 status.

validate_dataframe_schema

schemas.validate_dataframe_schema(df, table_name: str, strict: bool = False)

Validate DataFrame matches expected schema.

Args:
    df: pandas DataFrame or PyArrow Table
    table_name: Table name (e.g., 'silver.vol_production')
    strict: If True, require exact column match

Returns: True if valid

Raises: ValueError: If the schema is invalid
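The effect of strict can be sketched on column names alone. This assumes the non-strict mode tolerates extra columns as long as every expected column is present; the real function may also compare data types, which this sketch does not. The helper name validate_columns and the column names are hypothetical.

```python
def validate_columns(
    actual: list[str], expected: list[str], strict: bool = False
) -> bool:
    """Hypothetical column-name check; does not compare data types."""
    missing = [col for col in expected if col not in actual]
    extra = [col for col in actual if col not in expected]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Assumption: extra columns are rejected only in strict mode.
    if strict and extra:
        raise ValueError(f"Unexpected columns: {extra}")
    return True


expected = ["well_id", "volume"]
actual = ["well_id", "volume", "notes"]

# Non-strict: the extra "notes" column is tolerated.
print(validate_columns(actual, expected))

# Strict: the same input is rejected.
try:
    validate_columns(actual, expected, strict=True)
except ValueError as exc:
    print(f"strict check failed: {exc}")
```

For a pandas DataFrame the actual names would come from list(df.columns), and for a PyArrow Table from table.column_names.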