Example Notebooks

Ready-to-use Databricks notebooks demonstrating end-to-end TitanRDM SDK workflows. Import these directly into your Databricks workspace.

Available Notebooks

Notebook	Description	Download
SparkSync Example	Automated upload/download using `SparkSync`	databricks_spark_sync_example.py
Convention Sync Example	Manual convention-based sync with full control	databricks_sync_example.py
SDK System Tests	Comprehensive test of all SDK methods	databricks_system_tests.py

Importing Notebooks into Databricks

1. Download the .py file from the links above.
2. In Databricks, navigate to Workspace.
3. Click Import.
4. Select the downloaded .py file.
5. The file is imported as a Databricks notebook automatically.

These files use the Databricks notebook source format: each one begins with the line `# Databricks notebook source`, so Databricks recognises it natively as a notebook rather than a plain Python file.
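For orientation, a file in this source format is ordinary Python whose first line is `# Databricks notebook source`, with cells separated by `# COMMAND ----------` lines and markdown cells written as `# MAGIC` comments. The sketch below splits such a file into cells. The sample text and the `split_cells` helper are illustrative only and are not taken from the shipped notebooks.

```python
HEADER = "# Databricks notebook source"
CELL_DELIMITER = "# COMMAND ----------"

# Illustrative sample, not the contents of a shipped notebook.
SAMPLE_SOURCE = """# Databricks notebook source
# MAGIC %md
# MAGIC # Example title

# COMMAND ----------

print("first code cell")

# COMMAND ----------

print("second code cell")
"""


def split_cells(source: str) -> list[str]:
    """Return the cells of a notebook source file, without the header line."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("not a Databricks notebook source file")
    cells, current = [], []
    for line in lines[1:]:
        if line.strip() == CELL_DELIMITER:
            # A delimiter closes the current cell and starts a new one.
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())
    return cells


cells = split_cells(SAMPLE_SOURCE)
print(len(cells))
```

A quick check like this can confirm that a downloaded file still carries the header line before you import it.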

SparkSync Example

File: databricks_spark_sync_example.py

This notebook demonstrates the SparkSync class with four scenarios:

#	Direction	Scope
1	Upload	Entire domain (`Clinics`) — all deployed tables
2	Upload	Specific tables (`Sites`, `Delivery Centre`, `Org Unit`)
3	Download	Entire domain (`Clinics`) — all deployed tables
4	Download	Specific tables (`Sites`, `Delivery Centre`, `Org Unit`)

Key Concepts

Uses SparkSync for automatic catalog read/write
Configurable via Databricks widgets (branch_name, catalog, download_schema, upload_schema)
Credentials loaded from Databricks secret scope titan-rdm
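The widget-driven configuration can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `read_sync_config` is a hypothetical helper, and the default values are assumptions borrowed from the values used in the walkthrough (`dev`, `hive_metastore`, `rdmin`, `rdmout`). It relies only on `dbutils.widgets.text` and `dbutils.widgets.get`.

```python
# Assumed defaults, taken from the values used in the walkthrough.
WIDGET_DEFAULTS = {
    "branch_name": "dev",
    "catalog": "hive_metastore",
    "download_schema": "rdmin",
    "upload_schema": "rdmout",
}


def read_sync_config(dbutils, defaults=WIDGET_DEFAULTS):
    """Create one text widget per setting and return the current values."""
    config = {}
    for name, default in defaults.items():
        # Creates the widget with its default value.
        dbutils.widgets.text(name, default)
        # Widget values are always returned as strings.
        config[name] = dbutils.widgets.get(name)
    return config
```

In a notebook, `config = read_sync_config(dbutils)` would then supply `config["catalog"]` and `config["upload_schema"]` to the sync calls.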

Notebook Walkthrough

Setup:

from titan_rdm_sdk import TitanRDMClient
from titan_rdm_sdk.spark_sync import SparkSync

client = TitanRDMClient(
    url=dbutils.secrets.get(scope="titan-rdm", key="url"),
    client_id=dbutils.secrets.get(scope="titan-rdm", key="client_id"),
    client_secret=dbutils.secrets.get(scope="titan-rdm", key="client_secret"),
)

branch = client.get_branch_by_name("dev")
sync = SparkSync(client=client, spark=spark)

Upload entire domain:

upload_results = sync.upload_sync_by_convention(
    branch_id=branch.id,
    source_catalog="hive_metastore",
    source_schema="rdmout",
    target_domain_name="Clinics",
)

Download entire domain:

download_results = sync.download_sync_by_convention(
    branch_id=branch.id,
    target_catalog="hive_metastore",
    target_schema="rdmin",
    source_domain_name="Clinics",
)

Upload specific tables:

upload_results = sync.upload_sync_by_convention(
    branch_id=branch.id,
    source_catalog="hive_metastore",
    source_schema="rdmout",
    target_domain_name="Clinics",
    target_table_names=["Sites", "Delivery Centre", "Org Unit"],
)

Convention Sync Example

File: databricks_sync_example.py

This notebook demonstrates the manual convention-based approach, which gives you full control over the sync loop while still following the naming convention.

Key Concepts

Discovers all domains and deployed tables automatically
No hard-coded table lists: a table added in TitanRDM is picked up by the next sync
Manual control over the upload/download loop
Single import batch for all tables

Notebook Walkthrough

Discover metadata:

domains = client.get_domains()
sync_manifest = []

for domain in domains:
    tables = client.get_deployed_table_definitions(
        branch_id=branch.id,
        domain_id=domain.id,
    )
    for t in tables:
        sync_manifest.append((domain, t))

Upload all tables:

upload = client.get_upload(
    branch_id=branch.id,
    description="Convention sync upload",
    correlation_code="convention-upload",
)

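for domain, table in sync_manifest:
    # CATALOG and UPLOAD_SCHEMA are assumed to be defined earlier in the notebook.
    source_table = f"{CATALOG}.{UPLOAD_SCHEMA}.{domain.abbreviation}_{table.database_table_name}"
    # toPandas() collects the whole table onto the driver.
    source_df = spark.table(source_table).toPandas()

    # Add this table to the shared upload batch using its default import mapping.
    import_mapping = client.get_default_import_mapping(table.id)
    table_upload = upload.get_table_upload(
        table_definition_key=table.key,
        import_mapping_key=import_mapping.key,
        pattern="full",
    )
    table_upload.send(source_df)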

upload.complete(message="Convention sync upload completed")
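The upload loop above derives each source table name from the catalog, the schema, the domain abbreviation, and the table's database name. A small helper makes that convention explicit. The function name and the sample values (`CLN`, `sites`) are hypothetical and used only for illustration.

```python
def convention_table_name(catalog, schema, domain_abbreviation, database_table_name):
    """Build <catalog>.<schema>.<abbreviation>_<table>, as in the sync loops."""
    return f"{catalog}.{schema}.{domain_abbreviation}_{database_table_name}"


# Hypothetical abbreviation and table name, for illustration only.
print(convention_table_name("hive_metastore", "rdmout", "CLN", "sites"))
```

Keeping the format string in one place means the upload and download loops cannot drift apart in how they name tables.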

Download all tables:

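for domain, table in sync_manifest:
    # CATALOG and DOWNLOAD_SCHEMA are assumed to be defined earlier in the notebook.
    target_table = f"{CATALOG}.{DOWNLOAD_SCHEMA}.{domain.abbreviation}_{table.database_table_name}"

    download = client.get_download(
        branch_id=branch.id,
        table_definition_key=table.key,
        pattern="full",
    )
    # Poll every 2 seconds, for up to 300 seconds, until the export is ready.
    download.wait_until_ready(poll_interval=2.0, max_wait=300.0)
    df = download.receive()

    # Convert the received data to a Spark DataFrame and replace the target table.
    spark_df = spark.createDataFrame(df)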
    spark_df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(target_table)
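The `poll_interval` and `max_wait` arguments above suggest a poll-with-deadline pattern. The sketch below shows that generic pattern in plain Python. It is not the SDK's implementation: how `wait_until_ready` behaves internally, and what it raises on timeout, is not stated here, and `wait_until` and its `TimeoutError` are assumptions made for illustration.

```python
import time


def wait_until(is_ready, poll_interval=2.0, max_wait=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_ready() every poll_interval seconds until it returns True.

    Raises TimeoutError if is_ready() is still False once max_wait
    seconds have elapsed.
    """
    deadline = clock() + max_wait
    while True:
        if is_ready():
            return
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {max_wait} seconds")
        sleep(poll_interval)
```

A shorter `poll_interval` notices completion sooner at the cost of more status checks, while `max_wait` bounds how long a stuck export can block the loop.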

SDK System Tests

File: databricks_system_tests.py

A comprehensive test notebook that exercises all SDK methods against a real TitanRDM instance. Use this to verify your environment is configured correctly.

Tests Included

Test	Description
1	Authentication
2	Full upload workflow (create batch → upload → complete)
3	Full download workflow (create export → wait → receive)
4	Incremental upload
5	Error handling (invalid IDs)
6	List branches
7	Get branch by name
8	Get branch by ID
9	List domains
10	Get domain by name and ID
11	List deployed table definitions
12	Get deployed table definition by key

Configuration Widgets

Widget	Default	Description
`branch_id`	`174`	Target branch for testing
`table_definition_key`	`100`	Table for upload/download tests
`import_mapping_key`	`10`	Mapping for upload tests
`test_csv_path`	`/dbfs/test_data/customers.csv`	Path to test CSV
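Databricks widgets always return their values as strings, so numeric settings such as `branch_id` need converting before use if the SDK calls expect integers (an assumption; the parameter types are not stated here). A minimal sketch, with `parse_test_config` as a hypothetical helper:

```python
# Widgets whose defaults in the table above look like integers.
INTEGER_WIDGETS = ("branch_id", "table_definition_key", "import_mapping_key")


def parse_test_config(raw_values):
    """Convert integer-like widget strings to int; leave other values as-is."""
    parsed = dict(raw_values)
    for name in INTEGER_WIDGETS:
        parsed[name] = int(raw_values[name])
    return parsed


# Defaults from the table above, as the strings a widget would return.
defaults = {
    "branch_id": "174",
    "table_definition_key": "100",
    "import_mapping_key": "10",
    "test_csv_path": "/dbfs/test_data/customers.csv",
}
config = parse_test_config(defaults)
```

Converting once up front also surfaces a mistyped widget value as a `ValueError` before any test runs.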

Prerequisites (All Notebooks)

1. Secret Scope

databricks secrets create-scope --scope titan-rdm
databricks secrets put --scope titan-rdm --key url
databricks secrets put --scope titan-rdm --key client_id
databricks secrets put --scope titan-rdm --key client_secret
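Once the scope is populated, a notebook cell can confirm that all three keys are visible before the client is constructed. The sketch below uses `dbutils.secrets.list`, which returns metadata objects with a `key` attribute. `missing_secret_keys` is a hypothetical helper and not part of the SDK.

```python
REQUIRED_KEYS = ("url", "client_id", "client_secret")


def missing_secret_keys(dbutils, scope="titan-rdm", required=REQUIRED_KEYS):
    """Return the required keys that are not present in the secret scope."""
    present = {secret.key for secret in dbutils.secrets.list(scope)}
    return [key for key in required if key not in present]
```

In a notebook, `missing_secret_keys(dbutils)` returning an empty list indicates that the scope holds every key the setup code reads.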

2. SDK Installation

%pip install titan-rdm-sdk

3. Schemas (for Sync Notebooks)

CREATE SCHEMA IF NOT EXISTS rdmin;
CREATE SCHEMA IF NOT EXISTS rdmout;

Next Steps

Spark Sync — Detailed SparkSync documentation
Convention Sync — Understanding the naming convention
SDK Client Methods — Full method reference
Getting Started — Installation and authentication