Example Notebooks

These ready-to-use Databricks notebooks demonstrate end-to-end TitanRDM SDK workflows. Import them directly into your Databricks workspace.


Available Notebooks

Notebook | Description | Download
SparkSync Example | Automated upload/download using SparkSync | databricks_spark_sync_example.py
Convention Sync Example | Manual convention-based sync with full control | databricks_sync_example.py
SDK System Tests | Comprehensive test of all SDK methods | databricks_system_tests.py

Importing Notebooks into Databricks

  1. Download the .py file from the links above
  2. In Databricks, navigate to Workspace
  3. Click Import
  4. Select the downloaded .py file
  5. The file will be imported as a Databricks notebook automatically

These files use the Databricks notebook source format (# Databricks notebook source) and are recognised natively.
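
To make the format concrete: a source-format file begins with the `# Databricks notebook source` header line, and its cells are separated by `# COMMAND ----------` lines. The sketch below splits such a file into cells, which can be handy for inspecting a downloaded notebook before importing it. `split_cells` is a hypothetical helper, not part of Databricks or the TitanRDM SDK, and the sample notebook text is made up for illustration.

```python
HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"


def split_cells(source: str) -> list[str]:
    """Split Databricks notebook source text into a list of cell bodies."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("not a Databricks notebook source file")

    cells, current = [], []
    for line in lines[1:]:
        if line.strip() == CELL_SEPARATOR:
            # A separator closes the cell collected so far.
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())
    # Drop empty cells, e.g. produced by a trailing separator.
    return [cell for cell in cells if cell]


# Illustrative two-cell notebook: one Markdown cell, one Python cell.
example = "\n".join([
    "# Databricks notebook source",
    "# MAGIC %md",
    "# MAGIC # Demo notebook",
    "",
    "# COMMAND ----------",
    "",
    "print('hello')",
])
cells = split_cells(example)
```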


SparkSync Example

File: databricks_spark_sync_example.py

This notebook demonstrates the SparkSync class with four scenarios:

# | Direction | Scope
1 | Upload | Entire domain (Clinics), all deployed tables
2 | Upload | Specific tables (Sites, Delivery Centre, Org Unit)
3 | Download | Entire domain (Clinics), all deployed tables
4 | Download | Specific tables (Sites, Delivery Centre, Org Unit)

Key Concepts

  • Uses SparkSync for automatic catalog read/write
  • Configurable via Databricks widgets (branch_name, catalog, download_schema, upload_schema)
  • Credentials loaded from Databricks secret scope titan-rdm
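
As a rough sketch of the widget-based configuration, the helper below creates one text widget per setting and reads the values back. The widget names come from the list above; the default values and the `read_widgets` helper itself are assumptions for illustration, not code taken from the notebook. `dbutils` is passed in as a parameter so the helper can be exercised outside Databricks.

```python
# Widget names come from the notebook; the default values are assumptions.
WIDGET_DEFAULTS = {
    "branch_name": "dev",
    "catalog": "hive_metastore",
    "download_schema": "rdmin",
    "upload_schema": "rdmout",
}


def read_widgets(dbutils, defaults=WIDGET_DEFAULTS):
    """Create one text widget per setting and return the current values."""
    for name, default in defaults.items():
        dbutils.widgets.text(name, default)
    return {name: dbutils.widgets.get(name) for name in defaults}


# In a Databricks notebook, `dbutils` is predefined:
# config = read_widgets(dbutils)
# branch = client.get_branch_by_name(config["branch_name"])
```

Reading the settings once into a dictionary keeps the rest of the notebook free of repeated `dbutils.widgets.get` calls.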

Notebook Walkthrough

Setup:

from titan_rdm_sdk import TitanRDMClient
from titan_rdm_sdk.spark_sync import SparkSync

client = TitanRDMClient(
    url=dbutils.secrets.get(scope="titan-rdm", key="url"),
    client_id=dbutils.secrets.get(scope="titan-rdm", key="client_id"),
    client_secret=dbutils.secrets.get(scope="titan-rdm", key="client_secret"),
)

branch = client.get_branch_by_name("dev")
sync = SparkSync(client=client, spark=spark)

Upload entire domain:

upload_results = sync.upload_sync_by_convention(
    branch_id=branch.id,
    source_catalog="hive_metastore",
    source_schema="rdmout",
    target_domain_name="Clinics",
)

Download entire domain:

download_results = sync.download_sync_by_convention(
    branch_id=branch.id,
    target_catalog="hive_metastore",
    target_schema="rdmin",
    source_domain_name="Clinics",
)

Upload specific tables:

upload_results = sync.upload_sync_by_convention(
    branch_id=branch.id,
    source_catalog="hive_metastore",
    source_schema="rdmout",
    target_domain_name="Clinics",
    target_table_names=["Site", "Delivery Centre", "Org Unit"],
)

Convention Sync Example

File: databricks_sync_example.py

This notebook demonstrates the manual convention-based approach, which gives you full control over the sync loop while still following the same table naming convention.

Key Concepts

  • Discovers all domains and deployed tables automatically
  • No hard-coded table lists: a table added in TitanRDM is included in the next sync
  • Manual control over the upload/download loop
  • Single import batch for all tables

Notebook Walkthrough

Discover metadata:

domains = client.get_domains()
sync_manifest = []

for domain in domains:
    tables = client.get_deployed_table_definitions(
        branch_id=branch.id,
        domain_id=domain.id,
    )
    for t in tables:
        sync_manifest.append((domain, t))

Upload all tables:

upload = client.get_upload(
    branch_id=branch.id,
    description="Convention sync upload",
    correlation_code="convention-upload",
)

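for domain, table in sync_manifest:
    # Build the source table name from the naming convention.
    # CATALOG and UPLOAD_SCHEMA must be defined earlier in the notebook.
    source_table = f"{CATALOG}.{UPLOAD_SCHEMA}.{domain.abbreviation}_{table.database_table_name}"
    # toPandas() collects the whole table onto the driver.
    source_df = spark.table(source_table).toPandas()

    # Add this table to the shared upload batch using its default import mapping.
    import_mapping = client.get_default_import_mapping(table.id)
    table_upload = upload.get_table_upload(
        table_definition_key=table.key,
        import_mapping_key=import_mapping.key,
        pattern="full",
    )
    table_upload.send(source_df)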

upload.complete(message="Convention sync upload completed")
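
The source table name in the loop above follows the pattern `<catalog>.<schema>.<domain abbreviation>_<database table name>`. As a small sketch, the pattern can be pulled into a helper so that the upload and download loops build names the same way. `convention_table_name` is a hypothetical name, not an SDK function, and the sample values ("CLN", "site") are made up for illustration.

```python
def convention_table_name(catalog: str, schema: str,
                          domain_abbreviation: str,
                          database_table_name: str) -> str:
    """Return the fully qualified table name used by the sync convention."""
    return f"{catalog}.{schema}.{domain_abbreviation}_{database_table_name}"


# Illustrative values only; real abbreviations and table names come from TitanRDM.
example_name = convention_table_name("hive_metastore", "rdmout", "CLN", "site")
print(example_name)  # hive_metastore.rdmout.CLN_site
```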

Download all tables:

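for domain, table in sync_manifest:
    # Build the target table name from the naming convention.
    # CATALOG and DOWNLOAD_SCHEMA must be defined earlier in the notebook.
    target_table = f"{CATALOG}.{DOWNLOAD_SCHEMA}.{domain.abbreviation}_{table.database_table_name}"

    # Request a full export of this table.
    download = client.get_download(
        branch_id=branch.id,
        table_definition_key=table.key,
        pattern="full",
    )
    # Block until the export is ready, polling at the given interval up to the time limit.
    download.wait_until_ready(poll_interval=2.0, max_wait=300.0)
    df = download.receive()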

    spark_df = spark.createDataFrame(df)
    spark_df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(target_table)
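
The `wait_until_ready(poll_interval=2.0, max_wait=300.0)` call suggests a poll-until-ready loop with a timeout. The SDK's actual implementation is not shown here; the following is only a generic sketch of that pattern, assuming both values are in seconds. `wait_until` and `fake_is_ready` are hypothetical names.

```python
import time


def wait_until(is_ready, poll_interval=2.0, max_wait=300.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True, sleeping poll_interval seconds
    between checks. Raise TimeoutError once max_wait seconds have elapsed."""
    deadline = clock() + max_wait
    while not is_ready():
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {max_wait} seconds")
        sleep(poll_interval)


# Example: a fake export that becomes ready on the third check.
checks = {"count": 0}


def fake_is_ready():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(fake_is_ready, poll_interval=0.01, max_wait=1.0)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments.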

SDK System Tests

File: databricks_system_tests.py

A comprehensive test notebook that exercises all SDK methods against a real TitanRDM instance. Use this to verify your environment is configured correctly.

Tests Included

Test | Description
1 | Authentication
2 | Full upload workflow (create batch → upload → complete)
3 | Full download workflow (create export → wait → receive)
4 | Incremental upload
5 | Error handling (invalid IDs)
6 | List branches
7 | Get branch by name
8 | Get branch by ID
9 | List domains
10 | Get domain by name and ID
11 | List deployed table definitions
12 | Get deployed table definition by key

Configuration Widgets

Widget | Default | Description
branch_id | 174 | Target branch for testing
table_definition_key | 100 | Table for upload/download tests
import_mapping_key | 10 | Mapping for upload tests
test_csv_path | /dbfs/test_data/customers.csv | Path to test CSV

Prerequisites (All Notebooks)

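1. Secret Scope

Create the titan-rdm secret scope and store the three credentials the notebooks read (url, client_id, client_secret). The commands below use the legacy Databricks CLI syntax; newer CLI versions take positional arguments instead (for example, databricks secrets put-secret titan-rdm url).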

databricks secrets create-scope --scope titan-rdm
databricks secrets put --scope titan-rdm --key url
databricks secrets put --scope titan-rdm --key client_id
databricks secrets put --scope titan-rdm --key client_secret
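
Once the secrets are stored, a quick check from a notebook can confirm that all three keys exist before running the examples. This is a minimal sketch: `missing_secret_keys` is a hypothetical helper, not part of the SDK, and it relies on `dbutils.secrets.list`, which returns metadata objects exposing a `key` attribute. `dbutils` is passed in as a parameter so the helper can be exercised outside Databricks.

```python
REQUIRED_KEYS = ("url", "client_id", "client_secret")


def missing_secret_keys(dbutils, scope="titan-rdm", keys=REQUIRED_KEYS):
    """Return the required keys that are not present in the secret scope."""
    present = {meta.key for meta in dbutils.secrets.list(scope)}
    return [key for key in keys if key not in present]


# In a Databricks notebook, `dbutils` is predefined:
# missing = missing_secret_keys(dbutils)
# if missing:
#     raise RuntimeError(f"Missing secrets in scope titan-rdm: {missing}")
```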

2. SDK Installation

%pip install titan-rdm-sdk

3. Schemas (for Sync Notebooks)

CREATE SCHEMA IF NOT EXISTS rdmin;
CREATE SCHEMA IF NOT EXISTS rdmout;
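
The statements above create the schemas in whichever catalog is current. As a sketch, the same setup can be run from a notebook cell with the catalog named explicitly, matching the hive_metastore catalog used in the sync examples. `create_sync_schemas` is a hypothetical helper, and `spark` is passed in as a parameter (it is predefined in Databricks notebooks).

```python
def create_sync_schemas(spark, catalog="hive_metastore",
                        schemas=("rdmin", "rdmout")):
    """Create each schema in the given catalog if it does not exist yet."""
    statements = [
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}" for schema in schemas
    ]
    for statement in statements:
        spark.sql(statement)
    # Return the executed statements so callers can log them.
    return statements


# In a Databricks notebook:
# create_sync_schemas(spark)
```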

Next Steps