
Part 5 — Upload & Develop Pattern in MKW

Now that you have created the support files in Synapse, it is time to upload this metadata to MetaKraftwerk. The support files are uploaded via the Upload Pattern menu item; to see this menu, first select your pattern in the sidebar.

A Notebook file in MetaKraftwerk is not static; it can be dynamically defined by combining multiple metadata expressions and reusable expressions. These reusable expressions are modular code or logic blocks that can be flexibly assembled to create complex files tailored to your specific requirements.

The parameters or column information for each Pattern can be configured using the Functional Roles Filter and Instance Property settings. By applying these filters, you can precisely control which parameters or columns are included in your Notebook and how they are mapped.

When you create a new instance of a Pattern, MetaKraftwerk automatically generates a new Notebook for that instance, populated with its own specific values. This ensures that each instance operates with the correct configuration and data, supporting scalable and repeatable integration processes.

Summary of Key Steps:

  1. Upload support files via the Upload Pattern menu.
  2. Define Notebook files by combining multiple reusable expressions.
  3. Configure parameters/columns using Functional Roles Filter and Instance Property.
  4. Each Pattern instance generates a new Notebook with its own values.

This approach provides flexibility, reusability, and automation for managing complex data integration scenarios in MetaKraftwerk.

1. Upload Pipeline Support File

The first step is to upload our exported support files to the pattern in MetaKraftwerk. Please click on the pattern in the sidebar, then click on Upload Pattern.

These files represent a base pattern — meaning they are not yet parameterized. You will need to configure them step by step to turn this pattern into a dynamic one, capable of generating the appropriate notebooks for different instances.

Note:

You can also see a complete list of downloads in Prerequisites.

2. Define Names for Dataflow and Pipeline

For notebooks and pipelines to be generated, the names of the new objects must be defined. This is done in the naming editor. To open it, first select the folder containing the objects in the sidebar on the left (in this case, the SYNAPSE_TUTORIAL folder), then click one of the objects. In the editor, you can define a naming rule consisting of a prefix, the instance name, and a suffix. The prefix and suffix can be changed by the developer; the name of the instance for which the object is being created is entered in INSTANCE_NAME during object creation.

Assign clear, consistent names to your Dataflow (Notebook) and Pipeline components; this helps maintain traceability. For example, name the Dataflow NB_LDG_2_BRONZE_INSTANCE_NAME and the Pipeline PL_LDG_2_BRONZE_INSTANCE_NAME.

You can later reference the naming rules of the dataflow and pipeline in files or reusable expressions. You will find them on the right side of the file editor under References -> Names.

3. Configure the Notebook File (Parameterization & Reusable Expressions)

Next, we need to configure the Notebook File so that it can dynamically generate different Notebooks based on the metadata of different instances.

  • Use reusable expressions to simplify logic and enable modular design.
  • Add blocks in your reusable expressions to parameterize certain attributes (e.g., table schema). Configure parameters and columns using Functional Roles Filter and Instance Property to ensure dynamic substitution during notebook generation.

In practice, we will:

  1. Build reusable expressions (cells) that encapsulate all logic.
  2. Assemble the final notebook by referencing those reusable expressions.

More information about file templates:

  • For a practical walkthrough of the File Editor, watch the video in the File Editor. It explains how to navigate the editor, use dynamic metadata, and build custom file templates (SQL script and Notebook) efficiently.
  • For the core concepts of the File Editor and a detailed step-by-step walkthrough with screenshots demonstrating how to create a DDL Template for a database table, see the File Templates page.

3.1 Build the Reusable Expressions

Before assembling the notebook, first create the folder that will hold your reusable expressions and then add the expressions themselves. If you already have an original Synapse notebook, you can copy relevant cells or code from that file directly into each reusable expression editor.

Steps:

  1. Click the Reusable Expressions tab.
  2. Create the folder CELLS:
  • Click the New Folder icon.
  • Enter CELLS in the input box as the folder name, then click + Create new Folder.
  • The new folder CELLS will then appear in the list. This keeps reusable expressions organized and mirrors the structure shown in the screenshots.
  3. Create the Reusable Expressions:
  • In the folder CELLS, click the New Item icon.
  • Enter 01_Header in the input box as the reusable expression name, then click + Create new Reusable Expression.
  • The new reusable expression will then appear in the list.
  4. To edit a reusable expression, click it in the sidebar to select it. An editor appears on the right where you can paste or type content (you may copy parts of the original notebook file here). The editor is the same one used for File Templates.

Create the following reusable expressions. Each reusable expression begins with a Cell Format Indicator and can contain one or more expression blocks and optional switch cases.

  • 01_Header%markdown

    • Purpose: Human-readable description with dynamic placeholders (BUILD_USER_NAME, BUILD_TIMESTAMP, PATTERN_NAME, PATTERN_VERSION, INSTANCE_NAME, RELEASE_NAME).
    • Result: A rendered header section in the final notebook.

    Note

    For more information on how to generate dynamic Markdown, see the supplementary section Part 7 (optional): Add a dynamic markdown header.

  • 02_Imports%code

    • Purpose: Import PySpark/Delta libraries and define any small helpers needed across cells.
    • Typical content: imports for SparkSession, DataFrame, StructType, StructField, and Spark data types (an illustrative sketch follows this list).
  • 03_Parameters%parameters

    • Purpose: Declare runtime configuration (storage account/container names, base paths, PATTERN_NAME, LAYER_NAME, DATA_SOURCE, INSTANCE_NAME, DELIVERY_MODE).
    • Rule: Only one %parameters cell per notebook. Keep it at the top so later cells can use these variables.
  • 04_Util_functions%code

    • Purpose: Utility functions used throughout the pipeline (e.g., generate_md5_hash, generate_hdiff).
  • 05_Load_csv_files%code

    • Purpose: Load raw files from the landing zone using the declared parameters.
  • 06_Process_landing_to_bronze%code

    • Purpose: Core processing cell including schema application, metadata enrichment, hash key & hash value generation, and Delta write.
    • Contains expression blocks and switch cases (see Subsection 3.3) to parameterize schema and hashing.
  • 07_Main%code

    • Purpose: Orchestrate the logical steps (call utils, load, process, write) and handle high-level flow.
  • 08_Test%code (optional)

    • Purpose: Quick validations or preview queries used during development.
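
To make the descriptions above more concrete, here is a minimal, illustrative sketch of what the 02_Imports cell might contain (the cell itself begins with the %code indicator). The exact set of imports is not prescribed by the pattern; adjust it to the logic you actually use.

```python
# Illustrative 02_Imports cell (a %code cell); adapt to your actual processing logic.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F                      # column expressions, hashing helpers
from pyspark.sql.types import (
    StructType, StructField,                                 # used by the loaded_schema block
    StringType, DecimalType, TimestampType, LongType,        # types referenced by the switch cases
)
# Optional: only needed if your processing uses Delta merge/upserts.
# from delta.tables import DeltaTable
```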

3.2 Cell Format Indicators

Here you'll learn how %markdown, %code, and %parameters control rendering, execution, and runtime configuration across your expressions.

In Synapse notebooks, each cell can start with a format indicator that controls how the content is treated. The three indicators used in this pattern are shown in your screenshots and explained below.

| Indicator | Where used in screenshots | Purpose | Typical content |
|---|---|---|---|
| %markdown | 01_Header | Renders documentation and headings | Overview, BUILD_USER_NAME, BUILD_TIMESTAMP, PATTERN_NAME, INSTANCE_NAME, RELEASE_NAME, etc. |
| %code | 02_Imports | Executable code cell (Python by default) | Library imports; standard Python code blocks |
| %parameters | 03_Parameters | Declares runtime parameters available to later cells | STORAGE_ACCOUNT_NAME, LANDING_BASE_PATH, BRONZE_BASE_PATH, INSTANCE_NAME, etc. |

Details matching the images:

  • 01_Header uses %markdown. This cell documents the notebook and can embed dynamic references like BUILD_USER_NAME, BUILD_TIMESTAMP, PATTERN_NAME, INSTANCE_NAME, and RELEASE_NAME. These values are resolved during instance generation so the rendered header always reflects the selected instance.

    Note

    For more information about the %markdown header, see the page Part 7 (optional) - Add a dynamic markdown header.

  • 02_Imports uses %code. It contains import statements for PySpark, Delta, and types. Being a %code cell, it is executed and makes these symbols available to subsequent cells.

  • 03_Parameters uses %parameters. It defines configuration variables. These variables are injected by the pipeline/notebook and referenced later in %code cells (an illustrative sketch follows below).

    WARNING

    Only a single %parameters cell is allowed per notebook. Place it near the top so all later cells can reference these values.
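
As a complementary illustration, a %parameters cell such as 03_Parameters might look like the sketch below. The variable names follow the list above; all concrete values are placeholders for your own environment and are typically resolved per instance or overridden by the calling pipeline.

```python
# Illustrative 03_Parameters cell; values are placeholders, not fixed by the pattern.
STORAGE_ACCOUNT_NAME = "yourstorageaccount"                                    # placeholder
LANDING_BASE_PATH = "abfss://landing@yourstorageaccount.dfs.core.windows.net"  # placeholder
BRONZE_BASE_PATH = "abfss://bronze@yourstorageaccount.dfs.core.windows.net"    # placeholder
PATTERN_NAME = "landing_2_bronze"
LAYER_NAME = "BRONZE"                                                          # placeholder
DATA_SOURCE = "SYNAPSE_TUTORIAL"                                               # placeholder
INSTANCE_NAME = "INSTANCE_NAME"    # resolved to the concrete instance at generation/run time
DELIVERY_MODE = "FULL"                                                         # placeholder
```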

3.3 Creating the loaded_schema Block

In this subsection, you'll implement a dynamic loaded_schema using expression blocks and switch cases so data types resolve from metadata at generation time.

Why This Matters: The loaded_schema block is the foundation of your data processing pipeline. Without a properly defined schema, your data transformations will fail, and you'll spend hours debugging type mismatches and field mapping errors. This block acts as a "contract" between your source data and your processing logic, ensuring data integrity throughout your pipeline.

The Challenge: Different data sources often have varying field types - some tables might have string fields, others decimal numbers, long, or timestamp values. Hard-coding these types would mean creating separate notebooks for each data source, defeating the purpose of a reusable pattern.

The Solution: Switch Cases allow you to create one intelligent schema definition that adapts based on your metadata. When MetaKraftwerk generates your notebook, it reads your metadata and automatically applies the correct data type for each field. This is where the magic happens - one pattern, infinite possibilities!

Step 1: Add the Expression Block

  • Open the reusable expression 06_Process_landing_to_bronze, and find loaded_schema in the function process_landing_to_bronze().

  • Position the cursor where you want to add the schema definition, then click the Add Expression Block icon in the editor toolbar to insert a new expression block.

  • The new expression block will appear with a purple border, indicating it's active and ready for configuration.

  • In the block settings panel on the right, configure:
    • Functional Roles Filter: Select the functional roles BUSINESS_KEY and SRC_AND_TGT_FIELD.
    • Concatenation: Use , (comma) as the separator.

  • Switch-Default: Inside the block, in the content field, enter:

    StructField("NAME", DATA_TYPEType(), True)

    When you reach the NAME part (inside the quotes), type NAME explicitly. As you type, an autocomplete suggestion labeled with a blue IP badge (indicating an instance property) will appear. Select this suggestion by pressing Enter. This action inserts a dynamic reference to the instance property NAME, which automatically pulls the business key and field name from your instance metadata. Your reusable expression should now look like this:

Step 2: Configure the Base Schema Structure

  • Outside the expression block, define the base schema structure (as shown in lines 19 and 22 in the screenshot):
```python
loaded_schema = StructType([
    # Schema fields will be defined here using Switch Cases
])
```

Why StructType: Apache Spark uses StructType to define the structure of DataFrames. This is like creating a blueprint that tells Spark exactly what to expect in each column - the column name, data type, and whether it can contain null values. Without this definition, Spark would have to guess the data types, often leading to incorrect interpretations and processing errors.
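
As a brief, hedged illustration of this point, the sketch below shows an explicitly typed schema being applied while reading from the landing zone instead of relying on type inference. The column names, file format, and path layout are assumptions for the example, and spark is the session provided by Synapse; in the generated notebook the schema body comes from the expression block and the paths from the %parameters cell.

```python
# Minimal sketch: explicit schema vs. type inference when loading landing files.
loaded_schema = StructType([
    StructField("ORDER_ID", StringType(), True),        # hypothetical columns for illustration
    StructField("ORDER_AMOUNT", DecimalType(18, 2), True),
])

df_landing = (
    spark.read
        .option("header", "true")
        .schema(loaded_schema)                           # every column gets the declared type; no guessing
        .csv(f"{LANDING_BASE_PATH}/{INSTANCE_NAME}/")    # assumed path layout
)
```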

Step 3: Create the First Switch Case (StringType Fields)

  • In addition to the two default cases (Block-Default and Switch-Default), developers can create additional cases. In the expression block, click the plus sign icon below the block (Create new Switch Case). It is only visible when you place the cursor inside the block, so that the block becomes active (purple border visible). A dialog will appear to configure your switch:

    • New Switch Name: Enter StringType.

    • Switch Cases: Keep OR selected (you can choose OR or AND).

    • Add Instance Property: A condition must be defined that must apply to a metadata row so that the case is applied to that row. The condition consists of an instance property, a comparison operator (=, !=, <, >, <=, >=), and a value. Click + Add Instance Property, then select DATA_TYPE from the dropdown and enter string in the input.

    • Click the + Create new Switch Case button. The case then appears in the editor.

    • Inside this case StringType, add:

      StructField("NAME", StringType(), True)

Step 4: Complete the Schema with Additional Switch Cases

The Power of Completeness: Once you've created cases for all your data types, you'll have a truly dynamic schema that can handle any data source. This is the difference between a "one-off" script and a professional, enterprise-grade data processing pattern.

After creating the first Switch Case, you can add additional cases for different data types. Here's a reference table for the remaining Switch Cases:

| Switch Case Name | Condition | Content |
|---|---|---|
| DecimalType with Scale | DATA_TYPE = decimal AND SCALE != <empty> | StructField("[NAME]", DecimalType([PRECISION],[SCALE]), True) |
| DecimalType | DATA_TYPE = decimal | StructField("[NAME]", DecimalType([PRECISION]), True) |
| TimestampType | DATA_TYPE = timestamp | StructField("[NAME]", TimestampType(), True) |
| LongType | DATA_TYPE = long OR DATA_TYPE = bigint | StructField("[NAME]", LongType(), True) |
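
To see how these cases play out, here is a hedged sketch of what the generated loaded_schema could look like for a hypothetical instance whose metadata contains one column of each type; the column names and the precision/scale values are invented for illustration only.

```python
# Hypothetical output of the loaded_schema expression block after the switch
# cases have been resolved against the instance metadata.
loaded_schema = StructType([
    StructField("ORDER_ID", StringType(), True),            # DATA_TYPE = string
    StructField("ORDER_AMOUNT", DecimalType(18, 2), True),  # DATA_TYPE = decimal, SCALE present
    StructField("ORDER_DISCOUNT", DecimalType(5), True),    # DATA_TYPE = decimal, no SCALE
    StructField("ORDER_TS", TimestampType(), True),         # DATA_TYPE = timestamp
    StructField("ORDER_QTY", LongType(), True),             # DATA_TYPE = long or bigint
])
```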

3.4 Additional Expression Blocks

The Modular Approach: After completing the loaded_schema block, you'll start to see how MetaKraftwerk's modular approach scales. Each expression block handles one specific concern, making your patterns easier to maintain, test, and reuse across different projects.

You can now add other expression blocks for different parts of your notebook:

  • Hash Key Generation: The hash key uniquely identifies a business entity.

    • Use an expression like:
    ```python
    df_with_metadata = (
        df_with_metadata.withColumn(
            "HASH_KEY",
            generate_md5_hash([
                # insert Expression Block here
                "NAME"  # instance property for the business key column name
            ])
        )
    )
    ```
    • In the expression block:
      • Set Functional Roles Filter to BUSINESS_KEY.
      • Set Concatenation to , (comma). If multiple key columns exist, they will be concatenated in a stable order before hashing.

    Explanation: the block iterates over rows matching the role BUSINESS_KEY and injects the selected instance property (e.g., NAME) into generate_md5_hash. When multiple business-key columns are present, they are concatenated with , and then hashed to produce a deterministic HASH_KEY.

  • Hash Value Generation: The hash value captures changes in the non-key attributes.

    • Use an expression like:
    ```python
    df_with_metadata = (
        df_with_metadata.withColumn(
            "HASH_VALUE",
            generate_hdiff([
                # insert Expression Block here
                "NAME"  # instance property; the block will expand to all SRC_AND_TGT_FIELD columns
            ])
        )
    )
    ```
    • In the expression block:
      • Set Functional Roles Filter to SRC_AND_TGT_FIELD.
      • Set Concatenation to , (comma) so all attributes are combined in a consistent order.

    Explanation: the block expands over all attributes marked with the functional role SRC_AND_TGT_FIELD. These values are concatenated using , and hashed by generate_hdiff. Any change in any tracked attribute will result in a different HASH_VALUE, enabling efficient change detection.
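
The implementations of generate_md5_hash and generate_hdiff live in the 04_Util_functions cell and are up to you; the pattern only assumes they accept a list of column names and return a column expression. A minimal sketch, assuming both helpers concatenate the columns with a , separator and apply MD5 via PySpark functions, could look like this:

```python
from pyspark.sql import Column
from pyspark.sql import functions as F

def generate_md5_hash(columns: list) -> Column:
    """Sketch: concatenate the business-key columns in order, then MD5-hash them."""
    return F.md5(F.concat_ws(",", *[F.col(c).cast("string") for c in columns]))

def generate_hdiff(columns: list) -> Column:
    """Sketch: hash all tracked attribute columns so any change yields a new value."""
    return F.md5(F.concat_ws(",", *[F.col(c).cast("string") for c in columns]))
```

With helpers along these lines, the expression blocks above expand to calls such as generate_md5_hash(["ORDER_ID"]) or generate_hdiff(["ORDER_ID", "ORDER_AMOUNT", "ORDER_TS"]) in the generated notebook (the column names are illustrative).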

3.5 Assemble the Notebook from Reusable Expressions

In this subsection, you'll compose the final notebook by referencing the previously defined reusable expressions in sequence, producing an instance-ready notebook.

Once your reusable expressions are defined, you do not handwrite a long monolithic notebook. Instead, select the target notebook file in the Files editor and reference the reusable expressions in the desired order. The final notebook is produced by concatenating the referenced reusable expressions with their resolved metadata.

Note

Please note that when you click the blue icon landing_2_bronze in the pattern section of the sidebar, the file that opens by default under the "Files" option is the active file. Always pay attention to which file is the "active" file mapped in the pattern; this is the file required for the build and where you must place the final code version.

Recommended order (as shown in the screenshot):

  1. 01_Header
  2. 02_Imports
  3. 03_Parameters
  4. 04_Util_functions
  5. 05_Load_csv_files
  6. 06_Process_landing_to_bronze
  7. 07_Main
  8. 08_Test (optional)

This approach keeps logic modular and makes it easy to reuse the same expressions across multiple patterns or instances.

4. Adjust File Settings

Now that the notebook is implemented, we have to do the following:

  • Check that the Output checkbox is selected for our notebook file; otherwise the file will not be part of the build results.
  • Define Notebook as the type of the file.
  • Define a dynamic naming rule for the notebook file (e.g., NB_LDG_2_BRONZE_INSTANCE_NAME.json). Click on the File Name button to open the naming editor of the file (see screenshot below). Make sure to activate the Dynamic Name checkbox.

5. Preview the Generated Notebook

The Preview feature allows you to generate and inspect the code of a selected instance before deploying it to a target platform. This helps validate the logic, review script details, and test the outcome in a controlled way.

To preview an instance:

  1. In the File Editor, choose the file you want to inspect.

  2. At the top-right of the interface, click the Preview Instance dropdown.

  3. From the list, select the Instance you want to preview.

  4. The system will generate the corresponding preview, which is displayed on the right-hand side of the editor.
    From here, you can see and copy the previewed code and use it on the target platform (e.g., for testing or validation purposes) without committing the pattern.