Saturday, March 23, 2024

Bridging Snowflake and Azure Data Stack: A Step-by-Step Guide

In today's era of hybrid data ecosystems, organizations often find themselves straddling multiple data platforms for optimal performance and functionality. If your organization uses both Snowflake and the Microsoft data stack, you might need to transfer data seamlessly from the Snowflake Data Warehouse to an Azure Lakehouse.

Fear not! This blog post will walk you through the detailed, step-by-step process of achieving this data integration.

As an example, Fig 1 below shows a Customer table in the Snowflake data warehouse.





Fig 1: Customer data in the Snowflake data warehouse

To get this data from Snowflake to an Azure Data Lakehouse, we can use a cloud ETL tool such as Azure Data Factory (ADF), Azure Synapse Analytics, or Microsoft Fabric. For this blog post, I have used Azure Synapse Analytics to extract the data from Snowflake. There are two main activities involved in Azure Synapse Analytics:

A. Creating a Linked Service

B. Creating a data pipeline with a Copy activity


Activity A: Creating a Linked Service

In Azure Synapse Analytics you need to create a Linked Service (LS), which establishes the connectivity between Snowflake and Azure Synapse Analytics.

Please find the steps to create the Linked Service below:

Step 1) Azure Synapse has a built-in connector for Snowflake. Click New linked service, search for the "Snowflake" connector, and click Next, as shown in Fig 2 below.

Fig 2: Built-in Snowflake connector

Step 2) Make sure to fill in all the necessary information, as shown in Fig 3.

Fig 3: Linked service details


a) Linked service name: Provide a name for the Linked service.

b) Linked service description: Provide a description of the Linked service.

c) Integration runtime: An Integration Runtime is required for the Linked service. The Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory and Azure Synapse pipelines. You will find more information on the Microsoft Learn page.

d) Account name: This is the full name of your Snowflake account. To find this information in Snowflake, go to Admin->Accounts and look up the LOCATOR as shown in Fig 4.

Fig 4: Snowflake account name

If you hover over the LOCATOR information, you will find the URL as shown in fig 5.

Fig 5: Snowflake account URL

Please don't use the full URL for the Account name in the Linked Service; keep only the portion up to https://hj46643.canada-central.azure

e) Database: Please find the database name in Snowflake by going to Databases->{choose the right Database} as shown in Fig 6.

Fig 6: Snowflake Database

A Snowflake database is essentially just storage. In general, MS SQL Server, Oracle, or Teradata keep their compute and storage together and call the combination a database. In Snowflake, however, storage is called the Database and compute is the Virtual Warehouse.


f) Warehouse: In Snowflake, you have a warehouse in addition to the database. Please go to Admin->Warehouses->{choose your warehouse} as shown in Fig 7. We have used the warehouse AZUREFABRICDEMO.


Fig 7: Virtual Warehouse in Snowflake


g) User name: User name of your Snowflake account

h) Password: Password of your Snowflake account

i) Role: The default role is PUBLIC; if you don't specify any other role, PUBLIC will be used. I did not need a specific role, so I kept this field empty.

j) Test connection: Now you can test the connection before you save it. 

k) Apply: If the earlier "Test connection" step is successful, please save the Linked service by clicking the Apply button.
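Before (or after) filling in these fields, it can help to sanity-check the account name, warehouse, database, user, and password outside Synapse. The minimal sketch below uses the snowflake-connector-python package; it is an optional verification step only, and the credential values shown are hypothetical placeholders.

```python
# Optional sanity check of the values used in the Snowflake Linked Service.
# Requires: pip install snowflake-connector-python
# All values below are hypothetical placeholders -- use your own account locator,
# user, password, warehouse, and database.
import snowflake.connector

conn = snowflake.connector.connect(
    account="hj46643.canada-central.azure",  # account identifier, without ".snowflakecomputing.com"
    user="MY_SYNAPSE_USER",
    password="MY_PASSWORD",
    warehouse="AZUREFABRICDEMO",
    database="MY_DATABASE",
    role="PUBLIC",                           # optional; PUBLIC is the default role
)

try:
    cur = conn.cursor()
    # Confirm which warehouse, database, and role the session resolved to.
    cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_ROLE()")
    print(cur.fetchone())
finally:
    conn.close()
```

If this script connects and returns the expected warehouse and database, the same values should work in the Linked Service.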
                                                                  

Activity B: Creating a data pipeline with a Copy activity

This activity involves connecting to the Snowflake source and copying the data to the destination Azure Data Lakehouse. It includes the following steps:

1. Synapse Data pipeline

2. Connecting the source side of the Copy activity to Snowflake

3. Connecting the sink side of the Copy activity to the Azure Data Lakehouse


1. Synapse Data pipeline

From Azure Synapse Analytics, create a pipeline as shown in Fig 8.

Fig 8: Create a pipeline from Azure Synapse Analytics


Then drag and drop a Copy activity onto the canvas as shown in Fig 9. You will find that the Copy activity has a source side and a sink side.

Fig 9: Copy Activity


2. Connecting the source side of the Copy activity to Snowflake

The source side of the Copy activity needs to connect to the Snowflake Linked service that we created under Activity A: Creating a Linked Service. To connect Snowflake from the Synapse pipeline, first choose "Source" and then click "New" as shown in Fig 10 below.

Fig 10: Source dataset (step 1)

The next step is to choose Snowflake as shown in Fig 11 below.


Fig 11: Source dataset with Snowflake


After choosing the integration dataset above, you will find another UI that you need to fill in, as shown in Fig 12.

Fig 12: Source dataset details


a) Name: Provide a dataset name.
b) Linked service: Please choose the Linked service that we already created under Activity A.
c) Connect via Integration runtime: Choose the very same Integration runtime you used in Activity A.
d) Table name: Now you should be able to see all the tables from Snowflake, so choose the table you want to get data from.
e) Click 'Ok' to complete the source dataset.

3. Connecting the sink side of the Copy activity to the Azure Data Lakehouse

Now we need to connect the sink side of the Copy activity; Fig 13 shows how to start with the sink dataset.

Fig 13: Creating the sink dataset

Then fill in the details to create the sink dataset as shown in Fig 14 below.

Fig 14: Sink dataset properties



a) Name: Provide a dataset name.
b) Linked service: Please choose the Linked service that we already created under Activity A.
c) Connect via Integration runtime: Choose the very same Integration runtime you used in Activity A.
d) File path: Now you need to choose the file path in the Azure storage account. I have already created a storage account, container, and subdirectory. The file path is: snowflake/testdata
e) Click 'Ok' to complete the sink dataset.

The Synapse pipeline is now complete. However, before executing the pipeline, we need to check whether there are any errors in the code. To do so, please click 'Validate'. When I ran the validation, I found the error shown in Fig 15 below.


Fig 15: Staging error
The error is self-explanatory: since we are copying directly from the Snowflake data warehouse, we must enable staging in the pipeline.

To enable staging, first click on the Settings of the Copy activity, then enable staging and connect a Linked service that points to a storage account, as shown in Fig 16 below.
Fig 16: Enable staging in the Copy pipeline


When you are connecting the blob storage for staging, please make sure it is not an ADLS storage account, and you must choose the authentication type SAS URI.
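For orientation, the staging-related part of the Copy activity looks roughly like the following when you view the pipeline as code. This is a sketch written as a Python dictionary rather than the exact JSON Synapse generates, so treat the property names as approximate; the linked service and container names are hypothetical.

```python
# Rough sketch of the Copy activity settings once staging is enabled.
# Property names approximate the JSON behind the Synapse UI; the linked service
# and path values are hypothetical placeholders.
copy_activity = {
    "name": "CopySnowflakeToLake",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "SnowflakeSource"},   # reads from the Snowflake dataset
        "sink": {"type": "ParquetSink"},         # writes .parquet files to the lake
        "enableStaging": True,                   # required when copying directly from Snowflake
        "stagingSettings": {
            # Must be a Blob Storage linked service (not ADLS) using SAS URI authentication.
            "linkedServiceName": {
                "referenceName": "LS_BlobStaging",
                "type": "LinkedServiceReference",
            },
            "path": "staging-container",
        },
    },
}
```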

After fixing the error and executing the pipeline again, it moved the data from the Snowflake data warehouse to Azure Data Lake Storage. You will find a .parquet file created as shown in Fig 17 below.

You can view the data by using a notebook as shown in Fig 18.

Fig 18: Data from Azure Data Lake
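If you want to run the same check yourself, a few lines of PySpark in a Synapse notebook are enough. The storage account name below is a hypothetical placeholder for your own Data Lake path.

```python
# Preview the .parquet output that the Copy activity wrote to the lake.
# Replace the container ("snowflake"), storage account, and folder with your own values.
path = "abfss://snowflake@mydatalakeaccount.dfs.core.windows.net/testdata/"

df = spark.read.parquet(path)   # 'spark' is the session provided by the Synapse notebook
df.show(10)
print(df.count(), "rows copied from Snowflake")
```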


This blog post showed how you can copy data from the Snowflake data warehouse to Azure Data Lake by using Azure Synapse Analytics. The same can be achieved with Azure Data Factory (ADF) as well as Microsoft Fabric.


Wednesday, November 29, 2023

How to solve available workspace capacity exceeded error (Livy session) in Azure Synapse Analytics?

Livy session errors in Azure Synapse with Notebook

If you are working with notebooks in Azure Synapse Analytics and multiple people are using the same Spark pool at the same time, chances are high that you have seen one of the errors below:

"Livy Session has failed. Session state: error code: AVAILBLE_WORKSPACE_CAPACITY_EXCEEDED.You job requested 24 vcores . However, the workspace only has 2 vcores availble out of quota of 50 vcores for node size family [MemoryOptmized]."

"Failed to create Livy session for executing notebook. Error: Your pool's capacity (3200 vcores) exceeds your workspace's total vcore quota (50 vcores). Try reducing the pool capacity or increasing your workspace's vcore quota. HTTP status code: 400."

"InvalidHttpRequestToLivy: Your Spark job requested 56 vcores. However, the workspace has a 50 core limit. Try reducing the numbers of vcores requested or increasing your vcore quota. Quota can be increased using Azure Support request"

Figure 1 below shows one of these errors while running a Synapse notebook.

Fig 1: Available workspace capacity exceeded error

What is Livy?

Your initial thought will likely be, "What exactly is this Livy session?"! Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface [1]. Whenever we execute a notebook from Azure Synapse Analytics, Livy handles the interaction with the Spark engine.
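To make that concrete, this is roughly how a client talks to a plain open-source Livy endpoint over REST. Synapse issues the equivalent calls for you behind the scenes whenever you run a notebook cell, so the sketch below is for illustration only and the endpoint URL is hypothetical.

```python
# Illustration only: create and poll a Livy session against a generic open-source
# Livy endpoint. Azure Synapse performs the equivalent calls for you when you run
# a notebook against a Spark pool.
import time
import requests

livy_url = "http://my-livy-host:8998"   # hypothetical Livy endpoint

# Ask Livy to create a PySpark session with a modest resource request.
resp = requests.post(
    f"{livy_url}/sessions",
    json={"kind": "pyspark", "numExecutors": 2, "executorCores": 2},
)
session = resp.json()
print("Created session", session["id"], "state:", session["state"])

# Poll until the session is ready; if the cluster has no capacity left,
# the session ends up in an error/dead state instead of 'idle'.
while session["state"] in ("not_started", "starting"):
    time.sleep(5)
    session = requests.get(f"{livy_url}/sessions/{session['id']}").json()
    print("state:", session["state"])
```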


Will increasing vCores fix it?

By looking at the error message you may attempt to increase the vCores; however, increasing the vCores will not solve your problem. 

Let's look at how a Spark pool works. When a Spark pool is instantiated, it creates a Spark instance that processes the data. Spark instances are created when you connect to a Spark pool, create a session, and run a job, and multiple users may have access to the same Spark pool. When multiple users run jobs at the same time, the first job may already have consumed most of the vCores, so the next job executed by another user hits the Livy session error. For example, if one session grabs a Medium driver plus five Medium executors (8 vCores each, 48 vCores in total), a second session requesting even a handful of vCores will push the workspace past its default 50-vCore quota.

How to solve it?

The problem occurs when multiple people work on the same Spark pool, or the same user runs more than one Synapse notebook in parallel. When multiple data engineers work on the same Spark pool in Synapse, they can configure the session and save the settings to their DevOps branch. Let's look into it step by step:

1. You will find the configuration button (a gear icon) as shown in Fig 2 below:
Fig 2: Configuration button

By clicking the configuration button, you will find the Spark pool session configuration details as shown in the diagram below:
Fig 3: Session details
This is your session, as you can see in Fig 3 above; at this point there is no active session.

2. Activate your session by attaching the Spark pool and assigning the right resources. Please see Fig 4 below and the detailed steps to avoid the Livy session error.

Fig 4: Fine-tune the Settings

a) Attach the Spark pool from the available pools (if you have more than one). I have only one Spark pool.
b) Select the session size; I have chosen Small by clicking the 'Use' button.
c) Enable 'Dynamically allocate executors'; this lets the Spark engine allocate executors dynamically.
d) You can change the number of executors. For example, by default the Small session size allows 47 executors; however, I have chosen 3 to 18 executors to free up the rest for other users. Sometimes you may need to go down to 1 to 3 executors to avoid the errors.
e) Finally, apply the changes to your session.

You can commit these changes to your DevOps branch for the particular notebook you are working on so that you don't need to apply the same settings again.
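Alternatively, you can pin the same limits in code at the top of the notebook with the %%configure session magic, so they travel with the notebook in your DevOps branch. The memory sizes and executor counts below are illustrative assumptions (roughly matching a Small node size); check the Synapse documentation for the exact options your pool supports.

```
%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "28g",
    "executorCores": 4,
    "numExecutors": 3,
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "3",
        "spark.dynamicAllocation.maxExecutors": "18"
    }
}
```

Run this as the first cell of the notebook (the -f flag forces the session to restart with the new configuration if one is already running).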

In addition, since the Synapse workspace's total vCore quota is 50 by default, you can create a support request to Microsoft to increase the vCores for your organization.

In summary, the Livy session error is common when multiple data engineers work in the same Spark pool. It's important to understand how to set up your session correctly so that more than one session can run at the same time.



Saturday, July 1, 2023

Provisioning Microsoft Fabric: A Step-by-Step Guide for Your Organization

Learn how to provision Microsoft Fabric with ease! Unveiled at Microsoft Build 2023, Microsoft Fabric is a cutting-edge Data Intelligence platform that has taken the industry by storm. This blog post will cover a simple step-by-step guide to provisioning Microsoft Fabric using the efficient Microsoft Fabric Capacity.

Before we look into Fabric capacity, let's go through the different licensing models. There are three different types of licenses you can choose from to start working with Microsoft Fabric:
1. Microsoft Fabric trial
It's free for two months, and you need to use your organization email; a personal email doesn't work. Please find out how you can activate your free trial.
2. Power BI Premium Per Capacity (P SKUs)
If you already have a Power BI Premium license, you can work with Microsoft Fabric.
3. Microsoft Fabric Capacity (F SKUs)
With your organizational Azure subscription, you can create Microsoft Fabric capacity. You will find more details about Microsoft Fabric licenses in the documentation.
We will go through, step by step, how Microsoft Fabric capacity can be provisioned from the Azure Portal.

Step 1: Please log in to your organization's Azure Portal and search for Microsoft Fabric as shown below in Fig 1.
Fig 1: Finding Microsoft Fabric in Azure Portal

Step 2: To create the Fabric capacity, the very first step is to create a resource group as shown in Fig 2.


Fig 2: Creating resource group

Step 3: To create the right capacity, you need to choose the resource size that you require; e.g. I have chosen F8 as shown in Fig 3 below.

Fig 3: Choose the right resource size


You will find more capacity details in this Microsoft blog post.

Step 4: As shown below in Fig 4, before creating the capacity please review all the information you have provided, including resource group, region, capacity size, etc., and then hit the Create button.

Fig 4: Review and create


When it's done, you will be able to see that the Microsoft Fabric capacity has been created (see Fig 5 below).

Fig 5: Microsoft Fabric capacity created
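If you prefer to script this step instead of clicking through the portal, the capacity is an ordinary ARM resource, so it can be created with a plain ARM REST call. The sketch below is assumption-heavy: it assumes the Microsoft.Fabric/capacities resource type, an API version that may have changed, and hypothetical subscription, resource group, and admin values; please check the ARM reference for the current schema before relying on it.

```python
# Rough sketch: create a Fabric capacity via the ARM REST API.
# Assumptions (verify against the ARM reference for Microsoft.Fabric/capacities):
#   - resource type "Microsoft.Fabric/capacities" and api-version "2023-11-01"
#   - SKU name "F8" with tier "Fabric"
# Requires: pip install azure-identity requests
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "00000000-0000-0000-0000-000000000000"   # hypothetical
resource_group = "rg-fabric-demo"                           # hypothetical
capacity_name = "fabricdemof8"                              # hypothetical

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.Fabric/capacities/{capacity_name}"
    "?api-version=2023-11-01"
)

body = {
    "location": "canadacentral",
    "sku": {"name": "F8", "tier": "Fabric"},
    "properties": {"administration": {"members": ["admin@contoso.com"]}},  # capacity admins
}

resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code, resp.json())
```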

You can also go to the admin portal to validate your recently created Fabric capacity. Fig 6 shows the Fabric capacity under the admin portal.

Fig 6: Fabric capacity under admin portal


To explore Microsoft Fabric, please browse to the Fabric site; you will find the capacity you just created, and the home page will look like Fig 7 below.

Fig 7: Fabric Home page


In summary, you have a few ways to use Microsoft Fabric, and you have learned how to provision Fabric by using Fabric capacity. However, it's important to remember that you must enable Microsoft Fabric for your organization.


Wednesday, May 24, 2023

What is OneLake in Microsoft Fabric?

Get ready to be blown away! The highly anticipated Microsoft Build 2023 has finally unveiled its latest and greatest creation: the incredible Microsoft Fabric - an unparalleled Data Intelligence platform that is guaranteed to revolutionize the tech world!



Fig 1: OneLake for all data

One of the most exciting things I found in Fabric is OneLake. I was amazed to discover how OneLake simplifies things, just like OneDrive! It's a single, unified, logical SaaS data lake for the whole organization (no data silos). Over the past couple of months, I've had the incredible opportunity to engage with the product team and dive into the private preview of Microsoft Fabric. I'm sharing what I learned during the Private Preview via this blog post, with the caveat that it is not an exhaustive list of what OneLake encompasses.

 

I have OneLake installed on my PC and can easily access the data in OneLake just like OneDrive, as shown in Fig 2:


Fig 2: OneLake is like OneDrive on a PC


Single unified Managed and Governed SaaS Data Lake 

All Fabric items keep their data in OneLake, so there are no data silos. OneLake is fully compatible with Azure Data Lake Storage Gen2 at the API layer, which means it can be accessed as if it were ADLS Gen2.
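In practice, that compatibility means you can point any ADLS Gen2-aware tool or library at a OneLake path. Here is a minimal PySpark sketch (e.g. from a Fabric or Synapse Spark notebook); the workspace, lakehouse, and file names are hypothetical, and the abfss URI follows the documented OneLake pattern.

```python
# Read a file from OneLake using the same abfss:// scheme you would use for ADLS Gen2.
# "MyWorkspace", "MyLakehouse", and the file path are hypothetical -- replace with your own.
onelake_path = (
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/"
    "MyLakehouse.Lakehouse/Files/raw/customers.csv"
)

df = spark.read.option("header", "true").csv(onelake_path)  # 'spark' comes from the notebook session
df.show(5)
```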

Let's investigate some of the benefits of OneLake:

Fig 3: Unified management and governance
  • OneLake comes automatically provisioned with every Microsoft Fabric tenant with no infrastructure to manage.

  • Any data in OneLake works with out-of-the-box governance such as data lineage, data protection, certification, catalog integration, etc. Please note that this feature is not part of the public preview.

  • OneLake enables distributed ownership. Different workspaces allow different parts of the organization to work independently while still contributing to the same data lake.
  • Each workspace can have its own administrator, access control, region, and capacity for billing.

Do you have requirements that data must reside in specific countries?

Fig 4: OneLake covers data residency

Yes, your requirement is covered by OneLake. If you're concerned about how to effectively manage data across multiple countries while meeting local data residency requirements, fear not - OneLake has got you covered! With its global span, OneLake enables you to create different workspaces in different regions, ensuring that any data stored in those workspaces also resides in the respective countries. Built on top of the mighty Azure Data Lake Storage Gen2, OneLake is a powerhouse solution that can leverage multiple storage accounts across different regions while virtualizing them into one seamless, logical lake. So go ahead and take the plunge - OneLake is ready to help you navigate the global data landscape with ease and confidence!


Data Mesh as a Service:


OneLake gives a true data mesh as a service. Business groups can now operate autonomously within a shared data lake, eliminating the need to manage separate storage resources. The implementation of the data mesh pattern has become more streamlined. OneLake enhances this further by introducing domains as a core concept. A business domain can have multiple workspaces, which typically align with specific projects or teams.



Open Data Format

Simply put, no matter which item you start with, they will all store their data in OneLake, similar to how Word, Excel, and PowerPoint save documents in OneDrive.

 

You will see files and folders just like you would in a data lake today. All workspaces are folders, and each data item is a folder. Any tabular data will be stored in Delta Lake format. There are no new proprietary file formats in Microsoft Fabric; proprietary formats create data silos. Even the data warehouse will natively store its data in Delta Lake parquet format. While Microsoft Fabric data items standardize on Delta parquet for tabular data, OneLake is still a data lake built on top of ADLS Gen2, and it will support any file type, structured or unstructured.
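Because tabular data lands in Delta format, any Delta-capable engine can read and write it. A minimal PySpark sketch, assuming the notebook has a default lakehouse attached and using a hypothetical table name:

```python
# Write a small DataFrame as a Delta table and read it back.
# The table name is hypothetical; the data ends up in OneLake as parquet files
# plus a _delta_log transaction log.
data = [(1, "Alice"), (2, "Bob")]
df = spark.createDataFrame(data, ["id", "name"])

df.write.format("delta").mode("overwrite").saveAsTable("customers_demo")

spark.read.table("customers_demo").show()
```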


Shortcuts/Data Virtualization


Shortcuts virtualize data across domains and clouds. A shortcut is nothing more than a symbolic link that points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.


Fig 5: Shortcuts

As shown in Fig 5 above, you may have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed externally while being surfaced through OneLake in Microsoft Fabric.

Shortcuts help avoid data movement and duplication. It's easy to create shortcuts from Microsoft Fabric, as shown in Figure 6:

 Fig 6: Shortcuts from Microsoft Fabric


OneLake Security


In the current preview, data in OneLake is secured at the item or workspace level. A user will either have access or not. Additional engine-specific security can be defined in the T-SQL engine. These security definitions will not apply to other engines. Direct access to the item in the lake can be restricted to only users who are allowed to see all the data for that warehouse. 

In addition, Power BI reports will continue to work against data in OneLake, as Analysis Services can still leverage the security defined in the T-SQL engine through DirectQuery mode, and can sometimes still optimize to Direct Lake mode depending on the security defined.


In summary, OneLake is a revolutionary advancement in the data and analytics industry, surpassing my initial expectations. It transforms Data Lakes into user-friendly OneDrive-like folders, providing unprecedented convenience. The delta file format is the optimal choice for Data Engineering workloads. With OneLake, Data Engineers, Data Scientists, BI professionals, and business stakeholders can collaborate more effectively than ever before. To find out more about Microsoft Fabric and OneLake, please visit Microsoft Fabric Document.


Sunday, April 23, 2023

How to solve Azure hosted Integration Runtime (IR) validation error in Synapse Analytics?

This blog post shares recent learning while working with Data flows in Azure Synapse Analytics.

The ETL pipeline was developed using Azure Synapse Data Flows. However, when attempting to merge the code changes made in the feature branch into the main branch of the DevOps code repository, a validation error occurred, as shown below in Figure 1:

Fig 1: Validation error

It is worth noting that the same pipeline was executed in Debug mode and ran successfully without encountering any errors, as depicted in Figure 2:

Fig 2: Successfully run on debug mode


On the one hand, when trying to merge the code from the feature branch into the main branch, it throws a validation error; on the other hand, the pipeline executed successfully in debug mode. It seems to be a bug, and I reported it to the Microsoft Synapse team.

The validation error needed to be fixed so that the code could be released to the Production environment. The Azure Integration Runtime (IR) used in the Data flows had been created with the Canada Central region. However, the IR must use 'Auto Resolve' as the region, as shown in Fig 3.

Fig 3: Region as 'Auto Resolve'

I then used the newly created IR in the pipeline, which resolved the issue.

In summary, although your Azure resources may be created in one particular region (e.g. all our resources are created in Canada Central), for the Synapse Data Flows activity you need to create the Azure IR with 'Auto Resolve' as the region.



Sunday, February 5, 2023

How to implement Continuous integration and delivery (CI/CD) in Azure Data Factory

Continuous Integration and Continuous Deployment (CI/CD) are integral parts of the software development lifecycle. As Data Engineers/Data Professionals, when we build an ETL pipeline using Azure Data Factory (ADF), we need to move our code from the Development environment to the Pre-Prod and Production environments. One of the ways to do this is by using Azure DevOps.

Target Audience:

This article will be helpful for the two types of audiences mentioned below.

Audience Type 1: Using a DevOps code repository in ADF, but CI/CD is missing

Audience Type 2: Using ADF for ETL development, but never used a DevOps repository or CI/CD

For Audience Type 1, you can follow the rest of the blog to implement CI/CD. For Audience Type 2, you need to first connect ADF with the DevOps repository; to do so, please follow this blog post, and then continue with the rest of this post for the CI/CD.

This blog post will describe how to set up CI/CD for ADF. There are two parts to this process: 1) Continuous Integration (CI) and 2) Continuous Deployment (CD).

1) Continuous Integration (CI)

First, you need to log in to your organization's Azure DevOps site and click Pipelines as shown in Fig 1.


Fig 1: DevOps pipelines

Then click "New pipeline" as shown in Fig 2.


Fig 2: New pipeline creation

Then follow these few steps to create the pipeline:

Step 1: Choose the option depending on where your code is located. In this blog post, we are using the classic editor as shown below in Fig 3.

Fig 3: Choose the right source or use the classic editor

Step 2: In this step, you need to choose the repository that you are using in ADF and make sure to select the default branch. We have chosen ADF_publish as the default branch for the builds.

                         

Fig 4: selecting the branch for the pipeline

Step 3: Create an empty job by clicking "Empty job" as shown in Fig 5 below.

Fig 5: Selected Empty job

Step 4: You need to provide the agent pool and agent specification at this step. Please choose Azure Pipelines for the agent pool and windows-latest for the agent specification, as shown in Fig 6.

Fig 6: Agent pool and Agent specification

 

Now click "Save and Queue" to save the build pipeline with the name “Test_Build Pipeline” as shown below figure:

Fig 7: Saved the build pipeline

The build pipeline is complete. The next part is creating the release pipeline, which performs the continuous deployment (CD), i.e. the movement of the code/artifacts from Development to the Pre-Prod and Production environments.

2) Continuous Deployment (CD)

The very first step of Continuous Deployment is to create a new release pipeline. This release pipeline will be connected to the previously created build pipeline named "Test_Build Pipeline".


 Fig 8: Release Pipeline

As soon as you click the new release pipeline, the following step will appear. Please close the pop-up screen on the right and then click +Add on the artifact as shown in the figure below:

 


  Fig 9: Create the artifact

The next step is to connect the build pipeline and the release pipeline: find the build pipeline that you created earlier in the CI process and choose the item "Test_Build Pipeline".

               

Fig 10: connecting the build pipeline from the release pipeline

Click on Stage 2 and select an empty job as shown in Fig 11.

  Fig 11: Empty job under release pipeline

After the last steps, the pipeline will look like the figure below. Now please click on "1 job, 0 tasks". We need to create an ARM template deployment task.


Fig 12: Release pipeline job and task

Search for ARM template deployment, as shown in Fig 13. The ARM template deployment task will create or update the resources and artifacts, e.g. linked services, datasets, pipelines, and so on.

  Fig 13: search for the ARM template

After adding the ARM template task, you need to fill in items 1 to 12 as shown in the diagram below:


Fig 14: ARM Template information

  1. Display Name: Simply put any name to display for the deployment template.
  2. Deployment Scope: You have options to choose the deployment scope from Resource Group, Subscription, or Management Group. Please choose Resource Group for ADF CI/CD.
  3. Azure Resource Manager Connection: This is a service connection; if you already have a service principal, you can set up a service connection, or create a completely new one. The Cloud Infrastructure team in your organization can set it up for you. You will also find details about service connections on Microsoft Learn.
  4. Subscription: Please choose the right subscription where the resource group for the ADF instance resides for the Pre-Prod or Production environment.
  5. Action: Please choose "Create or update resource group" from the list.
  6. Resource Group: The name of the resource group where the ADF instance lives.
  7. Location: The resource location, e.g. I have used Canada Central.
  8. Template Location: Please choose either "Linked artifact" or "URL of the file". I have chosen "Linked artifact", which connects the ARM template that was already built via the Continuous Integration (CI) process.
  9. Template: Please choose the file "ARMTemplateForFactory.json".
  10. Template parameters: Please choose the template parameter file "ARMTemplateParametersForFactory.json".
  11. Override template parameters: Make sure to override all the parameters, including the Pre-Prod or Production environment's database server details and so on. Go through all the parameters that exist in the Development environment; you need to update them for the upper environments (see the sketch after this list).
  12. Deployment mode: Please choose "Incremental" for the deployment mode.
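To make the override format concrete, here is a small, purely illustrative Python snippet that assembles the value expected by the "Override template parameters" field from a dictionary. The parameter names are hypothetical; take the real ones from your own ARMTemplateParametersForFactory.json.

```python
# Hypothetical example: build the "Override template parameters" value for the
# ARM template deployment task. The parameter names below are illustrative --
# use the ones that actually appear in your ARMTemplateParametersForFactory.json.
overrides = {
    "factoryName": "adf-myproject-prod",
    "AzureSqlDatabase_LS_connectionString": "Server=tcp:prod-sql.database.windows.net,1433;Database=prod_db;",
    "KeyVault_LS_properties_typeProperties_baseUrl": "https://kv-myproject-prod.vault.azure.net/",
}

# The task expects a single string of -name "value" pairs.
print(" ".join(f'-{name} "{value}"' for name, value in overrides.items()))
```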

After entering all the above information, please save it. Your release pipeline is now complete.

However, to make this pipeline run automatically when you publish code from the master branch to the publish branch, you need to set up a continuous deployment trigger as shown in Fig 15.


Fig 15: Continuous deployment trigger

 

The last step of the CD part is creating a release from the release pipeline you just created. To do so, please select the release pipeline and create a release as shown in Fig 16.


Fig 16: Create a Release from the release pipeline

You are done with CI/CD. Now, to test it, please go to your ADF instance, choose the master branch, and then click the 'Publish' button as shown below in Figure 17. The CI/CD process starts, and the code for pipelines, linked services, and datasets moves from the Development environment to Pre-Prod and then to Production.


Fig 17: Publish from Azure Data Factory (ADF)

This blog post demonstrates the step-by-step process of implementing CI/CD for ADF by using ARM templates. However, in addition to the ARM template deployment task, there is another option named "Azure Data Factory Deployment (ARM)" which can also be used to implement CI/CD for ADF.