How to secure your data services in Azure with Azure Private Link
Authors: Marzieh Barghandan & Naomi Verdult
A. Introduction
Securing Azure data services poses its own set of challenges. After all, security is job one in the cloud. Security in Azure covers a vast range of topics: network security, identity management, auditing and logging, and more. All of these are important, but in this blog we’ll focus on security at the network level, using Azure Private Link for data platforms.
PaaS services in Azure commonly expose a public interface for connections. Using these services often raises the question of how to secure the architecture: how to restrict access to the PaaS services, and how data travels between them. In most cases there are two scenarios for data movement. Either both source and destination are publicly accessible over the internet, or one of the ends sits behind a firewall or inside a corporate network, private network or virtual network, and is therefore not publicly accessible.
In the architecture covered in this article, we look at the security of three Azure services: Storage Account, Azure Data Factory and Azure Synapse Analytics. These are PaaS services, which means they are publicly accessible over the internet, so we need to restrict access to each service while still allowing communication between them.
B. What is Azure Private Link exactly?
Azure Private Link lets you access Azure PaaS services (e.g. Azure Storage Account) and Azure-hosted partner services over a private endpoint in a virtual network (Vnet). The traffic between the Vnet and the Azure service then flows over the Microsoft network rather than over the public internet. Traffic can be split into outbound and inbound traffic, and it is important to note that Azure Private Link only works for incoming traffic, that is, traffic coming in to the service. If you want to check which Azure services support Private Link, you can check that here.
So, what’s the difference between an Azure Private Endpoint and an Azure Private Link Service? The list below sums it up; see also the image below for a visual representation.
- Azure Private Endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link.
- Azure Private Link Service is a reference to your Azure PaaS service. When Private Link is used, the service becomes part of a Vnet and gets a private IP.
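One practical consequence of the service getting a private IP is that, from inside the Vnet, its hostname should resolve to a private address. The sketch below is one way to check this with the Python standard library. The function names are made up for illustration, and the result depends on how DNS is configured in your environment, so read it as a diagnostic aid rather than a definitive test.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """Return True if ip lies in a private (non-internet-routable) range."""
    return ipaddress.ip_address(ip).is_private


def resolves_privately(hostname: str) -> bool:
    """Resolve hostname over IPv4 and check that every address is private."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    addresses = {info[4][0] for info in infos}
    return all(is_private_address(addr) for addr in addresses)
```

Run from a VM inside the Vnet, calling `resolves_privately` with the storage account's blob hostname would be expected to return True once the private endpoint and its DNS records are in place, and False when the name still resolves to the public endpoint.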
C. Different approaches
As mentioned in the introduction, this article covers three Azure PaaS services: Azure Storage Account, Azure Data Factory, and Azure Synapse Analytics. As PaaS services, they are publicly accessible over the internet. To secure them from a network point of view, we want to restrict access to each service while still allowing communication between them. The overview below shows which connections are possible over Private Link.
1. Managed Vnet option of Azure Synapse Analytics and Azure Data Factory (preview)
The Managed Virtual Network is associated with your Azure Data Factory instance and managed by Azure Data Factory. Using the Azure Data Factory Managed Vnet allows us to create an Azure Integration Runtime provisioned within the Managed Virtual Network, which uses private endpoints to securely connect to Azure services. Azure Data Factory manages these private endpoints on your behalf.
The same feature is also available in an Azure Synapse workspace. When creating a new instance, we can use this feature to associate our workspace with an Azure Virtual Network that is managed by Azure Synapse: the so-called Managed workspace Virtual Network.
Using Managed Virtual Networks with our ADF and Synapse workspace instances gives us the benefit of pushing the Vnet management workload to the services themselves, so we don’t need to be concerned about setting up the right NSG rules.
2. Azure Data Factory Self Hosted Integration Runtime (SHIR) on a VM
Deploying Azure Data Factory (ADF) with a Managed Vnet is currently in preview. If you want the same functionality, it is also possible to install a Self Hosted Integration Runtime (SHIR) on a virtual machine in Azure. It is important that the private endpoint of the storage account is deployed in the same Vnet as the VM. Otherwise, ADF cannot tell what the private IP refers to, because public IPs are unique but private IPs are not.
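The point about private IPs not being unique can be illustrated with a small sketch using Python's `ipaddress` module. The Vnet names and address ranges below are hypothetical. The example only shows that the same private address can belong to more than one network, which is why the address alone does not identify a resource.

```python
import ipaddress


def vnets_containing(ip: str, vnets: dict[str, str]) -> list[str]:
    """Return the names of all Vnets whose address space contains ip."""
    address = ipaddress.ip_address(ip)
    return sorted(
        name for name, cidr in vnets.items()
        if address in ipaddress.ip_network(cidr)
    )


# Hypothetical Vnets: two of them reuse the same private range.
example_vnets = {
    "vnet-shir": "10.0.0.0/16",
    "vnet-other": "10.0.0.0/16",
    "vnet-third": "192.168.0.0/24",
}

# The same private IP is a valid address in two different Vnets.
matches = vnets_containing("10.0.1.5", example_vnets)
print(matches)
```

Because the address matches two networks, it only becomes meaningful once you know which Vnet you are in, which is the reason for placing the private endpoint in the same Vnet as the VM.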
D. Architecture & Deployment
For our solution, we decided to deploy approach 1. In this section we walk through deploying the architecture below in three steps (Figure 4):
- The first step is to create an Azure IR associated with a managed Vnet and create a private endpoint to the storage account.
- The second step repeats this for connecting Azure Synapse with the storage account.
- The third step is grayed out because, at the time of writing, it is not yet possible, but this should be sorted out soon.
To create an IR with a managed Vnet we have two options. The first is available while creating an ADF instance: under the “Networking” tab, select “Enable Managed Virtual Network on the default AutoResolveIntegrationRuntime” (Figure 5).
It’s also possible to set this up while creating an IR in ADF: go to the “Manage” blade > Integration runtimes > New > Azure, Self-Hosted > Azure. On the setup page, set the Virtual network configuration to “Enable” (Figure 6).
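For readers who prefer to see what the portal option amounts to, here is a rough sketch of an integration runtime definition that references a managed Vnet, written as a Python dictionary. The property names follow the JSON shape that ADF exports for integration runtimes as we understand it, and the managed Vnet name `default` is an assumption. Compare it against the JSON of your own factory before relying on it.

```python
import json


def managed_vnet_ir_definition(name: str) -> dict:
    """Build an illustrative Azure IR definition tied to a managed Vnet."""
    return {
        "name": name,
        "properties": {
            # "Managed" is the type used for Azure (not self-hosted) IRs.
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {"location": "AutoResolve"},
            },
            # Assumption: the managed Vnet is called "default".
            "managedVirtualNetwork": {
                "type": "ManagedVirtualNetworkReference",
                "referenceName": "default",
            },
        },
    }


definition = managed_vnet_ir_definition("AutoResolveIntegrationRuntime")
print(json.dumps(definition, indent=2))
```

The part that distinguishes this from a regular Azure IR is the `managedVirtualNetwork` reference; without it the runtime is not placed in the managed Vnet.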
Now that our integration runtime is deployed, we want to make the connection to our storage account. To achieve this, we need to create a managed private endpoint to the storage account. In Azure Data Factory, navigate to the “Manage” blade > Managed private endpoints, create a new instance, select Azure Blob Storage or Azure Data Lake Gen 2, fill in the form and create it. You’ll see that a new managed private endpoint has been created with the provisioning state “Provisioning”. After a while this status changes to “Succeeded”. Now we need to approve this managed private endpoint.
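The form in the portal essentially captures two things: which storage account to connect to, and which sub-resource of it (Blob or Data Lake Gen 2). A minimal sketch of that information as a request body is shown below. The helper names, subscription, resource group and account values are hypothetical, and the property names `privateLinkResourceId` and `groupId` as well as the group IDs `blob` and `dfs` are our understanding of the underlying resource shape, so verify them against the official reference before use.

```python
def storage_resource_id(subscription_id: str, resource_group: str, account: str) -> str:
    """Build the Azure resource ID of a storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def managed_private_endpoint_body(storage_id: str, group_id: str) -> dict:
    """Build an illustrative body for a managed private endpoint.

    group_id selects the storage sub-resource: "blob" for Azure Blob
    Storage, "dfs" for Azure Data Lake Gen 2 (assumed values).
    """
    if group_id not in {"blob", "dfs"}:
        raise ValueError(f"unsupported group_id: {group_id!r}")
    return {
        "properties": {
            "privateLinkResourceId": storage_id,
            "groupId": group_id,
        }
    }


# Hypothetical identifiers, for illustration only.
target = storage_resource_id("00000000-0000-0000-0000-000000000000", "rg-data", "mystorageacct")
body = managed_private_endpoint_body(target, "dfs")
print(body)
```

Choosing between `blob` and `dfs` mirrors the choice between Azure Blob Storage and Azure Data Lake Gen 2 in the portal form.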
To approve it, navigate to Azure Portal > storage account > “Networking” blade > “Private endpoint connections” tab. Select the endpoint and approve it. After a couple of minutes, the approval state of the managed private endpoint is updated to “Approved”.
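The order of events matters here: the endpoint must finish provisioning before it can be approved, and it is only usable once approved. The sketch below captures that sequence as a small decision function. The function name and the returned messages are made up for illustration, and the "Pending" approval state is an assumption, since the text above only names “Provisioning”, “Succeeded” and “Approved”.

```python
def next_step(provisioning_state: str, approval_state: str) -> str:
    """Decide what to do next for a managed private endpoint."""
    provisioning = provisioning_state.strip().lower()
    approval = approval_state.strip().lower()

    if provisioning != "succeeded":
        # Still being created on the ADF or Synapse side.
        return "wait for provisioning to finish"
    if approval != "approved":
        # Created, but the storage account owner has not approved it yet.
        return "approve the connection on the storage account"
    return "ready to use"


print(next_step("Provisioning", "Pending"))
print(next_step("Succeeded", "Pending"))
print(next_step("Succeeded", "Approved"))
```

A check like this could be used in a deployment script to avoid starting pipelines before the connection has been approved.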
Next, we create an Azure Synapse workspace from the Azure portal. Set up all the configuration, then navigate to the “Networking” tab, make sure you check “Enable managed virtual network”, and create your instance.
Launch Synapse Studio and navigate to the “Manage” blade > Managed private endpoints. From here on, the steps to create managed private endpoints are the same as for Azure Data Factory.
E. What about Azure Private Link Hubs?
Azure Synapse Analytics Private Link Hubs can be used to connect to Synapse Studio in a secure way. A Private Link Hub is a separate resource that needs to be deployed. After deploying it, you connect from your private endpoint (in a Vnet) to the Private Link Hub. In this way, you can use private endpoints to securely connect to Synapse Studio.
F. Conclusion
By introducing the Managed Virtual Network for ADF and Synapse, Azure offers customers a more secure and manageable data integration solution. At the end of the day, deciding which approach to take when implementing these services comes down to the business requirements and structure.