Azure Data Factory (ADF) has introduced a feature called Managed Virtual Network (MVN) that lets on-premises workloads connect to Azure PaaS resources securely and privately, without opening the corporate boundary firewall to Azure PaaS public IP addresses.
To run an on-premises backend job that uses PaaS services such as Azure Storage or Azure SQL, the corporate boundary firewall may need to be opened to the public IP addresses of those services. If InfoSec pushes back when you ask to punch a hole in the firewall, you are not alone. I understand “the why” part of that pushback. Luckily, no more convincing is needed!
Managed Virtual Network (MVN)
In the picture below, the MVN contains an integration runtime. This integration runtime can use multiple ADF managed private endpoints. These managed private endpoints establish private links with other Azure resources such as Azure Storage and Azure SQL Database.
The advantages of using MVN are:
- MVN eliminates the need to have a deep understanding of Azure networking and upfront network planning
- MVN takes care of automatic DNS entry registration
- MVN is secure and private. Traffic to Azure PaaS services stays on private links, so their public internet access can be disabled
- Corporate firewall rules will be much cleaner. No more clutter from punching holes in the on-premises corporate boundary firewall to allow Azure PaaS services
- Built-in data exfiltration protection
How to enable MVN, private link & private endpoints in ADF?
Let’s see how to enable MVN with an Azure PaaS implementation. Often, we come across a typical business scenario where we want to transform a broad set of flat, semi-structured CSV files into a schematized and structured format ready for further querying.
As shown above, Azure Data Factory acts as the PaaS orchestration engine that converts flat-file data into SQL data. The CSV data is stored in Azure Data Lake, converted from its raw format into a binary format, and stored in Azure Synapse SQL Data Warehouse. Because the ingested CSV data is transformed and stored in a columnar format, it is more performant to query.
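To make the idea of "schematizing" flat CSV data concrete, here is a minimal Python sketch. It is illustrative only: in the scenario described here ADF performs the conversion, and the column names, the `SCHEMA` mapping, and the `schematize` helper are hypothetical names chosen for this example.

```python
import csv
import io
from datetime import date
from typing import Any, Callable, Dict, List

# Hypothetical target schema: column name -> function that casts the raw string.
SCHEMA: Dict[str, Callable[[str], Any]] = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "amount": float,
}

# Hypothetical flat-file content, as it might sit in the data lake.
SAMPLE_CSV = "order_id,order_date,amount\n1,2021-03-01,19.99\n2,2021-03-02,5.00\n"


def schematize(csv_text: str, schema: Dict[str, Callable[[str], Any]]) -> List[Dict[str, Any]]:
    """Parse CSV text and cast every column to the type declared in the schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    typed_rows = []
    for raw in reader:
        # Every CSV value arrives as a string; the schema gives it a real type.
        typed_rows.append({col: cast(raw[col].strip()) for col, cast in schema.items()})
    return typed_rows


if __name__ == "__main__":
    for row in schematize(SAMPLE_CSV, SCHEMA):
        print(row)
```

Once the rows carry real types, they map naturally onto SQL columns, which is what makes the data ready for structured querying.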
We will create an Azure Data Lake Storage account, an Azure SQL Server, a SQL Database, and an ADF instance. In this walkthrough, all of these PaaS resources are created in the same Azure region. For step-by-step instructions on creating these PaaS resources, refer to the Credits section.
- In ADF, go to Manage connections and select Integration runtimes. Create an Azure Integration Runtime (IR). Enable Virtual Network Configuration, as shown below. This step will ensure the data integration process is isolated and secure.
- After creating the IR, select Managed private endpoints in the bottom left corner. This lets ADF connect to Azure Data Lake securely. It’s a two-step process: select “create a new managed private endpoint” and then select Data Lake Storage. You will get a prompt to go to the Azure portal. In the portal, select Data Lake Storage and approve the newly created private endpoint, as shown below:
- Similarly, create a managed private endpoint in ADF for the SQL database and approve it in the portal.
- Next, create linked services in ADF for Azure Data Lake Storage and Azure SQL Database.
- Finally, configure and run the data factory pipeline to transfer flat file data from Azure Data Lake to Azure SQL database.
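The steps above can also be expressed as resource definitions instead of portal clicks. The following is a minimal Python sketch, not the method used in this walkthrough: it only builds JSON payloads and calls no Azure API. It assumes the ARM-style shapes ADF uses for a managed-VNet integration runtime, managed private endpoints, and linked services. The helper functions, the names `ManagedVnetIR`, `adls-endpoint`, and `sql-endpoint`, and every `<...>` placeholder are hypothetical. The group IDs `dfs` (Data Lake Storage Gen2) and `sqlServer` (Azure SQL) are assumptions that should be checked against the Private Link documentation.

```python
import json


def managed_integration_runtime(name, vnet_name="default"):
    """Azure IR that runs inside the factory's managed virtual network."""
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "managedVirtualNetwork": {
                "type": "ManagedVirtualNetworkReference",
                "referenceName": vnet_name,
            },
        },
    }


def managed_private_endpoint(name, private_link_resource_id, group_id):
    """Managed private endpoint pointing at one target PaaS resource."""
    return {
        "name": name,
        "properties": {
            "privateLinkResourceId": private_link_resource_id,
            "groupId": group_id,  # assumed sub-resource name, e.g. "dfs"
        },
    }


def linked_service(name, service_type, type_properties, ir_name):
    """Linked service that connects through the managed-VNet IR."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": type_properties,
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


def build_payloads(subscription="<subscription-id>", resource_group="<resource-group>"):
    """Assemble all definitions for the Data Lake to SQL scenario."""
    base = f"/subscriptions/{subscription}/resourceGroups/{resource_group}/providers"
    ir = managed_integration_runtime("ManagedVnetIR")
    ir_name = ir["name"]
    return {
        "integrationRuntime": ir,
        "managedPrivateEndpoints": [
            managed_private_endpoint(
                "adls-endpoint",
                f"{base}/Microsoft.Storage/storageAccounts/<storage-account>",
                "dfs",
            ),
            managed_private_endpoint(
                "sql-endpoint",
                f"{base}/Microsoft.Sql/servers/<sql-server>",
                "sqlServer",
            ),
        ],
        "linkedServices": [
            linked_service(
                "DataLakeLinkedService",
                "AzureBlobFS",
                {"url": "https://<storage-account>.dfs.core.windows.net"},
                ir_name,
            ),
            linked_service(
                "SqlLinkedService",
                "AzureSqlDatabase",
                {"connectionString": "<connection-string>"},
                ir_name,
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_payloads(), indent=2))
```

Even when the endpoints are defined this way, each one still has to be approved on the target resource, as in the portal steps. The `connectVia` reference routes a linked service through the integration runtime in the managed virtual network, so a linked service without it would not use the private endpoints.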
In conclusion, Azure customers can take advantage of the power and flexibility of a PaaS solution with no internet exposure. As demonstrated above, Azure Private Link allows back-office jobs to run in PaaS privately and securely.