Recently, I was involved in a project with the following requirements: a report needed to be built from data in several sources, including Azure Cosmos DB and Azure SQL; the report had to be stored in Azure; and the data was subject to a retention policy that prevented it from being deleted. For this use case, we decided to use Azure Data Factory to read the data, transform it, and load it into a CSV file within a blob container in an Azure Storage Account. Then, to prevent the deletion of data, we would create a retention policy on the blob container to enable immutable storage, giving us write-once, read-many capabilities.

Azure Cosmos Data Flow

We found that we needed a Data Flow within Azure Data Factory to perform logic such as joining across our data sources. Within that data flow, we created an output sink to a Delimited Text dataset backed by an Azure Blob Storage Linked Service.

Immutable storage policies, which come in two forms (time-based retention policies and legal holds), can be enabled on a storage account to enforce write-once, read-many behavior: new documents can still be added to the storage account, but existing documents cannot be modified or deleted. In the case of a retention policy, individual blobs cannot be modified or deleted until the retention period has passed for a given blob. In the case of a legal hold, no blobs can be modified or deleted until the legal hold is manually removed from the blob container. Legal holds can be used to fulfill various legal requirements, such as keeping data under active investigation from being modified. You can read more in the official Microsoft documentation for Immutable storage for blobs.
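For reference (not something we did in the original project), both kinds of policy can also be applied with the Az.Storage PowerShell module. This is a minimal sketch; the resource group, account, container, and tag names are placeholders, and only one of the two options would typically be applied to a given container:

# Assumes the Az.Storage module and an authenticated session (Connect-AzAccount).
# Resource names below are placeholders for illustration only.
$rg        = "my-resource-group"
$account   = "mystorageaccount"
$container = "output"

# Option 1: a time-based retention policy (blobs locked for 30 days after creation)
Set-AzRmStorageContainerImmutabilityPolicy -ResourceGroupName $rg `
    -StorageAccountName $account -ContainerName $container -ImmutabilityPeriod 30

# Option 2: a legal hold (blobs locked until the tagged hold is removed)
Add-AzRmStorageContainerLegalHold -ResourceGroupName $rg `
    -StorageAccountName $account -ContainerName $container -Tag "activeinvestigation"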

Our project had legal requirements for retaining these created documents, so we found immutable storage policies to be a good fit. When we enabled Immutable Storage, either through a retention policy or a legal hold on the Blob Container, we encountered one of two errors:

Job failed due to reason: Spark job failed in one of the cluster nodes while writing data in one partition to sink, with following error message: This operation is not permitted as the blob is immutable due to a policy.

OR

Job failed due to reason: Spark job failed in one of the cluster nodes while writing data in one partition to sink, with the following error message: This operation is not permitted as the blob is immutable due to one or more legal holds.

Here, we will examine why these errors occur and how to resolve them.

Setup

To simplify our example, I will create a simple Azure Data Factory pipeline that uses a data flow to read from an Azure SQL database, perform a Select operation on that data, and then output a CSV file into an Azure Storage blob container.

Data Flow pipeline

I created an Azure SQL server to hold our source data. This server has a database named testdb, and a table named MOCK_DATA with a few columns of varying data types. I then inserted some mock data into the table.

I then created an Azure Storage Account to hold our final data, with a blob container named “output.” This is where I will store the final CSV files output by the Azure Data Factory.

Azure Storage Account

To make these blobs immutable, I added a legal hold to the output container:

Make blobs immutable

I then created a new Azure Data Factory:

Create and Open Azure Data Factory

Opening Azure Data Factory Studio, I created the following items. First, a new Linked Service associated with my Azure SQL Server:

Create New Linked Service associated with Azure SQL Server

A second Linked Service, associated with Azure Blob Storage:

Associate Linked Service with Azure Blob Storage

An Azure SQL Database Dataset, with the linked service associated to my SQL Database and the table source associated with my mock data table:

Azure SQL Dataset

An Azure Blob Storage Delimited Text Dataset, associated with my Blob Storage Linked Service, and with two parameters, Container and Blob, that are associated with the File path:

Azure Blob Storage Delimited Text Dataset

A Dataflow, which has the following configuration:

  • A parameter named filename
  • A Source associated with the mock data table dataset, with all default options.
  • A Select activity to simulate processing the data to select specific columns
  • A Sink activity with the following settings:
    • Dataset: The delimited text dataset
    • Settings tab:
      • File name option: Output to a single file
      • File name: $filename
      • Optimize: Single partition

Data Flow Activity and Configuration

A Pipeline with a Data Flow activity, associated with the Data Flow that was just built, with the following parameters:

  • Sink parameters:
    • Container: output
    • Blob: A dynamic content value set to an empty string, ''
  • Other parameters:
    • filename: Data flow expression, “file1.csv”

Main Data Flow Parameters

When I run this pipeline, I get the following error:

Job failed due to reason: Spark job failed in one of the cluster nodes while writing data in one partition to sink, with the following error message: This operation is not permitted as the blob is immutable due to one or more legal holds.

Solution

Why do we get an error when running this pipeline? When the Spark cluster associated with the Data Flow activity runs, it creates several temporary files within the blob container. It then attempts to append to these files and delete them after processing.

Error when running pipeline

This violates the write-once, read-many restrictions on the blob container, so the Data Factory job fails.

To resolve this issue, I will create a temporary location for these files to be built and then copy the finished file from that temporary container into the immutable container. First, I will create a new blob container with no immutability restrictions, which I will call “temp”:

Temporary Blob Storage Container

Then, to clean up the temporary data, I will enable lifecycle management on the storage account. I created a new lifecycle management rule to delete old files with the following settings:

  • Rule scope: Limit blobs with filters
  • Blob type: Block blobs
  • Blob subtype: Base blobs
  • Base blobs rules:
    • Add a rule: If base blobs haven’t been modified in 1 day, then delete the blobs
  • Filter set:
    • Blob prefix: temp/*.csv

Base Blobs Subtype

Create the Filter Set
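For reference, a roughly equivalent rule can be scripted with the Az.Storage PowerShell module. This is a sketch with placeholder resource names; it uses a plain prefix match on the temp container rather than the portal's wildcard-style filter:

# Delete base block blobs under the temp/ prefix once they are more than a day old
$action = New-AzStorageAccountManagementPolicyAction -BaseBlobAction Delete -DaysAfterModificationGreaterThan 1
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch "temp/" -BlobType blockBlob
$rule   = New-AzStorageAccountManagementPolicyRule -Name "delete-old-temp-files" -Action $action -Filter $filter

# Apply the rule to the storage account (placeholder names)
Set-AzStorageAccountManagementPolicy -ResourceGroupName "my-resource-group" `
    -StorageAccountName "mystorageaccount" -Rule $rule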

Next, back in Azure Data Factory, in my pipeline, I changed the dataflow sink parameters to use the temp blob container rather than the output blob container:

Change parameters in dataflow pipeline

The Copy Data activity does not work the same way as a Data Flow sink and does not create temporary files in the process. I created a new Copy Data activity in the pipeline and connected the Data Flow activity's output to it:

Create New Data Activity in Pipeline

For the Source of the Copy Data activity, I selected the Delimited Text dataset and provided the following settings:

  • Container: temp
  • Blob: file1.csv

For the Sink of the Copy Data activity, I selected the same Delimited Text dataset and provided the following settings:

  • Container: output
  • Blob: file1.csv

Rerun the pipeline. The pipeline should run successfully, and your output file will exist in both the temp and output blob containers. Once the lifecycle management rule finds that the temporary blob is over a day old, the temporary blob will be deleted, while the output blob will remain. The immutable storage policy will prevent the output blob from being deleted.

Run Pipeline with output and temp blob containers

Summary

Because the Data Flow activity in Azure Data Factory uses temporary files to build delimited text outputs, a delimited text sink cannot write directly to an Azure Storage blob container with an immutable storage policy. To work around this restriction, the data flow must write to a temporary destination, which is then copied to the final, immutable destination with a Copy Data activity. The Copy Data activity is not processed the same way as the Data Flow sink and does not create temporary files, so it is suitable for copying data from the temporary container to the output container. The temporary container is then cleaned up regularly by a lifecycle management policy on the storage account.

There are several ways to import and export data through Microsoft Dataverse, the data backbone of Microsoft Power Platform. The data team of the internal AIS Microsoft Power Platform Hackathon used three different data movement techniques to solve the hackathon use case: Dataflows, Power Query, and Power Automate. Read on to learn how we did it.

What is Dataverse?

Dataverse is designed to work with any data and incorporates all the significant data technologies that any organization needs – relational, non-relational, file, image, search, and data lake. Dataverse helps to store and manage data securely. Dataverse includes a set of visual designers to create, edit, and interact with data. In Dataverse, tables are used to model and manage business data. To increase productivity, Dataverse includes a set of tables known as standard tables.

We used Dataverse as a place to store all our data related to catalogs and imported and exported data as per the scenarios.

Our Approach

Our hackathon team was tasked with migrating the data of a legacy application into Dataverse. The legacy application data was created from the eShopOnWeb project and was hosted in Azure SQL. Our approach was to break this use case down into two problems:

  1. Migrate the tables (schema)
  2. Migrate the data

The Power BI Model View in the screenshot below shows the entities and relationships we needed to migrate. The schema did not need to change during the migration, so this model represents both the source and destination data models.

Power BI Model View

Migrate the Tables

We evaluated three techniques for migrating the tables into Dataverse.

  1. Create the tables automatically during the data migration when using Dataflows 
  2. Write an application that creates the columns through the Dataverse Web API 
  3. Manually create them in the portal 

Option 1: Dataflows
Tables can be created in the process of moving data into Dataverse using Dataflows. This is only an option if you do not need to modify the schema while migrating the data.

Option 2: Dataverse Web API
The Dataverse web API provides a RESTful web service to interact with data in Microsoft Dataverse using a wide variety of platforms & programming languages, such as C#. This is an excellent option to consider if you’d like to programmatically migrate the schema but need to change the data model in the process.
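As a rough, hypothetical sketch of this approach (not what our team built), the snippet below adds a single text column to an existing table through the Web API. The organization URL, bearer token, table name, and column names are all placeholders, and the exact metadata payload should be checked against the Dataverse Web API documentation:

# Hypothetical example: create a text column on an existing Dataverse table.
# $orgUrl and $token are placeholders; the token is an Azure AD bearer token for the environment.
$orgUrl = "https://yourorg.crm.dynamics.com"
$token  = "<Azure AD bearer token for the Dataverse environment>"

$column = @{
    "@odata.type" = "Microsoft.Dynamics.CRM.StringAttributeMetadata"
    SchemaName    = "new_ProductName"
    MaxLength     = 100
    RequiredLevel = @{ Value = "None" }
    DisplayName   = @{
        "@odata.type"   = "Microsoft.Dynamics.CRM.Label"
        LocalizedLabels = @(
            @{
                "@odata.type" = "Microsoft.Dynamics.CRM.LocalizedLabel"
                Label         = "Product Name"
                LanguageCode  = 1033
            }
        )
    }
}

# Post the column definition to the (placeholder) new_product table's attributes collection
Invoke-RestMethod -Method Post `
    -Uri "$orgUrl/api/data/v9.2/EntityDefinitions(LogicalName='new_product')/Attributes" `
    -Headers @{ Authorization = "Bearer $token"; "OData-Version" = "4.0" } `
    -ContentType "application/json" `
    -Body ($column | ConvertTo-Json -Depth 10)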

Option 3: Manual
Manually creating tables in the portal is the simplest of the three options but could be time-consuming and error-prone if you make many tables or migrate the same schema into multiple environments. However, given the time constraint of our hackathon and the simple use case, we chose to use this option.

Migrate the Data

Once the tables are in place, Dataverse is ready to receive data from the source system. We evaluated three options to migrate data from the legacy application to Dataverse.

  1. Import from a CSV file
  2. Use Power Query
  3. Use Dataflows

Option 1: Import from CSV
You can load data into Dataverse by importing data from files such as Excel or CSV. This is an excellent option if you need to do a one-time import of data that does not need to be transformed and does not include any unsupported data types, such as timezones, images, or multi-select choices. During the hackathon, we connected to the legacy SQL database using SQL Server Management Studio and exported the data to a CSV. We then completed the import process using the portal to upload the CSV and map the source and destination columns.
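As an alternative to the manual SSMS export, the same CSV could be produced with the SqlServer PowerShell module. This is only a sketch; the server, database, credentials, and column names are placeholders modeled on an eShopOnWeb-style schema:

# Sketch only: export legacy data to CSV for the Dataverse import (placeholder names).
# Select-Object drops the DataRow bookkeeping properties before writing the CSV.
Invoke-Sqlcmd -ServerInstance "legacy-sql.database.windows.net" -Database "eshopdb" `
    -Username "sqladmin" -Password "<password>" `
    -Query "SELECT Id, Name, Price FROM dbo.CatalogItems" |
    Select-Object Id, Name, Price |
    Export-Csv -Path ".\CatalogItems.csv" -NoTypeInformation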

Map source to destination columns

Option 2: Power Query
Users can filter, transform, and combine data before loading it into a new or existing Dataverse table. The source data can come from online or on-premises sources, including SQL Server, Salesforce, IBM Db2, Access, Excel, or a Web API. Use this option if you are moving a large volume of data or if the data needs to be reshaped during the migration.

Option 3: Dataflows
Dataflows are built upon Power Query, so they have all the same benefits but bring the added advantage of letting users trigger the migration on-demand or automatically on a schedule.

We will be diving deeper into each team, so stay tuned for more blog posts around our AIS Internal Hackathon!

Authored by Jagrati Modi (team lead)
Thank you to the Data team for sharing their experience:

  • Jagrati Modi (team lead)
  • Souradeep Banerjee
  • Nikhil Grover

Do your users want reports in SharePoint? Yes! They crave reports and charts. Regardless of which version of SharePoint they are using, building their data visualizations in Power BI now will position the reports for seamless migration to future SharePoint versions. Below are the steps to add a simple report to a SharePoint modern page.

Important note: To embed Power BI reports in SharePoint Online, a Power BI Pro license is required.

The Process Flow

This is the flow:

The Process Flow Figure 1

Figure 1: The Process Flow

  1. Create your data in SharePoint; say a list or library.
  2. Start Power BI Desktop, connect to SharePoint to read and transform the data, and create visualizations.
  3. Publish to Power BI Online, where a link becomes available to paste into the Power BI webpart available in SharePoint Modern pages.

We’ve gone full circle! Let’s look at the steps.

SharePoint Online Data

For this example, we will read data from a simple custom list. I added the list to my SPO dev tenant site, named Vacation Planner. Since all our “vacay” are now “staycay,” I decided to make a board game list. Along with the default Title column that you get with any new list, I added three more, each a number column. Then I added every game I could think of. For each one, I entered somewhat random numbers for Difficulty, Minimum age, and Popularity, although I am pretty sure Candy Land is for 4-year-olds.

SharePoint Online Data Figure 2

Figure 2: Board games list

To create the list, I was logged into SPO as a fictitious test user I named Gwen Lim.

Build the Report

Install the Power BI Desktop application to build the report. It’s free: download it here.

On first use, you will be prompted to sign in. If the login type option appears, choose “Organizational” and log in with a Windows account; I logged in with fictional Gwen Lim’s account. In the app, choose “Get Data,” either from the startup splash screen or from the menu.

Select Data Source

Figure 3: Select a data source

From the Common data sources dropdown, select “More…” at the bottom. Then click the “Online Services” option, and you should see “SharePoint Online List” on the right. Select that and then click “Connect” at the bottom.

Choose SharePoint online

Figure 4: We will choose SharePoint online list

In the SharePoint Online Lists dialog, paste the URL of the SharePoint site that contains your list. You can check the 2.0 (Beta) radio button (see Figure 5) to have the app open the default view of your list, or leave it at 1.0 if you prefer.

SharePoint Site URL

Figure 5: Enter the SharePoint site URL

A Navigator window appears, listing all of the lists available in the SharePoint site in the left column with checkboxes. Check BoardGames to see a preview of the data on the right side of the pane, then click the Load button.

Select the List

Figure 6: Select the List

Now you can start building the report. The data fields display on the right side. Because I chose a specific, limited column view as the default for the list and selected the 2.0 radio button, only a few fields (aka columns) appear on the right, which makes them easy to work with.

BoardGames List App

Figure 7: The BoardGames list fields appear

Ignore the fields for a moment while you choose a visualization. Select the doughnut. Now it’s time to apply fields to the doughnut. Drag Title into the Legend box under Visualizations, and a legend appears beside the doughnut chart. Drag Popularity into the Values box, and your doughnut comes to life with color.

Pick a visualization

Figure 8: Pick a visualization chart and add fields

When you hover over the chart, tooltips appear with data for each game. The Minimum age, Difficulty, and Popularity values have been imported as decimals, which would be more readable as whole numbers. To alter this, and to edit the column heading text, click the ribbon’s Transform Data button.

Modify the Data

Figure 9: Modify the data

To change a column from a decimal to a whole number, click the column title to select it and then click the ribbon’s Data Type button. Select Whole Number, as in Figure 10. Double-click the column heading to rename the column.

Changing field titles and data types

Figure 10: Changing field titles and data types

Click the Close & Apply button on the left of the ribbon to reapply the visualization with the changes. Now when you hover your cursor over a section, the renamed Minimum Age field (now with a space) appears, and both values display as whole numbers.

Ready to Publish Figure 11

Figure 11: Improved tooltips

Display in SharePoint

To display the report in SharePoint, click the Publish button in the ribbon on the right side. You will be prompted to save your report in .pbix format.

Ready to publish report figure

Figure 12: Ready to publish!

Save anywhere you want to keep it, and then the Publish to Power BI dialog appears. Varied workspaces can be configured, but initially, you only have “My Workspace” as an option, which is fine. Select it and then click “Select.”

Publishing to Power BI

Figure 13: Successful publish to Power BI Online

When you see the Success! dialog, click the link to open the .pbix in Power BI online and view your report. In the Share menu above the report, drop down the menu options and hover over Embed report. Here you want to see an option for SharePoint Online.

Link to use in SharePoint page Figure 14

Figure 14: Get a link to use in a SharePoint page

This option will be missing until you upgrade to a Power BI Pro license. Pro is not free, but a 60-day trial is available. Once you have that option in the menu and select it, you are rewarded with the embed link to use in SharePoint.

Embed link for SharePoint

Figure 15: Click to highlight and then copy

Click that link to highlight it and copy. Now head over to your SharePoint site and create a page.

Locate built-in Power BI

Figure 16: Locate the built-in Power BI webpart

Click the webpart plus sign, and in the search box, type “power” to filter the results. The Power BI webpart will appear. Click on it, and the webpart will be inserted into your page. You will see a green button for Add report; click it to open the properties panel on the right. Paste in the embed link you got from Power BI online.

Apply the embed link

Figure 17: Apply the embed link

Click away from that textbox and your report will appear on the left.

Report Displayed Correctly

Figure 18: Report successfully displayed in SharePoint Online

Conclusion

This is a no-code solution and a simple demo. However, the depth of tooling provided by Power BI to enable developers and business data experts to transform and visualize organizational data is immense. The speed and ease with which we can integrate data reporting into SharePoint modern pages will be welcome to customers as they migrate to current SharePoint versions.

Links

Embed a report web part in SharePoint Online – Power BI | Microsoft Docs

Introduction

Configuration Data for DSC is somewhat analogous to configuration files in traditional applications; different environments can have their own Configuration Data file. The two main topics I will cover in this blog post are the use of Nodes in Configuration Data and working with multiple Configuration Data files.

Nodes in Configuration Data

A typical Configuration Data file looks something like this:

@{
    AllNodes =
    @(
        @{
            NodeName = "VM-1"
            Role     = "WebServer"
        },
        @{
            NodeName = "VM-2"
            Role     = "AppServer"
        },
        @{
            NodeName = "VM-3"
            Role     = "SQLServer"
        }
    );

    NonNodeData = @{
        LogFolder = "C:\Logs"
        InputFolder = "C:\Input"
        OutputFolder = "C:\Output"
        ApplicationUrl = "https://myapp.com"
        ServiceUrl = "https://myservice.com"
    }
}

AllNodes is the only required property; it is an array of Node objects. Each Node object must have a NodeName property. We can add as many additional properties as we would like to Node objects, as well as to the top-level Configuration Data object itself. NonNodeData is generally used as a container for all properties that do not apply to individual Nodes; however, we could rename it or split it into multiple objects. Taking the previous sample, we can rearrange things to make it a bit easier to read:

@{
    AllNodes =
    @(
        @{
            NodeName = "VM-1"
            Role     = "WebServer"
        },
        @{
            NodeName = "VM-2"
            Role     = "AppServer"
        },
        @{
            NodeName = "VM-3"
            Role     = "SQLServer"
        }
    )
    Folders = @{
        LogFolder = "C:\Logs"
        InputFolder = "C:\Input"
        OutputFolder = "C:\Output"
    }
    Urls = @{
        ApplicationUrl = "https://myapp.com"
        ServiceUrl = "https://myservice.com"
    }
}

Regarding the NodeName property: you will often see it set to the VM name. However, this isn’t required; NodeName can be anything. In fact, there is a good reason not to treat it as a VM name and instead use NodeName more like the role of the VM. In the original example, what happens if we want to add a second WebServer? Since Configuration Data is set at compile time, we would need to recompile our DSC configuration. Rewriting that Configuration Data file as follows eliminates the need to recompile when we add a new server:

@{
    AllNodes =
    @(
        @{
            NodeName = "WebServer"
        },
        @{
            NodeName = "AppServer"
        },
        @{
            NodeName = "SQLServer"
        }
    )
    Folders = @{
        LogFolder = "C:\Logs"
        InputFolder = "C:\Input"
        OutputFolder = "C:\Output"
    }
    Urls = @{
        ApplicationUrl = "https://myapp.com"
        ServiceUrl = "https://myservice.com"
    }
}

There is unfortunately one use case where the above does not work: if you have a DSC resource that needs to reference the actual VM name (such as the ComputerManagementDsc module’s Computer resource, which is used to join a computer to a domain). You would either have to revert to specifying each VM name in your Configuration Data file or rewrite the DSC resource so it does not need the actual VM name.
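For context, here is a minimal sketch (not from the original post) of a configuration that consumes this Configuration Data, selecting nodes by NodeName and reading the shared Folders settings through the $ConfigurationData automatic variable. The WindowsFeature and File resources are illustrative only:

Configuration MySampleConfiguration
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    # One node block is compiled per AllNodes entry whose NodeName is "WebServer"
    Node $AllNodes.Where({ $_.NodeName -eq 'WebServer' }).NodeName
    {
        WindowsFeature IIS
        {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }

        # Non-node settings are available through the $ConfigurationData automatic variable
        File LogFolder
        {
            DestinationPath = $ConfigurationData.Folders.LogFolder
            Type            = 'Directory'
            Ensure          = 'Present'
        }
    }
}

# Compile locally against a Configuration Data file (path is a placeholder)
MySampleConfiguration -ConfigurationData .\MySampleConfiguration.psd1 -OutputPath .\output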

Dealing with multiple Configuration Data files

Let’s say we’ve properly split out our Configuration Data files so that we have one for each environment:

MySampleConfiguration.AzureCloud.psd1:

@{
    AllNodes =
    @(
        @{
            NodeName = "WebServer"
        },
        @{
            NodeName = "AppServer"
        },
        @{
            NodeName = "SQLServer"
        }
    )
    Environment = 
    @{
        Name = "AzureCloud"
    }
}

MySampleConfiguration.AzureUSGovernment.psd1:

@{
    AllNodes =
    @(
        @{
            NodeName = "WebServer"
        },
        @{
            NodeName = "AppServer"
        },
        @{
            NodeName = "SQLServer"
        }
    )
    Environment = 
    @{
        Name = "AzureUSGovernment"
    }
}

There is a good bit of duplicate content in both files. Since Configuration Data is just a hashtable object whose value is set at compile time, we can split out our files and then merge them with PowerShell:

MySampleConfiguration.shared.psd1:

@{
    AllNodes =
    @(
        @{
            NodeName = "WebServer"
        },
        @{
            NodeName = "AppServer"
        },
        @{
            NodeName = "SQLServer"
        }
    )
}

MySampleConfiguration.AzureCloud.psd1:

@{
    Environment = 
    @{
        Name = "AzureCloud"
    }
}

MySampleConfiguration.AzureUSGovernment.psd1:

@{
    Environment = 
    @{
        Name = "AzureUSGovernment"
    }
}

We need to write a function to merge the hashtable objects. In this case, any new properties are copied from the second object into a clone of the first, and any properties in the first that also exist in the second are overwritten by the second:

function Merge-Hashtables
{
    Param
    (
        [Parameter(Mandatory=$true)]
        [Hashtable]
        $First,

        [Parameter(Mandatory=$true)]
        [Hashtable]
        $Second
    )

    function ProcessKeys($first, $second)
    {
        foreach ($key in $second.Keys) {
            $firstValue = $first[$key]
            $secondValue = $second[$key]

            if ($firstValue -is [hashtable] -and $secondValue -is [hashtable])
            {
                # Recursively merge nested hashtables
                ProcessKeys -first $firstValue -second $secondValue
            }
            else
            {
                $first[$key] = $secondValue
            }
        }
    }

    $firstClone = $First.Clone()
    $secondClone = $Second.Clone()

    ProcessKeys -first $firstClone -second $secondClone

    return $firstClone
}

Then we load the shared and appropriate environment-specific Configuration Data file, merge them, and pass it in at compile time:

$sharedConfigData = Import-PowerShellDataFile MySampleConfiguration.shared.psd1
$azureCloudConfigData = Import-PowerShellDataFile MySampleConfiguration.AzureCloud.psd1
$mergedConfigurationData = Merge-Hashtables -First $sharedConfigData -Second $azureCloudConfigData

$automationAccountName = "Your Azure Automation Account Name"
$resourceGroupName = "The resource group containing your automation account"
$configurationName = "Name of your configuration"
$pathToConfiguration = "Path to your DSC file"

Import-AzAutomationDscConfiguration -AutomationAccountName $automationAccountName -ResourceGroupName $resourceGroupName -SourcePath $pathToConfiguration -Force -Published

Start-AzAutomationDscCompilationJob -AutomationAccountName $automationAccountName -ResourceGroupName $resourceGroupName -ConfigurationName $configurationName -ConfigurationData $mergedConfigurationData

As more and more businesses move their applications to the cloud, it’s clear that operation and log data analysis is a major component of the migration process. That data is crucial to understanding the health and reliability of your cloud services with respect to scalability, resilience, uptime, and your ability to troubleshoot issues.

But how do you deal with all that operational data? How do you identify specific application issues such as exceptions raised from bugs in the code, troubling increases in processor or memory consumption, or slow response times?

It turns out that migrating your applications to the cloud is just the first step: Having a well-thought-out operational data and monitoring user story is just as important. Read More…

Microsoft’s Cortana Intelligence Suite provides a seamless transition from raw data to intelligence: Real, meaningful data for real, meaningful business decisions. 

With the rise of the Internet of Things (IoT), the need for real-time processing and data analytics has become paramount. As a part of the Cortana Intelligence Suite, Microsoft offers Azure Stream Analytics (ASA) as a fully managed cloud service for analyzing complex event and data streams in near real time. Read More…