Web development is arguably the most popular area of software development right now. Software developers can make snappy, eye-catching websites, and build robust APIs. I’ve recently developed a specific interest in a less discussed facet of web development: web scraping.

Web scraping is the process of programmatically analyzing a website’s Document Object Model (DOM) to extract specific data of interest. It is also a powerful tool for automating tasks such as filling out and submitting forms, although whether those tasks are possible depends on whether the site allows web scraping. One thing to keep in mind is that some websites rely on cookies or session state, so certain automation tasks may need to honor the site’s cookies/session state to work correctly. It should go without saying, but please be a good Samaritan when web scraping, since it can negatively impact site performance.

Getting Started

Let’s get started with building a web scraper in an Azure Function! For this example, I am using an HTTP Trigger Azure Function written in C#. However, you can have your Azure Function utilize a completely different trigger type, and your web scraper can be written in other languages if preferred.

Here is a list of Azure resources that were created for this demo:

Azure Resources

Before we start writing code, we need to take care of a few more things first.

Let’s first select a website to scrape data from. I feel that the CDC’s COVID-19 site is an excellent option for this demo. Next, we need to pick out what data to fetch from the website. I plan to fetch the total number of USA cases, new USA cases, and the date that the data was last updated.

Now that we have that out of the way, we need to bring in the dependencies for this solution. Luckily, there is only one dependency we need to install. The NuGet package is called HtmlAgilityPack. Once that package has been installed into our solution, we can then start coding.

Coding the Web Scraper

Since the web scraper component will be pulling in multiple sets of data, it is good to capture them inside a custom resource model. Here is a snapshot of the resource model that will be used for the web scraper.

Resource Model
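The original post shows the model as a screenshot. As a rough sketch, assuming property names based on the three values described above (the post’s actual names may differ), it might look like this:

```csharp
// Hypothetical resource model for the three values we plan to scrape.
// Property names are assumptions; the original post shows this as an image.
public class CovidStats
{
    public string TotalCases { get; set; }
    public string NewCases { get; set; }
    public string LastUpdated { get; set; }
}
```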

Now it’s time to start coding the web scraper class. This class will utilize a few components from the HtmlAgilityPack package that was brought into the project earlier.

Web Scraper Class

The web scraper class has a couple of class-level fields, one public method, and a few private methods. The method “GetCovidStats” performs a few simple tasks to get our data from the website. The first step is setting up an HTML document object that will be used to load and parse the HTML we get back from the site. Then, there is an HTTP call out to the website we want to hit.
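Since the class itself appears only as a screenshot, here is a hedged sketch of those first steps. The class name, field names, and URL are my assumptions, not the post’s actual code:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet dependency installed earlier

public class WebScraper
{
    // Reuse a single HttpClient instance to avoid socket exhaustion.
    private static readonly HttpClient _httpClient = new HttpClient();

    // Placeholder URL; point this at the page you intend to scrape.
    private const string CovidStatsUrl = "https://www.cdc.gov/";

    public async Task<CovidStats> GetCovidStats()
    {
        // HTML document object that will load and parse the returned markup.
        var htmlDocument = new HtmlDocument();

        // HTTP call out to the website we want to hit.
        var response = await _httpClient.GetAsync(CovidStatsUrl);

        // Status-code check and HTML parsing continue as described
        // in the article text that follows.
        return new CovidStats();
    }
}
```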

Right after that, we ensure the call out to the website results in a success status code. If not, an exception is thrown with a few details of the failing network call.

Ensure the Call Out
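A minimal sketch of that guard, assuming the variable names from the snippet above:

```csharp
// If the request failed, surface a few details of the failing network call.
if (!response.IsSuccessStatusCode)
{
    throw new HttpRequestException(
        $"Call to {CovidStatsUrl} failed: {(int)response.StatusCode} {response.ReasonPhrase}");
}
```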

We then load the HTML that we received back from the network call into our HTML document object. There are several calls to a method that performs the extraction of the data we are looking for. Now you might be wondering what those long strings in the method calls are. Those are the full XPaths for each targeted HTML element. You can obtain them by opening the dev tools in your browser, selecting the HTML element, right-clicking it in the dev tools, and choosing “Copy full XPath”.

Load the HTML
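Sketched out below, with placeholder XPaths standing in for the long strings shown in the post’s screenshot (these are not the CDC site’s real paths), and a hypothetical helper method name:

```csharp
// Load the HTML we received back into the HtmlAgilityPack document.
var html = await response.Content.ReadAsStringAsync();
htmlDocument.LoadHtml(html);

// Each value is pulled with a full XPath copied from the browser dev tools.
var stats = new CovidStats
{
    TotalCases  = ExtractInnerText(htmlDocument, "/html/body/div[1]/span[1]"),
    NewCases    = ExtractInnerText(htmlDocument, "/html/body/div[1]/span[2]"),
    LastUpdated = ExtractInnerText(htmlDocument, "/html/body/div[1]/p[1]")
};

// Helper that extracts the trimmed inner text of a single node.
// SelectSingleNode returns null when nothing matches the XPath.
private static string ExtractInnerText(HtmlDocument document, string xpath)
{
    return document.DocumentNode.SelectSingleNode(xpath)?.InnerText.Trim();
}
```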

Next, we need to set up the endpoint class for our Azure Function. Luckily for us, the out-of-the-box template sets up a few things automatically. In the endpoint class, we are merely calling our web scraper class and returning its results to the client calling the Azure Function.

Endpoint Class for Azure Function
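A minimal version of such an HTTP Trigger endpoint, with assumed class and function names, might look like this:

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using System.Threading.Tasks;

public static class GetCovidStatsFunction
{
    [FunctionName("GetCovidStats")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
        ILogger log)
    {
        log.LogInformation("Scraping COVID-19 stats.");

        // Call the web scraper class and return its results to the caller.
        var stats = await new WebScraper().GetCovidStats();
        return new OkObjectResult(stats);
    }
}
```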

Now comes time to test out the Azure Function! I used Postman for this, and these are the results.

Test out the Azure Function

Closing Thoughts

Overall, web scraping can be a powerful tool at your disposal if sites do not offer APIs for you to consume. It allows you to swiftly grab essential data off of a site and even automate specific tasks in the browser. With great power comes great responsibility, so please use these tools with care!

Leveraging the Power Platform and Microsoft Azure to connect housing agencies and intermediaries with the Department of Housing and Urban Development (HUD)

Intro

The US Department of Housing and Urban Development runs a program known as the Housing Counseling Program that aids housing agencies around the nation through things like grants and training. These agencies provide tremendous benefit to Americans in many forms, including credit counseling, foreclosure prevention, predatory lending awareness, homelessness services, and more. One of the requirements for approval into the program is that the agency uses a software system that is also approved by HUD and interfaces with HUD’s Agency Reporting Module SOAP web service.

Original System

The original system is built on Dynamics 365 Online using custom entities, web resources, and plug-ins. Complex batching functionality was implemented to deal with the two-minute timeout period that plug-ins have, web resource files (JavaScript, HTML, and CSS) were stuck in a solution with no real structure, and application lifecycle management (ALM) was extremely painful. The original system design also doesn’t account for the intermediary-to-agency relationship, specifically the roll-up reporting and auditing that intermediaries desperately need for providing their compiled agency data to HUD, as well as their own internal reporting. Finally, because the solution was built using Dynamics 365 Online Customer Engagement, licensing costs were more than double what we knew they could be with Microsoft’s new Power Apps licenses and Azure pay-as-you-go subscriptions.

The Original Diagram

Figure 1 – current CMS system built on Dynamics 365 Online

Definition

Intermediaries – organizations that operate a network of housing counseling agencies, providing critical support services to said agencies including training, pass-through funding, and technical assistance.

Modern System

While the data schema for the solution remains mostly unchanged, the architecture of the system has changed profoundly. Let’s check out some of the bigger changes.

Power Apps Component Framework (PCF)

The Power Apps Component Framework enables developers to create code components that can run on model-driven and canvas apps and can be used on forms, dashboards, and views. Unlike traditional web resources, PCF controls are rendered in the same context and at the same time as any other component on a form. A major draw for developers is that PCF controls are created using TypeScript and tools like Visual Studio and Visual Studio Code. In the modern solution, web resources are replaced with PCF controls that make calls directly to the Power Apps Web API.

Azure Resources (API Management, Function Apps, Logic Apps, and a Custom Connector)

A Custom Connector and Azure API Management are used to manage and secure various APIs that are used in the system. The connector can be used in any Azure Logic App to make connecting to an API much easier than having to configure HTTP actions. All of the business logic that once lived in plug-ins has been replaced with a combination of Azure Logic Apps and Azure Function Apps. This has provided incredible gains where development and testing are concerned, but it also provides a more secure and scalable model to support these agencies as they grow. Lastly, it removes the burden experienced with the two-minute timeout limitation that plug-ins have.

ALM with Azure DevOps Services and Power Apps Build Tools

Azure DevOps Services, coupled with the Power Apps Build Tools, is being used to ease the pain we experienced with ALM before these tools were available. Now we can easily move our Power App solutions across environments (e.g. dev, test, production) and ensure our latest solution code and configurations are always captured in source control.

ALM with Azure

Figure 2 – modern solution using Power Apps and Azure resources for extreme scalability and maintainability

Next Steps

I hope that by demonstrating this use case, you’re inspired to contact your Microsoft partner, or, better yet, contact us and let us work with you to identify your organization’s workloads and technology modernization opportunities.

Ready to dig a little deeper into the technologies used in this blog post? Below we have some hands-on labs for you. Stay tuned for more updates!

  1. Power Apps Component Framework Controls
    As a developer, I’ve found PCF controls to be one of the most exciting things to grace the Power Platform landscape. In this video, I’ll show you how to go from zero to a simple PCF control that calls the Power Platform Web API. I also provide a Pro Tip or two on how I organize my PCF controls in my development environment.

  2. Using an Azure Logic App to Query Data in an On-Premises SQL Server Database
    Companies are embracing modernization efforts across the organization, quickly replacing legacy apps with Power Apps and Azure resources. This can’t all happen at once though, and oftentimes companies either can’t or won’t move certain data to the cloud. In this hands-on lab, I’ll walk you through using an Azure Logic App and an On-Premises Data Gateway to connect to a local SQL Server 2017 database to query customer data using an HTTP request and response.
  3. Azure API Management and Custom Connectors
    APIs give you the ability to better secure and manage your application interfaces. Couple an API with a Custom Connector and you not only improve security and manageability, but also make it super easy for other app developers across the organization to connect to your back-end resources through the API, without having to worry about all of the API configuration.
  4. Power Apps DevOps Using Best Practices
    There’s more to DevOps than just backing up your Power App solutions in source control and automating your build and release pipeline. Good solution management and best practices around publishers, entity metadata, components, and subcomponents are also key to manageability and scalability with your ALM.
With the abundance of JavaScript libraries and frameworks available today, it’s hard to decide what is going to work best for a given requirement. Add in the fact that many server-side tools can also accomplish the task, and you could spend hours just narrowing down options to test before deciding on the path you’ll take in the end. This was a recent conundrum for me when I was approached to incorporate child data management in the parent forms on a SharePoint 2010 project. My experience with JavaScript has been limited over my career because I’ve been so focused on the backend of SharePoint during the majority of that time. My current client needs a better user experience, so I’ve been trying to fill that hole in my skills. This project offered an opportunity to do just that.

While it’s possible to put an ASP.NET GridView control in an UpdatePanel, a client-side approach seemed cleaner and a way to expand my JavaScript skills. I looked at many options like jQuery DataTables, koGrid, and a few others, but they didn’t give me the look, the price (free), and/or the TypeScript definitions I needed to easily take off with implementing them.

I decided to build my own solution, since it would be a relatively simple design and it would let me dig into KnockoutJS. In addition, TypeScript would make the solution easier to maintain, since it incorporates many ECMAScript 6 features such as classes and modules. Read More…

I recently encountered a requirement to programmatically zip multiple files into an archive in a Windows 8 JavaScript/HTML app. The intent was to shrink the several large files as much as possible for eventual submission to a central server. This operation needed to occur without the user’s direct involvement as part of a larger data transmission process.

While the Windows.Storage.Compression namespace does provide an interface for compressing individual files, there is no native support for creating a multi-file archive. In order to implement this feature I chose to use the third-party JSZip library, which is a light wrapper around the zlib library. Read More…