AIS is working with a large organization that wants to discover relationships between data and the business by iteratively integrating data from many sources into Azure Data Lake. The data will be analyzed by different groups within the organization to discover new factors that might affect the business. Data will then be published to the appropriate consumers using Power BI.

In the initial phase, the data lake ingests data from some of the Operational Systems. Eventually, data will be captured not only from all of the organization’s systems but also from streaming IoT devices.

Azure Data Lake 

Azure Data Lake allows us to store a vast amount of data of various types and structures. Data can be analyzed and transformed by Data Scientists and Data Engineers. 

The challenge with any data lake system is preventing it from becoming a data swamp. To establish an inventory of what is in the data lake, we capture metadata such as origin, size, and content type during ingestion. We also have the Interface Control Documents (ICDs) from the Operational Systems, which describe the data definitions of the source data.
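
As a simple illustration, here is the kind of metadata capture we have in mind when a file lands. The storage account, filesystem, and helper names below are illustrative assumptions, not the actual implementation:

```python
# Sketch: capture basic metadata (origin, size, content type) when a file lands
# in the data lake. The account, filesystem, and metadata record format are
# illustrative placeholders.
import datetime
import json

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",  # hypothetical account
    credential=DefaultAzureCredential(),
)
landing = service.get_file_system_client("landing")

def capture_metadata(path: str, source_system: str) -> dict:
    """Collect descriptive metadata for a newly landed file."""
    props = landing.get_file_client(path).get_file_properties()
    return {
        "path": path,
        "origin": source_system,
        "size_bytes": props.size,
        "content_type": props.content_settings.content_type,
        "ingested_utc": datetime.datetime.utcnow().isoformat(),
    }

# The record would then be written to a metadata store (shown here as a JSON line).
record = capture_metadata("sales/2020/01/orders.csv", "ERP")
print(json.dumps(record))
```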

Logical Zones

The data in the data lake is segregated into logical zones to allow logical and physical separation, keeping the environment secure, organized, and agile. As the data progresses through the zones, various transformations are performed; one possible folder convention is sketched after the zone list below.

  • Landing Zone is a place where the original data files are stored untouched. No data is deleted from this zone, and access to this zone is limited.  
  • Raw Zone is a place where data quality validation is applied based on the rules defined in the source ICD. Any data that fails validation moves to the Error Zone. 
  • Curated Zone is a place where we store the cleansed and transformed data, ready for consumption. The transformation is done for different audiences, and within the zone, folders are created for each specialized transformation.  
  • Error Zone is a place where we store data that failed validation. A notification is sent to the registered data curators when new data arrives.  
  • Metadata Zone is a place where we keep track of metadata for both the source data and the transformed data.
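
To make the zone layout concrete, here is a small sketch of one possible path convention as a dataset moves through the zones; the zone, system, and dataset names are illustrative only:

```python
# Sketch of a possible zone/folder convention in the data lake; all names are
# illustrative and not the actual layout used by the project.
from datetime import date

ZONES = ("landing", "raw", "curated", "error", "metadata")

def zone_path(zone: str, owner: str, dataset: str, as_of: date) -> str:
    """Build a conventional path such as raw/erp/orders/2020/01/15."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{zone}/{owner}/{dataset}/{as_of:%Y/%m/%d}"

# The same dataset as it progresses through the zones:
print(zone_path("landing", "erp", "orders", date(2020, 1, 15)))      # original, untouched
print(zone_path("raw", "erp", "orders", date(2020, 1, 15)))          # validated copy
print(zone_path("curated", "finance", "orders", date(2020, 1, 15)))  # transformed for one consumer group
```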

The source systems have security requirements that prevent access to sensitive data. When the folders are created, permissions are given to security groups in Azure Active Directory. The same security rules are applied to the subsequent folders.
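
As a sketch of how this might look with the Azure Data Lake Storage Gen2 SDK for Python, the snippet below grants an Azure AD security group read and execute access on a zone folder and adds a default ACL entry so that newly created subfolders inherit the same rule. The account URL, filesystem, folder, and group object ID are placeholders:

```python
# Sketch: grant an Azure AD security group read/execute on a zone folder and add
# a default ACL entry so newly created subfolders inherit the same rule.
# Account URL, filesystem, folder, and group object ID are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("curated").get_directory_client("finance")

group_object_id = "00000000-0000-0000-0000-000000000000"  # Azure AD security group
directory.set_access_control(
    acl=f"group:{group_object_id}:r-x,default:group:{group_object_id}:r-x"
)
```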

Now that the data is in the data lake, we allow each consuming group to create their own transformation rules. The transformed data is then moved to the curated zone ready to be loaded to the Azure Data Warehouse.

Azure Data Factory

Azure Data Factory orchestrates the movement and transformation of data, as shown in the diagram below. When a file is dropped in the Landing Zone, an Azure Data Factory pipeline runs a series of activities to Unzip, Validate, Transform, and Load the data into the Data Warehouse.

The unzipping is performed by a custom-code Azure Function activity rather than the Copy activity’s decompress functionality. Out of the box, Azure Data Factory can decompress only GZip, Deflate, and BZip2 files, but not Tar, Rar, 7Zip, or Lzip.
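
A minimal sketch of the kind of extraction logic such a custom function might run is shown below. It covers only the formats handled by the Python standard library (zip and tar variants); Rar and 7Zip would need third-party libraries. Paths and names are illustrative:

```python
# Sketch: the style of extraction logic a custom unzip function might run.
# Only standard-library formats (zip, tar, tar.gz, tar.bz2) are shown here.
import pathlib
import tarfile
import zipfile

def extract_archive(archive_path: str, target_dir: str) -> list[str]:
    """Extract a landed archive and return the extracted file names."""
    target = pathlib.Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target)
            return zf.namelist()
    if tarfile.is_tarfile(archive_path):  # covers .tar, .tar.gz, .tar.bz2
        with tarfile.open(archive_path) as tf:
            tf.extractall(target)
            return tf.getnames()
    raise ValueError(f"unsupported archive format: {archive_path}")

# Example: extract a file copied down from the Landing Zone to local temp storage.
# extract_archive("/tmp/landing/orders.tar.gz", "/tmp/raw/orders")
```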

The basic validation rules, such as data range, valid values, and reference data, are described in the ICD. A custom Azure Function activity was created to validate the incoming data.
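
The snippet below sketches the style of row-level checks that function might apply, driven by rules lifted from the ICD. The field names, rule values, and reference data are illustrative only:

```python
# Sketch: ICD-driven row validation (data range, valid values, reference data).
# Field names and rule values are illustrative, not the actual ICD contents.
from datetime import date

ICD_RULES = {
    "order_date":  {"type": "range", "min": date(2000, 1, 1), "max": date.today()},
    "status":      {"type": "valid_values", "values": {"NEW", "SHIPPED", "CANCELLED"}},
    "customer_id": {"type": "reference", "lookup": {"C001", "C002", "C003"}},
}

def validate_row(row: dict) -> list[str]:
    """Return a list of validation errors for one row (empty list means valid)."""
    errors = []
    for field, rule in ICD_RULES.items():
        value = row.get(field)
        if value is None:
            errors.append(f"{field}: missing value")
        elif rule["type"] == "range" and not (rule["min"] <= value <= rule["max"]):
            errors.append(f"{field}: {value} outside {rule['min']}..{rule['max']}")
        elif rule["type"] == "valid_values" and value not in rule["values"]:
            errors.append(f"{field}: {value} is not an allowed value")
        elif rule["type"] == "reference" and value not in rule["lookup"]:
            errors.append(f"{field}: {value} not found in reference data")
    return errors

# Rows with errors would be written to the Error Zone and the curators notified.
print(validate_row({"order_date": date(2019, 5, 1), "status": "NEW", "customer_id": "C001"}))
```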

Data is transformed using a Spark activity in Azure Data Factory for each consuming group. Each consumer has a folder under the Curated Zone.
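
A PySpark sketch of one such per-consumer transformation is shown below; the paths, the “finance” consumer, and the aggregation logic are illustrative assumptions:

```python
# Sketch: a per-consumer Spark transformation that reads validated data from the
# Raw Zone and writes a consumer-specific copy to the Curated Zone.
# Paths and the transformation itself are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate-orders").getOrCreate()

raw = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/erp/orders")

# Example transformation for the "finance" consumer: aggregate daily revenue.
finance = (
    raw.groupBy("order_date")
       .agg(F.sum("amount").alias("daily_revenue"))
)

# Each consumer gets its own folder under the Curated Zone.
finance.write.mode("overwrite").parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/finance/orders_daily"
)
```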

Data Processing Example

Tables in the Azure Data Warehouse are created based on the Curated Zone by executing a Generate DDL Azure Function activity that produces the data definition language (DDL). The script modifies the destination table if a new field has been added.
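
A simplified sketch of that DDL-generation idea follows; the table name, type mapping, and existing-column lookup are illustrative, not the actual script:

```python
# Sketch: generate warehouse DDL from the curated dataset's schema. If the
# destination table already exists, only newly added columns are emitted as
# ALTER statements. Names and types are illustrative.
CURATED_SCHEMA = {"order_id": "INT", "order_date": "DATE",
                  "daily_revenue": "DECIMAL(18,2)", "region": "VARCHAR(50)"}
EXISTING_COLUMNS = {"order_id", "order_date", "daily_revenue"}  # from warehouse metadata

def generate_ddl(table: str, schema: dict, existing: set) -> str:
    if not existing:  # table does not exist yet
        cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in schema.items())
        return f"CREATE TABLE {table} (\n    {cols}\n);"
    new_cols = {k: v for k, v in schema.items() if k not in existing}
    return "\n".join(f"ALTER TABLE {table} ADD {name} {sql_type};"
                     for name, sql_type in new_cols.items())

print(generate_ddl("dbo.OrdersDaily", CURATED_SCHEMA, EXISTING_COLUMNS))
# -> ALTER TABLE dbo.OrdersDaily ADD region VARCHAR(50);
```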

Finally, the data is copied to the destination tables to be used by end-users and warehouse designers.

In each step, we captured business, operational, and technical metadata to help us describe the data in the lake. This metadata can be uploaded to a metadata management system in the future.

As organizations increase their footprint in the cloud, there’s increased scrutiny on mounting cloud consumption costs, reigniting a discussion about longer-term costs.

This is not an entirely unexpected development. Here’s why:

  1. Cost savings were not meant to be the primary motivation for moving to the cloud – at least not in the manner most organizations are moving to the cloud, which is to move their existing applications to the cloud with little to no changes. For most organizations, the primary motivation is “speed to value,” aka the ability to deliver business value faster by becoming more efficient in provisioning, automation, monitoring, the resilience of IT assets, etc.
  2. Often the cost comparisons between cloud and on-premises are not a true apples-to-apples comparison – For example, were all on-premises support staff salaries, depreciation, data center cost per square foot, rack space, power and networking costs considered? What about troubleshooting and cost of securing these assets?
  3. As these organizations achieve higher cloud operations maturity, they can realize increased cloud cost efficiency – for instance, by implementing effective auto-scaling, optimizing execution contexts by moving to dynamic consumption plans like serverless, and taking advantage of discounts through longer-term contracts.

Claim Your Free Whitepaper

In this whitepaper, we talk about the aforementioned considerations, as well as cost optimization techniques (including resource-based, usage-based and pricing-based cost optimization).

FREE WHITEPAPER ON AZURE COST MANAGEMENT: BACKGROUND, TOOLS, AND APPROACHES

PaaS & Cloud-Native Technologies

If you have worked with Azure for a while, you’re aware of the benefits of PaaS, such as the ability to have the cloud provider manage the underlying storage and compute infrastructure so you don’t have to worry about things like patching, hardware failures, and capacity management. Another important benefit of PaaS is the rich ecosystem of value-add services like database, identity, and monitoring as a service that can help reduce time to market.

So if PaaS is so cool, why are cloud-native technologies like Kubernetes and Prometheus all the rage these days? In fact, it is not just Kubernetes and Prometheus; there is a groundswell of related cloud-native projects. Just visit the cloud-native landscape to see for yourself.

Key Benefits of Cloud-Native Architecture

Here are ten reasons why cloud-native architecture is getting so much attention:

  1. Application as a first-class construct — Rather than speak in terms of VMs, storage, firewall rules, etc., cloud-native is about application-specific constructs, whether it is a Helm chart that defines the blueprint of your application or a service mesh configuration that defines the network in application-specific terms.
  2. Portability — Applications can run on any CNCF-certified cloud, as well as on-premises and on edge devices. The API surface is exactly the same.
  3. Cost efficiency — By densely packing the application components (or containers) on the underlying cluster, running an application becomes significantly more cost-efficient.
  4. Extensibility model — A standards-based extensibility model allows you to tap into innovations offered by the cloud provider of your choice. For instance, using the service catalog and Open Service Broker for Azure, you can package a Kubernetes application with a service like Cosmos DB.
  5. Language agnostic — Cloud-native architecture can support a wide variety of languages and frameworks, including .NET, Java, Node, etc.
  6. Scale your ops teams — Because the underlying infrastructure is decoupled from the applications, there is greater consistency at the lower levels of your infrastructure. This allows your ops team to scale much more efficiently.
  7. Consistent and “decoupled” — In addition to greater consistency at the lower levels of infrastructure, application developers are exposed to a consistent set of constructs for deploying their applications. For example, Pod, Service, Deployment, and Job. These constructs remain the same across cloud, on-premises, and edge environments. Furthermore, these constructs also help decouple the developers from the underlying layers (Cluster, Kernel, and Hardware layers) shown in the diagram below.
  8. Declarative model — Kubernetes, Istio, and other projects are based on a declarative, configuration-based model that supports self-healing. This means that any deviation from the “desired state” is automatically “healed” by the underlying system. Declarative models reduce the need for imperative automation scripts that can be expensive to develop and maintain.
  9. Community momentum — As stated earlier, the community momentum behind CNCF is unprecedented. Kubernetes is the #1 open-source project in terms of contributions. In addition to Kubernetes and Prometheus, there are close to 500 projects that have collectively attracted over $5B of venture funding! In the latest survey (August 2018), the use of cloud-native technologies in production has gone up by 200% since December 2017.
  10. Ticket to DevOps 2.0 — Cloud-native delivers the well-recognized benefits of what is being termed “DevOps 2.0”: hermetically sealed and immutable container images, microservices, and continuous deployment. Please refer to the excellent book by Victor Farcic.

Now that we understand the key benefits of cloud-native technologies, let us compare it to a traditional PaaS offering:

| Attribute | Traditional PaaS | Cloud-Native as a Service |
| --- | --- | --- |
| Portability | Limited | Advanced |
| Application as a first-class construct | Limited (application construct limited to the specific PaaS service) | Advanced constructs, including Helm, network and security policies |
| Managed offering | Mature (fully managed) | Maturing (some aspects of cluster management currently require attention) |
| Stateful applications | Advanced capabilities offered by database-as-a-service offerings | Some cloud-native support for stateful applications (however, cloud-native applications can be integrated with PaaS database offerings through the service catalog) |
| Extensibility | Limited | Advanced (extensibility includes Container Network Interface, Container Runtime Interface) |

Azure & CNCF

Fortunately, Microsoft has been a strong supporter of CNCF, having joined it back in 2017 as a platinum member. Since then, they have made significant investments in a CNCF-compliant offering in the form of Azure Kubernetes Service (AKS). AKS combines the aforementioned benefits of cloud-native computing with a fully managed offering – think of AKS as a PaaS solution that is also CNCF compliant.

Additionally, AKS addresses enterprise requirements such as compliance standards and integration with capabilities like Azure AD, Key Vault, Azure Files, etc. Finally, offerings like Azure Dev Spaces and Azure DevOps greatly enhance the CI/CD experience of working with cloud-native applications. I would be remiss not to mention the VS Code extension for Kubernetes, which also brings useful tooling to the mix.

Cloud-Native Use Cases

Here are a few key use cases for cloud-native applications. Microservices are something you would expect, of course. Customers are also using AKS to run Apache Spark. There is also thinking around managing IoT Edge deployments right from within the Kubernetes environment. Finally, “lift and shift to containers” is getting a lot of attention from customers as the preferred route for moving on-premises applications to the cloud. Please refer to our recent blog post on this very topic, “A ‘Modernize-by-Shifting’ App Modernization Approach,” for more details!

Cloud-Native Scenarios

FREE HALF DAY SESSION: APP MODERNIZATION APPROACHES & BEST PRACTICES
Transform your business into a modern enterprise that engages customers, supports innovation, and has a competitive advantage, all while cutting costs with cloud-based app modernization.

Companies are adopting Docker containers at a remarkable pace and for a good reason – Docker containers are turning out to be key enablers for a micro-services based architecture.

As a quick recap, Docker containers are:

  • Encapsulated, deployable components that can run as isolated instances
  • Small in size with a fast boot-up time
  • Include tools that enable containerized application images to be easily moved across the public cloud and on-premises
  • Capable of applying limits on the physical resources consumed by any given application (see the sketch after this list)
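
As a small illustration of that last point, the snippet below uses the Docker SDK for Python to start a container with CPU and memory caps; the image and the limit values are arbitrary examples:

```python
# Sketch: applying CPU and memory limits to a container with the Docker SDK for
# Python (docker-py). The image name and limit values are arbitrary examples.
import docker

client = docker.from_env()

container = client.containers.run(
    "nginx:alpine",
    detach=True,
    mem_limit="256m",        # cap memory at 256 MiB
    nano_cpus=500_000_000,   # cap CPU at half a core (0.5 * 1e9)
    name="limited-nginx",
)
print(container.status)
```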

Given the popularity of Docker containers, it should come as no surprise that the Azure platform already provides first-class support for a container hosting solution, in the form of Azure Container Service (ACS). ACS makes it simple to create a cluster of Virtual Machines that can run containerized applications. ACS relies on popular open-source tools – with Docker as the container format, and a choice of Marathon, DC/OS, Docker Swarm and Kubernetes for orchestration and scheduling, etc. All this makes it possible to easily run containerized workloads on Azure in a portable manner.

But the Docker containerization story on Azure does not stop here.

It is also being weaved more and more into existing PaaS offerings, including Azure Batch, Azure App Service and Azure Service Fabric. Let’s briefly review the latest developments to see how Docker integrates with Azure PaaS: Read More…

You’re an enterprise. You’ve done your research. You’ve read the whitepapers. You’ve heard all the success stories (along with a few cautionary tales). Perhaps you’ve already taken your first steps into the cloud, but want to embark on a larger-scale public cloud adoption strategy.

But what does that look like for your enterprise? The journey is different for you – for everyone, really. And you certainly don’t want to make it up as you go along.

Here are five important things you need to map out before you start your public cloud journey. We’re confident in this roadmap because we’ve been along for the ride before. We’ve helped many large enterprises and agencies successfully adopt and implement their own unique cloud strategies. Read More…

Modern cloud computing offers enterprises unprecedented opportunities to manage their IT infrastructure and applications with agility, resiliency, and security, while at the same time realizing significant cost savings. The ability to rapidly scale up and down in the cloud opens countless doors of possibility to use compute and storage resources in innovative ways that were not previously feasible.

But getting to the cloud and managing both cloud and on-premises resources can be a daunting challenge. As a recent Gartner article explains, a Cloud Strategy is a must for organizations. That’s where we at AIS can help – we have years of experience and successes working with enterprises to develop a cloud strategy. We have the resources and expertise to then plan and execute, leveraging the latest technologies and best practices.

Read More…

As more and more businesses move their applications to the cloud, it’s clear that operational and log data analysis is a major component of the migration process. That data is crucial to understanding the health and reliability of your cloud services with respect to scalability, resilience, uptime, and your ability to troubleshoot issues.

But how do you deal with all that operational data? How do you identify specific application issues such as exceptions raised from bugs in the code, troubling increases in processor or memory consumption, or slow response times?

It turns out that migrating your applications to the cloud is just the first step: Having a well-thought-out operational data and monitoring user story is just as important. Read More…

A few weeks ago, AIS’ Solutions Architect Jason McNutt and Managing Director Larry Katzman spoke with Federal Tech Talk’s John Gilroy on Federal News Radio for a discussion around federal agencies moving to the cloud and to answer the question of “what happens once you get there?”

Listen to the interview.

John Gilroy states that “This is a critical question to ask in the brave new world of the cloud. No human can conceivably understand all the dependencies and updates that are needed for a complex cloud migration. This ability to manage the system is just as important once it is live. Jason McNutt talks about the capability of automation to manage today’s complex systems.”

Federal Tech Talk looks at the world of high technology in the federal government. Host John Gilroy of The Oakmont Group speaks the language of federal CISOs, CIOs and CTOs, and gets into the specifics for government IT systems integrators. John covers the latest government initiatives and technology news for the federal IT manager and government contractor. Follow John on Twitter @raygilrar and hear more from Federal Tech Talk on federalnewsradio.com.

Listen to the interview.

What’s your DevOps plan? Carl Franklin and Richard Campbell from .NET Rocks! talk to Vishwas Lele about taking a comprehensive, model-driven approach to DevOps. What does it mean to be model-driven? Working with a strategic approach that is agnostic to any given technology or platform – but in the end, the tools do matter!

Vishwas talks about common elements like a single repository for all assets, repeatable deployment processes, instrumentation and feedback mechanisms that enable the entire team to see how the software is being used and improved. He also talks about the Azure templates for getting infrastructure up and running quickly – and the on-going evolution to let this model work anywhere, not just in the cloud!

.NET Rocks! is a weekly talk show for anyone interested in programming on the Microsoft .NET platform. The shows range from introductory information to hardcore geekiness.

Click here to listen!