About the Podcast

I had the pleasure to once again be a guest on the .NET Rocks! podcast last month. This year marked my 11th time on the show!

Carl, Richard, and I talked about how the cloud has changed data analytics. We discussed the latest data work at AIS and how we bring a developer’s view to the approach. The cloud has changed how disparate data sources are brought together for analytics: with compute on demand, you don’t need to transform data heavily as it’s loaded, and you can test it along the way! This conversation dives into how CI/CD techniques can be applied to data to make for accurate data analytics and intelligent ingestion pipelines. Check out the full episode to learn more.

Listen to the full podcast here

Related Content

Accelerate your journey to become a data-driven organization. Many companies struggle to extract value from their data at the pace that today’s world moves and data grows. Our Data Platform Modernization Proof of Value engagement provides you a path to enable safe and secure access to data sets, empowering business users to unlock the next big opportunities for the company. Learn more and reach out to AIS today.

The first AWS service (S3) was launched in September 2006. My humble journey in the cloud started in 2007, leading up to my first cloud talk in New York in 2008. In this blog post, I want to talk about what I, and we collectively, have learned from the last decade in the cloud and how to use this knowledge to drive cloud value. Let us get started.

By now, the transformational benefits of the cloud are well established. Look no further than spending on the top three cloud providers, which was nearly $150 billion in 2021. However, current cloud spending is only a fraction of the global IT spend of $2-3 trillion. This suggests that the main chapters of the cloud story are yet to be written. As is the case with the progression of any successful technology, the cloud is expected to become so pervasive that it will become invisible – just as traditional IaaS is becoming largely invisible with the advent of Kubernetes and serverless models.

Future discussions about the cloud will be less about cloud capabilities and more about the outcomes empowered by cloud services. At some point, we can expect to stop talking about enterprise landing zones, VNETs/VPCs, and PaaS v-next, i.e., containers and serverless. I also hope that we will stop measuring our progress in the cloud using “proxy” metrics like the number of VMs migrated, the number of cloud deployments, the number of workloads moved to the cloud, cloud policy violations, and more. Later in this blog, we will talk about the cloud transformation metrics that matter most at the end of the day.

But first, let us get back to the theme of this blog post – driving cloud value. This depends on a few things:

  1. Mature cloud foundation
  2. Cloud value stream

Cloud value diagram: Cloud Foundation and Cloud Value Stream Venn diagram

Mature Cloud Foundation

With cloud providers releasing close to three thousand features each year, cloud entropy is absolute. Cloud consumers, including big enterprises, don’t have the resources to take advantage of all the new features announced daily. As a result, cloud consumers need to focus on improving their foundational capabilities. Let us look at a few (non-exhaustive) foundational capabilities.

  1. Consistency – Pick a few (3-5) architectural blueprints that will support 80% of the application types. Ensure that users can provision these architectural blueprints consistently, quickly, and in a self-service manner using a service catalog, like Zeuscale. The service catalog, in turn, is powered by a collection of robust automation scripts that address cross-cutting objectives. Here are a few attributes to consider:
    • Resilience – spreading the subsystems across availability zones
    • Observability – automatically enabling monitoring machinery
    • Security – a zero-trust approach that assumes breach
    It is crucial that you treat the above-mentioned automation assets as first-class code artifacts and invest in their continual improvement.

  2. Cost Optimization – Too many organizations are witnessing unplanned increases in their cloud bills. Worse, it has been suggested that 30% of all cloud consumption can be attributed to waste. Cloud cost maturity comes with cloud maturity. Ensure that you have enough governance in place, including resource tags, provisioning policies, reservations for long-term resource use, monitoring & alerts, and detailed usage reporting at a level of granularity that matches your organizational hierarchy. Doing so will allow you to catch unplanned cloud expenditure and improve your ability to predict costs over time.

  3. Security – Resources provisioned in the cloud are dynamic, and security monitoring must be dynamic as well. Security monitoring does *not* end with provisioning a compliant architectural blueprint. In fact, it starts with the provisioning step. You will need automation jobs that continuously monitor your applications’ security configurations and security policies to prevent “drift” from a compliant state.

  4. Data Gravity – In 2013, Dave McCrory proposed the software concept of “data gravity,” which holds that applications, services, and business logic move physically closer to where data is stored. As you can imagine, data gravity is a critical consideration for companies with a multi-cloud setup; it is hard to spread applications across clouds because of data gravity. Almost 80% of Fortune 500 companies find themselves in a multi-cloud setup, provisioning similar technology stacks across more than one cloud provider. One way to dent the data gravity challenge is to have a data-sharing strategy in place. Data sharing can be based on copy or in-place access to datasets and can span single or multiple clouds.

  5. Center of Excellence – We talked about settling on a small set of architecture blueprints. You will need to invest in a forward-looking CoE group that continues to track the advances in the cloud and ensures that your organization is not caught flat-footed in the face of a disruptive new cloud capability or an architecturally significant improvement to an existing service. Without a CoE team focused on tracking and evaluating new capabilities in the cloud, you are likely to accrue cloud debt rapidly.

  6. Inclusiveness – Cloud is not just for professional developers and infrastructure engineers. An inclusive cloud strategy must support the needs of a growing community of citizen developers as well. Constructs like self-service provisioning and architectural blueprints that we discussed earlier need to be accessible to citizen developers. For example, it should be seamless for citizen developers to mix and match low / no-code constructs with advanced cloud platform constructs.

  7. Data Analytics – As you plan to migrate or reimagine your applications in the cloud, recognize the immense potential of the data being collected. By planning for data ingestion and data transformation upfront, you can help bridge the divide between operational and analytical data. Architectures like the data mesh, which treat data (operational and analytical) as a product, are headed in this direction.

  8. Cloud Operating Model – Your traditional infrastructure, networking, and security teams must embrace a cloud operating model. They must rely on modern development practices: iterative development, DevSecOps, and maintaining infrastructure, network, and security as code. You cannot succeed in the cloud with a traditional IT operating mindset.

  9. Continuous Learning – Your organization may have become fluent in the cloud basics, but you will need continuous learning and upskilling programs to reach the next level. Only an organization that embeds a culture of learning can truly achieve its cloud transformation goals.

  10. Sandbox – Along with upskilling programs, cloud teams need the freedom to experiment and fail. This is where a cloud sandbox unencumbered by enterprise cloud security policies is essential for innovation. It should be possible for teams to experiment with any new, fast arriving preview capabilities within hours (not weeks or months).

Focus on Cloud Value Stream

Working on the cloud foundation alone will not be enough to leverage all the benefits cloud has to offer. You will need to consider the entire cloud value stream – a set of actions from start to finish that bring the value of the cloud to an organization. Cloud value streams allow businesses to specify the value proposition that they want to realize from the cloud.

Align cloud strategy with business objectives

The key idea is to start with a business strategy that can help realize the value proposition, then map that strategy into a list of cloud services. The list of cloud services, in turn, determines your cloud adoption plan. Let us break this down.

One of my favorite tools to develop a business strategy is a Wardley map. Wikipedia describes a Wardley map as a set of components positioned within a value chain (vertical axis) and anchored by the user need, with movement defined by an evolution axis (horizontal axis). Don’t worry if you are feeling a bit lost with the definition. A simple example can help. Let us assume that the business leaders of a fictitious financial services company want to set up an insurance exchange in the future.

Starting from the perspective of the user of the insurance exchange, you can create a Wardley map, as shown in the diagram below.

Cloud Transformation Wardley Map

Mapped along the value chain (vertical axis) are the value-line capabilities of the insurance exchange. These capabilities are pegged on an evolution axis (horizontal axis) that represents the evolution of the components from genesis (high value) to utility (commodity).

A map like this allows you to organize your cloud investments. For example, the Matching Algorithm that pairs incoming purchase requests with the insurance providers may need to be a custom-built capability. A custom-built capability requires additional investment, but it also offers a differentiator and potentially higher profit. In the future, the previously mentioned matching capability may become available as a pre-built ML product or rental capability through evolution. So, there is indeed a risk of commoditization. But the question is – how soon can that happen? Wardley maps excel at engendering discussion across various teams and capturing it in a single map.

The End-to-End Flow of Business Value

Earlier in this post, we talked about “proxy” metrics such as the number of VMs or workloads migrated to the cloud. While these metrics are helpful as IT and agile metrics, they fail to communicate the overall progress of a cloud transformation effort from the perspective of business outcomes. This is where Flow Framework®, introduced by Dr. Mik Kersten, comes in.

The core premise of the Flow Framework is the need to measure the end-to-end flow of business value and the results it produces. Flow Metrics measure the flow of business value through all the activities involved in a software value stream. For example, consider the following chart depicting the Flow Efficiency metric. Flow Efficiency is the ratio of active time to total Flow Time.
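
To make the metric concrete, here is a purely illustrative calculation (the numbers below are not taken from the chart): if migrating a capability area takes 20 weeks of total Flow Time but only 5 of those weeks are active work, with the rest spent waiting on approvals and hand-offs, then Flow Efficiency is 5 / 20 = 25%.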

A few things to note in the diagram below:

  • We are measuring end-to-end for the migration time of the application.
  • Additionally, we are considering the entire capability area and not an individual app.
  • The process of containerizing the app seems to be quick, but we are spending significant time on security scanning and Authority to Operate (ATO) certification – not surprising for a highly regulated environment with very stringent security expectations.

Perhaps we need to make an upfront down payment on “technical debt” associated with security scanning and certification. Doing so would improve the flow efficiency of cloud migration.

Flow Chart for Driving Cloud Value Streams

In summary, to drive cloud value, you need a robust cloud foundation, as well as a keen eye towards the overall cloud value stream. Focusing on a few well-defined architectural blueprints will accord you the opportunity to mature in the areas of cloud costs, automation, and readiness. Focusing on the overall cloud value stream will ensure that your cloud investments are aligned with your business strategic goals.

Contact AIS to talk cloud strategy and business objectives.

What is Cognitive Surplus?

In his famous TED talk about Cognitive Surplus, Clay Shirky shares an incident in Kenya from December of 2007. There was a disputed presidential election that resulted in an outbreak of ethnic violence. A lawyer in Nairobi, Ory Okolloh, began blogging about it on her site, Kenyan Pundit. Shortly after the election and the outbreak of violence, the government suddenly imposed a significant media blackout. So, Okolloh solicited more information about what was going on from her commenters. The comments began pouring in, and Okolloh would collate them, but there was more information than any one person could manage. So, she asked for help to automate this task.

Two programmers who read her blog held their hands up and said, “We could do that,” and in 72 hours, they launched Ushahidi. Ushahidi (which means “witness” or “testimony” in Swahili) is a straightforward way of taking reports from the field, whether from the web or, critically, via mobile phones and SMS, aggregating them, and putting them on a map.

Enough people looked at it and found it valuable enough that the programmers who created Ushahidi decided to make it open source and turn it into a platform. It has since been deployed in Mexico to track electoral fraud and in Washington, D.C. to track snow cleanup.

Shirky credits the idea of Cognitive Surplus behind the creation of Ushahidi. According to Shirky, Cognitive Surplus has two parts:

  1. Time: The world’s population’s ability to volunteer and contribute and collaborate on large, sometimes global, projects. According to Shirky, the world has over a trillion hours a year to commit to shared projects.
  2. Tools: Primarily for collaboration. Internet, mobile phones, social media, and more.

Shirky acknowledges that not all cognitive surplus products are as helpful as Ushahidi. We also get LOL Cats (cute pictures of cats made more adorable by adding cute captions). While LOL Cats may not seem beneficial (unless you are looking for cute cat pictures), it is essential to note that it is still a creative and communal act. Freedom to contribute means freedom to contribute anything. The important takeaway is that the contributor of a LOL Cat picture has crossed a vital participation threshold.

LOL Cat example image: “in ur folder, failed password”

Cognitive Surplus in an Enterprise

Can Shirky’s notion of Cognitive Surplus be applied to an enterprise? A typical enterprise has employees who are in between projects, who are not being utilized to the fullest on their current assignments for whatever reason, or who are motivated to create value not just for themselves (e.g., self-learning) but for everyone in the enterprise. Cognitive Surplus channels this precious employee time into building something of value to the employees and the enterprise.

Beyond time, enterprises also need a framework that engenders participation. Such a framework would include:

  1. A starting point for the framework is collaboration tools such as GitHub, Teams, and Slack.
  2. A collection of tasks that employees can contribute towards. These tasks don’t have to be as well-defined as a product backlog, but an absence of defined tasks to choose from can significantly hamper employees from getting started. Ideas for tasks can come from ongoing projects (a nagging problem or an optimization that the product team has no time for) or from research on upcoming features and areas of interest.
  3. Each task needs to be broken up into “byte-sized” chunks. A rule of thumb is 40 hours. This is typically an employee’s time between projects before they are pulled into their next assignment. It is also important to encapsulate the task to hide or decouple it from underlying dependencies. Doing so allows employees to contribute without spending hours or days setting up the environment before they start contributing.
  4. Ability to get feedback early and often is crucial to making the employees’ contribution productive. Therefore, it is essential to scale the feedback loop. The way to scale the feedback loop is to crowdsource it, i.e., get a representative from the team that suggested the task or a member of COI to pair with the employee working on the task. Even a 15-minute daily sync-up can go a long way.
  5. Celebrate the successful completion of a task. Recognize people’s contributions via a blog, internal communication, social media shares, and more. No matter how small the contribution, it should be recognized.

Cognitive Surplus at AIS

At AIS, we have worked to put the Cognitive Surplus to good use. Here are a few recent examples.

    1. Cameo Contributors for Value Creation projects. New hire and not yet on a project? In between projects? Our team is working to support more internal efforts by matchmaking individuals who have availability with the needs of our internal Cloud Acceleration Hub (HUB) team. The HUB is a dedicated team of AIS consultants organized to help AIS project teams deliver successful cloud services modernization projects. The HUB team consolidates knowledge and experience to provide rapid research and guidance for cloud migration and modernization projects. Cameo contributions can range from project-specific problem solving or solution reviews to contributing to IP development or internal skilling efforts. The HUB team has manageable chunks of work for which it engages individuals, and it continues to mature this capability with ramp-up guides, task partners, and more.
    2. Creation of the open-source ddl2dbt tool. The team wanted to automate the creation of DBT YML files based on ErWin models but had no cycles to build this tool. Cognitive Surplus made it possible to develop it – AppliedIS/ddl2dbt: CLI to generate DBT and dbtvault models from DDL and Source To Target Mappings (github.com).

How Do You Exercise Cognitive Surplus? Tangible and Intangible Benefits

Dean Kamen, the inventor and entrepreneur, said, “Free cultures get what they celebrate.” In a similar vein, “Enterprises get what they celebrate.” Enterprises need to create a culture that celebrates participation, no matter how small or how directly impactful the contribution is. The value created for the enterprise is not just a by-product of participation; it is what they collectively make of participation.

Enterprises have hundreds of hours of participatory value up for grabs, year-in and year-out. Enterprises designed around a culture of participation and a framework for common value creation can achieve incredible results.

We have been able to integrate more individuals across the organization while providing value to the broader company. Those who have participated in these exercises have provided excellent feedback on the experience, noting how it gave them a positive experience with AIS and allowed them to contribute value to the company in between billable delivery work. There are intangible benefits to this as well, including valuable impacts on culture, employee passion, and motivation. How is your organization using cognitive surplus?

First-time Feedback from Peers and Mentors

An AIS employee was invited to share his experience when participating in project work.


“I had the opportunity to work with an internal AIS HUB project earlier this year and came away with a new perspective when it came to my critical thinking. I was asked to write specific articles dealing with PowerShell, Pester, and PSScriptAnalyzer as well as working example code to complement the articles. This was the first time in my career I had a group of Engineers and Developers providing feedback and guidance as I was producing a WiKi and code. The guidance and feedback were outstanding!

By the end of my time with the HUB team not only was my WiKi writing substantially better, but the feedback I received from the HUB team made my thought process much clearer and more refined. As a DevOps Engineer, being able to work with a client in a clear and concise manner is critical to successfully providing implementation guidelines and also results. The HUB Team took me under their wing and taught me how to be a better DevOps Engineer. My current project requires a lot of critical thinking, WiKi’s, and code blocks. I work with developers who need example code and instructions on how to get started. If I had not had the time I had with the HUB team, I would not be able to provide better documentation and code for their WiKi’s.”

– David Dracoules, AIS Cloud Infrastructure Consultant


JOIN OUR GROWING TEAM
AIS provides employees with opportunities to learn and grow in their careers. Won't you join us?

Adjusting to remote work has allowed me to uncover some extra time to record a Power Platform Administration Foundation Pluralsight course to complement the “dev side” course released in 2018, Low-Code Development with Power Apps.

This course is designed for developers–both citizen and professional–interested in a low-code approach for building mobile applications. In this course, I used aspects of Azure PowerShell/CLI in demonstrating Power Platform management as I see a commonality between PowerApps and Azure Administration.

You can find the Pluralsight course here.

My daughter drew the following sketch to capture the principal motivation for why I created this course: helping Power Platform admins attain equilibrium. Today, the balance seems to be tilting towards the citizen developers, making IT departments uncomfortable with moving forward with Microsoft Power Platform adoption.

Illustration of Power Platform Course

The  Power Platform Adoption Framework created by Andrew Welch and the team speaks to the “adoption at scale” theme and an approach to enterprise management and governance that resonates with customers and puts IT and security departments at ease.

This course is about teaching admins the basic skills needed to effectively administer the Power Platform. My hope is that by making the Power Platform admin tools somewhat match those of Azure, acceptance of the platform may come a bit easier.

The current approach of mass quarantining is to “flatten the curve.” However, our understanding of how the virus has spread, and how it might return as the economy is eventually restored, is still blurry. Recent work by Abbott Labs (among others) shows that shortening testing times and mass-producing testing kits at affordable prices look promising.

However, despite the advancements by Abbott Labs, it is not feasible to test everyone in America. As of today, April 5th, we have tested, on average, one in every two hundred Americans, a test rate comparable to South Korea’s. Yet the ramp-up in testing has not moved us closer to reopening our economy.

Some have proposed the idea of checking for antibodies. This test would suggest immunity to the virus because of a prior infection. The logic behind this is that people that have these antibodies can safely return to work and take on the critical tasks needed to restart the economy. Nonetheless, recent news from across Asia warns us that patients previously diagnosed with COVID-19 are being readmitted to hospitals after testing positive for the virus again.

So as it stands, our current approach of mass quarantining, which media outlets have predicted could last up to twelve months, is not only slow but is also pushing us down a path of economic damage. If that continues, this may be difficult to recover from. Scaling up and developing new methods of testing that check for antibodies, while advantageous, will not by itself be enough to reopen our economy.

An Aggressive Data-Driven Approach is Needed

We need an aggressive data-driven approach to understanding how COVID-19 is spreading. Based on these insights, we can demarcate safe zones where essential economic activity can be reinstituted with minimal risk. There are two aspects to this approach:

  1. We can develop more in-depth insights into how the virus is spreading. We must acknowledge that mass quarantining alone is not the best approach.
  2. Based on the insights we develop, we may be able to open parts of our country once again, with a measure of confidence based on the data.

Ultimately, the solution boils down to a data and computation problem. Imagine if we took the phone numbers of everyone infected with COVID-19 (using an anonymized identifier rather than the actual phone number to protect the privacy of the people involved). Then, using cell tower data to track the movements of those individuals via their smartphones, we would perform location- and time-based searches to determine who might have come in contact with infected persons in the last forty-five days. Next, we would algorithmically place the search results into bands based on the degree of overlap (time and location). This algorithm would eliminate instances of location proximity where there is minimal chance of spread (for example, at a traffic intersection) and accord a higher weight to locations where there is a greater chance of the virus spreading, such as a coffee shop or workplace. All these factors would lead to the calculation of a risk factor. Individuals who meet the high-risk criteria would be notified and instructed to self-quarantine immediately. We could go further and use the cell phone data to penalize them if they don’t follow the instruction. These individuals should be sent a self-test kit on a priority basis.

If these individuals test positive, their immediate family would receive instant notification to also self-quarantine. The location where the individual came into contact with the COVID-19 infected patient who initiated the search would be notified as well. If they test negative, we still learn a vital data point about how the virus is spreading. These learnings, including the test results, would be fed into a continuously retraining machine learning algorithm. This algorithm would keep track of the trajectory of each infected person and the common intersection locations. Additionally, it would account for an infected person being quarantined, thus removing a virus carrier from the mix. In summary, this algorithm is akin to performing deep, automated contact tracing at a level that cannot be matched by armies of volunteers.

Another important byproduct of the trained algorithm is the automatic extraction of “features”. In machine learning, a feature is an individual measurable property or characteristic of a phenomenon being observed [1]. For example, the algorithm may observe that many people are becoming infected without coming in direct contact with an already infected person. Based on observing millions of such data points, it can, on its own, identify discriminating features such as an infected mailman’s route or common meeting areas that include certain surfaces, like metals, where coronavirus can remain active for days.

Using a continuously retraining algorithm, we can start to open parts of the country where the threat of spread is low. Any discovery of a COVID-19 case in a low-risk area will trigger the actions mentioned above and will flow back as input to training. It should be evident that the dataset and algorithm described above are computationally challenging. We are talking about recursive data searches through a dataset comprising millions of citizens and a continuously learning algorithm with potentially billions of parameters.

But Hasn’t This Approach Already Been Used in Other Countries like Taiwan and Singapore?

There is no question that location tracking capabilities have been highly effective in controlling the spread of coronavirus. In Taiwan and Singapore, location tracking technologies were used very early in the outbreak and mainly used for surveillance. In Korea, officials routinely send text messages to people’s phones alerting them on newly confirmed infections in their neighborhood — in some cases, alongside details of where the unnamed person had traveled before entering quarantine. Based on my research, these countries did not rely on big data and deep learning techniques to derive insights from the data. In the case of Taiwan and Singapore, the dataset of infected persons is not large enough for such an analysis.

Summary

The U.S. Government has broad authority to request personal data in the case of a national emergency like the coronavirus. In the United States, phone companies such as AT&T and Verizon have extensive records of their customers’ movements. However, it does not appear that we are leveraging this large body of movement data to combat the coronavirus. According to a recent Washington Post story, “AT&T said it has not had talks with any government agencies about sharing this data for purposes of combating coronavirus. Verizon did not respond to requests for comment.”

The goal of this post is to engender a collaborative discussion with experts in big data, ML, and medicine. Hopefully, there are efforts already underway based on a similar or better idea. Please send your comments via Twitter to @vlele.

Note: This blog post is *not* about Kubernetes infrastructure API (an API to provision a Kubernetes cluster). Instead, this post focuses on the idea of Kubernetes as a common infrastructure layer across private and public clouds.

Kubernetes is, of course, well known as the leading open-source system for automating the deployment and management of containerized applications. However, its uniform availability is, for the first time, giving customers a “common” infrastructure API across public and private cloud providers. Customers can take their containerized applications and Kubernetes configuration files and, for the most part, move them to another cloud platform. All of this without sacrificing the use of cloud provider-specific capabilities, such as storage and networking, that differ across cloud platforms.

At this point, you are probably thinking about tools like Terraform and Pulumi that have focused on abstracting the underlying cloud APIs. These tools have indeed enabled a provisioning language that spans cloud providers. However, as we will see below, the Kubernetes “common” construct goes a step further – rather than being limited to a statically defined set of APIs, Kubernetes extensibility allows the API to be extended dynamically through the use of plugins, described below.

Kubernetes Extensibility via Plugins

Kubernetes plugins are software components that extend and deeply integrate Kubernetes with new kinds of infrastructure resources. Plugins realize interfaces like CSI (Container Storage Interface). CSI defines an interface along with the minimum operational and packaging recommendations for a storage provider (SP) to implement a compatible plugin.

Other examples of interfaces include:

  • Container Network Interface (CNI) – Specifications and libraries for writing plug-ins to configure network connectivity for containers.
  • Container Runtime Interface (CRI) – Specifications and libraries for container runtimes to integrate with kubelet, an agent that runs on each Kubernetes node and is responsible for spawning the containers and maintaining their health.

Interfaces and compliant plug-ins have opened the flood gates to third-party plugins for Kubernetes, giving customers a whole range of options. Let us review a few examples of “common” infrastructure constructs.

Here is a high-level view of how a plugin works in the context of Kubernetes. Instead of modifying the Kubernetes code for each type of hardware or cloud provider service, it’s left to the plugins to encapsulate the know-how to interact with underlying hardware resources. A plugin can be deployed to a Kubernetes node as shown in the diagram below. It is the kubelet’s responsibility to advertise the capability offered by the plugin(s) to the Kubernetes API service.

Kubernetes control plane and plugin diagram
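
For instance, a device plugin is commonly packaged as a DaemonSet so that a copy runs on each node and registers itself with the kubelet through the device-plugin socket directory. The sketch below is illustrative only; the names and image are placeholders rather than any vendor’s official manifest.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-device-plugin        # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: example-device-plugin
  template:
    metadata:
      labels:
        name: example-device-plugin
    spec:
      containers:
        - name: device-plugin
          image: example/device-plugin:latest    # placeholder image
          volumeMounts:
            - name: device-plugin-dir
              mountPath: /var/lib/kubelet/device-plugins    # socket directory the kubelet watches for plugin registration
      volumes:
        - name: device-plugin-dir
          hostPath:
            path: /var/lib/kubelet/device-plugins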

“Common” Networking Construct

Consider a networking resource of type load balancer. As you would expect, provisioning a load balancer in Azure versus AWS is different.

Here is a CLI for provisioning an internal load balancer (ILB) in Azure:

CLI for provisioning ILB in Azure

Likewise, here is a CLI for provisioning ILB in AWS:

CLI for provisioning ILB in AWS

Kubernetes, based on the network plugin model, gives us a “common” construct for provisioning the ILB that is independent of the cloud provider syntax.

apiVersion: v1
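
Since the snippet above is truncated, here is a minimal sketch of what such a manifest might look like; the service name, label, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: my-app-lb          # hypothetical name
spec:
  type: LoadBalancer       # the cloud provider's network plugin provisions the actual load balancer
  selector:
    app: my-app            # hypothetical label
  ports:
    - port: 80
      targetPort: 8080

On a managed cluster, making the load balancer internal typically requires a provider-specific annotation (for example, service.beta.kubernetes.io/azure-load-balancer-internal: "true" on AKS), but the core construct stays the same across clouds.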

“Common” Storage Construct

Now let us consider a storage resource type. As you would expect, provisioning a storage volume in Azure versus Google is different.

Here is a CLI for provisioning a disk in Azure:

CLI for provisioning a disk in Azure

Here is a CLI for provisioning a persistent disk in Google:

CLI for provisioning a persistent disk in Google

Once again, under the plugin (device) model, Kubernetes gives us a “common” construct for provisioning storage that is independent of the cloud provider syntax.

The example below shows a “common” storage construct across cloud providers. In this example, a claim for a persistent volume of size 1Gi and access mode “ReadWriteOnce” is being made. Additionally, the storage class “cloud-storage” is associated with the request. As we will see next, persistent volume claims decouple us from the underlying storage mechanism.

cloud-storage-claim
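
Reconstructed from the description above (the original post showed this as an image), such a claim would look roughly like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloud-storage-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: cloud-storage    # ties the claim to the "cloud-storage" StorageClass described next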

The StorageClass determines which storage plugin gets invoked to support the persistent volume claim. In the first example below, StorageClass represents the Azure Disk plugin. In the second example below, StorageClass represents the Google Compute Engine (GCE) Persistent Disk.

StorageClass

StorageClass Diagram 2
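
As a sketch, the two StorageClass definitions might look like the following; the provisioners are the in-tree Azure Disk and GCE Persistent Disk plugins, while the parameters shown are merely illustrative defaults.

# Azure Disk variant
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cloud-storage
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed
---
# GCE Persistent Disk variant
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cloud-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard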

“Common” Compute Construct

Finally, let us consider a compute resource type. As you would expect, provisioning a compute resource in Azure versus GCE is different.

Here is a CLI for provisioning a GPU VM in Azure:

CLI for provisioning a GPU VM in Azure

Here is a CLI for provisioning a GPU in Google Cloud:

CLI for provisioning a GPU in Google Cloud:

Once again, under the plugin (device) model, Kubernetes gives us a “common” compute construct across cloud providers. In the example below, we are requesting a compute resource of type GPU. An underlying plugin (NVIDIA) installed on the Kubernetes node is responsible for provisioning the requisite compute resource.

requesting a compute resource of type GPU

Source: https://docs.microsoft.com/en-us/azure/aks/gpu-cluster
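
A minimal sketch of such a request follows; it assumes the NVIDIA device plugin is installed on the node, and the pod name and container image are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload                     # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-app
      image: nvidia/cuda:11.0-base       # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1              # resource advertised to the kubelet by the NVIDIA device plugin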

Summary

As you can see from the examples discussed in this post, Kubernetes is becoming a “common” infrastructure API across private, public, hybrid, and multi-cloud setups. Even traditional “infrastructure as code” tools like Terraform are building on top of Kubernetes.

Azure Arc is one of the significant announcements coming out of #msignite this week. As depicted in the picture below, Azure Arc is a single control plane across multiple clouds, on-premises environments, and the edge.

Azure Arc

Source: https://azure.microsoft.com/en-us/services/azure-arc/

But we’ve seen single control planes before, no?

That is correct. The following snapshot (from 2013) shows App Controller securely connected to both on-premise and Microsoft Azure resources.

Azure App Controller in 2013

Source: https://blogs.technet.microsoft.com/yungchou/2013/02/18/system-center-2012-r2-explained-app-controller-as-a-single-pane-of-glass-for-cloud-management-a-primer/

So, what is different with Azure Arc?

Azure Arc is not just a “single-pane” of control for cloud and on-premises. Azure Arc takes Azure’s all-important control plane – namely, the Azure Resource Manager (ARM) – and extends it *outside* of Azure. In order to understand the implication of the last statement, it will help to go over a few ARM terms.

Let us start with the diagram below. ARM (shown in green) is the service used to provision resources in Azure (via the portal, Azure CLI, Terraform, etc.). A resource can be anything you provision inside an Azure subscription. For example, SQL Database, Web App, Storage Account, Redis Cache, and Virtual Machine. Resources always belong to a Resource Group. Each type of resource (VM, Web App) is provisioned and managed by a Resource Provider (RP). There are close to two hundred RPs within the Azure platform today (and growing with the release of each new service).

ARM

Source: http://rickrainey.com/2016/01/19/an-introduction-to-the-azure-resource-manager-arm/

Now that we understand the key terms associated with ARM, let us return to Azure Arc. Azure Arc takes the notion of the RP and extends it to resources *outside* of Azure. Azure Arc introduces a new RP called “Hybrid Compute”. See the details for the RP HybridCompute in the screenshot below. As you can imagine, the HybridCompute RP is responsible for managing the resources *outside* of Azure. The HybridCompute RP manages the external resources by connecting to the Azure Arc agent, deployed to the external VM. The current preview is limited to Windows and Linux VMs. In the future, the Azure Arc team plans to support containers as well.

RP Hybrid Compute Screenshot

Note: You will first need to register the provider using the command az provider register --namespace Microsoft.HybridCompute

Once we deploy the Azure Arc agent [1] to a VM running in Google Cloud, it shows up inside the Azure Portal within the resource group “az_arc_rg” (see screenshot below). The Azure Arc agent requires connectivity to Azure Arc service endpoints for this setup to work. All connections are outbound from the agent to Azure and are secured with SSL. All traffic can be routed via an HTTPS proxy.

deploy the Azure Arc agent [1] to a VM running in Google cloud

Since the Google Cloud-hosted VM (gcp-vm-001) is an ARM resource, it is an object inside Azure AD. Furthermore, there can be a managed identity associated with the Google VM.

Benefits of Extending ARM to Resources Outside Azure:

  • Ability to manage external VMs as ARM resources via the Azure Portal / CLI, as well as the ability to add tags, as shown below.
  • Ability to centrally manage access and security policies for external resources with Role-Based Access Control.
    Microsoft Hybrid Compute Permissions
  • Ability to enforce compliance and simplify audit reporting.

[1] Azure Arc Agent is installed by running the following script on the remote VM. This script is generated from the Azure portal:

# Download the package:

Invoke-WebRequest -Uri https://aka.ms/AzureConnectedMachineAgent -OutFile AzureConnectedMachineAgent.msi

# Install the package:

msiexec /i AzureConnectedMachineAgent.msi /l*v installationlog.txt /qn | Out-String

# Run connect command:

"$env:ProgramFiles\AzureConnectedMachineAgent\azcmagent.exe" connect --resource-group "az_arc_rg" --tenant-id "" --location "westus2" --subscription-id ""
Late last Friday, the news of the Joint Enterprise Defense Infrastructure (JEDI) contract award to Microsoft Azure sent seismic waves through the software industry, government, and commercial IT circles alike.

Even as the dust settles on this contract award, including the inevitable requests for reconsideration and protest, DoD’s objectives from the solicitation are apparent.

DOD’s JEDI Objectives

Public Cloud is the Future DoD IT Backbone

A quick look at the JEDI statement of objectives illustrates the government’s comprehensive enterprise expectations with this procurement:

  • Fix fragmented, largely on-premises computing and storage solutions – This fragmentation is making it impossible to make data-driven decisions at “mission-speed”, negatively impacting outcomes. Not to mention that the rise in the level of cyber-attacks requires a comprehensive, repeatable, verifiable, and measurable security posture.
  • Commercial parity with cloud offerings for all classification levels – A cordoned off dedicated government cloud that lags in features is no longer acceptable. Furthermore, it is acceptable for the unclassified data center locations to not be dedicated to a cloud exclusive to the government.
  • Globally accessible and highly available, resilient infrastructure – The need for infrastructure that is reliable, durable, and can continue to operate despite catastrophic failure of pieces of infrastructure is crucial. The infrastructure must be capable of supporting geographically dispersed users at all classification levels, including in closed-loop networks.
  • Centralized management and distributed control – Apply security policies; monitor security compliance and service usage across the network; and accredit standardized service configurations.
  • Fortified Security that enables enhanced cyber defenses from the root level – These cyber defenses are enabled through the application layer and down to the data layer with improved capabilities including continuous monitoring, auditing, and automated threat identification.
  • Edge computing and storage capabilities – These capabilities must be able to function totally disconnected, including provisioning IaaS and PaaS services and running containerized applications, data analytics, and processing data locally. These capabilities must also provide for automated bidirectional synchronization of data storage with the cloud environment when a connection is re-established.
  • Advanced data analytics – An environment that securely enables timely, data-driven decision making and supports advanced data analytics capabilities such as machine learning and artificial intelligence.

Key Considerations: Agility and Faster Time to Market

From its inception, with the September 2017 memo announcing the formation of the Cloud Executive Steering Group, culminating in the release of the RFP in July 2018, DoD has been clear – they wanted a single cloud contract. They deemed a multi-cloud approach to be too slow and costly. The Pentagon’s Chief Management Officer defended a single cloud approach by suggesting that a multi-cloud contract “could prevent DoD from rapidly delivering new capabilities and improved effectiveness to the warfighter that enterprise-level cloud computing can enable”, resulting in “additional costs and technical complexity on the Department in adopting enterprise-scale cloud technologies under a multiple-award contract. Requiring multiple vendors to provide cloud capabilities to the global tactical edge would require investment from each vendor to scale up their capabilities, adding expense without commensurate increase in capabilities.”

A Single, Unified Cloud Platform Was Required

The JEDI solicitation expected a unified cloud platform that supports a broad set of workloads, with detailed requirements for scale and long-term price projections.

  1. Unclassified webserver with a peak load of 400,000 requests per minute
  2. High volume ERP system – ~30,000 active users
  3. IoT + Tactical Edge – A set of sensors that captures 12 GB of High Definition Audio and Video data per hour
  4. Large data set analysis – 200 GB of storage per day, 4.5 TB of online result data, 4.5 TB of nearline result data, and 72 TB of offline result data
  5. Small form-factor data center – 100 PB of storage with 2000 cores that is deliverable within 30 days of request and be able to fit inside a U.S. military cargo aircraft

Massive Validation for the Azure Platform

The fact that the Azure platform is the “last cloud standing” at the end of the long and arduous selection process is massive validation from our perspective.

As other bidders have discovered, much to their chagrin, the capabilities described above are not developed overnight. It’s a testament to Microsoft’s sustained commitment to meeting the wide-ranging requirements of the JEDI solicitation.

Lately, almost every major cloud provider has invested in bringing the latest innovations in compute (GPUs, FPGAs, ASICs), storage (very high IOPS, HPC), and network (VMs with 25 Gbps bandwidth) to their respective platforms. In the end, what I believe differentiates Azure is a long-standing focus on understanding and investing in enterprise IT needs. Here are a few examples:

  • Investments in Azure Stack started in 2010 with the announcement of the Azure Appliance. It took over seven years of learning to finally run Azure completely in an isolated mode. Since then, the investments in Data Box Edge, Azure Sphere, and the commitment to hybrid solutions have been a key differentiator for Azure.
  • With 54 Azure regions worldwide (available in 140 countries), including dedicated Azure Government regions – US DoD Central, US DoD East, US Gov Arizona, US Gov Iowa, US Gov Texas, US Gov Virginia, US Sec East, US Sec West – the Azure team has accorded the highest priority to establishing a global footprint. Additionally, having a common team that builds, manages, and secures Azure’s cloud infrastructure has meant that even the public Azure services have DoD CC SRG IL 2 and FedRAMP Moderate and High designations.
  • Whether it is embracing Linux or Docker, providing the highest number of contributions to GitHub projects, or open-sourcing the majority of  Azure SDKs and services, Microsoft has demonstrated a leading commitment to open source solutions.
  • Decades of investment in Microsoft Research, including the core Microsoft Research Labs and Microsoft Research AI, has meant that they have the most well-rounded story for advanced data analytics and AI.
  • Documentation and ease of use have been accorded the highest engineering priorities. Case in point: rebuilding Azure docs entirely on GitHub, which has allowed an open feedback mechanism powered by GitHub issues.

This case study was featured on Microsoft. Click here to view the case study.

When GEICO sought to migrate its sales mainframe application to the cloud, they had two choices. The first was to “rehost” their applications, which involved recompiling the code to run in a mainframe emulator hosted in a cloud instance. The second choice was to “rebuild” their infrastructure and replace the existing mainframe functionality with equivalent features built using cloud-native capabilities.

The rehost approach is arguably the easier of the two alternatives. Still, it comes with a downside – the inability to leverage native cloud capabilities and benefits like continuous integration and deployment that flow from it.

Even though a rebuild offers the benefits of native cloud capabilities, it is riskier, as it adds cost and complexity. GEICO was clear that it not only wanted to move away from the mainframe codebase but also to significantly increase agility through frequent releases. At the same time, the risks of the “rebuild” approach, involving a million lines of COBOL code and 16 subsystems, were staring them in the face.

This was when GEICO turned to AIS. GEICO hired AIS for a fixed price and time engagement to “rebuild” the existing mainframe application. AIS extracted the business logic from the current system and reimplemented it from the ground up using a modern cloud architecture, while baking in the principles of DevOps and CI/CD from the inception.

Together, the GEICO and AIS teams achieved the best of both worlds: a risk-mitigated “rebuild” approach to mainframe modernization. Check out the full success story on Microsoft: GEICO finds that the cloud is the best policy after seamless modernization and migration.

DISCOVER THE RIGHT APPROACH FOR MODERNIZING YOUR APPLICATIONS
Download our free whitepaper to explore the various approaches to app modernization, where to start, how it's done, pros and cons, challenges, and key takeaways

Last week, AIS was invited to speak at DIR’s one-day conference for public sector IT leaders, themed Journey to the Cloud: How to Get There.

AIS’ session was titled Improve Software Velocity and Portability with Cloud Native Development. As we know, with a significant increase in the adoption of containers, cloud-native development is emerging as an important trend with customers who are looking to build new applications or lift and reshape existing applications in the cloud.

In this session, we discussed the key attributes of the cloud-native approach and how it improves modularity, elasticity, robustness, and – at the same time – helps avoid cloud vendor lock-in.

Cloud-Native Attributes

  • Declarative formats to set up automation
  • Clean contract with the underlying operating system, offering portability
  • Minimize divergence between development and production, enabling continuous deployment
  • Scale-up without significant changes to tooling, architecture, or development practices
  • Containers, service meshes, and microservices
  • Public, private, and hybrid cloud

Multi-Cloud Deployment Demo

To demonstrate cross-cloud portability, we conducted a live demonstration of an application being deployed to the Google Cloud Platform and Microsoft Azure during the session.

Note that the exact same application (binaries and Kubernetes deployment manifests) was used across the two clouds. The only difference was the storage provisioner, as shown in the diagram below.

deployment on multi cloud

We thank Texas DIR forum organizers and attendees for a great conference!

Related Resources

For additional information on cloud-native development, refer to the links below: