Bridging the gap between serverless OLTP and Analytics

Cosmos DB is unabashedly a document-based Online Transactional Processing (OLTP) system. It was engineered at its core to provide low latency, high throughput transactions with service level agreements (SLAs), consistency models, and guarantees to back it. It does this exceptionally well, but there are trade-offs for these architectural decisions. Other databases can do like complex queries with joins and aggregates or set-based operations. Cosmos DB, by design, doesn’t have a native solution or is too resource prohibitive. Capabilities in Cosmos DB continue to evolve, but at a certain point requires another solution that is better suited for the challenges. One of these gaps has now been bridged with Azure Synapse Link for Cosmos DB.

Azure Synapse is Microsoft’s consolidated data analytics platform that brings together data ingestion, transformation, machine learning (ML) training, testing, management, security, monitoring, and visualization all in one place. With Synapse Link, Cosmos DB data can participate in this eco-system.

Under the hood, Cosmos DB data is then replicated from its row-based index store to a column-based index store that sits on top of Azure Data Lake. These files are stored in a read optimized format while the process is fully managed and is enabled by a checkbox and a few options. Because the data is replicated, there is no impact on transactional workloads against Cosmos DB, but there is a delay. There is currently an up to 5-minute replication period, but this time is much lower in practice.

CosmosDb OLTP

The analytical storage is decoupled from the analytical compute systems, so as other compute options become available, the data doesn’t need to be replicated. This also allows for multi-use scenarios like Apache Spark structured event for streaming or traditional data warehousing. Azure Synapse also provides a Serverless SQL compute engine that can read the replicated data.

Previously, these features were possible but required the use of Cosmos DB change feed or direct queries to move the data to another store. With Azure Synapse Link for Azure Cosmos DB, analytics has gone serverless and cloud-native!

I just returned from Microsoft BUILD 2019 where I presented a session on Azure Kubernetes Services (AKS) and Cosmos. Thanks to everyone who attended. We had excellent attendance – the room was full! I like to think that the audience was there for the speaker 😊 but I’m sure the audience interest is a clear reflection of how popular AKS and Cosmos DB are becoming.

For those looking for a 2-minute overview, here it is:

In a nutshell, the focus was to discuss the combining Cloud-Native Service (like AKS) and a Managed Database

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck

We started with a discussion of Cloud-Native Apps, along with a quick introduction to AKS and Cosmos. We quickly transitioned into stateful app considerations and talked about new stateful capabilities in Kubernetes including PV, PVC, Stateful Sets, CSI, and Operators. While these capabilities represent significant progress, they don’t match up with external services like Cosmos DB.

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck Cloud Native Tooling

One option is to use Open Service Broker – It allows Kubernetes hosted services to talk to external services using cloud-native tooling like svcat (Service Catalog).

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck svcat

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck SRE

External services like Cosmos DB can go beyond cluster SRE and offer “turn-key” SRE in essence – Specifically, geo-replication, API-based scaling, and even multi-master writes (eliminating the need to failover).

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck Mutli Master Support

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck Configure Regions

Microsoft Build Session Architecting Cloud-Native Apps with AKS and Cosmos DB Slide Deck Portability

Since the Open Service Broker is an open specification, your app remains mostly portable even when you move to one cloud provider to another. OpenService Broker does not deal with syntactic differences, say connection string prefix difference between cloud providers.  One way to handle these differences is to use Helm.

Learn more about my BUILD session:

Here you can find the complete recording of the session and slide deck: https://mybuild.techcommunity.microsoft.com/sessions/77138?source=sessions#top-anchor

Additionally, you can find the code for the sample I used here: https://github.com/vlele/build2019 

WORK WITH THE BRIGHTEST LEADERS IN SOFTWARE DEVELOPMENT

Recently I had an opportunity to sit down with Steve Michelotti, Program Manager on the AzureGov team and talk about a Machine Learning (ML) application we built for a federal agency. This application is a great example of how AIS leverages the latest innovations on the AzureGov platform to build applications that align with agencies’ missions – and go beyond IT support to directly assist in meeting the mission objectives.

Specifically, this application was designed to help analysts get personalized recommendations (based on their own preference settings, ratings provided by their co-workers) for stories they need to analyze as part of their daily work.

Brent Wodicka from AIS described this application in an earlier blog post. Read More…

We took Rimma Nehme’s excellent demo from BUILD 2017 and recreated it for AzureGov.

In a nutshell, we took the Marvel Universe Social Database and loaded it in Azure Cosmos DB as a graph database. Then we built a simple web page that invoked Gremlin queries against Cosmos DB.

The key theme of this demo is the ease with which you can create a globally distributed database that can support low latency queries against the Marvel Universe graph database. In the context of AzureGov (as shown below), we can seamlessly replicate the data across the three AzureGov regions by clicking on these regions within the Azure portal.

Here’s a quick look at the demo: