Bridging the gap between serverless OLTP and Analytics

Cosmos DB is unabashedly a document-based Online Transactional Processing (OLTP) system. It was engineered at its core to provide low-latency, high-throughput transactions, with service level agreements (SLAs), consistency models, and guarantees to back them. It does this exceptionally well, but these architectural decisions come with trade-offs. Other databases can run complex queries with joins and aggregates, or set-based operations. For these workloads Cosmos DB, by design, either has no native solution or makes them prohibitively resource-intensive. Capabilities in Cosmos DB continue to evolve, but at a certain point some workloads require another solution that is better suited to the challenge. One of these gaps has now been bridged with Azure Synapse Link for Cosmos DB.

Azure Synapse is Microsoft’s consolidated data analytics platform. It brings together data ingestion, transformation, machine learning (ML) training, testing, management, security, monitoring, and visualization in one place. With Synapse Link, Cosmos DB data can participate in this ecosystem.

Under the hood, Cosmos DB data is replicated from its row-based index store to a column-based index store that sits on top of Azure Data Lake. The files are stored in a read-optimized format, and the process is fully managed: it is enabled by a checkbox and a few options. Because the data is replicated, analytical queries have no impact on transactional workloads against Cosmos DB, but the replicated copy lags behind the source. The replication delay is currently up to 5 minutes, though in practice it is much lower.
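To make the row-based versus column-based distinction concrete, here is a toy sketch in plain Python. It is not how Cosmos DB or Synapse Link stores data internally; the sample documents and the to_columns helper are made up for illustration. It only shows why a column layout suits analytical reads: an aggregate over one property touches a single list instead of every document.

```python
# Toy illustration only: the same three hypothetical documents,
# first as rows (one dict per document), then pivoted into columns.
rows = [
    {"id": "1", "city": "Seattle", "total": 20.0},
    {"id": "2", "city": "Austin", "total": 35.5},
    {"id": "3", "city": "Seattle", "total": 12.5},
]


def to_columns(documents):
    """Pivot a list of documents into one list of values per property."""
    columns = {}
    for index, doc in enumerate(documents):
        for key, value in doc.items():
            # Missing properties stay None so every column has the same length.
            columns.setdefault(key, [None] * len(documents))[index] = value
    return columns


columns = to_columns(rows)

# Row layout: every whole document is visited to sum a single property.
row_sum = sum(doc["total"] for doc in rows)

# Column layout: only the "total" values are scanned.
col_sum = sum(columns["total"])
```

Both sums are the same; the difference is how much data has to be read to get there, which is the kind of saving a read-optimized columnar format is built around.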

[Figure: Cosmos DB OLTP]

The analytical storage is decoupled from the analytical compute systems, so as other compute options become available, the data doesn’t need to be replicated again. This also allows for multi-use scenarios such as event streaming with Apache Spark Structured Streaming or traditional data warehousing. Azure Synapse also provides a serverless SQL compute engine that can read the replicated data.
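As a rough sketch of what reading the replicated data from Spark can look like, the snippet below uses the cosmos.olap format that Synapse Spark pools expose for the analytical store. The helper names and the linked service and container names in the usage note are placeholders rather than anything from a real workspace, and the spark argument is assumed to be the session a Synapse notebook provides.

```python
def analytical_store_options(linked_service, container):
    """Build reader options for the analytical store (helper name is illustrative)."""
    return {
        # Name of the Synapse linked service that points at the Cosmos DB account.
        "spark.synapse.linkedService": linked_service,
        # Cosmos DB container that has the analytical store enabled.
        "spark.cosmos.container": container,
    }


def load_analytical_store(spark, linked_service, container):
    """Return a DataFrame over the analytical store.

    Assumes `spark` is the SparkSession of a Synapse Spark pool; the read goes
    against the replicated column store, not the transactional store.
    """
    options = analytical_store_options(linked_service, container)
    return spark.read.format("cosmos.olap").options(**options).load()
```

In a Synapse notebook this might be called as load_analytical_store(spark, "CosmosDbLink", "orders"), with both names replaced by your own. Because the read targets the analytical store, it should not consume the request units provisioned for the transactional workload.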

Previously, these scenarios were possible, but they required using the Cosmos DB change feed or direct queries to move the data to another store. With Azure Synapse Link for Azure Cosmos DB, analytics has gone serverless and cloud-native!