7 Key Features of Azure Cosmos DB

Today’s digital world demands data at the click of a button (or a touch), which requires applications to be highly responsive and always online. Along with low latency and high availability, applications need to respond in real time, store large volumes of data, and make it available to users in milliseconds.

At AArete, we are constantly on the lookout for better ways to tame data and make valuable insights available to users. In recent months, we have had interesting opportunities to work with Azure Cosmos DB on various internal projects. We have found that Azure Cosmos DB is a great help for developers because of its feature set. In this blog, we’re going to talk about 7 such key features of Azure Cosmos DB.

Before we jump to the list, what is Azure Cosmos DB?

It’s a NoSQL JSON database offering from the Microsoft Azure cloud platform. Azure Cosmos DB is a globally distributed, massively scalable, multi-model database service with guaranteed single-digit-millisecond data access. It also supports large-scale operational and transactional applications.

Why should you use Azure Cosmos DB?

Azure Cosmos DB can solve the problems of projects that demand:

  • Minimal query processing
  • High throughput & availability
  • Low latency
  • Unlimited storage

Now that we have the basics in place, here are the 7 key features of Azure Cosmos DB.

1. Quick & Easy Database Representation

Creating an Azure Cosmos DB database is a simple process. The diagram below represents the hierarchy of the data representation, where “Account” represents the Azure Cosmos DB account (created under your Azure subscription) under which all databases and their sub-entities are created.

Database Representation

Here’s an example to help you understand the database representation in Azure Cosmos DB:

Consider the relational structure shown in the image below, where the “Employee” table is related to the “Address” and “Contact” tables via the key “EmpID.” The image beside it shows the corresponding JSON document in Azure Cosmos DB.

Azure Cosmos DB Tables

(Note: EmpID becomes the partition key in Azure Cosmos DB; together with the document id it uniquely identifies the document.)
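To make this concrete, below is a minimal sketch using the Azure Cosmos DB Python SDK (Core/SQL API) that creates a container partitioned on EmpID and upserts an employee document with its address and contact details embedded. The endpoint, key, and the database, container, and field names are illustrative placeholders, not values from the example above.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical account endpoint and key; replace with your own values.
client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")

database = client.create_database_if_not_exists(id="HRDatabase")
container = database.create_container_if_not_exists(
    id="Employees",
    partition_key=PartitionKey(path="/EmpID"),  # EmpID is the partition key
)

# The relational Employee/Address/Contact rows collapse into one JSON document.
employee = {
    "id": "1001",
    "EmpID": "1001",
    "Name": "Jane Doe",
    "Address": {"City": "Chicago", "Zip": "60601"},
    "Contact": {"Phone": "555-0100", "Email": "jane.doe@example.com"},
}
container.upsert_item(employee)
```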

2. Multiple Data Model Support

Azure Cosmos DB supports a wide range of data models, including document, graph, key-value, table, and column-family data, for flexibility in storage.

3. Globally Distributed Database

Another interesting feature of Azure Cosmos DB is that it automatically replicates your data across the regions you associate with your account, making the database globally distributed. Here are three things to know about the globally distributed database feature of Azure Cosmos DB:

  1. Available in all Azure regions
  2. Manual and automatic failover support
  3. Automatic & synchronous multi-region replication

Take a look at the image below.

To add multiple regions, select the “Replicate data globally” option under “Settings” and then choose the regions you want to replicate to. As you can see in the image below, if the “write region,” which is “Australia Southeast,” goes down, the selected “secondary node,” which is “Canada Central,” becomes the primary write node.
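Region replication and failover priorities are configured on the account (for example, through the “Replicate data globally” blade shown above or the Azure management APIs). On the application side, here is a hedged sketch of how a client can list its preferred regions with the Python SDK, assuming the two regions from the example; the endpoint and key are placeholders.

```python
from azure.cosmos import CosmosClient

# The SDK routes reads to the first available region in this list;
# writes go to the current write region and fail over automatically
# when the account is configured for it.
client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/",  # hypothetical endpoint
    credential="<your-key>",
    preferred_locations=["Australia Southeast", "Canada Central"],
)
```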

4. Five Well-Defined Consistency Models

Working with a globally distributed database requires methods to maintain consistency of the data replicated across the write/read nodes, which is key to data availability in all regions. Azure Cosmos DB offers five consistency models, described below:

  • Strong: Reads return only committed data, so a read always sees the most recent committed write.
  • Bounded Staleness: Reads may lag behind writes by a configured bound (for example, if the staleness window is set to 1 hour, reads in all regions lag behind writes by at most 1 hour).
  • Session: The default consistency level; within a session it guarantees consistency (such as reading your own writes) while offering better throughput than the stronger levels.
  • Consistent Prefix: Reads return data in the order in which it was written or updated; the global order of writes is preserved.
  • Eventual: Reads eventually return all replicated data. It offers the lowest latency because reads do not wait for any document commits.

The image below shows the default consistency level, i.e. Session, in Azure Cosmos DB.
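The default consistency level is set on the account, but a client can opt for a weaker level per connection. Here is a minimal sketch with the Python SDK (the endpoint and key are placeholders), assuming you want lower-latency eventual reads against an account whose default is Session:

```python
from azure.cosmos import CosmosClient

# Override the account default (Session) with a weaker level for this client.
# A client can only relax consistency, never strengthen it beyond the account setting.
client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/",
    credential="<your-key>",
    consistency_level="Eventual",
)
```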

5. Change Feed (Change Data Capture)

Change feed support in Azure Cosmos DB works by listening to an Azure Cosmos DB container for any changes. Its output is the sorted list of documents that were changed, in the order in which they were modified.

Here’s the architecture diagram of the change feed:

The change feed includes insert and update operations made to items within the container. Deletes are not captured directly; you can capture them by setting a “soft-delete” flag on your items (for example, documents) instead of deleting them.

You can work with the change feed using the following options:

  • Using Azure Functions (Recommended)
  • Using the change feed processor library
  • Using the Azure Cosmos DB SQL API SDK

Among the available methods, Azure Functions is the simplest way to connect to the change feed of Azure Cosmos DB. The Azure Cosmos DB trigger fires automatically on each new event in your Azure Cosmos container’s change feed.

To implement a serverless event-based flow, you need to create two containers in Azure Cosmos DB:

  • The monitored container: The monitored container is the Azure Cosmos container that is monitored for inserts and updates. It stores the live data from which the change feed is generated.
  • The lease container: The lease container maintains state across multiple, dynamically scaling serverless Azure Functions instances and enables that dynamic scaling. It is a shared container used by all the Azure Functions for internal session management, and it can be created manually or automatically by the Azure Cosmos DB trigger.

The Azure Function can then perform any action on the captured changes to implement your business logic.
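As a rough sketch of the Azure Functions route, the function below (Python programming model v1) receives batches of changed documents from the monitored container; its cosmosDBTrigger binding in function.json points at the monitored container, the lease container, and the connection string setting. The binding and container names are assumptions for illustration, not taken from the example above.

```python
import logging

import azure.functions as func


# Wired up via a "cosmosDBTrigger" entry in function.json that names the
# monitored container, the lease container, and the connection setting.
def main(documents: func.DocumentList) -> None:
    if not documents:
        return
    for doc in documents:
        # Each document is a changed item from the monitored container;
        # apply your business logic here (e.g., push an offer, update a cache).
        logging.info("Changed document id: %s", doc.get("id"))
```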

Use Case of Change Data Capture

One of our clients wanted to push a pre-approved instant credit offer to active users of their digital wallet application. Their rapidly growing customer base needed a highly scalable, modern data warehouse solution, because their existing SQL database could not serve this requirement in real time due to the limitations of its query processing technology. To enable the client to push the credit offer in real time to active users based on credit eligibility, AArete designed a microservice architecture using Azure Stream Analytics along with Azure Cosmos DB.

As shown in the diagram above, when a customer logs in to the application, the login event is passed through Azure Stream Analytics, which updates the corresponding document in Azure Cosmos DB. As soon as the document is updated, the change feed fires the Azure Cosmos DB trigger, which looks up the offer specific to that customer in the database and pushes it in real time.

6. Low Latency & Provisioned Throughput on Containers and Databases

With Azure Cosmos DB, very low latency is practically guaranteed: less than 10 milliseconds for reads and less than 15 milliseconds for writes. Azure Cosmos DB normalizes the cost of all database operations and expresses it in Request Units (or RUs, for short).

Microsoft provides the following baseline costs for basic operations:

  • For a 1 KB document: a read costs 1 RU, a write costs 5 RUs
  • For a 100 KB document: a read costs 10 RUs, a write costs 50 RUs

Using these metrics, the RUs required for any project can be estimated. All database operation costs, such as writes, reads, and query execution, are measured in RUs. A minimum of 400 RU/s is required to keep a container or database operational. Depending on data volume and workload, provisioned throughput can range from 400 to 100,000 RU/s.
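As a hypothetical back-of-the-envelope estimate using the baseline costs above (the workload numbers below are assumed, not from the article):

```python
# Assumed workload: 1 KB documents, 500 reads/s and 100 writes/s.
reads_per_second = 500
writes_per_second = 100

# 1 RU per 1 KB read, 5 RUs per 1 KB write (Microsoft's baseline costs above).
required_rus = reads_per_second * 1 + writes_per_second * 5
print(required_rus)  # 1000 -> provision ~1,000 RU/s (above the 400 RU/s minimum)
```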

With Azure Cosmos DB, you can provision throughput at two granularities (a short code sketch of both options follows this list):

  • Container level: When throughput is provisioned on an Azure Cosmos container, it is exclusively reserved for that container and uniformly distributed across all the logical partitions of the container. Use this option when you want to dedicate throughput to an individual container.
  • Database level: When you provision throughput on an Azure Cosmos database, the throughput is shared across all the containers in the database. Configure throughput on a database when you want to share throughput across multiple containers but don’t want to dedicate it to any particular container.
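Here is a minimal sketch of both options with the Python SDK, assuming placeholder names and arbitrary values of 1,000 RU/s shared and 400 RU/s dedicated:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")

# Database-level (shared) throughput: all containers in this database share 1,000 RU/s.
shared_db = client.create_database_if_not_exists(id="SharedDB", offer_throughput=1000)
shared_db.create_container_if_not_exists(id="Logs", partition_key=PartitionKey(path="/deviceId"))

# Container-level (dedicated) throughput: 400 RU/s reserved for this container alone.
dedicated_db = client.create_database_if_not_exists(id="DedicatedDB")
dedicated_db.create_container_if_not_exists(
    id="Orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=400,
)
```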

7. Partitioning in Azure Cosmos DB

Azure Cosmos DB provides automatic partitioning, using containers to store collections of data. It determines the number of partitions across servers based on the storage size and the throughput provisioned for the container. Each document is assigned a partition key and a row key (id) that together uniquely identify it, and the partition key determines how Azure Cosmos DB distributes documents across physical partitions. Below is a flow diagram that shows partitioning in Azure Cosmos DB.

The two types of possible partitions are:

  • Physical Partitions: These partitions provide fixed SSD-backed storage with variable compute resources (CPU and memory). Azure automatically scales the number of physical partitions based on workload, which is why identifying the right partition key is important.
  • Logical Partitions: A logical partition is a subset of a physical partition, formed by the items in a container that share the same partition key value. As a result, multiple logical partitions can be stored in the same physical partition.

Here’s how you can select the right level of granularity for your partitions.

Consider an example of partitioning data for Honda vehicles by year and model.

Partitions should be based on your most frequently occurring queries and transactional needs. The goal is to maximize granularity and minimize cross-partition requests.
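For illustration, here is a minimal sketch with the Python SDK showing the difference between a partition-scoped query and a cross-partition query, assuming a hypothetical container partitioned on /model (the account, database, container, and field names are placeholders):

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
container = client.get_database_client("VehiclesDB").get_container_client("Vehicles")

# Partition-scoped query: served from a single logical partition (cheaper, faster).
civics_2020 = container.query_items(
    query="SELECT * FROM c WHERE c.year = 2020",
    partition_key="Civic",
)

# Cross-partition query: fans out across physical partitions and costs more RUs.
all_2020 = container.query_items(
    query="SELECT * FROM c WHERE c.year = 2020",
    enable_cross_partition_query=True,
)

for item in civics_2020:
    print(item["id"], item.get("year"))
```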

Questions?

AArete is a Microsoft Gold Partner in analytics. If you need help with Azure Cosmos DB or other related Microsoft Azure products, get in touch with us!