I was first introduced to Azure Cognitive Search at the Microsoft AI Dev Intersection conference in 2019. I decided to write a quick blog post on it to help others understand its features and benefits. This post is not only for developers: if you are a Business Analyst, SharePoint Analyst, Project Manager, or Ops Engineer, you will still find the information useful.
Azure Cognitive Search (ACS) is a cloud search service that uses artificial intelligence (AI) to extract additional metadata from images, blobs, and other unstructured data. It works well for both structured and unstructured data. In the past, we needed to set up a separate search farm to fulfill the search requirements of a web application. Since ACS is a Microsoft cloud service, we do not need to set up any servers or be search experts. You can prove these concepts in front of your customer in minutes.
When can we use it?
Most businesses have a lot of unstructured data: handwritten documents, forms, emails, PowerPoint decks, and Word documents. For handwritten documents, even after you scan and digitize them, how do you make the content searchable? If you have images, drawings, and pictures, how do you extract the text from them and make it searchable? You can scan handwritten documents and upload them to Azure Blob Storage containers in an organized fashion. Azure Cognitive Search can then import the documents from the blob containers and create the search indexes. The diagram below shows the paper document flow.
Paper Documents Flow:
Below are a few cases where ACS can really come in handy:
- The local file share holds many documents and is running out of space. Example: if your organization stores documents on a file server, you can index those documents with ACS and provide a good search experience, so users do not have to rely on Windows Explorer search. You can design a web application UI that searches the ACS indexes.
- The customer already has data in the cloud, such as data stored in Azure Blob Storage, Azure SQL Database, or Azure Cosmos DB. ACS can easily connect to these sources and create indexes on them.
- International businesses have documents in many languages. Out of the box, ACS can translate content into many different languages, so you can show your search results in a different language as well.
- The client needs to apply AI to business documents.
- Documents lack metadata. Example: if documents have only a Title as metadata, all you can search by is the title. ACS can extract key phrases from documents, so we can search on key phrases as well.
Next, we will learn how to quickly prove this concept.
Creating Service and Indexes from Azure Portal
The diagram below shows the simple flow from the Azure portal. You can prove the ACS concepts in front of clients in minutes.
Log in to the Azure portal and create the Azure Cognitive Search service. You can find steps on how to create ACS here.
Once your service has been created, follow the steps below to quickly prove the concept.
- Step 1: Start with documents (unstructured text) such as PDF, HTML, DOCX, email, and PPTX files. Upload your content to Azure Blob Storage, then, in ACS, import your data from Azure Blob Storage.
- Step 2: Optionally, apply cognitive skills (see the Understanding Text Cognitive Skills and Image Skills section below).
- Step 3: Define an index (structure) to store the output: the raw content plus the name-value pairs generated in Step 2.
- Step 4: Create an indexer. The indexer fills your index fields with data.
(See the next section to understand the index and the indexer.)
- Step 5: Quickly search your index by using Search explorer in the Azure portal.
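If you prefer to script the same flow instead of clicking through the portal, the steps map roughly onto calls to the ACS REST API. The sketch below only builds the requests and prints them; it does not send anything. The service URL, resource names, connection string placeholder, and API version are all my own assumptions for illustration, and the index and skillset it refers to are assumed to exist already.

```python
import json
from urllib.parse import urlencode

# Hypothetical placeholders, not values from this post.
SERVICE_URL = "https://my-demo-search.search.windows.net"
API_VERSION = "2020-06-30"  # assumed; use the version your service supports


def data_source_body(name, connection_string, container):
    """Step 1: point ACS at a Blob Storage container."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def indexer_body(name, data_source, index, skillset=None):
    """Step 4: an indexer that moves data from the source into the index."""
    body = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
    }
    if skillset:  # Step 2 is optional, so the skillset is optional too
        body["skillsetName"] = skillset
    return body


def search_url(index, text, top=5):
    """Step 5: a simple full-text query against the index."""
    query = urlencode(
        {"api-version": API_VERSION, "search": text, "$top": top},
        safe="$",  # keep the OData-style "$top" parameter name readable
    )
    return f"{SERVICE_URL}/indexes/{index}/docs?{query}"


def plan_requests():
    """Return (method, url, body) tuples in the order they would be sent."""
    ds = data_source_body("demo-blob-ds", "<storage-connection-string>", "scanned-docs")
    ix = indexer_body("demo-indexer", ds["name"], "demo-index", skillset="demo-skillset")
    return [
        ("PUT", f"{SERVICE_URL}/datasources/{ds['name']}?api-version={API_VERSION}", ds),
        ("PUT", f"{SERVICE_URL}/indexers/{ix['name']}?api-version={API_VERSION}", ix),
        ("GET", search_url("demo-index", "handwritten forms"), None),
    ]


if __name__ == "__main__":
    for method, url, body in plan_requests():
        print(method, url)
        if body is not None:
            print(json.dumps(body, indent=2))
```

To actually run these against a service, each request would also need an `api-key` header carrying an admin or query key. I left that out so no secrets appear in the sketch.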
Understanding Index and Indexer
A search index is like an empty table with fields. To search your data, first decide which fields you want to make searchable. Once the fields are decided, how do we populate them with data? The search indexer pulls the data from your source and fills the search index, so you can search on it. It is very quick to define a search index and create an indexer from the Azure portal. In ACS, a search index is defined in JSON, and the documents it holds are JSON objects.
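To make the "empty table and fields" picture concrete, here is a minimal sketch of what an index definition could look like as JSON, built in Python. The index name and field names are hypothetical. The `toy_indexer` function is only an in-memory analogy for what an indexer does; it is not how the service works internally.

```python
import json


def field(name, edm_type="Edm.String", **attrs):
    """One index field; attrs are flags such as key, searchable, filterable."""
    return {"name": name, "type": edm_type, **attrs}


def build_index(name):
    """An index definition: a name plus a list of typed fields."""
    return {
        "name": name,
        "fields": [
            field("id", key=True, searchable=False),  # the document key
            field("title", searchable=True, sortable=True),
            field("content", searchable=True),
            field("keyPhrases", "Collection(Edm.String)",
                  searchable=True, filterable=True),
        ],
    }


def key_fields(index):
    """Names of the fields marked as the document key."""
    return [f["name"] for f in index["fields"] if f.get("key")]


def toy_indexer(index, source_docs):
    """Analogy only: copy the values whose names match index fields."""
    names = [f["name"] for f in index["fields"]]
    return [{n: doc.get(n) for n in names} for doc in source_docs]


if __name__ == "__main__":
    index = build_index("demo-index")
    print(json.dumps(index, indent=2))
    source = [{"id": "1", "title": "Invoice", "content": "Scanned text", "author": "n/a"}]
    print(toy_indexer(index, source))
```

Note that the source document's `author` value is dropped because the index has no such field, and `keyPhrases` stays empty until something (for example a cognitive skill) produces it.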
Understanding Text Cognitive Skills and Image Skills
Out of the box, the text cognitive skills in ACS can extract people’s names, organization names, location names, and key phrases from your data or documents. They can also detect the language of the content and translate results into different languages.
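Outside the portal, these text skills are declared in a skillset, which is again just JSON. The sketch below builds one with entity recognition, key phrase extraction, language detection, and translation to Hindi. The skillset name and the target field names are my own placeholders, and I am assuming the V3 entity recognition skill type; older services may use a different version, so check the skill reference for yours.

```python
import json


def skill(odata_type, inputs, outputs, context="/document", **extra):
    """Build one skill entry from input/output name mappings."""
    return {
        "@odata.type": odata_type,
        "context": context,
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
        **extra,
    }


def build_text_skillset(name, translate_to="hi"):
    """A skillset covering the text skills described above."""
    text = {"text": "/document/content"}  # every skill reads the document text
    return {
        "name": name,
        "description": "Entities, key phrases, language detection, translation",
        "skills": [
            skill("#Microsoft.Skills.Text.V3.EntityRecognitionSkill", text,
                  {"persons": "people",
                   "organizations": "organizations",
                   "locations": "locations"},
                  categories=["Person", "Organization", "Location"]),
            skill("#Microsoft.Skills.Text.KeyPhraseExtractionSkill", text,
                  {"keyPhrases": "keyPhrases"}),
            skill("#Microsoft.Skills.Text.LanguageDetectionSkill", text,
                  {"languageCode": "language"}),
            skill("#Microsoft.Skills.Text.TranslationSkill", text,
                  {"translatedText": "translatedText"},
                  defaultToLanguageCode=translate_to),  # "hi" is Hindi
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_text_skillset("demo-text-skillset"), indent=2))
```

Each `targetName` is the name under which the skill output becomes available to the indexer, which can then map it to an index field.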
Below is an example of results translated into Hindi.
Image skills can generate tags and captions from images and can also identify celebrities.
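As a rough sketch, an image analysis skill that asks for tags, captions (called descriptions in the API), and celebrity details could be declared like this. The target names are hypothetical, and whether celebrity recognition is available may depend on your service and API version, so treat that part as an assumption.

```python
import json


def build_image_skill():
    """An image analysis skill requesting tags, captions, and celebrities."""
    return {
        "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
        # Run once per image extracted from each document.
        "context": "/document/normalized_images/*",
        # Celebrity details are reported under categories, so request those too.
        "visualFeatures": ["tags", "description", "categories"],
        "details": ["celebrities"],
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [
            {"name": "tags", "targetName": "imageTags"},
            {"name": "description", "targetName": "imageCaption"},
            {"name": "categories", "targetName": "imageCategories"},
        ],
    }


def indexer_image_parameters():
    """Indexer settings that make images available to the skill."""
    return {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_image_skill(), indent=2))
    print(json.dumps(indexer_image_parameters(), indent=2))
```

The indexer parameters matter here: without an `imageAction` that generates normalized images, the skill has no images to work on.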
Below is a JSON search index that shows an example of the image cognitive skill.
Since Azure Cognitive Search is a cloud service, it is very quick to start using it, whether your data is already in the cloud or on-premises. If you have data in your own data center, you can push the data into Azure Cognitive Search indexes. Below are my two favorite demo sites, which use ACS to extract content from paper documents and images.