Part 4 - Backend Brilliance: Integrating Langchain and Cognitive Search for AI-Powered Chats

Intro

This post moves to the application layer and shows how vector storage changes the backend of your chat and AI applications. We look at how LangChain, a framework for building applications on top of language models, works together with Azure Cognitive Search, so that the backend can retrieve relevant content and hold contextually aware conversations.

Understanding the Scenario

In today’s data-driven world, efficient document indexing and retrieval are crucial for businesses to extract valuable insights from their data. Azure Cognitive Search offers a robust platform for creating and managing search indexes, allowing organizations to build intuitive and effective user search experiences. However, the process of indexing large volumes of data and retrieving relevant information can be complex and resource-intensive.
To address these challenges, we’ll explore a scenario where Azure Functions play a pivotal role in automating and enhancing the document indexing and retrieval process. By integrating Azure Cognitive Search with custom language models and external APIs, we can create a streamlined workflow that optimizes the search experience and improves data accessibility.

Setting Up the Environment

Before diving into the code, let’s ensure we have the necessary tools and services to replicate this solution. In this scenario, we’ll need:

  • An Azure subscription with access to Azure Cognitive Search.
  • Azure Functions runtime environment.
  • Azure OpenAI credentials (endpoint, API key, API version, and deployed model names) for language modelling.
  • A storage account for storing documents and metadata.
  • Python development environment with required libraries and packages.

Key Components Explained

To implement our solution, we’ll leverage several custom modules and libraries:

  • chunkindexmanager: A module responsible for creating chunk-based search indexes in Azure Cognitive Search. It uses specialized algorithms for efficient indexing and retrieval of large documents. Explained in the previous part.
  • documentindexmanager: This module handles creating and managing document-based search indexes. It also interacts with storage accounts to retrieve document data. Explained in the previous part.
  • langchain: A library that integrates custom language models for intelligent document retrieval. It combines the power of language understanding with retrieval techniques.
  • AzureChatOpenAI: A class from the langchain library that interacts with OpenAI’s language models to generate human-like responses to user queries.
  • AzureCognitiveSearchRetriever: A retriever class from the langchain library that interacts with Azure Cognitive Search to retrieve relevant documents based on user queries.

  • Client: A class from the langsmith library used to interact with the LangSmith platform.

Azure Functions: Indexing Documents

The 'IndexDocuments' code on GitHub.

Our first Azure Function, named IndexDocuments, is triggered by an HTTP request to create search indexes in Azure Cognitive Search. This function encapsulates the entire indexing process, from extracting configuration information to creating and managing search indexes.


The create_indexes function orchestrates the creation of both chunk-based and document-based search indexes using the ChunkIndexManager and DocumentIndexManager. It efficiently manages the index creation process, ensuring that the system is set up for optimal search performance.
The function is defined as follows:

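import logging

import azure.functions as func

# Function app and logger shared by the functions in this post
app = func.FunctionApp()
logger = logging.getLogger(__name__)

# Defining Azure Function 'IndexDocuments'
@app.function_name("IndexDocuments")
@app.route(route="IndexDocuments")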
def IndexDocuments(req: func.HttpRequest) -> func.HttpResponse:
    """
    Azure Function triggered by HTTP request to create search indexes in Azure Search.
    """
    try:
        # Extracting configuration information from request body
        req_body = req.get_json()
        config = {
            'AZURE_SEARCH_ADMIN_KEY': req_body.get('AZURE_SEARCH_ADMIN_KEY'),
            'AZURE_SEARCH_SERVICE_ENDPOINT': req_body.get('AZURE_SEARCH_SERVICE_ENDPOINT'),
            'AZURE_SEARCH_INDEX_NAME': req_body.get('AZURE_SEARCH_INDEX_NAME'),
            'AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME': req_body.get('AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME'),
            'AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL': req_body.get('AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL'),
            'AZURE_OPENAI_API_VERSION': req_body.get('AZURE_OPENAI_API_VERSION'),
            'AZURE_OPENAI_ENDPOINT': req_body.get('AZURE_OPENAI_ENDPOINT'),
            'AZURE_OPENAI_API_KEY': req_body.get('AZURE_OPENAI_API_KEY'),
            'BLOB_CONNECTION_STRING': req_body.get('BLOB_CONNECTION_STRING'),
            'BLOB_CONTAINER_NAME': req_body.get('BLOB_CONTAINER_NAME'),
            'AZURE_SEARCH_EMBEDDING_SKILL_ENDPOINT': req_body.get('AZURE_SEARCH_EMBEDDING_SKILL_ENDPOINT'),
            'AZURE_SEARCH_KNOWLEDGE_STORE_CONNECTION_STRING': req_body.get('AZURE_SEARCH_KNOWLEDGE_STORE_CONNECTION_STRING'),
            'AI_SERVICES_RESOURCE_NAME': req_body.get('AI_SERVICES_RESOURCE_NAME'),
            'AZURE_SEARCH_COGNITIVE_SERVICES_KEY': req_body.get('AZURE_SEARCH_COGNITIVE_SERVICES_KEY'),
        }

        # Creating search indexes in Azure Search
        # The tenant name is hard-coded here; it becomes part of the index name prefix
        tenant = 'liantis'
        prefix = f"{tenant}-{config['BLOB_CONTAINER_NAME']}"
        index_resources = create_indexes(prefix, config['BLOB_CONNECTION_STRING'], config['BLOB_CONTAINER_NAME'], config)

        # Returning success message if indexes are created successfully
        return func.HttpResponse(f"Indexes Created {index_resources}", status_code=200)
    except Exception as e:
        # Logging error and returning error message if an error occurs
        logger.error(f"Could not create Index: {str(e)}")
        return func.HttpResponse(f"Could not create Index. {str(e)}", status_code=500)
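Because every value comes from the request body, a missing key only surfaces later as an error inside the index creation. A minimal sketch of an up-front check is shown below. The helper names build_config and missing_keys, and the choice of which keys count as required, are assumptions made for illustration and are not part of the original code.

```python
from __future__ import annotations

from typing import Any, Mapping

# Keys the IndexDocuments handler reads from the request body.
# Which of them are truly mandatory is an assumption made for this sketch.
REQUIRED_KEYS = (
    "AZURE_SEARCH_ADMIN_KEY",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "BLOB_CONNECTION_STRING",
    "BLOB_CONTAINER_NAME",
)


def build_config(req_body: Mapping[str, Any], keys: tuple[str, ...]) -> dict[str, Any]:
    """Copy the given keys from the request body (missing keys become None)."""
    return {key: req_body.get(key) for key in keys}


def missing_keys(
    config: Mapping[str, Any],
    required: tuple[str, ...] = REQUIRED_KEYS,
) -> list[str]:
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in required if not config.get(key)]
```

With a check like this, the handler could return a 400 response that lists the missing keys before any call to Azure is made, instead of a generic 500.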

The delete_indexes function ensures that the search indexes are gracefully removed from the system, minimizing any potential disruptions.

If you want to test in Postman, you can import the following curl command:

curl --location --request GET 'http://localhost:7071/api/DeleteIndexes' \
--header 'Content-Type: application/json' \
--data '{
    "AZURE_SEARCH_ADMIN_KEY": "<yourconfig>",
    "AZURE_SEARCH_SERVICE_ENDPOINT":"https://<yourconfig>.search.windows.net",
    "AZURE_SEARCH_INDEX_NAME":"azureblob-index",
    "AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME": "text-embedding-ada-002",
    "AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL": "text-embedding-ada-002",
    "AZURE_OPENAI_API_VERSION": "2023-03-15-preview",
    "AZURE_OPENAI_ENDPOINT": "https://<yourconfig>.openai.azure.com/",
    "AZURE_OPENAI_API_KEY": "<yourconfig>",
    "BLOB_CONNECTION_STRING":"DefaultEndpointsProtocol=https;AccountName=<yourconfig>;AccountKey=<yourconfig>;EndpointSuffix=core.windows.net",
    "BLOB_CONTAINER_NAME":"<yourconfig>",
    "AZURE_SEARCH_EMBEDDING_SKILL_ENDPOINT": "https://<yourconfig>.azurewebsites.net/api/chunk-embed?code=<yourconfig>"
}'
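If you would rather script the call than use Postman, the same request can be built with the Python standard library. This is a sketch that mirrors the curl command above (same URL, method, header, and JSON body); the function name build_delete_indexes_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any, Mapping


def build_delete_indexes_request(
    config: Mapping[str, Any],
    base_url: str = "http://localhost:7071/api",
) -> urllib.request.Request:
    """Build (but do not send) the same request as the curl command above."""
    body = json.dumps(dict(config)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/DeleteIndexes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors the curl command, which sends a JSON body with GET
    )
```

Once the Functions host is running locally, passing the returned object to urllib.request.urlopen sends the request.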

Azure Functions: Answering User Queries

The 'AskYourDocuments' code on GitHub.

Users expect quick and accurate responses to their queries. Our third Azure Function, AskYourDocuments, demonstrates how combining a language model with retrieval over the search indexes can provide answers that are grounded in the indexed documents.

This function showcases the power of custom language models in generating relevant and context-aware responses. It utilizes the AzureChatOpenAI class to interact with OpenAI’s models and the AzureCognitiveSearchRetriever class to retrieve relevant documents from the search indexes.
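The retrieve-then-answer flow described above can be sketched without any Azure dependency. In the sketch below, plain callables stand in for AzureCognitiveSearchRetriever (retrieve) and AzureChatOpenAI (chat); the Document class, the prompt wording, and the top_k parameter are assumptions for illustration, not the code of the actual function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    """Minimal stand-in for a retrieved search hit."""
    source: str
    content: str


def build_prompt(question: str, documents: List[Document]) -> str:
    """Combine the retrieved passages and the user question into one prompt."""
    context = "\n\n".join(f"[{doc.source}] {doc.content}" for doc in documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def ask_your_documents(
    question: str,
    retrieve: Callable[[str], List[Document]],  # stand-in for the search retriever
    chat: Callable[[str], str],                 # stand-in for the chat model
    top_k: int = 3,
) -> str:
    """Retrieve the top_k documents for the question, then ask the model."""
    documents = retrieve(question)[:top_k]
    if not documents:
        # Skip the model call when the search returns nothing to ground the answer
        return "No relevant documents were found."
    return chat(build_prompt(question, documents))
```

Keeping the retriever and the model behind simple callables also makes the flow easy to test with fakes before wiring in the real services.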

Creating and Managing Indexes

A critical part of our solution involves efficiently creating and managing search indexes. The create_indexes function takes care of this process by delegating to the ChunkIndexManager and the DocumentIndexManager, both of which were explained in the previous part.
The ChunkIndexManager is responsible for creating chunk-based search indexes. These indexes use specialized algorithms for efficient indexing and retrieval of large documents. On the other hand, the DocumentIndexManager handles document-based indexing, interacting with storage accounts to retrieve document data.
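The orchestration can be pictured roughly as follows. This is only a sketch: the IndexManager interface, its create_index method, the "-chunk" and "-document" suffixes, and the order of the two calls are all hypothetical, since the real managers' methods are not shown in this part.

```python
from typing import Dict, Protocol


class IndexManager(Protocol):
    """Hypothetical interface; the real managers' methods are not shown in this post."""

    def create_index(self, prefix: str) -> str:
        ...


def create_indexes_sketch(
    prefix: str,
    chunk_manager: IndexManager,
    document_manager: IndexManager,
) -> Dict[str, str]:
    """Create the chunk index, then the document index, and report both names."""
    chunk_index = chunk_manager.create_index(f"{prefix}-chunk")
    document_index = document_manager.create_index(f"{prefix}-document")
    return {"chunk_index": chunk_index, "document_index": document_index}


class FakeManager:
    """In-memory stand-in that records which index names were requested."""

    def __init__(self) -> None:
        self.created = []

    def create_index(self, prefix: str) -> str:
        name = f"{prefix}-index"
        self.created.append(name)
        return name
```

Returning the created names in one dictionary matches how IndexDocuments reports index_resources in its response, and the fake manager allows the flow to be exercised without an Azure subscription.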

Conclusion

In this part, we’ve explored a real-world scenario where Azure Functions, Azure Cognitive Search, custom language models, and external APIs come together to create an optimized document indexing and retrieval system. By combining the power of serverless computing, intelligent language understanding, and efficient search capabilities, organizations can enhance their data accessibility and provide users with meaningful insights.

The code example provided showcases how Azure Functions can be leveraged to automate and orchestrate complex processes, improving overall efficiency and user experience. By following the explanations and insights provided in this part, you’ll be well-equipped to implement similar solutions tailored to your organization’s unique requirements.

As you continue to explore the capabilities of Azure Functions and cognitive technologies, you’ll be better prepared to harness the full potential of serverless computing and intelligent data retrieval.

Want to know more?

This insight is part of a series where we go through the necessary steps to create and optimize Chat & AI Applications.


If you want to get started with creating your own AI-powered chatbot, contact us.