Hybrid Search: Vector + Keyword Techniques for better RAG retrieval

Hybrid Search-RAG combines vector embeddings for semantic understanding with traditional keyword search for exact matches, giving you the best of both worlds when retrieving relevant documents. Instead of relying on just one search method, hybrid search ranks results from multiple approaches and picks the most relevant ones.

Written by Gaurav | 22 min read

Have you ever asked a RAG system about “machine learning algorithms” and watched it completely miss documents that mention “ML algorithms” or “artificial intelligence methods”? That’s because pure vector search sometimes misses obvious keyword connections.

On the flip side, searching for “How does AI impact society?” with pure keyword search might miss documents that discuss the concept without using those exact words.

Hybrid search solves both problems.

Let me walk you through building a hybrid search system that will make your RAG applications significantly more reliable.

1. Understanding the Problem We’re Solving

Before we jump into code, let me explain why hybrid search matters with a simple example.

Imagine you have a document collection about machine learning, and someone asks: “What are the benefits of neural networks?”

A pure Vector Search converts your question into numbers (embeddings) and finds documents with similar “meaning.” Think of it like this: it understands that “benefits” and “advantages” mean similar things, or that “neural networks” relates to “deep learning.”

A pure Keyword Search looks for exact word matches. It’s like using Ctrl+F on steroids – it excels at finding documents that contain your exact terms but struggles with synonyms and context. It’s fast and precise but literal-minded.

The problem is that Vector search might miss a document titled “Advantages of Deep Learning Models” if the semantic similarity isn’t strong enough. Keyword search might miss it because it doesn’t contain the exact words “neural networks” or “benefits.”

Hybrid search runs both search approaches, then combines their results. It’s like having both a smart librarian (who understands meaning) and a precise filing assistant (who finds exact matches) working together.
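To make the keyword side of this problem concrete, here is a tiny sketch using the example above. The stopword list and the `content_words` helper are my own illustrative assumptions, not part of the system we build later; the point is only that literal matching finds no shared content words between the question and the title.

```python
import re

# A tiny, illustrative stopword list (an assumption for this sketch)
STOPWORDS = {"what", "are", "the", "of", "a", "an", "in"}

def content_words(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in tokens if token not in STOPWORDS}

query = "What are the benefits of neural networks?"
title = "Advantages of Deep Learning Models"

# Words that appear literally in both the query and the title
shared = content_words(query) & content_words(title)
print(shared)  # set(): exact matching alone finds nothing in common
```

A semantic method has a chance of linking “benefits” to “advantages”; a literal one does not, which is the gap hybrid search is meant to cover.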

2. Setting Up Your Environment

Let’s get our tools ready. I’ll assume you have Python and VS Code set up.

bash

conda create -n hybrid-rag python==3.12
conda activate hybrid-rag
pip install ipykernel
pip install langchain langchain-openai langchain-community openai faiss-cpu python-dotenv tiktoken pypdf rank-bm25 sentence-transformers

Now let’s import everything we need:

python

import os
from dotenv import load_dotenv

# LangChain components for RAG
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain.schema import Document

# For keyword search
from rank_bm25 import BM25Okapi

# Utility libraries
import numpy as np  # For numerical operations
from sentence_transformers import SentenceTransformer  # Alternative embeddings
import textwrap  # For nice text formatting
from typing import List, Tuple  # Type hints for better code
import re  # For text preprocessing

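# Load environment variables (your API keys) from a .env file
load_dotenv()

# Fail early with a clear message if the key is missing
# (assigning None to os.environ would raise a confusing TypeError)
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")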

3. Loading and Preparing Our Documents

For this tutorial, we will work with a PDF document. You can use a research paper, manual, or any document you want to make searchable.

You can download the pdf here.

python

# Load your PDF document
document_path = "Robotics.pdf"  # Replace with your PDF path

# PyPDFLoader reads PDF files and converts each page into a document object
pdf_loader = PyPDFLoader(document_path)
raw_documents = pdf_loader.load()

print(f"Loaded {len(raw_documents)} pages from the PDF")
print(f"Sample content: {raw_documents[0].page_content[:200]}...")

python

Loaded 57 pages from the PDF
Sample content: Comprehensive Guide to Robotics
Table of Contents
1. Introduction to Robotics
2. Historical Development of Robotics
3. Fundamental Concepts and Definitions
4. Types and Classifications of Robots
5. Ro...

The PyPDFLoader takes your PDF and converts each page into a “Document” object.

Now we need to split our documents into smaller, searchable chunks. This is crucial for both vector and keyword search effectiveness.

python

# Configure text splitting for optimal search performance
chunk_size = 600  # Character limit per chunk - larger chunks work better for hybrid search
chunk_overlap = 150  # How many characters overlap between chunks to preserve context

# RecursiveCharacterTextSplitter is smart about where it splits text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    # It tries these separators in order - paragraphs first, then sentences, then words
    separators=["\n\n", "\n", ". ", " ", ""]
)

# Split documents into chunks
document_chunks = text_splitter.split_documents(raw_documents)

print(f"Split {len(raw_documents)} pages into {len(document_chunks)} chunks")
print(f"Average chunk length: {np.mean([len(chunk.page_content) for chunk in document_chunks]):.0f} characters")

python

Split 57 pages into 346 chunks
Average chunk length: 526 characters

Imagine trying to answer “What is machine learning?” by giving someone an entire 300-page textbook. They’d get overwhelmed! Instead, we want to find just the relevant pages or sections.
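To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified splitter that cuts fixed-width character windows. It is only a sketch: `split_with_overlap` is a hypothetical helper, and RecursiveCharacterTextSplitter behaves differently because it prefers to break on the separators we listed.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows; each window repeats the last
    chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Notice that “d” and “g” each appear in two neighbouring chunks. That repeated text is what lets a sentence that straddles a boundary stay readable in at least one chunk.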

4. Building the Vector Search Component

Vector search converts text into numerical representations that capture semantic meaning. Think of it like translating words into a mathematical language that computers can understand and compare.

python

# Initialize embeddings model - this converts text into vectors (arrays of numbers)
embeddings_model = OpenAIEmbeddings()

# Create vector store - this is like building a searchable database of vectors
print("Creating vector embeddings... (this may take a moment)")
vector_store = FAISS.from_documents(
    documents=document_chunks,  # Our text chunks
    embedding=embeddings_model  # The model that converts text to vectors
)

print(f"Created vector store with {len(document_chunks)} document embeddings")

python

Creating vector embeddings... (this may take a moment)
Created vector store with 346 document embeddings

What are embeddings? Imagine each piece of text as a point in a multi-dimensional space. Related concepts end up close to each other in this space. For example, “dog” and “puppy” would be close together, “car” and “automobile” would be close together.
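Here is a small sketch of what “close together” means numerically. The three-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions and come from a model), and cosine similarity is one common way to compare them.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", invented for this example
toy_vectors = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}

dog_puppy = cosine_similarity(toy_vectors["dog"], toy_vectors["puppy"])
dog_car = cosine_similarity(toy_vectors["dog"], toy_vectors["car"])

print(f"dog vs puppy: {dog_puppy:.2f}")
print(f"dog vs car:   {dog_car:.2f}")
print("dog is closer to puppy:", dog_puppy > dog_car)
```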

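FAISS (Facebook AI Similarity Search) is like a super-fast librarian. It is a library built for similarity search over vectors: for a collection this small it can compare the query against every stored vector very quickly, and for much larger collections it offers index types that avoid checking each vector one by one.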
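To make “finding the most similar vectors” concrete, here is the brute-force version of the search in plain Python. It is a sketch of the idea only, not how FAISS is implemented; `nearest_neighbors` and the toy vectors are assumptions for illustration. It ranks by squared Euclidean distance, which gives the same ordering as the plain distance.

```python
def squared_l2_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbors(query, vectors, k):
    """Compare the query against every vector and return the k closest
    as (index, distance) pairs, closest first."""
    distances = [(squared_l2_distance(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()  # smallest distance first
    return [(i, d) for d, i in distances[:k]]

# Hypothetical 2-D vectors standing in for chunk embeddings
stored_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query_vector = [0.9, 1.2]

neighbors = nearest_neighbors(query_vector, stored_vectors, k=2)
for index, distance in neighbors:
    print(f"chunk {index}: distance {distance:.2f}")
```

The lower the distance, the better the match, which is why the hybrid code later has to flip distances into similarities before mixing them with BM25 scores.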

Let’s test our vector search to see how it works:

python

# Test vector search with a sample query
test_query = "AI in robotics"

# similarity_search finds the most similar document chunks
vector_results = vector_store.similarity_search(test_query, k=3)  # k=3 means "give me top 3 results"

print("Vector Search Results:")
print("=" * 50)
for i, doc in enumerate(vector_results, 1):
    print(f"Result {i}:")
    # Show first 200 characters of each result
    print(textwrap.fill(doc.page_content[:200], width=80) + "...")
    print("-" * 40)

python

Vector Search Results:
==================================================
Result 1:
The Role of AI in Robotics Artificial intelligence enhances robotics by
providing capabilities that go beyond traditional programmed responses. AI
enables robots to perceive and understand complex env...
----------------------------------------
Result 2:
Artificial Intelligence Integration The integration of artificial intelligence
with robotics represents one of the most significant current trends,
transforming robots from programmed machines into in...
----------------------------------------
Result 3:
domains. While the timeline for AGI remains uncertain, its eventual development
could lead to robots capable of scientific research, creative problem-solving,
and autonomous learning that could accele...
----------------------------------------

Notice how vector search finds semantically related content, even if it doesn’t contain your exact words. This is its key strength!

5. Building the Keyword Search Component

Now let’s implement BM25, the gold standard for keyword search. BM25 (Best Matching 25) is an algorithm that ranks documents based on how well they match your search terms.

python

# First, we need to prepare our text for BM25 search
def preprocess_text_for_bm25(text):
    """
    Clean and tokenize text for BM25 search
    This function prepares text the way BM25 expects it
    """
    # Convert to lowercase (so "Machine" and "machine" are treated the same)
    # Remove special characters (keep only letters, numbers, and spaces)
    text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text.lower())

    # Split into individual words (tokens) and remove empty strings
    tokens = [token for token in text.split() if token.strip()]
    return tokens

# Let's see what this preprocessing does
sample_text = "Machine Learning algorithms, including Neural Networks!"
processed = preprocess_text_for_bm25(sample_text)
print(f"Original: {sample_text}")
print(f"Processed: {processed}")

python

Original: Machine Learning algorithms, including Neural Networks!
Processed: ['machine', 'learning', 'algorithms', 'including', 'neural', 'networks']

Why do we preprocess text? BM25 works with individual words (tokens). By standardizing the text (lowercase, removing punctuation), we ensure that “Machine,” “machine,” and “machine.” are all treated as the same word.

Now let’s create our BM25 search index:

python

# Create BM25 corpus (collection of documents for searching)
document_texts = [chunk.page_content for chunk in document_chunks]  # Extract just the text
tokenized_corpus = [preprocess_text_for_bm25(text) for text in document_texts]  # Process each document

# Initialize BM25 with our processed documents
bm25_searcher = BM25Okapi(tokenized_corpus)

print(f"BM25 index created with {len(tokenized_corpus)} documents")

# Let's peek at what a tokenized document looks like
print(f"Sample tokenized document: {tokenized_corpus[0][:10]}...")  # First 10 tokens

python

BM25 index created with 346 documents
Sample tokenized document: ['comprehensive', 'guide', 'to', 'robotics', 'table', 'of', 'contents', '1', 'introduction', 'to']...

What is BM25 doing? It’s analyzing each document to understand:
– Term Frequency (TF): How often does each word appear in this document?
– Document Frequency (DF): How many documents contain this word?
– Document Length: Longer documents get slightly penalized

The algorithm gives higher scores to documents that contain your search terms frequently, that contain the rarer words from your query (common words like “the” matter less), and that aren’t too long (focused documents rank higher).
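The three ingredients above can be sketched in a few lines of plain Python. This is a minimal illustration of a BM25-style score, not the rank_bm25 implementation: the function name is mine, `k1=1.5` and `b=0.75` are commonly used values, and I use an IDF variant with `+ 1` inside the logarithm so that it never goes negative, so the numbers will not match BM25Okapi exactly.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with a BM25-style formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # a term that is absent contributes nothing
            # Rare terms get a larger weight than common ones
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Longer-than-average documents are penalized through b
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical, already-tokenized toy documents
toy_docs = [
    ["robot", "arm", "control"],
    ["robot", "vision", "camera", "robot"],
    ["the", "history", "of", "computing"],
]
toy_scores = bm25_scores(["robot", "camera"], toy_docs)
print([round(s, 2) for s in toy_scores])
```

The second document wins because it repeats “robot” and also contains the rarer term “camera”; the third shares no terms with the query and scores 0.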

Let’s test keyword search:

python

# Test BM25 search
test_query = "AI in Robotics"
query_tokens = preprocess_text_for_bm25(test_query)  # Process query same way as documents
print(f"Query tokens: {query_tokens}")

# Get BM25 scores for all documents
bm25_scores = bm25_searcher.get_scores(query_tokens)

# Get indices of top 3 scoring documents
top_indices = np.argsort(bm25_scores)[::-1][:3]  # [::-1] reverses to get highest scores first

print("Keyword Search Results:")
print("=" * 50)
for i, idx in enumerate(top_indices, 1):
    print(f"Result {i} (BM25 Score: {bm25_scores[idx]:.2f}):")
    print(textwrap.fill(document_texts[idx][:200], width=80) + "...")
    print("-" * 40)

python

Query tokens: ['ai', 'in', 'robotics']
Keyword Search Results:
==================================================
Result 1 (BM25 Score: 8.15):
time through experience. The symbiotic relationship between AI and robotics is
particularly important because robotics provides AI with a physical embodiment
that enables interaction with the real wor...
----------------------------------------
Result 2 (BM25 Score: 7.87):
The Role of AI in Robotics Artificial intelligence enhances robotics by
providing capabilities that go beyond traditional programmed responses. AI
enables robots to perceive and understand complex env...
----------------------------------------
Result 3 (BM25 Score: 7.33):
Generalization and robustness challenges arise from the need for AI systems to
perform well in situations that differ from their training data. This is
particularly challenging in robotics where the d...
----------------------------------------

Understanding BM25 scores:
– Higher scores = better matches
– Score of 0 = no matching terms found
– Typical scores range from 0 to 20+ (no fixed maximum)
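– Documents that contain all of your query terms, especially the rarer ones, often score highest (BM25 ignores word order, so an exact phrase match earns no extra credit)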

You’ll notice that keyword search excels at finding documents with your exact terms, but might miss semantically related content that uses different words.

6. Implementing Hybrid Search

This is the core of the system: combining both search methods.

There are several ways to do this. I’ll show you a simple and effective one: a weighted sum of normalized scores.

python

def perform_hybrid_search(query: str, vector_store, bm25_searcher, document_chunks, 
                         k: int = 5, alpha: float = 0.7):
    """
    Perform hybrid search combining vector and keyword search

    Args:
        query: What the user is searching for
        vector_store: Our FAISS vector database
        bm25_searcher: Our BM25 keyword searcher
        document_chunks: Original document pieces
        k: How many results to return
        alpha: Weight for vector search (0.0 = only BM25, 1.0 = only vector)
    """

    # STEP 1: Get results from vector search
    # We ask for k*2 results to have more options for combination
    vector_results = vector_store.similarity_search_with_score(query, k=k*2)

    # STEP 2: Get results from keyword search
    query_tokens = preprocess_text_for_bm25(query)
    bm25_scores = bm25_searcher.get_scores(query_tokens)

    # STEP 3: Normalize scores so we can combine them fairly
    # Vector and BM25 scores are on different scales, so we need to normalize them

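    # Map each chunk's text to its position once, so lookups are O(1)
    # (if two chunks share identical text, the first position wins)
    content_to_index = {}
    for i, chunk in enumerate(document_chunks):
        content_to_index.setdefault(chunk.page_content, i)

    vector_scores = {}
    for doc, score in vector_results:
        # FAISS returns distance (lower = more similar), so we convert to similarity
        # The formula 1/(1+distance) converts distance to similarity (0 to 1 range)
        normalized_score = 1 / (1 + score)

        # Find which document chunk this corresponds to; skip it if it is
        # somehow missing instead of crashing with StopIteration
        doc_index = content_to_index.get(doc.page_content)
        if doc_index is None:
            continue
        vector_scores[doc_index] = normalized_score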

    # Normalize BM25 scores to 0-1 range
    if max(bm25_scores) > 0:
        # Divide all scores by the maximum score
        bm25_normalized = bm25_scores / max(bm25_scores)
    else:
        # If no matches found, all scores remain 0
        bm25_normalized = bm25_scores

    # STEP 4: Combine the scores using weighted average
    hybrid_scores = {}

    # BM25 scores every chunk, so the candidate set is the whole corpus;
    # chunks outside the vector top k*2 simply get a vector score of 0
    all_indices = set(vector_scores.keys()) | set(range(len(bm25_scores)))

    for idx in all_indices:
        # Vector score for this chunk (0 if it was not in the vector top k*2)
        vector_score = vector_scores.get(idx, 0)

        # Normalized BM25 score for this chunk (the bounds check is purely defensive)
        bm25_score = bm25_normalized[idx] if idx < len(bm25_normalized) else 0

        # Weighted combination: alpha controls the balance
        # alpha=0.7 means 70% vector search, 30% keyword search
        hybrid_scores[idx] = alpha * vector_score + (1 - alpha) * bm25_score

    # STEP 5: Get top k results
    top_indices = sorted(hybrid_scores.keys(), key=lambda x: hybrid_scores[x], reverse=True)[:k]

    # STEP 6: Package results with all the score information
    results = []
    for idx in top_indices:
        results.append({
            'document': document_chunks[idx],
            'hybrid_score': hybrid_scores[idx],
            'vector_score': vector_scores.get(idx, 0),
            'bm25_score': bm25_normalized[idx] if idx < len(bm25_normalized) else 0
        })

    return results

Breaking down the hybrid search logic:

Why k*2 for vector results? We get extra vector results so we have more candidates to work with when combining scores.
Score normalization: Vector search returns distances (lower is better), while BM25 returns scores (higher is better). We normalize both to 0-1 range so they’re comparable.
Alpha parameter: This is your control knob:

– alpha = 1.0: Pure vector search (semantic only)
– alpha = 0.0: Pure keyword search (BM25 only)
– alpha = 0.7: Balanced toward semantic (70% vector, 30% keyword)

Weighted combination: The formula alpha * vector + (1-alpha) * keyword is a weighted average of the two normalized scores. Higher alpha values favor semantic search; lower values favor keyword matching.
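To see the normalization and weighting in isolation, here is the same arithmetic as standalone helpers. The function names are mine and are not part of the tutorial code. The last two lines use the vector and BM25 scores of the first two results from the test run in this section, and they reproduce the printed hybrid scores.

```python
def distance_to_similarity(distance):
    """Map a distance (lower = closer) to a similarity in (0, 1]."""
    return 1 / (1 + distance)

def max_normalize(scores):
    """Divide by the largest score so the best match becomes 1.0."""
    if not scores:
        return []
    top = max(scores)
    return [s / top for s in scores] if top > 0 else list(scores)

def combine(vector_score, bm25_score, alpha=0.7):
    """Weighted average: alpha for the vector side, the rest for BM25."""
    return alpha * vector_score + (1 - alpha) * bm25_score

print(distance_to_similarity(0.25))       # 0.8
print(max_normalize([8.0, 4.0, 0.0]))     # [1.0, 0.5, 0.0]

# Vector and BM25 scores taken from Result 1 and Result 2 of the test run
print(round(combine(0.784, 1.000), 3))    # 0.849
print(round(combine(0.787, 0.837), 3))    # 0.802
```

Keep in mind that the two normalizations are not equivalent: max-normalization always gives the best BM25 match a score of 1.0, while 1/(1+distance) only reaches 1.0 when the distance is exactly 0.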

Let’s test our hybrid search:

python

# Test hybrid search with different types of queries
test_query = "What are the trade-offs between hydraulic, pneumatic, and electric actuators?"
hybrid_results = perform_hybrid_search(
    query=test_query,
    vector_store=vector_store,
    bm25_searcher=bm25_searcher,
    document_chunks=document_chunks,
    k=5,
    alpha=0.7  # 70% vector search, 30% keyword search
)

print("Hybrid Search Results:")
print("=" * 60)
for i, result in enumerate(hybrid_results, 1):
    print(f"Result {i}:")
    print(f"  Hybrid Score: {result['hybrid_score']:.3f}")
    print(f"  Vector Score: {result['vector_score']:.3f}")
    print(f"  BM25 Score:   {result['bm25_score']:.3f}")
    print(f"  Content: {textwrap.fill(result['document'].page_content[:200], width=75)}...")
    print("-" * 50)

python

Hybrid Search Results:
============================================================
Result 1:
  Hybrid Score: 0.849
  Vector Score: 0.784
  BM25 Score:   1.000
  Content: Actuators can be classified by their energy source (electric, hydraulic,
pneumatic, or alternative technologies), their motion type (rotary or
linear), their control method (position, velocity, or for...
--------------------------------------------------
Result 2:
  Hybrid Score: 0.802
  Vector Score: 0.787
  BM25 Score:   0.837
  Content: Pneumatic actuators use compressed air to generate motion and are
characterized by fast response times and clean operation. They are commonly
used in pick-and-place applications and environments where...
--------------------------------------------------
Result 3:
  Hybrid Score: 0.796
  Vector Score: 0.771
  BM25 Score:   0.855
  Content: motion and rotary motion through rack-and-pinion mechanisms. These
cylinders are available in various configurations including single-acting,
double-acting, and specialized designs for specific applic...
--------------------------------------------------
Result 4:
  Hybrid Score: 0.785
  Vector Score: 0.808
  BM25 Score:   0.732
  Content: achieve excellent controllability through servo valves and proportional
valves that provide precise flow and pressure control. The primary
advantages of hydraulic actuators include very high power-to-...
--------------------------------------------------
Result 5:
  Hybrid Score: 0.776
  Vector Score: 0.799
  BM25 Score:   0.722
  Content: DC motors provide high efficiency and long life, making them suitable for
continuous operation applications. Hydraulic actuators use pressurized
fluid to generate motion and can produce very high forc...
--------------------------------------------------

Look for patterns:
– Documents with high vector scores are close to your query in meaning
– Documents with high BM25 scores contain your exact words
– The best hybrid results often have decent scores in both categories
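A weighted sum is only one of the several ways to combine two rankings. For comparison, here is a sketch of reciprocal rank fusion (RRF), a widely used alternative that this tutorial does not use. It ignores the raw scores and works from rank positions only, so no normalization is needed. The chunk indices are made up, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    fused_scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused_scores[doc_id] = fused_scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused_scores, key=fused_scores.get, reverse=True)

# Hypothetical chunk indices returned by each method, best first
vector_ranking = [3, 1, 7]
bm25_ranking = [1, 9, 3]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)  # [1, 3, 9, 7]
```

Chunks 1 and 3 rise to the top because both methods returned them. The trade-off is that RRF has no alpha-style knob for favoring one method, unless you add per-list weights yourself.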

7. Comparing All Three Approaches

Let’s create a side-by-side comparison to see the differences. This will help you understand when each method shines:

python

def compare_search_methods(query: str):
    """Compare vector, keyword, and hybrid search results"""

    print(f" SEARCH QUERY: '{query}'")
    print("=" * 80)

    # VECTOR SEARCH: Focuses on meaning and concepts
    print("\n VECTOR SEARCH (Semantic Similarity):")
    print("-" * 50)
    vector_results = vector_store.similarity_search(query, k=3)
    for i, doc in enumerate(vector_results, 1):
        print(f"{i}. {textwrap.fill(doc.page_content[:150], width=70)}...")
        print()

    # KEYWORD SEARCH: Focuses on exact word matches
    print(" KEYWORD SEARCH (BM25):")
    print("-" * 50)
    query_tokens = preprocess_text_for_bm25(query)
    bm25_scores = bm25_searcher.get_scores(query_tokens)
    top_bm25_indices = np.argsort(bm25_scores)[::-1][:3]

    for i, idx in enumerate(top_bm25_indices, 1):
        print(f"{i}. {textwrap.fill(document_texts[idx][:150], width=70)}...")
        print(f"   BM25 Score: {bm25_scores[idx]:.2f}")
        print()

    # HYBRID SEARCH: Combines both approaches
    print(" HYBRID SEARCH (Combined Approach):")
    print("-" * 50)
    hybrid_results = perform_hybrid_search(query, vector_store, bm25_searcher, 
                                         document_chunks, k=3)

    for i, result in enumerate(hybrid_results, 1):
        print(f"{i}. {textwrap.fill(result['document'].page_content[:150], width=70)}...")
        print(f"   Hybrid: {result['hybrid_score']:.3f} | Vector: {result['vector_score']:.3f} | BM25: {result['bm25_score']:.3f}")
        print()

# Test with different types of queries to see how each method performs
test_queries = [
    "LIDAR sensor resolution accuracy",  # Technical terms - keyword search should excel
    "robots understanding human emotions",  # Conceptual question - vector search should excel
    "industrial robot safety standards compliance",  # Mixed - both should contribute
    "DOF kinematics inverse transformation matrix"  # Specific technical terms
]

for query in test_queries:
    compare_search_methods(query)
    print("\n" + "="*80 + "\n")

python

 SEARCH QUERY: 'LIDAR sensor resolution accuracy'
================================================================================

 VECTOR SEARCH (Semantic Similarity):
--------------------------------------------------
1. Laser rangefinders use laser light to measure distances with high
accuracy and precision. Single- point laser rangefinders can measure
distances to sp...

2. expensive than conventional cameras, these systems enable applications
such as agricultural monitoring, mineral identification, and quality
inspection...

3. accuracy, range, and environmental requirements. Ultrasonic sensors
use sound waves to measure distance by calculating the time required
for sound to...

 KEYWORD SEARCH (BM25):
--------------------------------------------------
1. returning to the same position) but not accurate (that position might
not be where it was commanded to go), or vice versa. These
characteristics are c...
   BM25 Score: 9.41

2. Laser rangefinders use laser light to measure distances with high
accuracy and precision. Single- point laser rangefinders can measure
distances to sp...
   BM25 Score: 7.95

3. and object properties. Some systems can even detect chemical
properties through artificial smell and taste capabilities.
Proprioceptive sensing that p...
   BM25 Score: 6.53

 HYBRID SEARCH (Combined Approach):
--------------------------------------------------
1. returning to the same position) but not accurate (that position might
not be where it was commanded to go), or vice versa. These
characteristics are c...
   Hybrid: 0.791 | Vector: 0.701 | BM25: 1.000

2. Laser rangefinders use laser light to measure distances with high
accuracy and precision. Single- point laser rangefinders can measure
distances to sp...
   Hybrid: 0.767 | Vector: 0.733 | BM25: 0.845

3. sensor data. Multi-sensor architectures require careful design to
handle data synchronization, communication bandwidth, and
computational requirements...
   Hybrid: 0.692 | Vector: 0.696 | BM25: 0.682


================================================================================

 SEARCH QUERY: 'robots understanding human emotions'
================================================================================

 VECTOR SEARCH (Semantic Similarity):
--------------------------------------------------
1. dynamics. Emotional intelligence in robots involves the ability to
recognize, understand, and respond appropriately to human emotions.
This capability...

2. require careful coordination and communication. These operations must
be safe, efficient, and intuitive for human users while accommodating
variations...

3. on context, user preferences, and environmental conditions. Affective
computing enables robots to understand and respond to human emotions
more effect...

 KEYWORD SEARCH (BM25):
--------------------------------------------------
1. on context, user preferences, and environmental conditions. Affective
computing enables robots to understand and respond to human emotions
more effect...
   BM25 Score: 7.23

2. dynamics. Emotional intelligence in robots involves the ability to
recognize, understand, and respond appropriately to human emotions.
This capability...
   BM25 Score: 7.05

3. Natural interaction paradigms seek to make human-robot interaction as
natural and intuitive as human-human interaction. This includes
developing robot...
   BM25 Score: 6.94

 HYBRID SEARCH (Combined Approach):
--------------------------------------------------
1. dynamics. Emotional intelligence in robots involves the ability to
recognize, understand, and respond appropriately to human emotions.
This capability...
   Hybrid: 0.863 | Vector: 0.815 | BM25: 0.976

2. on context, user preferences, and environmental conditions. Affective
computing enables robots to understand and respond to human emotions
more effect...
   Hybrid: 0.857 | Vector: 0.796 | BM25: 1.000

3. require careful coordination and communication. These operations must
be safe, efficient, and intuitive for human users while accommodating
variations...
   Hybrid: 0.739 | Vector: 0.799 | BM25: 0.600


================================================================================

 SEARCH QUERY: 'industrial robot safety standards compliance'
================================================================================

 VECTOR SEARCH (Semantic Similarity):
--------------------------------------------------
1. validation procedures to ensure safety systems function correctly
under all conditions. Standards and regulations provide frameworks for
ensuring robo...

2. addressing legitimate concerns about robot deployment. Regulatory
approaches include development of adaptive regulatory frameworks,
international coor...

3. may include mechanical hazards from moving parts, electrical hazards
from power systems, thermal hazards from heating elements, and
behavioral hazards...

 KEYWORD SEARCH (BM25):
--------------------------------------------------
1. validation procedures to ensure safety systems function correctly
under all conditions. Standards and regulations provide frameworks for
ensuring robo...
   BM25 Score: 14.24

2. The regulatory environment for robotics is still evolving, creating
uncertainty and potential barriers to innovation and deployment.
Regulatory fragme...
   BM25 Score: 12.42

3. deployment and create uncertainty about safety and performance
requirements. Certification and testing procedures for complex robotic
systems can be e...
   BM25 Score: 10.72

 HYBRID SEARCH (Combined Approach):
--------------------------------------------------
1. validation procedures to ensure safety systems function correctly
under all conditions. Standards and regulations provide frameworks for
ensuring robo...
   Hybrid: 0.867 | Vector: 0.811 | BM25: 1.000

2. The regulatory environment for robotics is still evolving, creating
uncertainty and potential barriers to innovation and deployment.
Regulatory fragme...
   Hybrid: 0.798 | Vector: 0.766 | BM25: 0.872

3. addressing legitimate concerns about robot deployment. Regulatory
approaches include development of adaptive regulatory frameworks,
international coor...
   Hybrid: 0.690 | Vector: 0.772 | BM25: 0.498


================================================================================

 SEARCH QUERY: 'DOF kinematics inverse transformation matrix'
================================================================================

 VECTOR SEARCH (Semantic Similarity):
--------------------------------------------------
1. Kinematics and Dynamics Kinematics deals with the motion of robots
without considering the forces that cause the motion. It involves
understanding the...

2. Degrees of Freedom Degrees of freedom (DOF) represent the number of
independent ways a robot can move. Each joint in a robot typically
provides one de...

3. within its workspace and orient it in any desired direction. Human
arms, for comparison, have seven degrees of freedom, providing
redundancy that allo...

 KEYWORD SEARCH (BM25):
--------------------------------------------------
1. Kinematics and Dynamics Kinematics deals with the motion of robots
without considering the forces that cause the motion. It involves
understanding the...
   BM25 Score: 14.32

2. within its workspace and orient it in any desired direction. Human
arms, for comparison, have seven degrees of freedom, providing
redundancy that allo...
   BM25 Score: 6.05

3. degrees of freedom and the complexity of the robot's structure.
Advanced mathematical tools, including matrix algebra and differential
equations, are...
   BM25 Score: 5.08

 HYBRID SEARCH (Combined Approach):
--------------------------------------------------
1. Kinematics and Dynamics Kinematics deals with the motion of robots
without considering the forces that cause the motion. It involves
understanding the...
   Hybrid: 0.790 | Vector: 0.700 | BM25: 1.000

2. within its workspace and orient it in any desired direction. Human
arms, for comparison, have seven degrees of freedom, providing
redundancy that allo...
   Hybrid: 0.599 | Vector: 0.675 | BM25: 0.423

3. Degrees of Freedom Degrees of freedom (DOF) represent the number of
independent ways a robot can move. Each joint in a robot typically
provides one de...
   Hybrid: 0.583 | Vector: 0.697 | BM25: 0.317


================================================================================

Performance patterns you’ll typically see:

Vector search: Tends to do well on conceptual queries, but can struggle with exact technical terms
Keyword search: Strong when the query’s specific terms appear in the documents, but misses conceptual relationships
Hybrid search: Usually gives the most balanced results across query types
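One way to build intuition for these patterns is to sweep alpha and watch the ranking change. The sketch below uses three hypothetical chunks with made-up, already normalized (vector, BM25) scores. On real data you would rerun perform_hybrid_search with different alpha values instead.

```python
def rank_by_alpha(candidates, alpha):
    """Order candidate names by alpha * vector + (1 - alpha) * bm25."""
    def hybrid_score(name):
        vector_score, bm25_score = candidates[name]
        return alpha * vector_score + (1 - alpha) * bm25_score
    return sorted(candidates, key=hybrid_score, reverse=True)

# Hypothetical (vector_score, bm25_score) pairs, invented for this example
candidates = {
    "conceptual_match": (0.90, 0.20),
    "keyword_match": (0.40, 1.00),
    "balanced_match": (0.75, 0.70),
}

for alpha in (0.0, 0.5, 1.0):
    print(f"alpha={alpha}: {rank_by_alpha(candidates, alpha)}")
```

At alpha=0.0 the keyword match ranks first, at alpha=1.0 the conceptual match does, and at alpha=0.5 the chunk that scores reasonably on both sides comes out on top.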

8. Building a Complete RAG System with Hybrid Search

Now let’s put it all together into a complete question-answering system. This is where your hybrid search becomes a smart assistant:

python

def create_rag_with_hybrid_search(question, alpha = 0.7, k = 3):
    """
    Complete RAG system using hybrid search
    This function takes a question and returns a comprehensive answer
    """

    print(f" Question: {question}")
    print("\n Searching with hybrid approach...")

    # STEP 1: Get relevant documents using hybrid search
    search_results = perform_hybrid_search(
        query=question,
        vector_store=vector_store,
        bm25_searcher=bm25_searcher,
        document_chunks=document_chunks,
        k=k,  # Number of document chunks to retrieve
        alpha=alpha  # Balance between vector and keyword search
    )

    # STEP 2: Combine retrieved content into context
    context_pieces = []
    for result in search_results:
        context_pieces.append(result['document'].page_content)

    # Join all pieces with double newlines for readability
    combined_context = "\n\n".join(context_pieces)

    # STEP 3: Generate answer using retrieved context
    llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")  # Low temperature for consistent answers

    # Create a detailed prompt for the language model
    prompt = f"""Based on the following context, answer the question comprehensively and accurately.
Use only the information provided in the context. If the context doesn't contain enough information 
to fully answer the question, acknowledge this limitation.

Context:
{combined_context}

Question: {question}

Answer:"""

    # Generate the response
    response = llm.invoke(prompt)

    # STEP 4: Display results
    print("\n Answer:")
    print(textwrap.fill(response.content, width=80))

    print(f"\n Sources (Found using hybrid search with α={alpha}):")
    print("-" * 50)
    for i, result in enumerate(search_results, 1):
        print(f"Source {i} (Hybrid Score: {result['hybrid_score']:.3f}):")
        print(f"  Vector Score: {result['vector_score']:.3f}")
        print(f"  BM25 Score: {result['bm25_score']:.3f}")
        print(f"  Content: {textwrap.fill(result['document'].page_content[:200], width=75)}...")
        print()

    return response.content, search_results

# Test the complete system with different types of questions
test_questions = [
    "How will quantum computing impact robotic control systems and optimization?",
    "How do AI and deep learning help robots adapt to unstructured environments?",
    "What makes swarm robotics more capable than individual robot systems?"
]

for question in test_questions:
    print("\n" + "=" * 85)
    answer, sources = create_rag_with_hybrid_search(question, alpha=0.6)

Output:

=====================================================================================
 Question: How will quantum computing impact robotic control systems and optimization?

 Searching with hybrid approach...

 Answer:
Quantum computing has the potential to significantly impact robotic control
systems and optimization by enabling the solution of complex optimization
problems that are currently intractable with classical computing methods. This
capability could lead to more efficient and effective control algorithms for
robots, allowing them to make better decisions and perform tasks more optimally.
Additionally, quantum sensors could enhance robot perception systems by
providing unprecedented sensitivity and precision, further improving the robots'
ability to interact with their environments and execute complex instructions.
Overall, the integration of quantum computing into robotic systems could lead to
advancements in their intelligence and operational capabilities.

 Sources (Found using hybrid search with α=0.6):
--------------------------------------------------
Source 1 (Hybrid Score: 0.875):
  Vector Score: 0.792
  BM25 Score: 1.000
  Content: communication and could lead to robots that can understand and respond to
complex instructions in natural language. Quantum computing, while still in
its early stages, holds the potential to revolutio...

Source 2 (Hybrid Score: 0.778):
  Vector Score: 0.751
  BM25 Score: 0.820
  Content: Shared autonomy systems allow humans and robots to share control and
decision-making responsibilities dynamically. These systems can adapt the
level of robot autonomy based on the situation and human...

Source 3 (Hybrid Score: 0.700):
  Vector Score: 0.768
  BM25 Score: 0.598
  Content: approach can enable capabilities that would not be possible with onboard
computing alone but requires careful consideration of communication
requirements and security issues. The continued advancement...


=====================================================================================
 Question: How do AI and deep learning help robots adapt to unstructured environments?

 Searching with hybrid approach...

 Answer:
AI and deep learning help robots adapt to unstructured environments by providing
them with the ability to perceive and understand complex surroundings, make
decisions under uncertainty, and learn from experience. Specifically, deep
learning techniques enable robots to process high-dimensional sensor data,
allowing them to recognize patterns and make sense of their environment. This
capability is crucial for navigating and operating in unpredictable settings
where traditional programmed responses may fall short.  Additionally, AI
facilitates learning and adaptation through methods such as online learning,
which allows robots to update their knowledge and skills in real-time as they
encounter new situations. Reinforcement learning enables robots to discover
optimal behaviors by engaging in trial and error, while imitation learning
allows them to acquire skills by observing and mimicking human actions. These
learning approaches empower robots to improve their performance over time and
effectively respond to novel tasks and changing conditions in unstructured
environments.   Furthermore, the integration of edge AI and distributed
intelligence enhances real-time decision-making capabilities, enabling robots to
operate independently without relying on cloud connectivity. This is
particularly important in dynamic environments where immediate responses are
necessary. Overall, the combination of AI and deep learning equips robots with
the flexibility and adaptability needed to thrive in complex, unstructured
settings.

 Sources (Found using hybrid search with α=0.6):
--------------------------------------------------
Source 1 (Hybrid Score: 0.871):
  Vector Score: 0.785
  BM25 Score: 1.000
  Content: The Role of AI in Robotics Artificial intelligence enhances robotics by
providing capabilities that go beyond traditional programmed responses. AI
enables robots to perceive and understand complex env...

Source 2 (Hybrid Score: 0.842):
  Vector Score: 0.790
  BM25 Score: 0.920
  Content: High-level planners generate overall strategies and goals, while low-level
planners handle detailed execution. This approach can make complex planning
problems more tractable while providing flexibili...

Source 3 (Hybrid Score: 0.782):
  Vector Score: 0.794
  BM25 Score: 0.765
  Content: robots to learn complex behaviors from high-dimensional sensor data.
Reinforcement learning enables robots to learn optimal behaviors through
trial and error, while imitation learning allows them to a...


=====================================================================================
 Question: What makes swarm robotics more capable than individual robot systems?

 Searching with hybrid approach...

 Answer:
Swarm robotics is more capable than individual robot systems due to its reliance
on multiple simple robots working together to achieve collective goals. This
approach is inspired by social insects, where complex behaviors can emerge from
the interactions of simple individuals. The coordination among numerous robots
allows for enhanced problem-solving capabilities, as they can share information,
distribute tasks, and adapt to changing environments more effectively than a
single robot could. Additionally, swarm robotics can leverage redundancy and
parallel processing, enabling the system to accomplish large-scale tasks such as
environmental restoration, disaster response, or space exploration more
efficiently.

 Sources (Found using hybrid search with α=0.6):
--------------------------------------------------
Source 1 (Hybrid Score: 0.750):
  Vector Score: 0.768
  BM25 Score: 0.724
  Content: technical challenges. Humanoid robots are designed to resemble and mimic
human behavior and appearance. While often seen as the ultimate goal of
robotics, humanoid robots face significant technical ch...

Source 2 (Hybrid Score: 0.734):
  Vector Score: 0.770
  BM25 Score: 0.681
  Content: by social insects, swarm robotics explores how complex behaviors can emerge
from simple individual behaviors and local interactions. Soft robots use
flexible materials and structures, allowing them to...

Source 3 (Hybrid Score: 0.704):
  Vector Score: 0.755
  BM25 Score: 0.629
  Content: domains. While the timeline for AGI remains uncertain, its eventual
development could lead to robots capable of scientific research, creative
problem-solving, and autonomous learning that could accele...

Pro tip: Start with alpha=0.6 for most applications, then adjust based on your results. Alpha is the weight on the vector score, so if you’re getting too many irrelevant semantic matches, lower it to give BM25 more influence. If you’re missing obviously related content that uses different wording, raise it.
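
To get a feel for how alpha shifts the ranking, you can re-blend scores you already have. This is a minimal sketch, assuming the hybrid score is alpha * vector + (1 - alpha) * bm25 (which is consistent with the scores printed above). `rerank` and the chunk labels are hypothetical names, and the numbers are the three sources returned for the swarm robotics question.

```python
# Normalized scores copied from the sources printed for the swarm robotics question
candidates = [
    {"label": "humanoid robots chunk", "vector": 0.768, "bm25": 0.724},
    {"label": "swarm robotics chunk", "vector": 0.770, "bm25": 0.681},
    {"label": "AGI chunk", "vector": 0.755, "bm25": 0.629},
]

def rerank(candidates, alpha):
    """Hypothetical helper: re-blend already-retrieved scores for a given alpha."""
    scored = [
        (alpha * c["vector"] + (1 - alpha) * c["bm25"], c["label"])
        for c in candidates
    ]
    # Highest blended score first
    return sorted(scored, reverse=True)

for alpha in (0.2, 0.4, 0.6, 0.8, 1.0):
    top_score, top_label = rerank(candidates, alpha)[0]
    print(f"alpha={alpha:.1f}  top={top_label}  score={top_score:.3f}")
```

With these numbers the humanoid robots chunk stays in first place for most of the range, and the swarm robotics chunk only takes over once alpha is close to 1.0. Keep in mind that this only reorders the three chunks that were already retrieved. For real tuning, re-run perform_hybrid_search at each alpha, because a different weight can pull entirely different chunks into the top k.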
