LanceDB provides two powerful tools for query analysis and optimization:Documentation Index
Fetch the complete documentation index at: https://lancedb-bcbb4faf-mintlify-f5da8d82.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
explain_plan and analyze_plan. Let’s take a better look at how they work:
| Method | Purpose | Description |
|---|---|---|
explain_plan | Query Analysis | Print the resolved query plan to understand how the query will be executed. Helpful for identifying slow queries or unexpected query results. |
analyze_plan | Performance Tuning | Execute the query and return a physical execution plan annotated with runtime metrics including execution time, number of rows processed, and I/O stats. Essential for performance tuning and debugging. |
Query Analysis Tools
explain_plan
Reveals the logical query plan before execution, helping you identify potential issues with query structure and index usage. This tool is useful for:- Verifying query optimization strategies
- Validating index selection
- Understanding query execution order
- Detecting missing indices
analyze_plan
Executes the query and provides detailed runtime metrics, including:- Operation duration (
_elapsed_compute_) - Data processing statistics (
_output_rows_,_bytes_read_) - Index effectiveness (
_index_comparisons_,_indices_loaded_) - Resource utilization (
_iops_,_requests_)
explain_plan to verify your query structure and analyze_plan to measure and optimize actual performance.
Metadata filters are prefiltered by default, which usually shows the filter pushed into the
LanceScan or index scan. If you set prefilter=False, expect a separate FilterExec after
search instead; that can be useful for some expensive filters, but it changes both latency and
the number of rows available after filtering.
Reading the Execution Plan
To demonstrate query performance analysis, we’ll use a table containing 1.2M rows sampled from the Wikipedia dataset. Initially, the table has no indices, allowing us to observe the impact of optimization. Let’s examine a vector search query that:- Filters rows where
identifieris between 0 and 1,000,000 - Returns the top 100 matches
- Projects specific columns:
chunk_index,title, andidentifier
Execution Plan Components
The execution plan reveals the sequence of operations performed to execute your query. Let’s examine each component:1. Base Layer (LanceScan)
- Initial data scan loading only specified columns to minimize I/O
- Unordered scan enabling parallel processing
2. First Filter
- Apply requested filter on
identifiercolumn - Reduces the number of vectors that need KNN computation
3. Vector Search
- Computes L2 (Euclidean) distances between query vector and all vectors that passed the filter
4. Results Processing
- Filters out null distance results
- Sorts by distance and takes top 100 results
- Processes in batches of 1024 for optimal memory usage
5. Data Retrieval
RemoteTakeis a key component of Lance’s I/O cache- Handles efficient data retrieval from remote storage locations
- Fetches specific rows and columns needed for the final output
- Optimizes network bandwidth by only retrieving required data
6. Final Output
- Returns only requested columns and maintains column ordering
Performance Analysis
Let’s useanalyze_plan to run the query and analyze the query performance, which will help us identify potential bottlenecks:
Performance Metrics Analysis
1. Data Loading (LanceScan)
- Scanned 1,200,000 rows from the LanceDB table
- Read 1.86GB of data in 78 I/O operations
- Only loaded necessary columns (
vectorandidentifier) - Unordered scan for parallel processing
2. Filtering & Search
- Applied prefilter condition (
identifier > 0 AND identifier < 1000000) - Reduced dataset from 1.2M to 1,099,508 rows
- KNN search used L2 (Euclidean) distance metric
- Vector comparisons processed in 1076 batches
3. Results Processing
- KNN results sorted by distance (TopK with fetch=100)
- Null distances filtered out
- Batches coalesced to target size of 1024 rows
- Additional columns fetched for final results
- Remote take operation for 100 results
- Final projection of required columns
Key Observations
- Vector search is the primary bottleneck (1,099,508 vector comparisons)
- Significant I/O overhead (1.86GB data read)
- Full table scan due to lack of indices
- Substantial optimization potential through proper index implementation
Optimized Query Execution
After creating vector and scalar indices, the execution plan shows:Optimized Plan Analysis
1. Scalar Index Query
- Range filter using scalar index
- Only 2 index files and 562 scalar index parts loaded
- 2.3M index comparisons for matches
2. Vector Search
- IVF index with 20 probes
- Only 20 index parts loaded
- 25,893 vector comparisons
- 2,000 matching vectors
3. Results Processing
- Sorts by distance
- Limits to top 100 results
- Batches into groups of 1024
4. Data Fetching
- Single output batch
- One remote take per row
5. Final Projection
- Returns specified columns: chunk_index, title, identifier, and distance
Performance Improvements
1. Initial Data Access
- Before: Full table scan of 1.2M rows, 1.86GB data
- After: Only 2 indices and 562 scalar index parts loaded
- Benefit: Eliminated table scans for prefilter
2. Vector Search Efficiency
- Before: L2 calculations on 1,099,508 vectors
- After:
- 99.8% reduction in vector comparisons
- Decreased output batches from 1,076 to 20
3. Data Retrieval Optimization
- RemoteTake operation remains consistent
Performance Optimization Guide
1. Index Implementation
When to Create Indices
- Columns used in WHERE clauses
- Vector columns for similarity searches
- Join columns used in
merge_insert
Index Type Selection
| Data Type | Recommended Index | Use Case |
|---|---|---|
| Vector | IVF_PQ/IVF_HNSW_SQ/IVF_HNSW_FLAT | Approximate nearest neighbor search |
| Scalar | B-Tree | Range queries and sorting |
| Categorical | Bitmap | Multi-value filters and set operations |
List | Label_list | Multi-label classification and filtering |
Index Coverage Monitoring
Use
table.index_stats() to monitor index coverage.
A well-optimized table should have num_unindexed_rows ~ 0.2. Query Plan Optimization
Common Patterns and Fixes
| Plan Pattern | Optimization |
|---|---|
| LanceScan with high bytes_read or iops | Add missing index |
Use select() to limit returned columns | |
| Check whether the dataset has been compacted | |
| Multiple sequential filters | Reorder filter conditions |
analyze_plan output provides detailed metrics to guide optimization efforts.
3. Getting Started with Optimization
For vector search performance:- Create ANN index on your vector column(s) as described in the index guide
- If you often filter by metadata, create scalar indices on those columns
Analyzing non-vector queries
explain_plan and analyze_plan aren’t vector-specific — they’re available on every query builder, including FTS and hybrid. The most common reason to look at the plan for a non-vector query is to confirm whether your where clause pushed into the scan (good) or ran as a separate FilterExec step on top of the search results (often slower, and a hint that the filter column needs a scalar index).
FTS queries
MatchQuery (or other FTS execution node) reading from the inverted index, with the metadata filter pushed down. If the plan shows a LanceScan followed by FilterExec over the entire text column, the FTS index either isn’t covering the column or the filter isn’t using a scalar index — both worth investigating.
Hybrid queries
For hybrid queries,explain_plan returns the reranker label followed by the vector and FTS sub-plans, indented for readability:
analyze_plan does the same, but executes both sub-queries and labels them as Vector Search Plan: and FTS Search Plan: in the output. This is the easiest way to see whether the filter pushed into both halves uniformly, and which half is dominating latency.