Index Classes¶
NgramIndex¶
N-gram-based index for fast fuzzy search.
Constructor¶
NgramIndex(
    ngram_size: int = 3,
    min_similarity: float = 0.0,
    min_ngram_ratio: float = 0.0,
    normalize: bool = False
)
Parameters:
ngram_size: Size of n-grams (1-32)
min_similarity: Minimum similarity for results
min_ngram_ratio: Minimum n-gram overlap ratio for candidates
normalize: Lowercase text for case-insensitive matching
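To make ngram_size, min_ngram_ratio and normalize concrete, here is a minimal pure-Python sketch of character n-grams and an overlap ratio. It does not use the library. The helper names (ngrams, ngram_ratio), the handling of strings shorter than n, and the exact definition of the ratio (shared n-grams divided by the query's n-grams) are assumptions for illustration; this reference does not state how the library defines them.

```python
def ngrams(text: str, n: int = 3, normalize: bool = False) -> set[str]:
    """Return the set of character n-grams of `text` (hypothetical helper)."""
    if normalize:
        text = text.lower()  # mirrors the documented meaning of `normalize`
    if len(text) < n:
        # Assumption: a string shorter than n contributes itself as one gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_ratio(query: str, candidate: str, n: int = 3,
                normalize: bool = False) -> float:
    """Fraction of the query's n-grams that also occur in the candidate.

    Assumption: one plausible definition of an "n-gram overlap ratio".
    """
    q = ngrams(query, n, normalize)
    c = ngrams(candidate, n, normalize)
    return len(q & c) / len(q) if q else 0.0


print(sorted(ngrams("Hello", 3)))                           # ['Hel', 'ell', 'llo']
print(ngram_ratio("hello", "HELLO world", normalize=True))  # 1.0
print(ngram_ratio("hello", "HELLO world"))                  # 0.0
```

Under this definition, a candidate whose ratio falls below min_ngram_ratio would be skipped before any similarity score is computed, and normalize decides whether "HELLO" and "hello" share any n-grams at all.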
Methods¶
add¶
Add a string to the index. Returns the assigned ID.
add_with_data¶
Add a string with optional associated data.
add_all¶
Add multiple strings.
search¶
search(
    query: str,
    algorithm: str = "jaro_winkler",
    min_similarity: float = 0.0,
    limit: int | None = None
) -> list[SearchResult]
Search for similar strings.
Returns: List of SearchResult(id, text, score, distance, data)
batch_search¶
batch_search(
    queries: list[str],
    algorithm: str = "jaro_winkler",
    min_similarity: float = 0.0,
    limit: int | None = None
) -> list[list[SearchResult]]
Search for multiple queries in parallel.
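The return type suggests one result list per query. The following sketch shows that shape with standard-library pieces only: search_one is a hypothetical single-query function built on difflib.get_close_matches, and the thread pool stands in for whatever parallelism the library uses internally (not described in this reference). That results come back in query order is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import get_close_matches

# Hypothetical corpus standing in for the strings held by an index.
CORPUS = ["apple", "application", "banana", "bandana"]


def search_one(query: str) -> list[str]:
    """Single-query stand-in: best matches first, at most three."""
    return get_close_matches(query, CORPUS, n=3, cutoff=0.6)


def batch_search(queries: list[str]) -> list[list[str]]:
    """Run every query through search_one and keep the results in query order."""
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in the order of its inputs.
        return list(pool.map(search_one, queries))


results = batch_search(["appel", "banan"])
for query, matches in zip(["appel", "banan"], results):
    print(query, "->", matches)
```

Python threads do not speed up pure-Python scoring, so this illustrates only the input and output shape, not the performance benefit of a native parallel implementation.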
contains¶
Check whether an exact match exists in the index.
compress / decompress¶
Compress/decompress posting lists for memory efficiency.
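This reference does not say which encoding the library uses. As a sketch of why compressing posting lists saves memory, the example below uses delta encoding plus variable-length bytes, a common technique for sorted ID lists. The function names and the encoding are assumptions for illustration only.

```python
def compress_postings(ids: list[int]) -> bytes:
    """Delta-encode sorted non-negative IDs, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for value in sorted(ids):
        gap = value - prev  # small when IDs are dense
        prev = value
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte, continuation flag clear
    return bytes(out)


def decompress_postings(blob: bytes) -> list[int]:
    """Invert compress_postings and return the sorted ID list."""
    ids: list[int] = []
    prev = 0
    current = 0
    shift = 0
    for byte in blob:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this gap
        else:
            prev += current  # gap complete: rebuild the absolute ID
            ids.append(prev)
            current = 0
            shift = 0
    return ids


posting_list = [3, 7, 8, 300, 100000]
blob = compress_postings(posting_list)
print(len(blob), "bytes")  # 8 bytes
print(decompress_postings(blob) == posting_list)  # True
```

The trade-off in a scheme like this is that a compressed list must be decoded before it can be intersected or scanned, so lookups cost more CPU in exchange for the smaller footprint.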
is_compressed¶
Check whether the index is compressed.
save / load¶
Save the index to disk, or load a previously saved index.
BkTree¶
BK-tree index for edit distance queries.
Constructor¶
Parameters:
algorithm: Distance algorithm ("levenshtein" or "damerau_levenshtein")
Methods¶
add / add_all¶
Add strings to the tree.
search¶
Find strings within a given edit-distance threshold of the query.
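For readers unfamiliar with the data structure, here is a minimal BK-tree sketch over Levenshtein distance. ToyBkTree, its max_distance parameter and the tuple results are hypothetical and do not reflect the library's BkTree signature, which is not shown in this reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute ca -> cb
            ))
        previous = current
    return previous[-1]


class ToyBkTree:
    """Each node is (word, children), with children keyed by distance to word."""

    def __init__(self) -> None:
        self.root: tuple[str, dict] | None = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, max_distance: int) -> list[tuple[str, int]]:
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d <= max_distance:
                results.append((word, d))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_distance, d + max_distance] can contain a match.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results, key=lambda r: (r[1], r[0]))


tree = ToyBkTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)

print(tree.search("bok", 1))  # [('boo', 1), ('book', 1)]
```

The pruning step is what makes the structure useful: with a small threshold, most subtrees are skipped without computing any distances inside them.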
Note
Consider using search_similarity() instead for consistency with NgramIndex and HybridIndex APIs.
search_similarity¶
search_similarity(
    query: str,
    min_similarity: float,
    limit: int | None = None
) -> list[SearchResult]
Find strings above a similarity threshold. Recommended over search() for API consistency.
The similarity is computed as: 1 - (distance / max(len(query), len(match)))
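The formula can be checked by hand with a small worked example. The sketch below implements Levenshtein distance directly and applies the formula as written. The function names and the treatment of two empty strings (similarity 1.0) are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def similarity(query: str, match: str) -> float:
    """1 - (distance / max(len(query), len(match))), as stated above."""
    longest = max(len(query), len(match))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein(query, match) / longest


# "kitten" -> "sitting": distance 3, longer string has 7 characters.
print(round(similarity("kitten", "sitting"), 4))  # 0.5714
# "book" -> "back": distance 2, both strings have 4 characters.
print(similarity("book", "back"))  # 0.5
```

Read in the other direction, a similarity threshold corresponds to an edit-distance budget that grows with string length: for a given pair, a threshold of min_similarity permits at most (1 - min_similarity) * max(len(query), len(match)) edits.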
save / load¶
HybridIndex¶
Combined N-gram and similarity index.
Constructor¶
Methods¶
Same as NgramIndex: add, add_all, search, batch_search, contains.
SchemaBuilder¶
Build multi-field matching schemas.
Methods¶
add_field¶
Add a field to the schema.
build¶
Build the schema.
SchemaIndex¶
Index for multi-field record matching.
Constructor¶
Methods¶
search¶
search(
    query: dict,
    limit: int | None = None,
    min_similarity: float = 0.0
) -> list[SchemaSearchResult]
Search for matching records.
Returns: List of SchemaSearchResult(id, score, record, field_scores)
Result Types¶
SearchResult¶
@dataclass
class SearchResult:
    id: int                # Index ID
    text: str              # Matched text
    score: float           # Similarity score
    distance: int | None   # Edit distance (if applicable)
    data: int | None       # Associated data