FuzzyRust¶

High-performance string similarity library for Python, written in Rust.

Features¶

Fast: Rust-powered algorithms with parallel processing via Rayon
Comprehensive: Levenshtein, Jaro-Winkler, N-gram, Cosine, Phonetic, and more
Scalable: Index structures (BK-tree, N-gram index) for efficient large-scale matching
Polars Integration: Native DataFrame operations for fuzzy joins and deduplication
Multi-field Matching: Schema-based matching with weighted field scoring

Quick Example¶

import fuzzyrust as fr
from fuzzyrust import batch, polars as frp

# Simple similarity
fr.jaro_winkler_similarity("hello", "hallo")  # 0.88

# Find best matches from a list
batch.best_matches(["apply", "maple", "orange"], "apple", limit=2)

# Fuzzy join with Polars
import polars as pl

df1 = pl.DataFrame({"name": ["John Smith", "Jane Doe"]})
df2 = pl.DataFrame({"customer": ["Jon Smith", "Janet Doe"]})
result = frp.df_join(df1, df2, left_on="name", right_on="customer", min_similarity=0.8)

Installation¶

pip install fuzzyrust

Performance¶

FuzzyRust is designed for performance:

Operation	vs RapidFuzz
Single pair	Competitive (~1x)
Batch (10K)	5-10x faster
Index search	100-2000x faster

Next Steps¶

Installation - Detailed installation instructions
Quickstart - Get started in 5 minutes
API Reference - Complete function documentation