Detects duplicate records using fuzzy matching and configurable thresholds. The node has two output paths, All Records and Deduplicated, so you can handle duplicates and clean records differently.

Configuration tab

| Setting | Description |
| --- | --- |
| Preset Configuration | Choose a starting point - Conservative (95%), Balanced (92%), Aggressive (88%), or Custom |
| Auto-merge threshold | Records scoring above this are merged automatically |
| Review threshold | Records between review and auto-merge thresholds need manual review |
| Fields to Compare | Select fields and set comparison type (Fuzzy or Exact) and weight |
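
The settings above suggest a weighted pair score that is compared against two thresholds. The Python sketch below illustrates that idea; it is not the node's actual implementation. The field list, the 0.80 review threshold, the use of `difflib.SequenceMatcher` as the fuzzy metric, the reading of the preset percentage as the auto-merge threshold, and the handling of scores that land exactly on a threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field configuration: (field name, comparison type, weight).
FIELDS = [
    ("name", "fuzzy", 0.6),
    ("email", "exact", 0.4),
]

AUTO_MERGE_THRESHOLD = 0.92  # assumed to correspond to the Balanced preset
REVIEW_THRESHOLD = 0.80      # illustrative value, not taken from the product


def field_similarity(a: str, b: str, comparison: str) -> float:
    """Return a similarity in [0, 1] for a single field."""
    if comparison == "exact":
        return 1.0 if a == b else 0.0
    # difflib's ratio stands in for whatever fuzzy metric the node uses.
    return SequenceMatcher(None, a, b).ratio()


def pair_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the per-field similarities."""
    total_weight = sum(weight for _, _, weight in FIELDS)
    weighted = sum(
        weight * field_similarity(rec_a[name], rec_b[name], comparison)
        for name, comparison, weight in FIELDS
    )
    return weighted / total_weight


def classify(score: float) -> str:
    """Map a pair score to an action (boundary handling is an assumption)."""
    if score > AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "distinct"
```

Under these assumptions, two identical records score 1.0 and are auto-merged, a pair scoring 0.85 goes to manual review, and anything below 0.80 is treated as distinct.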

Rules & Performance tab

| Setting | Description |
| --- | --- |
| Blocking Keys | Only compare records that share a blocking key value - dramatically improves performance on large datasets |
| Must-Match Rules | Records must match on these fields to be considered duplicates |
| Must-Not-Match Rules | Records matching on these fields are never considered duplicates |
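
To make the effect of these rules concrete, here is a minimal Python sketch of candidate-pair generation. It groups records by a blocking key, then drops pairs that violate a must-match or must-not-match rule. The record fields (`zip`, `country`, `source`), the function name `candidate_pairs`, and the choice to treat each must-not-match field independently are hypothetical; the node's internal behavior may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names are illustrative only.
records = [
    {"id": 1, "zip": "94105", "country": "US", "source": "crm"},
    {"id": 2, "zip": "94105", "country": "US", "source": "billing"},
    {"id": 3, "zip": "94105", "country": "CA", "source": "billing"},
    {"id": 4, "zip": "10001", "country": "US", "source": "billing"},
    {"id": 5, "zip": "94105", "country": "US", "source": "crm"},
]


def candidate_pairs(records, blocking_key, must_match=(), must_not_match=()):
    """Yield id pairs that share a blocking key and pass both rule sets."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[blocking_key]].append(rec)

    for block in blocks.values():
        # Pairs are only formed inside a block, never across blocks.
        for a, b in combinations(block, 2):
            if any(a[f] != b[f] for f in must_match):
                continue  # differs on a must-match field
            if any(a[f] == b[f] for f in must_not_match):
                continue  # agrees on a must-not-match field
            yield a["id"], b["id"]


pairs = list(
    candidate_pairs(records, "zip", must_match=["country"], must_not_match=["source"])
)
print(pairs)  # [(1, 2), (2, 5)]
```

Without blocking, five records produce ten pairs to compare. Blocking on `zip` reduces that to six, and the two rules leave only two pairs to score.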

Advanced tab

Normalization Options - applied before comparison:
  • Trim & lowercase
  • Remove punctuation
  • Unicode normalize
  • Ignore company suffixes (e.g., “Inc”, “LLC”)
  • Phone normalize (E.164)
  • Address normalize
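
As a rough illustration of what several of these options do, the Python sketch below applies them with the standard library only. The suffix list, the choice of NFKC as the Unicode form, and the US/Canada-only phone handling are assumptions, not the node's documented behavior. Address normalization is omitted because it typically needs a dedicated library or service.

```python
import re
import string
import unicodedata
from typing import Optional

# Illustrative suffix list; the node's actual list is not documented here.
COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_text(value: str) -> str:
    """Unicode normalize, trim, lowercase, and strip ASCII punctuation."""
    value = unicodedata.normalize("NFKC", value)
    value = value.strip().lower()
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Collapse whitespace runs left behind by removed punctuation.
    return re.sub(r"\s+", " ", value).strip()


def normalize_company(value: str) -> str:
    """Normalize text, then drop trailing company suffixes such as "inc"."""
    tokens = normalize_text(value).split()
    # Keep at least one token so a name like "Inc" is not emptied.
    while len(tokens) > 1 and tokens[-1] in COMPANY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_phone_us(value: str) -> Optional[str]:
    """Rough E.164 formatting that assumes US/Canada (+1) numbers."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None  # not recognizable as a +1 number


print(normalize_company("  Acme, Inc. "))    # acme
print(normalize_phone_us("(415) 555-0123"))  # +14155550123
```

After normalization, "Acme, Inc." and "ACME LLC" both reduce to `acme`, so even an Exact comparison would treat them as equal.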
Safety Rails - prevent unintended mass merges:

| Setting | Description |
| --- | --- |
| Dry run | Simulate without actually merging |
| Max cluster size | Prevents one bad key from merging hundreds of records |
| Max total merges | Stops after N merges for gradual rollout |
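
One way to picture how these three settings interact is the Python sketch below, which plans merges for a list of duplicate clusters. The function `plan_merges` and its return shape are hypothetical. It also assumes that an oversized cluster is skipped rather than truncated and that a cluster of n records counts as n - 1 merges; the node may count differently.

```python
def plan_merges(clusters, max_cluster_size, max_total_merges, dry_run=True):
    """Select clusters to merge while respecting the safety limits."""
    planned, skipped = [], []
    merges_used = 0

    for cluster in clusters:
        if len(cluster) > max_cluster_size:
            # Suspiciously large cluster, likely caused by a bad key.
            skipped.append(cluster)
            continue
        cost = len(cluster) - 1  # merges needed to collapse the cluster
        if merges_used + cost > max_total_merges:
            break  # merge budget exhausted; leave the rest for a later run
        planned.append(cluster)
        merges_used += cost

    return {
        "dry_run": dry_run,  # callers should write nothing when this is True
        "merge": planned,
        "skipped_oversized": skipped,
        "merge_count": merges_used,
    }


clusters = [[1, 2], [3, 4, 5, 6, 7, 8], [9, 10, 11], [12, 13]]
plan = plan_merges(clusters, max_cluster_size=5, max_total_merges=3)
print(plan["merge"])              # [[1, 2], [9, 10, 11]]
print(plan["skipped_oversized"])  # [[3, 4, 5, 6, 7, 8]]
```

In this example the six-record cluster is set aside for inspection, and the last cluster is deferred because the three-merge budget is already spent.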

Output

Two output paths:
  • All Records - every record with duplicate scores attached
  • Deduplicated - clean dataset with duplicates removed
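
The difference between the two paths can be sketched in a few lines of Python. The record shape, the `matches` mapping, and the attached field names `duplicate_of` and `duplicate_score` are hypothetical; the node's real output schema is not described here.

```python
def build_outputs(records, matches):
    """Return (all_records, deduplicated).

    `matches` maps a duplicate record's id to (surviving record id, score).
    """
    all_records, deduplicated = [], []
    for rec in records:
        survivor_id, score = matches.get(rec["id"], (None, None))
        # All Records: every record, annotated with its match information.
        all_records.append(
            {**rec, "duplicate_of": survivor_id, "duplicate_score": score}
        )
        # Deduplicated: only records that were not flagged as duplicates.
        if survivor_id is None:
            deduplicated.append(rec)
    return all_records, deduplicated


records = [
    {"id": 1, "name": "Acme Inc"},
    {"id": 2, "name": "ACME"},
    {"id": 3, "name": "Globex"},
]
all_records, deduplicated = build_outputs(records, {2: (1, 0.95)})
print([r["id"] for r in deduplicated])  # [1, 3]
```

A downstream step could route rows from All Records with a non-empty `duplicate_of` to a review queue while loading Deduplicated directly.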

Best Practices

  • Start with the Balanced preset and adjust thresholds based on results
  • Use Blocking Keys for large datasets - comparing every record pair is expensive
  • Enable Dry run first to preview results before committing merges
  • Set Max total merges for gradual rollout on critical data

Related

  • Bond Node - matches records across different entities, not within the same dataset
  • Data Normalization - cleans field values before duplicate detection for better accuracy