Regex Pattern - BonData Documentation

Applies regex patterns to field values for classification, extraction, detection, or multi-labeling. Use it to categorize records, pull out structured data from text, or flag records matching specific patterns.

Configuration

Setting	Description
Mode	Operation mode: Classify, Extract, Detect, or Multi-Label
Input Field	The field to apply regex patterns to
Case Insensitive	Ignore case when matching (default: enabled)
Virtual Object Name	Namespace prefix for output fields (default: `regex_pattern`)

Mode-specific settings

Classify - first matching pattern assigns a label:

Setting	Description
Rules	Ordered list of `{label, pattern, exclude_pattern}`
Output Field Name	Name for the label column
Default Value	Label when no pattern matches

Extract - capture groups pull out structured data:

Setting	Description
Extract Pattern	Regex with capture groups
Extract Groups	Map each group to `{group_index, output_field, cast_type}`

Detect - single pattern produces a boolean:

Setting	Description
Rules	Single pattern rule
Output Field Name	Name for the boolean column

Multi-Label - each pattern produces an independent boolean:

Setting	Description
Rules	List of `{label, pattern}` - each creates a separate boolean column
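As a rough illustration of Multi-Label semantics, the sketch below evaluates every rule independently and returns one boolean per label. The function name `multi_label`, the sample rules, and the use of Python's `re.search` (a match anywhere in the value) are assumptions made for illustration; they are not taken from the node's implementation.

```python
import re

def multi_label(text, rules, case_insensitive=True):
    """Return one boolean per rule, keyed by the rule's label.

    Assumption: a rule "matches" when its pattern is found anywhere
    in the text (re.search), mirroring the Case Insensitive default.
    """
    flags = re.IGNORECASE if case_insensitive else 0
    return {
        rule["label"]: re.search(rule["pattern"], text, flags) is not None
        for rule in rules
    }

# Hypothetical rules in the {label, pattern} shape described above.
rules = [
    {"label": "mentions_refund", "pattern": r"refund|money back"},
    {"label": "mentions_shipping", "pattern": r"shipping|delivery"},
]

row_labels = multi_label("Refund requested after late delivery", rules)
print(row_labels)
```

Unlike Classify, no rule short-circuits the others, so a single record can end up with several labels set to true.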

How It Works

Choose a mode

Select the operation that fits your use case - classification, extraction, detection, or multi-labeling.

Select the input field

Choose which field to apply patterns to.

Define patterns

Write the regex patterns for the chosen mode. Classify and Multi-Label take multiple labeled rules, Extract takes one pattern with capture groups, and Detect takes a single pattern.

Output

Depends on the mode:

Classify: a single label column with the first matching category
Extract: one column per capture group, optionally cast to specific types
Detect: a single boolean column
Multi-Label: one boolean column per rule

Examples

Classify products by name

Mode: Classify
Input: Product Name
Rules:
- Label “Electronics” → pattern `phone|laptop|tablet`
- Label “Clothing” → pattern `shirt|pants|jacket`
Default: “Other”
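The first-match-wins behavior of this example can be approximated in plain Python. This is a minimal sketch, not the node's actual implementation: the `classify` function is hypothetical, matching is assumed to use `re.search`, and the handling of `exclude_pattern` (skip the rule when the exclude pattern also matches) is an assumption about semantics the documentation does not spell out.

```python
import re

# Rules from the example above, in evaluation order.
RULES = [
    {"label": "Electronics", "pattern": r"phone|laptop|tablet"},
    {"label": "Clothing", "pattern": r"shirt|pants|jacket"},
]

def classify(text, rules, default="Other", case_insensitive=True):
    """Return the label of the first rule whose pattern matches."""
    flags = re.IGNORECASE if case_insensitive else 0
    for rule in rules:
        if not re.search(rule["pattern"], text, flags):
            continue
        exclude = rule.get("exclude_pattern")
        # Assumed semantics: a matching exclude_pattern vetoes the rule.
        if exclude and re.search(exclude, text, flags):
            continue
        return rule["label"]
    return default

for name in ["Gaming Laptop 15", "Denim Jacket", "Garden Hose"]:
    print(name, "->", classify(name, RULES))
```

Because evaluation stops at the first match, a product name that matches both rules receives the label of whichever rule is listed first.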

Extract price and currency

Mode: Extract
Input: Price Text (e.g., “USD 149.99”)
Pattern: `([A-Z]{3})\s+(\d+\.\d+)`
Groups: group 1 → currency (str), group 2 → amount (float)
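To make the group mapping concrete, here is a small Python sketch of the same extraction. The `extract` helper, the use of Python callables for `cast_type`, and returning `None` for every output field when nothing matches are all illustrative assumptions rather than documented behavior.

```python
import re

EXTRACT_PATTERN = r"([A-Z]{3})\s+(\d+\.\d+)"

# Group mapping in the {group_index, output_field, cast_type} shape.
# Using Python callables for cast_type is an assumption of this sketch.
EXTRACT_GROUPS = [
    {"group_index": 1, "output_field": "currency", "cast_type": str},
    {"group_index": 2, "output_field": "amount", "cast_type": float},
]

def extract(text, pattern, groups, case_insensitive=True):
    """Map each capture group to a typed output field."""
    flags = re.IGNORECASE if case_insensitive else 0
    match = re.search(pattern, text, flags)
    if match is None:
        # Assumed behavior: no match yields empty (None) outputs.
        return {g["output_field"]: None for g in groups}
    return {
        g["output_field"]: g["cast_type"](match.group(g["group_index"]))
        for g in groups
    }

print(extract("USD 149.99", EXTRACT_PATTERN, EXTRACT_GROUPS))
```

Two details of this pattern are worth keeping in mind: it requires a decimal part, so a value such as “USD 150” would not match, and with case-insensitive matching enabled `[A-Z]{3}` also accepts lowercase currency codes.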

Detect email addresses

Mode: Detect
Input: Notes field
Pattern: `[\w.-]+@[\w.-]+\.\w+`
Output: has_email (boolean)
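A Detect rule reduces to a single presence check per record. The sketch below shows that idea in Python; the `detect` function and the sample notes are hypothetical, and `re.search` is assumed as the matching primitive.

```python
import re

EMAIL_PATTERN = r"[\w.-]+@[\w.-]+\.\w+"

def detect(text, pattern, case_insensitive=True):
    """Return True when the pattern occurs anywhere in the text."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, text, flags) is not None

# Hypothetical Notes values.
notes = [
    "Contact: jane.doe@example.com for details",
    "Call back on Monday",
    "Reach me at user@localhost",
]

has_email = [detect(note, EMAIL_PATTERN) for note in notes]
print(has_email)
```

The pattern is deliberately loose: it flags anything shaped like `name@domain.tld` and does not validate that the address is real, which is usually adequate for a presence check.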

Best Practices

Use Classify for first-match-wins categorization (order rules from most specific to most general)
Use Extract when you need to pull structured data out of text
Use Detect for simple yes/no pattern presence checks
Use Multi-Label when a record can belong to multiple categories simultaneously
Test patterns on sample data before running on the full dataset
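One lightweight way to test a pattern before a full run is to preview it against a handful of representative values outside the pipeline. The helper below is an illustrative sketch, not a BonData feature; `preview` and the sample values are made up for the demonstration.

```python
import re

def preview(pattern, samples, case_insensitive=True):
    """Return (sample, matched_text_or_None) pairs for a quick review."""
    flags = re.IGNORECASE if case_insensitive else 0
    compiled = re.compile(pattern, flags)  # raises re.error if the pattern is invalid
    results = []
    for sample in samples:
        match = compiled.search(sample)
        results.append((sample, match.group(0) if match else None))
    return results

# Hypothetical sample values, including edge cases likely to be missed.
samples = ["USD 149.99", "eur 20.00", "149.99", "USD 150"]

report = preview(r"([A-Z]{3})\s+(\d+\.\d+)", samples)
for sample, matched in report:
    print(f"{sample!r:15} -> {matched!r}")
```

Including near-misses in the sample set (a missing currency code, an amount without decimals) helps reveal gaps in a pattern before it is applied to every record.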

Related Nodes

Transform - rule-based field computation without regex
Data Normalization - LLM-powered text cleaning when regex is too rigid
AI Enrichment - LLM-based classification when pattern matching isn’t sufficient
