Applies regex patterns to field values for classification, extraction, detection, or multi-labeling. Use it to categorize records, pull out structured data from text, or flag records matching specific patterns.

Configuration

| Setting | Description |
| --- | --- |
| Mode | Operation mode: Classify, Extract, Detect, or Multi-Label |
| Input Field | The field to apply regex patterns to |
| Case Insensitive | Ignore case when matching (default: enabled) |
| Virtual Object Name | Namespace prefix for output fields (default: regex_pattern) |

Mode-specific settings

Classify: the first matching pattern assigns a label.

| Setting | Description |
| --- | --- |
| Rules | Ordered list of {label, pattern, exclude_pattern} |
| Output Field Name | Name for the label column |
| Default Value | Label when no pattern matches |

Extract: capture groups pull out structured data.

| Setting | Description |
| --- | --- |
| Extract Pattern | Regex with capture groups |
| Extract Groups | Map each group to {group_index, output_field, cast_type} |

Detect: a single pattern produces a boolean.

| Setting | Description |
| --- | --- |
| Rules | Single pattern rule |
| Output Field Name | Name for the boolean column |

Multi-Label: each pattern produces an independent boolean.

| Setting | Description |
| --- | --- |
| Rules | List of {label, pattern}; each rule creates a separate boolean column |
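
The Classify rule shape {label, pattern, exclude_pattern} can be approximated in plain Python. This is a minimal sketch, not the product's implementation: it assumes that a rule matches when `pattern` is found anywhere in the value and `exclude_pattern` is not, and that Python's `re` module behaves like the product's regex engine. The rule contents and the `classify` helper are hypothetical.

```python
import re

# Hypothetical rules mirroring the {label, pattern, exclude_pattern} shape.
RULES = [
    {"label": "Fruit", "pattern": r"apple",
     "exclude_pattern": r"apple\s+(watch|iphone)"},
    {"label": "Electronics", "pattern": r"apple|phone|watch",
     "exclude_pattern": None},
]

def classify(value, rules, default="Other", case_insensitive=True):
    """Return the label of the first rule that matches and is not excluded."""
    flags = re.IGNORECASE if case_insensitive else 0
    for rule in rules:
        if not re.search(rule["pattern"], value, flags):
            continue  # pattern not found, try the next rule
        exclude = rule.get("exclude_pattern")
        # Assumption: a hit on exclude_pattern vetoes this rule only.
        if exclude and re.search(exclude, value, flags):
            continue
        return rule["label"]
    return default  # plays the role of the Default Value setting

print(classify("Green apple 1kg", RULES))       # Fruit
print(classify("Apple Watch Series 9", RULES))  # Electronics
print(classify("Banana", RULES))                # Other
```

Under these assumptions an excluded rule simply falls through to the next rule instead of forcing the default label.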

How It Works

1. Choose a mode
   Select the operation that fits your use case: classification, extraction, detection, or multi-labeling.

2. Select the input field
   Choose which field to apply patterns to.

3. Define patterns
   Write regex patterns. For Classify and Multi-Label, add multiple rules with labels.

Output

The output depends on the mode:
  • Classify: a single label column with the first matching category
  • Extract: one column per capture group, optionally cast to specific types
  • Detect: a single boolean column
  • Multi-Label: one boolean column per rule
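
Multi-Label is the only mode without a worked example in this page, so here is a rough sketch of its output shape. It assumes Python's `re` as a stand-in for the product's regex engine; the rule labels, patterns, and the `multi_label` helper are hypothetical, and the way the Virtual Object Name prefixes the real column names is not modeled.

```python
import re

# Hypothetical rules mirroring the {label, pattern} shape.
RULES = [
    {"label": "mentions_refund", "pattern": r"refund|money back"},
    {"label": "mentions_shipping", "pattern": r"shipping|delivery"},
    {"label": "is_urgent", "pattern": r"urgent|asap"},
]

def multi_label(value, rules, case_insensitive=True):
    """Evaluate every rule independently; return one boolean per label."""
    flags = re.IGNORECASE if case_insensitive else 0
    return {
        rule["label"]: bool(re.search(rule["pattern"], value, flags))
        for rule in rules
    }

row = multi_label("Refund for late delivery", RULES)
print(row)
# {'mentions_refund': True, 'mentions_shipping': True, 'is_urgent': False}
```

Unlike Classify, no rule stops the evaluation, so a single record can be flagged by several rules at once.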

Examples

Classify products by name

  • Mode: Classify
  • Input: Product Name
  • Rules:
    • Label “Electronics” → pattern phone|laptop|tablet
    • Label “Clothing” → pattern shirt|pants|jacket
  • Default: “Other”
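
The product classification above can be sketched in Python as follows. The patterns and labels come from the example; the `classify_product` helper is hypothetical, and the sketch assumes unanchored, case-insensitive matching with Python's `re`, which may differ from the product's engine.

```python
import re

# Ordered rules from the example: the first match wins.
RULES = [
    ("Electronics", r"phone|laptop|tablet"),
    ("Clothing", r"shirt|pants|jacket"),
]

def classify_product(name, default="Other"):
    """Return the label of the first rule whose pattern occurs in the name."""
    for label, pattern in RULES:
        if re.search(pattern, name, re.IGNORECASE):
            return label
    return default

print(classify_product("Gaming Laptop 15in"))    # Electronics
print(classify_product("Denim Jacket"))          # Clothing
print(classify_product("Garden Hose"))           # Other
print(classify_product("Laptop Jacket Sleeve"))  # Electronics (first rule wins)
```

The last call matches both rules, which shows why rule order matters: only the first matching label is kept.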

Extract price and currency

  • Mode: Extract
  • Input: Price Text (e.g., “USD 149.99”)
  • Pattern: ([A-Z]{3})\s+(\d+\.\d+)
  • Groups: group 1 → currency (str), group 2 → amount (float)
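
As a rough illustration of this extraction, the sketch below applies the same pattern with Python's `re` and casts each capture group. The `GROUPS` mapping mirrors the {group_index, output_field, cast_type} shape, but the `extract` helper and the choice to return `None` for every field when nothing matches are assumptions, not documented behavior. The sketch matches case-sensitively.

```python
import re

PATTERN = re.compile(r"([A-Z]{3})\s+(\d+\.\d+)")

# Hypothetical mapping mirroring {group_index, output_field, cast_type}.
GROUPS = [
    {"group_index": 1, "output_field": "currency", "cast_type": str},
    {"group_index": 2, "output_field": "amount", "cast_type": float},
]

def extract(value):
    """Return one output field per capture group, cast to its target type."""
    match = PATTERN.search(value)
    if match is None:
        # Assumption: unmatched records yield empty (None) output fields.
        return {g["output_field"]: None for g in GROUPS}
    return {
        g["output_field"]: g["cast_type"](match.group(g["group_index"]))
        for g in GROUPS
    }

print(extract("USD 149.99"))  # {'currency': 'USD', 'amount': 149.99}
print(extract("USD 150"))     # {'currency': None, 'amount': None}
```

The second call shows a limit of the example pattern: `\d+\.\d+` requires a decimal point, so whole-number prices are not captured.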

Detect email addresses

  • Mode: Detect
  • Input: Notes field
  • Pattern: [\w.-]+@[\w.-]+\.\w+
  • Output: has_email (boolean)
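
A minimal Python sketch of this detection, assuming the pattern may match anywhere in the field and that an empty or missing value counts as no match (both are assumptions about the product's behavior). The `has_email` function name simply echoes the output field above.

```python
import re

EMAIL_PATTERN = re.compile(r"[\w.-]+@[\w.-]+\.\w+", re.IGNORECASE)

def has_email(notes):
    """Return True if the text contains something shaped like an email address."""
    return bool(EMAIL_PATTERN.search(notes or ""))

print(has_email("Contact: jane.doe@example.com for details"))  # True
print(has_email("No contact info"))                            # False
print(has_email("user@localhost"))                             # False
```

The pattern is a loose presence check rather than a validator: it requires a dot after the `@`, so `user@localhost` is not flagged, while strings that are not valid addresses can still match.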

Best Practices

  • Use Classify for first-match-wins categorization (order rules from most specific to most general)
  • Use Extract when you need to pull structured data out of text
  • Use Detect for simple yes/no pattern presence checks
  • Use Multi-Label when a record can belong to multiple categories simultaneously
  • Test patterns on sample data before running on the full dataset

Related

  • Transform: rule-based field computation without regex
  • Data Normalization: LLM-powered text cleaning when regex is too rigid
  • AI Enrichment: LLM-based classification when pattern matching isn’t sufficient