## Configuration
| Setting | Description |
|---|---|
| Code | Python code to execute. Must call `load_data()` to get a pandas DataFrame and `save_data(df)` to write the result |
| Description | Human-readable description of what the code does. Important for non-technical users who view the workflow |
| Result Fields | New columns the code creates. Each has a name and type (`str`, `int`, `float`, `bool`, `date`, `datetime`). Must match the columns your code adds to the DataFrame |
| Timeout | Max execution time in seconds (default: 60, range: 5–300). Increase for large datasets |
| Row Limit | Max rows to process (default: 100,000, range: 1–250,000). The table is truncated to this many rows if larger |
| Virtual Object Name | Namespace prefix for output fields (default: `code_execution`). Output columns are named `{virtual_object_name}.{field_name}` |
## How It Works
Your code uses two helper functions:

- `load_data()` — returns a pandas DataFrame with all columns using human-readable names (e.g., `email`, `revenue`, `company_name`)
- `save_data(df)` — saves the modified DataFrame back. Must be called or the node fails
1. **Write your Python code.** Call `load_data()` to get a pandas DataFrame. Transform it, add new columns, then call `save_data(df)`.
2. **Define result fields.** Declare every new column your code creates, along with its data type. These must match the columns you actually add to the DataFrame.
## Output

The transformed DataFrame, with any new columns available as Mentions in downstream nodes.

## Pre-installed Packages
`pandas`, `numpy`, `scipy`, `scikit-learn`, `pyarrow`, `rapidfuzz`, plus the Python standard library: `re`, `json`, `datetime`, `math`, `statistics`, `collections`, `itertools`
## Examples
### Extract email domains
Pull the domain from an email field. Result fields: `domain` (str)
### Revenue-per-employee scoring
Calculate a score and classify companies into tiers. Result fields: `score` (float), `tier` (str)
### Percentile-based health score
Rank accounts across multiple dimensions. Result fields: `health_score` (float)
### Conditional logic across multiple fields
Classify leads based on combined criteria. Result fields: `segment` (str)
### Deduplication with custom logic
Keep the most complete record per email.

## Best Practices
- Always call both `load_data()` and `save_data(df)` — the node fails without them
- Column names in the DataFrame are human-readable (e.g., `email`, `revenue`), not internal IDs
- Add new columns by assigning to the DataFrame (e.g., `df['new_col'] = ...`)
- Declare every new column in Result Fields so downstream nodes can use them
- Start with a small Row Limit while testing, then increase for production runs
- Use the Description field to explain what the code does for non-technical team members
## Related Nodes
- Transform — simpler formula/SQL transforms without writing Python
- Query — run SQL directly against the data warehouse
- Regex Pattern — pattern matching without custom code