Runs custom Python code in a secure, isolated sandbox to transform your workflow data. The code executes in an E2B sandboxed environment. Use when you need custom Python logic that cannot be expressed with Transform formulas or SQL expressions. Common use cases: complex string parsing with regex, conditional logic across multiple fields, statistical calculations (percentiles, z-scores), custom classification with multiple rules, pivot/unpivot operations, or row-level deduplication with custom logic.
Prefer the Transform node for simple operations like upper(), concat(), or arithmetic — it’s faster and doesn’t require sandbox execution.

Configuration

  • Code — Python code to execute. Must call load_data() to get a pandas DataFrame and save_data(df) to write the result
  • Description — Human-readable description of what the code does. Important for non-technical users who view the workflow
  • Result Fields — New columns the code creates. Each has a name and type (str, int, float, bool, date, datetime). Must match the columns your code adds to the DataFrame
  • Timeout — Max execution time in seconds (default: 60, range: 5–300). Increase for large datasets
  • Row Limit — Max rows to process (default: 100,000, range: 1–250,000). The table is sliced if larger
  • Virtual Object Name — Namespace prefix for output fields (default: code_execution). Output columns are named {virtual_object_name}.{field_name}

How It Works

Your code uses two helper functions:
  • load_data() — returns a pandas DataFrame with all columns using human-readable names (e.g., email, revenue, company_name)
  • save_data(df) — saves the modified DataFrame back. Must be called or the node fails
New columns added to the DataFrame automatically become virtual fields available to downstream nodes. You must declare these in Result Fields so downstream nodes can reference them.
1. Write your Python code — Call load_data() to get a pandas DataFrame. Transform it, add new columns, then call save_data(df).
2. Define result fields — Declare every new column your code creates, along with its data type. These must match the columns you actually add to the DataFrame.
3. Set limits — Configure timeout and row limit to prevent runaway execution. Start small while testing.

Output

The transformed DataFrame, with any new columns available as Mentions in downstream nodes.

Pre-installed Packages

pandas, numpy, scipy, scikit-learn, pyarrow, rapidfuzz, plus Python stdlib: re, json, datetime, math, statistics, collections, itertools
No pip install. No file system access outside /tmp. No network access.
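The statistical use cases mentioned earlier can lean on these packages directly. A sketch computing z-scores with scipy, using an assumed revenue column in place of load_data():

```python
import pandas as pd
from scipy import stats

# Illustrative data; in the node this would come from load_data()
df = pd.DataFrame({"revenue": [100.0, 200.0, 300.0]})

# scipy.stats.zscore standardizes to mean 0 and (population) std 1
df["revenue_z"] = stats.zscore(df["revenue"])
```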

Examples

Extract email domains

Pull the domain from an email field:
df = load_data()
df['domain'] = df['email'].str.split('@').str[1]
save_data(df)
Result fields: domain (str)
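If some email values may be missing or malformed, str.split raises no error but str.extract makes the intent clearer: it returns NaN for nulls and non-matches instead of producing surprising values. A sketch with illustrative rows in place of load_data():

```python
import pandas as pd

# Illustrative rows; the null and the bad address are deliberate
df = pd.DataFrame({"email": ["a@acme.com", None, "bad-address"]})

# NaN-safe: non-matching or null emails yield NaN rather than an error
df["domain"] = df["email"].str.extract(r"@(.+)$", expand=False)
```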

Revenue-per-employee scoring

Calculate a score and classify companies into tiers:
df = load_data()
df['score'] = df['revenue'] / df['employees']
df['tier'] = df['score'].apply(lambda x: 'enterprise' if x > 100 else 'smb')
save_data(df)
Result fields: score (float), tier (str)
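For a simple two-way split like this, np.where is a vectorized alternative to .apply and is typically faster on large tables. A sketch with illustrative rows in place of load_data():

```python
import numpy as np
import pandas as pd

# Illustrative rows; in the node these come from load_data()
df = pd.DataFrame({"revenue": [2000, 120], "employees": [10, 4]})

df["score"] = df["revenue"] / df["employees"]
# Vectorized equivalent of the lambda-based .apply above
df["tier"] = np.where(df["score"] > 100, "enterprise", "smb")
```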

Percentile-based health score

Rank accounts across multiple dimensions:
df = load_data()

df['health_score'] = (
    df['revenue'].rank(pct=True) * 0.4 +
    df['contact_count'].rank(pct=True) * 0.3 +
    df['activity_score'].rank(pct=True) * 0.3
)

save_data(df)
Result fields: health_score (float)

Conditional logic across multiple fields

Classify leads based on combined criteria:
import re

df = load_data()

def classify_lead(row):
    if row['revenue'] > 1000000 and row['employees'] > 500:
        return 'enterprise'
    elif row['revenue'] > 100000:
        return 'mid-market'
    elif re.search(r'\.edu$', str(row['email'])):
        return 'education'
    else:
        return 'smb'

df['segment'] = df.apply(classify_lead, axis=1)
save_data(df)
Result fields: segment (str)

Deduplication with custom logic

Keep the most complete record per email:
df = load_data()

# Score completeness (count of non-null fields per row)
df['completeness'] = df.notna().sum(axis=1)

# Keep the most complete record per email
df = df.sort_values('completeness', ascending=False).drop_duplicates(subset=['email'], keep='first')
df = df.drop(columns=['completeness'])

save_data(df)
Result fields: none (same columns, fewer rows)
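The pivot/unpivot use case mentioned at the top can also be handled here. A sketch reshaping long-format revenue rows into one column per quarter; the column names are illustrative, and in the node you would wrap this between load_data() and save_data(wide) and declare the quarter columns in Result Fields:

```python
import pandas as pd

# Illustrative long-format rows; in the node these come from load_data()
df = pd.DataFrame({
    "company": ["Acme", "Acme", "Beta"],
    "quarter": ["Q1", "Q2", "Q1"],
    "revenue": [100, 150, 80],
})

# One row per company, one column per quarter; missing cells become 0
wide = (
    df.pivot_table(index="company", columns="quarter",
                   values="revenue", aggfunc="sum", fill_value=0)
      .reset_index()
)
```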

Best Practices

  • Always call both load_data() and save_data(df) — the node fails without them
  • Column names in the DataFrame are human-readable (e.g., email, revenue, not internal IDs)
  • Add new columns by assigning to the DataFrame (e.g., df['new_col'] = ...)
  • Declare every new column in Result Fields so downstream nodes can use them
  • Start with a small Row Limit while testing, then increase for production runs
  • Use the Description field to explain what the code does for non-technical team members

Related Nodes

  • Transform — simpler formula/SQL transforms without writing Python
  • Query — run SQL directly against the data warehouse
  • Regex Pattern — pattern matching without custom code