## Configuration
| Setting | Description |
|---|---|
| Code | Python code to execute. Must call `load_data()` to get a pandas DataFrame and `save_data(df)` to write the result |
| Description | Human-readable description of what the code does. Important for non-technical users who view the workflow |
| Result Fields | New columns the code creates. Each has a name and type (`str`, `int`, `float`, `bool`, `date`, `datetime`). Must match the columns your code adds to the DataFrame |
| Timeout | Max execution time in seconds (default: 60, range: 5–300). Increase for large datasets |
| Row Limit | Max rows to process (default: 100,000, range: 1–250,000). The table is truncated to this many rows if larger |
| Virtual Object Name | Namespace prefix for output fields (default: `code_execution`). Output columns are named `{virtual_object_name}.{field_name}` |
## How It Works
Your code uses two helper functions:

- `load_data()` — returns a pandas DataFrame with all columns using human-readable names (e.g., `email`, `revenue`, `company_name`)
- `save_data(df)` — saves the modified DataFrame back. Must be called or the node fails
1. **Write your Python code.** Call `load_data()` to get a pandas DataFrame. Transform it, add new columns, then call `save_data(df)`.
2. **Define result fields.** Declare every new column your code creates, along with its data type. These must match the columns you actually add to the DataFrame.
## Output

The transformed DataFrame, with any new columns available as Mentions in downstream nodes.

## Pre-installed Packages
`pandas`, `numpy`, `scipy`, `scikit-learn`, `pyarrow`, `rapidfuzz`, plus the Python standard library: `re`, `json`, `datetime`, `math`, `statistics`, `collections`, `itertools`
## Examples
### Extract email domains
Pull the domain from an email field. Result fields: `domain` (str)
### Revenue-per-employee scoring
Calculate a score and classify companies into tiers. Result fields: `score` (float), `tier` (str)
### Percentile-based health score
Rank accounts across multiple dimensions. Result fields: `health_score` (float)
### Conditional logic across multiple fields
Classify leads based on combined criteria. Result fields: `segment` (str)
### Deduplication with custom logic
Keep the most complete record per email.

## Best Practices
- Always call both `load_data()` and `save_data(df)` — the node fails without them
- Column names in the DataFrame are human-readable (e.g., `email`, `revenue`), not internal IDs
- Add new columns by assigning to the DataFrame (e.g., `df['new_col'] = ...`)
- Declare every new column in Result Fields so downstream nodes can use them
- Start with a small Row Limit while testing, then increase for production runs
- Use the Description field to explain what the code does for non-technical team members
## Related Nodes
- Transform — simpler formula/SQL transforms without writing Python
- Query — run SQL directly against the data warehouse
- Regex Pattern — pattern matching without custom code