The Insert Rows node inserts data from your workflows into Google BigQuery tables, with configurable column mapping and a choice of write modes. It is an AI-powered node that can interpret natural language instructions for data transformation and mapping.

When to Use It

  • Store workflow data in BigQuery for analytics and reporting
  • Build comprehensive data warehouses from multiple sources
  • Archive performance data for long-term trend analysis
  • Integrate marketing data with business intelligence tools
  • Create unified datasets from fragmented data sources
  • Populate tables with processed and cleaned data

Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Project | Select | Yes | Select the Google BigQuery project containing the target table |
| Dataset | Select | Yes | Select the dataset containing the target table |
| Table | Select | Yes | Select the target table to insert data into |
| Data | Data | Yes | Data source from previous workflow steps to insert |
| Write Type | Select | Yes | How to handle the data insertion (Append/Replace/Create) |
| Column Mapping | Mapper | No | Map input data fields to specific table columns |
| Skip Invalid Rows | Toggle | No | Skip rows that fail validation instead of failing the entire operation |

Write Type Options

| Write Type | Description | When to Use |
| --- | --- | --- |
| Append | Add new rows to existing table data | Incremental data loading, daily updates |
| Replace | Delete all existing data and insert new data | Full data refreshes, snapshot updates |
| Create | Create new table if it doesn't exist, then append | Dynamic table creation workflows |
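The node handles these modes for you; if you ever need to reproduce the same behavior directly against BigQuery, they correspond closely to the write dispositions of a standard load job. A minimal sketch with the google-cloud-bigquery Python client (project, dataset, and table names are placeholders, and this illustrates the underlying BigQuery behavior rather than the node's internals):

from google.cloud import bigquery

client = bigquery.Client(project="my-project-123")  # placeholder project
table_id = "my-project-123.marketing_data.campaign_performance"  # placeholder table

# Append ~ WRITE_APPEND (add rows); Replace ~ WRITE_TRUNCATE (overwrite all rows)
job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

rows = [{"campaign": "Summer Sale", "spend_usd": 120.5, "clicks": 150}]
load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
load_job.result()  # wait for the load job to finish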

Column Mapping

Automatic Mapping (No Column Mapping):
  • Maps input fields to table columns by exact name matching
  • Best when input data structure matches table schema
  • Fastest setup for standard data workflows
Custom Mapping (With Column Mapping):
  • Manually map specific input fields to table columns
  • Transform field names and select specific data
  • Handle schema mismatches and data transformations
Example Column Mapping:
Input Field: "campaign_name" → Table Column: "campaign"
Input Field: "total_cost" → Table Column: "spend_usd"
Input Field: "click_count" → Table Column: "clicks"

Output

Returns insertion operation results and statistics:
{
  "rows_inserted": 1250,
  "rows_failed": 5,
  "table_id": "campaign_performance",
  "dataset_id": "marketing_data",
  "project_id": "my-project-123",
  "write_type": "Append",
  "schema_validated": true,
  "invalid_rows_skipped": 5,
  "insertion_time": "2024-10-17T10:30:00Z"
}

Output Fields:

| Field | Description |
| --- | --- |
| rows_inserted | Number of rows successfully inserted |
| rows_failed | Number of rows that failed validation |
| table_id | Target table name |
| dataset_id | Target dataset name |
| project_id | BigQuery project |
| write_type | Write operation mode used |
| schema_validated | Whether schema validation passed |
| invalid_rows_skipped | Rows skipped due to validation errors |
| insertion_time | When the operation completed |

Credit Cost

  • Cost per run: 1 credit (regardless of number of rows)

FAQs

Append Mode:
How it works:
  • Adds new rows to existing table data
  • Preserves all existing data in the table
  • Most common mode for data pipelines
Best for:
  • Daily/hourly data updates
  • Incremental data loading
  • Event logging and tracking
  • Growing datasets over time
Example scenario:
Day 1: Table has 1000 rows → Append 50 rows → Table has 1050 rows
Day 2: Table has 1050 rows → Append 75 rows → Table has 1125 rows
Replace Mode:
How it works:
  • Deletes ALL existing data from table
  • Inserts new data as the complete dataset
  • Equivalent to TRUNCATE + INSERT
Best for:
  • Full data refreshes
  • Snapshot datasets
  • Dimension tables that change completely
  • When you want to replace stale data
Example scenario:
Before: Table has 1000 rows of old data
After Replace: Table has 500 rows of fresh data only
Create Mode:
How it works:
  • Creates table if it doesn’t exist
  • If table exists, appends data (like Append mode)
  • Automatically infers schema from input data
Best for:
  • Dynamic workflows that create tables on-demand
  • Prototype development
  • One-time data exports
  • When table existence is uncertain
Choosing the Right Mode:
  • Daily updates: Append
  • Monthly refreshes: Replace
  • Dynamic workflows: Create
  • Real-time data: Append
  • Reporting snapshots: Replace
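If you need the same Create-mode behavior directly against BigQuery, a load job can be asked to create the missing table and infer its schema from the incoming rows. A hedged sketch with placeholder names (again illustrating BigQuery itself, not the node's implementation):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project-123.marketing_data.new_export"  # placeholder; table may not exist yet

job_config = bigquery.LoadJobConfig(
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,  # create the table if missing
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,        # append if it already exists
    autodetect=True,  # infer the schema from the input rows
)

rows = [{"campaign": "Summer Sale", "clicks": 150}]
client.load_table_from_json(rows, table_id, job_config=job_config).result()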
Automatic Mapping (No Column Mapping Specified):
How it works:
  • Matches input data fields to table columns by exact name
  • Case-sensitive matching
  • Ignores input fields that don’t have matching table columns
  • Fails if required table columns are missing from input
Example:
Input Data: {"campaign_name": "Summer Sale", "clicks": 150}
Table Columns: campaign_name, clicks, impressions
Result: Maps campaign_name and clicks, ignores missing impressions
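Conceptually, automatic mapping is an exact, case-sensitive intersection of the input field names with the table's column names. A sketch of that matching logic using the Python client to read the schema (the table reference is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project-123.marketing_data.campaign_performance")  # placeholder
table_columns = {field.name for field in table.schema}

def auto_map(record: dict) -> dict:
    """Keep only input fields whose names exactly match a table column (case-sensitive)."""
    return {key: value for key, value in record.items() if key in table_columns}

auto_map({"campaign_name": "Summer Sale", "clicks": 150, "extra_field": "ignored"})
# -> {"campaign_name": "Summer Sale", "clicks": 150}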
Custom Column Mapping:
How it works:
  • Explicitly maps each input field to a specific table column
  • Allows field renaming and transformation
  • Can map multiple input fields to a single table column
  • Can skip input fields or leave table columns empty
Common Mapping Scenarios:
Field Renaming:
Input: "campaign" → Table: "campaign_name"
Input: "cost" → Table: "spend_usd"
Input: "date" → Table: "report_date"
Data Transformation:
Input: "cost_cents" → Table: "cost_dollars" (with division by 100)
Input: "click_rate" → Table: "ctr_percentage" (with percentage conversion)
Selective Mapping:
Input Fields: campaign, clicks, impressions, cost, conversion_rate
Mapped Fields: campaign → name, clicks → total_clicks
Ignored Fields: impressions, cost, conversion_rate
When to Use Custom Mapping:
  • Schema mismatch: Input structure differs from table
  • Field renaming: Need to transform field names
  • Selective data: Only want specific fields from input
  • Data transformation: Need to modify values during insertion
  • Legacy systems: Working with existing table schemas
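One way to picture the renaming, transformation, and selective scenarios together is a mapping whose entries carry a target column plus an optional conversion; anything unmapped is dropped. The field names and conversions in this sketch are hypothetical:

# Hypothetical mapping: input field -> (table column, optional transform)
CUSTOM_MAPPING = {
    "campaign":   ("campaign_name", None),
    "cost_cents": ("cost_dollars", lambda cents: cents / 100),
    "click_rate": ("ctr_percentage", lambda rate: rate * 100),
    # impressions, conversion_rate, ... intentionally unmapped (selective mapping)
}

def transform_record(record: dict) -> dict:
    """Apply renames and per-field conversions, skipping unmapped input fields."""
    out = {}
    for field, (column, convert) in CUSTOM_MAPPING.items():
        if field in record:
            value = record[field]
            out[column] = convert(value) if convert else value
    return out

transform_record({"campaign": "Summer Sale", "cost_cents": 1250, "click_rate": 0.034, "impressions": 9000})
# -> campaign_name, cost_dollars (12.5), ctr_percentage; impressions is dropped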
Row Validation Process:
Schema Validation:
  • Data type checking: STRING data going to INTEGER field
  • Required field validation: NULL values in NOT NULL columns
  • Format validation: Invalid date/timestamp formats
  • Length validation: STRING data exceeding column limits
Common Validation Failures:
Type Mismatches:
Input: {"user_id": "abc123"}  // STRING
Table Column: user_id INTEGER
Result: Validation fails, row rejected
Missing Required Fields:
Input: {"name": "John"}
Table: name STRING, user_id INTEGER NOT NULL
Result: Validation fails, user_id is required
Invalid Formats:
Input: {"created_date": "2024-13-45"}  // Invalid date
Table Column: created_date DATE
Result: Validation fails, invalid date format
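A lightweight pre-insert check along these lines catches such rows before they reach BigQuery. This sketch only illustrates the failure types above against a hypothetical schema; it is not the node's own validation logic:

from datetime import date

# Hypothetical expected schema: column -> (python type, required)
SCHEMA = {
    "user_id": (int, True),
    "name": (str, False),
    "created_date": (date, False),
}

def validate_row(row: dict) -> list[str]:
    """Return a list of validation errors for one row (empty list means valid)."""
    errors = []
    for column, (py_type, required) in SCHEMA.items():
        value = row.get(column)
        if value is None:
            if required:
                errors.append(f"missing required field: {column}")
        elif not isinstance(value, py_type):
            errors.append(f"type mismatch for {column}: expected {py_type.__name__}")
    return errors

validate_row({"name": "John", "user_id": "abc123"})
# -> ['type mismatch for user_id: expected int']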
Skip Invalid Rows Options:
Skip Invalid Rows = True (Recommended):
  • Invalid rows are skipped and logged
  • Valid rows are still inserted
  • Operation continues and completes
  • Returns count of failed rows in output
Skip Invalid Rows = False:
  • Any invalid row causes entire operation to fail
  • No data is inserted if any row fails
  • Operation stops at first validation error
  • Use for strict data quality requirements
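The underlying BigQuery streaming-insert API exposes a very similar pair of switches. If you were inserting with the Python client yourself, a comparable call looks roughly like this (the table reference is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
rows = [
    {"user_id": 1, "name": "John"},
    {"user_id": "abc123", "name": "Bad Row"},  # will fail INTEGER validation
]

errors = client.insert_rows_json(
    "my-project-123.marketing_data.users",  # placeholder table
    rows,
    skip_invalid_rows=True,       # keep inserting valid rows when some fail
    ignore_unknown_values=True,   # drop input fields with no matching column
)
if errors:
    print(f"{len(errors)} row(s) failed validation:", errors)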
Error Handling Strategies:
Data Quality Pipeline:
[Raw Data] → [Data Validation Node] → [Clean Invalid Data]
→ [Insert Rows] → [Log Failed Rows]
Graceful Degradation:
[Insert Rows with Skip = True] → [Check Failed Count]
→ [If >10% failed, Alert Team] → [Continue Workflow]
Monitoring and Alerting:
  • Monitor rows_failed count in output
  • Set up alerts for high failure rates
  • Log failed rows for investigation
  • Implement data quality metrics
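The graceful-degradation pattern above boils down to reading the node's output and alerting when the failure ratio crosses a threshold. A small sketch against the output shape shown earlier; the alert itself is a placeholder for whatever notification step your workflow uses:

def check_insert_result(result: dict, max_failure_rate: float = 0.10) -> None:
    """Alert when more than max_failure_rate of the attempted rows failed validation."""
    attempted = result["rows_inserted"] + result["rows_failed"]
    failure_rate = result["rows_failed"] / attempted if attempted else 0.0
    if failure_rate > max_failure_rate:
        # placeholder for an alerting step (Slack, email, incident tool, ...)
        print(f"ALERT: {failure_rate:.1%} of rows failed for {result['table_id']}")

check_insert_result({"rows_inserted": 1250, "rows_failed": 5, "table_id": "campaign_performance"})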
Data Volume Considerations:
Small Datasets (< 10k rows):
  • Single insertion operation works well
  • No special optimization needed
  • Focus on data quality and schema alignment
Medium Datasets (10k - 1M rows):
  • Consider batching data into chunks
  • Use partitioned tables for time-series data
  • Monitor insertion performance
  • Optimize column mapping complexity
Large Datasets (> 1M rows):
  • Break into multiple smaller operations
  • Use BigQuery Load Jobs for very large datasets
  • Consider streaming inserts for real-time data
  • Implement parallel processing workflows
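Batching is simply slicing the input into fixed-size chunks and loading each chunk as its own job, as in this sketch (the chunk size and table name are arbitrary placeholders):

from google.cloud import bigquery

def insert_in_batches(client: bigquery.Client, table_id: str,
                      rows: list[dict], batch_size: int = 10_000) -> None:
    """Load a large row list as a series of smaller load jobs."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        job = client.load_table_from_json(batch, table_id)
        job.result()  # wait for each batch before starting the next

# insert_in_batches(bigquery.Client(), "my-project-123.marketing_data.events", all_rows)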
Performance Optimization Strategies:
Batch Processing:
[Large Dataset] → [Split into Batches] → [Parallel Insert Operations]
→ [Combine Results] → [Validate Complete]
Table Design for Performance:
  • Partitioned tables: Use for time-series data
  • Clustered tables: Optimize for query patterns
  • Appropriate data types: Use specific types, not generic STRING
  • Schema optimization: Remove unnecessary columns
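Partitioning and clustering are properties of the table itself, set when the table is created. For reference, this is roughly how a day-partitioned, clustered table is created with the Python client (all names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("report_date", "DATE"),
    bigquery.SchemaField("campaign_id", "INTEGER"),
    bigquery.SchemaField("clicks", "INTEGER"),
]

table = bigquery.Table("my-project-123.marketing_data.daily_stats", schema=schema)  # placeholder
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="report_date",  # partition time-series data by day
)
table.clustering_fields = ["campaign_id"]  # cluster on common filter columns

client.create_table(table, exists_ok=True)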
Workflow Design Patterns:
Parallel Processing:
[Data Source] → [Split by Date Range]
├── [Insert Jan Data] → [Table Partition]
├── [Insert Feb Data] → [Table Partition]  
└── [Insert Mar Data] → [Table Partition]
Incremental Loading:
[Check Last Update] → [Get New Data Since] → [Insert Rows]
→ [Update Watermark] → [Schedule Next Run]
Error Recovery:
[Insert Rows] → [Check Success Rate] → [If Failed > Threshold]
→ [Retry with Smaller Batches] → [Alert on Persistent Failures]
Monitoring Performance:
  • Track insertion_time in outputs
  • Monitor BigQuery slot usage
  • Watch for quota limitations
  • Measure rows per second throughput
  • Set up performance alerts
Schema Design Principles:
Choose Specific Data Types:
// Good - Specific types
{
  "user_id": "INTEGER",
  "email": "STRING",
  "signup_date": "DATE", 
  "last_login": "TIMESTAMP",
  "is_premium": "BOOLEAN",
  "account_balance": "FLOAT",
  "metadata": "JSON"
}

// Avoid - Everything as STRING
{
  "user_id": "STRING",      // Should be INTEGER
  "signup_date": "STRING",  // Should be DATE
  "is_premium": "STRING"    // Should be BOOLEAN
}
Data Type Selection Guide:
Numeric Data:
  • INTEGER: Whole numbers, IDs, counts, ages
  • FLOAT: Decimal numbers, percentages, currency
  • NUMERIC: High-precision decimals, financial data
Text Data:
  • STRING: Names, descriptions, categories
  • JSON: Complex nested objects, flexible schemas
Date/Time Data:
  • DATE: Dates only (2024-10-17)
  • TIME: Time only (14:30:00)
  • TIMESTAMP: Full date/time with timezone
  • DATETIME: Date/time without timezone
Boolean Data:
  • BOOLEAN: True/false flags, status indicators
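Expressed as an explicit BigQuery schema, the "Good" example from above translates into SchemaField definitions like these (a sketch, assuming the google-cloud-bigquery client):

from google.cloud import bigquery

schema = [
    bigquery.SchemaField("user_id", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("email", "STRING"),
    bigquery.SchemaField("signup_date", "DATE"),
    bigquery.SchemaField("last_login", "TIMESTAMP"),
    bigquery.SchemaField("is_premium", "BOOLEAN"),
    bigquery.SchemaField("account_balance", "FLOAT"),
    bigquery.SchemaField("metadata", "JSON"),
]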
Common Data Type Mappings:
Marketing Data:
{
  "campaign_id": "INTEGER",
  "campaign_name": "STRING",
  "start_date": "DATE",
  "created_at": "TIMESTAMP",
  "budget": "FLOAT",
  "is_active": "BOOLEAN",
  "targeting": "JSON"
}
E-commerce Data:
{
  "order_id": "INTEGER",
  "customer_email": "STRING", 
  "order_date": "DATE",
  "order_total": "FLOAT",
  "is_paid": "BOOLEAN",
  "items": "JSON"
}
Schema Evolution Best Practices:
Start Flexible:
  • Use nullable fields initially
  • Plan for additional data over time
  • Consider JSON fields for evolving structures
  • Document schema changes
Handle Missing Data:
-- Use COALESCE for default values
SELECT
  campaign_name,
  COALESCE(budget, 0) AS budget_with_default
FROM campaigns

-- Use conditional logic (CAST keeps both CASE branches the same type)
SELECT
  CASE
    WHEN budget IS NOT NULL THEN CAST(budget AS STRING)
    ELSE 'Budget not set'
  END AS budget_status
FROM campaigns
Validation and Quality:
  • Validate data types before insertion
  • Use consistent naming conventions
  • Document field meanings and formats
  • Implement data quality checks
  • Monitor schema compliance
Common Integration Patterns:
Data Pipeline Workflows:
[API Data Source] → [Transform Data] → [Insert Rows] 
→ [Validate Results] → [Trigger Analytics]
Multi-Source Data Warehouse:
[Google Ads] → [Transform] → [Insert to ads_table]
[Facebook Ads] → [Transform] → [Insert to ads_table]  
[LinkedIn Ads] → [Transform] → [Insert to ads_table]
→ [Union All Sources] → [Generate Report]
Real-Time Data Processing:
[Webhook Trigger] → [Parse Data] → [Insert Rows]
→ [Check for Anomalies] → [Alert if Needed]
Scheduled Data Sync:
[Schedule: Daily 6 AM] → [Get Yesterday's Data]
→ [Clean and Validate] → [Insert Rows] → [Email Summary]
Data Validation Pipeline:
[Raw Data] → [Data Quality Checks] → [Insert Valid Rows]
→ [Log Invalid Data] → [Alert on Quality Issues]
Integration with BI Tools:
Automated Reporting:
[Insert Daily Data] → [Trigger BigQuery View Refresh]
→ [Update Dashboard] → [Email Stakeholders]
ML Feature Pipeline:
[Extract Features] → [Transform for ML] → [Insert to Feature Store]
→ [Trigger Model Training] → [Deploy Updated Model]
Data Export Workflows:
Cross-Platform Sync:
[BigQuery Data] → [Transform Format] → [Insert to Sheets]
→ [Update Slides] → [Share with Team]
Backup and Archive:
[Production Data] → [Insert to Archive Table]
→ [Compress Old Data] → [Update Retention Policies]
Error Handling Across Workflows:
Robust Pipeline:
[Data Source] → [Try Insert Rows]
→ [If Failed: Log Error] → [Try Alternative Source]
→ [If All Failed: Alert Team] → [Continue with Cached Data]
Monitoring Integration:
[Insert Rows] → [Check rows_inserted Count]
→ [If Below Threshold: Investigation Alert]
→ [Update Monitoring Dashboard]
Best Practices for Integration:
  • Use consistent error handling across all nodes
  • Monitor data flow metrics at each step
  • Implement circuit breakers for failing data sources
  • Log all data transformations for debugging
  • Test integration points with sample data
  • Document data lineage and dependencies