When to Use It
- Store workflow data in BigQuery for analytics and reporting
- Build comprehensive data warehouses from multiple sources
- Archive performance data for long-term trend analysis
- Integrate marketing data with business intelligence tools
- Create unified datasets from fragmented data sources
- Populate tables with processed and cleaned data
Inputs
| Field | Type | Required | Description |
|---|---|---|---|
| Project | Select | Yes | Select the Google BigQuery project containing the target table |
| Dataset | Select | Yes | Select the dataset containing the target table |
| Table | Select | Yes | Select the target table to insert data into |
| Data | Data | Yes | Data source from previous workflow steps to insert |
| Write Type | Select | Yes | How to handle the data insertion (Append/Replace/Create) |
| Column Mapping | Mapper | No | Map input data fields to specific table columns |
| Skip Invalid Rows | Toggle | No | Skip rows that fail validation instead of failing entire operation |
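For illustration only, a filled-in configuration for this node might look like the following; the project, dataset, table, and data reference are placeholders, not values from your workspace.

```python
# Hypothetical configuration for the BigQuery Insert Rows node
node_config = {
    "project": "my-project",               # placeholder project
    "dataset": "analytics",                # placeholder dataset
    "table": "events",                     # placeholder table
    "data": "<output of previous step>",   # data passed in from an earlier workflow step
    "write_type": "Append",
    "column_mapping": None,                # None -> automatic name-based mapping
    "skip_invalid_rows": True,
}
```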
Write Type Options
| Write Type | Description | When to Use |
|---|---|---|
| Append | Add new rows to existing table data | Incremental data loading, daily updates |
| Replace | Delete all existing data and insert new data | Full data refreshes, snapshot updates |
| Create | Create new table if it doesn’t exist, then append | Dynamic table creation workflows |
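The node handles the insertion internally, but the three modes correspond closely to BigQuery's write and create dispositions. A minimal sketch using the google-cloud-bigquery Python client, assuming placeholder project, dataset, and table IDs:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Rough mapping of the three write types onto BigQuery load-job dispositions.
WRITE = {
    "Append":  bigquery.WriteDisposition.WRITE_APPEND,    # add rows, keep existing data
    "Replace": bigquery.WriteDisposition.WRITE_TRUNCATE,  # drop existing data, then insert
    "Create":  bigquery.WriteDisposition.WRITE_APPEND,    # behaves like Append once the table exists
}
CREATE = {
    "Append":  bigquery.CreateDisposition.CREATE_NEVER,
    "Replace": bigquery.CreateDisposition.CREATE_NEVER,
    "Create":  bigquery.CreateDisposition.CREATE_IF_NEEDED,  # create the table on first run
}

def load_rows(rows, table_id, write_type="Append"):
    """Load a list of dicts into `table_id` using the chosen write type."""
    job_config = bigquery.LoadJobConfig(
        write_disposition=WRITE[write_type],
        create_disposition=CREATE[write_type],
        autodetect=True,  # infer the schema from the input rows
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    return job.result()  # block until the load job completes

load_rows([{"id": 1, "name": "example"}], "my-project.analytics.events")  # placeholder table
```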
Column Mapping
Automatic Mapping (no Column Mapping specified):
- Maps input fields to table columns by exact name matching
- Best when the input data structure matches the table schema
- Fastest setup for standard data workflows (see the sketch after this list)
Custom Mapping (Column Mapping specified):
- Manually map specific input fields to table columns
- Transform field names and select specific data
- Handle schema mismatches and data transformations
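As a rough illustration of the automatic mode, exact-name matching simply keeps the input fields whose names appear in the table schema; the field and column names below are hypothetical:

```python
def auto_map(record: dict, table_columns: set[str]) -> dict:
    """Keep only input fields whose names exactly match a table column."""
    return {k: v for k, v in record.items() if k in table_columns}

# Hypothetical input record and target table columns
record = {"campaign": "spring_sale", "clicks": 120, "source": "ads"}
columns = {"campaign", "clicks", "cost"}

print(auto_map(record, columns))  # {'campaign': 'spring_sale', 'clicks': 120}
```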
Output
Returns insertion operation results and statistics.
Output Fields:
| Field | Description |
|---|---|
| rows_inserted | Number of rows successfully inserted |
| rows_failed | Number of rows that failed validation |
| table_id | Target table name |
| dataset_id | Target dataset name |
| project_id | BigQuery project |
| write_type | Write operation mode used |
| schema_validated | Whether schema validation passed |
| invalid_rows_skipped | Rows skipped due to validation errors |
| insertion_time | When the operation completed |
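For example, an Append run that inserts 998 rows and skips 2 invalid ones might produce output along these lines (all values are illustrative):

```python
# Illustrative output from a successful run (values are made up)
result = {
    "rows_inserted": 998,
    "rows_failed": 2,
    "table_id": "events",
    "dataset_id": "analytics",
    "project_id": "my-project",
    "write_type": "Append",
    "schema_validated": True,
    "invalid_rows_skipped": 2,
    "insertion_time": "2024-10-17T14:30:00Z",
}
```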
Credit Cost
- Cost per run: 1 credit (regardless of number of rows)
FAQs
What's the difference between Append, Replace, and Create write types?
Append Mode:
How it works:
- Adds new rows to existing table data
- Preserves all existing data in the table
- Most common mode for data pipelines
When to use:
- Daily/hourly data updates
- Incremental data loading
- Event logging and tracking
- Growing datasets over time
Replace Mode:
How it works:
- Deletes ALL existing data from the table
- Inserts new data as the complete dataset
- Equivalent to TRUNCATE + INSERT
When to use:
- Full data refreshes
- Snapshot datasets
- Dimension tables that change completely
- When you want to replace stale data
Create Mode:
How it works:
- Creates the table if it doesn’t exist
- If the table exists, appends data (like Append mode)
- Automatically infers the schema from the input data (a toy sketch follows this list)
When to use:
- Dynamic workflows that create tables on-demand
- Prototype development
- One-time data exports
- When table existence is uncertain
Quick recommendations:
- Daily updates: Append
- Monthly refreshes: Replace
- Dynamic workflows: Create
- Real-time data: Append
- Reporting snapshots: Replace
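Create mode's schema inference can be pictured as deriving a column type from a sample input row. The toy sketch below is a simplification; BigQuery's real autodetection examines the data more thoroughly:

```python
from datetime import date, datetime

# Simplified mapping from Python value types to BigQuery column types
TYPE_MAP = {bool: "BOOLEAN", int: "INTEGER", float: "FLOAT",
            str: "STRING", date: "DATE", datetime: "TIMESTAMP"}

def infer_schema(row: dict) -> dict:
    """Guess a BigQuery type for each field of a sample input row."""
    return {field: TYPE_MAP.get(type(value), "STRING") for field, value in row.items()}

sample = {"user_id": 42, "signup": date(2024, 10, 17), "active": True, "score": 9.5}
print(infer_schema(sample))
# {'user_id': 'INTEGER', 'signup': 'DATE', 'active': 'BOOLEAN', 'score': 'FLOAT'}
```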
How does column mapping work and when should I use it?
Automatic Mapping (No Column Mapping Specified):
How it works:
- Matches input data fields to table columns by exact name
- Case-sensitive matching
- Ignores input fields that don’t have matching table columns
- Fails if required table columns are missing from the input
Custom Column Mapping:
How it works:
- Explicitly maps each input field to a specific table column
- Allows field renaming and transformation (sketched below)
- Can map multiple input fields to a single table column
- Can skip input fields or leave table columns empty
When to Use Custom Mapping:
- Schema mismatch: Input structure differs from the table
- Field renaming: Need to transform field names
- Selective data: Only want specific fields from the input
- Data transformation: Need to modify values during insertion
- Legacy systems: Working with existing table schemas
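A custom mapping behaves like a rename-and-select step applied to each input record before insertion. A small sketch with hypothetical field names:

```python
# Hypothetical mapping: input field name -> target table column
COLUMN_MAPPING = {
    "campaignName": "campaign",   # rename to match the table schema
    "totalClicks": "clicks",
    "spendUsd": "cost",
}                                 # any input field not listed is dropped

def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename and select input fields according to the mapping."""
    return {column: record[field] for field, column in mapping.items() if field in record}

record = {"campaignName": "spring_sale", "totalClicks": 120, "spendUsd": 35.2, "notes": "ignore me"}
print(apply_mapping(record, COLUMN_MAPPING))
# {'campaign': 'spring_sale', 'clicks': 120, 'cost': 35.2}
```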
What happens when rows fail validation and how can I handle it?
Row Validation Process:
Each row is checked against the table schema before insertion:
- Data type checking: STRING data going to an INTEGER field
- Required field validation: NULL values in NOT NULL columns
- Format validation: Invalid date/timestamp formats
- Length validation: STRING data exceeding column limits
Skip Invalid Rows = True (Recommended):
- Invalid rows are skipped and logged
- Valid rows are still inserted
- Operation continues and completes
- Returns the count of failed rows in the output (see the example after this list)
Skip Invalid Rows = False:
- Any invalid row causes the entire operation to fail
- No data is inserted if any row fails
- Operation stops at the first validation error
- Use for strict data quality requirements
Monitoring and Alerting:
- Monitor the rows_failed count in the output
- Set up alerts for high failure rates
- Log failed rows for investigation
- Implement data quality metrics
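If you are reproducing this behavior directly against BigQuery, the streaming API exposes the same trade-off via its skip_invalid_rows flag. A sketch with a placeholder table ID and rows, assuming user_id is an INTEGER column:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.events"  # placeholder

rows = [
    {"user_id": 1, "created_at": "2024-10-17T14:30:00Z"},
    {"user_id": "not-a-number", "created_at": "yesterday"},  # fails validation against an INTEGER column
]

# skip_invalid_rows=True: valid rows are inserted, invalid ones are reported back
errors = client.insert_rows_json(table_id, rows, skip_invalid_rows=True)

rows_failed = len(errors)
rows_inserted = len(rows) - rows_failed
if rows_failed:
    # Log failed rows for investigation and feed a data-quality metric or alert
    print(f"{rows_failed} row(s) failed validation: {errors}")
print(f"{rows_inserted} row(s) inserted")
```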
How do I optimize performance for large data insertions?
Data Volume Considerations:
Small Datasets (< 10k rows):
- A single insertion operation works well
- No special optimization needed
- Focus on data quality and schema alignment
Medium Datasets:
- Consider batching data into chunks
- Use partitioned tables for time-series data
- Monitor insertion performance
- Optimize column mapping complexity
Large Datasets:
- Break into multiple smaller operations (a batching sketch follows this list)
- Use BigQuery Load Jobs for very large datasets
- Consider streaming inserts for real-time data
- Implement parallel processing workflows
Table Design for Performance:
- Partitioned tables: Use for time-series data
- Clustered tables: Optimize for query patterns
- Appropriate data types: Use specific types, not generic STRING
- Schema optimization: Remove unnecessary columns
Monitoring Performance:
- Track insertion_time in the outputs
- Monitor BigQuery slot usage
- Watch for quota limitations
- Measure rows-per-second throughput
- Set up performance alerts
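One simple batching pattern for larger volumes is to chunk the input and issue one load job per chunk. A sketch with an arbitrary chunk size and a placeholder table ID:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.events"  # placeholder
CHUNK_SIZE = 10_000                       # arbitrary; tune to your data volume and quotas

def chunked(rows, size):
    """Yield successive fixed-size chunks from a list of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def load_in_chunks(rows):
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    )
    for chunk in chunked(rows, CHUNK_SIZE):
        job = client.load_table_from_json(chunk, table_id, job_config=job_config)
        job.result()  # wait for each chunk before sending the next
```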
What are the best practices for schema design and data types?
Schema Design Principles:
Choose specific data types rather than generic STRING for everything.
Data Type Selection Guide:
- INTEGER: Whole numbers, IDs, counts, ages
- FLOAT: Decimal numbers, percentages, currency
- NUMERIC: High-precision decimals, financial data
- STRING: Names, descriptions, categories
- JSON: Complex nested objects, flexible schemas
- DATE: Dates only (2024-10-17)
- TIME: Time only (14:30:00)
- TIMESTAMP: Full date/time with timezone
- DATETIME: Date/time without timezone
- BOOLEAN: True/false flags, status indicators
Schema Evolution Best Practices:
- Use nullable fields initially
- Plan for additional data over time
- Consider JSON fields for evolving structures
- Document schema changes
Validation and Quality:
- Validate data types before insertion (a schema sketch follows this list)
- Use consistent naming conventions
- Document field meanings and formats
- Implement data quality checks
- Monitor schema compliance
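Putting these principles together, creating a table with explicit, specific types plus partitioning and clustering might look like the following; dataset, table, and field names are illustrative:

```python
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("order_id",   "INTEGER",   mode="REQUIRED"),
    bigquery.SchemaField("customer",   "STRING",    mode="NULLABLE"),
    bigquery.SchemaField("amount",     "NUMERIC",   mode="NULLABLE"),   # high-precision money values
    bigquery.SchemaField("created_at", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("is_test",    "BOOLEAN",   mode="NULLABLE"),
    bigquery.SchemaField("attributes", "JSON",      mode="NULLABLE"),   # flexible, evolving fields
]

table = bigquery.Table("my-project.sales.orders", schema=schema)          # placeholder IDs
table.time_partitioning = bigquery.TimePartitioning(field="created_at")   # time-series friendly
table.clustering_fields = ["customer"]                                    # optimize common filters
client.create_table(table, exists_ok=True)
```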
How do I integrate BigQuery Insert Rows with other workflow nodes?
Common Integration Patterns:
This node typically sits downstream of data-collection or transformation steps, in patterns such as multi-source data warehousing, real-time data processing, scheduled data syncs, data validation pipelines, automated reporting and BI dashboards, ML feature pipelines, cross-platform syncs, and backup/archive workflows.
Best Practices for Integration:
- Use consistent error handling across all nodes
- Monitor data flow metrics at each step (illustrated after this list)
- Implement circuit breakers for failing data sources
- Log all data transformations for debugging
- Test integration points with sample data
- Document data lineage and dependencies
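Downstream of this node, a later workflow step can read the output fields and apply a simple quality threshold. A sketch with made-up numbers and an arbitrary threshold:

```python
# Hypothetical output from the BigQuery Insert Rows step (see Output Fields above)
insert_result = {"rows_inserted": 950, "rows_failed": 50, "table_id": "events"}

FAILURE_RATE_THRESHOLD = 0.02  # alert if more than 2% of rows fail (arbitrary choice)

def check_insert_quality(result: dict) -> None:
    """Raise an alert when the share of failed rows exceeds the threshold."""
    total = result["rows_inserted"] + result["rows_failed"]
    failure_rate = result["rows_failed"] / total if total else 0.0
    if failure_rate > FAILURE_RATE_THRESHOLD:
        # Replace with your alerting integration (Slack, email, PagerDuty, ...)
        print(f"ALERT: {failure_rate:.1%} of rows failed for table {result['table_id']}")

check_insert_quality(insert_result)
```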

