The Create Dataset node creates a new dataset in your Google BigQuery project. Datasets are containers that organize tables and control access to your data. This is an AI-powered node that can understand natural language instructions.

When to Use It

  • Set up new data warehousing projects in BigQuery
  • Organize tables by business unit, data source, or project
  • Create isolated environments for development, testing, and production
  • Establish data governance boundaries with different access controls
  • Build automated data pipeline setup workflows
  • Initialize BigQuery infrastructure as part of larger workflows

Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Project | Select | Yes | The Google BigQuery project to create the dataset in |
| Dataset ID | Text | Yes | Unique identifier for the dataset (letters, numbers, and underscores only) |
| Location | Text | No | Geographic location for the dataset (e.g., US, EU, asia-southeast1) |
| Description | Text | No | Optional description to document the dataset's purpose |
| Skip Error If Already There | Toggle | No | If enabled, the node won't fail if the dataset already exists (default: false) |

Dataset ID Requirements

  • Characters: Letters, numbers, and underscores only
  • Length: Up to 1024 characters
  • Case sensitive: MyDataset and mydataset are different
  • Uniqueness: Must be unique within the project
  • No spaces: Use underscores instead of spaces
Good examples: marketing_data, sales_2024, user_analytics
Bad examples: marketing data, sales-2024, user@analytics
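If you generate dataset IDs dynamically (for example, from client names), validating them before the node runs avoids mid-workflow failures. A minimal sketch in Python, based on the rules above; the function name is illustrative:

```python
import re

# Letters, numbers, and underscores only; 1-1024 characters (per the rules above).
DATASET_ID_PATTERN = re.compile(r"^[A-Za-z0-9_]{1,1024}$")

def is_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if dataset_id satisfies BigQuery's naming rules."""
    return bool(DATASET_ID_PATTERN.match(dataset_id))

assert is_valid_dataset_id("marketing_data")
assert not is_valid_dataset_id("marketing data")  # contains a space
assert not is_valid_dataset_id("sales-2024")      # contains a dash
assert not is_valid_dataset_id("user@analytics")  # special character
```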

Location Options

| Location | Description | Use Case |
| --- | --- | --- |
| US | Multi-region in the United States | Default; best for US-based operations |
| EU | Multi-region in the European Union | GDPR compliance, EU operations |
| asia-southeast1 | Singapore | Asia-Pacific operations |
| us-central1 | Iowa, USA | Specific US region |
| europe-west1 | Belgium | Specific EU region |
Important: Location cannot be changed after dataset creation. Choose based on:
  • Data residency requirements
  • Performance (closer to users/applications)
  • Compliance regulations (GDPR, etc.)
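The node handles the API call for you, but for context, creating a dataset with an explicit location via the google-cloud-bigquery Python client looks roughly like this; the project and dataset IDs are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project-123")  # placeholder project ID

dataset = bigquery.Dataset("my-project-123.marketing_data")
dataset.location = "EU"  # immutable after creation, so choose deliberately
dataset.description = "Marketing analytics data warehouse"

created = client.create_dataset(dataset)
print(f"Created {created.full_dataset_id} in {created.location}")
```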

Output

Returns dataset creation confirmation and details:
```json
{
  "dataset_id": "marketing_data",
  "project_id": "my-project-123",
  "location": "US",
  "creation_time": "2024-10-17T10:30:00Z",
  "description": "Marketing analytics data warehouse",
  "exists_ok_used": false
}
```

Output Fields:

| Field | Description |
| --- | --- |
| dataset_id | The created dataset identifier |
| project_id | The BigQuery project containing the dataset |
| location | Geographic location of the dataset |
| creation_time | When the dataset was created |
| description | Dataset description (if provided) |
| exists_ok_used | Whether the dataset already existed |

Credit Cost

  • Cost per run: 1 credit

FAQs

What happens if the dataset already exists?

Default Behavior (Skip Error If Already There = false):
  • The operation fails with an error
  • Workflow execution stops
  • Useful when you need to guarantee a brand-new dataset

With Skip Error If Already There = true:
  • The operation succeeds even if the dataset exists
  • No changes are made to the existing dataset
  • exists_ok_used is true in the output
  • The workflow continues normally

Best Practice: Enable “Skip Error If Already There” for idempotent workflows that should run multiple times safely (see the sketch below).
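In the google-cloud-bigquery Python client, the same choice surfaces as the exists_ok argument to create_dataset, which is presumably what this toggle corresponds to. A sketch of both behaviors, with a placeholder dataset reference:

```python
from google.api_core.exceptions import Conflict
from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = "my-project-123.marketing_data"  # placeholder

# Skip Error If Already There = true: succeeds whether or not the dataset exists.
client.create_dataset(dataset_ref, exists_ok=True)

# Skip Error If Already There = false (the default): a second creation attempt
# raises Conflict, which is what stops a workflow.
try:
    client.create_dataset(dataset_ref, exists_ok=False)
except Conflict:
    print("Dataset already exists; the workflow would fail here.")
```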
How do I choose the right location?

Consider these factors:

Data Residency:
  • GDPR compliance: Use EU locations for European user data
  • Local regulations: Some countries require data to stay within borders
  • Company policies: Internal data governance requirements
Performance:
  • User proximity: Choose location closest to end users
  • Application location: Co-locate with your applications
  • Data sources: Near where your data originates
Cost Optimization:
  • Multi-region: Higher availability, slightly higher cost
  • Single region: Lower cost, regional availability
  • Egress charges: Consider data export costs
Common Patterns:
  • Global business: US (multi-region) for flexibility
  • EU operations: EU (multi-region) for compliance
  • Asian markets: asia-southeast1 or other Asian regions
  • Cost-sensitive: Specific single regions
What is the difference between a dataset and a table?

Dataset Level (Container):
  • Purpose: High-level organization and access control
  • Contains: Multiple related tables
  • Access control: IAM permissions at dataset level
  • Location: Fixed geographic location
  • Billing: Costs roll up to dataset level
Table Level (Data Storage):
  • Purpose: Actual data storage and schema definition
  • Contains: Rows and columns of data
  • Access control: Inherits from dataset (can be restricted further)
  • Location: Same as parent dataset
  • Billing: Storage and query costs
Organization Strategies:

By Business Unit:
marketing_data → campaigns, leads, attribution
sales_data → opportunities, customers, revenue
finance_data → transactions, budgets, forecasts
By Data Source:
google_ads → campaigns, keywords, ads
facebook_ads → campaigns, adsets, creatives
crm_data → contacts, deals, activities
By Environment:
production_data → live operational data
staging_data → testing and development
analytics_data → processed analytical datasets
Can I modify a dataset after creating it?

Modifiable After Creation (see the code sketch below):
  • Description: Can be updated anytime
  • Access controls: IAM permissions can be changed
  • Labels: Can add/modify/remove labels
  • Default table expiration: Can be set or changed
Cannot Be Modified:
  • Dataset ID: Cannot be renamed (must recreate)
  • Location: Cannot be changed (must recreate)
  • Project: Cannot move between projects
Best Practices:
  • Plan dataset ID carefully: Include version numbers if needed
  • Choose location wisely: Cannot be changed later
  • Use descriptive names: Make purpose clear from the name
  • Document thoroughly: Use descriptions and labels
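To illustrate the mutable properties, here is how a description, labels, and the default table expiration could be updated with the Python client; the dataset reference is a placeholder:

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project-123.marketing_data")  # placeholder

dataset.description = "Marketing analytics data warehouse (updated)"
dataset.labels = {"team": "marketing", "env": "prod"}
dataset.default_table_expiration_ms = 90 * 24 * 60 * 60 * 1000  # 90 days

# Only the listed fields are sent in the update request.
client.update_dataset(
    dataset, ["description", "labels", "default_table_expiration_ms"]
)
```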
How do I control access to a dataset?

BigQuery IAM Roles for Datasets:

Read Access:
  • BigQuery Data Viewer: Read tables and run queries
  • BigQuery User: Read + create temporary tables
Write Access:
  • BigQuery Data Editor: Read + write + delete data
  • BigQuery Admin: Full control including schema changes
Management Access:
  • BigQuery Admin: Full dataset management
  • BigQuery Resource Admin: Manage datasets and jobs
Access Control Strategies:

By Business Function:
Marketing Team → BigQuery Data Viewer on marketing_data
Data Scientists → BigQuery Data Editor on analytics_data
ETL Service Account → BigQuery Admin on staging_data
By Environment:
Production → Strict controls, minimal write access
Staging → Broader access for testing
Development → Full access for iteration
Security Best Practices:
  • Principle of least privilege: Grant minimum necessary access
  • Use service accounts: For automated workflows
  • Regular audits: Review and update permissions
  • Monitor usage: Track who accesses what data
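Dataset-level access can also be granted programmatically. A sketch using the Python client's access entries, where the legacy READER role corresponds to Data Viewer; the dataset reference and email are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project-123.marketing_data")  # placeholder

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",                    # dataset-level equivalent of Data Viewer
        entity_type="userByEmail",
        entity_id="analyst@example.com",  # hypothetical user
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```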
What naming conventions should I use?

Recommended Naming Patterns:

Descriptive Structure:
{business_unit}_{data_type}_{environment}
marketing_analytics_prod
sales_crm_staging
finance_reports_dev
Data Source Based:
{source_system}_{data_type}
google_ads_raw
salesforce_cleaned
website_analytics
Temporal Organization:
{purpose}_{time_period}
marketing_2024
historical_archive
current_quarter
Best Practices:
  • Use underscores: Not dashes or spaces
  • Be consistent: Follow same pattern across organization
  • Include context: Make purpose clear
  • Plan for growth: Consider future datasets
  • Avoid abbreviations: Use clear, full words
  • Include environment: Distinguish prod/staging/dev
Examples by Use Case:
  • Agency: client_name_data_type (acme_google_ads)
  • Enterprise: dept_function_env (marketing_analytics_prod)
  • Startup: data_source_purpose (ads_performance, user_behavior)
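When the name components come from user input, a small sanitizer can enforce a pattern like {business_unit}_{data_type}_{environment}; a sketch with an illustrative function name:

```python
import re

def build_dataset_id(business_unit: str, data_type: str, environment: str) -> str:
    """Join the parts with underscores, stripping characters BigQuery disallows."""
    cleaned = [
        re.sub(r"[^a-z0-9_]", "", part.lower().replace(" ", "_").replace("-", "_"))
        for part in (business_unit, data_type, environment)
    ]
    return "_".join(cleaned)

print(build_dataset_id("Marketing", "analytics", "prod"))     # marketing_analytics_prod
print(build_dataset_id("Sales Team", "crm-data", "staging"))  # sales_team_crm_data_staging
```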
How do I use this node in automated workflows?

Common Automation Patterns:

Client Onboarding:
[Trigger: New Client] → [Create Dataset: {client_name}_data]
→ [Create Tables] → [Set Permissions] → [Notify Team]
Environment Setup:
[Trigger: New Project] → [Create Dataset: {project}_prod]
→ [Create Dataset: {project}_staging] → [Setup IAM]
Data Pipeline Initialization:
[Schedule: Monthly] → [Create Dataset: archive_{year}_{month}]
→ [Move Old Data] → [Update References]
Dynamic Dataset Creation:
[AI Agent] → [Determine Dataset Name] → [Create Dataset]
→ [Create Tables] → [Load Initial Data]
Error Handling Strategies:
  • Always enable “Skip Error If Already There” for recurring workflows
  • Validate names before creation to avoid failures
  • Plan rollback procedures for failed setups
  • Monitor creation success and alert on failures
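Putting these strategies together, the Environment Setup pattern above might look like this with the Python client; project and dataset names are placeholders, and exists_ok keeps the workflow safe to re-run:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project-123")  # placeholder

project_slug = "acme"  # hypothetical, e.g. supplied by an upstream trigger
for env in ("prod", "staging"):
    dataset = bigquery.Dataset(f"my-project-123.{project_slug}_{env}")
    dataset.location = "EU"
    dataset.description = f"{project_slug} {env} environment"
    client.create_dataset(dataset, exists_ok=True)  # idempotent re-runs
    print(f"Ensured dataset {dataset.dataset_id}")
```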
Integration with Other Nodes:
  • Create Dataset → Create Table → Insert Rows
  • List Datasets → Conditional Logic → Create Dataset
  • Create Dataset → Set IAM Permissions → Notify Stakeholders