IO Contracts System
GeneForgeLang v1.0.0 introduces IO Contracts, a system for ensuring data integrity between workflow blocks. IO Contracts provide static compatibility checking and type safety for genomic data flows.
Overview
IO Contracts define the data requirements and guarantees for each workflow block, ensuring that outputs from one block are compatible with inputs to another. Because the check is static, incompatible data types are reported during validation instead of surfacing as runtime errors, and tools can rely on the declared types.
Contract Structure
IO Contracts are defined using the contract key in experiment and analyze blocks:
experiment:
  tool: RNAseq
  type: sequencing
  contract:
    outputs:
      sequences:
        type: FASTQ
        attributes:
          quality_score: true
          paired_end: true
  params:
    # ... experiment parameters
Contract Components
- Type: Specifies the data type (e.g., FASTQ, BAM, CSV, JSON)
- Attributes: Optional metadata about the data (e.g., quality scores, paired-end status)
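As a rough illustration of the two components above, the following Python sketch models a single contract entry as a type plus an optional attribute mapping. The names `PortContract` and `parse_ports` are hypothetical and are not part of GeneForgeLang; the sketch also assumes that `type` is mandatory and that the contract section has already been loaded into plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PortContract:
    """One named entry under `inputs:` or `outputs:` (hypothetical model)."""
    type: str                                                  # e.g. "FASTQ", "BAM", "CSV"
    attributes: Dict[str, Any] = field(default_factory=dict)   # optional metadata


def parse_ports(section: Dict[str, Dict[str, Any]]) -> Dict[str, PortContract]:
    """Convert an already-parsed `inputs:`/`outputs:` mapping into PortContract objects."""
    ports = {}
    for name, spec in section.items():
        if "type" not in spec:
            # Assumption: `type` is mandatory, `attributes` is optional.
            raise ValueError(f"contract entry '{name}' has no 'type'")
        ports[name] = PortContract(spec["type"], dict(spec.get("attributes") or {}))
    return ports


# The `outputs:` section of the RNAseq example, as a YAML loader would return it.
outputs = parse_ports({
    "sequences": {
        "type": "FASTQ",
        "attributes": {"quality_score": True, "paired_end": True},
    }
})
print(outputs["sequences"])
```

Keeping attributes as a free-form mapping mirrors the YAML, where any key may appear under `attributes`.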
Available Data Types
GeneForgeLang supports several built-in data types, plus user-defined ones:
- Sequence Data: FASTA, FASTQ, BAM, SAM, VCF
- General Data: CSV, JSON, TEXT, BINARY
- Custom Types: User-defined types via schema registry
Example Usage
Experiment Block with Output Contract
experiment:
  tool: CRISPR_cas9
  type: gene_editing
  contract:
    outputs:
      edited_sequences:
        type: FASTQ
        attributes:
          quality_score: true
          read_length: 150
  params:
    target_gene: BRCA1
    guide_rna: GCGTACGTTCAAGCGATCCG
Analysis Block with Input Contract
analyze:
  strategy: differential
  contract:
    inputs:
      sequences:
        type: FASTQ
        attributes:
          quality_score: true
  data: edited_sequences
Static Compatibility Checking
GeneForgeLang's validator automatically checks compatibility between block contracts:
- Type Matching: Ensures output and input types are compatible
- Attribute Validation: Verifies required attributes are present
- Custom Schema Support: Works with user-defined types from schema registry
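The first two checks listed above can be sketched in a few lines of Python. This is not GeneForgeLang's validator: `check_compatibility` is a hypothetical helper, and it assumes that types are compatible only when they are identical and that every attribute required by the input must appear on the output with the same value. The real rules may be more permissive, for example for custom types.

```python
from typing import Any, Dict, List


def check_compatibility(output: Dict[str, Any], required: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the contracts are compatible."""
    problems = []

    # Type matching (assumption: exact equality of type names).
    if output.get("type") != required.get("type"):
        problems.append(
            f"type mismatch: output is {output.get('type')}, input expects {required.get('type')}"
        )

    # Attribute validation: each attribute the input requires must be present on the output.
    provided = output.get("attributes") or {}
    for key, expected in (required.get("attributes") or {}).items():
        if key not in provided:
            problems.append(f"missing attribute '{key}'")
        elif provided[key] != expected:
            problems.append(f"attribute '{key}' is {provided[key]!r}, expected {expected!r}")

    return problems


# Output contract of the CRISPR_cas9 experiment and input contract of the analyze block.
produced = {"type": "FASTQ", "attributes": {"quality_score": True, "read_length": 150}}
required = {"type": "FASTQ", "attributes": {"quality_score": True}}

print(check_compatibility(produced, required))         # no problems for the example pair
print(check_compatibility(produced, {"type": "BAM"}))  # one type-mismatch problem
```

Extra attributes on the output, such as `read_length`, are ignored in this sketch because the input does not ask for them.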
Error Handling
When contracts are incompatible, GeneForgeLang reports an error that names both blocks, the affected output and input, and their types. For example:
Contract type mismatch: experiment output 'sequences' (type: FASTQ) is incompatible with analyze input 'sequences' (type: BAM)
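To show which pieces of information the message above carries, here is a small Python sketch that rebuilds it from its parts. The function `type_mismatch_message` is hypothetical and only mirrors the wording of the example; it is not the validator's actual reporting code.

```python
def type_mismatch_message(out_block: str, out_name: str, out_type: str,
                          in_block: str, in_name: str, in_type: str) -> str:
    """Format a type-mismatch report in the style of the example message."""
    return (
        f"Contract type mismatch: {out_block} output '{out_name}' (type: {out_type}) "
        f"is incompatible with {in_block} input '{in_name}' (type: {in_type})"
    )


# Producing block, output name and type, then consuming block, input name and type.
message = type_mismatch_message("experiment", "sequences", "FASTQ",
                                "analyze", "sequences", "BAM")
print(message)
```

Each message identifies the producing block, the consuming block, and the two declared types, which is usually enough to locate the contract that needs changing.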
Best Practices
- Define Contracts Early: Add contracts during workflow design so that incompatibilities surface before execution
- Use Attributes: Specify important data characteristics to ensure compatibility
- Leverage Custom Types: Define domain-specific types in schema files for better validation
- Validate Before Execution: Always validate contracts before running workflows
Next Steps
- Schema Registry Documentation - Learn how to define custom types
- Design Block Documentation - See how contracts work with AI workflows
- Error Handling Guide - Understand contract validation errors