Symbolic Reasoning in GeneForgeLang
GeneForgeLang (GFL) supports symbolic reasoning constructs that let you express biological knowledge, hypotheses, and temporal relationships in a structured way. These constructs make workflows more expressive and let the semantic validator check them for consistency.
Overview
Symbolic reasoning in GFL includes:
- Rules and Hypotheses - Express biological knowledge and scientific hypotheses
- Timeline Blocks - Define temporal orchestration of experiments
- Biological Entities - Define pathways and protein complexes
- Cross-Reference Validation - Ensure consistency between different parts of a workflow
Rules and Hypotheses
Rules Block
Rules allow you to express conditional relationships in biological systems:
rules:
  - id: rule1
    if:
      gene: TP53
      mutation: R175H
    then:
      effect: increased_risk
      cancer_type: breast
  - id: rule2
    if:
      gene: BRCA1
      expression: low
    then:
      effect: increased_risk
      cancer_type: ovarian
Each rule must contain:
- id: A unique identifier for the rule
- if: A dictionary describing the conditions
- then: A dictionary describing the consequences
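The required fields above can be checked mechanically. The following is a minimal sketch, not GFL's actual validator: it assumes the rules block has already been parsed into a list of Python dictionaries, and the function name `check_rules` and the message wording are hypothetical.

```python
REQUIRED_RULE_FIELDS = ("id", "if", "then")  # field names taken from the list above


def check_rules(rules):
    """Return a list of human-readable problems found in a parsed rules block.

    Hypothetical helper for illustration; not part of GFL.
    """
    problems = []
    seen_ids = set()
    for index, rule in enumerate(rules):
        # Report every required field that is absent from this rule.
        for field in REQUIRED_RULE_FIELDS:
            if field not in rule:
                problems.append(f"rule #{index}: missing required field '{field}'")
        # `if` and `then` are described as dictionaries.
        for field in ("if", "then"):
            if field in rule and not isinstance(rule[field], dict):
                problems.append(f"rule #{index}: '{field}' must be a dictionary")
        # Identifiers are described as unique.
        rule_id = rule.get("id")
        if rule_id is not None:
            if rule_id in seen_ids:
                problems.append(f"rule #{index}: duplicate id '{rule_id}'")
            seen_ids.add(rule_id)
    return problems


rules = [
    {"id": "rule1",
     "if": {"gene": "TP53", "mutation": "R175H"},
     "then": {"effect": "increased_risk", "cancer_type": "breast"}},
    # Deliberately broken: reuses the id and omits `then`.
    {"id": "rule1", "if": {"gene": "BRCA1", "expression": "low"}},
]
for problem in check_rules(rules):
    print(problem)
```

Collecting every problem in a list, rather than raising on the first one, lets a single pass report all issues in a rules block.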
Hypothesis Block
Hypotheses allow you to formally express scientific hypotheses that your workflow aims to test:
hypothesis:
  id: hypothesis1
  description: TP53 mutations increase cancer risk
  if:
    - gene: TP53
      mutation: R175H
    - expression: low
  then:
    - effect: increased_risk
      cancer_type: breast
    - biomarker: Ki67
      expression: high
A hypothesis must contain:
- id: A unique identifier for the hypothesis
- description: A human-readable description
- if: A list of conditions
- then: A list of expected outcomes
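Note that a hypothesis takes lists for if and then, whereas a rule takes dictionaries. A minimal sketch of a shape check follows, assuming the block has been parsed into a Python dictionary. The name `check_hypothesis` is hypothetical, and the requirement that each list entry be a mapping is inferred from the example rather than stated by GFL.

```python
def check_hypothesis(hypothesis):
    """Return a list of problems found in a parsed hypothesis block.

    Hypothetical helper for illustration; not part of GFL.
    """
    problems = []
    for field in ("id", "description", "if", "then"):
        if field not in hypothesis:
            problems.append(f"missing required field '{field}'")
    # Unlike rules, a hypothesis lists its conditions and expected outcomes.
    for field in ("if", "then"):
        if field not in hypothesis:
            continue
        value = hypothesis[field]
        if not isinstance(value, list):
            problems.append(f"'{field}' must be a list")
        elif not all(isinstance(item, dict) for item in value):
            # Assumption: each entry is a mapping, as in the example above.
            problems.append(f"every entry in '{field}' must be a mapping")
    return problems


hypothesis = {
    "id": "hypothesis1",
    "description": "TP53 mutations increase cancer risk",
    "if": [{"gene": "TP53", "mutation": "R175H"}, {"expression": "low"}],
    "then": [{"effect": "increased_risk", "cancer_type": "breast"},
             {"biomarker": "Ki67", "expression": "high"}],
}
print(check_hypothesis(hypothesis))
```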
Timeline Blocks
Timeline blocks allow you to define the temporal orchestration of your experiments:
timeline:
  events:
    - at: "2024-01-01"
      actions:
        - type: sequencing
          sample: patient1
      expectations:
        - outcome: high_quality_data
    - at: "2024-01-15"
      actions:
        - type: analysis
          data: sequencing_results
Each timeline must contain:
- events: A list of temporal events
- Each event must have:
  - at: A time specification (string)
  - actions: A list of actions to perform
  - expectations: (Optional) A list of expected outcomes
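The timeline requirements can be sketched as a small structural check. This is an illustration under stated assumptions, not GFL's validator: the timeline is assumed to be parsed into Python dictionaries and lists, and `check_timeline` is a hypothetical name.

```python
def check_timeline(timeline):
    """Return a list of problems found in a parsed timeline block.

    Hypothetical helper for illustration; not part of GFL.
    """
    events = timeline.get("events")
    if not isinstance(events, list):
        return ["timeline: 'events' is required and must be a list"]
    problems = []
    for index, event in enumerate(events):
        # `at` is described as a string time specification.
        if not isinstance(event.get("at"), str):
            problems.append(f"event #{index}: 'at' is required and must be a string")
        if not isinstance(event.get("actions"), list):
            problems.append(f"event #{index}: 'actions' is required and must be a list")
        # `expectations` is optional, but must be a list when present.
        if "expectations" in event and not isinstance(event["expectations"], list):
            problems.append(f"event #{index}: 'expectations' must be a list")
    return problems


timeline = {
    "events": [
        {"at": "2024-01-01",
         "actions": [{"type": "sequencing", "sample": "patient1"}],
         "expectations": [{"outcome": "high_quality_data"}]},
        {"at": "2024-01-15",
         "actions": [{"type": "analysis", "data": "sequencing_results"}]},
    ]
}
print(check_timeline(timeline))
```

The quotes around the dates in the example matter if the file is loaded with a YAML 1.1 parser such as PyYAML, which reads an unquoted 2024-01-01 as a date object rather than a string. A check like the one above would then flag the event.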
Biological Entities
Pathways
Define metabolic and signaling pathways:
pathways:
  UreaCycle:
    description: Urea cycle metabolic pathway
    genes: [ASS1, ASL, ARG1]
    enzymes: [ASS, ASL, ARG]
    reactions:
      - substrate: ornithine
        product: citrulline
  Glycolysis:
    description: Glycolysis pathway
    genes: [HK1, PGK1, PKM]
    enzymes: [Hexokinase, Phosphoglycerate kinase, Pyruvate kinase]
Complexes
Define protein complexes:
complexes:
  RNA_POLYMERASE_II:
    description: RNA polymerase II complex
    subunits: [POLR2A, POLR2B, POLR2C, POLR2D, POLR2E]
    function: transcription
  RIBOSOME_80S:
    description: 80S ribosome complex
    subunits: [RPSA, RPSB, RPLA, RPLB]
    function: translation
Entity References
You can reference defined entities in experiment parameters using the syntax entity_type(entity_name):
experiment:
  tool: CRISPR_cas9
  type: gene_editing
  params:
    target_pathway: pathway(UreaCycle)
    target_complex: complex(RNA_POLYMERASE_II)
    concentration: 50.0
Supported entity types:
- pathway: Reference a defined pathway
- complex: Reference a defined protein complex
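To make the entity_type(entity_name) syntax concrete, here is a minimal sketch of how such a reference string could be recognized with a regular expression. The pattern and the name `parse_entity_ref` are assumptions for illustration; in particular, the set of characters GFL allows in entity names is not documented here, so the sketch accepts letters, digits, and underscores.

```python
import re

# Assumption: entity names consist of word characters (letters, digits, underscore).
ENTITY_REF = re.compile(r"(pathway|complex)\((\w+)\)")


def parse_entity_ref(value):
    """Return (entity_type, entity_name) for a reference string, else None.

    Hypothetical helper for illustration; not part of GFL.
    """
    if not isinstance(value, str):
        return None  # e.g. a numeric parameter such as concentration: 50.0
    match = ENTITY_REF.fullmatch(value.strip())
    return match.groups() if match else None


params = {
    "target_pathway": "pathway(UreaCycle)",
    "target_complex": "complex(RNA_POLYMERASE_II)",
    "concentration": 50.0,
}
for key, value in params.items():
    print(key, parse_entity_ref(value))
```

Using `fullmatch` rather than `search` means ordinary parameter values that merely contain parentheses are not mistaken for references.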
Hypothesis References
Both experiment and analysis blocks can reference hypotheses to indicate which hypothesis they are designed to test:
# Reference a hypothesis in an experiment block
experiment:
  tool: CRISPR_cas9
  type: gene_editing
  params:
    target_gene: TP53
  validates_hypothesis: hypothesis1

# Reference a hypothesis in an analysis block
analyze:
  strategy: variant
  data: sequencing_results.csv
  validates_hypothesis: hypothesis1
Validation
GeneForgeLang's semantic validator ensures:
- Rules validation: Each rule has required fields (id, if, then)
- Hypothesis validation: Each hypothesis has required fields (id, description, if, then)
- Timeline validation: Each timeline event has required fields (at, actions)
- Entity reference validation: Referenced entities must be defined
- Hypothesis reference validation: Referenced hypotheses must be defined
Error Handling
When validation fails, GFL provides specific error codes:
- SEMANTIC_UNDEFINED_HYPOTHESIS (SEMANTIC009): Referenced hypothesis is not defined
- SEMANTIC_UNDEFINED_ENTITY_REFERENCE (SEMANTIC010): Referenced entity is not defined
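As an illustration of when these two codes would apply, the sketch below cross-checks references in a workflow that has been parsed into a Python dictionary. It is not GFL's validator. The function name `check_references`, the message wording, the assumption that validates_hypothesis sits directly under the experiment and analyze blocks, and the choice to scan only experiment params for entity references are all hypothetical.

```python
import re

# Assumption: entity names consist of word characters.
ENTITY_REF = re.compile(r"(pathway|complex)\((\w+)\)")
# Top-level block that holds the definitions for each entity type.
ENTITY_BLOCKS = {"pathway": "pathways", "complex": "complexes"}


def check_references(workflow):
    """Return (code, message) pairs for undefined hypothesis and entity references."""
    errors = []
    hypothesis = workflow.get("hypothesis") or {}
    known_hypotheses = {hypothesis["id"]} if "id" in hypothesis else set()

    # SEMANTIC009: a block names a hypothesis that is not defined.
    for block_name in ("experiment", "analyze"):
        block = workflow.get(block_name) or {}
        ref = block.get("validates_hypothesis")
        if ref is not None and ref not in known_hypotheses:
            errors.append(("SEMANTIC009", f"{block_name}: hypothesis '{ref}' is not defined"))

    # SEMANTIC010: a parameter references an entity that is not defined.
    params = (workflow.get("experiment") or {}).get("params") or {}
    for key, value in params.items():
        match = ENTITY_REF.fullmatch(value) if isinstance(value, str) else None
        if match is None:
            continue  # plain value, not an entity reference
        entity_type, name = match.groups()
        if name not in (workflow.get(ENTITY_BLOCKS[entity_type]) or {}):
            errors.append(("SEMANTIC010", f"params.{key}: {entity_type} '{name}' is not defined"))
    return errors


workflow = {
    "pathways": {"UreaCycle": {"genes": ["ASS1", "ASL", "ARG1"]}},
    "hypothesis": {"id": "urea_cycle_hypothesis"},
    "experiment": {
        "params": {
            "target_gene": "ASS1",
            "target_pathway": "pathway(UreaCycle)",
            "target_complex": "complex(RNA_POLYMERASE_II)",  # no complexes block defined
        },
        "validates_hypothesis": "hypothesis1",  # does not match the defined id
    },
}
for code, message in check_references(workflow):
    print(code, message)
```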
Example Workflow
Here's a complete example that uses all symbolic reasoning features:
# Define biological entities
pathways:
  UreaCycle:
    description: Urea cycle metabolic pathway
    genes: [ASS1, ASL, ARG1]

complexes:
  RNA_POLYMERASE_II:
    description: RNA polymerase II complex
    subunits: [POLR2A, POLR2B]

# Define a scientific hypothesis
hypothesis:
  id: urea_cycle_hypothesis
  description: Disruption of urea cycle genes affects RNA polymerase II expression
  if:
    - gene: ASS1
      mutation: null
  then:
    - complex: RNA_POLYMERASE_II
      expression: decreased

# Define rules about the biological system
rules:
  - id: urea_cycle_rule
    if:
      pathway: UreaCycle
      disrupted: true
    then:
      disease: hyperammonemia

# Timeline for experimental orchestration
timeline:
  events:
    - at: "2024-01-01"
      actions:
        - type: crispr_editing
          target_gene: ASS1
      expectations:
        - outcome: successful_knockout
    - at: "2024-01-15"
      actions:
        - type: expression_analysis
          target_complex: complex(RNA_POLYMERASE_II)
      expectations:
        - outcome: decreased_expression

# Experiment that tests the hypothesis
experiment:
  tool: CRISPR_cas9
  type: gene_editing
  params:
    target_gene: ASS1
    target_pathway: pathway(UreaCycle)
  validates_hypothesis: urea_cycle_hypothesis

# Analysis that tests the hypothesis
analyze:
  strategy: expression
  data: expression_results.csv
  validates_hypothesis: urea_cycle_hypothesis
This symbolic reasoning framework enables more expressive and scientifically rigorous workflow definitions in GeneForgeLang.