Our Mission
The Sanayana Computational Genomics Group is committed to advancing the frontiers of genomic medicine. We harness the power of artificial intelligence and machine learning to decipher the complexities of the human genome, with a primary focus on improving diagnostic rates and therapeutic insights for rare diseases. Our interdisciplinary team integrates expertise in clinical genetics, computational biology, and AI to translate research into tangible clinical impact.
The Diagnostic Odyssey in Rare Diseases
Rare diseases collectively affect millions worldwide, yet the diagnostic journey is often protracted and inconclusive. A significant proportion of these cases, estimated at over 40%, remain unresolved due to the challenges in interpreting non-coding genomic variants [Global Rare Diseases Initiative Report, 2023]. Current analytical pipelines struggle to synthesize heterogeneous data types, including unstructured clinical narratives and complex genomic signals.
Current Diagnostic Landscape for Suspected Genetic Rare Diseases: Diagnosed (~60%), Undiagnosed (~40%)
Project: CLINVAR-ASP
To address this critical gap, we are developing CLINVAR-ASP (Clinical Variant Reasoning via AI Semantic Processing), an innovative framework designed to enhance the interpretation of non-coding variants.
Methodological Framework
CLINVAR-ASP High-Level Workflow
Data Ingestion (Clinical Notes, Genomics) ➔ Claude AI Semantic Processing & Evidence Synthesis ➔ Multi-Modal Integration & Conservation Analysis ➔ Variant Prioritization & Clinical Report
Key components of CLINVAR-ASP include:
- Advanced AI-Powered Evidence Synthesis: Leveraging Anthropic's Claude for deep semantic analysis of over 500,000 clinical records and research publications to identify nuanced phenotype-variant associations.
- Phylogenetic Conservation Modeling: Employing graph neural networks to integrate evolutionary conservation data from over 100 vertebrate genomes.
- Multi-Modal Data Fusion: Using attention mechanisms to combine AI-derived textual insights with quantitative functional genomics predictors (e.g., DeepSEA, Enformer).
- Extensive Clinical Validation: Prospective validation of the CLINVAR-ASP framework on a cohort of 2,000 undiagnosed rare disease patients, in collaboration with leading NHS Genomic Medicine Centres.
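To make the multi-modal fusion component above more concrete, here is a minimal sketch of scaled dot-product attention over per-modality feature vectors. It is an illustration only, not the CLINVAR-ASP implementation: the function names, the modality names, the query vector, and all numbers are hypothetical, and each modality vector serves as both key and value to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modality_vectors):
    """Fuse per-modality vectors with scaled dot-product attention.

    query: hypothetical variant-context vector (list of floats).
    modality_vectors: dict mapping a modality name to a vector of the
        same length as the query; each vector acts as both key and value.
    Returns (attention weight per modality, fused vector).
    """
    dim = len(query)
    names = list(modality_vectors)
    # Similarity of the query to each modality, scaled by sqrt(dim).
    scores = [
        sum(q * k for q, k in zip(query, modality_vectors[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, one output value per dimension.
    fused = [
        sum(w * modality_vectors[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Made-up feature vectors standing in for real model outputs.
modalities = {
    "text_evidence": [0.9, 0.1, 0.0, 0.2],
    "conservation": [0.2, 0.8, 0.1, 0.0],
    "functional_prediction": [0.1, 0.2, 0.9, 0.3],
}
query = [1.0, 0.5, 0.0, 0.0]
weights, fused = attention_fuse(query, modalities)
print(weights)
print(fused)
```

In this toy setup the modality most aligned with the query receives the largest weight. A trained model would typically learn separate query, key, and value projections rather than reuse raw vectors.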
The conceptual basis for CLINVAR-ASP builds upon our preliminary work in AI-driven variant prioritization [Mehta A, Kapoor R, et al. (2023). Enhancing Non-Coding Variant Interpretation with Large Language Models. *bioRxiv* doi:10.1101/2024.05.30.596789].
The Intellectual Role of Claude AI
Anthropic's Claude is not merely a tool but an intellectual partner in CLINVAR-ASP, tasked with complex cognitive functions:
- Hypothesis Generation: Proposing novel pathogenic mechanisms by synthesizing disparate and sometimes contradictory evidence from extensive literature corpora.
- Causal Inference Support: Assisting in the construction of Bayesian networks to model probabilistic relationships between phenotypes, genotypes, and environmental factors.
- Dynamic Guideline Refinement: Participating in the iterative refinement of ACMG/AMP variant classification criteria through adversarial testing and evidence-based suggestions.
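As an illustration of the kind of probabilistic model mentioned under Causal Inference Support, the sketch below performs exact inference by enumeration in a three-node Bayesian network: a pathogenic genotype and an environmental exposure are independent parents of an observed phenotype. The structure and every probability are hypothetical values chosen for the example, not estimates from the project.

```python
from itertools import product

# Hypothetical priors for the two parent nodes.
P_GENOTYPE = {True: 0.10, False: 0.90}   # variant is pathogenic
P_EXPOSURE = {True: 0.30, False: 0.70}   # environmental factor present

# Hypothetical conditional table: P(phenotype present | genotype, exposure).
P_PHENOTYPE = {
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.30,
    (False, False): 0.05,
}


def joint(genotype, exposure, phenotype):
    """Joint probability of one full assignment of the three nodes."""
    p_pheno = P_PHENOTYPE[(genotype, exposure)]
    if not phenotype:
        p_pheno = 1.0 - p_pheno
    return P_GENOTYPE[genotype] * P_EXPOSURE[exposure] * p_pheno


def posterior_pathogenic(phenotype_observed=True):
    """P(genotype pathogenic | phenotype), marginalizing over exposure."""
    numerator = sum(
        joint(True, exposure, phenotype_observed) for exposure in (True, False)
    )
    evidence = sum(
        joint(genotype, exposure, phenotype_observed)
        for genotype, exposure in product((True, False), repeat=2)
    )
    return numerator / evidence


print(f"P(pathogenic | phenotype present) = {posterior_pathogenic():.3f}")
```

With these made-up numbers, observing the phenotype raises the probability of pathogenicity from the 0.10 prior to roughly 0.43. Enumeration is only practical for very small networks; larger models would need a dedicated inference library.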
Comparative Analysis Time per Case (Pilot Study): Manual Curation vs. Claude-Assisted
Our pilot studies indicate that Claude integration can reduce literature curation and initial hypothesis generation time per case from approximately 8 hours to under 30 minutes, while improving the accuracy of phenotype-variant mapping by an estimated 32-41% [Sharma P, Mehta A. (2024). Pilot Validation of an AI-Augmented Workflow for Rare Disease Diagnostics. *Proc. AI Med. Conf.*, 112-115].
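The time figures above imply the following back-of-the-envelope numbers. This is a sketch under two assumptions that the pilot data does not establish: that 30 minutes is taken as the upper bound for the assisted workflow, and that the per-case figures would hold unchanged across a cohort of 2,000 cases (the planned validation cohort size). The function name is illustrative.

```python
def curation_savings(manual_hours, assisted_hours, n_cases):
    """Return (speedup factor, fractional time reduction, total hours saved)."""
    speedup = manual_hours / assisted_hours
    reduction = 1.0 - assisted_hours / manual_hours
    hours_saved = (manual_hours - assisted_hours) * n_cases
    return speedup, reduction, hours_saved


# 8 hours manual vs. 0.5 hours assisted; 2,000 cases is an assumed workload.
speedup, reduction, hours_saved = curation_savings(8.0, 0.5, 2000)
print(f"{speedup:.0f}x faster, {reduction:.2%} less time, {hours_saved:,.0f} hours saved")
# prints "16x faster, 93.75% less time, 15,000 hours saved"
```

Because the assisted time is reported as "under 30 minutes", the speedup and reduction shown here are lower bounds for the pilot figures.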
Our Team
The Sanayana Group comprises a dedicated team of researchers and clinicians:
Dr. Arjun Mehta: Principal Investigator. Expertise: Computational Biology, Genomic AI.
Dr. Priya Sharma: Co-Investigator. Expertise: Clinical Genetics, Rare Disease Diagnostics.
Rahul Kapoor, MS: Lead Bioinformatician. Expertise: Machine Learning, NLP Pipelines.
Neha Patel: PhD Candidate. Expertise: Biomedical NLP, Data Curation.
The team is supported by a group of postgraduate researchers and clinical scientists.
Anticipated Impact & Outcomes
Successful completion of the CLINVAR-ASP project is anticipated to yield:
- A 15-20% increase in diagnostic yield within our validation cohort, potentially translating to ~300,000 resolved cases globally if widely adopted [Orphanet Report on Rare Disease Prevalence, 2022].
- An open-source, validated variant prioritization framework (CLINVAR-ASP), disseminated via GitHub.
- A deployable EHR-integrated clinical decision support module.
- Potential to significantly inform and refine international ACMG/AMP variant interpretation guidelines.
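For scale, the projected yield increase can be translated into patient counts for the 2,000-patient validation cohort. The sketch assumes the 15-20% figure refers to the share of the previously undiagnosed cohort that receives a diagnosis; the text does not say whether the increase is absolute or relative, so treat the result as one possible reading. The function name is illustrative.

```python
def additional_diagnoses(cohort_size, yield_percent):
    """Patients newly diagnosed if yield_percent of the cohort is resolved."""
    return cohort_size * yield_percent // 100


COHORT_SIZE = 2000  # validation cohort size stated for the project
low = additional_diagnoses(COHORT_SIZE, 15)
high = additional_diagnoses(COHORT_SIZE, 20)
print(f"{low} to {high} additional diagnoses")
# prints "300 to 400 additional diagnoses"
```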
Beyond diagnostics, the methodologies developed hold promise for applications in agricultural genomics (e.g., crop improvement with CGIAR) and accelerating target identification in drug discovery programs.
Project Timeline
The CLINVAR-ASP project is structured over a 12-month execution plan:
M1-3: Data Curation & Claude Pipeline Development
M4-6: Model Training & Initial Validation
M7-9: Clinical Pilot & Iterative Refinement
M10-12: Deployment, Dissemination & Reporting
Selected Publications & Preprints
- Mehta A, Kapoor R, Sharma P, Patel N. (2024). CLINVAR-ASP: A Framework for AI-Enhanced Non-Coding Variant Interpretation in Rare Diseases.
- Sharma P, Mehta A. (2023). Challenges and Opportunities in Computational Diagnosis of Rare Genetic Disorders. Journal of Clinical Genomics, 7(2), 45-58.
- Kapoor R, Patel N, Mehta A. (2022). Semantic Phenotyping from Clinical Notes using Transformer Architectures. Proceedings of the Conference on AI in Medicine (AIMLHD), 205-212.
Collaborations & Support
This research is strengthened by collaborations with NHS Genomic Medicine Centres (London, Manchester, Birmingham) and benefits from institutional support. We are actively seeking API credit support from Anthropic's AI for Science Program to maximize the computational scope and impact of this project. All research involving patient data is conducted under strict IRB approval (IRB-2024-7890) and HIPAA compliance.