Bioinformatics AI Agent

Analyze sequencing, transcriptomics, and other omics data using an AI research assistant.

Figure: Pipette.bio platform showing RNA-seq analysis chat, DESeq2 report, and output figures

Tool availability is continuously expanding. Request a tool if you don't see what you need.

Quality Control & Preprocessing

Tools

fastqc • multiqc • fastp • cutadapt • trimmomatic • seqtk • sickle-trim • bbmap • sra-tools • entrez-direct • gffread • blast • meme • qualimap • rseqc • quast • pilon

Read Alignment & Mapping

Tools

star • hisat2 • bwa • bwa-mem2 • bowtie2 • minimap2 • bedtools • samtools • bcftools • subread • sambamba • picard • htslib

Quantification & Assembly

Tools

salmon • kallisto • stringtie • rmats • rsem • spades • megahit • flye • velvet • metabat2 • prodigal • prokka • busco

RNA-seq Differential Expression

Tools

deseq2 • edger • limma • tximport • apeglm • variancepartition • dexseq • drimseq

Single-Cell Analysis

Tools

scanpy • anndata • scvelo • scvi-tools • celltypist • squidpy • seurat • monocle3 • soupx • signac • archr • sctransform • harmony • batchelor • mofa2 • scran • scater • scuttle • scdblfinder • dropletutils • slingshot • mast • singler • celldex

Survival Analysis

Tools

survival • survminer

Proteomics Analysis

Tools

msstats • dep • limma

Spatial Omics

Tools

seurat • semla • giotto • spacexr • squidpy • spatialdata • stlearn • tangram-sc • cell2location • stereoscope • archr • signac • chromvar

Functional Enrichment & Pathways

Tools

clusterprofiler • fgsea • enrichplot • biomart • msigdbr • wgcna • mixomics

Variant Calling & Annotation

Tools

gatk4 • freebayes • varscan • strelka2 • bcftools • vcftools • snpeff • ensembl-vep

Cancer Genomics

Tools

mutect2 • cnvkit • arriba • star-fusion

GWAS & Population Genomics

Tools

plink • plink2 • admixture • qqman

Metagenomics & Taxonomic Profiling

Tools

kraken2 • metaphlan • humann • diamond • kaiju

Virome & Phage Analysis

Tools

virsorter • checkv • pharokka

Phylogenetics & Evolution

Tools

mafft • fasttree • iqtree • raxml-ng • orthofinder • trimal • ete3 • dendropy

ChIP-seq / ATAC-seq

Tools

macs2 • diffbind • chipseeker • deeptools

Visualization & Reporting

Tools

ggplot2 • pheatmap • complexheatmap • igraph • enrichplot • cowplot • ggrepel • patchwork

Drug Design & Cheminformatics

Tools

autodock-vina • smina • meeko • rdkit • openbabel • gypsum_dl • admet-ai • fpocket • pymol • plip • prolif • gromacs • openmm • mdanalysis

Integrated Knowledge Sources

Pipette connects to leading biological databases so you can enrich your analyses with literature, annotations, variants, structures, and clinical data, all through natural language queries.

Literature & Publications

PubMed

Search biomedical literature, research papers, and scientific publications with advanced query syntax.

Example: "Find recent papers about CRISPR gene editing"

Clinical Data & Trials

ClinicalTrials.gov

Access information about clinical trials, including recruitment status, phases, conditions, and interventions.

Example: "Find recruiting cancer trials in phase 3"

OpenFDA

Query FDA data on drug adverse events, recalls, and safety information from regulatory databases.

Example: "Find adverse events for Lipitor"

Genomic Variants & Mutations

ClinVar

Clinical interpretation of germline variants with pathogenicity classifications and disease associations.

Example: "Find pathogenic BRCA1 variants"

dbSNP

Comprehensive database of single nucleotide polymorphisms and short genetic variations.

Example: "Find common SNPs in TP53"

gnomAD

Population-scale genomic data with allele frequencies across diverse populations.

Example: "Get allele frequencies for BRCA2 variants"

GWAS Catalog

Curated collection of genome-wide association studies linking variants to traits and diseases.

Example: "Find GWAS studies for Type 2 diabetes"

Gene & Protein Information

UniProt

Comprehensive protein sequence and functional information database.

Example: "Get information about human insulin protein"

Ensembl

Genome annotations, gene coordinates, variants, and comparative genomics data.

Example: "Get information about the BRCA2 gene"

UCSC Genome Browser

Access genomic sequences, coordinates, and annotations from the UCSC database.

Example: "Get DNA sequence of chr17:43000000-43100000"

Protein Structures

AlphaFold

AI-predicted protein structures with confidence scores and downloadable models.

Example: "Get AlphaFold structure for human p53"

Gene Expression & Multi-Omics

GEO (Gene Expression Omnibus)

Repository of high-throughput gene expression and genomics datasets (GSE/GSM accessions).

Example: "Find RNA-seq datasets for breast cancer"

cBioPortal

Cancer genomics data from TCGA and other studies, including mutations and expression profiles.

Example: "Find TP53 mutations in lung cancer"

Drug Discovery & Therapeutics

Open Targets

Evidence-based drug target identification and disease associations for therapeutic development.

Example: "Find drug targets for Alzheimer's disease"

How it works: Simply ask Pipette in natural language, and it automatically determines which database to query, formulates the optimal search, and integrates the results into your analysis.

No API keys, no complex syntax: just ask and receive.

From Results to Discovery

Pipette doesn't stop at delivering results — it helps you decide what to investigate next. After each analysis, the agent reviews your outputs, connects key findings to published literature, and generates testable hypotheses grounded in real science.

Each hypothesis comes with concrete follow-up computational experiments you can run immediately, turning a single analysis into a springboard for discovery. This closes the loop between data and insight, making Pipette a true research partner rather than just a pipeline runner.

Hypothesis Generation

Example
Analysis: RNA-seq differential expression, treated vs control
Key Finding: IL6 and TNF significantly upregulated (log2FC > 2, padj < 0.01)
Hypothesis: NF-κB pathway activation may be driving the observed inflammatory response, given coordinated upregulation of its downstream targets
Follow-up: Run pathway enrichment (KEGG/Reactome) on upregulated genes; query ClinVar for IL6 variants associated with inflammatory phenotypes
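
The thresholds in the key finding above (log2FC > 2, padj < 0.01) can be applied to a results table in a few lines. The following is a minimal sketch, not Pipette's own implementation: the column names `log2FoldChange` and `padj` follow the DESeq2 results table, the helper name `significant_up` is hypothetical, and the toy values are invented purely for illustration.

```python
import csv
import io

# Toy DESeq2-style results table. The numbers are invented for illustration
# and do not come from any real experiment.
TOY_RESULTS = """gene,log2FoldChange,padj
IL6,3.1,0.0004
TNF,2.4,0.003
GAPDH,0.1,0.92
ACTB,-0.2,0.75
CXCL8,2.6,NA
"""


def significant_up(rows, lfc_min=2.0, padj_max=0.01):
    """Return gene names with log2FC > lfc_min and padj < padj_max."""
    hits = []
    for row in rows:
        # DESeq2 can report NA for padj (e.g. after independent filtering);
        # such genes cannot be called significant, so skip them.
        if row["padj"] in ("NA", ""):
            continue
        lfc = float(row["log2FoldChange"])
        padj = float(row["padj"])
        if lfc > lfc_min and padj < padj_max:
            hits.append(row["gene"])
    return hits


rows = list(csv.DictReader(io.StringIO(TOY_RESULTS)))
upregulated = significant_up(rows)
print(upregulated)
```

With the toy table, only IL6 and TNF pass both cutoffs; CXCL8 is skipped because its adjusted p-value is missing. The resulting gene list is the kind of input a pathway enrichment step would take.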

Just ask

"Run DESeq2 on my data file deseq_output.csv and highlight top upregulated genes."
"Summarize quality metrics for all FASTQ files in my workspace."
"Perform single-cell clustering on filtered_feature_bc_matrix.h5."
"Identify differentially accessible regions from ATAC-seq data."
"Annotate assembled contigs using Prokka."
"Generate PCA and volcano plots from my RNA-seq analysis."
"Run GO enrichment for genes with adjusted p-value ≤ 0.01."
"Find SNPs from aligned BAM files using GATK."
"Merge count files in files.zip into a count matrix."
"Based on my DE results, what hypotheses can you generate from the literature?"
"Suggest follow-up experiments for the top differentially expressed genes."

Transparent, Reproducible, Scalable

Every analysis on Pipette.bio automatically generates a provenance record that captures versions, parameters, and outputs in a structured format, so you can trace, reproduce, or share your results.

Provenance Snapshot

Status: completed
Session: rna-seq_2025_10_12_1830
Analysis: Differential Expression (DESeq2)
Inputs: 2 FASTQ files • GRCh38 reference
Tools: STAR • Salmon • DESeq2
Parameters: α = 0.05 • BH correction
Environment: R 4.2.3 • Python 3.10.14 • Ubuntu 22.04
Outputs: volcano_plot.png • deseq2_results.csv

Example of a provenance record automatically generated after each run: concise, human-readable, and machine-traceable.
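
To make "structured format" concrete, here is one way the snapshot above could be represented and serialized. This is a sketch under assumptions: the `ProvenanceRecord` class, its field names, and the JSON layout are hypothetical and are not Pipette's actual schema. Only the values are taken from the snapshot.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProvenanceRecord:
    # Hypothetical schema: field names mirror the labels in the snapshot.
    session: str
    analysis: str
    inputs: list
    tools: list
    parameters: dict
    environment: dict
    outputs: list
    status: str = "completed"


record = ProvenanceRecord(
    session="rna-seq_2025_10_12_1830",
    analysis="Differential Expression (DESeq2)",
    inputs=["2 FASTQ files", "GRCh38 reference"],
    tools=["STAR", "Salmon", "DESeq2"],
    parameters={"alpha": 0.05, "correction": "BH"},
    environment={"R": "4.2.3", "Python": "3.10.14", "OS": "Ubuntu 22.04"},
    outputs=["volcano_plot.png", "deseq2_results.csv"],
)

# Serialize to JSON (sorted keys give a stable, diff-friendly layout),
# then load it back to confirm the record round-trips without loss.
record_json = json.dumps(asdict(record), indent=2, sort_keys=True)
restored = ProvenanceRecord(**json.loads(record_json))
print(record_json)
```

A record in this shape stays readable for a person and can be parsed or compared between runs by a script, which is what tracing and reproducing an analysis needs.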