The Platform

We have designed a low-friction, high-impact path to get started with bioinformatics and computational biology. Pipette unifies the molecular data science landscape into one intelligent platform. Researchers can now run complete workflows through a single conversational interface.

Ask questions in plain language:

"Find differentially expressed genes between my treated and control samples"

Pipette handles the rest: preprocessing, alignment, quantification, statistical testing, and even downstream visualization.
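Statistical testing in a differential expression run usually ends with multiple-testing correction. DESeq2, for example, reports Benjamini-Hochberg (BH) adjusted p-values in its `padj` column by default. The following is a minimal plain-Python sketch of the BH procedure for illustration only. It is not Pipette's internal code, and the function name `benjamini_hochberg` is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down so a running minimum keeps the
    # adjusted values monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 4) for p in padj])
```

Genes would then be kept when their adjusted value falls at or below the chosen threshold, such as α = 0.05.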

Explore the major analysis areas below to see how Pipette brings the power of AI-driven data science to every researcher, instantly.

Note: The tools listed below have NOT yet been fully tested on the platform.

Quality Control & Preprocessing

Tools

fastqc • multiqc • fastp • cutadapt • trimmomatic • sickle-trim • seqtk • bbmap • sra-tools • entrez-direct • gffread • qualimap • rseqc

Read Alignment & Mapping

Tools

star • hisat2 • bwa • bwa-mem2 • bowtie2 • minimap2 • bedtools • bedops • samtools • subread

Quantification & Assembly

Tools

salmon • kallisto • stringtie • rmats • spades • megahit • flye • velvet • metabat2 • prodigal • prokka • busco

RNA-seq Differential Expression

Tools

deseq2 • edger • limma • apeglm • clusterprofiler

Single-Cell Analysis

Tools

anndata • scanpy • louvain • numba • scvelo • seurat • monocle3 • soupx • signac • archr • azimuth • scdblfinder • dropletutils • scater • scuttle • slingshot • mast • scran • singler

ChIP-seq / ATAC-seq

Tools

macs2 • diffbind • chipseeker • meme • archr

Metagenomics & Taxonomic Profiling

Tools

kraken2 • metaphlan • humann • diamond • mafft • clustalw • fasttree • iqtree • raxml

Variant Calling & Annotation

Tools

gatk4 • freebayes • varscan • bcftools • vcftools • snpeff • picard • plink • gcta • finemap • htslib

GWAS & Population Genomics

Tools

plink • gcta • finemap • rmvp • variancepartition

Functional Enrichment & Pathways

Tools

clusterprofiler • enrichr • gprofiler2 • msigdbr • enrichplot

Network & Systems Biology

Tools

wgcna • igraph • complexheatmap

Visualization & Reporting

Tools

ggplot2 • pheatmap • complexheatmap • deeptools

Integrated Knowledge Sources

Pipette seamlessly connects to leading biological databases, so you can enrich your analyses with literature, annotations, variants, structures, and clinical data, all through natural-language queries.

Literature & Publications

PubMed

Search the biomedical literature indexed in PubMed, with support for PubMed's advanced query syntax (field tags and Boolean operators).

Example: "Find recent papers about CRISPR gene editing"
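A literature question like this one could be translated into a request to NCBI's public E-utilities ESearch endpoint. The sketch below is an assumption about how such a translation might look, not Pipette's actual query builder; the helper name `pubmed_search_url` is hypothetical, and reading "recent" as the last 365 days is an illustrative choice.

```python
from typing import Optional
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, recent_days: Optional[int] = None, retmax: int = 20) -> str:
    """Build an ESearch request that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    if recent_days is not None:
        # Keep only records whose publication date falls within the last N days.
        params["datetype"] = "pdat"
        params["reldate"] = recent_days
    return f"{ESEARCH_URL}?{urlencode(params)}"


# "recent" interpreted as one year (an assumption of this sketch).
print(pubmed_search_url("CRISPR gene editing", recent_days=365))
```

The JSON response lists matching PMIDs under `esearchresult.idlist`; a follow-up ESummary or EFetch request would retrieve titles and abstracts.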

Clinical Data & Trials

ClinicalTrials.gov

Access information about clinical trials, including recruitment status, phases, conditions, and interventions.

Example: "Find recruiting cancer trials in phase 3"
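For trial searches, a natural-language request could map onto the ClinicalTrials.gov REST API (version 2). This is a hedged sketch, not Pipette's implementation: `recruiting_trials_url` is a hypothetical helper, and the phase restriction uses what we understand to be the API's `AREA[...]` expression syntax in `filter.advanced`, which should be checked against the official API documentation.

```python
from urllib.parse import urlencode

# ClinicalTrials.gov REST API, version 2.
STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def recruiting_trials_url(condition: str, phase: str = "PHASE3", page_size: int = 20) -> str:
    """Build a study search limited to recruiting trials in one phase."""
    params = {
        "query.cond": condition,               # condition or disease search
        "filter.overallStatus": "RECRUITING",  # recruitment status filter
        # Assumed expression restricting the Phase field; verify the exact
        # syntax against the API documentation before relying on it.
        "filter.advanced": f"AREA[Phase]{phase}",
        "pageSize": page_size,
    }
    return f"{STUDIES_URL}?{urlencode(params)}"


print(recruiting_trials_url("cancer"))
```

Results come back as JSON with a `studies` array and, when more results exist, a `nextPageToken` for fetching the next page.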

OpenFDA

Query FDA data on drug adverse events, recalls, and safety information from regulatory databases.

Example: "Find adverse events for Lipitor"
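An adverse-event question maps naturally onto openFDA's drug event endpoint, which can tally the reaction terms reported for a drug. The sketch below shows one plausible request under that assumption. It is not Pipette's code, and `adverse_event_counts_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# openFDA drug adverse event endpoint (FAERS reports).
DRUG_EVENT_URL = "https://api.fda.gov/drug/event.json"


def adverse_event_counts_url(brand_name: str, limit: int = 10) -> str:
    """Build a query that counts the most frequently reported reactions for a drug."""
    params = {
        # Match reports in which the drug appears under this brand name.
        "search": f'patient.drug.openfda.brand_name:"{brand_name}"',
        # Count reports per reaction term (MedDRA preferred term), exact match.
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": limit,
    }
    return f"{DRUG_EVENT_URL}?{urlencode(params)}"


print(adverse_event_counts_url("Lipitor"))
```

Each result pairs a reaction term with a report count. These counts come from spontaneous reports and do not by themselves establish that the drug caused the reaction.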

Genomic Variants & Mutations

ClinVar

Clinical interpretation of germline variants with pathogenicity classifications and disease associations.

Example: "Find pathogenic BRCA1 variants"

dbSNP

Comprehensive database of single nucleotide polymorphisms and short genetic variations.

Example: "Find common SNPs in TP53"

gnomAD

Population-scale genomic data with allele frequencies across diverse populations.

Example: "Get allele frequencies for BRCA2 variants"

GWAS Catalog

Curated collection of genome-wide association studies linking variants to traits and diseases.

Example: "Find GWAS studies for Type 2 diabetes"

Gene & Protein Information

UniProt

Comprehensive protein sequence and functional information database.

Example: "Get information about human insulin protein"
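A protein lookup like this could become a UniProtKB search through the UniProt REST API. In the sketch below, mapping "human insulin" to the gene symbol `INS` and taxon 9606 is an assumption, and `uniprot_search_url` is a hypothetical helper, not part of Pipette.

```python
from urllib.parse import urlencode

UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(gene: str, taxon_id: int = 9606, reviewed_only: bool = True) -> str:
    """Build a UniProtKB query for one gene in one organism (9606 = human)."""
    clauses = [f"gene:{gene}", f"organism_id:{taxon_id}"]
    if reviewed_only:
        clauses.append("reviewed:true")  # Swiss-Prot (manually reviewed) entries only
    params = {"query": " AND ".join(clauses), "format": "json"}
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


# "human insulin protein" mapped to the INS gene (an assumption of this sketch).
print(uniprot_search_url("INS"))
```

Restricting to reviewed entries should surface the Swiss-Prot record for human insulin, accession P01308.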

Ensembl

Genome annotations, gene coordinates, variants, and comparative genomics data.

Example: "Get information about the BRCA2 gene"
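For gene-level questions, the Ensembl REST API offers a lookup-by-symbol endpoint. The sketch below only builds the request URL; `ensembl_lookup_url` is a hypothetical helper, and the default species `homo_sapiens` is an assumption, since the example query does not name one.

```python
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_lookup_url(symbol: str, species: str = "homo_sapiens", expand: bool = False) -> str:
    """Build a lookup-by-symbol request for the Ensembl REST API."""
    url = f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
    params = ["content-type=application/json"]
    if expand:
        params.append("expand=1")  # also include transcripts in the response
    return url + "?" + ";".join(params)


print(ensembl_lookup_url("BRCA2"))
```

For BRCA2, the response's `id` field should be the stable gene ID ENSG00000139618, alongside fields such as `seq_region_name`, `start`, `end`, and `strand`.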

UCSC Genome Browser

Access genomic sequences, coordinates, and annotations from the UCSC database.

Example: "Get DNA sequence of chr17:43000000-43100000"
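One detail matters when translating a region string like this one: positions typed into the UCSC Genome Browser are 1-based and inclusive, while the UCSC REST API's `getData/sequence` endpoint takes 0-based, half-open `start`/`end` values. The sketch below performs that conversion. It assumes the hg38 assembly, since the example does not name one, and `ucsc_sequence_url` is a hypothetical helper rather than Pipette's code.

```python
import re
from urllib.parse import urlencode

UCSC_SEQUENCE_URL = "https://api.genome.ucsc.edu/getData/sequence"


def ucsc_sequence_url(region: str, genome: str = "hg38") -> str:
    """Convert a 1-based, inclusive 'chrom:start-end' string into an API request.

    The API expects 0-based, half-open coordinates, so only the start shifts
    by one; the inclusive 1-based end equals the exclusive 0-based end.
    """
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = match.group(1)
    start_1based = int(match.group(2).replace(",", ""))  # allow 43,000,000
    end_inclusive = int(match.group(3).replace(",", ""))
    if start_1based < 1 or end_inclusive < start_1based:
        raise ValueError(f"invalid interval: {region!r}")
    params = {"genome": genome, "chrom": chrom,
              "start": start_1based - 1, "end": end_inclusive}
    return f"{UCSC_SEQUENCE_URL}?{urlencode(params)}"


print(ucsc_sequence_url("chr17:43000000-43100000"))
```

Getting this offset wrong silently shifts the returned sequence by one base, which is why the conversion is worth making explicit.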

Protein Structures

AlphaFold

AI-predicted protein structures with confidence scores and downloadable models.

Example: "Get AlphaFold structure for human p53"
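AlphaFold DB serves model metadata by UniProt accession, and its PDB-format models store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column. The sketch below builds the metadata URL and averages pLDDT over CA atoms. It is an illustration only: the helper names are hypothetical, and resolving "human p53" to accession P04637 is assumed to happen in an earlier step.

```python
ALPHAFOLD_API = "https://alphafold.ebi.ac.uk/api/prediction"


def alphafold_prediction_url(uniprot_accession: str) -> str:
    """Metadata request for the predicted model(s) of one UniProt accession."""
    return f"{ALPHAFOLD_API}/{uniprot_accession}"


def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over CA atoms of an AlphaFold DB model in PDB format.

    pLDDT sits in the B-factor field (columns 61-66), so reading one CA atom
    per residue yields one score per residue.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16; HETATM records are ignored.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)


# Human p53 is UniProt P04637 (name-to-accession mapping assumed here).
print(alphafold_prediction_url("P04637"))
```

The metadata response includes download links such as `pdbUrl` and `cifUrl`. As a rule of thumb, pLDDT above 90 indicates very high confidence and below 50 very low confidence.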

Gene Expression & Multi-Omics

GEO (Gene Expression Omnibus)

Repository of high-throughput gene expression and genomics datasets (GSE/GSM accessions).

Example: "Find RNA-seq datasets for breast cancer"

cBioPortal

Cancer genomics data from TCGA and other studies, including mutations and expression profiles.

Example: "Find TP53 mutations in lung cancer"

Drug Discovery & Therapeutics

Open Targets

Evidence-based drug target identification and disease associations for therapeutic development.

Example: "Find drug targets for Alzheimer's disease"

How it works: Simply ask Pipette in natural language. It determines which database to query, formulates a suitable search, and integrates the results into your analysis.

No API keys and no complex query syntax: just ask.

Just ask

"Run DESeq2 on my data file deseq_output.csv and highlight the top upregulated genes."
"Summarize quality metrics for all FASTQ files in my workspace."
"Perform single-cell clustering on filtered_feature_bc_matrix.h5."
"Identify differentially accessible regions from ATAC-seq data."
"Annotate assembled contigs using Prokka."
"Generate PCA and volcano plots from my RNA-seq analysis."
"Run GO enrichment for genes with adjusted p-value ≤ 0.01."
"Find SNPs from aligned BAM files using GATK."
"Merge count files in files.zip into a count matrix."
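As a concrete example of the last request, merging per-sample count files into one matrix might look like the sketch below. It assumes HTSeq-count-style files (two tab-separated columns, gene ID and count, with summary rows starting with `__`) and takes sample names from file names. The function `merge_counts_zip` is hypothetical and not Pipette's implementation; filling genes missing from a file with 0 is also an assumption.

```python
import csv
import io
import zipfile
from typing import Dict, List, Tuple


def merge_counts_zip(zip_source) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge two-column count files from a ZIP into a gene x sample matrix.

    zip_source may be a path (e.g. "files.zip") or a binary file object.
    Returns (genes, samples, matrix), where matrix[g][s] is an integer count.
    """
    per_sample: Dict[str, Dict[str, int]] = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            # Skip directory entries and macOS resource-fork folders.
            if name.endswith("/") or name.startswith("__MACOSX/"):
                continue
            sample = name.rsplit("/", 1)[-1].split(".", 1)[0]
            counts: Dict[str, int] = {}
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                for row in csv.reader(text, delimiter="\t"):
                    # Skip blank lines and HTSeq summary rows such as __no_feature.
                    if len(row) < 2 or row[0].startswith("__"):
                        continue
                    counts[row[0]] = int(row[1])
            per_sample[sample] = counts
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    # Assumption: a gene absent from a file is treated as a count of 0.
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix
```

Usage: `genes, samples, matrix = merge_counts_zip("files.zip")`. The result can then be written out as a tab-separated table for DESeq2 or edgeR. Files with header lines or extra columns, such as featureCounts output, would need a different parser.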

Transparent, Reproducible, Scalable

Every analysis on Pipette.bio automatically generates a provenance record that captures versions, parameters, and outputs in a structured format, so you can trace, reproduce, or share your results with confidence.

Provenance Snapshot

Status: completed
Session: rna-seq_2025_10_12_1830
Analysis: Differential Expression (DESeq2)
Inputs: 2 FASTQ files • GRCh38 reference
Tools: STAR • Salmon • DESeq2
Parameters: α = 0.05 • BH correction
Environment: R 4.2.3 • Python 3.10.14 • Ubuntu 22.04
Outputs: volcano_plot.png • deseq2_results.csv

Example of the provenance record generated automatically after each run: concise, human-readable, and machine-traceable.
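One way to make a record like this machine-traceable is to serialize it as canonical JSON and hash it. The sketch below mirrors the snapshot's fields in a dataclass. Pipette's actual schema is not documented here, so the class `ProvenanceRecord`, its field names, and the `fingerprint` method are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Union


@dataclass
class ProvenanceRecord:
    # Hypothetical field names mirroring the provenance snapshot.
    session: str
    status: str
    analysis: str
    inputs: List[str]
    tools: List[str]
    parameters: Dict[str, Union[float, str]]
    environment: Dict[str, str]
    outputs: List[str]

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that is easy to diff.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def fingerprint(self) -> str:
        # SHA-256 of the canonical JSON: identical records share a fingerprint.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


record = ProvenanceRecord(
    session="rna-seq_2025_10_12_1830",
    status="completed",
    analysis="Differential Expression (DESeq2)",
    inputs=["2 FASTQ files", "GRCh38 reference"],
    tools=["STAR", "Salmon", "DESeq2"],
    parameters={"alpha": 0.05, "p_adjust_method": "BH"},
    environment={"R": "4.2.3", "Python": "3.10.14", "OS": "Ubuntu 22.04"},
    outputs=["volcano_plot.png", "deseq2_results.csv"],
)
print(record.to_json())
```

Because the JSON round-trips back into an equal record, a stored fingerprint can later confirm that a rerun used the same recorded settings. Also capturing per-tool versions and input checksums would make that check stronger.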