
The Skill Graph: a map of bioinformatics built from the literature, and why it makes Pipette actually work

Variome Analytics

May 5, 2026

If you want all the technical details and the benchmarks, please read the full preprint; this post is just the short version of the same idea.


So what is the Skill Graph, really?

The Skill Graph is a directed, edge-weighted knowledge graph of how bioinformatics analysis is done in practice, and the way to think about it is simple. Every node in the graph is one analytical skill that a bioinformatician would recognise: read alignment, differential expression, pathway enrichment, variant annotation, molecular docking, and so on. Every edge is a valid sequential transition from one skill to another. The weight on an edge tells you how often that exact transition appears in the published literature, and the existence of the edge in the first place is gated by whether the data types of the two skills are compatible.

In the version we have published, the graph contains 78 bioinformatics skills across 9 domains: genomics and alignment, variants, RNA-seq, single cell, epigenomics, drug design, metagenomics, databases, and general utilities. The skills are connected by 483 directed edges in total. Around half of these edges come directly from the literature; the other half are inferred from data-type compatibility, meaning pairs of skills that could connect because the outputs of one match the inputs of the other, even though no single paper we scanned chains them together explicitly.
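To make the shape of this concrete, here is a minimal sketch of how such a graph could be represented in code. The skill names, toolsets, types, and weights below are illustrative stand-ins, not entries from the published graph:

from dataclasses import dataclass

@dataclass
class Skill:
    name: str              # e.g. "read-alignment"
    domain: str            # one of the 9 domains
    tools: list[str]       # curated toolset mapped onto this skill
    input_types: set[str]  # data types the skill consumes
    output_types: set[str] # data types the skill produces

@dataclass
class Transition:
    source: str
    target: str
    weight: int            # paper co-occurrence count
    evidence: str          # "literature" or "type-inferred"

# Illustrative entries only:
read_alignment = Skill(
    name="read-alignment", domain="genomics-alignment",
    tools=["bwa", "bowtie2", "STAR"],
    input_types={"fastq", "reference-genome"},
    output_types={"bam"},
)
variant_calling = Skill(
    name="variant-calling", domain="variants",
    tools=["gatk", "deepvariant"],
    input_types={"bam", "reference-genome"},
    output_types={"vcf"},
)
edge = Transition("read-alignment", "variant-calling", weight=212, evidence="literature")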

If you want a one-line picture in your head, the Skill Graph is basically a topological map of "what usually comes after what" inside a real bioinformatics workflow. The important part is that it is encoded in a form an LLM agent can actually use for planning, rather than sitting as a static figure inside a review paper.

How did we build it?

The Skill Graph is constructed by a dual-track extraction pipeline applied to around 20,000 full-text open-access papers from PubMed Central, together with 800 hand-curated method notes; the corpus spans 18 bioinformatics domains in total.

Track 1, the ML-based tool-to-skill mapping. In this track we first used GPT-4o to generate joint Named Entity Recognition (NER) and Relation Extraction (RE) annotations on a seed corpus of 800 documents, then fine-tuned PubMedBERT on those annotations. The resulting NER model identifies tools, operations, and data types at F1 = 0.91, and the RE model recovers tool-to-operation "USES" relationships at F1 = 0.76. Once these models were trained, we ran inference across the entire 20,000-paper corpus, which is how real-world bioinformatics tools got mapped onto our predefined skills.
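For readers who want to picture the inference step, here is a hedged sketch using the Hugging Face transformers API. The checkpoint name is a hypothetical placeholder for the fine-tuned PubMedBERT weights:

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="variome/pubmedbert-bioinfo-ner",  # hypothetical checkpoint name
    aggregation_strategy="simple",           # merge subword tokens into entity spans
)

methods_text = (
    "Reads were aligned with BWA-MEM to GRCh38, and variants were "
    "called using GATK HaplotypeCaller."
)

# each recovered span carries a label such as TOOL, OPERATION, or DATA_TYPE
for entity in ner(methods_text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))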

Track 2, the document-level pipeline structure. This track is where most of the methodological novelty sits. Sentence-level relation extraction, the dominant paradigm in biomedical NLP today, recovers only about 6.2% of the expert-curated pipeline transitions, because the analytical steps in a typical methods section span multiple paragraphs rather than a single sentence. So instead of going sentence by sentence, we use a dictionary of 215 regex patterns (linked to the EDAM ontology and the bio.tools API) to scan the whole methods section at once, record the positional ordering of every tool mention, and aggregate consecutive pairs into candidate transitions weighted by paper co-occurrence. This document-level positional ordering recovers 60.5% of the ground-truth transitions, a tenfold improvement over the sentence-level approach.
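The core of the document-level idea fits in a few lines. Here is a minimal sketch with a toy three-entry stand-in for the 215-pattern dictionary; per-paper pair counts like these would then be summed over the whole corpus to get the co-occurrence weights:

import re
from collections import Counter

TOOL_PATTERNS = {  # toy subset; the real dictionary links to EDAM and bio.tools
    "bwa": re.compile(r"\bBWA(?:-MEM)?\b", re.IGNORECASE),
    "gatk": re.compile(r"\bGATK\b|\bHaplotypeCaller\b", re.IGNORECASE),
    "vep": re.compile(r"\bVEP\b|\bVariant Effect Predictor\b", re.IGNORECASE),
}

def candidate_transitions(methods_section: str) -> Counter:
    # record every tool mention with its character offset in the section
    mentions = []
    for tool, pattern in TOOL_PATTERNS.items():
        for match in pattern.finditer(methods_section):
            mentions.append((match.start(), tool))
    mentions.sort()  # positional ordering across the whole document

    # consecutive distinct tools become candidate pipeline transitions
    pairs = Counter()
    for (_, a), (_, b) in zip(mentions, mentions[1:]):
        if a != b:
            pairs[(a, b)] += 1
    return pairs

text = ("Reads were aligned with BWA-MEM. Duplicates were removed. "
        "Variants were called with GATK and annotated with VEP.")
print(candidate_transitions(text))  # Counter({('bwa', 'gatk'): 1, ('gatk', 'vep'): 1})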

Type validation. Once we have the candidate edges, we filter each one against the curated input and output type maps for each skill. The rule is simple: an edge from skill A to skill B is kept only if the output types of A have a non-empty intersection with the input types of B. This filter alone lifts pipeline-extraction precision from 14.5% to 37.2%, and it also enables the inference of new edges: biologically valid transitions between skills that were never explicitly co-observed in the literature, but should still be possible because the data types line up.
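In code, the filter is just a set intersection. The type maps below are illustrative, not the curated ones:

SKILL_IO = {  # illustrative input/output type maps
    "read-alignment":     {"in": {"fastq"},     "out": {"bam"}},
    "variant-calling":    {"in": {"bam"},       "out": {"vcf"}},
    "pathway-enrichment": {"in": {"gene-list"}, "out": {"enrichment-table"}},
}

def is_valid_edge(a: str, b: str) -> bool:
    # keep A -> B only if A's outputs intersect B's inputs
    return bool(SKILL_IO[a]["out"] & SKILL_IO[b]["in"])

candidates = [("read-alignment", "variant-calling"),
              ("read-alignment", "pathway-enrichment")]
kept = [edge for edge in candidates if is_valid_edge(*edge)]
print(kept)  # [('read-alignment', 'variant-calling')]; a BAM cannot feed a gene list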

The two tracks come together at the end: the ML track populates the nodes with their curated toolsets, and the document-extraction track, enriched with type inference, establishes the connective topology between the nodes.

How does the Skill Graph help Pipette?

Pipette is a multi-agent AI framework that executes end-to-end bioinformatics workflows starting from a natural-language request from the user. The planning agent inside Pipette is not allowed to guess what should come next; it has to query the Skill Graph and choose only from transitions that exist in the graph.

The reason this matters is that the main failure mode of an unconstrained LLM in bioinformatics is not actually bad code; the code is usually fine. The real failure is incoherent multi-step plans, where the model runs one tool perfectly correctly and then chains it to a downstream step that is either biologically nonsensical or simply incompatible at the level of data types. By restricting the agent's planning space to transitions that are (a) supported by what the community actually publishes and (b) validated by data-type compatibility, the Skill Graph removes this failure mode almost completely.
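Here is a toy sketch of what graph-constrained planning looks like; the graph contents and helper names are illustrative, not Pipette's internal API:

GRAPH = {  # adjacency lists with edge weights (illustrative contents)
    "read-alignment":  {"variant-calling": 212, "peak-calling": 48},
    "variant-calling": {"variant-annotation": 175},
}

def allowed_next_skills(current: str) -> list[str]:
    # valid successors of the current skill, ranked by literature support
    successors = GRAPH.get(current, {})
    return sorted(successors, key=successors.get, reverse=True)

def constrained_choice(current: str, llm_preference: str) -> str:
    options = allowed_next_skills(current)
    if not options:
        raise ValueError(f"no valid transition out of {current!r}")
    # an off-graph suggestion falls back to the best-supported edge
    return llm_preference if llm_preference in options else options[0]

print(constrained_choice("read-alignment", "pathway-enrichment"))  # -> variant-calling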

In our ablation study, we ran the exact same benchmark prompts against Pipette with the Skill Graph turned on and turned off, using Claude Opus 4.5 and GPT 5.4 as the fallback models. The Skill-Graph-guided system matched or exceeded both unguided baselines on every quantitative metric we checked. On the rice bulk RNA-seq benchmark, Pipette reached Pearson correlations of r = 0.976 to 0.991 against the published log fold-changes, versus r = 0.929 to 0.980 for unguided Claude and r = 0.914 to 0.976 for unguided GPT on the same task. On the PBMC 68K single-cell benchmark, the Skill-Graph-guided selection of CellTypist with the Immune_All_Low model produced 86% monocyte marker recall, while the same task without the graph gave only 29% for Claude and 43% for GPT. As a side effect, the Skill Graph also reduces token usage, because all the exploratory planning collapses into a single graph lookup.

To say it in one sentence, the Skill Graph is the layer that turns a capable but generic language model into a coherent bioinformatics analyst.

How does it help bioinformaticians directly? (The open MCP server and the graph explorer)

The published Skill Graph is fully open, and there are two ways to use it in your own work, both of them free.

1. The interactive graph explorer. The website skillgraph.pipette.bio lets you navigate the full graph directly in your browser: click on any skill to see its associated tools, inputs, outputs, upstream and downstream connections, and the literature evidence behind every transition. We have found this useful for orienting ourselves in an unfamiliar subfield, for designing a workflow from scratch, and for cross-checking that a planned pipeline matches the community consensus.

2. The hosted MCP server. We expose the Skill Graph as a Model Context Protocol server at https://skillgraph.pipette.bio/mcp. Add the snippet below to your Claude Code settings (or any other MCP client) and the graph becomes queryable from inside your conversation, with no installation step and no local data:

{
  "mcpServers": {
    "skillgraph": {
      "type": "http",
      "url": "https://skillgraph.pipette.bio/mcp"
    }
  }
}

Once it is connected, you can ask your agent things like "find the shortest path from wgs-alignment to pathway-enrichment", "what tools does the scanpy skill use", or "what typically comes after variant-calling", and the server answers using the graph. The server exposes six tools in total: get_skill, list_skills, search_skills, get_transitions, find_path, and get_graph_stats.
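If you would rather hit the endpoint programmatically, a rough sketch looks like the following. This assumes the server accepts stateless JSON-RPC POSTs and that find_path takes "from" and "to" arguments; in practice you would call tools/list first to confirm the schema, and a proper MCP client handles the protocol handshake for you:

import requests

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "find_path",
        "arguments": {"from": "wgs-alignment", "to": "pathway-enrichment"},
    },
}
resp = requests.post(
    "https://skillgraph.pipette.bio/mcp",
    json=payload,
    headers={"Accept": "application/json, text/event-stream"},
)
print(resp.json())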

If you are building your own agentic system on top of the bioinformatics literature, the MCP server is the fastest way to give your agent grounded planning capability without rebuilding the entire extraction pipeline yourself.

A snapshot of a living graph

The version of the Skill Graph described in the preprint, shown at skillgraph.pipette.bio, and served by the MCP endpoint is a snapshot: it is frozen at the point when the extraction ran over the 20,000 papers and the 800 curated notes. That is what you would expect from a published artifact; it should be reproducible and citable, so that other researchers can audit it, benchmark against it, and build on top of it.

The version of the graph that powers Pipette in production is something different: a continuously updated, behaviorally shaped graph that gets stronger every time someone runs a workflow on the platform.

There are three feedback loops driving this growth:

The practical consequence of all this is very direct: the more Pipette is used, the better it becomes at choosing the right tool, the right parameters, and the right next step, automatically, for every user, with no configuration on their side. The published graph is the foundation; the production graph is what compounds on top of it over time.


Read the full preprint

The complete architecture, the extraction methodology, the ablation study, and the four-domain benchmark (single-cell PBMC and human pancreas, bulk RNA-seq under environmental stress in rice, ABL1 molecular docking, de novo cyclic peptide design against MDM2, and ACMG/AMP-compliant clinical variant classification on GIAB HG002) are all described in detail in:

Pipette: Encoding scientific literature into an executable Skill Graph for multi-agent bioinformatics. Chirag Gupta and Ananya Sharma. Variome Analytics, 2026.

The code, the benchmark data, and the provenance records are all available here: github.com/variomeanalytics/pipette_benchmark.

You can try Pipette at pipette.bio, and you can explore the graph at skillgraph.pipette.bio.