Reproducing a Rice Salt-Stress RNA-Seq Study in 90 Minutes
Team Pipette
Feb. 9, 2026
First in a series where we take published datasets and reproduce the analysis using Pipette.bio.
In 2023, Guo et al. published a study in Frontiers in Plant Science examining how rice seedlings respond to salt-alkali stress at the transcriptional level (PMC9840837). The original analysis involved a multi-step RNA-seq pipeline (trimming, alignment, quantification, differential expression, and visualization), the kind of work that typically takes a bioinformatician several days to set up and run.
We gave the same raw data to Pipette.bio and asked: can an AI agent reproduce the key findings from a single prompt?
The Experiment
The dataset (PRJNA895747) consists of 12 paired-end RNA-seq samples from rice (Oryza sativa japonica) seedlings treated with Na2CO3 for 1 and 5 days, with untreated controls: 4 groups, 3 biological replicates each.
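To make the design concrete, here is a minimal sketch of what a sample sheet for this layout could look like. The sample IDs, column names, and group labels are our own placeholders for illustration, not the names used in the actual metadata CSV.

```python
import csv
import io

# Hypothetical labels: 2 timepoints x 2 conditions x 3 replicates = 12 samples.
TIMEPOINTS = ["day1", "day5"]
CONDITIONS = ["control", "treated"]
REPLICATES = [1, 2, 3]
FIELDS = ["sample", "timepoint", "condition", "replicate"]


def build_sample_sheet():
    """Return one metadata row (dict) per sample in the 4-group design."""
    rows = []
    for tp in TIMEPOINTS:
        for cond in CONDITIONS:
            for rep in REPLICATES:
                rows.append({
                    "sample": f"{tp}_{cond}_rep{rep}",  # placeholder ID
                    "timepoint": tp,
                    "condition": cond,
                    "replicate": rep,
                })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sheet = build_sample_sheet()
# Each per-timepoint comparison uses 6 samples: 3 controls vs 3 treated.
day1 = [r for r in sheet if r["timepoint"] == "day1"]
print(len(sheet), len(day1))  # prints: 12 6
```

Splitting the sheet by timepoint like this mirrors a treated-versus-control comparison run separately for Day 1 and Day 5.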
We uploaded the FASTQ files and a sample metadata CSV, then gave the agent a single prompt describing the full analysis: trim with fastp, align with HISAT2 to the IRGSP-1.0 reference, quantify with featureCounts, run DESeq2 separately for each timepoint, use the same thresholds as the paper (fold change ≥ 1.5, FDR ≤ 0.01), and generate the standard visualizations.
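For readers who want a feel for what those steps look like on the command line, the sketch below assembles one plausible set of invocations. It is not the agent's actual command log: the file naming scheme, index prefix, annotation path, and thread count are all assumptions, and only the basic input/output flags of each tool are used.

```python
import shlex


def per_sample_commands(sample, index_prefix, threads=4):
    """Build trimming and alignment commands for one paired-end sample.

    File names follow a hypothetical <sample>_1/_2.fastq.gz convention.
    """
    r1, r2 = f"{sample}_1.fastq.gz", f"{sample}_2.fastq.gz"
    t1, t2 = f"{sample}_1.trimmed.fastq.gz", f"{sample}_2.trimmed.fastq.gz"
    return [
        # fastp: -i/-I are the paired inputs, -o/-O the trimmed outputs.
        ["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2],
        # hisat2: -x is the index prefix, -1/-2 the mates, -S the SAM output.
        ["hisat2", "-p", str(threads), "-x", index_prefix,
         "-1", t1, "-2", t2, "-S", f"{sample}.sam"],
    ]


def count_command(samples, gtf, threads=4):
    """Build a single featureCounts call covering all aligned samples.

    -p marks the data as paired-end; recent Subread releases also need
    --countReadPairs to count fragments rather than individual reads.
    """
    sams = [f"{s}.sam" for s in samples]
    return ["featureCounts", "-p", "-T", str(threads),
            "-a", gtf, "-o", "counts.txt", *sams]


# Hypothetical sample names and reference paths, for illustration only.
samples = ["day1_control_rep1", "day1_treated_rep1"]
commands = []
for s in samples:
    commands.extend(per_sample_commands(s, "index/irgsp1"))
commands.append(count_command(samples, "annotation/irgsp1.gtf"))

for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to inspect or to pass to `subprocess.run` later; nothing is executed here.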
The agent loaded the rnaseq-alignment skill, downloaded the pre-built HISAT2 rice index from S3, and executed the entire pipeline autonomously. 20 steps, zero human intervention.
Total wall time: ~90 minutes. That includes downloading indexes, trimming 12 samples, aligning, counting, running DESeq2 twice, and generating all plots.
The Results
Here's how the agent's output compared to the published findings:
| Metric | Guo et al. (2023) | Pipette.bio Agent |
|---|---|---|
| Alignment rate | 95.2–95.9% | 96.1–96.8% |
| Day 1 DEGs | 1,780 (753 up / 1,027 down) | 1,451 (649 up / 802 down) |
| Day 5 DEGs | 2,315 (982 up / 1,333 down) | 2,005 (869 up / 1,136 down) |
| Shared across timepoints | 405 | 331 |
| Downregulated dominant? | Yes (~58%) | Yes (~56%) |
| Day 5 response stronger? | Yes (1.3x Day 1) | Yes (1.38x Day 1) |
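The derived rows of the table can be recomputed from the raw up/down counts. The snippet below is a small arithmetic check using only the numbers in the table; the dictionary layout and function names are ours.

```python
# (up, down) DEG counts per timepoint, copied from the table above.
COUNTS = {
    "paper": {"day1": (753, 1027), "day5": (982, 1333)},
    "agent": {"day1": (649, 802), "day5": (869, 1136)},
}


def total(source, day):
    """Total DEGs (up + down) for one analysis and timepoint."""
    up, down = COUNTS[source][day]
    return up + down


def down_fraction(source, day):
    """Share of DEGs that are downregulated."""
    _, down = COUNTS[source][day]
    return down / total(source, day)


def day5_over_day1(source):
    """How many times more DEGs appear at Day 5 than at Day 1."""
    return total(source, "day5") / total(source, "day1")


def shortfall(day):
    """Fraction by which the agent's DEG count falls below the paper's."""
    return 1 - total("agent", day) / total("paper", day)


for source in COUNTS:
    for day in ("day1", "day5"):
        print(f"{source} {day}: {total(source, day)} DEGs, "
              f"{down_fraction(source, day):.1%} down")
    print(f"{source} Day 5 / Day 1: {day5_over_day1(source):.2f}x")

for day in ("day1", "day5"):
    print(f"agent vs paper, {day}: {shortfall(day):.1%} fewer DEGs")
```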
The key biological conclusions held up:
- Salt-alkali stress triggers a time-dependent transcriptional response. More genes are differentially expressed at Day 5 than Day 1.
- Downregulation dominates. At both timepoints, and in both analyses, roughly 55–58% of the response is repression rather than activation.
- Hundreds of genes respond at both timepoints, a core stress response program, while the majority of DEGs are timepoint-specific.
- The top responding genes show dramatic fold changes, up to roughly 64-fold upregulation (log2FC ≈ 6) at Day 1 and 128-fold downregulation (log2FC ≈ -7) at Day 5.
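To show how DEG calls and the shared set follow from the thresholds, here is a minimal sketch on toy data. The gene IDs and values are invented, and we assume the 1.5-fold cutoff applies symmetrically (1.5-fold up or 1.5-fold down), which on the log2 scale is |log2FC| ≥ log2(1.5) ≈ 0.585.

```python
import math

LOG2FC_CUTOFF = math.log2(1.5)  # fold change >= 1.5 in either direction
FDR_CUTOFF = 0.01               # adjusted p-value threshold


def classify(log2fc, padj):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    # DE tools can report a missing adjusted p-value; treat it as not significant.
    if padj is None or padj > FDR_CUTOFF:
        return "ns"
    if abs(log2fc) < LOG2FC_CUTOFF:
        return "ns"
    return "up" if log2fc > 0 else "down"


def deg_calls(results):
    """Map gene -> 'up'/'down' for genes passing both thresholds."""
    calls = {g: classify(lfc, padj) for g, (lfc, padj) in results.items()}
    return {g: c for g, c in calls.items() if c != "ns"}


# Toy (log2FC, padj) values with made-up gene IDs, not real results.
day1 = {"geneA": (6.0, 1e-8), "geneB": (-1.2, 0.003),
        "geneC": (0.4, 1e-5), "geneD": (2.0, 0.2)}
day5 = {"geneA": (3.1, 1e-4), "geneB": (-7.0, 1e-12),
        "geneE": (-0.9, 0.009)}

degs_day1 = deg_calls(day1)
degs_day5 = deg_calls(day5)

# Genes called at both timepoints form the shared response set.
shared = set(degs_day1) & set(degs_day5)
print(sorted(shared))  # prints: ['geneA', 'geneB']

# A log2 fold change converts back to a linear fold change as 2**|log2FC|.
print(2 ** 6, 2 ** 7)  # prints: 64 128
```

Here the overlap is taken on gene IDs alone; a stricter version would also require the same direction of change at both timepoints.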
Why the Numbers Aren't Identical
The agent found ~13–18% fewer DEGs than the original paper (about 18% fewer at Day 1 and 13% fewer at Day 5). This is expected and well within normal variation for RNA-seq reanalysis. The main reasons:
- Newer software versions. The paper used DESeq2 v1.6.3 (circa 2014); the agent ran v1.38.0, which has substantially improved shrinkage estimators and more conservative dispersion modeling. Newer DESeq2 releases are generally more stringent, which means fewer false positives and slightly fewer total calls.
- Different low-expression filtering. The agent applied a minimum of 10 total counts across samples before testing. The paper's exact filtering threshold isn't specified and may have been more permissive.
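- Reference genome nuances. The samples come from cultivar "Liaoxing NO.1", but reads are aligned to the IRGSP-1.0 reference, which is built from the Nipponbare cultivar. Cultivar-specific polymorphisms make read placement sensitive to aligner version and settings, which can shift mapping rates and counts at the margins.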
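The low-expression filter is simple enough to sketch directly. The example below keeps genes with at least 10 total counts across samples, as described above; the count values and gene IDs are toy data, and the dict-of-lists layout is just for illustration.

```python
MIN_TOTAL_COUNT = 10  # minimum summed count across all samples


def prefilter(counts, min_total=MIN_TOTAL_COUNT):
    """Keep only genes whose counts sum to at least min_total."""
    return {gene: row for gene, row in counts.items()
            if sum(row) >= min_total}


# Toy count matrix: gene -> raw counts in six samples (made-up values).
counts = {
    "geneA": [0, 1, 0, 2, 0, 1],   # total 4, dropped
    "geneB": [5, 3, 4, 6, 2, 7],   # total 27, kept
    "geneC": [0, 0, 0, 0, 0, 10],  # total 10, kept (meets the cutoff exactly)
}

kept = prefilter(counts)
print(sorted(kept))  # prints: ['geneB', 'geneC']
```

Because the multiple-testing correction depends on how many genes are tested, a stricter or looser cutoff here changes the adjusted p-values and can move genes across the FDR threshold.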
None of these differences change the biological story. The trends, ratios, and top genes are consistent.
What the Agent Produced
From a single prompt, the agent delivered:
- 19 result files: count matrices, DE results for both timepoints, gene lists (up/down/all), overlap analysis, GSEA-ready ranked lists
- 13 figures: PCA plots, volcano plots (per-timepoint and combined), heatmaps of top DEGs, MA plots, QC summary panels, comprehensive DE comparison charts
- A full methods section with software versions, parameters, and statistical thresholds, ready to drop into a manuscript
All of this was generated without anyone writing a single line of code. The agent selected tools, downloaded reference indexes from S3, recovered from a metadata formatting issue mid-run (the first DESeq2 attempt hit a contrast error, which the agent diagnosed and fixed before re-running the step automatically), and produced a structured PDF report.
The Bigger Picture
Reproducibility is one of the persistent challenges in bioinformatics. Pipelines rot. Dependencies break, reference builds change, parameter choices get lost in lab notebooks. When we reanalyze old data with current tools, we're not just checking a box. We're asking: does the biology hold up?
In this case, it does. The rice salt-alkali stress response described by Guo et al. is robust to software version changes, filtering differences, and a fully automated analysis pipeline. That's a good sign for the original work, and a good sign for AI-driven bioinformatics.
Download the full session report (PDF)
The agent used HISAT2 for alignment, featureCounts for quantification, and DESeq2 for differential expression, all selected and configured autonomously based on the user's prompt and the rnaseq-alignment skill.
Have a dataset you'd like us to reproduce? Drop us a note. We're building a library of "Old Wine, New Bottle" case studies.