Study Prioritization

Published: Jun 2026

Why Study Prioritization Matters

After candidate studies have been screened, the next step is study prioritization.

Screening determines whether a study is eligible.

Prioritization determines which eligible studies should be used first.

These are related but different decisions.

Dataset Screening
        ↓
Eligible Studies
        ↓
Study Prioritization
        ↓
Priority Study List
        ↓
CDI-DAS Input Package

A study may meet the eligibility criteria but still not be the best starting point for data acquisition, reference dataset assembly, or case-study demonstration.

Prioritization helps rank eligible studies based on usefulness, readiness, metadata quality, technical suitability, and alignment with the project objective.

Screening Versus Prioritization

Screening is a three-way decision: include, exclude, or review.

Prioritization is a ranking decision.

| Step | Main Question | Output |
| --- | --- | --- |
| Screening | Does the study meet the eligibility criteria? | Included, excluded, or review studies |
| Prioritization | Which eligible studies should be used first? | Ranked study list |

For example, two studies may both be eligible human gut microbiome datasets. However, one may have clearer metadata, better file accessibility, paired-end sequencing, and a manageable test subset. That study would be prioritized first.

Inputs to Study Prioritization

The prioritization step uses outputs from dataset screening.

Expected inputs include:

outputs/screened-studies.tsv
outputs/included-studies.tsv
outputs/review-studies.tsv

The included studies table contains records that passed eligibility screening.

The review studies table contains records that may be useful but need additional checking.

The screened studies table preserves all screening decisions and reasons.
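As a minimal sketch, the three screening outputs could be loaded as shown below. The file names come from the list above. The function names, the `outputs` default directory argument, and the column names used in the usage note (`candidate_id`, `accession`, `screening_decision`) are assumptions for illustration, not the confirmed layout of the screening tables.

```python
import csv
from pathlib import Path

# File names taken from the expected inputs listed above.
INPUT_FILES = {
    "screened": "screened-studies.tsv",
    "included": "included-studies.tsv",
    "review": "review-studies.tsv",
}


def read_tsv_rows(lines):
    """Parse tab-separated lines (a file handle or a list of strings) into dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


def load_prioritization_inputs(outputs_dir="outputs"):
    """Load each screening output, failing early if a file is missing."""
    tables = {}
    for name, filename in INPUT_FILES.items():
        path = Path(outputs_dir) / filename
        if not path.is_file():
            raise FileNotFoundError(f"Missing screening output: {path}")
        with path.open(newline="", encoding="utf-8") as handle:
            tables[name] = read_tsv_rows(handle)
    return tables
```

Calling `load_prioritization_inputs()` from the project root would return a dictionary with the keys `screened`, `included`, and `review`, each holding a list of row dictionaries. Failing early on a missing file keeps prioritization from silently running on partial screening results.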

Prioritization Criteria

Prioritization should be based on practical and scientific considerations.

For CDI Systematic Dataset Discovery, useful prioritization criteria include:

  • relevance to the research question
  • metadata completeness
  • accession clarity
  • public file availability
  • run-level metadata availability
  • compatibility with CDI-DAS
  • sequencing layout
  • platform suitability
  • manageable test subset
  • usefulness as a reference dataset
  • usefulness as a teaching or demonstration case

The goal is not to rank studies by size alone.

A smaller study with clear metadata and accessible files may be more useful than a larger study with confusing records or incomplete metadata.

Core Prioritization Domains

Study prioritization can be organized into five domains:

Research Alignment
        ↓
Metadata Quality
        ↓
Technical Suitability
        ↓
Repository Readiness
        ↓
CDI-DAS Handoff Value

Each domain contributes to the final priority decision.

| Domain | Main Question |
| --- | --- |
| Research alignment | Does the study strongly match the discovery objective? |
| Metadata quality | Are the sample and technical metadata usable? |
| Technical suitability | Does the sequencing design fit the workflow? |
| Repository readiness | Are public files and accessions clear? |
| CDI-DAS handoff value | Can the study move smoothly into acquisition? |

Priority Labels

A simple priority label system can be used.

| Priority Label | Meaning |
| --- | --- |
| primary | Main study for the current workflow |
| secondary | Useful comparison or backup study |
| review | Potentially useful but needs more checking |
| deferred | Relevant but not needed now |

These labels help separate the main case-study accession from supporting or future-use records.
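One way to keep these labels consistent is to treat them as a closed vocabulary, so that a typo such as `primery` is rejected rather than silently creating a fifth category. The sketch below is illustrative: the class and function names are assumptions, while the four label values and the two case-study assignments come from this guide.

```python
from collections import defaultdict
from enum import Enum


class PriorityLabel(str, Enum):
    """The four priority labels from the table above."""
    PRIMARY = "primary"
    SECONDARY = "secondary"
    REVIEW = "review"
    DEFERRED = "deferred"


def group_by_label(assignments):
    """Group accessions by label; an unknown label raises ValueError."""
    groups = defaultdict(list)
    for accession, label in assignments.items():
        groups[PriorityLabel(label)].append(accession)
    return dict(groups)


# Label assignments used for the case study in this guide.
assignments = {"PRJNA802976": "primary", "PRJNA322554": "secondary"}
groups = group_by_label(assignments)
```

Here `groups[PriorityLabel.PRIMARY]` holds the main case-study accession, and the secondary record is kept separately for comparison.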

Prioritization Scoring

A simple scoring system can make prioritization more transparent.

Each study can be scored across several criteria.

Example:

| Criterion Score | Meaning |
| --- | --- |
| 0 | Does not meet the criterion |
| 1 | Partially meets the criterion |
| 2 | Strongly meets the criterion |

A study with a higher total score may be prioritized first.

Example scoring criteria:

| Criterion | Description |
| --- | --- |
| Research alignment | Matches the discovery question |
| Metadata completeness | Provides usable biological and technical metadata |
| Accession clarity | Provides clear BioProject or run-level accessions |
| File availability | Public sequence files are available |
| CDI-DAS readiness | Can be handed off to the acquisition system |
| Test subset suitability | Has a practical subset for validation |
| Technical suitability | Sequencing layout and platform are useful for the workflow |
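The scoring scheme can be sketched in a few lines. The example below assumes each of the seven criteria above is scored 0, 1, or 2 with equal weight and that studies are ranked by the unweighted total. The identifier names are hypothetical, and the two studies and their scores are placeholders, not assessments of real datasets.

```python
CRITERIA = (
    "research_alignment",
    "metadata_completeness",
    "accession_clarity",
    "file_availability",
    "cdi_das_readiness",
    "test_subset_suitability",
    "technical_suitability",
)
VALID_SCORES = {0, 1, 2}


def total_score(scores):
    """Sum the 0-2 scores, insisting that every criterion is scored."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"Unscored criteria: {missing}")
    invalid = {c: scores[c] for c in CRITERIA if scores[c] not in VALID_SCORES}
    if invalid:
        raise ValueError(f"Scores must be 0, 1, or 2: {invalid}")
    return sum(scores[c] for c in CRITERIA)


def rank_studies(scored):
    """Return (study, total) pairs, highest total first; ties keep input order."""
    totals = [(study, total_score(scores)) for study, scores in scored.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Placeholder scores for two hypothetical studies.
example = {
    "study_A": dict.fromkeys(CRITERIA, 2),
    "study_B": {**dict.fromkeys(CRITERIA, 1), "research_alignment": 2},
}
ranking = rank_studies(example)
```

With seven criteria the maximum total is 14. In this example `study_A` scores 14 and `study_B` scores 8, so `study_A` is ranked first. Rejecting unscored criteria keeps a study from ranking low simply because a score was never recorded.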

Creating a Prioritization Table

The prioritization table can be created with a script.

bash scripts/bash/07a-build-prioritization-table.sh

The expected output is:

outputs/prioritization-table.tsv

This table records how screened studies are ranked and why a study is prioritized for downstream acquisition.

Expected Prioritization Table Structure

A useful prioritization table may include:

  • candidate ID
  • accession
  • source
  • screening decision
  • priority label
  • research alignment score
  • metadata score
  • accession clarity score
  • file availability score
  • CDI-DAS readiness score
  • test subset score
  • total score
  • prioritization reason
  • notes

This table makes the ranking process more transparent.
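To illustrate the structure, the sketch below writes a table with the fields listed above as tab-separated columns. It is not the contents of `07a-build-prioritization-table.sh`. The snake_case column names, the function name, and the example row are assumptions, and the row uses placeholder values rather than real study data.

```python
import csv
import io

COLUMNS = [
    "candidate_id", "accession", "source", "screening_decision",
    "priority_label", "research_alignment_score", "metadata_score",
    "accession_clarity_score", "file_availability_score",
    "cdi_das_readiness_score", "test_subset_score", "total_score",
    "prioritization_reason", "notes",
]
# Every score column except the derived total.
SCORE_COLUMNS = [c for c in COLUMNS if c.endswith("_score") and c != "total_score"]


def write_prioritization_table(rows, handle):
    """Write rows as TSV, computing total_score from the individual scores."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Missing optional fields (for example, notes) become empty strings.
        complete = {column: row.get(column, "") for column in COLUMNS}
        complete["total_score"] = sum(int(row[c]) for c in SCORE_COLUMNS)
        writer.writerow(complete)


# Placeholder row; the values are illustrative only.
example_row = {
    "candidate_id": "C001",
    "accession": "EXAMPLE_ACCESSION",
    "source": "example_repository",
    "screening_decision": "include",
    "priority_label": "primary",
    **dict.fromkeys(SCORE_COLUMNS, 2),
    "prioritization_reason": "placeholder reason",
}
buffer = io.StringIO()
write_prioritization_table([example_row], buffer)
```

Passing an open file instead of `io.StringIO()` would write the same rows to disk. Deriving `total_score` inside the writer, rather than accepting it as input, keeps the total from drifting out of step with the individual scores.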

Applying Prioritization to the Case Study

For this guide, the primary BioProject is:

PRJNA802976

The primary test subset is:

SRR17868090
SRR17868091
SRR17868092

This BioProject is prioritized because it provides a practical, accession-linked human gut microbiome case study that can be handed off to CDI-DAS.

A secondary comparison BioProject is:

PRJNA322554

This record may remain useful for technical contrast, but it is not the primary acquisition example.

Priority Decision for PRJNA802976

The priority decision for PRJNA802976 can be summarized as:

| Field | Value |
| --- | --- |
| BioProject | PRJNA802976 |
| Priority | primary |
| Screening decision | include |
| Study role | Main healthy human gut microbiome case-study accession |
| Test subset | SRR17868090, SRR17868091, SRR17868092 |
| Downstream use | CDI-DAS acquisition and validation |
| Prioritization reason | Strong fit for the discovery objective and practical CDI-DAS handoff |

This makes PRJNA802976 the main accession carried forward into included study assembly.

Secondary Comparison Record

PRJNA322554 can be retained as a secondary comparison record.

PRJNA322554

Its role is not to replace the primary case study. Instead, it helps demonstrate that eligible or relevant studies may still differ in technical characteristics, metadata structure, sequencing layout, or acquisition readiness.

This supports the teaching point that prioritization is more than inclusion.

Eligible
        ↓
Useful
        ↓
Prioritized
        ↓
Ready for handoff

Why Not Prioritize Everything Equally?

Not all eligible studies should be treated equally.

Some studies may be eligible but difficult to use immediately.

Reasons include:

  • metadata require manual cleaning
  • sample groups are difficult to interpret
  • run-level mapping is unclear
  • file access needs additional validation
  • the study contains mixed sample types
  • data are useful but not central to the current objective

Prioritization helps the workflow focus first on the studies that best support the current objective.

Prioritization and CDI-DAS Handoff

The purpose of prioritization is to prepare a clear handoff into CDI-DAS.

CDI-DAS needs accession inputs that can be used for metadata retrieval, download manifest generation, file acquisition, and validation.

Prioritized studies should therefore be organized into an input-ready structure.

Prioritized Study
        ↓
BioProject Accession
        ↓
Run Accessions
        ↓
Test Subset
        ↓
CDI-DAS Input Package

For this guide, that handoff begins with:

BioProject: PRJNA802976

Test subset:
SRR17868090
SRR17868091
SRR17868092
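Before handoff, it can help to check that the accessions are well formed. The sketch below validates the BioProject and run accessions against the usual INSDC prefixes (`PRJNA`, `PRJEB`, `PRJDB` for BioProjects; `SRR`, `ERR`, `DRR` for runs) and collects them into a small dictionary. The function name and the dictionary keys `bioproject` and `test_subset` are hypothetical; the actual CDI-DAS input format is not specified in this chapter.

```python
import json
import re

# INSDC-style accession prefixes followed by digits.
BIOPROJECT_PATTERN = re.compile(r"PRJ(NA|EB|DB)\d+")
RUN_PATTERN = re.compile(r"[SED]RR\d+")


def build_input_package(bioproject, test_subset):
    """Validate accession formats and return a handoff-ready dictionary."""
    if not BIOPROJECT_PATTERN.fullmatch(bioproject):
        raise ValueError(f"Not a BioProject accession: {bioproject}")
    malformed = [run for run in test_subset if not RUN_PATTERN.fullmatch(run)]
    if malformed:
        raise ValueError(f"Not run accessions: {malformed}")
    if len(set(test_subset)) != len(test_subset):
        raise ValueError("Test subset contains duplicate run accessions")
    # Key names are illustrative, not a confirmed CDI-DAS schema.
    return {"bioproject": bioproject, "test_subset": list(test_subset)}


package = build_input_package(
    "PRJNA802976",
    ["SRR17868090", "SRR17868091", "SRR17868092"],
)
print(json.dumps(package, indent=2))
```

The check is purely syntactic. It catches typos and mixed-up accession types, such as an experiment accession pasted where a run is expected, but it does not confirm that the runs belong to the BioProject or that files are available.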

Common Prioritization Problems

Prioritization can become unclear when ranking decisions are not documented.

Common problems include:

  • choosing a study only because it was found first
  • prioritizing the largest dataset without checking metadata
  • ignoring file availability
  • mixing primary and secondary examples
  • failing to explain why one eligible study was selected over another
  • changing the priority study without updating documentation

A prioritization table helps prevent these issues.
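Some of these problems can be caught automatically. The sketch below audits prioritization rows for three of them: an unrecognized label, a missing reason, and anything other than exactly one primary study. The one-primary rule is an assumption based on the label definition in this guide, and the function name and column names are hypothetical. The second example row has its reason left blank on purpose to trigger a finding.

```python
VALID_LABELS = {"primary", "secondary", "review", "deferred"}


def audit_prioritization(rows):
    """Return a list of human-readable problems found in the table rows."""
    problems = []
    primaries = [r for r in rows if r.get("priority_label") == "primary"]
    if len(primaries) != 1:
        problems.append(
            f"expected exactly one primary study, found {len(primaries)}"
        )
    for row in rows:
        accession = row.get("accession", "<no accession>")
        if row.get("priority_label") not in VALID_LABELS:
            problems.append(f"{accession}: unknown priority label")
        if not row.get("prioritization_reason", "").strip():
            problems.append(f"{accession}: missing prioritization reason")
    return problems


rows = [
    {
        "accession": "PRJNA802976",
        "priority_label": "primary",
        "prioritization_reason": (
            "Strong fit for the discovery objective and practical CDI-DAS handoff"
        ),
    },
    {
        "accession": "PRJNA322554",
        "priority_label": "secondary",
        "prioritization_reason": "",  # deliberately blank for the demonstration
    },
]
problems = audit_prioritization(rows)
```

For these rows the audit reports a single problem, the missing reason for `PRJNA322554`. Running such a check whenever the table changes helps keep the documentation in step with the current priority study.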

Summary

Study prioritization ranks screened studies based on their usefulness, readiness, and alignment with the discovery objective.

Screening determines whether a study is eligible. Prioritization determines which eligible study should be used first.

For this guide, PRJNA802976 is prioritized as the primary healthy human gut microbiome BioProject, with SRR17868090, SRR17868091, and SRR17868092 used as the primary test subset.

PRJNA322554 remains available as a secondary comparison record.

Looking Ahead

In the next chapter, we assemble the included and prioritized studies into a structured output package that can be handed off to the CDI Data Acquisition System.