Update list of sequencers for poly-g trimming #508

Merged
sfchen merged 3 commits into OpenGene:master from semenko:expand-polyg-trimming
Jan 20, 2026

@semenko
Contributor

semenko commented Jul 14, 2023

This PR updates and expands the list of serial numbers of Illumina 2-color SBS sequencers for poly-g trimming.

These values come from: https://knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000003880

This expands the previous 2-color list by adding:

  • NextSeq 1000/2000 (@VL @VH)
  • NovaSeq X Plus (@LH)

This also broadens the NovaSeq 6000 serial prefix from @A0 to @A, per Illumina's doc.

(I do not see @NDX documented by Illumina, but this might be their NextSeq 550Dx FDA-regulated sequencer.)
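
As a rough illustration of what the prefix broadening changes, here is a minimal Python sketch. It is not fastp's code: the helper `matches_any_prefix` and the read names are made up for this example, and only the prefixes themselves come from the description above.

```python
def matches_any_prefix(read_name: str, prefixes: tuple[str, ...]) -> bool:
    """Return True if the FASTQ read name starts with one of the prefixes."""
    return read_name.startswith(prefixes)


# Prefixes named in this PR description.
ADDED_PREFIXES = ("@VL", "@VH", "@LH")
NOVASEQ_6000_OLD = ("@A0",)
NOVASEQ_6000_NEW = ("@A",)

# Hypothetical read names, invented purely for illustration.
name_a0 = "@A01234:7:HYPOTHETICAL:1:1101:1000:1000"
name_a1 = "@A12345:7:HYPOTHETICAL:1:1101:1000:1000"

for name in (name_a0, name_a1):
    old = matches_any_prefix(name, NOVASEQ_6000_OLD)
    new = matches_any_prefix(name, NOVASEQ_6000_NEW)
    print(name.split(":")[0], "old:", old, "new:", new)
```

Note that a bare @A prefix also matches any other instrument name that happens to start with A, so the broader match trades some precision for coverage.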

Expanded parsing of Illumina 2-color SBS definitions for poly-g trimming.

These values are via: https://knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000003880

This expands the previous 2-color list by adding:
NextSeq 1000/2000 (@VL @VH)
NovaSeq X Plus (@LH)

This changes the NovaSeq 6000 header from @A0 to @A, per Illumina's doc.

(I do not see @NDX documented by Illumina, but this might be their NextSeq 550Dx FDA-regulated sequencer.)
semenko added a commit to semenko/liquid-cell-atlas that referenced this pull request Jul 14, 2023
OK for our data so far, and I submitted a PR to fastp:
OpenGene/fastp#508
@dlaehnemann

I would like to rely on the automatic setting of --trim_poly_g, and it would be great to see this pull request, as well as #598, merged. Is anything holding this back?

Also, the Illumina page linked above is gone; they regularly do this with their docs. 🤦
But here's a page that at least mentions which models should be 2-channel (and also the 1-channel iSeq):
https://web.archive.org/web/20250701072529/https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/2-channel-sbs.html

And it seems like 10X put in quite some effort to collect all of Illumina's machine codes right here, although this seems to be 8 years old, so there will also be stuff missing:
https://github.com/10XGenomics/supernova/blame/b82c3d8efa68bda2d95f30621cd6d91308ce11a2/tenkit/lib/python/tenkit/illumina_instrument.py#L12-L45

So maybe these pull requests could be merged into one and amended if any of the other machine codes are missing.

@dlaehnemann

Ah, and I finally also found this, where the author actually got info from Illumina support (this seems to be the only way of getting useful and somewhat structured info from them):
https://github.com/nickp60/fcid/blob/04bd2e6aab1979a6902a4470cc82c574991242a0/fcid/run.py#L5-L123

@dlaehnemann

OK, I couldn't help myself and went to figure out which Illumina machine models will have the polyG issue. The list is (TL;DR):

  • iSeq 100
  • MiniSeq
  • MiSeq i100 Series
  • NextSeq 500/550
  • NextSeq 1000/2000
  • NovaSeq 6000
  • NovaSeq X

And here is the full story, with receipts:

Illumina Instrument Imaging Channel Systems

| Instrument Model | Imaging Technology | Channel System Details | Documentation Link | Checked manually |
|---|---|---|---|---|
| iSeq 100 | One-channel | one dye, compact system | Chemistry and Imaging on iSeq 100 | Done. |
| MiniSeq | Two-channel | red and green | Chemistry and Imaging on MiniSeq | Done. |
| NextSeq 500/550 | Two-channel | red and green | Chemistry and imaging on NextSeq 500/550 | Done. |
| NextSeq 1000/2000 | Two-channel | blue and green, standard or XLEAP (A blue) | Chemistry and imaging on the NextSeq 1000/2000 | Done. |
| MiSeq | Four-channel | oldest SBS (Sequencing by Synthesis) chemistry | Chemistry and imaging on MiSeq | Done. |
| MiSeq i100 Series | Two-channel | blue and green, XLEAP (C blue) | Two Channel Chemistry and Imaging on the MiSeq i100 Series | Done. |
| HiSeq 1000/2500 | Four-channel | oldest SBS (Sequencing by Synthesis) chemistry | Chemistry and imaging on MiSeq (also mentions HiSeq series) | Done. |
| HiSeq X | Four-channel | oldest SBS (Sequencing by Synthesis) chemistry | HiSeq X System Guide (15050091 v07) (mentions "the four color channels") | Done. |
| NovaSeq 6000 | Two-channel | red and green | Chemistry and Imaging on NovaSeq 6000 | Done. |
| NovaSeq X Series | Two-channel | blue and green, XLEAP (A blue) | Chemistry and Imaging on the NovaSeq X Series Instruments | Done. |

Channel System Summary

one-channel

Good overview: https://web.archive.org/web/20250701121553/https://knowledge.illumina.com/instrumentation/iseq-100/instrumentation-iseq-100-reference_material-list/000008434

Machine series:

  • iSeq 100

General setup:

  • one dye
  • each sequencing cycle has two rounds of chemistry + imaging

Color scheme:

  • adenine: first image only
  • cytosine: second image only
  • thymine: both images
  • guanine: permanently dark
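
The one-channel color scheme above can be sketched as a small lookup table. This is only an illustration of the scheme as listed, with made-up names (`ONE_CHANNEL`, `call_base_one_channel`); it is not how any real base caller is implemented.

```python
# (signal in first image, signal in second image) -> base, per the list above.
ONE_CHANNEL = {
    (True, False): "A",   # adenine: first image only
    (False, True): "C",   # cytosine: second image only
    (True, True): "T",    # thymine: both images
    (False, False): "G",  # guanine: permanently dark
}


def call_base_one_channel(first: bool, second: bool) -> str:
    """Look up the base for one cycle from the two image signals."""
    return ONE_CHANNEL[(first, second)]


# A cluster that no longer emits any signal is called as G in every cycle,
# which is where polyG tails come from.
dark_cycles = [(False, False)] * 5
print("".join(call_base_one_channel(a, b) for a, b in dark_cycles))  # GGGGG
```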

two-channel

Good overview: https://web.archive.org/save/https://knowledge.illumina.com/instrumentation/novaseq-x-x-plus/instrumentation-novaseq-x-x-plus-reference_material-list/000007970

General setup:

  • two dyes (different colors, different base associations)
  • each sequencing cycle has two rounds of chemistry + imaging

two-channel: red and green

Machine series:

  • MiniSeq
  • NextSeq 500/550
  • NovaSeq 6000

Color scheme:

  • thymine: green
  • cytosine: red
  • adenine: both
  • guanine: dark

two-channel: blue and green

Standard reagents

Machine series:

  • NextSeq 1000/2000

Color scheme:

  • thymine: green
  • cytosine: blue
  • adenine: both
  • guanine: dark

XLEAP Reagents (A blue)

Machine series:

  • NextSeq 1000/2000
  • NovaSeq X

Color scheme:

  • thymine: green
  • adenine: blue
  • cytosine: both
  • guanine: dark

XLEAP Reagents (C blue)

Good overview: https://web.archive.org/web/20250701122436/https://knowledge.illumina.com/instrumentation/miseq-i100-series/instrumentation-miseq-i100-series-reference_material-list/000009348

Machine series:

  • MiSeq i100 Series

Color scheme:

  • thymine: green
  • cytosine: blue
  • adenine: both
  • guanine: dark
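
To compare the two-channel schemes listed above side by side, here is a small Python sketch. The dictionary layout and the function name `call_base_two_channel` are assumptions made for illustration; the base-to-dye assignments are copied from the lists above.

```python
# Each scheme maps (green signal, second-dye signal) -> base.
# The second dye is red or blue, depending on the chemistry.
GREEN_ONLY, SECOND_ONLY, BOTH, DARK = (True, False), (False, True), (True, True), (False, False)

SCHEMES = {
    "red and green": {GREEN_ONLY: "T", SECOND_ONLY: "C", BOTH: "A", DARK: "G"},
    "blue and green, standard": {GREEN_ONLY: "T", SECOND_ONLY: "C", BOTH: "A", DARK: "G"},
    "XLEAP (A blue)": {GREEN_ONLY: "T", SECOND_ONLY: "A", BOTH: "C", DARK: "G"},
    "XLEAP (C blue)": {GREEN_ONLY: "T", SECOND_ONLY: "C", BOTH: "A", DARK: "G"},
}


def call_base_two_channel(scheme: str, green: bool, second: bool) -> str:
    """Look up the base for one cycle under the named scheme."""
    return SCHEMES[scheme][(green, second)]


# Whatever the dye assignment, "no signal in either channel" is always G.
for scheme in SCHEMES:
    print(scheme, "->", call_base_two_channel(scheme, False, False))
```

The point of the comparison is that the dye assignments for A, C and T vary between chemistries, while the dark state is G in every scheme listed, so all of these machines share the polyG issue.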

four-channel

Good overview: https://web.archive.org/web/20250701122141/https://knowledge.illumina.com/instrumentation/miseq/instrumentation-miseq-reference_material-list/000003757

Machine series:

  • MiSeq (except the i100 series)
  • HiSeq Series (docs mentioning HiSeq 1000/2500 and HiSeq X, but not other HiSeqs)

General setup:

  • four dyes
  • each sequencing cycle has four rounds of chemistry + imaging

Color scheme:

  • thymine: green
  • cytosine: yellow
  • adenine: red
  • guanine: blue

Data compiled from Illumina Knowledge Base documentation as of July 1st, 2025. The initial table was created by asking Claude Sonnet 4 to aggregate the relevant info scattered across Illumina Knowledge Base pages. All entries and linkouts were then checked manually; in particular, the HiSeq entries were adjusted to point to pages with a useful citation, and all pages were archived on the Wayback Machine (as Illumina often changes their links). Finally, I made the table much more concise by giving more detailed channel system descriptions below, which I compiled during cross-checking.

@dlaehnemann

dlaehnemann commented Jul 4, 2025

I tried to compile a list of identifiers for Illumina machine models with two- or one-channel imaging (the ones with the polyG tail issue).
Theoretically, one should be able to identify them from the Illumina Serial Number (ISN) in the fastq headers, and most of those seem to be known from Illumina or other sources (I didn't find anything for the recent MiSeq i100 Series machines).
But it seems like this instrument name that contains the ISN can be changed in the machine setup.
So this is probably actually not the most reliable way to determine the machine model.

Instead, it probably makes more sense to use the flowcell ID, which also contains codes for the machine models.
This supposedly gets generated automatically by the machines and cannot be altered.
And people have gotten the flowcell ID patterns by emailing Illumina support and have documented this for most models.
But again, information on the MiSeq i100 Series is missing.
I guess one would have to email again for a current list, and maybe suggest they put a comprehensive list somewhere and keep it up to date...

Here's what I could find.
An x would be represented by the regex [A-Za-z0-9] for pattern matching.

| Illumina machine model | Instrument ID | Sources | Flowcell ID pattern | alt codes | Sources |
|---|---|---|---|---|---|
| iSeq 100 | @FS | ISN, iSeq PR | BRBxxxxx-xxxx | BPC, BPG, BPA, BPL, BNT, BTR | fcid |
| MiniSeq | @MN | ISN, 10X, fcid | 000Hxxxxx | | fcid |
| MiSeq i100 Series | @SH | | SCxxxxxxx | | bwlang |
| NextSeq 500/550 | @NS, @NB | ISN, PR #508, 10X, fcid | xxxxxAFxx | BG, AG | fcid |
| NextSeq 1000/2000 | @VH, @VL | ISN, PR #508 | xxxxxxxM5 | HV | fcid, 10X |
| NovaSeq 6000 | @A, @NA | ISN, PR #508, fcid | xxxxxDRxx | DM, DS | fcid, 10X, ICTN62, ICTN63 |
| NovaSeq X | @LH | ISN, PR #508 | xxxxxxLTx | | fcid |
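
A minimal Python sketch of how the flowcell ID patterns in the table could be turned into regular expressions, with each x expanded to [A-Za-z0-9] as described above. The helper names (`pattern_to_regex`, `models_for_flowcell`) and the example IDs are hypothetical, and only the primary pattern per model is included; the alt codes are left out to keep the example short.

```python
import re

WILDCARD = "[A-Za-z0-9]"

# Primary flowcell ID pattern per model, copied from the table above.
FLOWCELL_PATTERNS = {
    "iSeq 100": "BRBxxxxx-xxxx",
    "MiniSeq": "000Hxxxxx",
    "MiSeq i100 Series": "SCxxxxxxx",
    "NextSeq 500/550": "xxxxxAFxx",
    "NextSeq 1000/2000": "xxxxxxxM5",
    "NovaSeq 6000": "xxxxxDRxx",
    "NovaSeq X": "xxxxxxLTx",
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Expand each lowercase 'x' to the wildcard class; escape everything else."""
    parts = [WILDCARD if ch == "x" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


COMPILED = {model: pattern_to_regex(p) for model, p in FLOWCELL_PATTERNS.items()}


def models_for_flowcell(flowcell_id: str) -> list[str]:
    """Return every model whose pattern matches the whole flowcell ID."""
    return [model for model, rx in COMPILED.items() if rx.fullmatch(flowcell_id)]


# Hypothetical flowcell IDs, invented for illustration.
print(models_for_flowcell("H7ABCDRXX"))      # ['NovaSeq 6000']
print(models_for_flowcell("BRB12345-6789"))  # ['iSeq 100']
```

Returning a list rather than a single model is deliberate in this sketch: with patterns this loose, an ID could in principle match more than one model or none at all, and the caller should see that.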

@serge2016

serge2016 commented Oct 6, 2025

The flowcell ID pattern for NovaSeq 6000 may differ; I have seen these variants:

  • DS
  • DR
  • DM

Sorry, I now understand that these codes are already listed in the alt codes column.

@dlaehnemann

No worries, any extra pair of eyes and any extra input is welcome.

@bwlang
Contributor

bwlang commented Jan 18, 2026

I think for simplicity (and to get some forward progress) we should merge based on the machine name.

To reflect current instruments, I think we just need to add @FS for iSeq (from #598) and @SH for MiSeq i100 (which I have observed). @semenko, can you update?

@bwlang
Contributor

bwlang commented Jan 18, 2026

@dlaehnemann : in case you want to update your table... some of our i100 runs look like this:

20251208_SH00350_0012_ASC2149232-SC3 
20251209_SH00350_0013_ASC2149083-SC3
20251218_SH00350_0017_ASC2150551-SC3
20251218_SH00350_0018_ASC2150786-SC3
20251219_SH00350_0019_ASC2154379-SC3
20251223_SH00350_0020_ASC2154126-SC3

So @SH and ASCxxxxxxx, I guess?
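
A small sketch of how run folder names like the ones above could be split into fields. It assumes an underscore-delimited layout of date, instrument ID, run counter, and flowcell field, which fits these examples but is not confirmed anywhere in this thread; `RunFolder` and `parse_run_folder` are made-up names.

```python
from typing import NamedTuple


class RunFolder(NamedTuple):
    date: str
    instrument: str
    run_number: str
    flowcell_field: str


def parse_run_folder(name: str) -> RunFolder:
    """Split a run folder name on its first three underscores (assumed layout)."""
    date, instrument, run_number, flowcell_field = name.strip().split("_", 3)
    return RunFolder(date, instrument, run_number, flowcell_field)


run = parse_run_folder("20251208_SH00350_0012_ASC2149232-SC3")
print(run.instrument[:2], run.flowcell_field)  # SH ASC2149232-SC3
```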

@dlaehnemann

Just double-checking: do you see those tags (@SH and ASCxxxxxxx) in the read names as well, in the fastq-files? So do those lines look something like this?

@SH00350_0012_ASC2149232-SC3[...]

@bwlang
Contributor

bwlang commented Jan 19, 2026

from a recent ubam:
SH00350:SH00350:SC2146634-SC3:1:1108:4725:2635

So for reads, maybe ASC is not the right pattern, just SC (the A is like a side designation, as for NS6000).
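
For checking read names like the one above, a minimal sketch could look as follows. It assumes the colon-separated layout in which the first field is the instrument ID and the third is the flowcell ID, which matches this example; `parse_read_name` is a hypothetical helper, not part of fastp.

```python
def parse_read_name(read_name: str) -> dict[str, str]:
    """Extract instrument, run, flowcell and lane from a read name (assumed layout)."""
    # Drop any comment after whitespace and a leading '@' if present.
    fields = read_name.split()[0].lstrip("@").split(":")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 colon-separated fields, got {len(fields)}")
    return {
        "instrument": fields[0],
        "run": fields[1],
        "flowcell": fields[2],
        "lane": fields[3],
    }


parsed = parse_read_name("SH00350:SH00350:SC2146634-SC3:1:1108:4725:2635")
looks_like_i100 = parsed["instrument"].startswith("SH") and parsed["flowcell"].startswith("SC")
print(parsed["instrument"], parsed["flowcell"], looks_like_i100)
# SH00350 SC2146634-SC3 True
```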

@dlaehnemann

I added it to the table accordingly. Thanks for reporting this. Let's see if the author comes back to this.

@bwlang
Contributor

bwlang commented Jan 19, 2026

@semenko
If you are able to update soon, this could go into the next release; if not, I'll make the update.

@semenko
Contributor Author

semenko commented Jan 19, 2026

Sure @bwlang - will update PR in a bit.

Copilot AI review requested due to automatic review settings January 19, 2026 17:34

Copilot AI left a comment

Pull request overview

This PR updates the list of Illumina 2-color sequencer instrument ID prefixes used for automatic poly-G trimming detection. The changes expand support for newer sequencing platforms based on Illumina's official documentation.

Changes:

  • Added instrument ID prefixes for NextSeq 1000/2000 (@VL, @VH) and NovaSeq X Plus (@LH)
  • Broadened NovaSeq 6000 prefix from @A0 to @A to match all variants
  • Added documentation comments with Illumina reference URL and instrument mappings

Simplify the poly-g trimming test and add modern sequencers that use
two-channel chemistry and would benefit from polyG tail trimming.

Signed-off-by: Nick Semenkovich <semenko@alum.mit.edu>
Contributor

@bwlang bwlang left a comment

Looks good, thanks.

@sfchen sfchen merged commit 76957f8 into OpenGene:master Jan 20, 2026
@semenko semenko deleted the expand-polyg-trimming branch January 20, 2026 12:32