Update list of sequencers for poly-g trimming #508
Expanded parsing of Illumina 2-color SBS definitions for poly-g trimming.
These values are via: https://knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000003880
This expands the previous 2-color list by adding:
- NextSeq 1000/2000 (@VL, @VH)
- NovaSeq X Plus (@LH)
This changes the NovaSeq 6000 header from @A0 to @A per Illumina's doc.
(I do not see @NDX documented by Illumina, but this might be their NextSeq 550Dx FDA-regulated sequencer.)
OK for our data so far, and I submitted a PR to fastp: OpenGene/fastp#508

I would like to rely on the automatic setting of the
Also, the above Illumina page is gone; Illumina regularly removes pages from their docs. 🤦
It also seems that 10X put quite some effort into collecting all of Illumina's machine codes right here, although this seems to be 8 years old, so some codes will be missing:
So maybe these pull requests could be merged into one and amended if any of the other machine codes are missing.

Ah, and I finally also found this, where the author actually got info from Illumina support (this seems to be the only way of getting useful and somewhat structured info from them):

OK, I couldn't help myself and went to figure out which Illumina machine models will have the polyG issue. The list is (TL;DR):
And here is the full story, with receipts:

Illumina Instrument Imaging Channel Systems

Channel System Summary

one-channel
Machine series:
General setup:
Color scheme:

two-channel
General setup:

two-channel: red and green
Machine series:
Color scheme:

two-channel: blue and green

Standard reagents
Machine series:
Color scheme:

XLEAP Reagents (A blue)
Machine series:
Color scheme:

XLEAP Reagents (C blue)
Machine series:
Color scheme:

four-channel
Machine series:
General setup:
Color scheme:

Data compiled from Illumina Knowledge Base documentation as of July 1st, 2025. The initial table was created by asking Claude Sonnet 4 to aggregate the relevant info scattered across Illumina Knowledge Base pages. All entries and linkouts were then checked manually; in particular, those for the HiSeq series were adjusted to point somewhere with a useful citation, and all pages were archived on the Wayback Machine (as Illumina often changes their links). Finally, I made the table much more concise by moving the detailed channel system descriptions, which I compiled during cross-checking, below it.

I tried to compile a list of identifiers for Illumina machine models with two- or one-channel imaging (the ones with the polyG tail issue). Instead, it probably makes more sense to use the flowcell ID, which also contains codes for the machine models. Here's what I could find.
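
To make the flowcell-ID idea concrete, here is a minimal sketch of pulling both the instrument ID and the flowcell ID out of a read name. It assumes the standard Illumina FASTQ header layout (`@instrument:run:flowcell:lane:tile:x:y`, optionally followed by a space and a comment). The function name `parse_read_origin` and the example read name are made up for illustration and are not part of fastp.

```python
from typing import NamedTuple, Optional


class ReadOrigin(NamedTuple):
    instrument: str  # first header field, e.g. the instrument serial
    flowcell: str    # third header field, the flowcell ID


def parse_read_origin(read_name: str) -> Optional[ReadOrigin]:
    """Extract instrument and flowcell IDs from an Illumina-style read name.

    Returns None when the name does not have the seven colon-separated
    fields of the assumed layout.
    """
    # Drop the leading "@" and the comment part after the first whitespace.
    parts = read_name.lstrip("@").split()
    if not parts:
        return None
    fields = parts[0].split(":")
    if len(fields) < 7:
        return None
    return ReadOrigin(instrument=fields[0], flowcell=fields[2])


# Hypothetical read name, not taken from real data.
origin = parse_read_origin("@VH00123:12:AAATESTX5:1:1101:1000:2000 1:N:0:ACGT")
print(origin)
```

A lookup of flowcell codes could then be keyed on `origin.flowcell` instead of (or in addition to) the instrument prefix; which codes to put in that lookup is exactly what the table is meant to settle.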

Sorry, now I understand that these codes present in

No worries, any extra pair of eyes and any extra input is welcome.

@dlaehnemann: in case you want to update your table... some of our i100 runs look like this: so

Just double-checking: do you see those tags (

from a recent uBAM: so for reads, maybe the ASC is not good, but just SC (A is like a side designation for NS6000).

I added it to the table accordingly. Thanks for reporting this. Let's see if the author comes back to this.

@semenko

Sure @bwlang - will update PR in a bit.

Pull request overview
This PR updates the list of Illumina 2-color sequencer instrument ID prefixes used for automatic poly-G trimming detection. The changes expand support for newer sequencing platforms based on Illumina's official documentation.
Changes:
- Added instrument ID prefixes for NextSeq 1000/2000 (@VL, @VH) and NovaSeq X Plus (@LH)
- Broadened NovaSeq 6000 prefix from @A0 to @A to match all variants
- Added documentation comments with Illumina reference URL and instrument mappings
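
The prefix check summarized above can be sketched roughly as follows. This is an illustration in Python, not fastp's actual C++ code: the names `looks_two_color`, `OLD_NOVASEQ_PREFIX` and `PR_PREFIXES` are hypothetical, the list contains only the prefixes named in this PR (the real list is longer), and the read names are made up.

```python
from typing import Tuple

# Only the prefixes discussed in this PR; an assumption-laden subset,
# not the complete list used by fastp.
OLD_NOVASEQ_PREFIX: Tuple[str, ...] = ("@A0",)
PR_PREFIXES: Tuple[str, ...] = ("@A", "@NDX", "@VL", "@VH", "@LH")


def looks_two_color(read_name: str, prefixes: Tuple[str, ...] = PR_PREFIXES) -> bool:
    """Return True if the FASTQ read name starts with any of the given prefixes."""
    # str.startswith accepts a tuple and tests each candidate prefix.
    return read_name.startswith(prefixes)


# A serial starting with "A0" is caught by both the old and the broadened prefix.
print(looks_two_color("@A00123:45:FLOWCELLX:1:1101:1000:2000", OLD_NOVASEQ_PREFIX))
# A serial starting with "A1" is only caught after broadening "@A0" to "@A".
print(looks_two_color("@A12345:45:FLOWCELLX:1:1101:1000:2000", OLD_NOVASEQ_PREFIX))
print(looks_two_color("@A12345:45:FLOWCELLX:1:1101:1000:2000"))
```

The broader `@A` prefix trades precision for coverage: it matches every serial that begins with `A`, so it would also match any other instrument family whose serials start with that letter.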
Simplify the poly-g trimming test and add modern sequencers that use two-channel chemistry and would benefit from polyG tail trimming. Signed-off-by: Nick Semenkovich <semenko@alum.mit.edu>
This PR updates and expands the list of serial numbers of Illumina 2-color SBS sequencers for poly-g trimming.
These values come from: https://knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000003880
This expands the previous 2-color list by adding:
- NextSeq 1000/2000 (@VL, @VH)
- NovaSeq X Plus (@LH)

This also broadens the NovaSeq 6000 serial from @A0 to @A per Illumina's doc.

(I do not see @NDX documented by Illumina, but this might be their NextSeq 550Dx FDA-regulated sequencer.)