HLA divergence calculation might not be correct #5

@steletvinicius

Hi again.

Could you please clarify how the HLA divergence score is calculated?
What is the math behind it?

I found this comment in the hla_divergence.R file: https://github.com/slowkow/hlabud/blob/main/R/hla_divergence.R

The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length.
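
Written as a formula, the comment appears to describe the following. The symbols are notation introduced here for illustration, not taken from the code: $a$ and $b$ are the two aligned allele sequences, $L$ is the sequence length used as the denominator, and $d(\cdot,\cdot)$ is whichever amino acid distance the function applies (the comment does not name it).

```latex
D(a, b) = \frac{1}{L} \sum_{i=1}^{L} d(a_i, b_i)
```

Under this reading, the questions raised below come down to which value of $L$ is used and which positions $i$ the sum runs over.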

Please also take into account the position argument you added in response to my request to calculate HLA divergence for HLA allele protein domains (a substring of the alignment, i.e. a subset of columns from the hla_alignment matrix).

Taking the allele pair B44:02+B44:03 as an example, I ran some tests, and it seems to me that something might not be working as expected.

To summarize:

  1. The HLA protein sequence length is overestimated, because the calculation uses the number of columns in the hla_alignments object, which does not correspond to the number of amino acids in the full HLA protein.
    For HLA-B, the matrix has 380 columns; however, the HLA-B protein has 362 residues.

  2. When applying the protein segmentation strategy, the new feature implemented as discussed in the issue HLA divergence restricted to the peptide binding groove #4,
    the HLA divergence calculation does not use the length of the protein segment given in the positions argument.
    As a result, the divergence is not normalized by the segment length. We expect a higher divergence index (normalized by segment length) in the peptide binding groove, where the differences are concentrated, than in the complete HLA protein.
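
To make the second point concrete, here is a minimal sketch in Python. It is not the hlabud implementation. The sequences, the `groove` column range, the `mismatch` distance (a plain 0/1 stand-in for a real amino acid distance matrix), and the `normalize_by_subset` switch are all hypothetical and exist only to show how the choice of denominator changes the result.

```python
from typing import Optional, Sequence


def mismatch(x: str, y: str) -> float:
    """Toy distance: 0 for identical residues, 1 otherwise.

    Stand-in for a real amino acid distance matrix (assumption).
    """
    return 0.0 if x == y else 1.0


def divergence(a: str, b: str,
               positions: Optional[Sequence[int]] = None,
               normalize_by_subset: bool = True) -> float:
    """Sum of per-position distances divided by a sequence length.

    positions: 0-based alignment columns to compare (None means all columns).
    normalize_by_subset: divide by the number of selected columns if True,
    otherwise by the full alignment length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = range(len(a)) if positions is None else positions
    total = sum(mismatch(a[i], b[i]) for i in cols)
    denom = len(cols) if normalize_by_subset else len(a)
    return total / denom


# Hypothetical 20-column alignment; the two sequences differ in columns 4..7.
allele_1 = "MRVTAAAAGSHSMRYFYTSV"
allele_2 = "MRVTCCCCGSHSMRYFYTSV"
groove = range(4, 12)  # hypothetical 8-column segment containing the differences

full = divergence(allele_1, allele_2)
segment_by_subset = divergence(allele_1, allele_2, groove, normalize_by_subset=True)
segment_by_full = divergence(allele_1, allele_2, groove, normalize_by_subset=False)

print(full, segment_by_subset, segment_by_full)
```

In this toy case, dividing by the full alignment length gives the segment the same score as the whole sequence, while dividing by the segment length gives a higher score for the segment where the differences are concentrated.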

Please check my tests in the gist linked below.
You will also find an IMGT protein alignment example there.

https://gist.github.com/steletvinicius/7789538387ac7d94c417b747db2be655
