
[Question] No similarity between WSI Report Embeddings and WSI Slide Image #167

@vicky-gupta

Description

What's your question?

Edit: I’ve updated my question with more detail for better understanding.


Hey Trident Team,

I’ve been using the TITAN model within the Trident framework to encode my WSI slides and reports.
Here’s the command I used to encode the WSI slides:

python TRIDENT/run_batch_of_slides.py \
  --task all \
  --wsi_dir "./Train_700" \
  --job_dir "./TRIDENT/new_tri_pro_train_part1" \
  --slide_encoder titan \
  --patch_size 512 \
  --mag 20 \
  --custom_list_of_wsis train_mpp_part1.csv
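
Once the run finishes, I read the slide-level embeddings back from the job directory. Here is a minimal sketch of how I load them, assuming one HDF5 file per slide with the feature stored under a 'features' key (the subdirectory and key names are assumptions about my output layout, not something I found documented):

import os
import numpy as np
import h5py

def load_slide_embeddings(job_dir):
    # Collect one slide-level TITAN embedding per .h5 file.
    # The subdirectory and the 'features' key are assumptions;
    # adjust them to match the actual Trident output layout.
    feats_dir = os.path.join(job_dir, "slide_features_titan")
    slide_ids, embeddings = [], []
    for fname in sorted(os.listdir(feats_dir)):
        if not fname.endswith(".h5"):
            continue
        with h5py.File(os.path.join(feats_dir, fname), "r") as f:
            embeddings.append(np.asarray(f["features"]).squeeze())
        slide_ids.append(os.path.splitext(fname)[0])
    return slide_ids, np.stack(embeddings)

slide_ids, slide_embs = load_slide_embeddings("./TRIDENT/new_tri_pro_train_part1")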

For the text reports, I used TITAN’s text encoder as follows:

import torch

# Encode the report text with TITAN's text encoder
with torch.no_grad():
    texts = [report_text]
    tokenized_text = model.text_encoder.tokenizer(texts).to(device)
    text_embedding = model.encode_text(tokenized_text, normalize=True)

    # Convert to NumPy
    if isinstance(text_embedding, torch.Tensor):
        text_embedding = text_embedding.cpu().numpy()

    # Remove the batch dimension if present
    if text_embedding.ndim > 1 and text_embedding.shape[0] == 1:
        text_embedding = text_embedding.squeeze(0)
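
To build the report-side matrix, I run the same encoder over every report. A minimal sketch, assuming `reports` is a list of report strings (looping one report at a time for simplicity; batching would be faster):

import numpy as np
import torch

# Encode every report into one (num_reports, dim) matrix.
# `model` and `device` are the same objects used above.
report_embs = []
with torch.no_grad():
    for report_text in reports:
        tokenized = model.text_encoder.tokenizer([report_text]).to(device)
        emb = model.encode_text(tokenized, normalize=True)  # L2-normalized
        report_embs.append(emb.cpu().numpy().squeeze(0))
report_embs = np.stack(report_embs)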

I’ve carefully reviewed the TITAN paper and tried to replicate the pretraining setup as closely as possible.


Task

Goal: Find the best-matching report for each WSI image.

Embedding generation pipeline:

  • For WSI slides:

    WSI Image → Vision Encoder → Embeddings (not L2-normalized)
    → Apply L2 normalization along dim=-1 → Embeddings (L2-normalized)
    
  • For WSI reports:

    Report Text → TITAN Text Encoder → Embeddings (L2-normalized)
    

Then I computed the cosine similarity between the WSI and report embeddings.
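
Concretely, the similarity and retrieval step looks like this (a minimal sketch; `slide_embs` and `report_embs` are the matrices built above, and since both have unit-norm rows after normalization, the dot product equals cosine similarity):

import numpy as np

# L2-normalize the slide embeddings along the feature dimension (dim=-1).
slide_embs = slide_embs / np.linalg.norm(slide_embs, axis=-1, keepdims=True)

# With unit-norm rows, cosine similarity is a plain matrix product:
# sim[i, j] = cosine(slide i, report j)
sim = slide_embs @ report_embs.T

# Best-matching report index and score for each WSI
best_report_idx = sim.argmax(axis=1)
best_scores = sim.max(axis=1)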


Problem

I’m getting almost-zero similarity scores, which suggests the embeddings might not be aligned in the same shared space.

I suspect I might be missing a step or detail from the paper. Could you please advise which section or part of the TITAN paper I should revisit to understand how to properly align or normalize these embeddings?

Also, any feedback on my pipeline or assumptions would be greatly appreciated.

