Bugs in dataset download script #4

@xiaowu0162

Thank you for the great work! I ran into two issues when running scripts/download_compactds.sh:

  • The output_dir variable is not used
  • The shard combination pattern also matches the file index_IVFPQ.100000000.768.65536.64.faiss.meta, which causes an error when the index is used later.
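To illustrate the second point, here is a minimal sketch of how a loose glob can pick up the .meta file while a stricter one does not. The file names are shortened stand-ins, and the loose pattern `index.faiss*` is only an assumption about what the original script used; the strict form mirrors the `faiss_*` suffix used in the working script.

```shell
#!/bin/bash
# Hypothetical demo in a temporary directory; names are shortened stand-ins.
demo_dir=$(mktemp -d)
touch "$demo_dir/index.faiss_00" "$demo_dir/index.faiss_01" "$demo_dir/index.faiss.meta"

# Assumed loose pattern: matches the shards AND the .meta file.
loose_matches=$(cd "$demo_dir" && echo index.faiss*)

# Stricter pattern: the trailing underscore limits the match to the shards.
strict_matches=$(cd "$demo_dir" && echo index.faiss_*)

echo "loose:  $loose_matches"
echo "strict: $strict_matches"

rm -r "$demo_dir"
```

With the loose pattern, `cat` would append the metadata bytes to the combined index, which would explain why the index fails to load later.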

Here is the working script after some minor changes:

#!/bin/bash

# Check that an output directory was passed
if [ $# -eq 0 ]; then
  echo "Usage: $0 <output_dir>"
  exit 1
fi

output_dir=$1

# Download the sharded index files
python scripts/download_index.py --output_path "$output_dir"

# Combine the shards (the trailing "_" keeps the .faiss.meta file out of the match).
# Note: the combined index is written to the current working directory.
cat "$output_dir"/embeddings/index_IVFPQ/index_IVFPQ.100000000.768.65536.64.faiss_* > index_IVFPQ.100000000.768.65536.64.faiss

# Remove shard files
rm "$output_dir"/embeddings/index_IVFPQ/index_IVFPQ.100000000.768.65536.64.faiss_*
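
A possible hardening of the combine step, offered as a sketch rather than the repository's method: it deletes the shards only if concatenation succeeds, and writes the combined file next to the shards. `combine_shards` is a hypothetical helper name, and placing the combined index inside the shard directory is an assumption that the issue does not confirm.

```shell
#!/bin/bash
# Sketch: combine shards into one file, deleting them only on success.
# combine_shards is a hypothetical helper, not part of the repository.
combine_shards() {
  local index_dir=$1   # directory holding the shard files
  local index_name=$2  # base name of the combined index
  local shards=("$index_dir/$index_name"_*)

  # If the glob matched nothing, bash leaves the pattern unexpanded.
  if [ ! -e "${shards[0]}" ]; then
    echo "No shards found in $index_dir" >&2
    return 1
  fi

  # Glob expansion is sorted, so shards are concatenated in name order.
  cat "${shards[@]}" > "$index_dir/$index_name" && rm "${shards[@]}"
}
```

It could be called as `combine_shards "$output_dir/embeddings/index_IVFPQ" index_IVFPQ.100000000.768.65536.64.faiss`. Name-order concatenation is only correct if the shard suffixes sort in the intended order (for example, zero-padded numbers), which should be checked against the actual shard names.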
