Unable to run query on input file #11

@CSree

Hi Dr. Bo,
This is Chai. First, thank you for the GIANA tool; it is a fascinating idea to use the body's immune response as a diagnostic tool.

I am writing to request help querying my set of sequences against a reference. I have attached a section of the input file, which was clustered successfully by the clustering command. Next, I tried to query it against the reference provided with the tool, as shown below. Before this, I clustered hc10s10.txt and put the rotation file in the same directory, as described on the GitHub page.

python GIANA4.py -q input_giana.tsv -r hc10s10.txt -S 3.3 -o tmp/

Here is the error I got:

Processing tmp_query.txt
Total time elapsed: 0.290075
Maximum memory usage: 0.196432 MB
Build query clustering file. Elapsed 18.401398
Now mering with reference cluster
Traceback (most recent call last):
File "GIANA4.py", line 1207, in <module>
main()
File "GIANA4.py", line 1151, in main
MergeExist(refClusterFile, OutDir+'/'+outFile)
File "/gpfs/scratch/cs5359/Projects/Weberlab_GIANA/GIANA/query.py", line 173, in MergeExist
queryT=pd.read_table(queryClusterFile, skiprows=2, delimiter='\t', header=None)
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1242, in read_table
return _read(filepath_or_buffer, kwds)
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
return parser.read(nrows)
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 850, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status
File "pandas/_libs/parsers.pyx", line 2029, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 4879, saw 7

I checked both files, and nothing looks different on line 4879. I noticed that the input file input_giana.xlsx on GitHub: TestReal-ADIRP0000023_TCRB.tsv, has 3 additional columns alongside the CDR3 and gene information. These 3 columns are frequencyCount, RANK, and info. Are these mandatory, and how do I create these columns for my data?
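
Since the error is about the number of fields per row, one way to narrow it down is to count the tab-separated fields on every line of the file that pandas is reading (the query cluster file passed to MergeExist) and list the lines that disagree with the first data row. The sketch below is not part of GIANA: `field_count_report` is a hypothetical helper, and `skiprows=2` simply mirrors the `pd.read_table` call in the traceback. The line number pandas reports may not match the file's own numbering exactly, for example if blank lines are skipped, so the helper reports line numbers counted from the top of the file.

```python
from collections import Counter


def field_count_report(path, skiprows=2, delimiter="\t"):
    """Count delimiter-separated fields per line of a text file.

    Returns (counts, odd_lines):
      counts    maps number-of-fields -> number of lines with that many fields
      odd_lines lists (line_number, n_fields) for every line whose field count
                differs from the first non-skipped, non-empty line.
    Line numbers are 1-based and counted from the top of the file.
    """
    counts = Counter()
    odd_lines = []
    expected = None  # field count of the first data line
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line_no <= skiprows:
                continue  # mirror skiprows=2 from the traceback
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            n_fields = len(line.split(delimiter))
            counts[n_fields] += 1
            if expected is None:
                expected = n_fields
            elif n_fields != expected:
                odd_lines.append((line_no, n_fields))
    return counts, odd_lines
```

Calling `field_count_report("tmp/<query cluster file>")`, with the placeholder replaced by the actual file written to the output directory, would show whether the file mixes 4-field and 7-field rows and where the first mismatch occurs.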

Thanks in advance
Chai Sree
