Unable to run query on input file

Hi Dr. Bo,
This is Chai. First, thank you for the GIANA tool; it is a fascinating idea to use the body’s immune response as a diagnostic tool.
 
I am writing to request help querying my set of sequences against a reference. I have attached a section of the input file, which was clustered successfully by the clustering command. Next, I tried to query it against the reference provided with the tool, as shown below. Before this, I clustered hc10s10.txt and put the rotation file in the same directory, as described on the GitHub page.
 
python GIANA4.py -q input_giana.tsv -r hc10s10.txt -S 3.3 -o tmp/ 
 
Here is the error I got:
 
 Processing tmp_query.txt
Total time elapsed: 0.290075
Maximum memory usage: 0.196432 MB
     Build query clustering file. Elapsed 18.401398
Now mering with reference cluster
Traceback (most recent call last):
  File "GIANA4.py", line 1207, in <module>
    main()
  File "GIANA4.py", line 1151, in main
    MergeExist(refClusterFile, OutDir+'/'+outFile)
  File "/gpfs/scratch/cs5359/Projects/Weberlab_GIANA/GIANA/query.py", line 173, in MergeExist
    queryT=pd.read_table(queryClusterFile, skiprows=2, delimiter='\t', header=None)
  File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1242, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 850, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "pandas/_libs/parsers.pyx", line 2029, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 4879, saw 7
 
 
 
I checked both files, and nothing looks different on line 4879. I noticed that the example input file on GitHub, TestReal-ADIRP0000023_TCRB.tsv, has three additional columns alongside the CDR3 and gene info: frequencyCount, RANK, and info. Are these mandatory, and how do I create these columns for my data? For reference, my input file is attached: [input_giana.xlsx](https://github.com/s175573/GIANA/files/14842966/input_giana.xlsx)
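
The error says pandas expected 4 tab-separated fields but found 7 on one row, so a quick way to narrow this down is to count the fields on every line of the file that `MergeExist` is reading. The sketch below is only a diagnostic, not part of GIANA: the function name `field_count_report` is made up for illustration, and the `skiprows=2` default in the usage example simply mirrors the `pd.read_table` call in the traceback. The line number pandas reports may not match what a text editor shows once rows are skipped, so scanning the whole file is safer than inspecting line 4879 alone.

```python
from collections import Counter


def field_count_report(path, delimiter="\t", skiprows=0):
    """Return (most common field count, [(line_no, n_fields), ...] for rows that differ).

    Line numbers are 1-based and count from the top of the file,
    including any skipped rows. Completely empty lines are ignored.
    """
    counts = []
    with open(path, newline="") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line_no <= skiprows:
                continue  # skip the leading rows, as the pandas call does
            text = line.rstrip("\r\n")
            if text == "":
                continue  # ignore blank lines
            counts.append((line_no, len(text.split(delimiter))))

    if not counts:
        return None, []

    # Treat the most frequent field count as the expected one.
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    odd_rows = [(ln, n) for ln, n in counts if n != expected]
    return expected, odd_rows
```

Calling `field_count_report("path/to/query_cluster_file.txt", skiprows=2)` (the path is a placeholder for whichever file in `tmp/` is passed to `MergeExist`) returns the dominant column count and every row that deviates from it. If the deviating rows all have 7 fields, that would suggest the query cluster file mixes rows with and without the extra columns.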
 
Thanks in advance
Chai Sree
