Create UniProt Mapping file. #691
acoffman
left a comment
Made some comments based on what we talked about on the call - let me know if you have any questions!
tmp_file = tmp_file(e.file_name)
tmp_file.puts(e.headers.join("\t"))

e.objects.find_each do |object|
You probably don't need the indirection here of calling e.objects and can delete def self.objects in the presenter file. This can probably just be Gene.find_each
e.objects.find_each do |object|

row = e.row_from_object(object)
if row[1].is_a?(Array)
I might change this to be e.rows_from_object() and assume you always get back an Array of rows and push the logic of handling multiple (or no) uniprot ids down a level. (See other comment)
]
end

def self.row_from_object(gene)
I'd rename to rows_from_object() and do something along these lines (haven't tested it, just off the top of my head):
    swissprot_names = Array(Scrapers::MyGeneInfo.get_swissprot_name(gene))
    formatted_overview = formatted_overview_col(gene)
    swissprot_names.map do |swissprot_name|
      if swissprot_name == 'N/A'
        nil
      else
        [gene.name, swissprot_name, formatted_overview]
      end
    end.compact
That way you have a list of rows for your TSV: compact removes the nils, and the code that actually writes the TSV can be a simple iteration over genes that calls this method and writes one row for each item it returns.
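A rough sketch of how the writing side could look with that change. It is plain Ruby so it runs without Rails, and several names are assumptions made for illustration only: `Gene` is a stand-in Struct rather than the real model, `SWISSPROT_LOOKUP` is a hypothetical replacement for `Scrapers::MyGeneInfo.get_swissprot_name`, the gene and UniProt names are made up, the header names are placeholders, and `formatted_overview_col` is stubbed.

```ruby
require 'stringio'

# Stand-in for the ActiveRecord Gene model (assumption, illustration only).
Gene = Struct.new(:name)

# Hypothetical lookup standing in for Scrapers::MyGeneInfo.get_swissprot_name.
# It may yield one name, several names, or the 'N/A' placeholder.
SWISSPROT_LOOKUP = {
  'GENE_A' => 'GENEA_HUMAN',
  'GENE_B' => %w[GENEB1_HUMAN GENEB2_HUMAN],
  'GENE_C' => 'N/A'
}.freeze

# Stub for the presenter's overview column.
def formatted_overview_col(gene)
  "overview for #{gene.name}"
end

# Always returns an Array of rows: zero, one, or many per gene.
def rows_from_object(gene)
  swissprot_names = Array(SWISSPROT_LOOKUP.fetch(gene.name, 'N/A'))
  formatted_overview = formatted_overview_col(gene)
  swissprot_names.map do |swissprot_name|
    next if swissprot_name == 'N/A' # becomes nil, dropped by compact

    [gene.name, swissprot_name, formatted_overview]
  end.compact
end

# The writer no longer needs to know whether a gene has 0, 1, or N names.
def write_tsv(io, headers, genes)
  io.puts(headers.join("\t"))
  genes.each do |gene|
    rows_from_object(gene).each { |row| io.puts(row.join("\t")) }
  end
end

genes = %w[GENE_A GENE_B GENE_C].map { |name| Gene.new(name) }
out = StringIO.new
write_tsv(out, %w[gene uniprot_name overview], genes)
puts out.string
```

In the real task the `genes.each` loop would presumably be `Gene.find_each`, and `io` would be the temp file the generator already opens.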
"UniprotMapping.tsv"
end

def self.formatted_overview_col(gene)
I'd get the counts in the following ways:
eid_count = EvidenceItem.joins(variant: [:gene]).where("evidence_items.status != 'rejected'").where(variant: {gene: gene}).distinct.count
variant_count = gene.variants.joins(:evidence_items).where("evidence_items.status != 'rejected'").distinct.count
assertion_count = gene.assertions.where("status != 'rejected'").distinct.count
You could also invert the logic and do something like this:
Assertion.joins(:gene).where("status != 'rejected'").where(gene: gene).distinct.count
depending on what's more clear to you.
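To make the intent of those queries concrete, here is a small in-memory sketch of the counting rules, not the ActiveRecord code itself. The point of `distinct` on the variant query is that a join produces one row per (variant, evidence item) pair, so a variant with two non-rejected evidence items would otherwise be counted twice. The Structs below are stand-ins for the real models, and the `accepted` and `submitted` status values are assumptions; only `rejected` comes from the queries above.

```ruby
# Stand-ins for the real models (assumptions, illustration only).
EvidenceItem = Struct.new(:id, :status)
Variant = Struct.new(:id, :evidence_items)

def non_rejected?(evidence_item)
  evidence_item.status != 'rejected'
end

# Mirrors the intended semantics of the two counts for one gene's variants:
# - evidence_items: each non-rejected evidence item counted once
# - variants: each variant counted once if it has at least one
#   non-rejected evidence item, no matter how many it has
def non_rejected_counts(variants)
  live_items = variants.flat_map(&:evidence_items).select { |ei| non_rejected?(ei) }
  {
    evidence_items: live_items.map(&:id).uniq.size,
    variants: variants.count { |v| v.evidence_items.any? { |ei| non_rejected?(ei) } }
  }
end

variants = [
  Variant.new(1, [EvidenceItem.new(10, 'accepted'), EvidenceItem.new(11, 'submitted')]),
  Variant.new(2, [EvidenceItem.new(12, 'rejected')]), # only rejected evidence
  Variant.new(3, [])                                  # no evidence at all
]

counts = non_rejected_counts(variants)
puts "evidence items: #{counts[:evidence_items]}, variants: #{counts[:variants]}"
```

Variant 1 contributes two evidence items but is counted as a single variant, while variants 2 and 3 contribute nothing to either count.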
Corrected spelling error in generate_tsvs.rb, created tsv generator and presenter, scheduled the task to run monthly, added functions to get uniprot names and created the uniprotmapping tsv. Need help getting the correct counts for variants and evidence items that are non-rejected.