
Create UniProt Mapping file. #691

Open
nairod2000 wants to merge 4 commits into griffithlab:staging from nairod2000:uniprotMapping


@nairod2000

Corrected a spelling error in generate_tsvs.rb, created a TSV generator and presenter, scheduled the task to run monthly, added functions to fetch UniProt names, and created the UniProt mapping TSV. I need help getting correct counts of non-rejected variants and evidence items.

@acoffman (Member) left a comment


Made some comments based on what we talked about on the call - let me know if you have any questions!

tmp_file = tmp_file(e.file_name)
tmp_file.puts(e.headers.join("\t"))

e.objects.find_each do |object|

You probably don't need the indirection of calling e.objects here; you can delete def self.objects from the presenter file. This can probably just be Gene.find_each.

e.objects.find_each do |object|

row = e.row_from_object(object)
if row[1].is_a?(Array)

I might change this to e.rows_from_object(), assume you always get back an Array of rows, and push the logic for handling multiple (or no) UniProt IDs down a level (see my other comment).
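Here is a rough sketch of the generator loop under that contract, assuming the presenter exposes headers and a rows_from_object that always returns an Array of rows (possibly empty). FakeGene and FakePresenter are hypothetical in-memory stand-ins so it runs without Rails; in the app, Gene.find_each would load ActiveRecord records in batches, and the rows would go to the tmp file instead of a StringIO.

```ruby
require "stringio"

# Hypothetical stand-in for the Gene model; the real one is an ActiveRecord
# class whose find_each yields records in batches.
FakeGene = Struct.new(:name, :uniprot_ids) do
  def self.find_each(&block)
    [
      new("GENE_A", ["ACC_A"]),
      new("GENE_B", []),                  # no UniProt ID -> no rows
      new("GENE_C", ["ACC_C1", "ACC_C2"]) # two IDs -> two rows
    ].each(&block)
  end
end

# Hypothetical presenter: rows_from_object always returns an Array of rows.
module FakePresenter
  def self.headers
    ["gene", "uniprot_id"]
  end

  def self.rows_from_object(gene)
    gene.uniprot_ids.map { |id| [gene.name, id] }
  end
end

# The generator never inspects row shape: iterate the model directly and
# write one line per row the presenter returns.
def write_tsv(presenter, model, io)
  io.puts(presenter.headers.join("\t"))
  model.find_each do |gene|
    presenter.rows_from_object(gene).each do |row|
      io.puts(row.join("\t"))
    end
  end
  io
end

out = write_tsv(FakePresenter, FakeGene, StringIO.new)
puts out.string
```

Because the presenter owns the "zero, one, or many IDs" decision, the generator has no is_a?(Array) branch left.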

]
end

def self.row_from_object(gene)

I'd rename this to rows_from_object() and do something along these lines (untested, just off the top of my head):

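def self.rows_from_object(gene)
  # Array() turns nil into [], wraps a single name, and leaves an Array as-is
  swissprot_names = Array(Scrapers::MyGeneInfo.get_swissprot_name(gene))
  formatted_overview = formatted_overview_col(gene)
  swissprot_names.map do |swissprot_name|
    # Map the 'N/A' placeholder to nil so compact drops it
    if swissprot_name == 'N/A'
      nil
    else
      [gene.name, swissprot_name, formatted_overview]
    end
  end.compact
end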

That way you get a list of rows for your TSV: compact removes the nils, and the code that actually writes the TSV can be a simple iteration over genes that calls this method and writes one row for each item it returns.
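To see how that shape handles the different lookup results, here is a self-contained sketch. SWISSPROT_LOOKUP is a hypothetical stub for Scrapers::MyGeneInfo.get_swissprot_name, and it assumes that method can return nil, a single name, an Array of names, or 'N/A'. The overview column is passed in as a plain string rather than calling formatted_overview_col.

```ruby
# Hypothetical stub for the scraper's return values (assumption).
SWISSPROT_LOOKUP = {
  "GENE_A" => "NAME_A",
  "GENE_B" => ["NAME_B1", "NAME_B2"],
  "GENE_C" => nil,
  "GENE_D" => "N/A"
}.freeze

StubGene = Struct.new(:name)

def rows_from_object(gene, overview)
  # Array() normalizes nil / String / Array into an Array
  swissprot_names = Array(SWISSPROT_LOOKUP[gene.name])
  swissprot_names.map do |swissprot_name|
    # 'N/A' becomes nil, which compact removes below
    swissprot_name == "N/A" ? nil : [gene.name, swissprot_name, overview]
  end.compact
end

%w[GENE_A GENE_B GENE_C GENE_D].each do |name|
  rows = rows_from_object(StubGene.new(name), "overview")
  puts "#{name}: #{rows.size} row(s)"
end
```

A gene with no usable name simply contributes no rows, so the writer doesn't need a special case for it.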

"UniprotMapping.tsv"
end

def self.formatted_overview_col(gene)

I'd get the counts in the following ways:

eid_count = EvidenceItem.joins(variant: [:gene]).where("evidence_items.status != 'rejected'").where(variant: {gene: gene}).distinct.count
variant_count = gene.variants.joins(:evidence_items).where("evidence_items.status != 'rejected'").distinct.count
assertion_count = gene.assertions.where("status != 'rejected'").distinct.count

You could also invert the logic and do something like this:

Assertion.joins(:gene).where("status != 'rejected'").where(gene: gene).distinct.count

depending on which is clearer to you.
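When checking the numbers these queries return, it may help to spell out what each one counts. Here is a plain-Ruby sketch over hypothetical in-memory records (FakeVariant, FakeEvidenceItem and FakeAssertion are stand-ins, not the app's models). Note that variant_count counts variants with at least one non-rejected evidence item. One difference from the SQL: status != 'rejected' is not true for rows whose status is NULL, so the queries skip those rows, whereas this Ruby version would count them.

```ruby
FakeEvidenceItem = Struct.new(:id, :status)
FakeVariant = Struct.new(:id, :evidence_items)
FakeAssertion = Struct.new(:id, :status)

def non_rejected?(record)
  record.status != "rejected"
end

def gene_counts(variants, assertions)
  live_eids = variants.flat_map(&:evidence_items).select { |e| non_rejected?(e) }
  {
    # distinct non-rejected evidence items across the gene's variants
    eid_count: live_eids.uniq(&:id).size,
    # distinct variants having at least one non-rejected evidence item
    variant_count: variants.uniq(&:id).count { |v| v.evidence_items.any? { |e| non_rejected?(e) } },
    # non-rejected assertions for the gene
    assertion_count: assertions.count { |a| non_rejected?(a) }
  }
end

variants = [
  FakeVariant.new(1, [FakeEvidenceItem.new(10, "accepted"), FakeEvidenceItem.new(11, "rejected")]),
  FakeVariant.new(2, [FakeEvidenceItem.new(12, "rejected")]),
  FakeVariant.new(3, [FakeEvidenceItem.new(13, "submitted"), FakeEvidenceItem.new(14, "accepted")])
]
assertions = [FakeAssertion.new(1, "accepted"), FakeAssertion.new(2, "rejected")]

counts = gene_counts(variants, assertions)
p counts # eid_count 3, variant_count 2, assertion_count 1
```

Variant 2 has only a rejected evidence item, so it is excluded from variant_count even though the variant itself exists.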
