fhdsl/ITN_course_search

ITN_course_search

An auto-generating searchable table for ITN courses. Course information is queried programmatically from GitHub and processed; the only exception is the additional modalities, which are curated manually.

About

ITN_course_search uses the GitHub API to gather the jhudsl and fhdsl organization repositories that we have worked on, specifically ITN courses. It renders the table in a markdown-readable format. This repo has workflows that trigger collection building and table rendering once a week.

The table only includes repositories that meet the following required criteria:

  1. Is within the jhudsl or fhdsl organizations
  2. Is a public repository
  3. Has a homepage listed
  4. Has a description listed
  5. Has “itn” and “course” as part of the repository tags (e.g., “itn-course”, or “itn” and “course”): str_detect(topics, "itn") & str_detect(topics, "course")
  6. Isn’t a template per the tags: !str_detect(topics, "template "). The pattern ends with a space because repos tagged “templates” should stay in when the repo is providing templates (e.g., the Overleaf/LaTeX Course)
  7. Has a repository tag for the launch date in the form launched-monYEAR (e.g., launched-aug2025 or launched-dec2023)
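
As a rough illustration of criteria 1-7, the sketch below filters a small made-up data frame. It uses base R (grepl) rather than stringr so that it runs without packages. The data frame, its column names, and the assumption that topics are stored as one space-separated string are all hypothetical and are not taken from query_collection.R.

```r
# Hypothetical repository metadata; the real workflow gets this from the GitHub API.
repos <- data.frame(
  name        = c("Example_Course", "Example_Template", "Templates_Course", "Untagged_Repo"),
  owner       = c("fhdsl", "jhudsl", "fhdsl", "fhdsl"),
  private     = c(FALSE, FALSE, FALSE, FALSE),
  homepage    = c("https://example.org/a", "https://example.org/b", "https://example.org/c", NA),
  description = c("An example course", "A course template", "A course about templates", "No tags"),
  # Assumption: topics are joined into one space-separated string.
  topics      = c("itn-course launched-aug2025 audience-researchers",
                  "itn-course template launched-dec2023",
                  "itn-course templates latex launched-jan2024",
                  "r data-science"),
  stringsAsFactors = FALSE
)

keep <- repos$owner %in% c("jhudsl", "fhdsl") &                      # 1. organization
  !repos$private &                                                   # 2. public
  !is.na(repos$homepage) & nzchar(repos$homepage) &                  # 3. homepage
  !is.na(repos$description) & nzchar(repos$description) &            # 4. description
  grepl("itn", repos$topics) & grepl("course", repos$topics) &       # 5. itn + course tags
  !grepl("template ", repos$topics, fixed = TRUE) &                  # 6. not a template
  grepl("launched-[a-z]{3}[0-9]{4}", repos$topics)                   # 7. launch date tag

kept_names <- repos$name[keep]
print(kept_names)
```

Note how the trailing space in "template " excludes the repo tagged template while keeping the one tagged templates.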

Shows the seven required criteria for including a course in the search table using the Overleaf course GitHub page as an example

Previewing changes in a PR? You’ll have to use the comprehensive download. The quick preview isn’t enabled since we removed the navbar from _site.yml.

Interested in adding a course to the table?

At the moment, to add a course to the table, either wait for the weekly workflow in this repo to fetch the collection data, or open a PR with a trivial change. Alternatively, you can manually trigger the Build Collection Action within the “Actions” tab on GitHub. This updates the collection, which then automatically triggers rendering of the table and deployment with GitHub Pages.

  • ☐ Make sure the above required criteria (1-7) are met (jhudsl/fhdsl organization, public, homepage, description, itn-course tag, isn’t a template, launch date tag).
  • ☐ Make sure the rest of the information for the table (e.g., title, access links/available course formats, etc.) is specified where and how the query procedure expects (explained below).

Shows the Actions tab on GitHub, specifically the Build Collection workflow and how to manually trigger the action to run on the main branch. First within the GitHub repository for the search table, navigate to the actions tab along the top. Then navigate to the build collection workflow along the left sidebar. Then click on the Run workflow dropdown menu and click on the Run workflow green button without adjusting any defaults.

Course title specification

Course title specification is within the course material files, usually the index.Rmd (or _quarto.yml for quarto book) file.

Rmarkdown courses specify the title in the index.Rmd file, in the area between the two lines of 3 dashes. The code looks for title followed by a colon at the beginning of the line to distinguish it from the subtitle.

Note that the query procedure looks for an index.Rmd file first, and if it doesn’t find one in the repository, it then assumes it could be a quarto course and looks for the _quarto.yml file next automatically.

A quarto course has the title specified in the _quarto.yml file. To extract it, the code replaces “title: ” (the word title, the colon, and the space) with nothing.
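
A minimal sketch of this kind of title extraction is shown below, in base R. The function name extract_title, the sample lines, the quote stripping, and the whitespace trimming for the indented _quarto.yml entry are assumptions for illustration; the actual code in query_collection.R may differ.

```r
# Hypothetical header lines from an index.Rmd file.
rmd_lines <- c(
  "---",
  "title: \"Example Course\"",
  "subtitle: \"An example subtitle\"",
  "---"
)

# Hypothetical lines from a _quarto.yml file (title indented under book:).
quarto_lines <- c(
  "project:",
  "  type: book",
  "book:",
  "  title: \"Example Quarto Course\""
)

extract_title <- function(lines) {
  # "^title:" only matches lines that start with "title:", so "subtitle:" is skipped.
  title_line <- grep("^title:", lines, value = TRUE)[1]
  # Replace the "title: " prefix with nothing.
  title <- sub("^title: ", "", title_line)
  # Assumption: also drop surrounding quotes if present.
  gsub("^[\"']|[\"']$", "", title)
}

rmd_title <- extract_title(rmd_lines)
# Assumption: trim leading whitespace so the indented YAML key starts the line.
quarto_title <- extract_title(trimws(quarto_lines))
print(c(rmd_title, quarto_title))
```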

Available course formats specification

Available course formats specification is within the course material files, usually the index.Rmd (or index.qmd for quarto book) file.

  • ☐ Verify that available course formats are listed in index.Rmd (or index.qmd for quarto book) file, specifically for Coursera and Leanpub if applicable. (The GitHub source material and GitHub pages homepage information for the course table will be taken from the API call rather than this information, but it’s good practice to have both included within this chunk too)

Available Course Formats are listed in the index.Rmd file for the Overleaf course as an example for Rmarkdown courses

Note that the query procedure looks for an index.Rmd file first, and if it doesn’t find one in the repository, it then assumes it could be a quarto course and looks for the index.qmd file next automatically.

Available Course Formats are listed in the index.qmd file for the example quarto book from the Containers for Scientists example course

What the code is looking for exactly

The get_linkOI() function finds the line(s) containing the course format “pattern” (e.g., “Coursera” or “Leanpub”), extracts all URLs from those lines, and then subsets to the relevant URL if needed (again using the “pattern”). Because of this, the available course formats do not need to be in an ordered, bulleted, or enumerated list; they could all be mentioned in a notice box or paragraph. As long as the line with the link says “Coursera” or “Leanpub”, this process will find and extract the relevant links.
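
The three steps can be sketched roughly as follows. This is not the get_linkOI() implementation; find_format_link is a hypothetical stand-in written in base R, and the URL regular expression and example links are assumptions.

```r
find_format_link <- function(lines, pattern) {
  # 1. Keep only the lines that mention the format name.
  hits <- grep(pattern, lines, value = TRUE)
  # 2. Extract every URL from those lines.
  url_regex <- "https?://[^\\s()<>\"'\\[\\]]+"
  urls <- unlist(regmatches(hits, gregexpr(url_regex, hits, perl = TRUE)))
  # 3. If several URLs were found, subset to the ones that match the pattern.
  if (length(urls) > 1) {
    urls <- grep(pattern, urls, ignore.case = TRUE, value = TRUE)
  }
  if (length(urls) > 0) urls[1] else NA_character_
}

# Hypothetical line from an index.Rmd file: a paragraph, not a list.
lines <- c(
  "# About this course",
  "Available on [Coursera](https://www.coursera.org/learn/example) and [Leanpub](https://leanpub.com/c/example)."
)

coursera_link <- find_format_link(lines, "Coursera")
leanpub_link  <- find_format_link(lines, "Leanpub")
missing_link  <- find_format_link(lines, "Udemy")
print(c(coursera_link, leanpub_link, missing_link))
```

Both links sit on the same line of a paragraph here, which is why the final subsetting step is needed.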

Available course formats do not need to be listed or enumerated in a bulleted or ordered list in order for the process to extract the information as seen with the NIH for Data Sharing course

Necessary background information and learning objectives specification

Necessary background information and learning objectives are typically specified as images (Google Slides grabbed with ottrpal::include_slide() function) within the course material files, usually the 01-intro.Rmd (or 01-intro.qmd for quarto book) file. The code blocks where these are specified need to have specific identifiers.

  • ☐ Verify that the 01-intro.Rmd (or 01-intro.qmd file for quarto courses, ex: Containers for Scientists) file has identifiers for code blocks grabbing relevant google slide images.
    • LOs: learning_objectives
    • Audience: for_individuals_who
    • Topics covered: topics_covered
    • Pre-reqs (if applicable): prereqs
What the code is looking for exactly

The code in the get_slide_info() function within the query_collection.R file looks for these code block identifiers exactly.
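
To illustrate why the identifiers have to match exactly, here is a small base R sketch that locates a chunk by its label and pulls out the slide URL. The function get_slide_url, the sample chunk, and the placeholder URL are hypothetical; this is not the body of get_slide_info().

```r
fence <- strrep("`", 3)  # three backticks, built here to keep this example readable

# Hypothetical lines from a 01-intro.Rmd file.
intro_lines <- c(
  "# Introduction",
  paste0(fence, "{r learning_objectives, echo = FALSE}"),
  "ottrpal::include_slide(\"https://docs.google.com/presentation/d/EXAMPLE_ID/edit#slide=id.EXAMPLE\")",
  fence
)

get_slide_url <- function(lines, label) {
  # Find the chunk header that carries the exact label.
  start <- grep(paste0("^", fence, "\\{r ", label, "\\b"), lines, perl = TRUE)[1]
  if (is.na(start)) return(NA_character_)
  # Find the closing fence after that header (or fall back to the last line).
  ends <- grep(paste0("^", fence, "\\s*$"), lines, perl = TRUE)
  end <- ends[ends > start][1]
  if (is.na(end)) end <- length(lines)
  # Pull the first URL found inside the chunk.
  chunk <- lines[start:end]
  urls <- regmatches(chunk, regexpr("https?://[^\"')]+", chunk))
  if (length(urls) > 0) urls[1] else NA_character_
}

lo_url     <- get_slide_url(intro_lines, "learning_objectives")
prereq_url <- get_slide_url(intro_lines, "prereqs")
print(c(lo_url, prereq_url))
```

A chunk labeled anything other than learning_objectives would not be found, so the result would be NA, as it is for prereqs in this example.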

Note that the query procedure first checks the course name to make sure it’s not part of a special set of courses that don’t follow the usual convention/location for this information. If the course isn’t in that list, the procedure checks for the 01-intro.Rmd file first and if that file isn’t found it assumes it could be a quarto course and looks for 01-intro.qmd automatically next.
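
The lookup order described above might be sketched like this. The function pick_intro_file, the special_courses vector, and the repo names in it are hypothetical; the real procedure queries the GitHub repository rather than a local vector of file names.

```r
# Hypothetical names for courses that keep this information elsewhere.
special_courses <- c("Special_Course_A", "Special_Course_B")

pick_intro_file <- function(course, repo_files) {
  # Special courses are handled by their own logic, so no default file applies.
  if (course %in% special_courses) return(NA_character_)
  # Otherwise try the Rmarkdown file first, then the quarto file.
  candidates <- c("01-intro.Rmd", "01-intro.qmd")
  found <- candidates[candidates %in% repo_files]
  if (length(found) > 0) found[1] else NA_character_
}

rmd_choice     <- pick_intro_file("Regular_Course", c("index.Rmd", "01-intro.Rmd"))
quarto_choice  <- pick_intro_file("Quarto_Course", c("index.qmd", "01-intro.qmd", "_quarto.yml"))
special_choice <- pick_intro_file("Special_Course_A", c("index.Rmd", "01-intro.Rmd"))
print(c(rmd_choice, quarto_choice, special_choice))
```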

Example of learning objectives, audience, and topics covered code blocks from the Overleaf Course

Example of a prereqs code block from the GitHub Automation for Scientists Course

If a different file contains the information, you’ll need to edit the get_slide_info() function within the query_collection.R file.

Examples where these blocks aren’t in the expected files include:
  • AI for Efficient Programming: They’re in index.Rmd instead.
  • NIH for Data Sharing:
    • LOs are in their own file, Learning_objectives.Rmd, instead (the chunk is commented out, but still accessible to this table building).
    • Audience and Topics covered are in index.Rmd
  • AI for Decision Makers: They’re in 4 different files on 4 different branches – one for each sub-course.

If your course’s introductory material isn’t located in the expected 01-intro.Rmd or 01-intro.qmd file, add the course name here within query_collection.R, and then add the check of the alternative file(s) for your course within this else, following the example of AI for Efficient Programming and NIH for Data Sharing. The exception is a course that has sub-courses within the main repo (which aren’t in the table and will need new rows) or information on branches; in that case, follow the example used for AI for Decision Makers and use a similar special function.

Audience specification

Audience specification is not within the course material files but instead is within the repository settings.

  • ☐ Add repo tags for audience (at least one of the following):
    • audience-software-developers
    • audience-researchers
    • audience-leaders

Audience tags are specified in the repository settings and can be any combination of the choices

What the code is looking for exactly

The code in the query_collection.R file that traverses the pages of the GitHub API request results looks for these tags exactly.

Category specification

(Note that the table does not currently display categories)

Category specification is not within the course material files but instead is within the repository settings.

  • ☐ Add repo tags for category (only one of the following):
    • category-best-practices
    • category-software-dev
    • category-fundamentals-tools-resources
    • category-hands-on-practice

Category tags are specified in the repository settings and should be only one of the choices.

What the code is looking for exactly

The code in the query_collection.R file that traverses the pages of the GitHub API request results either looks for these tags exactly (which is the case for category-software-dev and category-best-practices), or it looks for the prefix (category-fundamentals- rather than the full category-fundamentals-tools-resources and category-hands-on- rather than the full category-hands-on-practice).
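
A sketch of this exact-versus-prefix matching, in base R, is given below. The function get_category and its return values are hypothetical and only illustrate the matching rules; the real logic lives in query_collection.R.

```r
get_category <- function(tags) {
  # Exact matches for the two short category tags.
  if ("category-software-dev" %in% tags) return("category-software-dev")
  if ("category-best-practices" %in% tags) return("category-best-practices")
  # Prefix matches for the two long category tags.
  if (any(startsWith(tags, "category-fundamentals-"))) {
    return("category-fundamentals-tools-resources")
  }
  if (any(startsWith(tags, "category-hands-on-"))) {
    return("category-hands-on-practice")
  }
  NA_character_
}

exact_match  <- get_category(c("itn-course", "category-software-dev"))
prefix_match <- get_category(c("itn-course", "category-hands-on-practice"))
no_match     <- get_category(c("itn-course", "audience-researchers"))
print(c(exact_match, prefix_match, no_match))
```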

Funding

All courses with an itn-course or just itn tag are assumed to be ITN funded. Add a hutch-course topic tag to the repo if it was Hutch funded and should include Hutch branding as well.

Cleaning up topic tags for display in the table

The query_collection.R file removes some topic tags but generally does NOT clean the topic tags data. Removed tags include data4all and reproducible-research (dasl course categorization tags), as well as the tags that provide information on funding (tags ending in course, matched with course$), audience, category, and launch date. We also remove the tag that is exactly “reproducibility”, since it is redundant information. The relevant step is mutate(across(starts_with("topics_"), ~replace(., str_detect(., "audience-|category-|course$|launched-|data4all|reproducible-research|^reproducibility$"), NA))).
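
To see what that pattern removes, the sketch below applies the same regular expression to a made-up vector of tags, using base R grepl in place of str_detect. The example tags are hypothetical.

```r
drop_pattern <- "audience-|category-|course$|launched-|data4all|reproducible-research|^reproducibility$"

# Hypothetical topic tags for one repository.
tags <- c("itn-course", "hutch-course", "audience-researchers", "launched-aug2025",
          "data4all", "reproducibility", "reproducibility-tools", "latex")

# Replace matching tags with NA, as the mutate() step does per topics_ column.
cleaned <- replace(tags, grepl(drop_pattern, tags), NA)
remaining <- cleaned[!is.na(cleaned)]
print(remaining)
```

The anchors matter: ^reproducibility$ removes only the bare tag, so a longer tag such as reproducibility-tools is kept.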

Very minimal cleaning is done within the prep_table() function within the format-tables.R script. This minimal cleaning includes (1) inserting a line break and a bullet point in place of every semicolon (which separates the topic tags in the collection following querying) and (2) replacing hyphens with a space. Special cases or substitutions of cleaning are handled within index.Rmd of this ITN_course_search repo, specifically in the wrangle_data code chunk.

Within that chunk …

  1. Use title case on the concepts with the str_to_title() function because the repo topic tags are all lower case.
  2. Ai –> AI (for the AI for Efficient Programming and AI for Decision Makers courses)
  3. Ci-Cd –> Continuous Integration/Continuous Deployment (for the Containers for Scientists Course)
  4. Nih –> NIH (for the Data Management and Sharing for NIH Proposals course)
  5. Hipaa –> HIPAA (for the Ethical Data Handling for Cancer Research course)
  6. Llm –> LLM (for the AI for Efficient Programming course)
  7. Phi –> (PHI) (for the Ethical Data Handling for Cancer Research course)
  8. Pii –> (PII) (for the Ethical Data Handling for Cancer Research course)
  9. Arxiv –> ArXiv (for the Overleaf and LaTeX course)
  10. Latex –> LaTeX (for the Overleaf and LaTeX course)
  11. And –> & (space saving, used for the Choosing Genomics Tools course)
  12. Dms –> DMS (for Data Management and Sharing for NIH Proposals course)

Add any additional specific changes to the topic tags for cleaning within that chunk going forward.
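
The title-casing and substitution steps could look roughly like the base R sketch below. It is not the wrangle_data chunk itself: the function clean_tags, the gsub-based title casing (standing in for str_to_title()), and the word-boundary patterns are assumptions, and only a few of the substitutions are shown.

```r
# A few of the substitutions listed above, applied after title casing.
substitutions <- c(
  "Ai"    = "AI",
  "Ci-Cd" = "Continuous Integration/Continuous Deployment",
  "Nih"   = "NIH",
  "Hipaa" = "HIPAA",
  "Latex" = "LaTeX"
)

clean_tags <- function(tags) {
  # Title case: upper-case the first letter of every word.
  out <- gsub("\\b([a-z])", "\\U\\1", tags, perl = TRUE)
  # Apply each special-case substitution to whole words only.
  for (from in names(substitutions)) {
    out <- gsub(paste0("\\b", from, "\\b"), substitutions[[from]], out, perl = TRUE)
  }
  out
}

cleaned_tags <- clean_tags(c("ai", "ci-cd", "latex", "hipaa", "genomics"))
print(cleaned_tags)
```

A new special case would be one more entry in the substitutions vector in this sketch.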

Additional Modalities for material

This is the only part of the table that is manually curated with information outside of GitHub that we can query or extract. The modalities data is added to the rest of the queried collection in the join_additional_modalities chunk of index.Rmd.

  • ☐ Add any additional modalities that are relevant to the course material in the relevant Google sheet following the standard/expected order of information below for the first three columns. Note that courses can have multiple additional resources. And if an additional resource is relevant for two or more courses, it will need to be listed separately (as a different row) for each of those courses.
    • Course/GitHub Repo name for the course in the first column
    • the description of the modality (this is what will be displayed under the icon) in the second column
    • a link to the resource in the third column

Supported modalities currently include

  • Videos (link contains "youtu.be")
  • Podcasts (link contains "buzzsprout")
  • Cheatsheets (link contains "cheatsheets")

Additional modalities that will soon be supported include

  • Soundbite (link contains "sciencecast")
  • Publication (link contains "doi|articles")
  • Workshop Materials (link contains "hutchdatascience|docs.google")
  • Data Resources (table) (link contains "dataResource") (this one has to be listed/looked for first in the case_when() because both the data resources and the computing resources have "computing_resources" in the URL)
  • Computing Resources (table) (link contains "computing_resources")
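
The ordering constraint can be sketched with a first-match-wins lookup, which is how case_when() behaves. The function classify_modality and the example links are hypothetical, and apart from checking "dataResource" before "computing_resources", the order of the rules here is an assumption.

```r
# Ordered rules: the first matching pattern wins.
modality_rules <- c(
  "Data Resources"      = "dataResource",
  "Computing Resources" = "computing_resources",
  "Videos"              = "youtu.be",
  "Podcasts"            = "buzzsprout",
  "Cheatsheets"         = "cheatsheets",
  "Soundbite"           = "sciencecast",
  "Publication"         = "doi|articles",
  "Workshop Materials"  = "hutchdatascience|docs.google"
)

classify_modality <- function(link) {
  for (type in names(modality_rules)) {
    if (grepl(modality_rules[[type]], link)) return(type)
  }
  NA_character_  # no supported modality matched
}

# Hypothetical links.
links <- c(
  "https://youtu.be/EXAMPLE",
  "https://example.org/computing_resources/dataResource.html",
  "https://example.org/computing_resources/index.html",
  "https://example.org/unknown"
)
modality_types <- vapply(links, classify_modality, character(1), USE.NAMES = FALSE)
print(modality_types)
```

If the two resource rules were swapped, the second link would be labeled Computing Resources instead of Data Resources.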

Note, each modality link will have both an icon and a type associated with it. This renders to be an icon that links out to the specific resource with the displayed name/description from the second column and the modality type specified at the end of that description. Look at how modality_constructed_link is built within the second mutate statement in the add_modalities() function within the format-tables.R file.

Adding icons

Audience (column is BroadAudience), course category (column is Category), and funding (column is Funding) information is given icons while the tables are built. This is done within the prep_table() function of the format-tables.R file. If you are editing or adding a category to any of these, you will need to update those mutate steps there.

The additional modalities (“More Resources” in the final table) are also given icons while the table is built. However, this is done within the add_modalities() function of the format-tables.R file. If you are editing or adding a category to the modalities, you will need to update those mutate steps there. The first set of case_when() statements sets the icon used for each type of modality, and the second set gives the one- or two-word modality type description used to label that icon.

Important files

  • scripts/query_collection.R: gathers information (audience, funding, topics, etc.) about ITN courses from their GitHub repos
  • resources/collection.tsv: where the collection from query_collection.R is stored.
  • scripts/format-tables.R: functions to wrangle course data and format course table
  • index.Rmd: drives building each course specific html page and the overall course table
  • chunks/*.Rmd or chunks/*.md: chunks that we’ll borrow using ottrpal::borrow_chapter (from the base_ottr:dev container specified in config_automation.yml) and fill in {SPECIFIC INFO} for each course (following the example of our cheatsheets repo). Because of this approach, a chunk will only inherit specific information if we pass it as a tag replacement. In other words, not every piece of information in each row/about a specific course will be available to the chunks, only the information we specify as a tag replacement.
    • about: aboutCourse.md with “{COURSE_DESCRIPTION}”, “{COURSE_CATEGORY}”, and “{COURSE_LAUNCH}” to be provided/replaced
    • audience: audienceCourse.Rmd with “{FOR_SLIDE_LINK}” and “{COURSE_AUDIENCE}” to be provided/replaced
    • format: formatFullCourse.Rmd with “{BOOKDOWN_LINK}”, “{GITHUB_LINK}”, “{COURSERA_LINK}”, and “{LEANPUB_LINK}” to be filled in
    • funding: fundingFullCourse.Rmd with “{hutch_funded}” to be filled in
    • LOs: loCourse.Rmd with “{LO_SLIDE_LINK}” to be provided/replaced
    • concepts discussed: conceptsCourse.Rmd with “{CONCEPTS_SLIDE_LINK}” tag to be provided/replaced
    • pre-requisites: prereqsCourse.Rmd with “{PREREQ_SLIDE_LINK}” and “{GITHUB_LINK}” tags to be provided/replaced. If there are pre-requisites for a course, and you want to add a direct link to them, look at or add to the conditionals in this particular .Rmd
  • *_template.Rmd: the template for driving course specific pages.
    • single_course_template.Rmd: layout for building general course pages
    • ai_course_template.Rmd: layout for AI for Decision Makers course page
  • *_coursePage.html: the output course specific html pages
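
The tag replacement idea can be illustrated with plain string substitution, as in the base R sketch below. This is not how ottrpal::borrow_chapter is implemented; fill_tags, the template text, and the replacement values are hypothetical.

```r
fill_tags <- function(template, tag_replacement) {
  # Swap each {TAG} placeholder for its value; fixed = TRUE treats braces literally.
  for (tag in names(tag_replacement)) {
    template <- gsub(tag, tag_replacement[[tag]], template, fixed = TRUE)
  }
  template
}

# Hypothetical chunk text with three tags that are passed and one that is not.
about_template <- "{COURSE_DESCRIPTION} Category: {COURSE_CATEGORY}. Launched: {COURSE_LAUNCH}. {NOT_PASSED}"

filled <- fill_tags(about_template, list(
  "{COURSE_DESCRIPTION}" = "An example course.",
  "{COURSE_CATEGORY}"    = "Best Practices",
  "{COURSE_LAUNCH}"      = "Aug 2025"
))
print(filled)
```

The {NOT_PASSED} placeholder is left untouched, which mirrors the point above: a chunk only receives the information that is explicitly passed as a tag replacement.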
