Conversation
|
@dcshzj this seems like a one-off script for NDF and no need to be merged so i won't review - do correct me if im wrong! |
Hmm conflicted, I created this PR cos we will run this for at least 6 times - July to December 2026, but also a bit pointless to review honestly. Can review if got time! Super not urgent. |
|
This pull request has been stale for more than 30 days! Could someone please take a look at it @opengovsg/isomer-engineers |
There was a problem hiding this comment.
Pull request overview
This PR adds automated migration scripts for the National Drug Formulary (NDF), introducing a new scripts package to handle recurring automation work for migrating NDF collections.
Key Changes:
- Creates a new
@isomer/scriptspackage with TypeScript configuration and dependencies for HTML/CSV processing - Implements migration scripts for NDF active ingredient and product information collections
- Integrates TipTap extensions and jsdom for HTML content transformation
Reviewed changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tooling/scripts/tsconfig.json | TypeScript configuration for the scripts package |
| tooling/scripts/package.json | Package definition with dependencies for HTML/CSV processing |
| tooling/scripts/ndf-collection/config.ts | Configuration file containing CSV file paths |
| tooling/scripts/ndf-collection/utils.ts | Utility functions for CSV parsing, HTML processing, and data transformation |
| tooling/scripts/ndf-collection/template.ts | Template generators for monograph and product information pages |
| tooling/scripts/ndf-collection/index.ts | Entry point with interactive CLI for selecting migration type |
| tooling/scripts/ndf-collection/getHtmlAsJson.ts | HTML to JSON converter using TipTap and jsdom |
| tooling/scripts/ndf-collection/createProductInformationCollection.ts | Script to migrate product information from CSV to JSON |
| tooling/scripts/ndf-collection/createActiveIngredientCollection.ts | Script to migrate active ingredient data from CSV to JSON |
| package-lock.json | Dependency lock file with version updates and new package entries |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // eslint-disable-next-line @typescript-eslint/no-non-null-assertion | ||
| const permalink = entry["Monograph ID"]!; |
There was a problem hiding this comment.
The non-null assertion operator is used here without validation. If "Monograph ID" is missing from the CSV data, this will cause a runtime error. Consider adding validation to ensure the field exists before using it, or handle the undefined case explicitly with a meaningful error message.
| // eslint-disable-next-line @typescript-eslint/no-non-null-assertion | |
| const permalink = entry["Monograph ID"]!; | |
| const permalink = entry["Monograph ID"]; | |
| if (!permalink) { | |
| // Skip entries without a valid Monograph ID to avoid runtime errors | |
| continue; | |
| } |
| { | ||
| text: "Subsidised brands of vaccines", | ||
| type: "text", | ||
| marks: [ |
There was a problem hiding this comment.
The non-null assertion operator is used without validation. While the array access with optional chaining ensures the array element exists, the assertion assumes the "Guidance Recommendations_" field will always be present at that index. Consider using optional chaining (?.) instead or providing a default empty string to handle cases where the field might be missing.
| "description": "", | ||
| "main": "index.ts", | ||
| "scripts": { | ||
| "ndf": "dotenv tsx ndf-collection/index.ts" |
There was a problem hiding this comment.
The script name "ndf" is not descriptive and could be confused with other commands. Consider using a more specific name like "ndf-migrate" or "migrate-ndf" to clearly indicate this is a migration script for NDF collections.
| "ndf": "dotenv tsx ndf-collection/index.ts" | |
| "ndf-migrate": "dotenv tsx ndf-collection/index.ts" |
| ); | ||
| const generalAvailability = getSimilarKeysAsArray( | ||
| entry, | ||
| "General Availability in Public Healthcare Institutions_" |
There was a problem hiding this comment.
There's a typo in the key name: "General Availability in Public Healthcare Institutions_" contains two spaces between "in" and "Public". This should be "General Availability in Public Healthcare Institutions_" with a single space. This typo could cause the function to not find matching keys in the CSV data if the actual column name has only one space.
| "General Availability in Public Healthcare Institutions_" | |
| "General Availability in Public Healthcare Institutions_" |
| // eslint-disable-next-line @typescript-eslint/no-non-null-assertion | ||
| const permalink = entry["Licence number (SIN number)"]!; |
There was a problem hiding this comment.
The non-null assertion operator is used here without proper validation. If "Licence number (SIN number)" is not present in the CSV data, this will cause a runtime error. Consider adding validation to ensure the field exists and is not empty before using it, or handle the undefined case explicitly.
| // eslint-disable-next-line @typescript-eslint/no-non-null-assertion | |
| const permalink = entry["Licence number (SIN number)"]!; | |
| const licenceNumberField = entry["Licence number (SIN number)"]; | |
| if (!licenceNumberField) { | |
| console.warn( | |
| 'Skipping entry without "Licence number (SIN number)" field:', | |
| entry | |
| ); | |
| continue; | |
| } | |
| const permalink = licenceNumberField; |
| const dom = new JSDOM( | ||
| `<html> | ||
| <div class="element"></div> | ||
| </html>` | ||
| ); | ||
| const window = dom.window; | ||
| const document = window.document; | ||
| global.document = document; | ||
| global.window = window as any; |
There was a problem hiding this comment.
Setting global.document and global.window in this way can have unintended side effects if this module is imported in other contexts. This approach mutates the global environment which can cause issues in testing or when the code runs in different environments. Consider:
- Using a more isolated approach with jsdom instances
- Documenting this side effect clearly
- Ensuring this module is only imported when needed for HTML parsing
| "/Users/zhongjun/Downloads/general-monograph.csv"; | ||
|
|
||
| // This is the CSV file for the collection of all product information | ||
| export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH = | ||
| "/Users/zhongjun/Downloads/product-information.csv"; |
There was a problem hiding this comment.
The file paths are hardcoded to a specific user's local machine ("/Users/zhongjun/Downloads/"). These should be removed or replaced with environment variables or relative paths to avoid exposing personal information and to make the code portable across different development environments.
| "/Users/zhongjun/Downloads/general-monograph.csv"; | |
| // This is the CSV file for the collection of all product information | |
| export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH = | |
| "/Users/zhongjun/Downloads/product-information.csv"; | |
| process.env.NDF_GENERAL_MONOGRAPH_CSV_FILEPATH ?? "general-monograph.csv"; | |
| // This is the CSV file for the collection of all product information | |
| export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH = | |
| process.env.NDF_PRODUCT_INFORMATION_CSV_FILEPATH ?? "product-information.csv"; |
| export const decodeHtmlEntities = (str: string) => { | ||
| const txt = document.createElement("textarea"); | ||
| txt.innerHTML = str; | ||
| return txt.value; | ||
| }; |
There was a problem hiding this comment.
The decodeHtmlEntities function uses the global document object to decode HTML entities. This approach is problematic because:
- It relies on a browser-specific API (document.createElement) that may not work consistently in all environments
- The global document is being set up using jsdom in getHtmlAsJson.ts, but this function is also used in other files that may not have that setup
- Using innerHTML to decode entities can be a security risk if the input is not trusted
Consider using a dedicated HTML entity decoding library or ensuring that jsdom is properly initialized before this function is called.
| // if (permalink === "SIN16609P") { | ||
| // break; | ||
| // } |
There was a problem hiding this comment.
The commented-out code creates ambiguity about whether early termination is intended for testing or production use. If this was used for debugging, it should be removed. If it's intended as a feature for testing single entries, consider replacing it with a proper command-line flag or environment variable (e.g., TEST_MODE=true or --limit=1).
| // if (permalink === "SIN16609P") { | |
| // break; | |
| // } |
|
This pull request has been stale for more than 30 days! Could someone please take a look at it @opengovsg/isomer-engineers |
adriangohjw
left a comment
There was a problem hiding this comment.
TBH didn't really review very deep, skimmed through it
will be helpful if can include a README, or sample CSVs for the inputs for reference, in case someone needs to run this when you are not around
but non-blocker and good-to-have, so approving to clean up the PRs
thanks!
| export const NDF_GENERAL_MONOGRAPH_CSV_FILEPATH = | ||
| "/Users/zhongjun/Downloads/general-monograph.csv"; | ||
|
|
||
| // This is the CSV file for the collection of all product information | ||
| export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH = | ||
| "/Users/zhongjun/Downloads/product-information.csv"; | ||
|
|
||
| // This is the CSV file for the list of Pharmacological Classifications | ||
| export const NDF_PHARMACOLOGICAL_CLASSIFICATIONS_CSV_FILEPATH = | ||
| "/Users/zhongjun/Downloads/pharmacological-classifications.csv"; |
There was a problem hiding this comment.
can update to relative path instead? in case someone else needs to run this script. thanks!
Problem
NDF is some special site that requires a significant automation work.
Solution
Breaking Changes
Features: