feat: add automated migration scripts for NDF #1725

Open
dcshzj wants to merge 15 commits into main from feat/automated-migration-scripts

Conversation

@dcshzj
Contributor

@dcshzj dcshzj commented Nov 13, 2025

Problem

NDF is a special site that requires a significant amount of automation work.

Solution

Breaking Changes

  • Yes - this PR contains breaking changes
  • No - this PR is backwards compatible

Features:

  • Add a new scripts package to handle all recurring automation work.
  • Add a script for migrating the NDF collection.

@dcshzj dcshzj requested a review from a team as a code owner November 13, 2025 08:21
@socket-security

socket-security bot commented Nov 13, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Updated | @vitest/coverage-istanbul@2.1.2 ⏵ 2.1.9 | 99 | 100 | 67 -1 | 99 | 100 |
| Updated | msw-storybook-addon@2.0.4 ⏵ 2.0.6 | 100 +1 | 100 | 69 +1 | 90 +4 | 100 |
| Updated | @tiptap/extension-text@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 69 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-document@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 69 +3 | 100 +1 | 100 |
| Updated | @chakra-ui/theme-tools@2.2.7 ⏵ 2.2.9 | 100 +1 | 100 | 69 +3 | 83 -2 | 100 |
| Updated | @tanstack/react-query-devtools@5.85.3 ⏵ 5.90.2 | 100 +1 | 100 | 71 +3 | 93 -4 | 100 |
| Updated | tailwindcss-react-aria-components@1.1.4 ⏵ 1.2.0 | 100 +1 | 100 | 71 +3 | 96 +1 | 100 |
| Updated | @tiptap/extension-table-row@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 71 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-dropcursor@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 71 +3 | 100 +1 | 100 |
| Updated | @chakra-ui/utils@2.2.3 ⏵ 2.2.5 | 100 +1 | 100 | 71 +3 | 83 -2 | 100 |
| Updated | @tiptap/extension-gapcursor@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 72 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-list-item@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 72 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-paragraph@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 72 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-table-header@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 72 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-table-cell@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 72 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-history@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-underline@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tanstack/react-table@8.21.2 ⏵ 8.21.3 | 100 +1 | 100 | 73 +1 | 84 | 100 |
| Updated | @tiptap/extension-subscript@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-superscript@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-blockquote@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @next/eslint-plugin-next@14.2.13 ⏵ 14.2.33 | 100 | 100 | 73 | 99 | 100 |
| Updated | @babel/preset-typescript@7.27.1 ⏵ 7.28.5 | 100 | 100 | 73 | 94 +1 | 100 |
| Updated | @tiptap/extension-hard-break@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tiptap/html@2.25.0 ⏵ 2.27.1 | 100 | 100 | 73 | 99 +1 | 100 |
| Updated | @types/pg@8.15.5 ⏵ 8.15.6 | 100 +1 | 100 | 73 +1 | 89 +2 | 100 |
| Updated | @tiptap/extension-bullet-list@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-heading@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | @tiptap/extension-strike@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 73 +3 | 100 +1 | 100 |
| Updated | isomorphic-dompurify@2.25.0 ⏵ 2.31.0 | 100 | 100 | 74 -2 | 94 +4 | 100 |
| Updated | @tiptap/extension-horizontal-rule@2.22.3 ⏵ 2.27.1 | 100 +1 | 100 | 74 +3 | 100 +1 | 100 |
| Updated | typescript-eslint@8.33.1 ⏵ 8.46.4 | 100 | 100 | 74 +1 | 98 +1 | 100 |

See 87 more rows in the dashboard

View full report

@adriangohjw
Contributor

@dcshzj this seems like a one-off script for NDF and no need to be merged so i won't review - do correct me if im wrong!

@dcshzj
Contributor Author

dcshzj commented Nov 17, 2025

> @dcshzj this seems like a one-off script for NDF and no need to be merged so i won't review - do correct me if im wrong!

Hmm, conflicted. I created this PR cos we will run this at least 6 times (July to December 2026), but it's also a bit pointless to review honestly. Can review if got time! Super not urgent.

@mergify
Contributor

mergify bot commented Dec 17, 2025

This pull request has been stale for more than 30 days! Could someone please take a look at it @opengovsg/isomer-engineers

Copilot AI review requested due to automatic review settings January 5, 2026 09:35
Contributor

Copilot AI left a comment

Pull request overview

This PR adds automated migration scripts for the National Drug Formulary (NDF), introducing a new scripts package to handle recurring automation work for migrating NDF collections.

Key Changes:

  • Creates a new @isomer/scripts package with TypeScript configuration and dependencies for HTML/CSV processing
  • Implements migration scripts for NDF active ingredient and product information collections
  • Integrates TipTap extensions and jsdom for HTML content transformation

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tooling/scripts/tsconfig.json | TypeScript configuration for the scripts package |
| tooling/scripts/package.json | Package definition with dependencies for HTML/CSV processing |
| tooling/scripts/ndf-collection/config.ts | Configuration file containing CSV file paths |
| tooling/scripts/ndf-collection/utils.ts | Utility functions for CSV parsing, HTML processing, and data transformation |
| tooling/scripts/ndf-collection/template.ts | Template generators for monograph and product information pages |
| tooling/scripts/ndf-collection/index.ts | Entry point with interactive CLI for selecting migration type |
| tooling/scripts/ndf-collection/getHtmlAsJson.ts | HTML to JSON converter using TipTap and jsdom |
| tooling/scripts/ndf-collection/createProductInformationCollection.ts | Script to migrate product information from CSV to JSON |
| tooling/scripts/ndf-collection/createActiveIngredientCollection.ts | Script to migrate active ingredient data from CSV to JSON |
| package-lock.json | Dependency lock file with version updates and new package entries |

Comment on lines +55 to +56
// eslint-disable-next-line @typescript-eslint/no-non-null-assertion
const permalink = entry["Monograph ID"]!;

Copilot AI Jan 5, 2026

The non-null assertion operator is used here without validation. If "Monograph ID" is missing from the CSV data, this will cause a runtime error. Consider adding validation to ensure the field exists before using it, or handle the undefined case explicitly with a meaningful error message.

Suggested change
- // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
- const permalink = entry["Monograph ID"]!;
+ const permalink = entry["Monograph ID"];
+ if (!permalink) {
+   // Skip entries without a valid Monograph ID to avoid runtime errors
+   continue;
+ }

Copilot uses AI. Check for mistakes.
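A minimal sketch of the validation asked for above, assuming each CSV row is parsed into a record whose columns may be missing. `CsvRow`, `requireField`, and `collectPermalinks` are hypothetical names used for illustration and are not part of the PR. Skipped rows are recorded by index so they can be reported rather than silently dropped.

```typescript
// Hypothetical row shape: a parsed CSV row where any column may be missing.
type CsvRow = Record<string, string | undefined>;

// Returns the trimmed value of `key`, or undefined if it is missing or blank.
function requireField(row: CsvRow, key: string): string | undefined {
  const value = row[key]?.trim();
  return value ? value : undefined;
}

// Splits rows into usable permalinks and the indexes of rows that were skipped,
// so the caller can report skipped rows instead of failing later.
function collectPermalinks(
  rows: CsvRow[],
  key: string,
): { permalinks: string[]; skipped: number[] } {
  const permalinks: string[] = [];
  const skipped: number[] = [];
  rows.forEach((row, index) => {
    const permalink = requireField(row, key);
    if (permalink === undefined) {
      skipped.push(index);
      return;
    }
    permalinks.push(permalink);
  });
  return { permalinks, skipped };
}

// Made-up sample rows: one valid, one blank, one missing the column entirely.
const sample: CsvRow[] = [{ "Monograph ID": "M001" }, { "Monograph ID": "  " }, {}];
const result = collectPermalinks(sample, "Monograph ID");
// result.permalinks is ["M001"]; result.skipped is [1, 2]
```

The same helper would apply to the "Licence number (SIN number)" column in the product information script.
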
{
text: "Subsidised brands of vaccines",
type: "text",
marks: [

Copilot AI Jan 5, 2026

The non-null assertion operator is used without validation. While the array access with optional chaining ensures the array element exists, the assertion assumes the "Guidance Recommendations_" field will always be present at that index. Consider using optional chaining (?.) instead or providing a default empty string to handle cases where the field might be missing.

"description": "",
"main": "index.ts",
"scripts": {
"ndf": "dotenv tsx ndf-collection/index.ts"

Copilot AI Jan 5, 2026

The script name "ndf" is not descriptive and could be confused with other commands. Consider using a more specific name like "ndf-migrate" or "migrate-ndf" to clearly indicate this is a migration script for NDF collections.

Suggested change
- "ndf": "dotenv tsx ndf-collection/index.ts"
+ "ndf-migrate": "dotenv tsx ndf-collection/index.ts"

);
const generalAvailability = getSimilarKeysAsArray(
entry,
"General Availability in  Public Healthcare Institutions_"

Copilot AI Jan 5, 2026

There's a typo in the key name: "General Availability in  Public Healthcare Institutions_" contains two spaces between "in" and "Public". This should be "General Availability in Public Healthcare Institutions_" with a single space. This typo could cause the function to not find matching keys in the CSV data if the actual column name has only one space.

Suggested change
- "General Availability in  Public Healthcare Institutions_"
+ "General Availability in Public Healthcare Institutions_"

Copilot uses AI. Check for mistakes.
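The implementation of `getSimilarKeysAsArray` is not shown in this conversation, so the following is only a sketch of one way to make prefix matching tolerant of stray whitespace in column names. `normaliseKey` and `valuesForPrefix` are hypothetical helpers, and the sample row is made up.

```typescript
// Collapses runs of whitespace and trims, so "in  Public" and "in Public" compare equal.
function normaliseKey(key: string): string {
  return key.replace(/\s+/g, " ").trim();
}

// Returns the non-empty values of all columns whose normalised name starts with
// the normalised prefix, in the row's own column order.
function valuesForPrefix(
  row: Record<string, string | undefined>,
  prefix: string,
): string[] {
  const wanted = normaliseKey(prefix);
  const values: string[] = [];
  for (const [key, value] of Object.entries(row)) {
    if (normaliseKey(key).startsWith(wanted) && value) {
      values.push(value);
    }
  }
  return values;
}

// One column has a double space, the other a single space; both should match.
const row = {
  "General Availability in  Public Healthcare Institutions_1": "Yes",
  "General Availability in Public Healthcare Institutions_2": "No",
  "Monograph ID": "M001",
};
const availability = valuesForPrefix(
  row,
  "General Availability in Public Healthcare Institutions_",
);
// availability is ["Yes", "No"]
```

Normalising both sides means the script keeps working whichever spelling the CSV export actually uses.
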
Comment on lines +45 to +46
// eslint-disable-next-line @typescript-eslint/no-non-null-assertion
const permalink = entry["Licence number (SIN number)"]!;

Copilot AI Jan 5, 2026

The non-null assertion operator is used here without proper validation. If "Licence number (SIN number)" is not present in the CSV data, this will cause a runtime error. Consider adding validation to ensure the field exists and is not empty before using it, or handle the undefined case explicitly.

Suggested change
- // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
- const permalink = entry["Licence number (SIN number)"]!;
+ const licenceNumberField = entry["Licence number (SIN number)"];
+ if (!licenceNumberField) {
+   console.warn(
+     'Skipping entry without "Licence number (SIN number)" field:',
+     entry
+   );
+   continue;
+ }
+ const permalink = licenceNumberField;

Comment on lines +30 to +38
const dom = new JSDOM(
`<html>
<div class="element"></div>
</html>`
);
const window = dom.window;
const document = window.document;
global.document = document;
global.window = window as any;

Copilot AI Jan 5, 2026

Setting global.document and global.window in this way can have unintended side effects if this module is imported in other contexts. This approach mutates the global environment which can cause issues in testing or when the code runs in different environments. Consider:

  1. Using a more isolated approach with jsdom instances
  2. Documenting this side effect clearly
  3. Ensuring this module is only imported when needed for HTML parsing
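As a sketch of the more isolated approach in point 1, the helper below installs globals only for the duration of a synchronous callback and then restores the previous state. `withTemporaryGlobals` is a hypothetical helper, and the plain object standing in for a jsdom `document` is an assumption made to keep the example free of dependencies.

```typescript
// Temporarily assigns properties on globalThis, runs `fn`, then restores
// whatever was there before (removing keys that did not exist beforehand).
// Only suitable for synchronous callbacks.
function withTemporaryGlobals<T>(
  overrides: Record<string, unknown>,
  fn: () => T,
): T {
  const target = globalThis as unknown as Record<string, unknown>;
  const previous = new Map<string, { existed: boolean; value: unknown }>();
  for (const [key, value] of Object.entries(overrides)) {
    previous.set(key, { existed: key in target, value: target[key] });
    target[key] = value;
  }
  try {
    return fn();
  } finally {
    previous.forEach((saved, key) => {
      if (saved.existed) {
        target[key] = saved.value;
      } else {
        delete target[key];
      }
    });
  }
}

// Stand-in for `new JSDOM(...).window.document`; a real caller would pass that.
const fakeDocument = { title: "stub" };
const seenInside = withTemporaryGlobals({ __ndfDocument: fakeDocument }, () => {
  const scope = globalThis as unknown as Record<string, unknown>;
  return scope.__ndfDocument === fakeDocument;
});
const leakedAfter = "__ndfDocument" in globalThis;
// seenInside is true; leakedAfter is false
```

Wrapping only the HTML conversion call this way would keep other modules and tests from seeing the patched globals.
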

Comment on lines +4 to +8
"/Users/zhongjun/Downloads/general-monograph.csv";

// This is the CSV file for the collection of all product information
export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH =
"/Users/zhongjun/Downloads/product-information.csv";

Copilot AI Jan 5, 2026

The file paths are hardcoded to a specific user's local machine ("/Users/zhongjun/Downloads/"). These should be removed or replaced with environment variables or relative paths to avoid exposing personal information and to make the code portable across different development environments.

Suggested change
- "/Users/zhongjun/Downloads/general-monograph.csv";
- // This is the CSV file for the collection of all product information
- export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH =
- "/Users/zhongjun/Downloads/product-information.csv";
+ process.env.NDF_GENERAL_MONOGRAPH_CSV_FILEPATH ?? "general-monograph.csv";
+ // This is the CSV file for the collection of all product information
+ export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH =
+   process.env.NDF_PRODUCT_INFORMATION_CSV_FILEPATH ?? "product-information.csv";

Comment on lines +443 to +447
export const decodeHtmlEntities = (str: string) => {
const txt = document.createElement("textarea");
txt.innerHTML = str;
return txt.value;
};

Copilot AI Jan 5, 2026

The decodeHtmlEntities function uses the global document object to decode HTML entities. This approach is problematic because:

  1. It relies on a browser-specific API (document.createElement) that may not work consistently in all environments
  2. The global document is being set up using jsdom in getHtmlAsJson.ts, but this function is also used in other files that may not have that setup
  3. Using innerHTML to decode entities can be a security risk if the input is not trusted

Consider using a dedicated HTML entity decoding library or ensuring that jsdom is properly initialized before this function is called.
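A dedicated library is probably the more complete option. Purely as an illustration of decoding without a DOM, here is a small sketch that handles numeric references and a handful of named entities. `decodeBasicEntities` and its entity table are hypothetical and intentionally incomplete.

```typescript
// Small, deliberately incomplete table of named entities.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", "\u00a0"],
]);

// Decodes &name;, &#123; and &#x1F; style references in a single pass, so an
// already-escaped ampersand such as "&amp;lt;" becomes "&lt;" rather than "<".
// Unknown names and out-of-range code points are left untouched.
function decodeBasicEntities(input: string): string {
  return input.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g,
    (match: string, body: string): string => {
      if (body.startsWith("#")) {
        const isHex = body[1] === "x";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      return NAMED_ENTITIES.get(body) ?? match;
    },
  );
}

const decoded = decodeBasicEntities(
  "Paracetamol &amp; codeine &lt;500&nbsp;mg&gt; &#8482; &#x2122;",
);
const doubleEscaped = decodeBasicEntities("&amp;lt; &unknown;");
// doubleEscaped is "&lt; &unknown;"
```

Because this never touches `document` or `innerHTML`, it can be called from any module regardless of whether jsdom has been set up.
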

Comment on lines +120 to +122
// if (permalink === "SIN16609P") {
// break;
// }

Copilot AI Jan 5, 2026

The commented-out code creates ambiguity about whether early termination is intended for testing or production use. If this was used for debugging, it should be removed. If it's intended as a feature for testing single entries, consider replacing it with a proper command-line flag or environment variable (e.g., TEST_MODE=true or --limit=1).

Suggested change
- // if (permalink === "SIN16609P") {
- //   break;
- // }
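If limiting the run is useful beyond debugging, a flag could replace the commented-out block. The sketch below assumes a hypothetical `--limit=<n>` argument. `parseLimit` and `takeLimited` are illustrative names and not part of the PR, and the licence numbers are made up.

```typescript
// Parses a hypothetical `--limit=<n>` argument. Returns undefined when the flag
// is absent, and throws on values that are not positive integers.
function parseLimit(args: string[]): number | undefined {
  const flag = args.find((arg) => arg.startsWith("--limit="));
  if (flag === undefined) {
    return undefined;
  }
  const raw = flag.slice("--limit=".length);
  if (!/^[1-9][0-9]*$/.test(raw)) {
    throw new Error(`Invalid --limit value: "${raw}"`);
  }
  return Number(raw);
}

// Returns at most `limit` entries, or all of them when no limit is set.
function takeLimited<T>(entries: T[], limit: number | undefined): T[] {
  return limit === undefined ? entries : entries.slice(0, limit);
}

const entries = ["SIN00001P", "SIN00002P", "SIN00003P"]; // made-up licence numbers
const limited = takeLimited(entries, parseLimit(["--limit=2"]));
const unlimited = takeLimited(entries, parseLimit([]));
// limited has 2 entries; unlimited has all 3
```

In the script itself, `parseLimit` would be given `process.argv.slice(2)`.
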

@mergify
Contributor

mergify bot commented Feb 28, 2026

This pull request has been stale for more than 30 days! Could someone please take a look at it @opengovsg/isomer-engineers

Contributor

@adriangohjw adriangohjw left a comment

TBH didn't really review very deep, skimmed through it

will be helpful if can include a README, or sample CSVs for the inputs for reference, in case someone needs to run this when you are not around

but non-blocker and good-to-have, so approving to clean up the PRs

thanks!

Comment on lines +3 to +12
export const NDF_GENERAL_MONOGRAPH_CSV_FILEPATH =
"/Users/zhongjun/Downloads/general-monograph.csv";

// This is the CSV file for the collection of all product information
export const NDF_PRODUCT_INFORMATION_CSV_FILEPATH =
"/Users/zhongjun/Downloads/product-information.csv";

// This is the CSV file for the list of Pharmacological Classifications
export const NDF_PHARMACOLOGICAL_CLASSIFICATIONS_CSV_FILEPATH =
"/Users/zhongjun/Downloads/pharmacological-classifications.csv";
Contributor

can update to relative path instead? in case someone else needs to run this script. thanks!
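One possible shape for this, combining the relative-path request with environment-variable overrides. This is a sketch only: `resolveCsvPath`, `loadNdfPaths`, and the `data/` default directory are assumptions, and the environment is passed in as a plain object so the functions are easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Picks the path from the environment when set and non-blank; otherwise falls
// back to a relative default instead of a machine-specific absolute path.
function resolveCsvPath(env: Env, variable: string, fallback: string): string {
  const fromEnv = env[variable]?.trim();
  return fromEnv ? fromEnv : fallback;
}

// Variable names mirror the constants in config.ts; the defaults are hypothetical.
function loadNdfPaths(env: Env) {
  return {
    generalMonograph: resolveCsvPath(
      env,
      "NDF_GENERAL_MONOGRAPH_CSV_FILEPATH",
      "data/general-monograph.csv",
    ),
    productInformation: resolveCsvPath(
      env,
      "NDF_PRODUCT_INFORMATION_CSV_FILEPATH",
      "data/product-information.csv",
    ),
    pharmacologicalClassifications: resolveCsvPath(
      env,
      "NDF_PHARMACOLOGICAL_CLASSIFICATIONS_CSV_FILEPATH",
      "data/pharmacological-classifications.csv",
    ),
  };
}

// Only one variable is overridden here; the other two use the relative defaults.
const paths = loadNdfPaths({
  NDF_PRODUCT_INFORMATION_CSV_FILEPATH: "/tmp/product-information.csv",
});
```

In the script, `loadNdfPaths(process.env)` would read the real environment. Since the `ndf` script appears to run through `dotenv`, a local `.env` file could supply these values without committing anyone's home directory.
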

@@ -0,0 +1,128 @@
import fs from "fs";
import path from "path";
import Papa from "papaparse";