feat(dataset): add final smishing rewrites, documentation, and report#46

Open
s223737886 wants to merge 3 commits into Hardhat-Enterprises:dev from s223737886:sms-rewriting/kalpna

s223737886 commented May 15, 2025

Summary

This pull request delivers the finalized dataset and supporting documentation for the Smishing Detection backend project, specifically for the Microsoft Planner task titled "Smishing Message Rewriting for Training and Smishing-report". It also includes a report, Smishing-report, that explores how smishing attacks work and why they are effective.

What’s Included

  • ✅ Final processed dataset located at: machine-learning/datasets/Dataset.csv

    • 800 messages: original + rewritten smishing variants
    • Linked metadata fields: source, intent_type, malicious, threat_level, linked_to, etc.
  • ✅ Dataset documentation under: machine-learning/projects/DatasetDocumentation

    • dataset_schema.md
    • rewriting_strategy.md
    • smishing_taxonomy.md
    • traceability_mapping.md
    • preprocessing_guidelines.md
  • Report under: machine-learning/projects/Reports/Smishing_Report.docx

    • Although the report is separate from the dataset work above, it examines how smishing attacks work and why they are so effective.
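
Before merging, a reviewer could sanity-check the dataset described above. The sketch below is a hypothetical helper (`check_dataset` and `EXPECTED_COLUMNS` are illustrative names, not part of this PR). It assumes Dataset.csv is UTF-8 with a header row that contains at least the metadata columns listed above, and that "800 messages" means 800 data rows.

```python
import csv
from typing import IO

# Hypothetical: the metadata columns named in the PR description.
# The real file may contain more columns ("etc."), which this check ignores.
EXPECTED_COLUMNS = {"source", "intent_type", "malicious", "threat_level", "linked_to"}


def check_dataset(handle: IO[str], expected_rows: int = 800) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    reader = csv.DictReader(handle)
    problems = []

    # Header check: every expected metadata column must be present.
    header = set(reader.fieldnames or [])
    missing = EXPECTED_COLUMNS - header
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")

    # Row-count check: DictReader yields one dict per data row (header excluded).
    row_count = sum(1 for _ in reader)
    if row_count != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {row_count}")

    return problems
```

To run it against the real file, open machine-learning/datasets/Dataset.csv with `newline=""` and `encoding="utf-8"` and pass the handle to `check_dataset`.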

Conventions Followed

  • Branch: sms-rewriting/kalpna (named per contribution guideline format)
  • Commit message format: follows Conventional Commits (feat, chore, etc.)
  • Pull request targets: dev branch (not main)
  • DatasetCombined.csv was removed as part of cleanup

Notes

  • GitHub may not allow automatic merging due to upstream changes — please feel free to resolve conflicts manually if required.
  • This contribution is scoped only to the dataset and does not include model code or frontend tasks.

Planner Task

This PR corresponds to the Microsoft Planner task: Smishing Message Rewriting for Training and Smishing-report

  • Dataset.csv
  • dataset_schema.md
  • preprocessing_guidelines.md
  • README.md
  • rewriting_strategy.md
  • smishing_taxonomy.md
  • traceability_mapping.md

dec1belPP (Member) left a comment

Hey @s223737886, there are some changes required before we can review your PR:

  • Your task's scope is to improve the dataset, so your PR should not change any existing JavaScript or Python files.
  • Rename the old dataset or keep it as it is instead of deleting it.
  • Resolve any conflicts manually before submitting your PR.

Please note that your PR will not be reviewed till all of these changes are made. Thank you.

dec1belPP added the need changes label May 15, 2025

s223737886 (Author)

I have made the required changes and committed again. I have also included the report from my pull request #47 in the latest commit, since that PR was closed. The report is named Smishing-report.

dec1belPP (Member) commented May 23, 2025

Hey @s223737886, this PR is still not at an acceptable standard for review. To reiterate, please:

  • You have committed the entire repo again. Please sync your local fork and commit only the files you changed or added for your feature.
  • Your task's scope is to improve the dataset, so your PR should not change any existing JavaScript or Python files.
  • Do not delete the old dataset. Please leave it as it is.
  • Resolve any conflicts locally before submitting your PR.

Please note that your PR will not be reviewed till all of these changes are made. Thank you.

dec1belPP changed the title from "feat(dataset): add final smishing rewrites and documentation" to "feat(dataset): improve smishing dataset" May 23, 2025
s223737886 force-pushed the sms-rewriting/kalpna branch from 471e726 to 4ad0162 May 23, 2025 07:28
s223737886 changed the title from "feat(dataset): improve smishing dataset" to "feat(dataset): add final smishing rewrites, documentation, and report" May 23, 2025

s223737886 (Author)

Thanks for the feedback, Pasindu.

I've now cleaned the branch and made the following updates based on your instructions:

  • Retained the original DatasetCombined.csv without any changes.
  • Added a new file, Dataset.csv, with the rewritten smishing messages.
  • Included only the changes related to the dataset: the documentation (DatasetDocumentation) and the report (Smishing_Report.docx).
  • Removed all unrelated JavaScript and Python file changes from the PR.
  • Verified that the branch is up to date with origin/dev.

Let me know if any other changes are needed. Thank you!

dec1belPP removed the need changes label May 25, 2025

2 participants