Conversation

Collaborator

@nvkevlu nvkevlu commented Aug 11, 2023

Add experiment tracking with MONAI for MetricExchanger.
Uses MONAI via Project-MONAI/MONAI#6220.
See also #1566.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@nvkevlu nvkevlu requested a review from YuanTingHsieh August 11, 2023 17:38
Collaborator Author

nvkevlu commented Aug 11, 2023

/build

Collaborator

@YuanTingHsieh YuanTingHsieh left a comment

Logic LGTM.

I suggest reducing duplicate code as much as possible.

Collaborator

@chesterxgchen chesterxgchen left a comment

The AnalyticsSender part needs to be discussed.

Collaborator Author

nvkevlu commented Aug 15, 2023

/build

Collaborator Author

nvkevlu commented Aug 15, 2023

/build

@nvkevlu nvkevlu requested a review from YuanTingHsieh August 15, 2023 23:07
Collaborator Author

nvkevlu commented Aug 16, 2023

/build

@nvkevlu nvkevlu requested a review from YuanTingHsieh August 16, 2023 16:55
Collaborator

@YuanTingHsieh YuanTingHsieh left a comment

Thanks, LGTM

Collaborator Author

nvkevlu commented Aug 16, 2023

/build

Collaborator Author

nvkevlu commented Aug 17, 2023

/build

@nvkevlu nvkevlu enabled auto-merge (squash) August 17, 2023 01:01
@nvkevlu nvkevlu merged commit 3cfb12a into NVIDIA:main Aug 17, 2023
holgerroth pushed a commit to nanaHa1003/NVFlare that referenced this pull request Sep 6, 2023
Remove redundant files and update running script.

Add license header for research/condist_fl.

Wrap training scripts inside main method (NVIDIA#1939)

Fixed the recursive FLComponents creation. (NVIDIA#1934)

* Fixed the recursive FLComponents creation.

* Remove the temp_fl_ctx change.

* Removed unused import.

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Rename Cell to CoreCell (cell.py -> core_cell.py)
Rename NewCell to Cell (new_cell.py -> cell.py)

Remove comment and unused codes

Add experiment tracking with MONAI for MetricExchanger (NVIDIA#1925)

* add `stats_sender_id` in `ClientAlgoExecutor`

Signed-off-by: KumoLiu <yunl@nvidia.com>

* add `NVFlareStatsHandler`

Signed-off-by: KumoLiu <yunl@nvidia.com>

* add experiment tracking with MONAI for MetricExchanger

* fix ci

* remove log_writer_metrics_exchanger.py which was not supposed to be there

* make changes after discussion about PR

* fix ci

* make fixes from PR comments

* make fixes from PR comments

* make fixes from PR comments

---------

Signed-off-by: KumoLiu <yunl@nvidia.com>
Co-authored-by: KumoLiu <yunl@nvidia.com>

Re-add cli persistent history (NVIDIA#1938)

* re-add cli persistent history

* change history file

* change to pathlib

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Job CLI:  create Job,  submit Job, list_templates, show_variables (NVIDIA#1888)

* add nvflare job command
update the job config setup

* fix meta.json error

* fix few bugs

* change with the new format

* Working in Progress

* move import to the top

* All things worked; still need cleanup and unit tests

* add missing transfer type

* restore

* restore

* restore

* restore

* 1. move some code to cli_utils.py
2. add CROSS validation workflow
3. avoid duplicated components, and empty components and empty executors
4. add nvflare config

* restore the change

* restore the change

* restore

* add pyhocon as required dependency

* restore setup dev version ( separate PR will do this part)

* reduce number of files

* restore

* add unit test

* 1. ConfigTreeEx
2. add unit tests

* Now design WORKING in PROGRESS

* add nvflare job show_workflows

* update create job

* WIP

* working in progress

* WIP

* working in progress

* working in progress

* working in progress

* add variable values

* working in progress

* working in progress

* rebase

* remove download

* fix 1 unit test

* CLI complete ( todo need to remove simulator related changes after another PR is merged)

* add debug on ci/unit test failure ( only on jenkins)

* temp remove a unit test

* restore

* rebase

* make pyhocon required dependency

* remove un-used files

* remove un-used files

* 1. remove un-used files
2. show_variables support all alternative formats
3. replace hard-coded names with constant variables

* Fix a refactoring introduced bug

* Fix a refactoring introduced bug

* style formats

* update client scripts

* check python versions update cross-validation workflows to use numpy

* add class arguments to the list, still have a bug

* 1. restructure the indexer, introduce keyIndex data structure.
2. merge is not refactored
3. unit-tests are not working yet.

* redesign the indexer. the code worked. still need to fix unit tests

* fix the unit-tests

* update

* clean up

* style format

* tweak

* rename the job from sag_cross_pt to sag_cross_np

POC Upgrade 2 (NVIDIA#1944)

* save startup kit location
refactoring POC
format and dependency
change the logic of get poc workspace

* rebased main

Helper and manager

Working but with gpu resource exception

Remove cc.token from resource spec to avoid confusing resource manager

Improve private function names

Use command check with fl context

Add document and remove unused codes

Reword and improve control flow

Rewrite double quote to heredocs to avoid bash/zsh issues

Update template

Fix controller unit test timing (NVIDIA#1937)

make sure the Job CLI support multi-config formats (NVIDIA#1946)

* make sure the code support multi-config formats

* make sure the code support multi-config formats

* remove debug

* style format

add workspace to config command (NVIDIA#1948)

update POC tutorials (NVIDIA#1949)

* update POC tutorials

* remove "--" in few more places

update docs after change in nvflare poc command [skip ci] (NVIDIA#1945)

* update docs after change in nvflare poc command

* remove unintended files

* add note in docs

* add to POC config info and some small fixes

* fix ci

* add note

Add Sean to build command (NVIDIA#1950)

Vertical XGBoost with PSI integration (NVIDIA#1922)

* vertical xgboost with psi integration

* formatting

* simplifying user exp

* improvements, changes to use hist executor

* minor improvements

* remove unused func

* separate psi into another job

* remove job scripts, improve data scripts

* generalize app

* add explanation for site-1 label owner

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

Allow metric negation in model selection (NVIDIA#1951)

* tensorboard logging and metric negation in model selection

* update license

* update license

* update license to 2023

* revert license header

* remove tb logging

add deprecation commands (NVIDIA#1952)

* add deprecation commands

* update style

---------

Co-authored-by: nvkevlu <55759229+nvkevlu@users.noreply.github.com>

Add CellCipher for secure message encryption/decryption
Add SessionKeyManager to handle key exchange and management

Fix fl model utils (NVIDIA#1902)

* Use explicit argument name instead of kwargs

* Address comments

SFM Heartbeat Support (NVIDIA#1942)

* Removed WAIT_UNTIL from Cellnet

* Added heartbeat support to all drivers

* Revert grpc keepalive to 2 Min

* Renamed capability HEARTBEAT to SEND_HEART

Enhance ML2FL API (NVIDIA#1953)

Add example figures to README.md and fix issues raised in the PR comments.

Fix research/condist-fl license headers and update README.

Update README.md

Fix markdown syntax error in README.md

Update README.md

Update README.md

Add captions to figures.

Update README.md

Remove fobs calls (NVIDIA#1960)

* Removed the extra fobs.dumps() calls.

* removed more fobs.dump().

* Removed more fobs.dumps().

* Removed additional Fobs.dumps().

* Removed more Fobs.dumps() calls.

* Removed unused import.

Removed the not used import in cell.py (temporary) (NVIDIA#1961)

Improved error handling and fixed memory leak (NVIDIA#1921)

* Added more error handling and fixed the memory leak

* Ignore late ACKs

* Check for no payload scenario

* Addressed the PR comments, added lock, moved pop to top

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

Fix unit test and integration tests (NVIDIA#1962)

* Fix f3 communicator unit test

* Update dxo meta with FLModel meta

* Fix fl model util

Client controller (NVIDIA#1913)

* initial cut.

* WIP:

* WIP:

* added filters and task for client controller.

* Working version.

* Fixed the client_sag broadcast_tasks.

* Refactored.

* Added error handling.

* WIP: client_controller change.

* Fixed the client controller _call_task_cb().

* Extracted the apply_data_filters() and apply_result_filters().

* refactored.

* Adjust the task result cb logic.

* Added server as the client_controller target.

* Added the client controller based cyclic example.

* codestyle changes.

* codestyle changes for example.

* Removed unused import.

* Addressed the PR review feedback.

* Removed the cyclic example.

* Added direction support for the filters.

* minor fix.

* added target validation.

* optimized the task_utils.

* added direction control for Scope filters.

* Moved the constants to FilterKey.

* codestyle fix.

* license header year change.

* refactored.

* further extract the common functions for task_utils.

* passed in the proper Scope field name.

* renamed a variable.

* Changed to use hard coded field name in the Scope.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
Co-authored-by: Yan Cheng <58191769+yanchengnv@users.noreply.github.com>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Update POC tutorials and fix POC bugs (NVIDIA#1958)

* Update POC tutorials

* format style

* format style

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* fixing typos

* rename method

* update wordings

* update wordings

* update wordings

* update wordings

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Add Job CLI Tutorials and step-by-step initial examples  (NVIDIA#1957)

* update job template and tutorials (WIP)

update POC tutorials: WIP

update POC tutorials: WIP

add tutorial for Job CLI

style formats

style formats

style formats

wording

wording

wording

update the tutorials

format style

* wording

* fix unit tests

* fix unit tests

* format

* fix timeout issue

* fix timeout issue

* fix timeout issue

* fix style and import related changes

* typos

* fixing typos

* fixing typos

* refactor main methods

* bug fixes

* update readme.md

Add more results in the README and fix some minor issues.

Refactor format_log_message with more readability (NVIDIA#1965)

Remove some POC stop message (NVIDIA#1966)

* 1. remove some message on nvflare poc stop
2. clean up the job CLI tutorial wordings

* remove output

* format

Add experiment tracking docs (NVIDIA#1963)

* add experiment tracking docs

* add missed docs

* remove paragraph

* make edits based on PR comments

* make consistent names of functions and variables with plural of metric

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Add SimpleCellCipher to remove session key manager
Refactor common functions to serve both designs

Fix KiTS19 URL in README.md

Change dict key in the checkpoints.

Rename 'extract_tensor' function to 'array_to_list'.

Improve CLI command error handling (NVIDIA#1971)

* improve CLI command error handling

* improve CLI command error handling

* formats

polish notebook for Job CLI (NVIDIA#1975)

update readme
holgerroth added a commit that referenced this pull request Sep 6, 2023
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (#1940)

* Add implementation to ConDistFL research folder.

* remove old file

* formatting

---------

Co-authored-by: Holger Roth <hroth@nvidia.com>
wyli pushed a commit to Project-MONAI/MONAI that referenced this pull request Sep 13, 2023
PR #6220 was closed and NVFlareStatsHandler has now been implemented in
NVFlare in NVIDIA/NVFlare#1925. However, there
is still the piece in MonaiAlgo to attach the stats_sender in
initialize, so this PR adds that missing piece.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

---------

Signed-off-by: Kevin <kevlu@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
holgerroth pushed a commit to holgerroth/NVFlare that referenced this pull request Dec 4, 2023
holgerroth added a commit to holgerroth/NVFlare that referenced this pull request Dec 4, 2023
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (NVIDIA#1940)

* Add implementation to ConDistFL research folder.

* remove old file

* formatting

---------

Co-authored-by: Holger Roth <hroth@nvidia.com>
5 participants