Add experiment tracking with MONAI for MetricExchanger #1925
Signed-off-by: KumoLiu <yunl@nvidia.com>
/build
nvflare/app_opt/tracking/wandb/wandb_writer_metrics_exchanger.py
nvflare/app_opt/tracking/mlflow/mlflow_writer_metrics_exchanger.py
...en_ct_segmentation_local/jobs/spleen_ct_segmentation_local/app/config/config_fed_client.json
YuanTingHsieh
left a comment
Logic LGTM.
Suggest reducing duplicated code as much as possible.
chesterxgchen
left a comment
The AnalyticsSender part needs to be discussed.
YuanTingHsieh
left a comment
Thanks, LGTM
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (#1940) * Add implementation to ConDistFL research folder. Remove redundant files and update running script. Add license header for research/condist_fl. Wrap training scripts inside main method (#1939) Fixed the recursive FLComponents creation. (#1934) * Fixed the resursive FLComponents creation. * Remove the temp_fl_ctx change. * Removed no used import. --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Rename Cell to CoreCell (cell.py -> core_cell.py) Rename NewCell to Cell (new_cell.py -> cell.py) Remove comment and unused codes Add experiment tracking with MONAI for MetricExchanger (#1925) * add `stats_sender_id` in `ClientAlgoExecutor` Signed-off-by: KumoLiu <yunl@nvidia.com> * add `NVFlareStatsHandler` Signed-off-by: KumoLiu <yunl@nvidia.com> * add experiment tracking with MONAI for MetricExchanger * fix ci * remove log_writer_metrics_exchanger.py which was not supposed to be there * make changes after discussion about PR * fix ci * make fixes from PR comments * make fixes from PR comments * make fixes from PR comments --------- Signed-off-by: KumoLiu <yunl@nvidia.com> Co-authored-by: KumoLiu <yunl@nvidia.com> Re-add cli persistent history (#1938) * re-add cli persistent history * change history file * change to pathlib --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Job CLI: create Job, submit Job, list_templates, show_variables (#1888) * add nvflare job command update the job config setup * fix meta.json error * fix few bugs * change with the new format * Working in Progress * move import to the top * ALl things worked, Stil need cleanup and unit tests * add missing transfer type * restore * restore * restore * restore * 1. move some code to cli_utils.py 2. add CROSS validation workflow 3. avoid duplicated components, and empty components and empty executors 4. 
add nvflare config * restore the change * restore the change * restore * add pyhocon as required dependency * restore setup dev version ( separate PR will do this part) * reduce number of files * restore * add unit test * 1. ConfigTreeEx 2. add unit tests * Now design WORKING in PROGRESS * add nvflare job show_workflows * update create job * WIP * working in progress * WIP * working in progress * working in progress * working in progress * add variable values * working in progress * working in progress * rebase * remove download * fix 1 unit test * CLI complete ( todo need to remove simulator related changes after another PR is merged) * add debug on ci/unit test failure ( only on jenkins) * temp remove a unit test * restore * rebase * make pyhocon required dependency * remove un-used files * remove un-used files * 1. remove un-used files 2. show_variables support all alternative formats 3. replace hard-coded names with constant variables * Fix a refactoring introduced bug * Fix a refactoring introduced bug * style formats * update client scripts * check python versions update cross-validation workflows to use numpy * add class arguments to the list, still have a bug * 1. restructure the indexer, introduce keyIndex data structure. 2. merge is not refactored 2. unit-tests are not working yet. * redesign the indexer. the code worked. 
still need to fix unit tests * fix the unit-tests * update * clean up * style format * tweak * rename the job from sag_cross_pt to sag_cross_np POC Upgrade 2 (#1944) * save startup kit location refactoring POC format and dependency change the logic of get poc workspace * rebased main Helper and manager Working but with gpu resource exception Remove cc.token from resource spec to avoid confusing resource manager Improve private function names Use command check with fl context Add document and remove unused codes Reword and improve control flow Rewrite double quote to heredocs to avoid bash/zsh issues Update template Fix controller unit test timing (#1937) make sure the Job CLI support multi-config formats (#1946) * make sure the code support multi-config formats * make sure the code support multi-config formats * remove debug * style format add workspace to config command (#1948) update POC tutorials (#1949) * update POC tutorials * remove "--" in few more places update docs after change in nvflare poc command [skip ci] (#1945) * update docs after change in nvflare poc command * remove unintended files * add note in docs * add to POC config info and some small fixes * fix ci * add note Add Sean to build command (#1950) Vertical XGBoost with PSI integration (#1922) * vertical xgboost with psi integration * formatting * simplifying user exp * improvements, changes to use hist executor * minor improvements * remove unused func * separate psi into another job * remove job scripts, improve data scripts * generalize app * add explanation for site-1 label owner --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com> Allow metric negation in model selection (#1951) * tensorboard logging and metric negation in model selection * update license * update license * update license to 2023 * revert license header * remove tb logging add deprecation commands (#1952) * add deprecation 
commands * updaste style --------- Co-authored-by: nvkevlu <55759229+nvkevlu@users.noreply.github.com> Add CellCipher for secure message encryption/decryption Add SessionKeyManager to handle key exchange and management Fix fl model utils (#1902) * Use explicit argument name instead of kwargs * Address comments SFM Heartbeat Support (#1942) * Removed WAIT_UNTIL from Cellnet * Added heartbeat support to all drivers * Revert grpc keepalive to 2 Min * Renamed capability HEARTBEAT to SEND_HEART Enhance ML2FL API (#1953) Add example figures to README.md and fix issues regarding to the PR comments. Fix research/condist-fl license headers and update README. Update README.md Fix markdown syntax error in README.md Update README.md Update README.md Add captions to figures. Update README.md Remove fobs calls (#1960) * Removed the extra fobs.dumps() calls. * removed more fobs.dump(). * Removed more fobs.dumps(). * Removed additional Fobs.dumps(). * Removed more Fobs.dumps() calls. * Removed no use import. Removed the not used import in cell.py (temporary) (#1961) Improved error handling and fixed memory leak (#1921) * Added more error handling and fixed the memory leak * Ignore late ACKs * Check for no payload scenario * Addressed the PR comments, added lock, moved pop to top --------- Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com> Fix unit test and integration tests (#1962) * Fix f3 communicator unit test * Update dxo meta with FLModel meta * Fix fl model util Client controller (#1913) * initial cut. * WIP: * WIP: * added filters and task for client controller. * Working version. * Fixed the client_sag broadcast_tasks. * Refacftored. * Added error handling. * WIP: client_controller change. * Fixed the client controller _call_task_cb(). * Extracted the apply_data_filters() and apply_result_filters(). * refactored. * Adjust the task result cb logic. * Added server as the client_controller target. * Added the client controller based cyclic example. 
* codestyle changes.: * codestyle changes for example. * Removed no use import. * Addressed the PR review feedbacks. * Removed the cyclic example. * Added direction support for the filters. * minor fix. * added target validation. * optimized the task_utils. * added direction control for Scope filters. * Moved the constants to FilterKey. * codestyle fix. * license header year change. * refactoried. * further extract the common functions for task_utils. * passed in the proper Scope field name. * renamed a variable. * Changed to use hard coded field name in the Scope. --------- Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com> Co-authored-by: Yan Cheng <58191769+yanchengnv@users.noreply.github.com> Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Update POC tutorials and fix POC bugs (#1958) * Update POC tutorials * format style * format style * typos * typos * typos * typos * typos * typos * typos * typos * fixing typos * rename method * update wordings * update wordings * update wordings * update wordings --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Add Job CLI Tutorials and step-by-step initial examples (#1957) * update job template and tutorials (WIP) update POC tutorials: WIP update POC tutorials: WIP add tutorial for Job CLI style formats style formats style formats wording wording wording update the tutorials format style * wording * fix unit tests * fix unit tests * format * fix timeout issue * fix timeout issue * fix timeout issue * fix style and import related changes * typos * fixing typos * fixing typos * refactory main methods * bug fixes * update readme.md Add more results in the README and fix some minor issues. Refactor format_log_message with more readability (#1965) Remove some POC stop message (#1966) * 1. remove some message on nvflare poc stop 2. 
clean up the job CLI tutorial wordings * remove output * format Add experiment tracking docs (#1963) * add experiment tracking docs * add missed docs * remove paragraph * make edits based on PR comments * make consistent names of functions and variables with plural of metric --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Add SimpleCellCipher to remove session key manager Refactor common functions to serve both designs Fix KiTS19 URL in README.md Change dict key in the checkpoints. Rename 'extract_tensor' function to 'array_to_list'. Improve CLI command error handling (#1971) * improve CLI command error handling * improve CLI command error handling * formats polish notebook for Job CLI (#1975) update readme * remove old file * formatting --------- Co-authored-by: Holger Roth <hroth@nvidia.com>
PR #6220 was closed and NVFlareStatsHandler has now been implemented in NVFlare in NVIDIA/NVFlare#1925. However, there is still the piece in MonaiAlgo to attach the stats_sender in initialize, so this PR adds that missing piece.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

---------
Signed-off-by: Kevin <kevlu@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
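The wiring described in that message (looking up a stats sender by id and attaching it to the handlers during `initialize`) can be pictured with a small standalone sketch. Everything below is hypothetical and for illustration only: `StatsSender`, `RecordingSender`, `StatsHandler`, `Algo`, and the `"log_writer"` component id are invented names, not the NVFlare or MONAI classes, and the real `MonaiAlgo.initialize` and `NVFlareStatsHandler` signatures may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Tuple


class StatsSender(Protocol):
    """Hypothetical interface: anything that can forward one scalar metric."""

    def add_scalar(self, tag: str, value: float, step: int) -> None: ...


@dataclass
class RecordingSender:
    """Stand-in sender that keeps metrics in memory instead of streaming them."""

    records: List[Tuple[str, float, int]] = field(default_factory=list)

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.records.append((tag, value, step))


class StatsHandler:
    """Hypothetical handler that forwards per-iteration metrics to a sender."""

    def __init__(self) -> None:
        self.sender: Optional[StatsSender] = None

    def attach_sender(self, sender: StatsSender) -> None:
        self.sender = sender

    def on_iteration(self, step: int, metrics: Dict[str, float]) -> None:
        if self.sender is None:
            return  # tracking is optional; nothing is sent without a sender
        for tag, value in metrics.items():
            self.sender.add_scalar(tag, float(value), step)


class Algo:
    """Hypothetical client algorithm that wires the sender in initialize()."""

    def __init__(self, handlers: List[StatsHandler], stats_sender_id: Optional[str] = None) -> None:
        self.handlers = handlers
        self.stats_sender_id = stats_sender_id

    def initialize(self, components: Dict[str, Any]) -> None:
        # Resolve the sender by its configured id; skip quietly if it is absent.
        sender = components.get(self.stats_sender_id) if self.stats_sender_id else None
        if sender is None:
            return
        for handler in self.handlers:
            handler.attach_sender(sender)


# Usage: the sender is registered under an id and attached during initialize().
sender = RecordingSender()
handler = StatsHandler()
algo = Algo([handler], stats_sender_id="log_writer")
algo.initialize({"log_writer": sender})
handler.on_iteration(1, {"train_loss": 0.5})
print(sender.records)
```

Resolving the sender by id at initialization time, rather than passing it to the constructor, keeps the algorithm configurable from a job config file, which appears to be the motivation for the `stats_sender_id` argument mentioned in the commits.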
* add `stats_sender_id` in `ClientAlgoExecutor`
  Signed-off-by: KumoLiu <yunl@nvidia.com>
* add `NVFlareStatsHandler`
  Signed-off-by: KumoLiu <yunl@nvidia.com>
* add experiment tracking with MONAI for MetricExchanger
* fix ci
* remove log_writer_metrics_exchanger.py which was not supposed to be there
* make changes after discussion about PR
* fix ci
* make fixes from PR comments
* make fixes from PR comments
* make fixes from PR comments

---------
Signed-off-by: KumoLiu <yunl@nvidia.com>
Co-authored-by: KumoLiu <yunl@nvidia.com>
Add experiment tracking with MONAI for MetricExchanger.
Makes use of MONAI together with Project-MONAI/MONAI#6220.
See also #1566.