
pnyxai/pocket-ml-testbench


Pocket Network Machine Learning Test-Bench

This repository contains tools for deploying and measuring Machine Learning (ML) suppliers staked on the Pocket Network. The code available here was initially created under the POKT AI Lab socket.

Dev Deployment

We use Tilt for development purposes. Please go to the Tilt section for more information.

The Test-Bench

The Pocket Network test bench is an environment used to verify the correctness or soundness of a staked model, live on the network. It works by streamlining the tracking, sampling, and execution of tasks in a performant and scalable way. Each of these tasks is an instance of a particular metric, for example, the GLUE dataset. The architecture of the project is designed to be agnostic of the task to perform and easily extendable. The test bench follows the structure presented in the following image:

Basic Flow diagram

As the image shows, the test bench has five main blocks (each a different App) that work together to track the task scores of each of the Pocket suppliers and build a metric taxonomy from them. Briefly, the apps do the following:

  • Manager : Keeps the records of each supplier's scores. It checks for new suppliers, reviews the age and statistics of the task scores, and requests more tasks to be executed. If there are finalized tasks (produced by the Evaluator), it adds them to the supplier score tracking database.
  • Sampler : Checks for Manager requests and prepares the tasks to be done. To do that, it keeps track of available datasets (if needed) and samples from them. The result of this App is a generic call request that is valid for the Pocket Service but independent of the task.
  • Requester : Controls the relays performed. Using the provided Pocket Network App Keys, it checks the current sessions and looks for suppliers that have pending task requests (generated by the Sampler). When it finds a match, it performs the relays against the suppliers and saves the raw answers.
  • Evaluator : Retrieves the responses from the Requester and finds the originating task requests, then calculates the appropriate metrics and writes the resulting values.
  • Summarizer : Performs summaries of supplier data: it retrieves the scores for each supplier on each task and constructs the taxonomy summaries. The result is a unique database entry containing the scores on the taxonomy nodes for each supplier-taxonomy pair. It also produces the identity comparison of all suppliers, grouping those that share the same backends and electing a proxy to test.

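The flow above can be sketched end-to-end as a minimal, illustrative Python model. All names and record shapes below are hypothetical (the real apps communicate through MongoDB and Temporal, not function calls), but the sketch shows how a task request travels from the Manager through the Sampler, Requester, and Evaluator to the Summarizer:

```python
from dataclasses import dataclass

# Hypothetical record shapes; the real schemas live in MongoDB collections.
@dataclass
class TaskRequest:
    supplier: str
    task: str       # e.g. a metric/dataset name such as "glue"
    prompt: str     # generic call request, independent of the task

@dataclass
class RawResponse:
    request: TaskRequest
    answer: str

def manager(suppliers):
    """Decides which supplier/task pairs need fresh scores."""
    return [(s, "glue") for s in suppliers]  # pretend all scores are stale

def sampler(pending):
    """Samples the dataset and builds task-agnostic call requests."""
    return [TaskRequest(s, t, prompt=f"sampled item for {t}") for s, t in pending]

def requester(requests):
    """Relays each request to its supplier and stores the raw answer."""
    return [RawResponse(r, answer="model output") for r in requests]

def evaluator(responses):
    """Matches responses to their originating requests and scores them."""
    return {(r.request.supplier, r.request.task): 1.0 for r in responses}

def summarizer(scores):
    """Aggregates per-task scores into a per-supplier taxonomy summary."""
    summary = {}
    for (supplier, task), score in scores.items():
        summary.setdefault(supplier, {})[task] = score
    return summary

pending = manager(["supplier-1"])
scores = evaluator(requester(sampler(pending)))
print(summarizer(scores))  # {'supplier-1': {'glue': 1.0}}
```

In the real system each stage reads from and writes to shared storage instead of passing values directly, which is what lets the stages scale independently.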
The Apps are all coordinated using Temporal IO: the Manager and Requester run as recurrent workflows, while the Sampler and Evaluator are triggered by the Manager and Requester, respectively. The datasets are stored in PostgreSQL, since that is the most effective way to handle datasets from the LMEH test suite, the first suite to be implemented. Data communication between apps is done via MongoDB, which also holds the suppliers collection, the one with the resulting scores for each tested supplier.
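The coordination pattern described above (a recurrent workflow that periodically triggers a downstream step) can be mimicked without the Temporal SDK. The asyncio sketch below is only an illustration of that pattern, not the project's actual workflow code; all names are hypothetical:

```python
import asyncio

async def sampler(request):
    # Triggered by the Manager: turn a score request into a concrete task.
    return f"task for {request}"

async def manager(cycles, interval=0.01):
    """Recurrent workflow: on each tick, check suppliers and trigger sampling."""
    prepared = []
    for _ in range(cycles):
        stale = ["supplier-1"]        # suppliers whose scores need refreshing
        for supplier in stale:
            prepared.append(await sampler(supplier))
        await asyncio.sleep(interval)  # Temporal would schedule the next run
    return prepared

print(asyncio.run(manager(cycles=2)))
# ['task for supplier-1', 'task for supplier-1']
```

Temporal adds durability on top of this pattern: if a worker crashes mid-cycle, the workflow resumes from its recorded history instead of starting over.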

For more details on how the Apps interact, please read the Apps Readme.
