Hyperparameters optimisation

Hi! Thanks a lot for this package. I am interested in how best to choose values of the hyperparameters. There are five of them that seem particularly relevant:

1. _d_: the number of hash functions, used to initialize the LSH forest data structure, by default 128.
2. _l_: the number of prefix trees, used to initialize the LSH forest data structure, by default 8.
3. _k_: the number of nearest neighbors used to create the _k_-nearest neighbor graph, by default 10.
4. $k_c$: the scalar by which _k_ is multiplied before querying the LSH forest, by default 10.
5. _p_: the size of the nodes, which affects the magnitude of their repelling force, by default 1/65.

The first two parameters are from `tmap.LSHForest` and their default values are defined [here](https://github.com/reymond-group/tmap/blob/c74b718a86843292ab6aad91b99196b0133faac9/src/_tmap/lshforest.hh#L125). The remaining parameters are from `tmap.layout_from_lsh_forest` and their default values are defined [here](https://github.com/reymond-group/tmap/blob/c74b718a86843292ab6aad91b99196b0133faac9/src/_tmap/layout.hh#L151).
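To make the split between the two groups concrete, here is a minimal sketch that keeps the five values in one place and separates the `tmap.LSHForest` arguments from the layout settings. `TmapParams` and `apply_layout_fields` are hypothetical helpers of mine, not part of tmap, and the field names `k`, `kc`, and `node_size` are my assumption about how the layout configuration object names these settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmapParams:
    """Hypothetical container for the five hyperparameters listed above."""
    d: int = 128                # number of hash functions (LSH forest)
    l: int = 8                  # number of prefix trees (LSH forest)
    k: int = 10                 # neighbours per node in the k-NN graph
    kc: int = 10                # scalar multiplied with k before querying
    node_size: float = 1 / 65   # p: node size, affects the repelling force

    def lsh_forest_args(self):
        """Positional arguments for the LSH forest: (d, l)."""
        return (self.d, self.l)

    def layout_fields(self):
        """Settings that belong to the layout step (names are assumed)."""
        return {"k": self.k, "kc": self.kc, "node_size": self.node_size}

    def query_size(self):
        """k scaled by kc, i.e. the value used when querying the forest."""
        return self.k * self.kc


def apply_layout_fields(cfg, params):
    """Copy the layout settings onto any configuration-like object."""
    for name, value in params.layout_fields().items():
        setattr(cfg, name, value)
    return cfg
```

With tmap installed, I would expect this to be used roughly as `lf = tmap.LSHForest(*params.lsh_forest_args())` and `cfg = apply_layout_fields(tmap.LayoutConfiguration(), params)`, but I have not checked those names against the pinned commit. With the defaults, `query_size()` is 10 × 10 = 100.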

From the supplement (https://ndownloader.figstatic.com/files/21710592) it seems that _p_ is particularly important (cf. figures S1, S2, S3, and S7). I often see tmap visualizations that are too sparse: some branches are very long while others are very short (e.g., near the leaves). The paper and its hyperparameter analysis are now four years old. Has anyone who has used this tool extensively and experimented with these hyperparameters developed rules of thumb for choosing them, especially _p_, for example as a function of the number of data points, and perhaps also of the approximate number of suspected clusters?
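In the absence of a known rule of thumb, one option I can think of is a simple sweep over candidate values of _p_. The sketch below scores each layout by the coefficient of variation of its edge lengths, which is only my own proxy for "some branches very long, some very short" and does not come from the paper. `layout_fn` is a hypothetical wrapper that the user would write around `tmap.layout_from_lsh_forest`; it is assumed to return node coordinates `x`, `y` and edge endpoints `s`, `t`.

```python
import math
import statistics


def edge_length_cv(x, y, s, t):
    """Coefficient of variation (std / mean) of the edge lengths."""
    lengths = [math.hypot(x[a] - x[b], y[a] - y[b]) for a, b in zip(s, t)]
    if not lengths:
        return float("inf")
    mean = statistics.fmean(lengths)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(lengths) / mean


def sweep_node_size(layout_fn, candidates):
    """Score every candidate p and return (best_p, all_scores).

    layout_fn(p) must return (x, y, s, t) for node size p.
    Lower score means more uniform edge lengths.
    """
    scores = {p: edge_length_cv(*layout_fn(p)) for p in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

A call such as `sweep_node_size(my_layout, [1/20, 1/40, 1/65, 1/100])` (the candidate values are arbitrary placeholders) would then compare a handful of settings on the same data. Uniform edge lengths are not necessarily what makes a tree readable, so the score is a starting point for visual inspection rather than something to optimise blindly.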

