Currently the way we implement the sp_tasks::spawn runtime API (which is roughly equivalent to a std::thread::spawn, just called from within the runtime and without any shared memory between the caller and the callee) has a few problems from what I can see:
- it is unbound**: you could exhaust system resources by spawning enough tasks
- it uses
spawn instead of spawn_blocking yet it is blocking (it uses a blocking channel, and calling into the runtime is also effectively a blocking operation)
- you can call it recursively without any limit
It can deadlock the whole async loop when called enough times in parallel, or called enough times recursively.
Its docs also say that it's dangerous when used incorrectly.
(** - although due to it using spawn you'll probably exhaust all of the tokio worker threads and deadlock the whole process first)
What we should do to fix this:
- Soft-limit the number of tasks which can be concurrently executed; add the rest into a queue. There's no point in supporting an unbound number of tasks executing at the same time since calling into the runtime is an inherently blocking operation, and you don't have an infinite number of CPU cores to run them all anyway. We should pick a preset number of supported parallel tasks and execute only at most that many simultaneously. (This also reduces the risk of triggering network partitions if this is used in a place which can affect consensus.)
- Hard-limit the number of levels the tasks can be spawned recursively, or (preferably) even just disallow it completely; otherwise you'll still risk a deadlock
It would also be nice to add a hard-limit for the number of tasks which can be queued at the same time, but unfortunately that could be a source of non-determinism depending on how fast the already queued tasks are executed; we could however add a potential hard-limit to the total number of tasks.
Fortunately from what I can see no one currently really uses this API? (At least it isn't used in polkadot, if my grepping is correct; not sure about other chains though.)
Currently the way we implement the
sp_tasks::spawnruntime API (which is roughly equivalent to astd::thread::spawn, just called from within the runtime and without any shared memory between the caller and the callee) has a few problems from what I can see:spawninstead ofspawn_blockingyet it is blocking (it uses a blocking channel, and calling into the runtime is also effectively a blocking operation)It can deadlock the whole async loop when called enough times in parallel, or called enough times recursively.
Its docs also say that it's dangerous when used incorrectly.
(** - although due to it using
spawnyou'll probably exhaust all of the tokio worker threads and deadlock the whole process first)What we should do to fix this:
It would also be nice to add a hard-limit for the number of tasks which can be queued at the same time, but unfortunately that could be a source of non-determinism depending on how fast the already queued tasks are executed; we could however add a potential hard-limit to the total number of tasks.
Fortunately from what I can see no one currently really uses this API? (At least it isn't used in
polkadot, if my grepping is correct; not sure about other chains though.)