DASH implementation (linear regression for genetics) #2658
LaRiffle merged 24 commits into OpenMined:master from
Conversation
syft/frameworks/torch/linalg/lr.py
Outdated
| Args: | ||
| crypto_provider: a BaseWorker providing crypto elements for ASTs such as | ||
| Beaver triples | ||
| hbc_worker: The "Honest but Curious" BaseWorker |
Can you put a few lines explaining the role of the hbc_worker?
I tried to give an explanation of its role in both algorithms here, and also used the definition of "Honest but Curious" I found here.
What do you think?
| # Identity Matrix | ||
| I = torch.zeros_like(R) | ||
| for i in range(N): | ||
| I[i, i] += 1 |
You can use torch.eye I think
The idea here is to build an identity matrix with the same workers, crypto_provider and precision fractional as R.
For that, I would need either a torch.eye_like (which unfortunately doesn't exist), or I have to do it like this to keep it clean.
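A plain-Python sketch of the point above (no torch or syft here; nested lists stand in for tensors): starting from a zeros-like copy of R and adding 1 on the diagonal keeps whatever the zeros-like constructor carries over from R (in the PR, its workers, crypto_provider and fixed precision), which a bare torch.eye would not. The helper name `eye_like` is hypothetical.

```python
def eye_like(R):
    """Identity matrix with the same shape as the square matrix R."""
    n = len(R)
    # Stand-in for torch.zeros_like(R): a zero matrix "derived from" R,
    # which in the real code preserves R's workers and fixed precision.
    I = [[0 for _ in row] for row in R]
    for i in range(n):
        I[i][i] += 1
    return I

R = [[3.0, 1.0], [4.0, 1.5]]
print(eye_like(R))  # [[1, 0], [0, 1]]
```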
syft/frameworks/torch/linalg/lr.py
Outdated
| # Secret-share tensors between hbc_worker, crypto_provider and a random worker | ||
| # and compute aggregates. It corresponds to the Combine stage of DASH's algorithm | ||
| idx = random.randint(0, len(self.workers) - 1) | ||
| XX_shared = sum(self._share_ptrs(XX_ptrs, idx)).sum(dim=0) |
You're doing the sum twice here
If you look at the notebook with the DASH implementation I added to Bloom's repo at his request, you will see that when we build the XXs matrices privately in the Compress step, we sum over the rows, and at the Combine step we sum the XXs.
Here I am just doing the commuted operation: first I sum the XXs, then I sum the result over the rows.
So the sum here is not redundant, it's needed.
Finally, I decided to do it in the same order as in the notebook implementation, so now the sum over axis 0 is done in the call to the method _remote_dot_products. I have double-checked and the results are the same, but it's better to be in line with the notebook implementation.
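A quick plain-Python check of the commuted operation discussed above: summing the per-worker XX matrices first and then reducing over dim 0 gives the same result as reducing each matrix over dim 0 first and then summing. Nested lists stand in for tensors.

```python
def add_mats(a, b):
    """Elementwise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sum_dim0(m):
    """Sum a matrix over its rows (torch's .sum(dim=0))."""
    return [sum(col) for col in zip(*m)]

mats = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # toy per-worker XX matrices

# Combine first, then reduce over dim 0.
total = mats[0]
for m in mats[1:]:
    total = add_mats(total, m)
a = sum_dim0(total)

# Reduce each over dim 0 first, then combine.
b = [sum(v) for v in zip(*[sum_dim0(m) for m in mats])]

print(a, b)  # [16, 20] [16, 20]
```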
| # and compute aggregates. It corresponds to the Combine stage of DASH's algorithm | ||
| idx = random.randint(0, len(self.workers) - 1) | ||
| XX_shared = sum(self._share_ptrs(XX_ptrs, idx)).sum(dim=0) | ||
| Xy_shared = sum(self._share_ptrs(Xy_ptrs, idx)) |
Why wouldn't you do the sum before sharing the values?
Because I need all the values in the same 3 workers for the SecureNN 3-party computation (crypto_provider, hbc_worker and a random worker from the pool).
Before self._share_ptrs, the tensors in the XX_ptrs list are not secret-shared among the same workers.
Oh yes alright, thanks for pointing this out!
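A toy additive-secret-sharing sketch of why the re-sharing matters: share-wise addition only makes sense when every value is split across the same set of parties, since each party must add its own shares locally. Plain integers and a toy modulus here; the real SecureNN protocol uses a large ring and fixed-point encoding.

```python
import random

Q = 2**31  # toy modulus (assumption; the real ring size differs)

def share(x, n_parties=3):
    """Split x into n_parties additive shares that sum to x mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

values = [10, 20, 12]
# All values shared among the SAME 3 parties...
shared = [share(v) for v in values]
# ...so each party can sum its own shares locally.
summed = [sum(party_shares) % Q for party_shares in zip(*shared)]
print(reconstruct(summed))  # 42
```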
| # Need the line below to perform inverse of a number in MPC | ||
| inv_denominator = ((0 * denominator + 1) / denominator).squeeze() | ||
| coef_shared = (Xy_shared - QX.t() @ Qy).squeeze() * inv_denominator |
Can't you spare the line inv_denominator = ... by doing
coef_shared = (
(Xy_shared - QX.t() @ Qy) / denominator
).squeeze()
If you look at the computations of coef_shared and sigma2_shared, I need to do two divisions by the same denominator.
By computing inv_denominator and using multiplication instead of division to compute coef_shared and sigma2_shared, I optimize the execution time by having only one division overall.
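A plain-Python sketch of this trade-off: division is the expensive primitive (in MPC it costs a multi-round protocol), so computing the inverse once and multiplying twice spends one division instead of two. The counting wrapper below is purely illustrative; all names are hypothetical.

```python
class Counted:
    """Wraps a float and counts how many divisions are performed."""
    divisions = 0

    def __init__(self, v):
        self.v = v

    def __truediv__(self, other):
        Counted.divisions += 1  # division is the costly operation
        return Counted(self.v / other.v)

    def __mul__(self, other):  # multiplication is comparatively cheap
        return Counted(self.v * other.v)

denominator = Counted(4.0)
numer_coef, numer_sigma2 = Counted(8.0), Counted(2.0)  # toy numerators

inv_denominator = Counted(1.0) / denominator  # the single division
coef = numer_coef * inv_denominator
sigma2 = numer_sigma2 * inv_denominator

print(coef.v, sigma2.v, Counted.divisions)  # 2.0 0.5 1
```

Note that in the PR the inverse is written `(0 * denominator + 1) / denominator` rather than `1 / denominator`, so that the numerator 1 is itself a shared fixed-precision tensor on the right workers.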
Implemented the Distributed Association Scan Hammer (DASH) algorithm. This algorithm is used in the context of linear regression for genetics. It corresponds to section 4 of Jonathan Bloom's paper.
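For readers new to the thread, a hedged plain-Python sketch of the statistic being computed: ordinary least squares via the normal equations, using only the aggregates X^T X and X^T y that can be formed before the Combine stage. The PR does this under MPC with fixed-precision shared tensors; this sketch is cleartext and limited to a 2x2 system for brevity.

```python
def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(ra, cb)) for cb in bt] for ra in a]

def solve2x2(A, b):
    """Solve A @ beta = b for a 2x2 system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0][0] - A[0][1] * b[1][0]) / det,
            (A[0][0] * b[1][0] - A[1][0] * b[0][0]) / det]

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # intercept + one covariate
y = [[2.0], [4.0], [6.0]]                 # exactly y = 2 * x

XtX = matmul(transpose(X), X)  # the aggregate each site can compute locally
Xty = matmul(transpose(X), y)
print(solve2x2(XtX, Xty))  # [0.0, 2.0]  -> intercept 0, slope 2
```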
TODO: