DASH implementation (linear regression for genetics) #2658
LaRiffle merged 24 commits into OpenMined:master from
Conversation
syft/frameworks/torch/linalg/lr.py
Outdated
| Args: | ||
| crypto_provider: a BaseWorker providing crypto elements for ASTs such as | ||
| Beaver triples | ||
| hbc_worker: The "Honest but Curious" BaseWorker |
Can you put a few lines explaining the role of the hbc_worker?
I tried to give an explanation of its role in both algorithms here, and also used the definition of "Honest but Curious" I found here.
What do you think?
| # Identity Matrix | ||
| I = torch.zeros_like(R) | ||
| for i in range(N): | ||
| I[i, i] += 1 |
You can use torch.eye I think
The idea here is to build an identity matrix with the same workers, crypto_provider and precision fractional as R.
For that, I would need either a torch.eye_like (which unfortunately doesn't exist), or I have to do it like this to keep it clean.
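A plain-Python sketch of the point above (no torch or syft here; nested lists stand in for tensors): starting from a zeros-like copy of R and adding 1 on the diagonal keeps whatever the zeros-like constructor carries over from R (in the PR, its workers, crypto_provider and fixed precision), which a bare torch.eye would not. The helper name `eye_like` is hypothetical.

```python
def eye_like(R):
    """Identity matrix with the same shape as the square matrix R."""
    n = len(R)
    # Stand-in for torch.zeros_like(R): a zero matrix "derived from" R,
    # which in the real code preserves R's workers and fixed precision.
    I = [[0 for _ in row] for row in R]
    for i in range(n):
        I[i][i] += 1
    return I

R = [[3.0, 1.0], [4.0, 1.5]]
print(eye_like(R))  # [[1, 0], [0, 1]]
```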
syft/frameworks/torch/linalg/lr.py
Outdated
| # Secret-share tensors between hbc_worker, crypto_provider and a random worker | ||
| # and compute aggregates. It corresponds to the Combine stage of DASH's algorithm | ||
| idx = random.randint(0, len(self.workers) - 1) | ||
| XX_shared = sum(self._share_ptrs(XX_ptrs, idx)).sum(dim=0) |
You're doing the sum twice here
If you look at the notebook with the DASH implementation I added to Bloom's repo at his request, you will see that when we build the XXs matrices privately in the Compress step, we sum over the rows, and at the Combine step we sum the XXs.
Here I am just doing the commuted operation: first I sum the XXs, then I sum the result over the rows.
So the sum here is not redundant, it's needed.
Finally, I decided to do it in the same order as in the notebook implementation, so now the sum over axis 0 is done in the call to the method _remote_dot_products. I have double-checked and the results are the same, but it's better to be in line with the notebook implementation.
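A quick plain-Python check of the commuted operation discussed above: summing the per-worker XX matrices first and then reducing over dim 0 gives the same result as reducing each matrix over dim 0 first and then summing. Nested lists stand in for tensors.

```python
def add_mats(a, b):
    """Elementwise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sum_dim0(m):
    """Sum a matrix over its rows (torch's .sum(dim=0))."""
    return [sum(col) for col in zip(*m)]

mats = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # toy per-worker XX matrices

# Combine first, then reduce over dim 0.
total = mats[0]
for m in mats[1:]:
    total = add_mats(total, m)
a = sum_dim0(total)

# Reduce each over dim 0 first, then combine.
b = [sum(v) for v in zip(*[sum_dim0(m) for m in mats])]

print(a, b)  # [16, 20] [16, 20]
```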
| # and compute aggregates. It corresponds to the Combine stage of DASH's algorithm | ||
| idx = random.randint(0, len(self.workers) - 1) | ||
| XX_shared = sum(self._share_ptrs(XX_ptrs, idx)).sum(dim=0) | ||
| Xy_shared = sum(self._share_ptrs(Xy_ptrs, idx)) |
Why wouldn't you do the sum before sharing the values?
Because I need all the values in the same 3 workers for the SecureNN 3-party computation (crypto_provider, hbc_worker and a random worker from the pool).
Before self._share_ptrs, the tensors in the XX_ptrs list are not secret-shared among the same workers.
Oh yes alright, thanks for pointing this out!
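A toy additive-secret-sharing sketch of why the re-sharing matters: share-wise addition only makes sense when every value is split across the same set of parties, since each party must add its own shares locally. Plain integers and a toy modulus here; the real SecureNN protocol uses a large ring and fixed-point encoding.

```python
import random

Q = 2**31  # toy modulus (assumption; the real ring size differs)

def share(x, n_parties=3):
    """Split x into n_parties additive shares that sum to x mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

values = [10, 20, 12]
# All values shared among the SAME 3 parties...
shared = [share(v) for v in values]
# ...so each party can sum its own shares locally.
summed = [sum(party_shares) % Q for party_shares in zip(*shared)]
print(reconstruct(summed))  # 42
```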
| # Need the line below to perform inverse of a number in MPC | ||
| inv_denominator = ((0 * denominator + 1) / denominator).squeeze() | ||
| coef_shared = (Xy_shared - QX.t() @ Qy).squeeze() * inv_denominator |
Can't you spare the line inv_denominator = ... by doing
coef_shared = (
(Xy_shared - QX.t() @ Qy) / denominator
).squeeze()
If you look at the computations of coef_shared and sigma2_shared, I need to do two divisions by the same denominator.
By computing inv_denominator and using multiplication instead of division to compute coef_shared and sigma2_shared, I optimize the execution time by having only one division overall.
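A plain-Python sketch of this trade-off: division is the expensive primitive (in MPC it costs a multi-round protocol), so computing the inverse once and multiplying twice spends one division instead of two. The counting wrapper below is purely illustrative; all names are hypothetical.

```python
class Counted:
    """Wraps a float and counts how many divisions are performed."""
    divisions = 0

    def __init__(self, v):
        self.v = v

    def __truediv__(self, other):
        Counted.divisions += 1  # division is the costly operation
        return Counted(self.v / other.v)

    def __mul__(self, other):  # multiplication is comparatively cheap
        return Counted(self.v * other.v)

denominator = Counted(4.0)
numer_coef, numer_sigma2 = Counted(8.0), Counted(2.0)  # toy numerators

inv_denominator = Counted(1.0) / denominator  # the single division
coef = numer_coef * inv_denominator
sigma2 = numer_sigma2 * inv_denominator

print(coef.v, sigma2.v, Counted.divisions)  # 2.0 0.5 1
```

Note that in the PR the inverse is written `(0 * denominator + 1) / denominator` rather than `1 / denominator`, so that the numerator 1 is itself a shared fixed-precision tensor on the right workers.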
Implemented the Distributed Association Scan Hammer (DASH) algorithm. This algorithm is used in the context of linear regression for genetics. It corresponds to section 4 of Jonathan Bloom's paper.
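For readers new to the thread, a hedged plain-Python sketch of the statistic being computed: ordinary least squares via the normal equations, using only the aggregates X^T X and X^T y that can be formed before the Combine stage. The PR does this under MPC with fixed-precision shared tensors; this sketch is cleartext and limited to a 2x2 system for brevity.

```python
def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(ra, cb)) for cb in bt] for ra in a]

def solve2x2(A, b):
    """Solve A @ beta = b for a 2x2 system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0][0] - A[0][1] * b[1][0]) / det,
            (A[0][0] * b[1][0] - A[1][0] * b[0][0]) / det]

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # intercept + one covariate
y = [[2.0], [4.0], [6.0]]                 # exactly y = 2 * x

XtX = matmul(transpose(X), X)  # the aggregate each site can compute locally
Xty = matmul(transpose(X), y)
print(solve2x2(XtX, Xty))  # [0.0, 2.0]  -> intercept 0, slope 2
```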
TODO: