DASH implementation (linear regression for genetics)#2658

Merged
LaRiffle merged 24 commits into OpenMined:master from andrelmfarias:DASH
Dec 19, 2019
Conversation

@andrelmfarias
Contributor

@andrelmfarias andrelmfarias commented Oct 11, 2019

Implemented Distributed Association Scan Hammer (DASH) algorithm. This algorithm is used in the context of Linear Regression for genetics. It corresponds to section 4 of Jonathan Bloom's paper.

TODO:

  • Implement inversion of upper triangular matrix in SMPC using forward/back substitution: To be discussed, it requires several divisions...
  • Implement tests

@andrelmfarias changed the title from "[WIP] DASH implementation (linear regression for genetics)" to "DASH implementation (linear regression for genetics)" Oct 24, 2019
@robert-wagner robert-wagner changed the base branch from dev to master October 26, 2019 18:17
Args:
crypto_provider: a BaseWorker providing crypto elements for ASTs such as
Beaver triples
hbc_worker: The "Honest but Curious" BaseWorker
Contributor

Can you put a few lines explaining the role of the hbc worker?

Contributor Author

I tried to give an explanation of its role in both algorithms here and also used the definition of "Honest but Curious" I found here.

What do you think?

Contributor

Ok all good in that case

# Identity Matrix
I = torch.zeros_like(R)
for i in range(N):
    I[i, i] += 1
Contributor

You can use torch.eye I think

Contributor Author

The idea here is to build an identity matrix with the same workers, crypto_provider and precision fractional as R.

For that, I would need either a torch.eye_like (which unfortunately doesn't exist) or to do it like this to keep things clean.
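The pattern being discussed can be sketched as a small helper (the name eye_like is hypothetical, and this shows plain torch only, without the syft sharing/precision wrapping from the PR):

```python
import torch

def eye_like(R):
    # torch has no torch.eye_like, so start from zeros_like (which copies
    # R's shape, dtype and device, and would preserve a syft tensor's
    # workers/precision chain) and place ones on the diagonal in place.
    I = torch.zeros_like(R)
    for i in range(min(R.shape)):
        I[i, i] += 1
    return I
```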

Contributor

👍

# Secret-share tensors between hbc_worker, crypto_provider and a random worker
# and compute aggregates. It corresponds to the Combine stage of DASH's algorithm
idx = random.randint(0, len(self.workers) - 1)
XX_shared = sum(self._share_ptrs(XX_ptrs, idx)).sum(dim=0)
Contributor

You're doing the sum twice here

Contributor Author

If you look at the notebook with the DASH implementation I added to Bloom's repo at his request, you will see that when we build the XXs matrices privately in the Compress step, we do a sum over the rows, and at the Combine step we sum the XXs.
Here I am just doing the commuted operation: first I sum the XXs, then I sum the result over the rows.
So the sum here is not redundant, it's needed.
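The commutativity argument above can be checked with a toy example in plain Python (nested lists standing in for the shared XX tensors; the values are made up):

```python
# Two 2x2 "XX" matrices, as if produced by two different workers.
xx_parts = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Order A (notebook): sum over rows per worker (Compress),
# then sum across workers (Combine).
row_sums = [[sum(col) for col in zip(*m)] for m in xx_parts]
combined_a = [sum(vals) for vals in zip(*row_sums)]

# Order B (this PR): sum across workers first, then sum over rows.
summed = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(*xx_parts)]
combined_b = [sum(col) for col in zip(*summed)]

assert combined_a == combined_b  # the two orders agree
```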

Contributor Author

Finally, I decided to do it in the same order as in the notebook implementation, so now the sum over axis 0 is done in the call of the method _remote_dot_products. I have double-checked and the results are the same, but it's better to be in line with the notebook implementation.

# and compute aggregates. It corresponds to the Combine stage of DASH's algorithm
idx = random.randint(0, len(self.workers) - 1)
XX_shared = sum(self._share_ptrs(XX_ptrs, idx)).sum(dim=0)
Xy_shared = sum(self._share_ptrs(Xy_ptrs, idx))
Contributor

Why wouldn't you do the sum before sharing the values?

Contributor Author

Because I need all the values to be held by the same 3 workers for the SecureNN 3-party computation (crypto_provider, hbc_worker and a random worker from the pool).

Before self._share_ptrs, the tensors in the XX_ptrs list are not secret-shared among the same workers.
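Why the operands must first live on the same parties can be seen with a toy additive-sharing sketch (plain Python, not the syft API): share-wise addition only works when each party holds one share of every operand.

```python
import random

def share(value, n_parties=3):
    # Toy additive secret sharing: split a value into n shares that
    # sum back to it; each party would hold exactly one share.
    shares = [random.randint(-1000, 1000) for _ in range(n_parties - 1)]
    shares.append(value - sum(shares))
    return shares

# Once two values are shared among the SAME three parties, adding them
# is a local operation: each party adds the two shares it holds.
a_shares, b_shares = share(7), share(5)
sum_shares = [a + b for a, b in zip(a_shares, b_shares)]
assert sum(sum_shares) == 7 + 5  # reconstruction yields the true sum
```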

Contributor

Oh yes alright, thanks for pointing this out!

# Need the line below to perform inverse of a number in MPC
inv_denominator = ((0 * denominator + 1) / denominator).squeeze()

coef_shared = (Xy_shared - QX.t() @ Qy).squeeze() * inv_denominator
Contributor

Can't you spare the line inv_denominator = ... by doing

coef_shared = (
 (Xy_shared - QX.t() @ Qy) / denominator
).squeeze()

Contributor Author

If you look at the computations of coef_shared and sigma2_shared, I need to do 2 divisions by the same denominator.

By computing inv_denominator and using multiplication instead of division to compute coef_shared and sigma2_shared, I optimize execution time by having only one division overall.
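The optimization can be sketched with plain floats (in the PR the same algebra runs on fixed-precision shared tensors, where each division is an expensive multi-round protocol; the numeric values here are illustrative only):

```python
denominator = 4.0

# The `0 * denominator + 1` trick builds a "one" carrying the same
# wrapping (workers, precision) as denominator; with floats it is 1.0.
inv_denominator = (0 * denominator + 1) / denominator  # the only division

coef = 10.0 * inv_denominator    # replaces 10.0 / denominator
sigma2 = 6.0 * inv_denominator   # second use costs only a multiplication
```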

Contributor

That's right, my bad!

@LaRiffle LaRiffle merged commit 9fd0309 into OpenMined:master Dec 19, 2019