414 long running time of sample posterior predictive and eventual death by oom by AlexanderFengler · Pull Request #436 · lnccbrown/HSSM

AlexanderFengler · 2024-05-20T02:56:13Z

Posterior predictive sampling now has a safe_mode that chunks computations.
The n_samples argument was renamed to draws, and one can pass None | int | list | np.ndarray.
When running posterior predictive with kind='mean', the posterior naming is cleaned up: rt,response_mean --> v.
Prior predictives now get assigned to .traces, and their naming is also cleaned up.
Our sample_prior_predictive() now also includes the parent parameter, via an internal call to .predict().

AlexanderFengler · 2024-05-20T02:57:33Z

I will try to add a few more tests to this before merging.

digicosmos86

Looks good! Two higher-level comments:

Since the only call to simulator is done here:

HSSM/src/hssm/distribution_utils/dist.py

Lines 309 to 315 in 19f786d

    
    sim_out = simulator(
        theta=theta,
        model=model_name,
        n_samples=n_samples,
        random_state=seed,
        **kwargs,
    )

, maybe we can use a for-loop over n_samples here to make the sampling safe, instead of patching the higher-level functions themselves? This way we can avoid running a lot of intermediate-level code multiple times.
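The for-loop idea could look roughly like the sketch below. It is only an illustration under stated assumptions: toy_simulator is a hypothetical stand-in for the real simulator, and chunk_size and the per-call seed offset are assumed choices, not anything taken from the HSSM code. It shows one large n_samples request being split into several smaller calls whose outputs are concatenated.

```python
import random


def toy_simulator(theta: float, n_samples: int, random_state: int) -> list[float]:
    """Hypothetical stand-in for the real simulator: noisy values around theta."""
    rng = random.Random(random_state)
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def simulate_in_chunks(
    theta: float, n_samples: int, seed: int, chunk_size: int = 10
) -> list[float]:
    """Call the simulator repeatedly so no single call exceeds chunk_size samples."""
    out: list[float] = []
    for call_index, start in enumerate(range(0, n_samples, chunk_size)):
        n_now = min(chunk_size, n_samples - start)
        # Offset the seed per call (assumed choice) so chunks are not identical.
        out.extend(toy_simulator(theta, n_now, random_state=seed + call_index))
    return out


samples = simulate_in_chunks(theta=0.5, n_samples=25, seed=42)
```

Looping at this level keeps the chunking next to the single simulator call, which is the point made above about not re-running the intermediate-level code for every chunk.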

The InferenceData object does not come with attributes like posterior or posterior_predictive by default, so the type checker complains. Using square-bracket notation is preferred. Or, if this is too annoying, we can disable this check (attr-defined) globally in the mypy section of pyproject.toml, but that can be a bit risky.

digicosmos86 · 2024-05-20T13:31:12Z

+
+        if "posterior_predictive" in idata.groups():
+            del idata.posterior_predictive
+            print("pre-existing posterior_predictive group deleted from idata. \n")


This should be a warning
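A minimal sketch of what emitting a warning instead of printing could look like. The drop_group helper and the plain dict standing in for idata are hypothetical, introduced only for illustration; warnings.warn is the standard-library call.

```python
import warnings


def drop_group(groups: dict[str, object], name: str) -> None:
    """Delete a group from a plain dict (toy stand-in for idata) and warn about it."""
    if name in groups:
        del groups[name]
        warnings.warn(
            f"pre-existing {name} group deleted from idata.",
            UserWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )


groups = {"posterior": object(), "posterior_predictive": object()}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    drop_group(groups, "posterior_predictive")
```

Unlike print, a warning can be filtered, turned into an error in tests, or captured as shown here. A module-level logger's warning() method would be another reasonable option.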

digicosmos86

Looks awesome! Just some style suggestions at this point. Feel free to merge after the fixes :)

digicosmos86 · 2024-05-21T13:54:02Z

+        if "posterior_predictive" in idata.groups():
+            if idata is not None:


Should the order be the other way around?

@digicosmos86 changed. This was useless to begin with, just an artifact appeasing mypy...

digicosmos86 · 2024-05-22T17:57:15Z

 from inspect import isclass
 from os import PathLike
-from typing import Any, Callable, Literal
+from typing import Any, Callable, Literal, Union


We don't use Union any more. Now that we have Python 3.10, we use the | operator instead

digicosmos86 · 2024-05-22T17:58:24Z

            self.model, self._parent_param, self.response_c, self.response_str
        )
        self.set_alias(self._aliases)
+        # _logger.info(self.pymc_model.initial_point())


Should we remove debug comments?

Stylistically, eventually yes, but right now I think it can sometimes still help future PRs that interact with this code. Here I literally have the next PR that I need to work on in mind. So in general I agree, but let's skip it here :)

digicosmos86 · 2024-05-22T17:58:33Z

        if self._inference_obj is not None:
            if self._parent not in self._inference_obj.posterior.data_vars.keys():
-                self.model.predict(self._inference_obj, kind="mean", inplace=True)
+                # self.model.predict(self._inference_obj, kind="mean", inplace=True)


Should we remove debug comments?

digicosmos86 · 2024-05-22T18:01:15Z

+                self._parent in self._inference_obj.posterior.data_vars.keys()
+                and "rt,response_mean" in self._inference_obj.posterior.data_vars.keys()


data_vars is dict-like, so the Python 3 style is to not use keys()
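A short illustration of the point, using a plain dict as a stand-in for data_vars (the variable names and values here are made up):

```python
data_vars = {"v": [0.1, 0.2], "rt,response_mean": [0.3, 0.4]}

# Membership tests on a mapping check its keys directly.
with_keys = "v" in data_vars.keys()
without_keys = "v" in data_vars

# Both forms agree; the second is the shorter, more idiomatic spelling.
both_present = "v" in data_vars and "rt,response_mean" in data_vars
```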

digicosmos86 · 2024-05-22T18:05:14Z

+            and not np.allclose(draws, idata["posterior"].draw.values)
+        ):
+            # Reassign posterior to sub-sampled version
+            setattr(idata_copy, "posterior", idata["posterior"].isel(draw=draws))


Are there any differences between setattr() and idata.add_groups()?

to be honest I don't know... let me look into that independently to understand it properly.

Actually, at least as used here semantically, add_groups is about adding new groups, while setattr is about reassigning a pre-existing group.

digicosmos86 · 2024-05-22T18:07:58Z

+            if safe_mode:
+                # safe mode splits the draws into chunks of 10 to avoid
+                # memory issues (TODO: Figure out the source of memory issues)
+                split_draws = _split_array(
+                    idata_copy["posterior"].draw.values, divisor=10
+                )
+
+                posterior_predictive_list = []
+                for samples_tmp in split_draws:
+                    tmp_posterior = idata["posterior"].sel(draw=samples_tmp)
+                    setattr(idata_copy, "posterior", tmp_posterior)
+                    self.model.predict(
+                        idata_copy, kind, data, True, include_group_specific
+                    )
+                    posterior_predictive_list.append(idata_copy["posterior_predictive"])
+
+                if inplace:
+                    idata.add_groups(
+                        posterior_predictive=xr.concat(
+                            posterior_predictive_list, dim="draw"
+                        )
+                    )
+                    # for inplace, we don't return anything
+                    return None
+                else:
+                    # Reassign original posterior to idata_copy
+                    setattr(idata_copy, "posterior", idata["posterior"])
+                    # Add new posterior predictive group to idata_copy
+                    del idata_copy["posterior_predictive"]
+                    idata_copy.add_groups(
+                        posterior_predictive=xr.concat(
+                            posterior_predictive_list, dim="draw"
+                        )
+                    )
+                    return idata_copy
+            elif inplace:
+                # If not safe-mode
+                # We call .predict() directly without any
+                # chunking of data.
+
+                # .predict() is called on the copy of idata
+                # since we still subsampled (or assigned) the draws
                self.model.predict(idata_copy, kind, data, True, include_group_specific)
+
+                # posterior predictive group added to idata
                idata.add_groups(
                    posterior_predictive=idata_copy["posterior_predictive"]
                )
-
+                # don't return anything if inplace
                return None
-
+            else:
+                # Not safe mode and not inplace
+                # Function acts as very thin wrapper around
+                # .predict(). It just operates on the
+                # idata_copy object
+                return self.model.predict(
+                    idata_copy, kind, data, inplace, include_group_specific
+                )


This if block looks slightly confusing. I think I understand what you mean, but would

    if safe_mode:
        if inplace:
            ...
        else:
            ...
    else:
        if inplace:
            ...
        else:
            ...

be more readable?

digicosmos86 · 2024-05-22T18:08:45Z

-                idata_copy, kind, data, False, include_group_specific
+                idata, kind, data, inplace, include_group_specific
            )



Add an else clause here to throw an error whenever other values are specified?
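A sketch of the kind of else clause being asked for. The function and its return strings are hypothetical, and the second accepted value ("pps") is assumed here purely for illustration; only kind="mean" appears in this thread.

```python
def predictive_label(kind: str) -> str:
    """Toy dispatcher: only the values listed here are accepted."""
    if kind == "mean":
        return "posterior mean prediction"
    elif kind == "pps":  # assumed second option, for illustration only
        return "posterior predictive samples"
    else:
        # Fail loudly instead of silently falling through and returning None.
        raise ValueError(f"Unsupported kind {kind!r}; expected 'mean' or 'pps'.")
```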

digicosmos86 · 2024-05-22T18:15:01Z

        return var_names

+    def _drop_parent_str_from_idata(
+        self, idata: Union[az.InferenceData, None]


Suggested change

self, idata: Union[az.InferenceData, None]

self, idata: az.InferenceData | None

AlexanderFengler added 3 commits May 19, 2024 17:36

wip

70c57f2

Merge branch 'main' into 388-change-slice-sampler-parameters-to-match-hddm Merging main.

927c3b3

prior predictive extends idata now and parent parameters gets assigned to _mean prediction consistently

aee9d2e

AlexanderFengler requested a review from digicosmos86 May 20, 2024 02:56

AlexanderFengler linked an issue May 20, 2024 that may be closed by this pull request

Long running time of sample_posterior_predictive() and eventual death by OOM #414

Closed

jainraj reviewed May 20, 2024

Comment thread src/hssm/hssm.py

digicosmos86 requested changes May 20, 2024

AlexanderFengler added 2 commits May 20, 2024 22:10

add tests and address final comments

cef7726

fix return type split_array

c290409

AlexanderFengler requested a review from digicosmos86 May 21, 2024 02:14

AlexanderFengler added 3 commits May 21, 2024 22:57

fix tests

2a06b14

drop logging initial point

85321ee

tim

07d1140

digicosmos86 approved these changes May 22, 2024

AlexanderFengler added 2 commits May 22, 2024 17:28

one more round of comments

3094391

add few clarifying comments

c965923

AlexanderFengler merged commit 80c5248 into main May 23, 2024

digicosmos86 deleted the 414-long-running-time-of-sample_posterior_predictive-and-eventual-death-by-oom branch November 28, 2024 17:51

3 participants