Version 0.3.3 crashes kernel with large dataset #75
Description
Hi all,
Thank you for the work on this awesome package!
I am using bambi to fit a fairly complex model on a large dataset with a single group effect, which has many individuals: around 183,000 rows and 30,542 unique groups. Version 0.3.3 reliably crashes Jupyter when instantiating this design matrix. Interestingly, version 0.2.0 will instantiate it, though with the caveat that under 0.2.0 I can't include an interaction term within bambi that I can in 0.3.3 (see bambi issue 495).
I've tried a few different approaches and have found that it will instantiate with about 25-50% of the data (using DataFrame.sample(frac=.25)), so it seems to be an issue of sheer scale more than anything else. I get the same crash in Spyder, so it isn't specific to Jupyter.
The code below will grab a modified dataset of the same size and structure I am using and set up the model design, which kills my kernel after a minute or two.
import pandas as pd
from formulae import design_matrices

# Modified dataset with the same size and structure as mine
trouble = pd.read_feather('https://osf.io/kw2xh/download')

# Kills the kernel on 0.3.3; returns on 0.2.0 after a little while
md = design_matrices('response ~ bs(time, degree=2, df=10) * state + state * level * trait + (1|pid)', data=trouble)
Version 0.2.0 will return the design matrices after a little while, however. Any help is greatly appreciated; hopefully this issue isn't localised to my machine!
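For reference, the subsampling check mentioned above can be sketched as follows. This uses a synthetic frame with illustrative column names and the same rough dimensions as my data (it is not the actual OSF file), just to show how I'm reducing the scale before calling design_matrices:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: many rows, a very large
# number of unique group levels (sizes match my data; values do not).
rng = np.random.default_rng(0)
n_rows, n_groups = 183_000, 30_542
df = pd.DataFrame({
    "response": rng.normal(size=n_rows),
    "time": rng.uniform(0, 10, size=n_rows),
    "pid": rng.integers(0, n_groups, size=n_rows),
})

# A 25% random sample instantiates the design matrices without crashing;
# the full frame does not.
sub = df.sample(frac=0.25, random_state=42)
print(len(sub))              # about a quarter of the rows
print(sub["pid"].nunique())  # still tens of thousands of groups
```

Note that even the 25% sample retains a very large number of unique pid levels, which is why I suspect scale rather than any particular value in the data.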