Skip to content

Version 0.3.3 crashes kernel with large dataset #75

@alexjonesphd

Description

@alexjonesphd

Hi all,

Thank you for the work on this awesome package!

I am using bambi to fit a fairly complex model on a large dataset with a single group effect, which has many individuals. There are around 183,000 rows and 30,542 unique groups. Version 0.3.3 crashes Jupyter reliably when instantiating this design matrix. Interestingly, version 0.2.0 will instantiate it, with the caveat I can't include an interaction term I can in 0.3.3 within bambi (see bambi issue 495).

I've tried a few different approaches, and have found it will instantiate with about 25-50% of the data (using DataFrame.sample(frac=.25)), so it seems more an issue of sheer scale than anything else. I've also tried with Spyder, getting the same issue.

The code below will grab a modified dataset of the same size and structure I am using and set up the model design, which kills my kernel after a minute or two.

import pandas as pd
from formulae import design_matrices

trouble = pd.read_feather('https://osf.io/kw2xh/download')
md = design_matrices('response ~ bs(time, degree=2, df=10) * state + state * level * trait + (1|pid)', data=trouble)

0.2.0 will return this after a little while, however. Any help is greatly appreciated, hopefully this issue isn't localised to my machine!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions