Is this your first time submitting a feature request?
Describe the feature
This relates to a conversation on Slack, which also resulted in issue #7117.
If we allow dbt to hash larger seed files, I think the hashes should be computed incrementally to avoid allocating too much memory at once. This matters most in CI, where memory is more limited.
My idea is to add another classmethod to the FileHash dataclass with a signature analogous to from_contents, taking a file path instead of the contents. This method can then be used specifically for seed file hashes, while from_contents continues to be used alongside it.
@classmethod
def from_path(cls, path: str, name="sha256") -> "FileHash":
    """Create a file hash from the file at the given path."""
    pass
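To make the idea concrete, here is a minimal sketch of how from_path could hash a file in fixed-size chunks. The FileHash class below is a simplified stand-in, not dbt's actual implementation. The chunk_size parameter and its 1 MiB default are assumptions added for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileHash:
    # Simplified stand-in for illustration; the real dataclass may differ.
    name: str
    checksum: str

    @classmethod
    def from_contents(cls, contents: str, name: str = "sha256") -> "FileHash":
        # Hashes the whole string at once, so the full contents sit in memory.
        data = contents.encode("utf-8")
        return cls(name=name, checksum=hashlib.new(name, data).hexdigest())

    @classmethod
    def from_path(
        cls, path: str, name: str = "sha256", chunk_size: int = 1024 * 1024
    ) -> "FileHash":
        """Create a file hash from the file at the given path.

        chunk_size is a hypothetical parameter; only one chunk is held
        in memory at a time.
        """
        hasher = hashlib.new(name)
        with open(path, "rb") as fp:
            # Read until fp.read() returns b"", feeding each chunk to the hasher.
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                hasher.update(chunk)
        return cls(name=name, checksum=hasher.hexdigest())
```

One caveat with this approach: from_path hashes the raw bytes on disk, so it only produces the same checksum as from_contents if the contents passed there are the exact UTF-8 decoding of the file, with no stripping or newline normalization in between.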
Describe alternatives you've considered
The alternative is to keep what we have, which could spike memory usage for users who override the default limit and have many large seed files. Yes, this could be an anti-pattern, but there's no reason to limit the user like that here.
Who will this benefit?
I would argue that the default limit should be increased from 1 MB, but since the PR above makes it possible to override it with an environment variable, I would be fine with keeping the default at 1 MB for now.
Therefore, this will mostly benefit more advanced projects that override the 1 MB limit with the environment variable.
Are you interested in contributing this feature?
Yes, I will create the PR
Anything else?
No response