[CT-2271] [Feature] Compute seed file hashes incrementally #7124

@noppaz

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

This relates to a conversation on Slack which also resulted in issue #7117.

If we allow dbt to hash larger seed files, I think the hashes should be computed incrementally to avoid allocating too much memory. This matters most in CI, where memory is more limited.

My idea is to add another classmethod to the FileHash dataclass that mirrors the signature of from_contents but takes a file path instead of the file contents. This method can then be used specifically for seed file hashes, while from_contents continues to be used everywhere else.

@classmethod
def from_path(cls, path: str, name="sha256") -> "FileHash":
    """Create a file hash from the file at the given path."""
    pass
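To make the proposal concrete, here is a minimal sketch of what an incremental from_path could look like. The FileHash class below is a simplified stand-in written for this example, not dbt's actual implementation, and the 1 MiB chunk size is an arbitrary assumption. The file is read in fixed-size chunks and fed to the hasher, so peak memory stays near the chunk size regardless of how large the seed file is.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileHash:
    # Simplified stand-in for dbt's FileHash; the real class may differ.
    name: str
    checksum: str

    @classmethod
    def from_contents(cls, contents: str, name="sha256") -> "FileHash":
        """Hash a string that is already fully loaded in memory."""
        if name != "sha256":
            raise ValueError(f"Unsupported hash type: {name}")
        checksum = hashlib.sha256(contents.encode("utf-8")).hexdigest()
        return cls(name=name, checksum=checksum)

    @classmethod
    def from_path(cls, path: str, name="sha256") -> "FileHash":
        """Hash the file at the given path in fixed-size chunks."""
        if name != "sha256":
            raise ValueError(f"Unsupported hash type: {name}")
        hasher = hashlib.sha256()
        chunk_size = 1024 * 1024  # 1 MiB per read; an arbitrary choice
        with open(path, "rb") as fp:
            # iter() with a sentinel stops once read() returns b"" at EOF
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                hasher.update(chunk)
        return cls(name=name, checksum=hasher.hexdigest())
```

One caveat with this approach: hashing the raw bytes gives the same checksum as from_contents only if the contents were read as UTF-8 without any transformation (for example stripping whitespace or normalizing newlines). If the existing loader transforms the text, checksums for seed files would change once when switching methods.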

Describe alternatives you've considered

The alternative is to keep the current behavior, which could spike memory usage for users who override the default limit and have many large seed files. Yes, this could be an anti-pattern, but there's no reason to limit the user like that here.

Who will this benefit?

I would argue that the default limit should be increased from 1 MB, but since the above PR makes an environment variable override possible, I would be fine with keeping the default at 1 MB for now.

Therefore, this will mostly benefit more advanced projects that override the 1 MB limit with the environment variable.

Are you interested in contributing this feature?

Yes, I will create the PR

Anything else?

No response

Labels

  • enhancement: New feature or request
  • help_wanted: Trickier changes, with a clear starting point, good for previous/experienced contributors
