fix:handle compound aliases like .tgz when cataloging archives#4421
fix:handle compound aliases like .tgz when cataloging archives#4421spiffcs merged 9 commits intoanchore:mainfrom
.tgz when cataloging archives#4421Conversation
Signed-off-by: Yuntao Hu <victorhu493@gmail.com>
|
Wow TY so much @VictorHuu for the PR. I think there are some other sections where we use Identify and want this fix. I'm going to push some changes here and we'll use this branch as the fix. |
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
|
I'm also going to quickly explore an approach suggested by @willmurphyscode which would pass the reader. This might give us better automatic detection here where we don't have to implement/maintain this map 😄 |
update: file_source.go:213 - switch to stream-based (already opens file later anyway) tar_file_traversal.go:23 - opens the file on line 17, so could pass tarReader defer: unknowns_tasks.go:64 only has coords.RealPath, would need to open files (potential perf hit for many files) model.go:159 isArchive() is a helper - opening files could be more expensive here Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
|
Looks like I need to check the CLI tests now with the new split approach |
tar.gz when cataloging archives.tgz when cataloging archives
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
internal/file/archive_aliases.go
Outdated
| // | ||
| // This function is a drop-in replacement for archives.Identify that centralizes | ||
| // the compound alias handling logic in one place. | ||
| func IdentifyArchive(ctx context.Context, path string) (archives.Format, io.Reader, error) { |
There was a problem hiding this comment.
Previously, do we ever already have a file open when calling archives.Identify? Should this accept a reader rather than trying to open the file?
There was a problem hiding this comment.
I think it's better here if the function handles opening and closing it's own content here. If we pass a reader and it gets used then there might be instances in the code where we expect it to still be available after the function is called and is no longer available. I can check to see if we don't do this, but I thought a more defensive approach would be to create/cleanup its own resources here.
There was a problem hiding this comment.
See comment below - I thought we had discussed that actually reading bytes from the file to check whether its an archive was a behavior change we didn't necessarily want in this bugfix release.
I don't think it's correct to os.Open here, but I'm not clear on when we reach line 42.
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
|
TY @VictorHuu for the help on this one |
…nchore#4421) --------- Signed-off-by: Yuntao Hu <victorhu493@gmail.com> Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com> Co-authored-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com> Signed-off-by: Yuntao Hu <victorhu493@gmail.com>
Description
see: #4416 (comment)
In the near future, it's the library
archiveswho should fix the logic to support compound extension names like.tgzSummary of magic bytes vs extension for archive detection:
update:
file_source.go:213 - switch to stream-based (already opens file later anyway)
tar_file_traversal.go:23 - opens the file on line 17, so could pass tarReader
defer detection and use new map:
unknowns_tasks.go:64 only has coords.RealPath, would need to open files (potential perf hit for many files)
model.go:159 isArchive() is a helper - opening files could be more expensive here
Type of change
Checklist: