Improve packfile reading performance#906
Conversation
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
plumbing/format/idxfile: add new Index and MemoryIndex
Signed-off-by: Javi Fontan <jfontan@gmail.com>
In one case it disables the cache and the other disables lookup when the scanner is not seekable. Could be added back later. Signed-off-by: Javi Fontan <jfontan@gmail.com>
It's still not complete: * 64 bit offsets * IdxChecksum Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
This functionality may be moved elsewhere in the future but is needed now to fit filesystem.ObjectStorage and the new index. Signed-off-by: Javi Fontan <jfontan@gmail.com>
Index is also automatically generated when OnFooter is called. Signed-off-by: Javi Fontan <jfontan@gmail.com>
Now dotgit.PackWriter uses the new packfile.Parser and index. Signed-off-by: Javi Fontan <jfontan@gmail.com>
Feature/new packfile parser
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Bugfixes and IndexStorage
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
plumbing: packfile, new Packfile representation
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Tests and indexes in packfile decoder
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
plumbing/format/packfile/decoder.go
Outdated
| d.writer.Add(obj.Hash(), uint64(h.Offset), crc) | ||
| } | ||
|
|
||
| d.offsetToHash[h.Offset] = obj.Hash() |
There was a problem hiding this comment.
why do we need that having the index?
There was a problem hiding this comment.
You may not have the index built
| }) | ||
| } | ||
|
|
||
| /* |
There was a problem hiding this comment.
should that be uncommented?
| var base plumbing.EncodedObject | ||
| var ok bool | ||
| hash, err := p.FindHash(offset) | ||
| if err == nil { |
There was a problem hiding this comment.
yeah, that's intentional
|
|
||
| func (s *PackfileSuite) TestContent(c *C) { | ||
| storer := memory.NewObjectStorage() | ||
| decoder, err := NewDecoder(NewScanner(s.f.Packfile()), storer) |
There was a problem hiding this comment.
why we are using decoder here?
There was a problem hiding this comment.
To fill the memory storage and check if Packfile decodes objects correctly as decoder does
|
Windows tests seem to be failing because of an access denied removing fixtures 🤷♀️ |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
Just added a benchmark for Before, using Now, using As we can see, Parser is slower. Like, way slower. |
|
New results after some optimizations, now it's faster and uses less memory than decoder. |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
We might want to rename |
|
@erizocosmico Windows issue is likely to be related to a packfile not being closed before the test ends. |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
Added benchmarks for PackfileIter. Before: Now: Now, reading object content: |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
fa697fa to
65dc4f9
Compare
| Children []*objectInfo | ||
| SHA1 plumbing.Hash | ||
|
|
||
| Content []byte |
There was a problem hiding this comment.
I think with an LRU cache for content by offset we can avoid using a lot of memory.
There was a problem hiding this comment.
We'd still need to read objects more times, which is why it was slower before
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
Renamed DiskObject to FSObject and I'm fixing the not closed packfile thing |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
24dc247 to
d93b386
Compare
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
Tests should be passing now. |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
I think it can be reviewed now |
| ) | ||
|
|
||
| const ( | ||
| fanout = 256 |
There was a problem hiding this comment.
the previous fanout was 255, was an error?
There was a problem hiding this comment.
Real fanout size is 256, we were using 255 because the last element is the object count. JGit uses 256 as well https://github.com/eclipse/jgit/blob/6d370d837c5faa7caff2e6e3e4723b887f2fbdca/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/PackIndexV2.java#L67
plumbing/format/idxfile/idxfile.go
Outdated
| Version uint32 | ||
| Fanout [256]uint32 | ||
| // FanoutMapping maps the position in the fanout table to the position | ||
| // in the Names, Offset32 and Crc32 slices. This improves the memory |
plumbing/format/idxfile/idxfile.go
Outdated
| FanoutMapping [256]int | ||
| Names [][]byte | ||
| Offset32 [][]byte | ||
| Crc32 [][]byte |
plumbing/format/idxfile/idxfile.go
Outdated
| Hash plumbing.Hash | ||
| CRC32 uint32 | ||
| Offset uint64 | ||
| func (idx *MemoryIndex) findHashIndex(h plumbing.Hash) int { |
There was a problem hiding this comment.
In go is recommended return a second boolean value
findHashIndex(h plumbing.Hash) (int, bool)
plumbing/format/idxfile/idxfile.go
Outdated
| return idx.getCrc32(k, i) | ||
| } | ||
|
|
||
| func (idx *MemoryIndex) getCrc32(firstLevel, secondLevel int) (uint32, error) { |
| firstLevel, secondLevel int | ||
| } | ||
|
|
||
| func (i *idxfileEntryIter) Next() (*Entry, error) { |
There was a problem hiding this comment.
this function can be split in two
There was a problem hiding this comment.
Which part do you mean? I don't see any way to split this
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
| } | ||
| } | ||
|
|
||
| if delta && !p.scanner.IsSeekable { |
There was a problem hiding this comment.
Shouldn't we use the storer to retrieve objects instead of storing all deltas in memory?
There was a problem hiding this comment.
Some storers, such as the MemoryStorage, do not allow deltas stored inside them
More info at: https://docs.google.com/document/d/1I2Vte4LhMdpx0kvlrg-k6hNA-0kUGtUkC1-kfwbCmeI