Add support for pointing indexes in ocaml-index#2051
Add support for pointing indexes in ocaml-index#2051Tim-ats-d wants to merge 9 commits intoocaml:mainfrom
Conversation
art-w
left a comment
There was a problem hiding this comment.
@voodoos raised another issue: Since the indexes are not self-contained anymore, it's possible that a pointing index targets another index file that has been removed or updated. If we don't check for that and follow pointers to arbitrary locations, some very bad things could happen!
I think I could make some kind of defensive programming and simply fail if the pointed-to index file no longer exists. What do you think? |
0908c3a to
8032c54
Compare
8032c54 to
bc62cca
Compare
art-w
left a comment
There was a problem hiding this comment.
It's looking good! I only have some minor requests to improve the id checking logic :)
| let fetch_id filename = | ||
| In_channel.with_open_bin filename (fun fd -> | ||
| seek_in fd (String.length Config.index_magic_number); | ||
| int_of_binstring (really_input_string fd ptr_size)) |
There was a problem hiding this comment.
It looks wrong to use with_open here, instead of checking an already open file descriptor: we have no guarantee that the file won't change in between the next time we open it to actually read from it. We can assume that the file contents will only be updated by creating a fresh new file (which guarantees a previously opened fd still see the old contents), so if we have checked a file descriptor there's no risk of a race :)
| | In_memory _ | In_memory_reused _ | On_disk _ | Duplicate _ -> () | ||
| | On_disk_ptr { filename; loc; id } -> | ||
| let pointed_index_id = fetch_id filename in | ||
| if pointed_index_id = id then |
There was a problem hiding this comment.
This is too eager: We would want to delay this check in fetch when we have no choice but to read from the file (otherwise we are checking the id too early and the file may not have the same id when we re-open it to read it during fetch)
|
Here are some benchmark results from Hyperfine I got running Without pointing index: With pointing-index enabled: Complete command I used: |
|
Concerning reading performance, here are some other metrics. I ran With pointing-index enabled: Without: So reading performances are mostly similar. |
This PR adds support for pointing to an index file that already exists on disk.
This mechanism results in significantly smaller index files and fewer write operations.