Skip to content

Add support for pointing indexes in ocaml-index#2051

Open
Tim-ats-d wants to merge 9 commits intoocaml:mainfrom
Tim-ats-d:pointing-indexes
Open

Add support for pointing indexes in ocaml-index#2051
Tim-ats-d wants to merge 9 commits intoocaml:mainfrom
Tim-ats-d:pointing-indexes

Conversation

@Tim-ats-d
Copy link
Copy Markdown
Contributor

This PR adds support for pointing to an index file that already exists on disk.

This mechanism results in significantly smaller index files and fewer write operations.

Copy link
Copy Markdown
Contributor

@art-w art-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@voodoos raised another issue: Since the indexes are not self-contained anymore, it's possible that a pointing index targets another index file that has been removed or updated. If we don't check for that and follow pointers to arbitrary locations, some very bad things could happen!

@Tim-ats-d
Copy link
Copy Markdown
Contributor Author

@voodoos raised another issue: Since the indexes are not self-contained anymore, it's possible that a pointing index targets another index file that has been removed or updated. If we don't check for that and follow pointers to arbitrary locations, some very bad things could happen!

I think I could make some kind of defensive programming and simply fail if the pointed-to index file no longer exists. What do you think?

@Tim-ats-d Tim-ats-d force-pushed the pointing-indexes branch 2 times, most recently from 0908c3a to 8032c54 Compare April 1, 2026 13:21
Copy link
Copy Markdown
Contributor

@art-w art-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good! I only have some minor requests to improve the id checking logic :)

let fetch_id filename =
In_channel.with_open_bin filename (fun fd ->
seek_in fd (String.length Config.index_magic_number);
int_of_binstring (really_input_string fd ptr_size))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks wrong to use with_open here, instead of checking an already open file descriptor: we have no guarantee that the file won't change in between the next time we open it to actually read from it. We can assume that the file contents will only be updated by creating a fresh new file (which guarantees a previously opened fd still see the old contents), so if we have checked a file descriptor there's no risk of a race :)

| In_memory _ | In_memory_reused _ | On_disk _ | Duplicate _ -> ()
| On_disk_ptr { filename; loc; id } ->
let pointed_index_id = fetch_id filename in
if pointed_index_id = id then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too eager: We would want to delay this check in fetch when we have no choice but to read from the file (otherwise we are checking the id too early and the file may not have the same id when we re-open it to read it during fetch)

@Tim-ats-d
Copy link
Copy Markdown
Contributor Author

Tim-ats-d commented Apr 3, 2026

Here are some benchmark results from Hyperfine I got running ocaml-index on a bunch of ocaml-index files generated from Merlin repo.

Without pointing index:

Time (mean ± σ):      2.425 s ±  0.087 s    [User: 1.968 s, System: 0.395 s]
Range (min … max):    2.313 s …  2.553 s    10 runs

With pointing-index enabled:

Time (mean ± σ):     896.2 ms ±  42.5 ms    [User: 640.6 ms, System: 245.2 ms]
Range (min … max):   840.6 ms … 972.3 ms    10 runs

Complete command I used:

ocaml-index aggregate dot_merlin_reader.ocaml-index merlin_analysis.ocaml-index merlin_commands.ocaml-index merlin_config.ocaml-index merlin_dot_protocol.ocaml-index merlin_extend.ocaml-index merlin_index_format.ocaml-index merlin_kernel.ocaml-index merlin_specific.ocaml-index merlin_utils.ocaml-index ocaml-index.ocaml-index ocaml_compression.ocaml-index ocaml_parsing.ocaml-index ocaml_preprocess.ocaml-index ocaml_typing.ocaml-index ocaml_utils.ocaml-index ocamlmerlin_server.ocaml-index os_ipc.ocaml-index query_commands.ocaml-index query_protocol.ocaml-index  --root . --rewrite-root

@Tim-ats-d
Copy link
Copy Markdown
Contributor Author

Tim-ats-d commented Apr 3, 2026

Concerning reading performance, here are some other metrics. I ran ocaml-index dump on a project ocaml-index file aggregating the same files used on the previous benchmark.

With pointing-index enabled:

Time (mean ± σ):      3.990 s ±  0.076 s    [User: 3.372 s, System: 0.581 s]
  Range (min … max):    3.909 s …  4.129 s    10 runs

Without:

Time (mean ± σ):      3.803 s ±  0.141 s    [User: 3.417 s, System: 0.356 s]
  Range (min … max):    3.628 s …  4.092 s    10 runs

So reading performances are mostly similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants