Some benchmark numbers in the following order:
- original
- fresh repo, BatchSize::SmallInput
- fresh but index-preloaded repo, BatchSize::SmallInput
- fresh but index-preloaded repo, BatchSize::LargeInput
% cargo run --release --features bench -- bench revset ':main'
revsets/:main time: [271.49 µs 271.74 µs 272.07 µs]
revsets/:main time: [754.17 µs 758.22 µs 764.02 µs]
revsets/:main time: [367.11 µs 372.65 µs 381.99 µs]
revsets/:main time: [341.76 µs 342.98 µs 344.35 µs]
% cargo run --release --features bench -- bench revset 'author(martinvonz)'
revsets/author(martinvonz)
time: [767.43 µs 770.52 µs 775.59 µs]
revsets/author(martinvonz)
time: [31.960 ms 31.984 ms 32.011 ms]
revsets/author(martinvonz)
time: [31.478 ms 31.538 ms 31.615 ms]
revsets/author(martinvonz)
time: [31.503 ms 31.526 ms 31.550 ms]
I think the fresh but index-preloaded repo is close to the practical
evaluation environment. With BatchSize::SmallInput, it appears to consume
~600MB (RES) memory (compared to ~50MB with LargeInput). I don't think that's
huge, but it might affect cache usage, so I chose LargeInput.
https://docs.rs/criterion/latest/criterion/enum.BatchSize.html
Inline diffs on multi-byte UTF-8 characters would match individual
bytes, causing garbled diffs in some cases. For example, replacing
`⊢` with `⊣`, which differ in the final byte only, caused the
diff to display a diff of the bytes instead of the character.
This commit uses a workaround present in Mercurial by treating all
bytes 0x80 and above as word characters, causing any multi-byte
character to be treated as a word and not segmented.
https://www.mercurial-scm.org/repo/hg/file/6.3.3/mercurial/patch.py#l51
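The effect of the workaround can be sketched as follows. This is only a
minimal illustration, not the actual diff code: the names `is_word_byte`
and `tokenize` are hypothetical, and the set of ASCII word characters
(alphanumerics and `_`) is an assumption.

```rust
/// Returns true if `b` should be treated as part of a word.
/// Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so counting
/// those bytes as word characters keeps such a sequence in one token.
/// (The ASCII word set used here is an assumption for illustration.)
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b >= 0x80
}

/// Splits `text` into tokens: maximal runs of word bytes, and single
/// non-word bytes.
fn tokenize(text: &[u8]) -> Vec<&[u8]> {
    let mut tokens = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = start + 1;
        if is_word_byte(text[start]) {
            // Extend the token over the whole run of word bytes.
            while end < text.len() && is_word_byte(text[end]) {
                end += 1;
            }
        }
        tokens.push(&text[start..end]);
        start = end;
    }
    tokens
}
```

With this tokenization, `⊢` (bytes E2 8A A2) and `⊣` (bytes E2 8A A3) are
each a single three-byte token, so a diff between them replaces the whole
character rather than only the final byte.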
This command is similar to Mercurial's revset benchmarking command. It
lets you pass in a file containing revsets. I also included a file
with some revsets to test on the git.git repo. I put it in `testing/`,
which doesn't seem perfect. I'm happy to hear suggestions for better
places, or we can move it later if we find a better place.
Note that these tests don't clear caches between each run (or even
between tests), so revsets that rely on filtering commit data that's
not indexed appear faster than they typically are in reality.
I suspect the `jj bench walkrevs` command was from before we had
support for revsets. Now there doesn't seem to be any reason to have a
specific command for only range revsets (`foo..bar`), so let's replace
it by a command for benchmarking an arbitrary revset.
The `jj bench` commands are mostly meant for developers, so let's hide
the command from help and behind a `bench` feature flag. The feature
flag avoids bloating the binary with the `criterion` dependencies,
which was the reason I removed the command in 18c0b97d9d.
This just backs out commit 18c0b97d9d without making any changes,
except for resolving conflicts.
I want a way to benchmark different revsets on e.g. the Git Core repo
or the Linux repo.
This serves the role of limit() in Mercurial. Since a revset in JJ is
(conceptually) an unordered set, a "limit" predicate should define its
ordering criteria. That's why the added predicate is named "latest".
Closes #1110
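To illustrate why the predicate has to carry its own ordering, here is a
minimal sketch of a "latest"-style selection over an unordered set. The
`Commit` struct, the `latest` function, and the choice of a millisecond
timestamp as the ordering key are all assumptions made for this example,
not the revset engine's actual types.

```rust
/// Hypothetical stand-in for a commit; only the fields the sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Commit {
    id: String,
    timestamp_millis: i64,
}

/// Returns up to `count` commits from `candidates`, newest first.
/// The input order is irrelevant: the result is defined purely by the
/// timestamp, with ties broken by id so the output is deterministic.
fn latest(candidates: &[Commit], count: usize) -> Vec<Commit> {
    let mut sorted = candidates.to_vec();
    sorted.sort_by(|a, b| {
        b.timestamp_millis
            .cmp(&a.timestamp_millis)
            .then_with(|| a.id.cmp(&b.id))
    });
    sorted.truncate(count);
    sorted
}
```

A plain "limit" would return different commits depending on how the set
happened to be iterated, whereas this selection gives the same answer for
any iteration order.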
There are no remaining places where we iterate over a revset and need
the `IndexEntry`s, so we can now make `Revset::iter()` yield
`CommitId`s instead.
I'm about to make `Revset::iter()` yield just `CommitId`s, but the
tests in `test_default_revset_graph_iterator.rs` need an `IndexEntry`
iterator so they can pass it into `RevsetGraphIterator::new()`. This
commit prepares for the change by adding a
`RevsetImpl::iter_graph_impl()` that returns `RevsetGraphIterator`,
keeping `InternalRevset` still hidden within the revset engine. We
could instead have made that (and `ToPredicateFn`) visible to tests. I
can't say which is better.
I don't know if we ever resolve revsets in a mutable repo, but now
that we can get a change id index from a revset, it's easier to
implement this functionality that way.
This replaces the direct use of `IdIndex` in `ReadonlyRepo` by use of
`Revset::change_id_index()`.
I made the `Index` trait require `Send` and `Sync` in order to be able
to store an instance of it in `ReadonlyRepo` (via `ChangeIdIndex`) and
still have that be `Send` and `Sync`. We could alternatively store the
`ChangeIdIndex` in a `Mutex`. Now that will be up to the
`ChangeIdIndex` instead.
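The trait-bound approach can be sketched as below. The types here are
simplified stand-ins that only reuse the names from this message; their
fields, the `num_commits()` method, and the `assert_send_sync` helper are
assumptions for illustration.

```rust
/// Stand-in for the commit index trait. The `Send + Sync` supertraits make
/// every `dyn Index` trait object safe to move and share across threads.
trait Index: Send + Sync {
    fn num_commits(&self) -> usize;
}

/// Hypothetical in-memory implementation, used only for this sketch.
struct InMemoryIndex {
    commit_ids: Vec<String>,
}

impl Index for InMemoryIndex {
    fn num_commits(&self) -> usize {
        self.commit_ids.len()
    }
}

/// Holds a type-erased index. It is `Send + Sync` only because the trait
/// object is; without the supertraits it would need e.g. a `Mutex`.
struct ChangeIdIndex {
    index: Box<dyn Index>,
}

struct ReadonlyRepo {
    change_id_index: ChangeIdIndex,
}

/// Compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Compile-time check that the repo type stays thread-safe.
fn check_repo_is_thread_safe() {
    assert_send_sync::<ReadonlyRepo>();
}
```

If the `Send + Sync` supertraits are removed from `Index`, the call in
`check_repo_is_thread_safe()` no longer compiles, which is the constraint
described above.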
One of the remaining places we depend on index positions is when
creating a `ChangeIdIndex`. This moves that into the revset engine
(which is coupled to the commit index implementation) by adding a
`Revset::change_id_index()` method. We will also use this function
later when we add support for resolving change id prefixes within a small
revset.
The current implementation simply creates an in-memory index using the
existing `IdIndex` we have in `repo.rs`.
The custom implementation at Google might do the same for small
revsets that are available on the client, but for revsets involving
many commits on the server, it might use a suboptimal implementation
that uses longer-than-necessary prefixes for performance reasons. That
can be done by querying a server-side index including changes not in
the revset, and then verifying that the resulting commits are actually
in the revset.
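The trade-off between the two strategies can be sketched with a small
in-memory prefix index. This `IdIndex` is a hypothetical simplification
written for this example (it is not the one in `repo.rs`), and it assumes
all ids are distinct strings of equal length.

```rust
/// Length of the longest common prefix of two strings, in bytes.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

/// Hypothetical in-memory index: a sorted, deduplicated list of ids.
/// Assumes all ids have the same length.
struct IdIndex {
    ids: Vec<String>,
}

impl IdIndex {
    fn new(mut ids: Vec<String>) -> Self {
        ids.sort();
        ids.dedup();
        Self { ids }
    }

    /// Shortest prefix length of `id` that no other id in the index shares.
    fn shortest_unique_prefix_len(&self, id: &str) -> usize {
        // In a sorted list, the ids sharing the longest prefix with `id`
        // are its immediate neighbors, so only those need to be compared.
        let pos = self.ids.partition_point(|other| other.as_str() < id);
        let mut longest_shared = 0;
        if pos > 0 {
            longest_shared = longest_shared.max(common_prefix_len(&self.ids[pos - 1], id));
        }
        // Skip `id` itself if it is present in the index.
        let next = if self.ids.get(pos).map(String::as_str) == Some(id) {
            pos + 1
        } else {
            pos
        };
        if let Some(neighbor) = self.ids.get(next) {
            longest_shared = longest_shared.max(common_prefix_len(neighbor, id));
        }
        longest_shared + 1
    }
}
```

Building the index from only the ids in the revset gives the shortest
possible prefixes. Building it from a superset (for example, everything
known to a server) can only add neighbors, so the prefixes it reports are
at least as long and remain unique within the revset.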