Since IdIndex is immutable, we don't need fast insertion provided by BTreeMap.
Let's simply use Vec for some speed up. More importantly, this allows us to
store multiple (ChangeId, CommitId) pairs for the same change id, and will
unblock the use of IdIndex in revset::resolve_symbol().
Some benchmark numbers (against my "linux" repo) follow.
Command:
hyperfine --warmup 3 "jj log -r master \
-T 'commit_id.short_prefix_and_brackets()' \
--no-commit-working-copy --no-graph"
Original:
Time (mean ± σ): 1.892 s ± 0.031 s [User: 1.800 s, System: 0.092 s]
Range (min … max): 1.833 s … 1.935 s 10 runs
This commit:
Time (mean ± σ): 867.5 ms ± 2.7 ms [User: 809.9 ms, System: 57.7 ms]
Range (min … max): 862.3 ms … 871.0 ms 10 runs
With my "jj" work repo, this saves ~4ms to show the log with default revset.
Command:
JJ_CONFIG=/dev/null hyperfine --warmup 3 --runs 100 \
"jj log -T 'commit_id.short_prefix_and_brackets() \
change_id.short_prefix_and_brackets()' \
--no-commit-working-copy"
Baseline (a7541e1ba4):
Time (mean ± σ): 54.1 ms ± 16.4 ms [User: 46.4 ms, System: 7.8 ms]
Range (min … max): 36.5 ms … 78.1 ms 100 runs
This commit:
Time (mean ± σ): 49.5 ms ± 16.4 ms [User: 42.4 ms, System: 7.2 ms]
Range (min … max): 31.4 ms … 70.9 ms 100 runs
This iterator will be used to merge neighbor commit ids across segments.
resolve_prefix() is simplified to non-short-circuiting loop. I think that's
fine because visiting parents is cheap, and the costly operation here is
segment_resolve_prefix().
entry_by_pos() could also be migrated to iterator, but I leave the unsafe
bits there.
ReadonlyIndex implementation leverages the existing binary search
function. MutableIndex one is basically the same as repo::IdIndex.
Shortest prefix length could be calculated for each segment, but I think
returning neighbors is better for testing.
This is ugly, but we need a special case because root_change_id and
root_commit_id aren't equal but share the same prefix bytes. In practice,
no one would care for the shortest root id prefix, but we'll need to deal
with a similar problem when migrating prefix id resolution to repo layer.
This helps us to migrate commit_id index to ReadonlyIndex. For large
repositories, this also reduces initialization cost, but that's not the main
intent of this change.
https://github.com/martinvonz/jj/pull/1041#issuecomment-1399225876
common_hex_len() and iter_half_bytes() are added to backend.rs since more
call sites will be added to index.rs, and I feel index.rs isn't a good place
to host this kind of utility functions.
I made it a free function. Alternatively, the root id could be instantiated
by and obtained through backend, but I don't think we'll need such level of
abstraction.
I'm going to add a workaround for shortest prefix calculation of the root ids,
where this function will be used.
Make op resolution a closed operation, powered by a callback provided by the
caller which runs under an internal lock scope. This allows for greatly
simplifying the internal lifetime structuring.
If commit_id[..prefix_len] < prefix, commit_id < prefix is obviously true.
If commit_id[..prefix_len] == prefix, commit_id < prefix returns false. So
slicing isn't needed.
This makes commit_id_byte_prefix_to_pos() basically the same as
segment_commit_id_to_pos(), and these two functions can be merged.
matches() is called from resolve_change_id() loop right now, so it's better to
not allocate String there. Regarding new IdIndex integration, I'll probably make
IdIndex store raw byte ids instead of hexes, and use HexPrefix to look up
range and test prefixes. I think this is basically the same as prefix lookup
in MutableIndex, but I have no idea if we can factor out a common interface.
I made HexPrefix store (Vec<u8>, bool) instead of (Vec<u8>, Option<u8>) so
both min/partial prefixes can be borrowed as slice.
By inlining `wite_commit_internal()` into `write_commit()`, we can
avoid redoing some steps when we retry. This includes taking the mutex
lock, and reading the tree object and parent commits. It also means
that we avoid cloning the input commit object, which we otherwise
would even in the non-retrying case. I haven't measured if any of this
makes a significant difference, but I think it also slightly
simplifies the code, so it doesn't have to.
This is fast enough to be used on medium-sized repositories such as git/git.
It is a bit slow, but bearable, on huge repositories such as torvalds/linux.
There is 0 performance penalty if the display of unique prefixes is disabled
A trie-based implementation will be submitted for consideration in a
follow-up PR. It is faster, but more complicated.
**Update:** I also just discovered https://sapling-scm.com/docs/internals/indexedlog/
There are three important aspects of performance that seemed relevant:
1. Speed of computing the shortest unique prefix per id. It is worlds faster
than the naive implementation before this commit. It can be optimized
furher by using a trie or maybe the `fst` crate.
2. Speed of inital loading of the index that happens before the first commit is
shown. This is the part that's noticeable but bearable on torvalds/linux.
This could be optimized by storing a sorted list of commit and change ids on
disk. This would likely involve reworking the `Index`.
Failing that, the speed of inital loading doesn't change if a trie is used
and would likely be worse with the `fst` crate
3. Memory use is unremarkable here. I don't have good tools to measure it
precisely, but it does not balloon to gigabytes even on the linux repo.
This creates a templater function `short_underscore_prefix` for commit and
change ids. It is similar to `short` function, but shows one fewer hexadecimal
digit and inserts an underscore after the shortest unique prefix.
Highlighting with an underline and perhaps color/bold will be in a follow-up
PR.
The implementation is quadratic, a simple comparison of each id with every
other id. It is replaced in a subsequent commit. The problem with it is that,
while it works fine for a `jj`-sized repo, it becomes is painfully slow with a
repo the size of git/git.
Still, this naive implemenation is included here since it's simple, and could
be used as a reference implementation.
The `shortest_unique_prefix_length` function goes into `repo.rs` since that's
convenient for follow-up commits in this PR to have nicer diffs.
Fixes https://github.com/martinvonz/jj/issues/1050
Thanks to Martin for suggesting the exact fix.
The tests go into the new tests/test_duplicate_command.rs, which will be
expanded shortly with other tests depending on this bugfix.
You may use "abc\\" in .gitignore to ignore a file named "abc\". In this
case, removing training spaces on "abc\\ " must result in "abc\\" as the
trailing space is not escaped, the preceeding backslash being part of
the previous "\\" escaping sequence.
- branches has the signature branches([needle]), meaning the needle is optional (branches() is equivalent to branches("")) and it matches all branches whose name contains needle as a substring
- remote_branches has the signature remote_branches([branch_needle[, remote_needle]]), meaning it can be called with no arguments, or one argument (in which case, it's similar to branches), or two arguments where the first argument matches branch names and the second argument matches remote names (similar to branches, remote_branches(), remote_branches("") and remote_branches("", "") are all equivalent)
Dereferencing `self` as `*self` in order to perform patten-matching
using `ref` is unnecessary and will be done automatically by the
compiler (match ergonomics, introduced in Rust 1.26).
We don't care the ref content as long as it is unique, so using threaded
RNG should be fine.
This change means refs/jj/keep will now contain refs of the following
forms:
- new create_no_gc_ref(): 0f8d6cd9721823906cfb55dac99d7bf5
- old create_no_gc_ref(): 0f6d93fe-0507-4db8-ad0a-6317f02e27b9
- prevent_gc(commit_id): 0f9c15100b6f1373f38186357e274a829fb6c4e2