If the input list is presorted, sorting an already-sorted Vec is cheaper
than .insert()-ing the sorted items one by one.
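As a rough, self-contained illustration of that claim (toy key/value types, not
the jj code), compare inserting presorted items one by one with building the
map in bulk; on recent Rust, collecting into a BTreeMap re-sorts the pairs and
builds the tree in one pass, and re-sorting already-sorted data is close to a
no-op:

```
use std::collections::BTreeMap;

// Per-item insertion: every call descends the tree and may rebalance nodes.
fn insert_one_by_one(items: &[(u32, u32)]) -> BTreeMap<u32, u32> {
    let mut map = BTreeMap::new();
    for &(key, value) in items {
        map.insert(key, value);
    }
    map
}

// Bulk construction: the internal sort is cheap when `items` is already
// ordered, and the tree is then built in a single pass.
fn build_in_bulk(items: Vec<(u32, u32)>) -> BTreeMap<u32, u32> {
    items.into_iter().collect()
}
```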
In my "linux" repo (watchman eanbled):
- jj-0: baseline
- jj-1: previous (don't randomize by HashMap)
- jj-2: this
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 1.034 s ± 0.020 s [User: 0.881 s, System: 0.212 s]
Range (min … max): 1.011 s … 1.068 s 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 849.3 ms ± 13.8 ms [User: 710.7 ms, System: 199.3 ms]
Range (min … max): 821.7 ms … 870.2 ms 10 runs
Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux status
Time (mean ± σ): 786.2 ms ± 16.7 ms [User: 650.7 ms, System: 204.1 ms]
Range (min … max): 760.8 ms … 805.2 ms 10 runs
Relative speed comparison
1.32 ± 0.04 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.08 ± 0.03 target/release-with-debug/jj-1 -R ~/mirrors/linux status
1.00 target/release-with-debug/jj-2 -R ~/mirrors/linux status
According to the doc, this is compatible with the map syntax.
https://protobuf.dev/programming-guides/proto3/#maps
This change means that the serialized file states are sorted by RepoPath,
so BTreeMap<RepoPath, _> can be reconstructed with fewer cache misses.
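To make the encoding concrete, here is a hedged sketch of what the linked page
implies, with illustrative names (the real message and field names may differ):
a proto3 map field is wire-compatible with a repeated entry message whose key
is field 1 and whose value is field 2, so the file states can be written as an
ordinary repeated field that we keep sorted by path:

```
use std::collections::BTreeMap;

// Per the linked proto3 doc, on the wire
//
//     map<string, FileState> file_states = 1;
//
// is equivalent to
//
//     message FileStateEntry {
//       string path = 1;     // map key
//       FileState state = 2; // map value
//     }
//     repeated FileStateEntry file_states = 1;
//
// so the entries can be serialized as a plain repeated field, sorted by path.

#[derive(Clone)]
struct FileState; // placeholder for the real per-file metadata

struct FileStateEntry {
    path: String,
    state: FileState,
}

// Iterating a BTreeMap yields keys in sorted order, so the serialized entries
// come out presorted and the reader can rebuild the map sequentially.
fn to_entries(file_states: &BTreeMap<String, FileState>) -> Vec<FileStateEntry> {
    file_states
        .iter()
        .map(|(path, state)| FileStateEntry {
            path: path.clone(),
            state: state.clone(),
        })
        .collect()
}
```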
In my "linux" repo (watchman enabled):
- jj-0: baseline
- jj-1: this
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 1.034 s ± 0.020 s [User: 0.881 s, System: 0.212 s]
Range (min … max): 1.011 s … 1.068 s 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 849.3 ms ± 13.8 ms [User: 710.7 ms, System: 199.3 ms]
Range (min … max): 821.7 ms … 870.2 ms 10 runs
Relative speed comparison
1.32 ± 0.04 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.08 ± 0.03 target/release-with-debug/jj-1 -R ~/mirrors/linux status
Cache misses were reduced:
% perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
-- ./target/release-with-debug/jj-0 -R ~/mirrors/linux --no-pager status
1,091.68 msec task-clock # 1.032 CPUs utilized
4,179,596,978 cycles # 3.829 GHz
6,166,231,489 instructions # 1.48 insn per cycle
134,032,047 cache-references # 122.776 M/sec
29,322,707 cache-misses # 21.88% of all cache refs
1.057474164 seconds time elapsed
0.897042000 seconds user
0.194819000 seconds sys
% perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
-- ./target/release-with-debug/jj-1 -R ~/mirrors/linux --no-pager status
927.05 msec task-clock # 1.083 CPUs utilized
3,451,299,198 cycles # 3.723 GHz
6,222,418,272 instructions # 1.80 insn per cycle
98,499,363 cache-references # 106.251 M/sec
11,998,523 cache-misses # 12.18% of all cache refs
0.855938336 seconds time elapsed
0.720568000 seconds user
0.207924000 seconds sys
The `scm-record` library comments say that the `file_mode` is:
> The Unix file mode of the file (before any changes), if available.
This reverts commit ffd6884 and fixes #2591 and #2548.
The idea is the same as the heads_pos() change in 9832ee205d. While
IndexEntry::position() should be cheap to call, saving 20 bytes per entry
appears to improve performance in mid-sized repos.
In my "linux" repo:
revsets/all()
-------------
baseline 1.24 156.0±1.06ms
this 1.00 126.0±0.51ms
I don't see a significant difference in small repos like "jj" or "git".
IndexEntryByPosition isn't removed since it's still used by the revset engine.
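For illustration, a hedged sketch of the space saving (a toy index, not the jj
types, assuming positions are topologically ordered so parents have smaller
positions than their children): the queue holds 4-byte positions instead of
full entries, and entry data is read from the index only when a position is
actually visited:

```
use std::collections::{BinaryHeap, HashSet};

#[derive(Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct IndexPosition(u32); // 4 bytes per queued item

// Toy index: the parent positions of the commit stored at each position.
struct ToyIndex {
    parents: Vec<Vec<IndexPosition>>,
}

// Walk ancestors by position only; the heap stays small and cache-friendly,
// and the full entry is looked up only for positions that are popped.
fn walk_ancestors(index: &ToyIndex, start: IndexPosition) -> Vec<IndexPosition> {
    let mut queue = BinaryHeap::new(); // max-heap: higher (newer) positions pop first
    let mut seen = HashSet::new();
    let mut visited = Vec::new();
    queue.push(start);
    seen.insert(start);
    while let Some(pos) = queue.pop() {
        visited.push(pos);
        for &parent in &index.parents[pos.0 as usize] {
            if seen.insert(parent) {
                queue.push(parent);
            }
        }
    }
    visited
}
```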
This removes the last use of `ouroboros`. Since `TreeEntriesDirItem`
is only used in "legacy trees" (before tree-level conflicts), I didn't
bother to check the performance impact. I also didn't bother to apply
the matcher before adding the entries to the list, instead leaving that
check where it is in `Iterator::next()`.
This removes the last use of `ouroboros` in `merged_tree.rs`. The set
of conflicts to iterate is usually so small that I didn't bother
checking the performance impact.
While importing the `ouroboros` crate and the `aliasable` crate it
depends on, the "unsafe Rust reviewer" expressed some concern that
they contain a lot of unsafe code that's hard to review. We can avoid
the unsafe code altogether by making `TreeEntriesIterator` not
self-referential. Instead, we can collect the matching entries in an
individual tree up front. It does have some performance cost:
```
❯ hyperfine --warmup 3 --runs 30 \
'/tmp/jj-before --ignore-working-copy files -r v6.0' \
'/tmp/jj-after --ignore-working-copy files -r v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy files -r v6.0
Time (mean ± σ): 461.4 ms ± 14.3 ms [User: 232.1 ms, System: 229.4 ms]
Range (min … max): 443.4 ms … 496.3 ms 30 runs
Benchmark 2: /tmp/jj-after --ignore-working-copy files -r v6.0
Time (mean ± σ): 482.0 ms ± 14.3 ms [User: 257.2 ms, System: 224.9 ms]
Range (min … max): 461.8 ms … 513.3 ms 30 runs
Summary
'/tmp/jj-before --ignore-working-copy files -r v6.0' ran
1.04 ± 0.04 times faster than '/tmp/jj-after --ignore-working-copy files -r v6.0'
```
I think that's acceptable.
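For reference, a minimal sketch of the non-self-referential shape described
above (toy types standing in for the tree and its entries, not jj's actual
`TreeEntriesIterator`): instead of one struct owning a tree and a borrowing
iterator into it, each frame eagerly collects the matching entries into an
owned Vec, so no `ouroboros`-style self-reference is needed:

```
use std::collections::BTreeMap;

type Tree = BTreeMap<String, u64>; // toy stand-in for a real tree object

// Owns plain data; nothing ties its lifetime to the Tree it was built from.
struct DirFrame {
    entries: std::vec::IntoIter<(String, u64)>,
}

impl DirFrame {
    fn new(tree: &Tree, matches: impl Fn(&str) -> bool) -> Self {
        // The borrow of `tree` ends inside this function; the Vec is owned.
        let entries: Vec<_> = tree
            .iter()
            .filter(|(name, _)| matches(name.as_str()))
            .map(|(name, value)| (name.clone(), *value))
            .collect();
        DirFrame {
            entries: entries.into_iter(),
        }
    }
}

// A real version would push a new frame whenever it descends into a subtree.
struct TreeEntriesIter {
    stack: Vec<DirFrame>,
}

impl Iterator for TreeEntriesIter {
    type Item = (String, u64);

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(frame) = self.stack.last_mut() {
            if let Some(entry) = frame.entries.next() {
                return Some(entry);
            }
            self.stack.pop();
        }
        None
    }
}
```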
This is much faster (maybe because of better cache locality?). Another option
is to use a BTreeSet, but the BinaryHeap version is slightly faster.
"bench revset" result in my linux repo:
revsets/heads(tags())
---------------------
baseline 3.28 560.6±4.01ms
1 2.92 500.0±2.99ms
2 1.98 339.6±1.64ms
3 (this) 1.00 171.2±0.30ms
Apparently, IndexEntry::generation_number() isn't cheap, probably because it
involves random access to a larger memory region, and the u32 value might not
be aligned. Let's instead store the generation numbers in the BinaryHeap.
heads_pos() also becomes slightly faster by keeping the BinaryHeap entries
small, so I've removed the IndexEntry from it entirely.
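A hedged sketch of what the heap items might look like after this change
(simplified, not the actual jj types): the generation number is read once when
the item is pushed, so ordering the heap never reaches back into the index, and
each item stays small:

```
use std::collections::BinaryHeap;

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct IndexPosition(u32);

// Derived Ord compares `generation` first, then `pos`. The generation is
// captured when the item is created, so heap comparisons never touch the
// index data again.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct QueueItem {
    generation: u32,
    pos: IndexPosition,
}

struct HeadsQueue {
    queue: BinaryHeap<QueueItem>, // 8-byte items instead of full IndexEntry values
}
```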
This speeds up the default log and disambiguation revsets, which evaluate
'heads(immutable_heads())'.
"bench revset" result in my linux repo:
revsets/heads(tags())
---------------------
baseline 3.28 560.6±4.01ms
1 2.92 500.0±2.99ms
2 (this) 1.98 339.6±1.64ms
All callers just iterate over the parent entries.
"bench revset" result in my linux repo:
revsets/heads(tags())
---------------------
baseline 3.28 560.6±4.01ms
1 (this) 2.92 500.0±2.99ms
For loose refs, uninteresting directories can simply be skipped. For packed
refs, gix will have to do a binary search for each prefix to find the starting
point. Still, it's better overall if the repository contains tons of
refs/jj/keep refs.
With my linux repo containing ~5k loose jj refs, this saves ~40ms:
% hyperfine --warmup 3 --runs 10 \
"/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux" \
"/tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux"
Benchmark 1: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 151.6 ms ± 11.4 ms [User: 38.8 ms, System: 111.6 ms]
Range (min … max): 129.8 ms … 159.5 ms 10 runs
Benchmark 2: /tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 109.9 ms ± 11.6 ms [User: 27.5 ms, System: 82.4 ms]
Range (min … max): 89.4 ms … 117.8 ms 10 runs
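To illustrate the packed-refs part (a self-contained sketch, not the gix API):
given ref names that are already sorted, each interesting prefix can be located
with a binary search and scanned until the prefix stops matching, instead of
walking every ref:

```
// Returns the slice of `sorted_refs` whose names start with `prefix`,
// assuming `sorted_refs` is sorted lexicographically (as packed-refs is).
fn refs_with_prefix<'a>(sorted_refs: &'a [String], prefix: &str) -> &'a [String] {
    // Binary search for the first name >= prefix, then scan forward.
    let start = sorted_refs.partition_point(|name| name.as_str() < prefix);
    let len = sorted_refs[start..]
        .iter()
        .take_while(|name| name.starts_with(prefix))
        .count();
    &sorted_refs[start..start + len]
}
```

For loose refs the equivalent is simpler still: directories outside the
interesting prefixes are never opened at all.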
Gitoxide errors are boxed since there are various error types and they tend
to exceed the clippy size limit.
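A minimal sketch of the boxing pattern (hypothetical variant and message names;
the limit referred to is presumably clippy's `result_large_err` lint):

```
// Boxing keeps the error enum (and every Result that carries it) small even
// though the underlying gitoxide error types are large.
#[derive(Debug, thiserror::Error)]
pub enum GitImportError {
    #[error("failed to read Git refs")]
    ReadRefs(#[source] Box<dyn std::error::Error + Send + Sync>),
    #[error("unexpected Git error: {0}")]
    InternalGitError(String),
}
```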
Apparently, gitoxide is faster than git2:
% hyperfine --warmup 3 --runs 10 \
"/tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux" \
"/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux"
Benchmark 1: /tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 205.4 ms ± 15.7 ms [User: 59.6 ms, System: 144.6 ms]
Range (min … max): 189.7 ms … 223.9 ms 10 runs
Benchmark 2: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 176.2 ms ± 13.7 ms [User: 41.2 ms, System: 134.0 ms]
Range (min … max): 155.4 ms … 186.5 ms 10 runs
If the commit pointed to by HEAD or a ref is missing, the ref is considered
invalid and is excluded by import_refs(). The current test behavior appears
to depend on some in-memory cache in git2::Repository.
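As a hedged illustration of that rule (not the actual import_refs() code), the
validity check amounts to asking the backing Git repository whether the target
commit still exists; the caveat above is that git2 may answer from an in-memory
object cache:

```
// Returns true if `target` resolves to a commit in the Git object database.
// Refs (or HEAD) failing this check would be treated as invalid and skipped.
fn ref_target_exists(repo: &git2::Repository, target: git2::Oid) -> bool {
    repo.find_commit(target).is_ok()
}
```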
A coworker was having trouble finding a link to the jj Discord community. Add it to the top of the document, alongside the other badges, so it's easier to find.