We're likely to use the right (or new) context lines in rendered diffs, but
it's odd that the hunks iterator choose which context hunk to return. We'll
also need both contents to calculate left/right line numbers.
Since the hunk content types are the same, I also split enum DiffHunk into
{ kind, contents } pair.
This change made some diff benches slow, maybe because the generated code
becomes slightly worse due to the added abstraction? I'll revisit the
performance problem later. There are a couple of ways to mitigate it.
```
group new old
----- --- ---
bench_diff_git_git_read_tree_c 1.02 61.0±0.23µs 1.00 59.7±0.38µs
bench_diff_lines/modified/10k 1.00 41.6±0.24ms 1.02 42.3±0.22ms
bench_diff_lines/modified/1k 1.00 3.8±0.07ms 1.00 3.8±0.03ms
bench_diff_lines/reversed/10k 1.29 23.4±0.20ms 1.00 18.2±0.26ms
bench_diff_lines/reversed/1k 1.05 517.2±5.55µs 1.00 493.7±59.72µs
bench_diff_lines/unchanged/10k 1.00 3.9±0.10ms 1.08 4.2±0.10ms
bench_diff_lines/unchanged/1k 1.01 356.8±2.33µs 1.00 353.7±1.99µs
```
(I don't get stable results on my noisy machine, so the results would vary.)
The added comparison functions correspond to --ignore-all-space and
--ignore-space-change. --ignore-space-at-eol can be combined with the other
flags, so it will have to be implemented as a preprocessing function.
--ignore-blank-lines will also require some change in the tokenizer function.
This could be implemented as a newtype `Wrapper<'a>(&'a [u8])`, but a lifetime
of the wrap function couldn't be specified correctly:
fn diff(left: &[u8], right: &[u8], wrap_fn: F, ..)
where
F: for<'a> Fn(&'a [u8]) -> W<'a>, // F::Output<'a> can't be specified
W: Copy + Eq + Hash
If the wrapper were of `&Wrapper([u8])` type, `Fn(&[u8]) -> &W` works. However,
it means we can no longer set comparison parameter (such as Regex) dynamically.
Another idea is to add some filter function of `Fn(&[u8]) -> Cow<'_, [u8]>`
type, but I don't think we would want to pay the allocation cost in
hashing/comparison code. `Fn(&[u8]) -> impl Iterator<Item = &[u8]>` might work,
but it would be equally complex.
We'll use low-level HashTable to customize Eq/Hash without implementing newtype
wrappers.
Unneeded default features are disabled for now. Note that the new default
hasher, foldhash, is released under the Zlib license, which isn't currently
included in the allow list.
Most collection references implement `.into_iter()` or its mutable version,
so it is possible to iterate over the elements without using an explicit
method to do so.
We have two options to achieve "diff --ignore-*-space":
a. preprocess contents to be diffed, then translate hunk ranges back
b. add hooks to customize eq and hash functions
I originally thought (a) would be easier, but actually, there aren't many
changes needed to implement (b). And (b) should have a fewer logic errors.
This patch removes assumption that each unchanged region has the same content
length. It won't be true if whitespace characters are ignored.
Intersection of unchanged ranges becomes a simple merge-join loop, so I've
removed the existing tests. I also added a fast path for the common 2-way
diffs in which we don't have to build vec![(pos, vec![pos])].
One source of confusion introduced by this change is that WordPosition means
both global and local indices. This is covered by the added tests, but I might
add separate local/global types later.
It's silly that we build new Vec for each recursion stack and merge elements
back. I don't see a measurable performance difference in the diff bench, but
this change will help simplify the next patch. If a result vec were created for
each unchanged_ranges() invocation, it would probably make more sense to return
a list of "local" word positions. Then, callers would have to translate the
returned positions to the caller's local positions.
We can assign a unique integer to each (word, occurrence) pair instead. As a
bonus, HashMap can be replaced with Vec.
```
group new old
----- --- ---
bench_diff_git_git_read_tree_c 1.00 72.5±3.25µs 1.08 78.5±0.48µs
bench_diff_lines/modified/10k 1.00 45.1±1.18ms 1.10 49.8±1.85ms
bench_diff_lines/modified/1k 1.00 4.1±0.07ms 1.11 4.5±0.34ms
bench_diff_lines/reversed/10k 1.00 19.0±0.12ms 1.12 21.2±1.26ms
bench_diff_lines/reversed/1k 1.00 558.5±37.42µs 1.17 655.6±16.27µs
bench_diff_lines/unchanged/10k 1.00 5.3±0.78ms 1.33 7.0±0.89ms
bench_diff_lines/unchanged/1k 1.00 422.0±16.68µs 1.28 540.7±13.96µs
```
I'll add a few more helper methods there. It might also make sense to cache
precomputed hash values.
unchanged_ranges() is made private since there are no external callers, and
I'm going to add more private types.
See discussion thread in linked issue.
With this PR, all revset functions in [BUILTIN_FUNCTION_MAP](8d166c7642/lib/src/revset.rs (L570))
that return multiple values are either named in plural or the naming is hard to misunderstand (e.g. `reachable`)
Fixes: #4122
Stacking at AliasExpanded node looks wonky. If we migrate error handling to
Diagnostics API, it might make sense to remove AliasExpanded node and add
node.aliases: vec![(id, span), ..] field instead.
Some closure arguments are inlined in order to help type inference.
I recently got random test_commit_parallel*() failures, and this patch appears
to fix the problem.
SimpleOpHeadsStore::update_op_heads() adds new_id file and removes old_ids
files in that order. It ensures that there exists at least one id file, but it
doesn't mean readdir() can observe the added id file.
1. process A: get_op_heads() -> opendir()
2. process B: update_op_heads() -> add_op_head(new_id), remove_op_head(old_id)
3. process A: -> readdir() (can miss new_id)
update_op_heads() could do rename(old_ids[0], new_id), but I don't remember if
readdir() can always pick up a renamed entry.
I think it's slightly easier to follow if we calculate a diff of input bits
first. I don't know which one is faster, but I assume compiler can optimize to
similar instructions.
Comparing each byte before comparing the nibbles is more efficient. A
benchmark comparing the old and new implementations with various common
prefix lengths shows:
```
Common hex len/old/3 time: [7.5444 ns 7.5807 ns 7.6140 ns]
Common hex len/new/3 time: [1.2100 ns 1.2144 ns 1.2192 ns]
Common hex len/old/6 time: [11.849 ns 11.879 ns 11.910 ns]
Common hex len/new/6 time: [1.9950 ns 2.0046 ns 2.0156 ns]
Common hex len/old/32 time: [63.030 ns 63.345 ns 63.718 ns]
Common hex len/new/32 time: [6.4647 ns 6.4800 ns 6.4999 ns]
```
Jujutsu's branches do not behave like Git branches, which is a major
hurdle for people adopting it from Git. They rather behave like
Mercurial's (hg) bookmarks.
We've had multiple discussions about it in the last ~1.5 years about this rename in the Discord,
where multiple people agreed that this _false_ familiarity does not help anyone. Initially we were
reluctant to do it but overtime, more and more users agreed that `bookmark` was a better for name
the current mechanism. This may be hard break for current `jj branch` users, but it will immensly
help Jujutsu's future, by defining it as our first own term. The `[experimental-moving-branches]`
config option is currently left alone, to force not another large config update for
users, since the last time this happened was when `jj log -T show` was removed, which immediately
resulted in breaking users and introduced soft deprecations.
This name change will also make it easier to introduce Topics (#3402) as _topological branches_
with a easier model.
This was mostly done via LSP, ripgrep and sed and a whole bunch of manual changes either from
me being lazy or thankfully pointed out by reviewers.