This change adds a `non_obsolete_heads(<set>)` revset, which walks up
ancestors of the input set until it gets to a non-obsolete and
non-pruned commit. That's what we do by default in `jj log`
(i.e. without `--all`). Now we can make `jj log` use revsets and teach
it a `-r` option!
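As a rough illustration of the walk described above, here is a minimal
sketch over a toy in-memory graph. The `Commit` struct, the `u32` ids
and the function signature are assumptions made for this example, not
the real jj types. The sketch also does not remove results that are
ancestors of other results.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified commit record (not the real jj data model).
#[derive(Clone, Debug)]
pub struct Commit {
    pub parents: Vec<u32>,
    pub obsolete: bool,
    pub pruned: bool,
}

/// Walks up from each input commit until it reaches commits that are
/// neither obsolete nor pruned, and returns those commits.
pub fn non_obsolete_heads(graph: &HashMap<u32, Commit>, input: &[u32]) -> HashSet<u32> {
    let mut result = HashSet::new();
    let mut visited = HashSet::new();
    let mut work: Vec<u32> = input.to_vec();
    while let Some(id) = work.pop() {
        // Skip commits we have already looked at.
        if !visited.insert(id) {
            continue;
        }
        if let Some(commit) = graph.get(&id) {
            if commit.obsolete || commit.pruned {
                // Keep walking towards the ancestors.
                work.extend(commit.parents.iter().copied());
            } else {
                // Stop here: this commit is visible.
                result.insert(id);
            }
        }
    }
    result
}
```

With a chain 1 <- 2 <- 3 where 2 is obsolete and 3 is pruned, starting
from 3 yields only commit 1.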
This adds `parents(foo)` and `ancestors(foo)` as alternative ways of
writing `:foo` and `*:foo`.
I haven't added support for whitespace yet; the parsing is very
strict. The error messages will also need to be improved later.
This patch adds initial support for a DSL for specifying revisions
inspired by Mercurial's "revset" language. The initial support
includes prefix operators ":" (parents) and "*:" (ancestors) with
naive parsing of the revsets. Mercurial uses postfix operator "^" for
parent 1 just like Git does. It uses prefix operator "::" for
ancestors and the same operator as postfix operator for descendants. I
did it differently because I like the idea of using the same operator
as prefix/postfix depending on desired direction, so I wanted to apply
that to parents/children as well (and for
predecessors/successors). The "*" in the "*:" operator is copied from
regular expression syntax. Let's see how it works out. This is an
experimental VCS, after all.
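To make the two prefix operators concrete, here is a minimal sketch of
a strict parser for them. The `Revset` enum, the `parse_revset` name
and the rule for what counts as a symbol are assumptions for this
example; the actual parser in the patch may look quite different.

```rust
/// Hypothetical AST covering only the two prefix operators.
#[derive(Debug, PartialEq)]
pub enum Revset {
    Symbol(String),
    Parents(Box<Revset>),
    Ancestors(Box<Revset>),
}

/// Strict, naive parser: no whitespace handling and no other operators.
pub fn parse_revset(input: &str) -> Result<Revset, String> {
    if let Some(rest) = input.strip_prefix("*:") {
        // "*:" must be checked first so it is not misread as a symbol.
        Ok(Revset::Ancestors(Box::new(parse_revset(rest)?)))
    } else if let Some(rest) = input.strip_prefix(':') {
        Ok(Revset::Parents(Box::new(parse_revset(rest)?)))
    } else if input.is_empty() {
        Err("expected a symbol".to_string())
    } else if let Some(bad) = input
        .chars()
        .find(|c| c.is_whitespace() || *c == ':' || *c == '*')
    {
        Err(format!("unexpected character {:?} in symbol {:?}", bad, input))
    } else {
        Ok(Revset::Symbol(input.to_string()))
    }
}
```

In this sketch `*::foo` parses as the ancestors of the parents of
`foo`, and `: foo` is rejected because whitespace is not accepted.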
I've updated the CLI to use the new revset support.
The implementation feels a little messy, but you have to start
somewhere...
This actually seems to make it slightly slower, but it fixes an
important bug (we used to evolve only one topological branch per `jj
evolve` call). The slowdown seemed to be on the order of 5% when
evolving 100 commits on git.git's "what's cooking" branch.
I suspect that at least one reason that I didn't make
`MutableRepo::base_repo` be an `Arc<ReadonlyRepo>` before was that I
thought that that would mean that `start_transaction()` would need to be
moved off of `ReadonlyRepo` so it can be given an
`&Arc<ReadonlyRepo>`, which would make it much less convenient to
use. It turns out that a `self` argument can actually be of type
`&Arc<ReadonlyRepo>`.
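For readers who haven't seen this Rust feature, here is a minimal,
self-contained sketch of a method whose receiver is `&Arc<Self>`. The
struct fields and the `Transaction` type are simplified stand-ins
invented for this example; only the receiver syntax is the point.

```rust
use std::sync::Arc;

/// Simplified stand-in for the real type.
pub struct ReadonlyRepo {
    name: String,
}

/// Hypothetical transaction type that keeps the repo it was started from.
pub struct Transaction {
    base_repo: Arc<ReadonlyRepo>,
}

impl ReadonlyRepo {
    pub fn new(name: &str) -> Arc<ReadonlyRepo> {
        Arc::new(ReadonlyRepo { name: name.to_string() })
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    /// The receiver is `&Arc<Self>`, so the method stays on `ReadonlyRepo`
    /// and can still clone the `Arc` it was called through.
    pub fn start_transaction(self: &Arc<Self>) -> Transaction {
        Transaction { base_repo: Arc::clone(self) }
    }
}

impl Transaction {
    pub fn base_repo(&self) -> &Arc<ReadonlyRepo> {
        &self.base_repo
    }
}
```

Callers holding an `Arc<ReadonlyRepo>` can then write
`repo.start_transaction()` as before.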
See test case for details.
Before:
test bench_diff_10k_lines_reversed ... bench: 36,249,659 ns/iter (+/- 174,455)
test bench_diff_10k_modified_lines ... bench: 37,258,890 ns/iter (+/- 803,963)
test bench_diff_10k_unchanged_lines ... bench: 4,252 ns/iter (+/- 69)
test bench_diff_1k_lines_reversed ... bench: 982,834 ns/iter (+/- 6,467)
test bench_diff_1k_modified_lines ... bench: 3,343,469 ns/iter (+/- 23,243)
test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench: 95,559 ns/iter (+/- 816)
After:
test bench_diff_10k_lines_reversed ... bench: 36,186,715 ns/iter (+/- 196,903)
test bench_diff_10k_modified_lines ... bench: 37,511,000 ns/iter (+/- 1,370,476)
test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 8)
test bench_diff_1k_lines_reversed ... bench: 986,010 ns/iter (+/- 11,565)
test bench_diff_1k_modified_lines ... bench: 3,370,938 ns/iter (+/- 17,041)
test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench: 102,189 ns/iter (+/- 1,052)
So this patch makes diffing even slower (but still easily fast enough
for all cases I've run into in real life). There's probably a lot that
can be done to make things faster, but the first priority is that the
diffs are correct and easy to read.
This is yet another step towards making it easy to propagate
`BrokenPipe` errors. The `jj diff` code (naturally) diffs two trees
and prints the diffs. If the printing fails, we shouldn't just crash
like we do today.
The new code is probably slower since it does more copying (the
callback got references to the `FileRepoPath` and `TreeValue`). I hope
that won't make a noticeable difference. At least `jj diff -r
334afbc76fbd --summary` didn't seem to get measurably slower.
The iterator version is easier to use and we get rid of the ugly type
parameter for the error type. I also simplified the code by using
`Peekable` iterators.
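The general shape of such an iterator can be sketched as a merge of
two sorted entry lists using `Peekable`. This is only an illustration:
`DiffEntry`, `TreeDiff` and `diff_entries` are names made up for the
example, and plain key/value pairs stand in for `FileRepoPath` and
`TreeValue`. Both inputs are assumed to be sorted by key.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

#[derive(Debug, PartialEq)]
pub enum DiffEntry<K, V> {
    Added(K, V),
    Removed(K, V),
    Modified(K, V, V),
}

/// Iterator over the differences between two sorted entry streams.
pub struct TreeDiff<I: Iterator> {
    left: Peekable<I>,
    right: Peekable<I>,
}

pub fn diff_entries<I: Iterator>(left: I, right: I) -> TreeDiff<I> {
    TreeDiff { left: left.peekable(), right: right.peekable() }
}

impl<K: Ord, V: PartialEq, I: Iterator<Item = (K, V)>> Iterator for TreeDiff<I> {
    type Item = DiffEntry<K, V>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Peek at both sides to decide which one to advance.
            let order = match (self.left.peek(), self.right.peek()) {
                (None, None) => return None,
                (Some(_), None) => Ordering::Less,
                (None, Some(_)) => Ordering::Greater,
                (Some((l, _)), Some((r, _))) => l.cmp(r),
            };
            match order {
                Ordering::Less => {
                    let (key, value) = self.left.next().unwrap();
                    return Some(DiffEntry::Removed(key, value));
                }
                Ordering::Greater => {
                    let (key, value) = self.right.next().unwrap();
                    return Some(DiffEntry::Added(key, value));
                }
                Ordering::Equal => {
                    let (key, before) = self.left.next().unwrap();
                    let (_, after) = self.right.next().unwrap();
                    if before != after {
                        return Some(DiffEntry::Modified(key, before, after));
                    }
                    // Unchanged entry: keep looping.
                }
            }
        }
    }
}
```

Because the items are yielded by value, a caller can stop consuming
the iterator as soon as printing fails and return the error with `?`.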
The new diff algorithm produces pretty bad diffs in some cases, such
as cc4b1e9230 in this repo (the parent of this commit). I think the
problem there is that many words are repeated over and over. Diffing
first at the line level and then refining the diff of the changed
ranges at the word level gives much better results. That's what this
patch does. After this patch, `jj diff -r cc4b1e923091` looks pretty
similar to the diff in GitHub's UI.
I hope to get around to doing the same for the merge code soon.
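The two-level idea can be sketched as follows. To keep the example
short, it uses common prefix/suffix trimming as a stand-in for a real
diff at both levels, so it is not the algorithm in this patch; the
function names are made up for the example. It first narrows the input
down to the changed lines and only then compares words within that
range.

```rust
/// Lengths of the common prefix and common suffix of two slices,
/// making sure the two do not overlap.
fn common_affixes<T: PartialEq>(a: &[T], b: &[T]) -> (usize, usize) {
    let prefix = a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count();
    let max_suffix = a.len().min(b.len()) - prefix;
    let suffix = a
        .iter()
        .rev()
        .zip(b.iter().rev())
        .take(max_suffix)
        .take_while(|(x, y)| x == y)
        .count();
    (prefix, suffix)
}

/// Returns the words that differ on each side, looking only inside
/// the range of lines that changed.
pub fn changed_words(left: &str, right: &str) -> (Vec<String>, Vec<String>) {
    // Pass 1: find the changed line range.
    let l_lines: Vec<&str> = left.lines().collect();
    let r_lines: Vec<&str> = right.lines().collect();
    let (p, s) = common_affixes(&l_lines, &r_lines);
    let l_changed = l_lines[p..l_lines.len() - s].join("\n");
    let r_changed = r_lines[p..r_lines.len() - s].join("\n");

    // Pass 2: refine only that range at the word level.
    let l_words: Vec<&str> = l_changed.split_whitespace().collect();
    let r_words: Vec<&str> = r_changed.split_whitespace().collect();
    let (p, s) = common_affixes(&l_words, &r_words);
    let l_out = l_words[p..l_words.len() - s].iter().map(|w| w.to_string()).collect();
    let r_out = r_words[p..r_words.len() - s].iter().map(|w| w.to_string()).collect();
    (l_out, r_out)
}
```

Words repeated in unchanged lines never take part in the second pass,
which is the property that helps with texts where the same words occur
over and over.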
Impact on benchmarks:
Before:
test bench_diff_10k_lines_reversed ... bench: 42,647,532 ns/iter (+/- 765,347)
test bench_diff_10k_modified_lines ... bench: 21,407,980 ns/iter (+/- 126,366)
test bench_diff_10k_unchanged_lines ... bench: 4,235 ns/iter (+/- 16)
test bench_diff_1k_lines_reversed ... bench: 1,190,483 ns/iter (+/- 7,192)
test bench_diff_1k_modified_lines ... bench: 1,919,766 ns/iter (+/- 9,665)
test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench: 174,702 ns/iter (+/- 1,199)
After:
test bench_diff_10k_lines_reversed ... bench: 38,289,509 ns/iter (+/- 129,004)
test bench_diff_10k_modified_lines ... bench: 33,140,659 ns/iter (+/- 3,989,339)
test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 14)
test bench_diff_1k_lines_reversed ... bench: 973,551 ns/iter (+/- 94,895)
test bench_diff_1k_modified_lines ... bench: 3,033,818 ns/iter (+/- 29,513)
test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench: 79,100 ns/iter (+/- 963)
So most of them get slower, as expected. The last one, taken from a
real diff in the git.git repo, gets faster, however (which is also what
I would have expected).
I made a quite late change in a recent patch to make the merge code
merge based on lines instead of words. I forgot to update the tests
(and to even run them). Sorry :(
The previous patch switched over the content-merge code to use the new
histogram diff code. This patch switches over the content-diff code to
use the histogram diff code. As before, the immediate goal is to speed
it up. `jj diff -r c28ded83fc` in the git.git repo is a good example
of a diff that's extremely slow to calculate with our current
LCS-based diff. With this patch, that drops from 35 s to 0.12 s.
The diff was slightly better before. I think that's mostly because of
our different definition of a "word" in the data. We can improve that
later. The speedup we get now is easily worth the slightly worse diff.
With the histogram diff code from the previous patch, we can now start
using that for finding the "sync regions" in 3-way merge. That helps a
lot with the slow merging we had before this patch. `jj diff -r
9d540e9726` in the git.git repo drops from 22 s to 0.15 s with this
patch. (That commit is a rather arbitrary merge commit from around 5
years ago.)
With the new diff algorithm, the output of `jj diff -r 9d540e9726` in
git.git looks better if we find unchanged sync regions based on lines
than on words, so that's what I'm using in this patch. That's a change
compared to the LCS-based diff we used before this patch. I suspect
the reason that finding sync regions based on words works worse now is
not because of the change from LCS to histogram but because of the
change in how we define a word. My goal right now is mostly to make it
faster; I'll get back to refining the diff result later.
The current diff algorithm does a full LCS on the words of the texts,
which is really slow. Diffing the working copy when e.g.
`src/commands.py` has changes far apart takes seconds. This patch adds
an implementation inspired by JGit's Histogram diff. I say "inspired"
because I just didn't quite understand it :P In particular, I didn't
understand what it does when it finds non-unique elements. I decided
to line up the leading common elements on both sides of the merge. I
don't know if that usually gives good enough results in practice.
I'm sure this can still be optimized a lot, but this seems good enough
as a start. There are also many things to improve about the quality of
the diffs.
I just changed my `~/.gitignore` and some tests started failing
because the working copy respects the user's `~/.gitignore`. We should
probably not depend on `$HOME` in the library crate. For now, this
patch just makes sure we set it to an arbitrary directory in the tests
where it matters.
`test_commit_parallel` was failing on Mac in the GitHub CI. I suspect
the reason was that it was timing out. The test runs in about 1 s on
my Linux desktop and in about 3 s on my Mac laptop. It failed after 31 s
in the GitHub CI. This patch increases the timeout to 1 minute to try
to make the test pass. It would be better to set the timeout to a
higher value only in tests, but this will be good enough for now. By
the way, it has turned out that git notes (at least libgit2's
implementation of them) are too slow, so we should probably eventually
create our own storage for the extra metadata instead.
I only noticed that there was a newer version when running `cargo
install --path .`, which resulted in warnings about deprecated
functions. There's no other reason I'm aware of to upgrade now.
We can now finally use the commit index for filtering out ancestors
from the sets of heads.
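For reference, the filtering itself amounts to the following. This
sketch walks a plain parent map instead of using the commit index, so
it shows what is computed rather than how the index makes it fast; the
`u32` ids and the `filter_heads` name are assumptions for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Removes every candidate that is a proper ancestor of another candidate.
pub fn filter_heads(
    parents: &HashMap<u32, Vec<u32>>,
    candidates: &HashSet<u32>,
) -> HashSet<u32> {
    // Seed the walk with the parents of every candidate.
    let mut work: Vec<u32> = Vec::new();
    for id in candidates {
        if let Some(ps) = parents.get(id) {
            work.extend(ps.iter().copied());
        }
    }
    // Collect all proper ancestors of the candidates.
    let mut ancestors: HashSet<u32> = HashSet::new();
    while let Some(id) = work.pop() {
        if ancestors.insert(id) {
            if let Some(ps) = parents.get(&id) {
                work.extend(ps.iter().copied());
            }
        }
    }
    candidates.difference(&ancestors).copied().collect()
}
```

With the history 1 <- 2 <- 3 and 1 <- 4, the candidates {1, 3, 4}
reduce to {3, 4}.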
I haven't timed the change from most of the recent work on
performance, but I did a measurement after this commit. I modified a
commit in the git.git repo's "what's cooking" branch (because that's
linear). Then I ran `jj evolve` so the 100 commits after it would get
evolved. That took ~700ms. `git rebase` of the same 100 commits took
~6s.
I also compared `jj op undo` of that `jj evolve` operation. With this
patch, that was sped up from ~6.8s to ~125ms.
`MutableRepo` has more information needed for taking fast-paths, and
it will have to make the same decision for doing incremental updates
of the evolution state anyway.