Commit graph

319 commits

Author SHA1 Message Date
Martin von Zweigbergk
79b5b8d681 evolution: rewrite divergence resolution as iterator
This commit rewites the divergence-resolution part of `evolve()` as an
iterator (though not implementing the `Iterator` trait). Iterators are
just much easier to work with: they can easily be stopped, and errors
are easy to propagate. This patch therefore lets us propagate errors
from writing to stdout (typically pipe errors).
2021-05-15 09:16:19 -07:00
Martin von Zweigbergk
401bd2bd7b evolution: refactor evolve_divergent_change() to return result
I'm going to replace the `evolve()` function and its listener trait by
an iterator or something similar. This commit is to prepare for that.
2021-05-15 09:16:19 -07:00
Martin von Zweigbergk
a028f33e3b working copy: don't remove already-tracked files in ignored directories
The recent optimization to not walk ignored directories did not
account for the case where there already are files in the ignored
directory.
2021-05-15 09:15:45 -07:00
Martin von Zweigbergk
cbc3e915b7 working copy: allow overwriting untracked files
Before this commit, we would crash when checking out a commit that has
a file that's currently ignored in the working copy.
2021-05-15 08:24:56 -07:00
Martin von Zweigbergk
df175f549f working copy: don't assume that $HOME exists (it doesn't on Windows CI) 2021-05-13 23:02:56 -07:00
Martin von Zweigbergk
c1060610bd working copy: optimize simple case of entire directory being ignored
This makes the workging copy walk skip an entire ignored directory if
there are no negative patterns later in the ignore file. That speeds
up `jj st` in this repo with ~13k files in `target/` from ~100 ms to
~25 ms (6.0dB). This closes issue #8.
2021-05-13 22:28:38 -07:00
Martin von Zweigbergk
88f7f4732b gitignores: add own implementation and stop using libgit2's
This is to address issue #8. I haven't added the optimization to avoid
walking all the files in `target/` yet. Even so, this patch still
speeds up `jj st` in this repo, with ~13k files in `target/`, from
~320 ms to ~100 ms (-5.1dB). The time actually checking if paths match
gitignores seems to go down from 116 ms to 6 ms. I think that's mostly
because libgit2 has to look for `.gitignore` files in every parent
directory every time we ask it about a file, while the rewritten code
looks for a `.gitignore` file only when visiting a new directory.
2021-05-13 22:23:59 -07:00
Martin von Zweigbergk
31eff96cb7 cli: record full argv in operation log
When using the command line interface (which is the only interface so
far), it seems more useful to see the exact command that was run than
a logical description of what it does. This patch makes the CLI record
that information in the operation metadata in a new key/value field. I
put it in a generic key/value field instead of a more specialized
field because the key/value field seems like a useful thing to have in
general. However, that means that we "have to" do shell-escaping when
saving the data instead of leaving the data unescaped and adding the
shell-escaping when presenting it. I added very simple shell-escaping
for now.
2021-05-09 22:42:13 -07:00
Martin von Zweigbergk
70b99c960e transaction: make commit() return resulting ReadonlyRepo
I've wanted the API to look like this for a while. It seems like a
good API to me. It means that the caller won't have to reload the repo
after committing. The cost seems relatively small. It involves copying
potentially a lot of data in memory (at least the View object), but it
shouldn't involve reading from disk or any other processing. To reduce
the amount of data to copy, it may be worth switching to persistent
data types. I've also wanted to do that for the copying we do when
start a transaction.

I couldn't measure any slowdown caused by this change.
2021-05-08 13:50:59 -07:00
Martin von Zweigbergk
9e3e6f03a1 repo: store Operation object, not just its ID in ReadonlyRepo
This change simplifies a bit on its own, and it will help with the
next change as well.
2021-05-07 22:53:11 -07:00
Martin von Zweigbergk
c05878b9a6 maintenance: add support for latest nightly toolchain 2021-05-07 22:48:24 -07:00
Martin von Zweigbergk
eb348be32c revsets: make description() lazy by using new FilterRevset 2021-05-02 15:16:51 -07:00
Martin von Zweigbergk
e6f751a24d revsets: generalize merge revset implementation to be a generic filter 2021-05-02 14:58:10 -07:00
Martin von Zweigbergk
7065cecfdc revsets: add revset yielding merge commits 2021-05-02 14:33:38 -07:00
Martin von Zweigbergk
aef27d5701 revsets: remove transitive edges in graph iterator by default
The git.git repo seems to have lots of merges from far back in the
history into newer history. That results in `jj log -r 'git_refs()'`
being completely useless because of the number of such edges. For
example, v2.31.0 has almost 600 edges going out of it and presumably
merging (forking) back into various different previous versions. Git,
unlike Mercurial, seems to remove an edge from the graph if the edge
can also be reached via a longer path. This commit makes it so we also
do that (i.e. the filtered graph is a transitive reduction of the
graph before filtering).

This slows down `jj log -r ,,v2.0.0 -T ""` by about 2%. That's still
small enough that it doesn't seem worth it to have a separate iterator
for contiguous ranges (which would be an option).
2021-05-01 23:25:33 -07:00
Martin von Zweigbergk
33da97f0bf revsets: add iterator adapter for rendering simplified graph of set
When rendering a non-contiguous subset of the commits, we want to
still show the connections between the commits in the graph, even
though they're not directly connected. This commit introduces an
adaptor for the revset iterators that also yield the edges to show in
such a simplified graph.

This has no measurable impact on `jj log -r ,,v2.0.0` in the git.git
repo.


The output of `jj log -r 'v1.0.0 | v2.0.0'` now looks like this:

```
o   e156455ea491 e156455ea491 gitster@pobox.com 2014-05-28 11:04:19.000 -07:00 refs/tags/v2.0.0
:\  Git 2.0
: ~
o c2f3bf071ee9 c2f3bf071ee9 junkio@cox.net 2005-12-21 00:01:00.000 -08:00 refs/tags/v1.0.0
~ GIT 1.0.0
```

Before this commit, it looked like this:

```
o e156455ea491 e156455ea491 gitster@pobox.com 2014-05-28 11:04:19.000 -07:00 refs/tags/v2.0.0
| Git 2.0
| o   c2f3bf071ee9 c2f3bf071ee9 junkio@cox.net 2005-12-21 00:01:00.000 -08:00 refs/tags/v1.0.0
| |\  GIT 1.0.0
```

The output of `jj log -r 'git_refs()'` in the git.git repo is still
completely useless (it's >350k lines and >500MB of data). I think
that's because we don't filter out edges to ancestors that we have
transitive edges to. Mercurial also doesn't filter out such edges, but
Git (with `--simplify-by-decoration`) seems to filter them out. I'll
change it soon so we filter them out.
2021-05-01 14:56:52 -07:00
Martin von Zweigbergk
b953d7d801 git_store: revert lock timeout to 10s
This backs out commit 67e11e0fc3. We now use one thread per CPU in
tests (419002fab4), so I hope the tests won't need the 1-minute
timeout anymore.
2021-05-01 14:40:47 -07:00
Martin von Zweigbergk
df2caab274 tests: add a helper for building commit graphs when only topology is important 2021-04-30 22:46:20 -07:00
Martin von Zweigbergk
419002fab4 tests: use one thread per core in concurrency tests
The tests have been failing in GitHub's CI quite frequently. It's
about time I try to do something about it. Let's see if this helps.
2021-04-29 00:01:04 -07:00
Martin von Zweigbergk
46edbbef09 revsets: add revset function for getting all git refs
This adds a `git_refs()` revset that includes all commits pointed to
by a git ref. It's not very useful yet because the graph log doesn't
use the right type of edges for non-contiguous commits.
2021-04-28 23:34:17 -07:00
Martin von Zweigbergk
f5151bdbbe revsets: add a RevsetIterator type, to simplify API and enable adapters 2021-04-28 23:34:17 -07:00
Martin von Zweigbergk
0145be3693 index: add a newtype wrapper for IndexPosition
It seems better to both hide the specific type and to get some more
type safety.
2021-04-28 23:34:14 -07:00
Martin von Zweigbergk
5b18e89a4d diff: fix LCS when a line/word/byte has been moved later 2021-04-28 23:33:18 -07:00
Martin von Zweigbergk
13134bd5a4 cleanup: address warnings reported by new clippy version 2021-04-28 09:12:48 -07:00
Martin von Zweigbergk
c6f6498cc9 conflicts: add newline after conflict marker lines
Merging is currently done with line-level granularity, so it makes
sense to have newlines after the markers. That makes them easier to
edit out when resolving conflicts.
2021-04-24 13:53:24 -07:00
Martin von Zweigbergk
9dc18524fc revsets: add "ancestor difference" range operator (like git's ..) 2021-04-23 19:10:28 -07:00
Martin von Zweigbergk
49173de423 revsets: add DAG range operator (like hg's infix ::)
This lets you use the same operator as we currently have for ancestors
and descendants (`,,`) to also specify a DAG range. That's what
Mercurial uses the `::` operator for and what Git has `git log
--ancestry-path` for.
2021-04-23 19:10:26 -07:00
Martin von Zweigbergk
d8c209c82a revsets: give parents/children operators higher precedence than range operators 2021-04-23 18:45:42 -07:00
Martin von Zweigbergk
9de5f94af6 revsets: use same error variant for imcomplete parse as for syntax error 2021-04-23 16:59:04 -07:00
Martin von Zweigbergk
0b0374d401 revsets: make parsed Children and Descendants have roots and heads
It seems clearer to let the parsed `RevsetExpression`s have only root
and head expression instead of adding the ancestors when building the
expression tree.
2021-04-23 16:15:17 -07:00
Martin von Zweigbergk
f209354503 revsets: simplify and clarify description revset slightly 2021-04-23 13:30:31 -07:00
Martin von Zweigbergk
145731ec74 revsets: change operators around a bit to prepare for infix DAG range operator
I really liked the idea of having the operators for parents and
ancestors (etc.) look similar, but that turned out to be problematic
when we want to add an infix operator for a DAG range (hg's `::`
revset operator and git's `--ancestry-path` flag). Let's say we chose
`:*:` as the operator. Part of the problem is how to parse `foo:*:bar`
without eagerly parsing the `foo:`. It would also be nicer to use
exactly the same operator as prefix, postfix, and infix. Since the
"parents" operator can be repeated, we can't have it be just `:` and
the "ancestors" operator be `::`. We could make the "ancestors"
operator be something like `*:*` (or anything symmetric with the `:`
symbol on the inside). However, at that point, the operator is getting
ugly and hard to type. Another option would be to use `:` for
ancestors and `::` for parents, but that is counterintuitive and get
annoying if you want to repeat it. So it seems that the best option is
to simply pick different symbols for parents/children and
ancestors/descendants/range.

This patch changes the ancestors/descendants operators to both be
`,,`. I'm not at all attached to that particular symbol. I suspect
we'll change it later.
2021-04-23 11:11:07 -07:00
Martin von Zweigbergk
c894a7435f revsets: make function arguments always be revset expressions
Now that expressions may contain literal strings, we can simply have
functions accept only expressions arguments. That simplifies both the
grammar and the code.

A small drawback is that `description((foo), bar)` is now allowed and
does a search for the string "foo" (not "(foo)"). That seems
unlikely to trip up users.
2021-04-23 10:51:41 -07:00
Martin von Zweigbergk
5819687237 revsets: accept quoted symbol names
Git refs with names containing e.g "-" are currently not accepted
symbol names, and I don't plan to change the grammar to accept
them. Instead, let's have the user quote symbol names containing
unusual characters. That way we can keep these symbols reserved for
revset operators.

With this patch the user can do e.g. `jj diff -r '"v2.9.0-rc2"'`.
2021-04-23 10:37:42 -07:00
Martin von Zweigbergk
d78fd9e979 revsets: add functions and operators for children and descendants
This adds `children(<set>)` and `<set>:` for the children of the given
set, and `descendants(<set>)` and `<set>:*` for the descendants of the
given set. The children and descendants are filtered to be among
ancestors of non-obsolete commits. I haven't added a way of overriding
that yet.
2021-04-21 23:34:20 -07:00
Martin von Zweigbergk
618abf4379 revsets: use consistent "_op" suffix for operator rules in the grammar
This is especially important now that we leak the rule names into the
`SyntaxError` message. For example, the error message when doing `jj
diff -r :` will now mention "expected parents_op, ancestors_op, or
primary". It seems much clearer with the "_op" suffixes there. Longer
term, we should think more about how we can best surface syntax errors
from the library crate.
2021-04-21 18:53:58 -07:00
Martin von Zweigbergk
a4ef42962c revsets: don't crash when given ungrammatical revset
I also snuck in some updates to the test cases.
2021-04-21 18:53:58 -07:00
Martin von Zweigbergk
744f209e76 revsets: move parse-tests to to revset module (from separate test module)
The tests don't need any complex set up (no repo necessary), so they
can be in the `revset` module itself. I'm sure we'll need to split up
that module later (at least separate out the parsing), but that's a
separate problem.
2021-04-21 18:53:58 -07:00
Martin von Zweigbergk
6bc1361b84 index: make revision walk be by position instead of generation number
I don't know why I made it walk by generation number to start
with. Walking by position is better in at least two ways: 1) revsets
now depend on the walks to be by descending index position (though
they could equally well depend on the walks to be by generation number
-- it just needs to be consistent), and 2) the log output gets less
interleaved.

This commit makes the number of bytes in the graphlog output in the
git.git repo drop by ~40% due to the reduced amount of
interleaving. Also, it reduces the time of `jj bench walkrevs v1.0.0
v2.0.0` in the git.git repo by 32% (9.4ms -> 6.4ms) and `jj bench
walkrevs v2.0.0 v1.0.0` by 33% (7.7ms -> 5.1ms).
2021-04-21 18:53:56 -07:00
Martin von Zweigbergk
98f4e24892 cli: make benchmark ids include parameters
It makes no sense to compare a run of `jj walkrevs v1.0.0 v2.0.0` with
a run of `jj walkrevs v2.0.0 v1.0.0`, for example.
2021-04-21 16:56:45 -07:00
Martin von Zweigbergk
64fcf90c68 view: make root commit public 2021-04-18 23:04:15 -07:00
Martin von Zweigbergk
b52cfc156c revsets: add a public_heads() revset function 2021-04-18 22:52:31 -07:00
Martin von Zweigbergk
3a65c1d2ab revsets: add intersection operator 2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
332580918c revsets: add union operator 2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
2ac5d1f912 revsets: allow spaces in most places (but not after prefix operators) 2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
c04f418e67 revsets: add difference operator 2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
e733b074e1 revsets: restructure grammar to prepare for operator precedences 2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
d9ae7cdd6d revsets: allow parenthesized expressions
We'll clearly want to allow parenthesized expressions once we have
infix operators (if not before). Let's prepare by allowing parentheses
already now.
2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
d71c083a7f cli: use revsets also when looking up by description 2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
62f0778942 revsets: add a description() revset
The revset is currently eagerly evaluated, which is clearly bad. We'll
need to fix that later.
2021-04-18 22:45:12 -07:00
Martin von Zweigbergk
05e9149157 revsets: add a non_obsolete_heads() revset
This change adds a `non_obsolete_heads(<set>)` revset, which walks up
ancestors of the input set until it gets to a non-obsolete and
non-pruned commit. That's what we do by default in `jj log`
(i.e. without `--all`). Now we can make `jj log` use revsets and teach
it a `-r` option!
2021-04-18 22:45:10 -07:00
Martin von Zweigbergk
30aa459d2a revsets: add a all_heads() revset function
This adds a `all_heads()` revset function, which contains all heads in
the view, i.e. including non-public heads and obsolete heads.
2021-04-18 22:31:46 -07:00
Martin von Zweigbergk
88904e2b63 revsets: add support for function syntax
This adds `parents(foo)` and `ancestors(foo)` as alternative ways of
writing `:foo` and `*:foo`. 

I haven't added support for for whitespace yet; the parsing is very
strict. The error messages will also need to be improved later.
2021-04-18 21:25:58 -07:00
Martin von Zweigbergk
2d6325b0f4 revsets: define grammar in pest 2021-04-18 21:25:58 -07:00
Martin von Zweigbergk
0d62a336af revsets: initial support for Mercurial-style revsets
This patch adds initial support for a DSL for specifying revisions
inspired by Mercurial's "revset" language. The initial support
includes prefix operators ":" (parents) and "*:" (ancestors) with
naive parsing of the revsets. Mercurial uses postfix operator "^" for
parent 1 just like Git does. It uses prefix operator "::" for
ancestors and the same operator as postfix operator for descendants. I
did it differently because I like the idea of using the same operator
as prefix/postfix depending on desired direction, so I wanted to apply
that to parents/children as well (and for
predecessors/successors). The "*" in the "*:" operator is copied from
regular expression syntax. Let's see how it works out. This is an
experimental VCS, after all.

I've updated the CLI to use the new revset support.

The implementation feels a little messy, but you have to start
somewhere...
2021-04-18 21:25:51 -07:00
Martin von Zweigbergk
7861968f64 index: make IndexRef::entry_by_id() etc return entry with repo's lifetime
It's useful to be able to know that given a `repo: RepoRef<'a>`, the
the lifetime of `repo.index().entry_by_id()` will also be `'a`.
2021-04-15 07:00:04 -07:00
Martin von Zweigbergk
4c3d73ff3b evolution: walk orphans using index
This actually seems to make it slightly slower, but it fixes an
important bug (we used to evolve only one topological branch per `jj
evolve` call). The slowdown seemed to be on the order of 5% when
evolving 100 commits on git.git's "what's cooking" branch.
2021-04-14 08:25:14 -07:00
Martin von Zweigbergk
783e1f6512 repo: make MutableRepo have an Arc<ReadonlyRepo> instead of a reference
I suspect that at least one reason that I didn't make
`MutableRepo::base_repo` by an `Arc<ReadonlyRepo>` before was that I
thought that that would mean that `start_transaction()` would need be
moved off of `ReadonlyRepo` so it can be given an
`&Arc<ReadonlyRepo>`, which would make it much less convenient to
use. It turns out that a `self` argument can actually be of type
`&Arc<ReadonlyRepo>`.
2021-04-11 13:42:31 -07:00
Martin von Zweigbergk
ce855bccfa repo: make reload() and reload_at() return a new ReadonlyRepo
After this patch `ReadonlyRepo` is even closer to readonly. That makes
it easier to reason about. It will allow some further cleanups too.
2021-04-11 10:39:29 -07:00
Martin von Zweigbergk
e3ca27bf77 revsets: support git refs 2021-04-10 10:10:09 -07:00
Martin von Zweigbergk
40f75ec641 revsets: don't crash if given non-hex symbol 2021-04-10 10:08:47 -07:00
Martin von Zweigbergk
9e8a7e2ba6 revsets: move code for resolving symbol to commit to new module 2021-04-10 09:46:27 -07:00
Martin von Zweigbergk
102f7a0416 diff: also recurse into final region after after unchanged regions
See test case for details.


Before:
test bench_diff_10k_lines_reversed  ... bench:  36,249,659 ns/iter (+/- 174,455)
test bench_diff_10k_modified_lines  ... bench:  37,258,890 ns/iter (+/- 803,963)
test bench_diff_10k_unchanged_lines ... bench:       4,252 ns/iter (+/- 69)
test bench_diff_1k_lines_reversed   ... bench:     982,834 ns/iter (+/- 6,467)
test bench_diff_1k_modified_lines   ... bench:   3,343,469 ns/iter (+/- 23,243)
test bench_diff_1k_unchanged_lines  ... bench:         231 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench:      95,559 ns/iter (+/- 816)


After:
test bench_diff_10k_lines_reversed  ... bench:  36,186,715 ns/iter (+/- 196,903)
test bench_diff_10k_modified_lines  ... bench:  37,511,000 ns/iter (+/- 1,370,476)
test bench_diff_10k_unchanged_lines ... bench:       3,099 ns/iter (+/- 8)
test bench_diff_1k_lines_reversed   ... bench:     986,010 ns/iter (+/- 11,565)
test bench_diff_1k_modified_lines   ... bench:   3,370,938 ns/iter (+/- 17,041)
test bench_diff_1k_unchanged_lines  ... bench:         230 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench:     102,189 ns/iter (+/- 1,052)


So this patch makes diffing even slower (but still easily fast enough
for all cases I've run into in real life). There's probably a lot that
can be done to make things faster, but the first priority is that the
diffs are correct and easy to read.
2021-04-08 23:54:54 -07:00
Martin von Zweigbergk
f4a41f3880 trees: make tree diff return an iterator instead of taking a callback
This is yet another step towards making it easy to propagate
`BrokenPipe` errors. The `jj diff` code (naturally) diffs two trees
and prints the diffs. If the printing fails, we shouldn't just crash
like we do today.

The new code is probably slower since it does more copying (the
callback got references to the `FileRepoPath` and `TreeValue`). I hope
that won't make a noticeable difference. At least `jj diff -r
334afbc76fbd --summary` didn't seem to get measurably slower.
2021-04-07 23:18:00 -07:00
Martin von Zweigbergk
8b2ce18254 trees: make diff_entries() return an iterator instead of taking a callback
The iterator version is easier to use and we get rid of the ugly type
parameter for the error type. I also simplified the code by using
`Peekable` iterators.
2021-04-07 15:48:11 -07:00
Martin von Zweigbergk
5c10c93e64 diff: fix tests broken by the previous commit
Sorry, I forgot to run the automated tests again :(
2021-04-07 11:00:04 -07:00
Martin von Zweigbergk
0dd000d236 diff: do final refinement at byte-level for non-word bytes
This results in significantly more readable diffs on commits like
659393bec2 in this repo.


Before:
test bench_diff_10k_lines_reversed  ... bench:  38,122,998 ns/iter (+/- 557,688)
test bench_diff_10k_modified_lines  ... bench:  32,556,563 ns/iter (+/- 548,114)
test bench_diff_10k_unchanged_lines ... bench:       4,231 ns/iter (+/- 15)
test bench_diff_1k_lines_reversed   ... bench:     958,296 ns/iter (+/- 46,963)
test bench_diff_1k_modified_lines   ... bench:   3,014,723 ns/iter (+/- 15,830)
test bench_diff_1k_unchanged_lines  ... bench:         249 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench:      78,599 ns/iter (+/- 1,079)

After:
test bench_diff_10k_lines_reversed  ... bench:  38,289,493 ns/iter (+/- 413,712)
test bench_diff_10k_modified_lines  ... bench:  37,352,516 ns/iter (+/- 1,293,950)
test bench_diff_10k_unchanged_lines ... bench:       4,238 ns/iter (+/- 13)
test bench_diff_1k_lines_reversed   ... bench:     967,253 ns/iter (+/- 8,506)
test bench_diff_1k_modified_lines   ... bench:   3,358,028 ns/iter (+/- 37,154)
test bench_diff_1k_unchanged_lines  ... bench:         233 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench:      95,787 ns/iter (+/- 740)


So the biggest slowdown is when there are modified lines.
2021-04-07 10:27:17 -07:00
Martin von Zweigbergk
f634ff0e3f files: make diff() return an iterator instead of using a callback
Iterators are generally nicer to work with. My immediate goal is to be
able to propagate errors when failing to write to stdout.
2021-04-07 10:07:18 -07:00
Martin von Zweigbergk
d7395cc34a diff: add copyright header 2021-04-06 21:26:37 -07:00
Martin von Zweigbergk
7e4e43f358 diff: first diff lines, then refine to words, producing better diffs
The new diff algorithm produces pretty bad diffs in some cases, such
as cc4b1e9230 in this repo (the parent of this commit). I think the
problem there is that many words are repeated over and over. Diffing
first at the line level and then refining the diff of the changed
ranges at the word level gives much better results. That's what this
patch does. After this patch, `jj diff -r cc4b1e923091` looks pretty
similar to the diff in GitHub's UI.

I hope to get around to doing the same for the merge code soon.

Impact on benchmarks:

Before:
test bench_diff_10k_lines_reversed  ... bench:  42,647,532 ns/iter (+/- 765,347)
test bench_diff_10k_modified_lines  ... bench:  21,407,980 ns/iter (+/- 126,366)
test bench_diff_10k_unchanged_lines ... bench:       4,235 ns/iter (+/- 16)
test bench_diff_1k_lines_reversed   ... bench:   1,190,483 ns/iter (+/- 7,192)
test bench_diff_1k_modified_lines   ... bench:   1,919,766 ns/iter (+/- 9,665)
test bench_diff_1k_unchanged_lines  ... bench:         231 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench:     174,702 ns/iter (+/- 1,199)

After:
test bench_diff_10k_lines_reversed  ... bench:  38,289,509 ns/iter (+/- 129,004)
test bench_diff_10k_modified_lines  ... bench:  33,140,659 ns/iter (+/- 3,989,339)
test bench_diff_10k_unchanged_lines ... bench:       3,099 ns/iter (+/- 14)
test bench_diff_1k_lines_reversed   ... bench:     973,551 ns/iter (+/- 94,895)
test bench_diff_1k_modified_lines   ... bench:   3,033,818 ns/iter (+/- 29,513)
test bench_diff_1k_unchanged_lines  ... bench:         230 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench:      79,100 ns/iter (+/- 963)


So most of them get slower, as expected. The last one, taken from a
real diff in the git.git repo, get faster, however (which is also what
I would have expected).
2021-04-04 21:50:31 -07:00
Martin von Zweigbergk
cc4b1e9230 test: fix merge tests to expect line-based merging
I made a quite late change in a recent patch to make the merge code to
merge based on lines instead of words. I forgot to update the tests
(and to even run them). Sorry :(
2021-04-01 08:27:27 -07:00
Martin von Zweigbergk
c071d412af diff: use new diff algorithm for content diff
The previous patch switched over the content-merge code to use the new
histogram diff code. This patch switches over the content-diff code to
use the histogram diff code. As before, the immediate goal is to speed
it up. `jj diff -r c28ded83fc` in the git.git repo is a good example
of a diff that's extremely slow to calculate with our current
LCS-based diff. With this patch, that drops from 35 s to 0.12 s.

The diff was slightly better before. I think that's mostly because of
our different definition of a "word" in the data. We can improve that
later. The speedup we get now is easily worth the slightly worse diff.
2021-03-31 22:22:59 -07:00
Martin von Zweigbergk
3c35dbace6 merge: use new diff algorithm for finding sync regions
With the histogram diff code from the previous patch, we can now start
using that for finding the "sync regions" in 3-way merge. That helps a
lot with the slow merging we had before this patch. `jj diff -r
9d540e9726` in the git.git repo drops from 22 s to 0.15 s with this
patch. (That commit is a rather arbitrary merge commit from aroun 5
years ago.)

With the new diff algorithm, the output of `jj diff -r 9d540e9726` in
git.git looks better if we find unchanged sync regions based on lines
than on words, so that's what I'm using in this patch. That's a change
compared the the LCS-based diff we used before this patch. I suspect
the reason that finding sync regions based on words works worse now is
not because of the change from LCS to histogram but because of the
change in how we define a word. My goal right now is mostly to make it
faster; I'll get back to refining the diff result later.
2021-03-31 22:16:19 -07:00
Martin von Zweigbergk
1e657c5331 diff: add a histogram(-like?) diff algorithm
The current diff algorithm does a full LCS on the words of the texts,
which is really slow. Diffing the working copy when e.g.
`src/commands.py` has changes far apart takes seconds. This patch adds
an implementation inspired by JGit's Histogram diff. I say "inspired"
because I just didn't quite understand it :P In particular, I didn't
understand what it does when it finds non-unique elements. I decided
to line up the leading common elements on both sides of the merge. I
don't know if that usually gives good enough results in practice.

I'm sure this can still be optimized a lot, but this seems good enough
as a start. There is also many things to improve about the quality of
the diffs.
2021-03-31 22:15:36 -07:00
Martin von Zweigbergk
998e23db3c index: add IndexEntry::parents() and predecessors() returning Vec<IndexEntry> 2021-03-31 14:48:03 -07:00
Martin von Zweigbergk
53d1757994 dag_walk: remove unused TopoIter 2021-03-18 16:42:30 -07:00
Martin von Zweigbergk
db4e8bc458 cargo: upgrade to protobuf 2.22.1 to avoid workaround for rustfmt::skip 2021-03-18 13:06:42 -07:00
Martin von Zweigbergk
07c2b2316f repo: remove obsolete part of a TODO (we use the index to filter out non-heads) 2021-03-17 08:28:21 -07:00
Martin von Zweigbergk
30cd94f842 dag_walk: rename unreachable() to heads() to match name we use in index module 2021-03-16 23:54:51 -07:00
Martin von Zweigbergk
5aec8b9d77 evolution: use index for filtering out ancestors of candidates in new_parent()
This speeds up `jj evolve` of 100 linear commits of the "what's
cooking" branch in the git.git repo further, from ~700 ms to ~400 ms.
2021-03-16 23:43:44 -07:00
Martin von Zweigbergk
73f20c8696 transaction: delete write_commit() and as_repo_ref() helpers
With this patch, the simple delegating helpers are gone from
`Transaction`.
2021-03-16 22:45:58 -07:00
Martin von Zweigbergk
f9873c49ec transaction: remove add_head(), remove_head(), and set_view() helpers 2021-03-16 22:31:28 -07:00
Martin von Zweigbergk
06df609482 transaction: delete check_out() and set_checkout() helpers 2021-03-16 22:31:28 -07:00
Martin von Zweigbergk
808d0af66d transaction: remove evolution() and store() helpers 2021-03-16 22:31:24 -07:00
Martin von Zweigbergk
16d97ef8c0 transaction: remove index() and view() helpers 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
5ed14185a0 git: take a MutableRepo instead of a Transaction 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
769f88bbae tests: rename test_transaction to test_mut_repo
The test doesn't test any logic in the `Transaction` type itself
anymore.
2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
2c2b5fb3b7 evolution: take a MutableRepo instead of a Transaction 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
c3b9d1cd13 rewrite: take a MutableRepo instead of a Transaction 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
ee8423a69e MutableRepo: rename repo to base_repo to clarify its role 2021-03-16 22:05:50 -07:00
Martin von Zweigbergk
69de4698ac tests: set $HOME in a few tests to avoid depending in developer's ~/.gitignore
I just changed my `~/.gitignore` and some tests started failing
because the working copy respects the user's `~/.gitignore`. We should
probably not depend on `$HOME` in the library crate. For now, this
patch just makes sure we set it to an arbitrary directory in the tests
where it matters.
2021-03-16 22:05:36 -07:00
Martin von Zweigbergk
67e11e0fc3 git_store: wait 1 minute for lock on refs to help tests
`test_commit_parallel` was failing on Mac in the GitHub CI. I suspect
the reason was that it was timing out. The test runs in about 1 s on
my Linux desktop and in about 3 s on my Mac laptop. It failed after 31
in the GitHub CI. This patch increases the timeout to 1 minute to try
to make the test pass. It would be better to set the timeout to a
higher value only in tests, but this will be good enough for now. By
the way, it has turned out that git notes (at least libgit2's
implementation of them) are too slow, so we should probably eventually
create our own storage for the extra metadata instead.
2021-03-16 11:28:22 -07:00
Martin von Zweigbergk
81a0e0bd2a protobuf: upgrade to version 2.22.0
I only noticed that there was a newer version when running `cargo
install --path .`, which resulted in warnings about deprecated
functions. There's no other reason I'm aware of to upgrade now.
2021-03-15 17:09:29 -07:00
Martin von Zweigbergk
1ebdd4ecf0 MutableRepo: use index when enforcing view invariants
We can now finally use the commit index for filtering out ancestors
from the sets of heads.

I haven't timed the change from most of the recent work on
performance, but I did a measurement after this commit. I modified a
commit in the git.git repo's "what's cooking" branch (because that's
linear). Then I ran `jj evolve` so the 100 commits after it would get
evolved. That took ~700ms. `git rebase` of the same 100 commits took
~6s.

I also compared `jj op undo` of that `jj evolve` operation. With this
patch, that was sped up from ~6.8s to ~125ms.
2021-03-15 16:35:45 -07:00
Martin von Zweigbergk
3ecb4ec16b MutableRepo: in fast-path for adding head, simply remove parent heads 2021-03-15 15:38:09 -07:00
Martin von Zweigbergk
2c92fca75a MutableView: don't require whole Commit when CommitId is enough 2021-03-15 15:36:03 -07:00
Martin von Zweigbergk
b4b1de3ddc view: let MutableRepo enforce view invariants
`MutableRepo` has more information needed for taking fast-paths, and
it will have to make the same decision for doing incremental updates
of the evolution state anyway.
2021-03-15 15:17:36 -07:00
Martin von Zweigbergk
b9fe944e76 view: remove unnecessary removing of parents in add_head()
We call `enforce_invariants()` right after removing the parent
commits, and that will remove parents anyway.
2021-03-15 15:06:14 -07:00
Martin von Zweigbergk
12a47bd6ed MutableRepo: don't calculate evolution state only to update it 2021-03-15 15:03:50 -07:00
Martin von Zweigbergk
f0619c07ac MutableEvolution: make MutableRepo responsible for lazy calculation
This patch continues the work from the previous pathc. From this
patch, we no longer calculate the evolution state just because a
transaction starts. We still unnecessarily calculate it when adding a
commit within the transaction, however. I'll fix that next.
2021-03-15 15:03:14 -07:00