ok/jj
forked from mirrors/jj

204 commits

Author SHA1 Message Date
Martin von Zweigbergk
5c10c93e64 diff: fix tests broken by the previous commit
Sorry, I forgot to run the automated tests again :(
2021-04-07 11:00:04 -07:00
Martin von Zweigbergk
0dd000d236 diff: do final refinement at byte-level for non-word bytes
This results in significantly more readable diffs on commits like
659393bec2 in this repo.


Before:
test bench_diff_10k_lines_reversed  ... bench:  38,122,998 ns/iter (+/- 557,688)
test bench_diff_10k_modified_lines  ... bench:  32,556,563 ns/iter (+/- 548,114)
test bench_diff_10k_unchanged_lines ... bench:       4,231 ns/iter (+/- 15)
test bench_diff_1k_lines_reversed   ... bench:     958,296 ns/iter (+/- 46,963)
test bench_diff_1k_modified_lines   ... bench:   3,014,723 ns/iter (+/- 15,830)
test bench_diff_1k_unchanged_lines  ... bench:         249 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench:      78,599 ns/iter (+/- 1,079)

After:
test bench_diff_10k_lines_reversed  ... bench:  38,289,493 ns/iter (+/- 413,712)
test bench_diff_10k_modified_lines  ... bench:  37,352,516 ns/iter (+/- 1,293,950)
test bench_diff_10k_unchanged_lines ... bench:       4,238 ns/iter (+/- 13)
test bench_diff_1k_lines_reversed   ... bench:     967,253 ns/iter (+/- 8,506)
test bench_diff_1k_modified_lines   ... bench:   3,358,028 ns/iter (+/- 37,154)
test bench_diff_1k_unchanged_lines  ... bench:         233 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench:      95,787 ns/iter (+/- 740)


So the biggest slowdown is when there are modified lines.
2021-04-07 10:27:17 -07:00
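The commit above does the final refinement at byte level for bytes that are not part of a word. Below is a minimal sketch of the tokenization this implies. It assumes a word is a run of ASCII alphanumerics or `_`; `is_word_byte` and `tokenize` are hypothetical names, not jj's code. Words stay whole, while each punctuation or whitespace byte becomes its own token, so a diff over these tokens can match individual commas and parentheses.

```rust
use std::ops::Range;

/// Word bytes in this sketch: ASCII letters, digits and '_'. This is an
/// assumption for illustration, not necessarily how jj defines a word.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Splits `text` into token ranges: each maximal run of word bytes is one
/// token, and every other byte (space, punctuation, ...) is its own token.
fn tokenize(text: &[u8]) -> Vec<Range<usize>> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < text.len() {
        let start = i;
        if is_word_byte(text[i]) {
            while i < text.len() && is_word_byte(text[i]) {
                i += 1;
            }
        } else {
            i += 1;
        }
        tokens.push(start..i);
    }
    tokens
}

fn main() {
    let text = b"foo(bar, 42)";
    for range in tokenize(text) {
        // The input is ASCII, so every token is valid UTF-8.
        println!("{:?}", std::str::from_utf8(&text[range]).unwrap());
    }
}
```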
Martin von Zweigbergk
f634ff0e3f files: make diff() return an iterator instead of using a callback
Iterators are generally nicer to work with. My immediate goal is to be
able to propagate errors when failing to write to stdout.
2021-04-07 10:07:18 -07:00
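The sketch below shows why an iterator helps with error propagation. It assumes a simplified diff over two lists of paths; `DiffEntry`, `diff()` and `print_diff()` are hypothetical, not jj's API. Because the caller owns the loop, a failing `writeln!` to stdout can return early with `?`.

```rust
use std::io::{self, Write};

/// Hypothetical diff entry; not jj's actual type.
#[derive(Debug, PartialEq)]
enum DiffEntry {
    Removed(String),
    Added(String),
}

/// Yields the differences between two path lists lazily, so the caller
/// decides what to do with each entry (and when to stop).
fn diff<'a>(old: &'a [&'a str], new: &'a [&'a str]) -> impl Iterator<Item = DiffEntry> + 'a {
    let removed = old
        .iter()
        .filter(move |path| !new.contains(*path))
        .map(|path| DiffEntry::Removed(path.to_string()));
    let added = new
        .iter()
        .filter(move |path| !old.contains(*path))
        .map(|path| DiffEntry::Added(path.to_string()));
    removed.chain(added)
}

/// Because the caller drives the loop, a failed write is propagated with `?`
/// and stops the diff early, which is awkward to do from inside a callback.
fn print_diff(out: &mut impl Write, old: &[&str], new: &[&str]) -> io::Result<()> {
    for entry in diff(old, new) {
        match entry {
            DiffEntry::Removed(path) => writeln!(out, "- {}", path)?,
            DiffEntry::Added(path) => writeln!(out, "+ {}", path)?,
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    print_diff(&mut out, &["a.txt", "b.txt"], &["b.txt", "c.txt"])
}
```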
Martin von Zweigbergk
d7395cc34a diff: add copyright header 2021-04-06 21:26:37 -07:00
Martin von Zweigbergk
7e4e43f358 diff: first diff lines, then refine to words, producing better diffs
The new diff algorithm produces pretty bad diffs in some cases, such
as cc4b1e9230 in this repo (the parent of this commit). I think the
problem there is that many words are repeated over and over. Diffing
first at the line level and then refining the diff of the changed
ranges at the word level gives much better results. That's what this
patch does. After this patch, `jj diff -r cc4b1e923091` looks pretty
similar to the diff in GitHub's UI.

I hope to get around to doing the same for the merge code soon.

Impact on benchmarks:

Before:
test bench_diff_10k_lines_reversed  ... bench:  42,647,532 ns/iter (+/- 765,347)
test bench_diff_10k_modified_lines  ... bench:  21,407,980 ns/iter (+/- 126,366)
test bench_diff_10k_unchanged_lines ... bench:       4,235 ns/iter (+/- 16)
test bench_diff_1k_lines_reversed   ... bench:   1,190,483 ns/iter (+/- 7,192)
test bench_diff_1k_modified_lines   ... bench:   1,919,766 ns/iter (+/- 9,665)
test bench_diff_1k_unchanged_lines  ... bench:         231 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench:     174,702 ns/iter (+/- 1,199)

After:
test bench_diff_10k_lines_reversed  ... bench:  38,289,509 ns/iter (+/- 129,004)
test bench_diff_10k_modified_lines  ... bench:  33,140,659 ns/iter (+/- 3,989,339)
test bench_diff_10k_unchanged_lines ... bench:       3,099 ns/iter (+/- 14)
test bench_diff_1k_lines_reversed   ... bench:     973,551 ns/iter (+/- 94,895)
test bench_diff_1k_modified_lines   ... bench:   3,033,818 ns/iter (+/- 29,513)
test bench_diff_1k_unchanged_lines  ... bench:         230 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench:      79,100 ns/iter (+/- 963)


So the two modified-lines benchmarks get slower, as expected. The last
one, taken from a real diff in the git.git repo, gets faster, however
(which is also what I would have expected).
2021-04-04 21:50:31 -07:00
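The two-level approach described above can be sketched under simplifying assumptions. It uses a plain O(n·m) LCS rather than the histogram diff, and it splits words on whitespace. `lcs_pairs`, `changed_ranges` and `refined_changes` are hypothetical names. Lines are diffed first, and only the line ranges that differ are diffed again at word level. As a result, words repeated elsewhere in the file cannot be matched across unrelated lines.

```rust
use std::ops::Range;

/// Index pairs (i, j) with a[i] == b[j] along one longest common subsequence.
fn lcs_pairs<T: PartialEq>(a: &[T], b: &[T]) -> Vec<(usize, usize)> {
    let (n, m) = (a.len(), b.len());
    // dp[i][j] = length of the LCS of a[i..] and b[j..].
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if a[i] == b[j] {
                dp[i + 1][j + 1] + 1
            } else {
                dp[i + 1][j].max(dp[i][j + 1])
            };
        }
    }
    let (mut i, mut j, mut pairs) = (0, 0, Vec::new());
    while i < n && j < m {
        if a[i] == b[j] {
            pairs.push((i, j));
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    pairs
}

/// The unmatched (left, right) ranges between consecutive matches.
fn changed_ranges(pairs: &[(usize, usize)], n: usize, m: usize) -> Vec<(Range<usize>, Range<usize>)> {
    let (mut prev_i, mut prev_j, mut out) = (0, 0, Vec::new());
    // A sentinel match at (n, m) closes the trailing region.
    for (i, j) in pairs.iter().copied().chain(std::iter::once((n, m))) {
        if prev_i < i || prev_j < j {
            out.push((prev_i..i, prev_j..j));
        }
        (prev_i, prev_j) = (i + 1, j + 1);
    }
    out
}

/// Diffs lines first, then refines each changed line range at word level.
/// Returns the (removed words, added words) of each word-level change.
fn refined_changes<'a>(old: &'a str, new: &'a str) -> Vec<(Vec<&'a str>, Vec<&'a str>)> {
    let old_lines: Vec<&'a str> = old.lines().collect();
    let new_lines: Vec<&'a str> = new.lines().collect();
    let line_pairs = lcs_pairs(&old_lines, &new_lines);
    let mut result = Vec::new();
    for (l, r) in changed_ranges(&line_pairs, old_lines.len(), new_lines.len()) {
        // Second pass: diff only the words inside the changed line ranges.
        let old_words: Vec<&'a str> = old_lines[l].iter().copied().flat_map(str::split_whitespace).collect();
        let new_words: Vec<&'a str> = new_lines[r].iter().copied().flat_map(str::split_whitespace).collect();
        let word_pairs = lcs_pairs(&old_words, &new_words);
        for (wl, wr) in changed_ranges(&word_pairs, old_words.len(), new_words.len()) {
            result.push((old_words[wl].to_vec(), new_words[wr].to_vec()));
        }
    }
    result
}

fn main() {
    let old = "fn main() {\n    let x = 1;\n}\n";
    let new = "fn main() {\n    let y = 1;\n}\n";
    // Only the changed word inside the changed line is reported.
    println!("{:?}", refined_changes(old, new));
}
```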
Martin von Zweigbergk
cc4b1e9230 test: fix merge tests to expect line-based merging
I made a quite late change in a recent patch to make the merge code
merge based on lines instead of words. I forgot to update the tests
(and to even run them). Sorry :(
2021-04-01 08:27:27 -07:00
Martin von Zweigbergk
c071d412af diff: use new diff algorithm for content diff
The previous patch switched over the content-merge code to use the new
histogram diff code. This patch switches over the content-diff code to
use the histogram diff code. As before, the immediate goal is to speed
it up. `jj diff -r c28ded83fc` in the git.git repo is a good example
of a diff that's extremely slow to calculate with our current
LCS-based diff. With this patch, that drops from 35 s to 0.12 s.

The diff was slightly better before. I think that's mostly because of
our different definition of a "word" in the data. We can improve that
later. The speedup we get now is easily worth the slightly worse diff.
2021-03-31 22:22:59 -07:00
Martin von Zweigbergk
3c35dbace6 merge: use new diff algorithm for finding sync regions
With the histogram diff code from the previous patch, we can now start
using that for finding the "sync regions" in 3-way merge. That helps a
lot with the slow merging we had before this patch. `jj diff -r
9d540e9726` in the git.git repo drops from 22 s to 0.15 s with this
patch. (That commit is a rather arbitrary merge commit from around 5
years ago.)

With the new diff algorithm, the output of `jj diff -r 9d540e9726` in
git.git looks better if we find unchanged sync regions based on lines
than on words, so that's what I'm using in this patch. That's a change
compared to the LCS-based diff we used before this patch. I suspect
the reason that finding sync regions based on words works worse now is
not because of the change from LCS to histogram but because of the
change in how we define a word. My goal right now is mostly to make it
faster; I'll get back to refining the diff result later.
2021-03-31 22:16:19 -07:00
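The sync regions mentioned above can be sketched as follows, with lines as tokens (as in this patch). This rests on two assumptions: plain LCS stands in for the histogram diff, and the `<<<<<<<`/`=======`/`>>>>>>>` markers are illustrative, not jj's conflict format. A base line is a sync point when it is matched in both the base-to-left and the base-to-right diffs. The regions between sync points are then resolved by taking whichever side changed. A region where both sides changed differently is reported as a conflict. `sync_points` and `merge` are hypothetical names.

```rust
use std::collections::HashMap;

/// Index pairs (i, j) with a[i] == b[j] along one longest common subsequence.
fn lcs_pairs<T: PartialEq>(a: &[T], b: &[T]) -> Vec<(usize, usize)> {
    let (n, m) = (a.len(), b.len());
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if a[i] == b[j] { dp[i + 1][j + 1] + 1 } else { dp[i + 1][j].max(dp[i][j + 1]) };
        }
    }
    let (mut i, mut j, mut pairs) = (0, 0, Vec::new());
    while i < n && j < m {
        if a[i] == b[j] {
            pairs.push((i, j));
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    pairs
}

/// Sync points: base positions matched on both sides, as (base, left, right).
fn sync_points(base: &[&str], left: &[&str], right: &[&str]) -> Vec<(usize, usize, usize)> {
    let base_to_right: HashMap<usize, usize> = lcs_pairs(base, right).into_iter().collect();
    lcs_pairs(base, left)
        .into_iter()
        .filter_map(|(b, l)| base_to_right.get(&b).map(|&r| (b, l, r)))
        .collect()
}

/// 3-way merge of lines: between sync points, take whichever side changed,
/// or emit conflict markers if both sides changed differently.
fn merge<'a>(base: &[&'a str], left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let mut out = Vec::new();
    let (mut b0, mut l0, mut r0) = (0, 0, 0);
    // A sentinel sync point at the end of all three inputs closes the last region.
    let end = (base.len(), left.len(), right.len());
    for (b, l, r) in sync_points(base, left, right).into_iter().chain(std::iter::once(end)) {
        let (bh, lh, rh) = (&base[b0..b], &left[l0..l], &right[r0..r]);
        if lh == bh || lh == rh {
            out.extend_from_slice(rh); // only right changed, or both changed identically
        } else if rh == bh {
            out.extend_from_slice(lh); // only left changed
        } else {
            out.push("<<<<<<<");
            out.extend_from_slice(lh);
            out.push("=======");
            out.extend_from_slice(rh);
            out.push(">>>>>>>");
        }
        if b < base.len() {
            out.push(base[b]); // the sync line itself is identical on all sides
        }
        (b0, l0, r0) = (b + 1, l + 1, r + 1);
    }
    out
}

fn main() {
    let merged = merge(&["a", "b", "c"], &["a", "B", "c"], &["a", "b", "c", "d"]);
    println!("{:?}", merged);
}
```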
Martin von Zweigbergk
1e657c5331 diff: add a histogram(-like?) diff algorithm
The current diff algorithm does a full LCS on the words of the texts,
which is really slow. Diffing the working copy when e.g.
`src/commands.rs` has changes far apart takes seconds. This patch adds
an implementation inspired by JGit's Histogram diff. I say "inspired"
because I just didn't quite understand it :P In particular, I didn't
understand what it does when it finds non-unique elements. I decided
to line up the leading common elements on both sides of the merge. I
don't know if that usually gives good enough results in practice.

I'm sure this can still be optimized a lot, but this seems good enough
as a start. There are also many things to improve about the quality of
the diffs.
2021-03-31 22:15:36 -07:00
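Below is a much-simplified sketch of the histogram idea, written for illustration only. It is not jj's implementation and is far simpler than JGit's HistogramDiff. It counts occurrences of each element and anchors on the rarest element common to both sides. It then matches that element's first occurrence on each side, in the spirit of lining up leading common elements as described above, and recurses on the parts before and after the anchor. `histogram_diff` and its offset parameters are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Appends matched index pairs (left_idx, right_idx) to `out`. `l_off` and
/// `r_off` are the offsets of these slices within the full sequences.
fn histogram_diff<T: Eq + Hash>(left: &[T], right: &[T], l_off: usize, r_off: usize, out: &mut Vec<(usize, usize)>) {
    // Count how often each element occurs on each side.
    let mut counts: HashMap<&T, (usize, usize)> = HashMap::new();
    for x in left {
        counts.entry(x).or_default().0 += 1;
    }
    for x in right {
        counts.entry(x).or_default().1 += 1;
    }
    // Anchor on the rarest element present on both sides; ties go to the
    // one that appears first in `left`.
    let anchor = left
        .iter()
        .filter(|x| counts[x].1 > 0)
        .min_by_key(|x| counts[x].0 + counts[x].1);
    let Some(anchor) = anchor else {
        return; // nothing in common: the whole range is a change
    };
    // Line up the first occurrence of the anchor on each side.
    let li = left.iter().position(|x| x == anchor).unwrap();
    let ri = right.iter().position(|x| x == anchor).unwrap();
    histogram_diff(&left[..li], &right[..ri], l_off, r_off, out);
    out.push((l_off + li, r_off + ri));
    histogram_diff(&left[li + 1..], &right[ri + 1..], l_off + li + 1, r_off + ri + 1, out);
}

fn main() {
    let left = ["a", "b", "c", "d"];
    let right = ["a", "x", "c", "d"];
    let mut pairs = Vec::new();
    histogram_diff(&left, &right, 0, 0, &mut pairs);
    println!("{:?}", pairs); // prints [(0, 0), (2, 2), (3, 3)]
}
```

Anchoring on rare elements first keeps frequently repeated tokens (blank lines, closing braces) from dominating the matching. The cost is recounting on every recursion, which a real implementation would avoid.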
Martin von Zweigbergk
998e23db3c index: add IndexEntry::parents() and predecessors() returning Vec<IndexEntry> 2021-03-31 14:48:03 -07:00
Martin von Zweigbergk
53d1757994 dag_walk: remove unused TopoIter 2021-03-18 16:42:30 -07:00
Martin von Zweigbergk
db4e8bc458 cargo: upgrade to protobuf 2.22.1 to avoid workaround for rustfmt::skip 2021-03-18 13:06:42 -07:00
Martin von Zweigbergk
07c2b2316f repo: remove obsolete part of a TODO (we use the index to filter out non-heads) 2021-03-17 08:28:21 -07:00
Martin von Zweigbergk
30cd94f842 dag_walk: rename unreachable() to heads() to match name we use in index module 2021-03-16 23:54:51 -07:00
Martin von Zweigbergk
5aec8b9d77 evolution: use index for filtering out ancestors of candidates in new_parent()
This speeds up `jj evolve` of 100 linear commits of the "what's
cooking" branch in the git.git repo further, from ~700 ms to ~400 ms.
2021-03-16 23:43:44 -07:00
Martin von Zweigbergk
73f20c8696 transaction: delete write_commit() and as_repo_ref() helpers
With this patch, the simple delegating helpers are gone from
`Transaction`.
2021-03-16 22:45:58 -07:00
Martin von Zweigbergk
f9873c49ec transaction: remove add_head(), remove_head(), and set_view() helpers 2021-03-16 22:31:28 -07:00
Martin von Zweigbergk
06df609482 transaction: delete check_out() and set_checkout() helpers 2021-03-16 22:31:28 -07:00
Martin von Zweigbergk
808d0af66d transaction: remove evolution() and store() helpers 2021-03-16 22:31:24 -07:00
Martin von Zweigbergk
16d97ef8c0 transaction: remove index() and view() helpers 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
5ed14185a0 git: take a MutableRepo instead of a Transaction 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
769f88bbae tests: rename test_transaction to test_mut_repo
The test doesn't test any logic in the `Transaction` type itself
anymore.
2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
2c2b5fb3b7 evolution: take a MutableRepo instead of a Transaction 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
c3b9d1cd13 rewrite: take a MutableRepo instead of a Transaction 2021-03-16 22:05:51 -07:00
Martin von Zweigbergk
ee8423a69e MutableRepo: rename repo to base_repo to clarify its role 2021-03-16 22:05:50 -07:00
Martin von Zweigbergk
69de4698ac tests: set $HOME in a few tests to avoid depending on developer's ~/.gitignore
I just changed my `~/.gitignore` and some tests started failing
because the working copy respects the user's `~/.gitignore`. We should
probably not depend on `$HOME` in the library crate. For now, this
patch just makes sure we set it to an arbitrary directory in the tests
where it matters.
2021-03-16 22:05:36 -07:00
Martin von Zweigbergk
67e11e0fc3 git_store: wait 1 minute for lock on refs to help tests
`test_commit_parallel` was failing on Mac in the GitHub CI. I suspect
the reason was that it was timing out. The test runs in about 1 s on
my Linux desktop and in about 3 s on my Mac laptop. It failed after 31
in the GitHub CI. This patch increases the timeout to 1 minute to try
to make the test pass. It would be better to set the timeout to a
higher value only in tests, but this will be good enough for now. By
the way, it has turned out that git notes (at least libgit2's
implementation of them) are too slow, so we should probably eventually
create our own storage for the extra metadata instead.
2021-03-16 11:28:22 -07:00
Martin von Zweigbergk
81a0e0bd2a protobuf: upgrade to version 2.22.0
I only noticed that there was a newer version when running `cargo
install --path .`, which resulted in warnings about deprecated
functions. There's no other reason I'm aware of to upgrade now.
2021-03-15 17:09:29 -07:00
Martin von Zweigbergk
1ebdd4ecf0 MutableRepo: use index when enforcing view invariants
We can now finally use the commit index for filtering out ancestors
from the sets of heads.

I haven't timed the change from most of the recent work on
performance, but I did a measurement after this commit. I modified a
commit in the git.git repo's "what's cooking" branch (because that's
linear). Then I ran `jj evolve` so the 100 commits after it would get
evolved. That took ~700ms. `git rebase` of the same 100 commits took
~6s.

I also compared `jj op undo` of that `jj evolve` operation. With this
patch, that was sped up from ~6.8s to ~125ms.
2021-03-15 16:35:45 -07:00
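Using the index to filter ancestors out of a set of heads can be sketched as follows. The sketch assumes, hypothetically, an index where every parent has a lower position than its children. The search walks from the candidates toward their ancestors in decreasing position order. This lets it stop as soon as it drops below the oldest candidate, instead of visiting the whole history. `Index` and `heads()` are illustrative, not jj's actual types.

```rust
use std::collections::{BinaryHeap, HashSet};

/// Hypothetical commit index: commits are numbered by position, and every
/// parent has a lower position than its children (topological order).
struct Index {
    parents: Vec<Vec<usize>>,
}

impl Index {
    /// Returns the candidates that are not ancestors of another candidate.
    fn heads(&self, candidates: &[usize]) -> Vec<usize> {
        let candidate_set: HashSet<usize> = candidates.iter().copied().collect();
        let Some(&min_pos) = candidates.iter().min() else {
            return Vec::new();
        };
        // Walk the ancestors of all candidates, highest position first.
        let mut queue: BinaryHeap<usize> = candidates
            .iter()
            .flat_map(|&c| self.parents[c].iter().copied())
            .collect();
        let (mut visited, mut non_heads) = (HashSet::new(), HashSet::new());
        while let Some(pos) = queue.pop() {
            if pos < min_pos {
                // Everything left in the queue is older than every candidate.
                break;
            }
            if !visited.insert(pos) {
                continue;
            }
            if candidate_set.contains(&pos) {
                non_heads.insert(pos); // reachable from another candidate
            }
            queue.extend(self.parents[pos].iter().copied());
        }
        let mut heads: Vec<usize> = candidate_set.difference(&non_heads).copied().collect();
        heads.sort_unstable();
        heads
    }
}

fn main() {
    // 0 <- 1 <- 2, and 3 is a second child of 1.
    let index = Index { parents: vec![vec![], vec![0], vec![1], vec![1]] };
    println!("{:?}", index.heads(&[0, 2, 3]));
}
```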
Martin von Zweigbergk
3ecb4ec16b MutableRepo: in fast-path for adding head, simply remove parent heads 2021-03-15 15:38:09 -07:00
Martin von Zweigbergk
2c92fca75a MutableView: don't require whole Commit when CommitId is enough 2021-03-15 15:36:03 -07:00
Martin von Zweigbergk
b4b1de3ddc view: let MutableRepo enforce view invariants
`MutableRepo` has more information needed for taking fast-paths, and
it will have to make the same decision for doing incremental updates
of the evolution state anyway.
2021-03-15 15:17:36 -07:00
Martin von Zweigbergk
b9fe944e76 view: remove unnecessary removing of parents in add_head()
We call `enforce_invariants()` right after removing the parent
commits, and that will remove parents anyway.
2021-03-15 15:06:14 -07:00
Martin von Zweigbergk
12a47bd6ed MutableRepo: don't calculate evolution state only to update it 2021-03-15 15:03:50 -07:00
Martin von Zweigbergk
f0619c07ac MutableEvolution: make MutableRepo responsible for lazy calculation
This patch continues the work from the previous patch. With this
patch, we no longer calculate the evolution state just because a
transaction starts. We still unnecessarily calculate it when adding a
commit within the transaction, however. I'll fix that next.
2021-03-15 15:03:14 -07:00
Martin von Zweigbergk
61acee52f4 ReadonlyEvolution: make ReadonlyRepo responsible for lazy calculation
This patch changes it so that `ReadonlyEvolution` does not lazily
calculate its state and the caller, i.e. `ReadonlyRepo`, is instead
responsible for the laziness. That will allow the caller to make
decisions based on whether the state has been
calculated. Specifically, we don't want to calculate the evolution
state in order to update it incrementally if it hasn't already been
calculated. It's better to just leave it uncalculated in that case.

As a result of moving the laziness out of `ReadonlyEvolution`, we also
don't need the reference to `ReadonlyRepo` anymore, which
simplifies things a bunch. The next patch will continue by making the
corresponding change to `MutableEvolution`, which will let us simplify
even more.
2021-03-15 14:41:27 -07:00
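Below is a minimal sketch of the pattern these commits describe. As an assumption, it uses `std::cell::OnceCell`, which has been stable since Rust 1.70 and so is not what jj used in 2021. The repo owns the lazy cell and can check whether the state has been calculated. When a commit is added, it updates the state incrementally only if the state already exists. `Repo`, `Evolution` and their methods are hypothetical simplifications.

```rust
use std::cell::OnceCell;

/// Hypothetical stand-in for the (expensive) evolution state.
struct Evolution {
    known_commits: Vec<String>,
}

struct Repo {
    commits: Vec<String>,
    // Laziness lives in the repo, so it can ask "already calculated?".
    evolution: OnceCell<Evolution>,
}

impl Repo {
    fn new(commits: &[&str]) -> Self {
        Repo {
            commits: commits.iter().map(|c| c.to_string()).collect(),
            evolution: OnceCell::new(),
        }
    }

    fn is_evolution_calculated(&self) -> bool {
        self.evolution.get().is_some()
    }

    /// Calculates the state on first use.
    fn evolution(&self) -> &Evolution {
        self.evolution.get_or_init(|| Evolution {
            known_commits: self.commits.clone(),
        })
    }

    fn add_commit(&mut self, id: &str) {
        self.commits.push(id.to_string());
        // Update the state incrementally only if it already exists; otherwise
        // leave it uncalculated rather than calculating it just to update it.
        if let Some(evolution) = self.evolution.get_mut() {
            evolution.known_commits.push(id.to_string());
        }
    }
}

fn main() {
    let mut repo = Repo::new(&["a"]);
    repo.add_commit("b"); // cheap: evolution state not calculated yet
    println!("calculated: {}", repo.is_evolution_calculated());
    println!("{:?}", repo.evolution().known_commits);
}
```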
Martin von Zweigbergk
43315bc9d2 git: fix bad formatting from commit 1e9d428406 2021-03-14 22:28:12 -07:00
Martin von Zweigbergk
91117f36b6 cargo: work around warning in generated protobuf code with new nightly rustc 2021-03-14 22:25:43 -07:00
Martin von Zweigbergk
1e9d428406 git: skip tags pointing to GPG keys and similar when importing refs 2021-03-14 20:14:18 -07:00
Martin von Zweigbergk
429a1ad7ab git: set authentication callback on fetch as well
I guess I had not run `jj git fetch` from GitHub until I tried to
fetch the result of PR #6 just now.
2021-03-14 17:18:51 -07:00
Jun Wu
d1d502c062 tests: disable tests failing on Windows
This unblocks enabling GitHub CI. I took a quick look at
some failures but the causes do not seem obvious to me.
2021-03-14 15:51:32 -07:00
Jun Wu
935da3e13f lock: treat PermissionDenied on Windows as transient error
On Windows, creating the new file exclusively can fail with
PermissionDenied. This change makes the lock_concurrent test pass on Windows.
2021-03-14 15:51:32 -07:00
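Below is a sketch of a lock-file loop that treats these errors as transient. It rests on three assumptions. The lock is a file created with `create_new`, which fails if the file already exists. Retries use capped exponential backoff. The caller passes a timeout. `FileLock`, `lock()` and `is_transient()` are hypothetical names; this is not jj's lock implementation.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical lock that is held for as long as the lock file exists.
struct FileLock {
    path: PathBuf,
}

/// `AlreadyExists` means another process holds the lock. Per the commit
/// above, Windows can report `PermissionDenied` when creating the file
/// exclusively, so that is retried as well.
fn is_transient(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::AlreadyExists
        || (cfg!(windows) && err.kind() == io::ErrorKind::PermissionDenied)
}

impl FileLock {
    fn lock(path: &Path, timeout: Duration) -> io::Result<FileLock> {
        let deadline = Instant::now() + timeout;
        let mut backoff = Duration::from_millis(1);
        loop {
            // `create_new` fails if the file exists, making creation atomic.
            match OpenOptions::new().write(true).create_new(true).open(path) {
                Ok(_file) => return Ok(FileLock { path: path.to_path_buf() }),
                Err(err) if is_transient(&err) && Instant::now() < deadline => {
                    thread::sleep(backoff);
                    backoff = (backoff * 2).min(Duration::from_millis(100));
                }
                Err(err) => return Err(err),
            }
        }
    }
}

impl Drop for FileLock {
    fn drop(&mut self) {
        // Best effort: releasing the lock means deleting the file.
        let _ = fs::remove_file(&self.path);
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("lock-sketch-main-{}", std::process::id()));
    let _lock = FileLock::lock(&path, Duration::from_secs(60))?;
    // A second attempt times out while the first lock is held.
    let second = FileLock::lock(&path, Duration::from_millis(20));
    println!("second attempt failed: {}", second.is_err());
    Ok(())
}
```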
Jun Wu
eacab648b0 working_copy: clean up ".git" automatically
TreeState::write_tree leaves a ".git" file in the working copy. This is
undesirable in general, and more problematic on Windows: the second
TreeState::write_tree call would panic because Repository::init_opts fails
with a Permission Denied error.

This seems to be a libgit2 defect. But for now let's just remove ".git"
automatically. This makes `cargo test --test smoke_test` pass on Windows.
2021-03-14 15:49:42 -07:00
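The cleanup can be sketched as an idempotent removal, assuming (as the commit says) that the leftover `.git` entry is a plain file. `remove_stray_git_file` is a hypothetical helper, not jj's code. Ignoring `NotFound` makes it safe to call after every tree write, whether or not the file was created.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Deletes the ".git" file left behind in `working_copy`, treating an
/// already-missing file as success so the call is safe to repeat.
fn remove_stray_git_file(working_copy: &Path) -> io::Result<()> {
    match fs::remove_file(working_copy.join(".git")) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("git-file-sketch-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    fs::write(dir.join(".git"), "placeholder\n")?;
    remove_stray_git_file(&dir)?;
    remove_stray_git_file(&dir)?; // second call is a no-op
    println!(".git still present: {}", dir.join(".git").exists());
    fs::remove_dir(&dir)
}
```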
Jun Wu
4cd29a2130 working_copy: avoid std::os::unix on Windows
std::os::unix::fs::PermissionsExt::mode() does not exist on Windows.
Treat files on Windows as regular files.
2021-03-14 15:49:22 -07:00
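Below is a minimal sketch of the conditional-compilation approach the commit describes. The Unix build reads the mode bits through `std::os::unix::fs::PermissionsExt`, and other platforms fall back to treating every file as regular. `FileKind`, `file_kind` and the `0o111` executable check are illustrative assumptions.

```rust
use std::fs::Metadata;

/// Hypothetical file kind; not jj's actual type.
#[derive(Debug, PartialEq)]
enum FileKind {
    Normal,
    Executable,
}

#[cfg(unix)]
fn file_kind(metadata: &Metadata) -> FileKind {
    // `mode()` comes from a Unix-only extension trait.
    use std::os::unix::fs::PermissionsExt;
    // Any execute bit (owner, group or other) counts as executable here.
    if metadata.permissions().mode() & 0o111 != 0 {
        FileKind::Executable
    } else {
        FileKind::Normal
    }
}

#[cfg(not(unix))]
fn file_kind(_metadata: &Metadata) -> FileKind {
    // No Unix mode bits on this platform: treat every file as a regular file.
    FileKind::Normal
}

fn main() -> std::io::Result<()> {
    let exe = std::env::current_exe()?;
    println!("{:?}", file_kind(&std::fs::metadata(exe)?));
    Ok(())
}
```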
Martin von Zweigbergk
5631e85502 view: don't enforce invariants in merge_views()
We now only call the function from `MutableRepo::merge()`. There we
pass the result to `MutableView::set_view()`, which already enforces
the invariants.
2021-03-14 11:07:34 -07:00
Martin von Zweigbergk
8048d9641e commands: rewrite jj op undo using new MutableRepo::merge() 2021-03-14 10:57:57 -07:00
Martin von Zweigbergk
a7f4f4cf5b rustfmt: configure to merge imports by module
Perhaps we should even set the config to "Item" to reduce merge conflicts.
2021-03-14 10:53:14 -07:00
Martin von Zweigbergk
4b8484e561 rustfmt: configure to group imports 2021-03-14 10:46:25 -07:00
Martin von Zweigbergk
ac9fb1832d OpHeadsStore: move logic for merging repos to MutableRepo
This adds `MutableRepo::merge()`, which applies the difference between
two `ReadonlyRepo`s to itself. That results in much simpler code than
the current code in `merge_op_heads()`. It also lets us write `undo`
using the new function. Finally -- and this is the actual reason I did
it now -- it prepares for using the index when enforcing view
invariants.
2021-03-14 10:43:39 -07:00
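Below is a minimal sketch of applying the difference between two repos to a set of heads. Commit ids are plain strings here; `merge_heads` and `ids` are not jj's API, and a real view has more state than heads. The sketch also shows one plausible way undo can reuse the same function: apply the change from the operation's result back to its parent. After such a merge, the view invariants (for example, that no head is an ancestor of another) would still need to be enforced.

```rust
use std::collections::BTreeSet;

/// Applies the change from `base` to `other` onto `current`: heads that
/// `other` added are added, and heads that `other` removed are removed.
fn merge_heads(current: &mut BTreeSet<String>, base: &BTreeSet<String>, other: &BTreeSet<String>) {
    for added in other.difference(base) {
        current.insert(added.clone());
    }
    for removed in base.difference(other) {
        current.remove(removed);
    }
}

fn ids(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|name| name.to_string()).collect()
}

fn main() {
    // Merging a concurrent operation that replaced head "a" with "d".
    let mut current = ids(&["a", "c"]);
    merge_heads(&mut current, &ids(&["a", "b"]), &ids(&["b", "d"]));
    println!("{:?}", current);

    // Undoing an operation: apply the change from its result back to its parent.
    let op_parent = ids(&["x"]);
    let op_result = ids(&["y"]);
    let mut head_state = ids(&["y", "z"]);
    merge_heads(&mut head_state, &op_result, &op_parent);
    println!("{:?}", head_state);
}
```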
Martin von Zweigbergk
e9ddfdd8bc Repo: repurpose ReadonlyRepo::loader() to return loader for existing repo
It's sometimes useful to create a `RepoLoader` given an existing
`ReadonlyRepo`. We already do that in `ReadonlyRepo::reload()`. This
patch repurposes `ReadonlyRepo::loader()` for that.
2021-03-14 10:34:18 -07:00