mirrors/jj

mirror of https://github.com/martinvonz/jj.git synced 2024-10-26 00:19:59 +00:00

Author	SHA1	Message	Date
Martin von Zweigbergk	e1a02c5c5b	merged_tree: make TreeDiffDirItem not self-referential This removes another dependency on `ouroboros`, for a small performance hit: ``` ❯ hyperfine --warmup 3 --runs 30 \ '/tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0' \ '/tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0' Benchmark 1: /tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0 Time (mean ± σ): 689.7 ms ± 23.9 ms [User: 400.0 ms, System: 289.8 ms] Range (min … max): 666.9 ms … 759.2 ms 30 runs Benchmark 2: /tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0 Time (mean ± σ): 710.9 ms ± 19.2 ms [User: 420.4 ms, System: 290.6 ms] Range (min … max): 688.5 ms … 752.0 ms 30 runs Summary '/tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0' ran 1.03 ± 0.05 times faster than '/tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0' ```	2023-11-17 03:50:34 -08:00
Martin von Zweigbergk	61d87fe296	merged_tree: make `TreeEntriesIterator` not self-referential While importing the `ouroboros` crate and the `aliasable` crate it depends on, the "unsafe Rust reviewer" expressed some concern that they contain a lot of unsafe code that's hard to review. We can avoid the unsafe code altogether by making `TreeEntriesIterator` not self-referential. Instead, we can collect the matching entries in an individual tree up front. It does have some performance cost: ``` ❯ hyperfine --warmup 3 --runs 30 \ '/tmp/jj-before --ignore-working-copy files -r v6.0' \ '/tmp/jj-after --ignore-working-copy files -r v6.0' Benchmark 1: /tmp/jj-before --ignore-working-copy files -r v6.0 Time (mean ± σ): 461.4 ms ± 14.3 ms [User: 232.1 ms, System: 229.4 ms] Range (min … max): 443.4 ms … 496.3 ms 30 runs Benchmark 2: /tmp/jj-after --ignore-working-copy files -r v6.0 Time (mean ± σ): 482.0 ms ± 14.3 ms [User: 257.2 ms, System: 224.9 ms] Range (min … max): 461.8 ms … 513.3 ms 30 runs Summary '/tmp/jj-before --ignore-working-copy files -r v6.0' ran 1.04 ± 0.04 times faster than '/tmp/jj-after --ignore-working-copy files -r v6.0' ``` I think that's acceptable.	2023-11-17 03:50:34 -08:00
Yuya Nishihara	93fbcec2f7	index: use BinaryHeap instead of BTreeSet in common_ancestors_pos() For the same reason as the heads_pos() change. We just want to omit duplicated items.	2023-11-16 08:27:59 +09:00
Yuya Nishihara	d4059520a9	index: cache generation numbers during common_ancestors_pos() computation I'm not sure if this is better, but common_ancestors_pos() would have a similar property to heads_pos().	2023-11-16 08:27:59 +09:00
Yuya Nishihara	ea4bdd718d	index: use "while let" in common_ancestors_pos()	2023-11-16 08:27:59 +09:00
Yuya Nishihara	02c84a8596	index: remove stale "allow(unstable_name_collisions)" I think this is a remainder of the nightly shims.	2023-11-16 08:27:59 +09:00
Yuya Nishihara	6399c392fd	index: make heads_pos() deduplicate entries without building separate set This is much faster (maybe because of better cache locality?) Another option is to use BTreeSet, but the BinaryHeap version is slightly faster. "bench revset" result in my linux repo: revsets/heads(tags()) --------------------- baseline 3.28 560.6±4.01ms 1 2.92 500.0±2.99ms 2 1.98 339.6±1.64ms 3 (this) 1.00 171.2±0.30ms	2023-11-16 08:27:59 +09:00
Yuya Nishihara	9832ee205d	index: optimize heads_pos() to cache generation numbers during computation Apparently, IndexEntry::generation_number() isn't cheap, probably because it involves random access to a larger memory region, and the u32 value might not be aligned. Let's instead store the generation numbers in BinaryHeap. Also, heads_pos() becomes slightly faster by keeping the BinaryHeap entries small, so I've removed the IndexEntry altogether. This makes the default log and disambiguation revsets fast, which evaluate 'heads(immutable_heads())'. "bench revset" result in my linux repo: revsets/heads(tags()) --------------------- baseline 3.28 560.6±4.01ms 1 2.92 500.0±2.99ms 2 (this) 1.98 339.6±1.64ms	2023-11-16 08:27:59 +09:00
Yuya Nishihara	1e933b84dd	index: make IndexEntry::parents() lazy instead of collecting to Vec All callers just iterate over the parent entries. "bench revset" result in my linux repo: revsets/heads(tags()) --------------------- baseline 3.28 560.6±4.01ms 1 (this) 2.92 500.0±2.99ms	2023-11-16 08:27:59 +09:00
Yuya Nishihara	39b065f7ab	git: on import_refs(), exclude uninteresting dirs such as refs/jj/keep For loose refs, uninteresting directories can be just skipped. For packed refs, gix will have to do binary search for each prefix to find the starting point. Still it's better overall if the repository contains tons of refs/jj/keep refs. With my linux repo containing ~5k loose jj refs, this saves ~40ms: % hyperfine --warmup 3 --runs 10 \ "/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux" \ "/tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux" Benchmark 1: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux Time (mean ± σ): 151.6 ms ± 11.4 ms [User: 38.8 ms, System: 111.6 ms] Range (min … max): 129.8 ms … 159.5 ms 10 runs Benchmark 2: /tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux Time (mean ± σ): 109.9 ms ± 11.6 ms [User: 27.5 ms, System: 82.4 ms] Range (min … max): 89.4 ms … 117.8 ms 10 runs	2023-11-14 17:35:27 +09:00
Yuya Nishihara	044716ee40	git: migrate import_refs() to gix::Repository Gitoxide errors are boxed since there are various error types and they tend to exceed the clippy size limit. Apparently, gitoxide is faster than git2: % hyperfine --warmup 3 --runs 10 \ "/tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux" \ "/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux" Benchmark 1: /tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux Time (mean ± σ): 205.4 ms ± 15.7 ms [User: 59.6 ms, System: 144.6 ms] Range (min … max): 189.7 ms … 223.9 ms 10 runs Benchmark 2: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux Time (mean ± σ): 176.2 ms ± 13.7 ms [User: 41.2 ms, System: 134.0 ms] Range (min … max): 155.4 ms … 186.5 ms 10 runs	2023-11-14 17:35:27 +09:00
Yuya Nishihara	6c98dfcdcb	git: have import_refs() obtain git2::Repository instance from store This helps gitoxide migration. It's theoretically possible to import Git refs from non-Git backend, but I don't think such API flexibility is needed.	2023-11-14 17:35:27 +09:00
Yuya Nishihara	dbb1adaf0a	git: move import-related types close to import_refs() function	2023-11-14 17:35:27 +09:00
Yuya Nishihara	f991705e47	tests: add test for importing missing ancestor of HEAD If a commit pointed to by HEAD or ref is missing, the ref is considered invalid and excluded by import_refs(). The current test behavior appears to depend on some in-memory cache of git2::Repository.	2023-11-14 17:35:27 +09:00
Yuya Nishihara	8e143541a5	operation: propagate OpStoreError from parents() We need to .collect_vec() the parents iterator to temporary buffer since the borrowed iterator can't be returned back to the dag_walk functions. Another option is to clone op_store and parent ids to remove &self lifetime from the iterator, but that also means a temporary Vec is created.	2023-11-14 07:16:39 +09:00
Yuya Nishihara	8ddad859e8	dag_walk: add fallible topo_order_reverse_lazy() Unlike dfs_ok(), this function short-circuits at an Err as we use non-lazy topo_order_forward() internally. I think that's good enough. If we implement GC on operation log, deleted parents will be excluded (or mapped to tombstone) by caller. An Err shouldn't mean it's GC-ed.	2023-11-14 07:16:39 +09:00
Yuya Nishihara	3d5a07e86a	dag_walk: add fallible dfs(), topo_order(), heads(), and closest_common_node() This unblocks the use of Result<T, E> in op.parents(). There are two ways to encode errors: a. impl IntoIterator<Item = Result<T, E>> b. Result<V, E> where V: FromIterator<Item = T> I think (a) is more natural to algorithms like dfs(), which can process error nodes transparently. Still the caller might have to collect the source iterator to a temporary Vec to conform to the neighbors_fn signature. It's not easy for neighbors_fn to return an iterator borrowing the input node. We already have GAT, but don't have return-position impl Trait in trait yet.	2023-11-14 07:16:39 +09:00
Yuya Nishihara	e5a9a26911	dag_walk: remove unused and untested leaves() function	2023-11-14 07:16:39 +09:00
Anton Bulakh	e3a1e5b80e	sign: Implement storage for digital commit signatures Recognize signature metadata from git commit objects, implement a basic version of that for the native backend. Extract the signed data (a commit binary repr without the signature) to be verified later.	2023-11-12 03:37:13 +02:00
Yuya Nishihara	b42a69db6d	git_backend: configure committer (and author) of gix::Repository Otherwise, ref updates would fail if we port git::export_refs() to gitoxide. This change isn't strictly needed for the backend itself, but we'll reuse the gix::Repository instance created by the backend when importing and exporting Git refs.	2023-11-11 22:35:54 +09:00
Yuya Nishihara	ea32c0cb9e	git_backend: pass UserSettings to GitBackend constructors	2023-11-11 22:35:54 +09:00
Yuya Nishihara	8a2048a0e5	repo: pass UserSettings to store factories and initializers GitBackend will use it to configure gix::Repository. I think UserSettings is generally useful to pass store-specific parameters, so I've updated all factory functions.	2023-11-11 22:35:54 +09:00
Yuya Nishihara	6125fb160e	op_store: embed details in operation/view not found error This is basically a copy of BackendError::ObjectNotFound. The failed id may be either view or operation id.	2023-11-11 22:35:40 +09:00
Yuya Nishihara	ea96513fd1	op_store: deduplicate functions that map io::Error to OpStoreError io_to_read_error() also translates ErrorKind::NotFound.	2023-11-11 22:35:40 +09:00
Martin von Zweigbergk	120115a20d	cli: pass `MaterializedTreeValue` into `git_diff_part()` Just a little preparation for reading the materialized values concurrently.	2023-11-10 04:54:47 -08:00
Waleed Khan	a60733f632	tree: remove unsafe with `ouroboros` for self-referential iterators	2023-11-09 21:50:29 -08:00
Yuya Nishihara	6ff3a4f3df	repo: reimplement DirtyCell without using unsafe While the safe implementation is a bit more complex (and probably more branchy), I don't think the runtime overhead would matter here. Let's remove one more unsafe for better code maintainability.	2023-11-10 07:42:45 +09:00
Martin von Zweigbergk	9b24d24612	conflicts: add another helper for materializing a tree value We have a few places where we have a `MergedTreeValue` and need to read the data associated with it so we can write to the working copy or include it in a diff. Let's extract some of that shared logic to a function so we can reuse it. I plan to use it for reading file contents in advance while streaming a diff in `local_working_copy` soon (and probably in `jj diff` thereafter), but I think it seems like an improvement on its own.	2023-11-08 21:21:38 -08:00
Martin von Zweigbergk	65bd5cacba	working copy: on checkout, move read from store out of `write_()` functions I'd like to read N files ahead from the backend, to avoid serializing too many server calls on backends that are backed by a server. Moving the reads a little earlier is a little step towards that. The `TreeState::write_()` functions can now be made into free/static functions if we prefer.	2023-11-08 21:21:38 -08:00
Yuya Nishihara	084b99e1e2	index: rewrite CompositeIndex::entry_by_pos() by leveraging ancestors iterator We no longer have "unsafe" in this function, so let's use the iterator API instead of recursion. Apparently I haven't pushed this change before because unsafe in .find_map() looked scary.	2023-11-08 12:09:33 +09:00
Anton Bulakh	d27351b978	misc: drop a few low-hanging unsafes Remove a couple of unnecessary unsafes: - The NonZeroUsize is a constant where the unwrap will optimize away anyway and we don't have an unsafe without any good reason there :) - The other two were simply not needed, lifetimes worked fine, maybe Rust became better since that code was written? NLL? Anyway, they're gone now	2023-11-08 02:16:08 +02:00
Yuya Nishihara	2ac9865ce7	revset: exclude @git branches from remote_branches() As discussed in Discord, it's less useful if remote_branches() included Git-tracking branches. Users wouldn't consider the backing Git repo as a remote. We could allow explicit 'remote_branches(remote=exact:"git")' query by changing the default remote pattern to something like 'remote=~exact:"git"'. I don't know which will be better overall, but we don't have support for negative patterns anyway.	2023-11-08 07:34:30 +09:00
Yuya Nishihara	59640496aa	cargo: sort dependencies list alphabetically	2023-11-07 23:46:05 +09:00
Yuya Nishihara	d1b0c4cc48	merge: relax input type of Merge::from_removes_adds()	2023-11-07 17:10:12 +09:00
Yuya Nishihara	e0c35684af	merge: rename Merge::new() to Merge::from_removes_adds() Since (removes, adds) pair is no longer the canonical representation of Merge, the name Merge::new() seems too generic. Let's give more verbose name.	2023-11-07 17:10:12 +09:00
Yuya Nishihara	2c128f1b61	merged_tree: convert from legacy conflicts through interleaved list This is basically the same change as the previous commit.	2023-11-07 17:10:12 +09:00
Yuya Nishihara	a734f46130	merged_tree: build unresolved Merge<Tree> from interleaved list We no longer need to iterate removes and adds separately.	2023-11-07 17:10:12 +09:00
Yuya Nishihara	dd26b7be40	merge: add Merge constructor that accepts interleaved values Also migrated some callers of 3-way merge, where [left, base, right] order looks okay.	2023-11-07 17:10:12 +09:00
Yuya Nishihara	803b41c426	merge: load legacy Merge values without allocating intermediate buffers	2023-11-07 17:10:12 +09:00
Yuya Nishihara	09987c1d27	merge: micro-optimize allocation of Merge object for resolved value It's super common that a Merge object holds a resolved value, so let's inline up to 1 element. T of Merge<T> usually consists of a couple of pointer-sized fields. I don't see any measurable speed up, but it's no worse than the original.	2023-11-07 17:10:12 +09:00
Martin von Zweigbergk	1140295829	merged_tree: extract polling of tree futures into a function	2023-11-07 00:03:50 -08:00
Martin von Zweigbergk	c77417d4e4	merged_tree: drop outer loop in `TreeDiffStreamImpl::poll_next()` As suggested by Yuya. I also added a comment and an assertion in the case where we return `Poll::Pending`.	2023-11-07 00:03:50 -08:00
Martin von Zweigbergk	d989d4093d	merged_tree: let backend influence whether to use new diff algo Since the concurrent diff algorithm is significantly slower when using the Git backend, I think we'll have to switch between the two algorithms depending on backend. Even if the concurrent version always performed as well as the sequential version, exactly how concurrent it should be probably still depends on the backend. This commit therefore adds a function to the `Backend` trait, so each backend can say how much concurrency it deals well with. I then use that number for choosing between the sequential and concurrent versions in `MergedTree::diff_stream()`, and also to decide the number of concurrent reads to do in the concurrent version.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	f40adb84fc	merged_tree: add a `Stream` for concurrent diff off trees When diffing two trees, we currently start at the root and diff those trees. Then we diff each subtree, one at a time, recursively. When using a commit backend that uses remote storage, like our backend at Google does, diffing the subtrees one at a time gets very slow. We should be able to diff subtrees concurrently. That way, the number of roundtrips to a server becomes determined by the depth of the deepest difference instead of by the number of differing trees (times 2, even). This patch implements such an algorithm behind a `Stream` interface. It's not hooked in to `MergedTree::diff_stream()` yet; that will happen in the next commit. I timed the new implementation by updating `jj diff -s` to use the new diff stream and then ran it on the Linux repo with `jj diff --ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by ~20%, from ~750 ms to ~900 ms. Maybe we can get some of that performance back but I think it'll be hard to match `MergedTree::diff()`. We can decide later if we're okay with the difference (after hopefully reducing the gap a bit) or if we want to keep both implementations. I also timed the new implementation on our cloud-based repo at Google. As expected, it made some diffs much faster (I'm not sure if I'm allowed to share figures).	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	9af09ec236	test_merged_tree: test diffing with a matcher We didn't have any tests at all for `MergedTree::diff()` with a matcher other than `EverythingMatcher`. This patch adds a few.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	16aa8e8f10	test_merged_tree: nest each part of `test_diff_dir_file()` I'm about to add a few more checks for diffing with a matcher. I think it will help make it readable and reduce the risk of mixing up variables between each part of the test if we use some nested blocks. I also removed some unnecessary `.clone()` calls while at it.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	c9ce80a82a	merged_tree: extract function for merged iterator of basenames in diff I'm going to reuse this for stream/async diffing.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	b72f04ba61	merged_tree: rename `all_tree_conflict_names()` since it's not about conflicts	2023-11-06 23:12:02 -08:00
Yuya Nishihara	3fddc31da8	merge: remove Merge::take() which is no longer used Merge::take() is no longer a cheap function. We can add into_vec() if needed.	2023-11-07 06:52:35 +09:00
Yuya Nishihara	92dfe59ade	refs: run non-trivial merge of ref targets without destructuring Merge object	2023-11-07 06:52:35 +09:00
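
Commits 6399c392fd and 93fbcec2f7 above describe deduplicating entries with a `BinaryHeap` instead of building a separate set. The idea can be sketched as below. This is only an illustration, not the code from jj: the function name `dedup_descending` and the use of plain `u32` positions are assumptions made for the example.

```rust
use std::collections::BinaryHeap;

/// Returns the distinct values of `items` in descending order.
///
/// Hypothetical helper for illustration. A max-heap pops equal values
/// one after another, so comparing each popped value with the last one
/// emitted is enough to drop duplicates without a separate set.
fn dedup_descending(items: impl IntoIterator<Item = u32>) -> Vec<u32> {
    let mut heap: BinaryHeap<u32> = items.into_iter().collect();
    let mut out = Vec::new();
    while let Some(pos) = heap.pop() {
        // Skip the value if it equals the one emitted just before it.
        if out.last() != Some(&pos) {
            out.push(pos);
        }
    }
    out
}
```

Calling `dedup_descending(vec![3, 1, 3, 2, 1])` yields each position once, largest first. Whether this beats a `BTreeSet` depends on the workload; the commit messages report that it did for `heads(tags())` in the author's Linux repo.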

2268 commits
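
Commit 3d5a07e86a above prefers encoding errors as `impl IntoIterator<Item = Result<T, E>>` so that a walk such as dfs() can pass error nodes through transparently. A minimal sketch of that style follows. It is not the jj implementation: the name `dfs_ok_sketch`, its signature, and the choice to return a `Vec` rather than a lazy iterator are all assumptions made for the example.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Depth-first walk where both the start nodes and the neighbors are
/// `Result`s. Hypothetical sketch: `Ok` nodes are visited once and
/// expanded, `Err` items are forwarded to the output unchanged.
fn dfs_ok_sketch<T, E, I>(
    start: impl IntoIterator<Item = Result<T, E>>,
    mut neighbors_fn: impl FnMut(&T) -> I,
) -> Vec<Result<T, E>>
where
    T: Clone + Eq + Hash,
    I: IntoIterator<Item = Result<T, E>>,
{
    let mut work: Vec<Result<T, E>> = start.into_iter().collect();
    let mut visited = HashSet::new();
    let mut out = Vec::new();
    while let Some(item) = work.pop() {
        match item {
            Ok(node) => {
                // Expand each node only the first time it is seen.
                if visited.insert(node.clone()) {
                    work.extend(neighbors_fn(&node));
                    out.push(Ok(node));
                }
            }
            // Errors are not expanded; the caller decides what to do.
            Err(err) => out.push(Err(err)),
        }
    }
    out
}
```

Here the neighbors function returns an owned collection, which mirrors the commit's remark that the caller may have to collect into a temporary `Vec` because returning an iterator that borrows the input node is not easy.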