I don't think there's much reason to run most tests with a `.git`
directory outside of `.jj`. I think it's just that way for historical
reasons. It's been that way since I added support for `.jj`-internal
repos in a8a9f7dedd.
The reason I want to switch is to make it a little easier to create
test repos for different backends. The problem with `.jj`-external git
repos is that they depend on an additional path.
I had to update `test_bad_locking.rs` to make the code merging
directories able handle missing directories on some side, because
git's loose objects result in directories getting created on one or
both sides.
I ran into some issues here when switching our tests to use
`.jj`-internal git repos. For example, the `std::fs::copy()` calls
started failing, which may be related to #2103. I think one problem is
that we could end up calling `merge_directories()` twice for the same
directory. This patch fixes that by deduping the paths we call with,
and makes the function assume that the output directory doesn't exist.
I've noticed `WorkspaceCommandHelper::format_file_path()` appear in
profiles a few times. A big part of that is spent in
`RepoPath::to_fs_path()`. I think I had been thinking that
`PathBuf::join()` takes `self` by value and mutates it, but it turns
out it creates a new instance. So our `result = result.join(...)` in a
loop was copying the `PathBuf` over and over. This fixes that and also
reserves the expected size. That speeds up `jj files
--ignore-working-copy -r v6.0` in the Linux repo from 546.0 s to 509.3
s (6.7%).
The main goal of this change is to enable tree-level conflict format, but it
also allows us to bulk-import commits on clone/init. I think a separate method
will help if we want to provide progress information, enable check for
.jjconflict entries under certain condition, etc.
Since git::import_refs() now depends on GitBackend type, it might be better to
remove git_repo from the function arguments.
Since we have overloaded operator symbols, we need to deduplicate them
upfront. Legacy and compat operators are also removed from the suggestion.
It's a bit ugly to mutate the error struct before calling Error::renamed_rule(),
but I think it's still better than reimplementing message formatting function.
Since e7e49527ef "git: ensure that remote branches never diverge", the last
known "refs/remotes" ref should be synced with the corresponding remote branch.
So we can always trust the branch@remote expression. We don't need "refs/tags"
lookup either since tags should have been imported by git::import_refs().
FWIW, I'm thinking of reorganizing view.git_refs() map as per-remote views.
It would be nice if we can get rid of revsets and template keywords exposing
low-level Git ref primitives.
Git doesn't have a root commit, so we should skip branches pointing to
it on export, just like we do with conflicted branches (which Git also
doesn't support).
Before this patch, the order would depend on the reason we failed to
export a ref, because we would add to the `failed_branches` list in
several different places. What's worse, when the export failed because
the branch was conflicted or had an invalid name (from Git's
perspective), it was non-deterministic because we iterated over a
HashSet. This patch fixes that by sorting at the end.
Note that we still want the `branches_to_update` map to be a
`BTreeMap` so we update branches in deterministic order. Otherwise the
error when trying to export both branches `main` and `main/sub` will
become non-deterministic.
Suppose "x::y" is the operator that defaults to "root()::visible_heads()"
respectively, "::" is identical to "all()". Since we've just changed the
behavior of "..y", ".." is now "root()..visible_heads()" meaning "~root()".
I was looking into some overly verbose logs and happened to notice
that `Store` uses the default derived, which presumably means it's
going to include all objects in its cached. Just including the
backend's debug string seems enough.
In `LockedWorkingCopy::drop()`, we panic if the caller had not called
`finish()`. IIRC, the idea was both to find bugs where we forgot to
call `finish()` and to prevent continuing with a modified
`WorkingCopy` instance. I don't think the former has been a problem in
practice. It has been a problem in practice to call `discard()` to
avoid the panic, though. To address that, we can make the `Drop`
implementation discard the changes (forcing a reload of the state if
the working copy is accessed again).
When restoring (`jj restore`) a 3-sided conflict from one tree into a
2-sided tree (or a resolved tree), we'll need to extend the size arity
of the target tree to that of the source tree. I had not considered
this case before. This patch relaxes the constraint in
`MergedTreeBuilder` to allow such cases. The additional trees are
based on empty trees with only the larger overrides in them.
Many (most?) callers of `Store::empty_tree_id()` really want a
`MergedTreeId`, so let's create a helper for that. It returns the
`Legacy` variant, which is what all current callers used. That should
be all we need since the two variants compare equal these days, and
since trees built based on the legacy variant can get promoted to the
new variant on write if the config is enabled.
When we start writing tree-level conflicts in an existing repo, we
don't want commits that change the format to be non-empty if they
don't change any content. This patch updates `MergeTreeId::eq()` to
consider two resolved trees equal even if only their `MergedTreeId`
variant is different (one is path-level and one is tree-level).
I think I've gone through all places we compare tree ids and checked
that it's safe to compare them this way. One consequence is that
rebasing a commit without changing the parents (typically
auto-rebasing after `jj describe`) will not lead to the tree id
getting upgraded, due to an optimization we have for that case. I
don't think that's serious enough to handle specially; we'll have to
support the old format for existing repos for a while regardless of a
few commits not getting upgraded right away.
The number of failing tests with the config option enabled drop from
108 to 11 with this patch.
We're finally ready to start writing trees using the new format where
we represent conflicts by having multiple trees in the commit instead
of having a single tree with multiple entries at a path. This patch
adds a config option for that. It's not ready to be used yet, so I
haven't updated the release notes or other documentation.
I added only a simple CLI test for testing what happens when the
config is enabled in an existing repo. 108 tests currently fail if we
flip the default.
I think most tests want a `MergedTree`, so this makes `create_tree()`
return that. I kept the old function as `create_single_tree()`. That's
now only used in `test_merge_trees` and `test_merged_tree`.
I also consistently imported the functions now, something I've
considered doing for a long time.
I made it simply fail on explicit fetch/import, and ignored on implicit import.
Since the error mode is predictable and less likely to occur. I don't think it
makes sense to implement warning propagation just for this.
Closes#1690.
With the already existing `MergedTree::resolve()` and all the recent
refactorings into `Merge<T>`, it's now very easy to add support for
3-way merging of `MergedTree` instances.
This introduces a `MergedTreeBuilder` type, which takes a set of base
trees and overrides. The idea is that it will be able to write
multiple trees or a legacy tree. For now, it's only able to write
legacy trees. To show that it works, the working copy's snaphotting
code has been updated to use it.
We currently represent the root tree id in a commit by `Merge<TreeId>`
plus a boolean `uses_tree_conflict_format`. It's better to use an enum
for that. That makes it harder to forget to check which type of tree
it is, and it makes it impossible to store a legacy tree with multiple
ids (as we could with `uses_tree_conflict_format=false`,
`root_tree=Merge::new(...)`).
Maybe more importantly, we're also going to want to pass around this
information in most places where we currently pass a single `TreeId`,
and passing two separate values would be annoying.
Unlike the git backend, we don't need to support path-level conflicts
for existing repos because we don't care about compatibility with
existing repos using the native backend. However, we still need to
support both formats until all code paths are able to handle
tree-level conflicts.
As #2165 showed, when diffing two `MergedTree::Legacy` variants (or
one of each variant) and re recurse into a subtree, we need to treat
that as a legacy tree too, so we expand `TreeValue::Conflict`s found
in the diff.
This converts `TreeDiffIterator::tree()` and
`TreeDiffIterator::single_tree()` into associated functions and passes
in the `&MergedTree` into the former. This prepares for fixing #2165,
and it removes the need for the `TreeDiffIterator::store` field.
This is what I proposed in #2095. @ is now an operator to concatenate symbols.
Unlike the other operators, lhs/rhs of @ is not a target of alias substitution.
'x' in 'x@y' doesn't look like a named variable, though it's technically
possible to allow definition of an alias expanded to a symbol of specific remote
or vice versa. This will probably apply to the kind:pattern syntax, where
aliases are expanded due to the current implementation restriction. I've added
a TODO comment about that.
I'm going to change the parsing rule of name@remote, and @ will no longer be
included in a symbol identifier. I could add a separate test for remote symbols,
but I think it's better to write tests that cover both "x"@"y" and "x@y" paths.
`itertools::interleave()` does exactly what we want for
`Merge::iter()`. I had just not thought to look for it
before. Hopefully it's not noticeably slow.
An alternative name for it would be `arity()`, but `num_sides()`
probably more clearly says that it's not about the number of removes
or the total number of terms.
We were using `current_tree()` only for an assertion where we were
walking its entries. Now that `MergedTree` supports that, we can
replace `current_tree()` by `current_merged_tree()`.
There's more work needed before the working copy can fully work with
tree-level conflicts. We still need to be able to store multiple tree
ids in the `tree_state` file, and we need to be able to create
multiple trees instead of writing conflict objects to the backend.
To support tree-level conflicts, we're going to need to update the
working copy from one `MergedTree` to another. We're going need to
store multiple tree ids in the `tree_state` file. This patch gets us
closer to that by getting the diff from `MergedTree`s`, even though we
assume that they are legacy trees for now, so we can write to the
single-tree `tree_state` file.
If we're going to be able to replace most instances of `Tree` by
`MergedTree`, we'll need to be able to diff two `MergedTree`s. This
implements support for that. The implementation copies a lot from the
diff iterator we have for `Tree`. I suspect we should be able to reuse
some of the code by introducing some traits that can then be
implemented by both `Tree` and `MergedTree`. I've left a TODO about
that.
When we do an update between two `MergedTree` instances, we'll get
diffs between two `Merge<Option<TreeValue>>`. This commit prepares for
that by changing the type of the `before` and `after` arguments we
pass into the closure in `update()`.
I think it's a little easier to follow if we don't update the stats in
the large callback. It also reduces the risk of forgetting to update
the stats in some case (like in the exec-bit-optimization case I just
removed).
When updating the working copy from one tree to another, if only the
executable bit has changed between the two trees, we set the
executable bit on the file without touching its contents. The
optimization probably gets used quite rarely. Maybe it's even so
rarely that it's a pessimization overall. Perhaps its value lies more
in that we avoid updating the file's mtime unnecessarily. Either way,
I'm about to change this code to use `Merge<Option<TreeValue>>` and
that will make this block more complex. I don't think it's worth the
complexity even it provides some small benefit sometimes.
Many of the `TreeBuilder` users have an `Option<TreeValue>` and call
either `set()` or `remove()` or the builder depending on whether the
value is present. Let's centralize this logic in a new
`TreeBuilder::set_or_remove()`.
One of the error types that I later created embedded `BackendError`, but `clippy` complained that the size of the type was too large. This helps address that.
The VS Code "Better TOML" plugin (which I think most of our VS Code developers use?) doesn't support the `x.y = z` syntax at the top level, even though it's valid TOML.
This is also useful if we ever want to add additional properties in different sub-crates (although unlikely for the near future).
When the main `TreeState::snapshot()` thread doesn't receive any
updated tree entries over the channel, it correctly doesn't write a
new tree. However, it also doesn't write the working copy state file
(`.jj/working_copy/tree_state`). This resulted in performance
regression in 3f97a6da78. From that commit, repeated snapshotting
would have to re-read all files from disk because it didn't remember
the updated mtime from the previous time.
This patch fixes the bug by also writing the file if there were any
new file states.
This doesn't seem to make any difference right now, but it will if we
write the state file when there are mtime-only changes, which we
currently don't do.
`revset::parse()` already has a `RevsetWorkspaceContext` argument, so
I think it makes sense to put that and the other context arguments
into a larger `RevsetParseContext` object.
We resolve file paths into repo-relative paths while parsing the
revset expression, so I think it's consistent to also resolve which
workspace "@" refers to while parsing it. That means we won't need the
workspace context both while parsing and while resolving symbols.
In order to break things like `author("martinvonz@")` (thanks to @yuja
for catching this), I also changed the parsing of working-copy
expressions so they are not allowed to be
quoted. `author(martinvonz@)` will therefore be an error now. That
seems like a small improvement anyway, since we have recently talked
about making `root` and `[workspace]@` not parsed as other symbols.
Per discussion in #2107, I believe "exact" is preferred.
We can also change the default to exact match, but it doesn't always make
sense. Exact match would be useful for branches(), but not for description().
We could define default per predicate function, but I'm pretty sure I cannot
remember which one is which.
git-branchless calls it a substring, so let's do the same.
FWIW, I copied literal:_ from Mercurial, but it's exact:_ in git-branchless.
I have no idea which one is preferred. Since this feature isn't released, we
can freely change it if exact:_ makes more sense.
https://github.com/arxanas/git-branchless/wiki/Reference:-Revsets#patterns
This commit replaces the functions `UserSettings::user_name_placeholder()`` and
`UserSettings::user_email_placeholder()` with `const` `&str`s to emphasize that
the placeholder strings must not be changed to support commits without
names or email addresses made before this change.
The code for getting the current tree object was repeated a few times
over. I'm going to soon make it return a `MergedTree` and I don't want
to repeat that code (it's more complicated than the current code).
The syntax is slightly different from Mercurial. In Mercurial, a pattern must
be quoted like "<kind>:<needle>". In JJ, <kind> is a separate parsing node, and
it must not appear in a quoted string. This allows us to report unknown prefix
as an error.
There's another subtle behavior difference. In Mercurial, branch(unknown) is
an error, whereas our branches(literal:unknown) is resolved to an empty set.
I think erroring out doesn't make sense for JJ since branches() by default
performs substring matching, so its behavior is more like a filter.
The parser abuses DAG range syntax for now. It can be rewritten once we remove
the deprecated x:y range syntax.
Add type annotation to `vec` to avoid the following build error if you
additionally import `bstr`:
```
~/jj> cargo test
Compiling jj-lib v0.8.0 (/home/aspotashev/jj/lib)
warning: unused import: `bstr`
--> lib/src/default_index_store.rs:30:5
|
30 | use bstr;
| ^^^^
|
= note: `#[warn(unused_imports)]` on by default
error[E0282]: type annotations needed
--> lib/src/default_index_store.rs:564:14
|
564 | .as_mut()
| ^^^^^^
565 | .write_u32::<LittleEndian>(parent_overflow.len() as u32)
| --------- type must be known at this point
|
help: try using a fully qualified path to specify the expected types
|
563 | <[u8] as AsMut<T>>::as_mut(&mut buf[parent_overflow_offset..parent_overflow_offset + 4])
| +++++++++++++++++++++++++++++++ ~
For more information about this error, try `rustc --explain E0282`.
warning: `jj-lib` (lib) generated 1 warning
error: could not compile `jj-lib` (lib) due to previous error; 1 warning emitted
```
Reason to support `bstr` being imported: in a Bazel environment where
crates are imported with certain features enabled, jj-lib may pull in
bstr as part of the following dependency chain:
jj-lib -> insta -> similar -> bstr.
We now have all the pieces in place to read the current tree as a
`MergedTree` when snapshotting the working copy. For now, it's still
always a legacy tree. We'll need to update the working copy state file
to support storing multiple trees before we can create a `MergedTree`
with multiple sides here.
For tree-level conflicts, we're going to be getting
`Merge<Option<TreeValue>>` from the current tree and produce a new
such value if contents changes on disk. This commit gets us a little
closer to that by passing in a value of that type into
`write_path_to_store()`.
This seems to have a small but measurable performance
impact. Snapshotting the working copy in the git repo with all files
`touch`ed went from 2.36 s to 2.43 s (3%). I think that's okay,
especially since most files' mtimes rarely change, and we only pay the
price when it has.
If the value at a path hasn't changed, there's no need to send it over
the channel and have the receiver add it to `TreeBuilder`. I couldn't
measure any performance impact.
Now we should no longer send `TreeValue::Conflict` variants over the
tree entry channel.
When writing tree-level conflicts, we're going to be writing multiple
tree (maybe using some new `MergedTreeBuilder`), so we'll need the
full `Merge<Option<TreeValue>>` object. This gets us closer to that by
sending such objects over the channel and having the receiver write
the conflict object.
Note that we still sometimes send `TreeValue::Conflict` variants over
the channel. That only happens if they're unchanged.
When writing tree-level conflicts, we won't pass `TreeValue::Conflict`
over the `tree_entries` channel. Instead, we're going to pass possibly
unresolved `Merge<Option<TreeValue>>` instances. This commit prepares
for that by changing the type even though we'll only pass
`Merge::normal()` over the channel at this point.
I did this partly to see what the performance impact is. I tested that
by touching all files in the git.git repo to force the trees (and
files) to be rewritten. There was no measurable impact at all
(best-of-10 time was 2.44 s before and 2.40 s after, but I assume that
was a fluke).
This basically means that heads in a filtered graph appear in reverse
chronological order. Before, "jj log -r 'tags()'" in linux-stable repo would
look randomly sorted once you ran "jj debug reindex" in it.
With this change, indexing is more like breadth-first search, and BFS is
known to be bad at rendering nice graph (because branches run in parallel.)
However, we have a post process to group topological branches, so we don't
have this problem. For serialization formats like Mercurial's revlog iirc,
BFS leads to bad compression ratio, but our index isn't that kind of data.
Reindexing gets slightly slower, but I think this is negligible.
(in Git repository)
% hyperfine --warmup 3 --runs 10 "jj debug reindex --ignore-working-copy"
(original)
Time (mean ± σ): 1.521 s ± 0.027 s [User: 1.307 s, System: 0.211 s]
Range (min … max): 1.486 s … 1.573 s 10 runs
(new)
Time (mean ± σ): 1.568 s ± 0.027 s [User: 1.368 s, System: 0.197 s]
Range (min … max): 1.531 s … 1.625 s 10 runs
Another idea is to sort heads chronologically and run DFS-based topological
sorting. It's ad-hoc, but worked surprisingly well for my local repositories.
For repositories with lots of long-running branches, this commit will provide
more predictable result than DFS-based one.
With the new `Merge::iter()`, we can simplify the code a bit by
combining that with `zip`.
I'll simplify the last part of `update_from_content()` next.
Implementing `Iterator` and `FromIterator` on `Merge<T>` provides much
more flexibility than the current `map()`, `try_map()`, etc.
`Merge::from_iter()` wouldn't have a way of failing if it's given an
unexpected (even) number of items. I would be fine with having it
panic, but we can't even usefully do that, because
e.g. `Option::from_iter()` will pass us an iterator ends early if the
input interator ends early. For example,
`Merge::resolved(None).iter().collect()` would call
`Merge::from_iter()` with an empty iterator (first item `None`). So, I
instead created a `MergeBuilder` type implementing `FromIterator`, and
let `MergeBuilder::build()` panic if there were an even number of
items.
I re-implemented some existing `Merge` methods using the new
facilities in this commit. Maybe we should remove some of the methods.
This allows us to reorder commits to be indexed in bulk.
The incremental update optimization is applied only for a single head. This
could be tried for multiple heads, but it's unlikely that every head has
a single new commit for each.
This is similar to what mut_repo.add_head() does.
I'm going to adjust the visiting order so the bulk-imported history preserves
chronological order. It might be a small adjustment on the current DFS
approach, or new function based on Kahn's algorithm. Either way, it's important
that both "jj git import" and "jj debug reindex" use the same underlying
function.
Almost the entire method deals with `FileType::Normal`, so we can
reduce indentation and repeated matching on the file type by doing it
early and returning in the non-normal-file cases.
For tree-level conflicts, we're eventually not going to have
`ConflictId`. We'd want to make `write_conflict_to_store()` take a
`Merge<Option<TreeValue>>` and return an updated such value. That
would leave very little logic in the function, so let's just inline it
instead.
`update_from_content()` already writes file content for each term of
an unresolved merge, so it seems consistent for it to also write the
file content for resolved merges. I think this should simplify further
refactoring for tree-level conflicts and for preserving the executable
bit.
Since `update_from_contents()` only works with file contents and not
the executable or other kinds of paths, I think it makes more sense
for it to deal with `FileId`s instead of `TreeValue`s.
There were still many instances of `conflict` left from before we
renamed `Conflict<T>` to `Merge<T>`. I decided to rename many of them
based on the type parameter instead of the container. I think that
made it more readable in many cases.
I think I moved way too many functions onto `Merge<Option<TreeValue>>`
in 82883e648d. This effectively reverts almost all of that
commit. The `Merge<T>` type is simple container and it seems like it
should be at fairly low level in the dependency graph. By moving
functions off of it, we can get rid of the back-depdencies from the
`merge` module to the `conflict` module that I introduced when I moved
`Merge` to the `merge` module. I'm thinking the `conflict` module can
focus on materialized conflicts.
Per discussion in #2009. This behavior isn't affected by e7e49527ef "git:
ensure that remote branches never diverge", but it's subtle enough to write
a test.
I was considering how refs would be imported if we had a per-remote view of
named branches (and tags): Each remote has a view, and jj remembers the last
known view state to compute diffs. That's the same for the pseudo "git" remote.
Under the current storage, these view states are represented as follows:
git_refs["refs/heads/{name}"] # pseudo "git" remote branches
git_refs["refs/tags/{name}"] # pseudo "git" remote tags
git_refs["refs/remotes/{remote}/{name}"] # real remote branches
and the diffs are merged in to branches[name].local_target and tags[name].
We also have branches[name].remote_targets[remote], but I think it's redundant
because a tracking branch should also be the last known state, not something
that can diverge from the actual state. To make that clear, this commit
replaces the use of the "merge" API.
As reported in #1970, SSH authentication would sometimes run into a
loop where it repeatedly tries to use ssh-agent for authentication
without making progess. The problem can be reproduced by simply
removing `$SSH_AUTH_KEY` from your environment (and not having a Git
credentials helper configured, I think).
This seems to be a bug introduced by b104f8e154c21. That commit meant
to make it so we attempt to use ssh-agent and fall back to using
(password-less) keys after that. The problem is that
`git2::Cred::ssh_key_from_agent()` just returns an object that will be
used later for looking up the credentials from ssh-agent, so the call
will not fail because ssh-agent is not reachable.
This commit attempts to fix the problem by having the credentials
callback attempt to use ssh-agent only once.
Perhaps the most important invariant in `.jj/working_copy/tree_state`
is that its set of files in it matches the files in its tree. In
particular, if a file that exists in the tree doesn't exist in the
file state and doesn't exist on disk either, we won't notice that it's
gone, and we will therefore not delete it from the tree on future
rounds of snapshotting either.
Now that we process the outputs from the file system traversal by
reading from channels, we can separate the processing from the file
system traversal. When the working copy is unchanged, processing tree
entries and deleted files takes practically no time, but processing
file states and present files takes significant time.