I think most tests want a `MergedTree`, so this makes `create_tree()`
return that. I kept the old function as `create_single_tree()`. That's
now only used in `test_merge_trees` and `test_merged_tree`.
I also consistently imported the functions now, something I've
considered doing for a long time.
I made it simply fail on explicit fetch/import, and ignored on implicit import.
Since the error mode is predictable and less likely to occur. I don't think it
makes sense to implement warning propagation just for this.
Closes#1690.
With the already existing `MergedTree::resolve()` and all the recent
refactorings into `Merge<T>`, it's now very easy to add support for
3-way merging of `MergedTree` instances.
As #2165 showed, when diffing two `MergedTree::Legacy` variants (or
one of each variant) and re recurse into a subtree, we need to treat
that as a legacy tree too, so we expand `TreeValue::Conflict`s found
in the diff.
I'm going to change the parsing rule of name@remote, and @ will no longer be
included in a symbol identifier. I could add a separate test for remote symbols,
but I think it's better to write tests that cover both "x"@"y" and "x@y" paths.
An alternative name for it would be `arity()`, but `num_sides()`
probably more clearly says that it's not about the number of removes
or the total number of terms.
To support tree-level conflicts, we're going to need to update the
working copy from one `MergedTree` to another. We're going need to
store multiple tree ids in the `tree_state` file. This patch gets us
closer to that by getting the diff from `MergedTree`s`, even though we
assume that they are legacy trees for now, so we can write to the
single-tree `tree_state` file.
If we're going to be able to replace most instances of `Tree` by
`MergedTree`, we'll need to be able to diff two `MergedTree`s. This
implements support for that. The implementation copies a lot from the
diff iterator we have for `Tree`. I suspect we should be able to reuse
some of the code by introducing some traits that can then be
implemented by both `Tree` and `MergedTree`. I've left a TODO about
that.
`revset::parse()` already has a `RevsetWorkspaceContext` argument, so
I think it makes sense to put that and the other context arguments
into a larger `RevsetParseContext` object.
We resolve file paths into repo-relative paths while parsing the
revset expression, so I think it's consistent to also resolve which
workspace "@" refers to while parsing it. That means we won't need the
workspace context both while parsing and while resolving symbols.
In order to break things like `author("martinvonz@")` (thanks to @yuja
for catching this), I also changed the parsing of working-copy
expressions so they are not allowed to be
quoted. `author(martinvonz@)` will therefore be an error now. That
seems like a small improvement anyway, since we have recently talked
about making `root` and `[workspace]@` not parsed as other symbols.
Per discussion in #2107, I believe "exact" is preferred.
We can also change the default to exact match, but it doesn't always make
sense. Exact match would be useful for branches(), but not for description().
We could define default per predicate function, but I'm pretty sure I cannot
remember which one is which.
This commit replaces the functions `UserSettings::user_name_placeholder()`` and
`UserSettings::user_email_placeholder()` with `const` `&str`s to emphasize that
the placeholder strings must not be changed to support commits without
names or email addresses made before this change.
The syntax is slightly different from Mercurial. In Mercurial, a pattern must
be quoted like "<kind>:<needle>". In JJ, <kind> is a separate parsing node, and
it must not appear in a quoted string. This allows us to report unknown prefix
as an error.
There's another subtle behavior difference. In Mercurial, branch(unknown) is
an error, whereas our branches(literal:unknown) is resolved to an empty set.
I think erroring out doesn't make sense for JJ since branches() by default
performs substring matching, so its behavior is more like a filter.
The parser abuses DAG range syntax for now. It can be rewritten once we remove
the deprecated x:y range syntax.
`update_from_content()` already writes file content for each term of
an unresolved merge, so it seems consistent for it to also write the
file content for resolved merges. I think this should simplify further
refactoring for tree-level conflicts and for preserving the executable
bit.
Since `update_from_contents()` only works with file contents and not
the executable or other kinds of paths, I think it makes more sense
for it to deal with `FileId`s instead of `TreeValue`s.
I think I moved way too many functions onto `Merge<Option<TreeValue>>`
in 82883e648d. This effectively reverts almost all of that
commit. The `Merge<T>` type is simple container and it seems like it
should be at fairly low level in the dependency graph. By moving
functions off of it, we can get rid of the back-depdencies from the
`merge` module to the `conflict` module that I introduced when I moved
`Merge` to the `merge` module. I'm thinking the `conflict` module can
focus on materialized conflicts.
Per discussion in #2009. This behavior isn't affected by e7e49527ef "git:
ensure that remote branches never diverge", but it's subtle enough to write
a test.
Since `Conflict<T>` can also represent a non-conflict state (a single
term), `Merge<T>` seems like better name.
Thanks to @ilyagr for the suggestion in
https://github.com/martinvonz/jj/pull/1774#discussion_r1257547709
Sorry about the churn. It would have been better if I thought of this
name before I introduced `Conflict<T>`.
`.gitignores` in ignored directories should be ignored. Before this
commit, we would visit ignored directories like any others if there
were any ignored paths in them.
I've done a lot of preparation for this commit, but There's still a
bit of duplication between the new code and the existing code. I don't
mind improving it if anyone has suggestions. Otherwise I might end up
doing that when I get back to working on snapshotting tree-level
conflicts soon.
This fixes#1785.
It's currently the same code path for handling changes to tracked
paths in ignored directories as outside ignored directories, but I'm
about to change that.
I also updated the assertion in the test to compare all entries
instead of just the tree id, so it's easier to spot errors if it
fails.
It seemed awkward if merged PR is sometimes rendered as a first branch.
Instead of sorting edges in index order, let's build a HashSet only when
deduplication is needed.
It's currently a bit complicated to snapshot the working copy and
there's a lot of duplication in tests. This commit introduces a
function to simplify it. I made the function snapshot the working copy
and save the updated state. Some of the tests I changed previously
discarded the changes instead of saving them, but I think they all did
so because it was simpler. I left a few call sites unchanged because
they make concurrent changes.
This is breaking change. Old jj binary will panic if it sees a view saved by
new jj. Alternatively, we can store both new and legacy data for backward
compatibility.
This also lets us compare the resulting tree because the working copy
now exactly matches the tree (it used to be that the `.gitignore` file
wasn't initially snapshotted).
We don't seem to have any tests that our protection from undetected
changes caused by writes happening right after checkout, so let's add
one. The test case loops 100 times and each iteration fails slightly
more than 80% of the time on my machine (if I remove the protection in
`TreeState::update_file_state()`), so it seems quite good at
triggering the race.
I don't see any reason for these tests to differ depending on backend,
so let's avoid running them twice (there are probably lots of other
tests that we're also running with different backends for little
reason).
Copied from MergedTree::has_conflict(). I feel it's slightly better since
RefTarget "is" always a Conflict-based type. It could be inverted to
RefTarget::is_resolved(), but refs are usually resolved, and all callers
have special case for !is_resolved() state.
With `MergedTree`, we can iterate over conflicts by descending into
only the subdirectories that cannot be trivially resolved. We assume
that the trees have previously been resolved as much as possible, so
we don't attempt to resolve conflicts again.
This adds a function for resolving conflicts that can be automatically
resolved, i.e. like our current `merge_trees()` function. However, the
new function is written to merge an arbitrary number of trees and, in
case of unresolvable conflicts, to produce a `Conflict<TreeId>` as
result instead of writing path-level conflicts to the backend. Like
`merge_trees()`, it still leaves conflicts unresolved at the file
level if any hunks conflict, and it resolves paths that can be
trivially resolved even if there are other paths that do conflict.
In order to store conflicts in the commit, as conflicts between a set
of trees, we want to be able merge those trees on the fly. This
introduces a type for that. It has a `Merge(Conflict(Tree))` variant,
where the individual trees cannot have path-level conflicts. It also
has a `Legacy(Tree)` variant, which does allow path-level conflicts. I
think that should help us with the migration.
Alternatively, we can wrap BTreeMap<String, Option<RefTarget>> to flatten
Option<&Option<..>> internally, but doing that would be tedious. It would
also be unclear if map.remove(name) should construct an absent RefTarget if
the ref doesn't exist.
The next commit will change these maps to store Option<RefTarget> entries, but
None entries will still be omitted from the serialized data. Since ContentHash
should describe the serialized data, relying on the generic ContentHash would
cause future hash conflict where absent RefTarget entries will be preserved.
For example, ([remove], [None, add]) will be serialized as ([remove], [add]),
and deserialized to ([remove], [add, None]). If we add support for lossless
serialization, hash(([remove], [None, add])) should differ from the lossy one.
Summary: Unneeded with the MSRV bump. The other one
will have to wait until Rust 1.72.0
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ifb3d862aedd3f2aeb05a86ce76978d4f
Since RefTarget will be reimplemented on top of Conflict<Option<CommitId>>,
we won't be able to simply return a slice of type &[CommitId]. These functions
are also renamed in order to disambiguate from Conflict::adds()/removes().
We saw a failure in `test_merge_views_git_heads()` in the GitHub CI,
but I wasn't able to reproduce it locally. Using the
`commit_transactions()` helper in the test should fix it.
It's named after Conflict::from_legacy_form(). If RefTarget is migrated to
new Conflict type, from_legacy_form([], [add]) will create a normal target,
and from_legacy_form([], []) will be equivalent to the current None target.
That's why this function isn't named as RefTarget::conflict().
If RefTarget is migrated to new Conflict type, this function will create
RefTarget(Conflict::resolved(Some(id))).
We still need some .unwrap() to insert Option<RefTarget> into map, but maps
will be changed to store new RefTarget type, and their mutation API will
guarantee that Conflict::resolved(None) is eliminated.
Perhaps, all view.get_<kind>(name) functions can return reference, but I'm
not willing to change the interface at this point. I'll revisit this after
migrating Option<RefTarget> to new Conflict-based type.
If we migrate RefTarget to new Conflict-based type, it won't store
Conflict<CommitId>, but Conflict<Option<CommitId>>. As the Option will
be internalized, new RefTarget type will also represent an absent target.
The 'target: Option<RefTarget>' argument will be replaced with new RefTarget
type.
I've also renamed the function for consistency with the following changes.
It would be surprising if set_local_branch(name, target) could remove the
branch. I feel the name set_local_branch_target() is less confusing.
The order of conflicted ids slightly changed since Conflict::simplify()
tries to preserve diff pairs. It shouldn't matter so long as the result is
stable.
Git CLI rejects it (though the data model would probably work fine with
"HEAD" branch.) This ensures that HEAD@git in JJ world is not an exported
branch named "HEAD".
Almost everyone calls the project "jj", and there seeems to be
consensus that we should rename the crates. I originally wanted the
crates to be called `jj` and `jj-lib`, but `jj` was already
taken. `jj-cli` is probably at least as good for it anyway.
Once we've published a 0.8.0 under the new names, we'll release 0.7.1
versions under the old names with pointers to the new crates names.
It seemed too verbose to always include @remote branches, so synced remotes
are omitted by default. If the given symbol contained '@', all remote symbols
are populated so that the distance of remote fragment is taken into account.
Since `Conflict`s can represent the resolved state, so
`Conflict<ContentHunk>` can represent the states that we use
`MergeHunk` for. `MergeHunk` does force the user to handle the
resolved case, which may be useful. I suppose one could use the same
argument for making `Conflict` an enum, i.e. if we think that
`MergeHunk`'s two variants are beneficial, then we should consider
making `Conflict` an enum with those two variants.
It's useful to have a more readable `Debug` format for `Vec<u8>`
(`"foo"` is better than `[102, 111, 111]`). It might also make types
in function signatures and elsewhere more readable.
This only parses the fields relevant to us, i.e.:
- name: the stable identifier of the submodule
- path: the path to the submodule in the current commit
- url: the remote we can clone the submodule from
The full list of .gitmodules fields can be found at
https://git-scm.com/docs/gitmodules.
This commit fixes#1305
Before this commit, running `jj init --git-repo=./` in a folder that
does not have a .git would cause jj to panick and leave an unfinished corrupted jj repo.
This commit fixes that by changing the call chain to return an error
instead of calling .unwrap() and panicking. This commit also adds logic to delete the unfinished jj
repository when the git backend initialization failed.
Before this commit, running the above command would result in the following
```
Running `jj/target/debug/jj init --git-repo=./`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { code: -3, klass: 2, message: "failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory" }', lib/src/git_backend.rs:83:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
After this commit, the result is the following and the jj repo is deleted:
```
Running `jj/target/debug/jj init --git-repo=./`
Error: Failed to access the repository: Error: Failed to open git repository: failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory; class=Os (2); code=NotFound (-3)
```
..and other assorted boilerplate. These are just stubs for now, but now
that we've reserved the `submodule_store` subdirectory, we can start
adding more functionality.
Before we had `conflicts::Conflict`, most of these functions took a
`backend::Conflict`. I think I didn't want to pollute the `backend`
module with this kind of logic, trying to keep it focused on
storage. Now that we have the type in `conflicts`, however, I think it
makes sense to move these functions onto it.
This gets rid of round-trip conversion from queries like "(main..)-". I have
such expression in my default log/disambiguation revset, and the query could
take ~150ms to convert head positions back and forth if the repository had
tons of unmerged commits.
For tree-level conflicts (#1624), I plan to remove `ConflictId`
completely. This commit removes `ConflictId` from
`update_conflict_from_content()` by instead making it take a
`Conflict<Option<TreeValue>>` and return a possibly different such
value.
I made the call site in `working_copy` avoid writing the conflict to
the store if it's unchanged, but I didn't make the same optimization
in `merge_tools` becuase it's much more likely to have changed there.
Use `br@git` instead.
Before, if there is not a local branch `br`, jj tried to resolve
it as a git ref `refs/heads/br`. Unchanged from before, `br` can
still be resolved as a tag `refs/tag/br`.
Currently, if the user modifies a modify/delete conflict, we always
consider the result resolved. That happens because we materialize the
missing side of the conflict as an empty string but when we parse the
conflict, we expect only the number of sides in the input
conflict. For example, if the input is a regular modify/delete
conflict with one remove and one add, the materialized markers will
have one remove and two adds (one of them empty), but when we try to
parse it, we expect one remove and only one add. When we fail to parse
it, we consider it resolved.
This commit fixes the bug by using
`conflicts::Conflict<Option<TreeValue>>` and keeping track of which
sides were supposed to be empty. We could have fixed the bug without
switching to `conflicts::Conflict`, but we want to switch anyway, and
the fix happens naturally when switching.
I made a typo and got something like this:
```
Error: Commit or change id prefix "wl" is ambiguous
```
Since we can tell commit ids from change ids these days, let's make
the error message say which kind of id it is. Changing that also kind
of forced me to make a special error for empty strings. Otherwise we
would have to arbitrarily say that an empty string is a commit id or
change id. A specific error message for empty strings seems helpful,
so that's probably for the better anyway.
We can't get rid of the other "impl Index"es because .as_composite() must
return a real reference type. Maybe we could turn CompositeIndex into an
owned wrapper, but I don't know if that would be worth the effort.
It might sound scary to add public .mutable_index() accessor, but I think
it's okay because immutable MutableIndex reference has no more power than
Index.
This allows us to implement Index for lifetime-bound type such as
CompositeIndex<'_>.
The idea is that .as_composite() is equivalent to .as_index(), but for the
implementation type. I'm going to add "impl Index for CompositeIndex" to
clean up index references passed to revset engine.
Otherwise, "jj init --git-repo ." would create extra table files per commit,
and merge them.
I considered adding an explicit GitBackend method to be called from
git::import_refs(), but the call order matters. The method should be invoked
before calling store.get_commit(..) or mut_repo.add_head(..). Since commits
are likely to be loaded from the head, we can instead make read_commit()
import ancestor metadata at all.
Alternatively, we could make a Git commit hidden until it's inserted into
the extra table. It's rather big change, and I wouldn't like to do that
without thinking more thoroughly.
My first attempt was to fix up corrupted index when merging, but it turned
out to be not easy because the self side may contain corrupted data. It's
also possible that two concurrent commit operations have exactly the same
view state (because change id isn't hashed into commit id), and only the
table heads diverge.
#924
This bug concerns the way `import_refs` that gets called by `fetch` computes
the heads that should be visible after the import.
Previously, the list of such heads was computed *before* local branches were
updated based on changes to the remote branches. So, commits that should have
been abandoned based on this update of the local branches weren't properly
abandoned.
Now, `import_refs` tracks the heads that need to be visible because of some ref
in a mapping keyed by the ref. If the ref moves or is deleted, the
corresponding heads are updated.
Fixes#864
Now that we return the written commit from `write_commit()`, let's
make the timestamps match what was actually written, accounting for
the whole-second precision and the adjustment we do to avoid
collisions.
This has several advantages:
* Makes it possible to downcast to non-Git custom backends (might be
useful at Google, but we haven't needed it yet)
* Lets us access more specific functionality on the `GitBackend`,
making it possible to access the `git2::Repository` without
creating a copy of it.
* Removes the dependency on Git from the backend