One of the remaining places we depend on index positions is when
creating a `ChangeIdIndex`. This moves that into the revset engine
(which is coupled to the commit index implementation) by adding a
`Revset::change_id_index()` method. We will also use this function
later when add support for resolving change id prefixes within a small
revset.
The current implementation simply creates an in-memory index using the
existing `IdIndex` we have in `repo.rs`.
The custom implementation at Google might do the same for small
revsets that are available on the client, but for revsets involving
many commits on the server, it might use a suboptimmal implementation
that uses longer-than-necessary prefixes for performance reasons. That
can be done by querying a server-side index including changes not in
the revset, and then verifying that the resulting commits are actually
in the revset.
The function is only used in tests, so it doesn't belong in
`default_revset_engine`. Also, it's not specific to that
implementation, so I rewrote as a revset evaluation.
I'd like to be able to pass a `self` of `type `&ReadonlyRepo` to
functions that take a `&dyn Repo`. For that, we need `ReadonlyRepo`
itself to implement `Repo` instead of having `Arc<ReadonlyRepo>`
implement it. I could have solved it in a different way, but the `Arc`
requirement seems like an unnecessary constraint.
The index position is specific to the default index implementation and
we don't want to use it in outside of there. This commit removes the
use of it as a key for nodes in the graphlog.
I timed it on the git.git repo using `jj log -r 'all()' -T commit_id`
(the worst case I can think of) and it slowed down from ~2.02 s to
~2.20 s (~9%).
For large repos, it's useful to be able to use shorter change id and
commit id prefixes by resolving the prefix in a limited subset of the
repo (typically the same subset that you'd want to see in your default
log output). For very large repos, like Google's internal one, the
shortest unique prefix evaluated within the whole repo is practically
useless because it's long enough that the user would want to copy and
paste it anyway.
Mercurial supports this with its `revisions.disambiguatewithin` config
(added in https://www.mercurial-scm.org/repo/hg/rev/503f936489dd). I'd
like to add the same feature to jj. Mercurial's implementation works
by attempting to resolve the prefix in the whole repo and then, if the
prefix was ambiguous, it resolves it in the configured subset
instead. The advantage of doing it that way is that there's no extra
cost of resolving the revset defining the subset if the prefix was not
ambiguous within the whole repo. However, there are two important
reasons to do it differently in jj:
* We support very large repos using custom backends, and it's probably
cheaper to resolve a prefix within the subset because it can all be
cached on the client. Resolving the prefix within the whole repo
requires a roundtrip to the server.
* We want to be able to resolve change id prefixes, which is always
done in *some* revset. That revset is currently `all()`, i.e. all
visible commits. Even on local disk, it's probably cheaper to
resolve a small revset first and then resolve the prefix within that
than it is to build up the index of all visible change ids.
We could achieve the goal by letting each revset engine respect the
configured subset, but since the solution proposed above makes sense
also for local-disk repos, I think it's better to do it outside of the
revset engine, so all revset engines can share the code.
This commit prepares for the new functionality by moving the symbol
resolution out of `Index::evaluate_revset()`.
We want to allow custom revset engines define their own graph
iterator. This commit helps with that by adding a
`Revset::iter_graph()` function that returns an abstract iterator.
The current `RevsetGraphIterator` can be configured to skip or include
transitive edges. It skips them by default and we don't expose option
in the CLI. I didn't bother including that functionality in the new
`iter_graph()` either. At least for now, it will be up to the
implementation whether it includes such edges (it would of course be
free to ignore the caller's request even if we added an option for it
in the API).
We want to allow customization of the revset engine, so it can query
server indexes, for example. The current revset implementation will be
our default implementation for now. What's left in the `revset` module
after this commit is mostly parsing code.
The tests adding and removing heads to the repo mostly want to verify
that the set of heads is expected. Some of them also check that
commits are available in the index. But they shouldn't care about the
exact index stats.
I don't think there's much to gain from making the index match exactly
what's reachable from the view. FWIW, our cloud-based implementation
at Google will probably make everyone's commits visible in the index
regardless of which operation they're at.
This is another step towards allowing a custom `jj` binary to have its
own index type. We're going to have a server-backed index
implementation at Google, for example.
In `git_fetch()`, any glob present in `globs` is an "allow" mark. Using
`&[]` to represent an "allow-all" may be misleading, as it could
indicate that no branch (only the git HEAD) should be fetched.
By using an `Option<&[&str]>`, it is clearer that `None` means that
all branches are fetched.
To be able to make e.g. `jj log some/path` perform well on cloud-based
repos, a custom revset engine needs to be able to see the paths to
filter by. That way it is able pass those to a server-side index. This
commit helps with that by effectively converting `jj log -r foo
some/path` into `jj log -r 'foo & file(some/path)'`.
I'm about to make `RepoLoader::init()` return a `Result`, and I don't
want to have to wrap that in a new error in
`ReadonlyRepo::load_at_head()` since that's only used in tests.
This should fix#1304. I think the added test simulates the behavior of
multiple rebase conflicts, but I don't have expertise around this.
add_index could be replaced with a peekable iterator, but the iterator version
wouldn't be as readable as the current implementation.
The type doesn't seem to provide any benefit. I don't think I had a
good reason for creating it in the first place; it was probably just
unfamiliarity with Rust.
I was thinking of replacing `RevsetIterator` by a regular
`Iterator<Item=IndexEntry>`. However, that would make it easier to
pass in an iterator that produces revisions in a non-topological order
into `RevsetGraphIterator`, which would produce unexpected results (it
would result in nodes that are not connected to their parents, if
their parents had already been emitted). I think it makes sense to
instead pass in a revset into `RevsetGraphIterator`.
Incidentally, it will also be useful to have the full revset available
in `RevsetGraphIterator` if we rewrite the algorithm to be more
similar to Mercurial's and Sapling's algorithm, which involves asking
the revset if it contains parent revisions.
We write conflict to the working copy by materializing them as
conflict markers in a file. When the file has been modified (or just
the mtime has changed), we parse the markers to reconstruct the
conflict. For example, let's say we see this conflict marker:
```
<<<<<<<
+++++++
b
%%%%%%%
-a
+c
>>>>>>>
```
Then we will create a hunk with ["a"] as removed and ["b", "c"] as
added.
Now, since commit b84be06c08, when we materialize conflicts, we
minimize the diff part of the marker (the `%%%%%%%` part). The problem
is that that minimization may result in a different order of the
positive conflict terms. That's particularly bad because we do the
minimization per hunk, so we can end up reconstructing an input that
never existed.
This commit fixes the bug by only considering the next add and the one
after that, and emitting either only the first with `%%%%%%%`, or both
of them, with the first one in `++++++++` and the second one in
`%%%%%%%`.
Note that the recent fix to add context to modify/delete conflicts
means that when we parse modified such conflicts, we'll always
consider them resolved, since the expected adds/removes we pass will
not match what's actually in the file. That doesn't seem so bad, and
it's not obvious what the fix should be, so I'll leave that for later.
It took a while before I realized that conflicts could be modeled as
simple algebraic expressions with positive and negative terms (they
were modeled as recursive 3-way conflicts initially). We've been
thinking of them that way for a while now, so let's make the
`ConflictPart` name match that model.
When we materialize modify/delete conflicts, we currently don't
include any context lines. That's because modify/delete conflicts have
only two sides, so there's no common base to compare to. Hunks that
are unchanged on the "modify" side are therefore not considered
conflicting, and since they they don't contribute new changes, they're
simply skipped (here:
3dfedf5814/lib/src/files.rs (L228-L230)).
It seems more useful to instead pretend that the missing side is an
empty file. That way we'll get a conflict in the entire file.
We can still decide later to make e.g. `jj resolve` prompt the user on
modify/delete conflicts just like `hg resolve` does (or maybe it
actually happens earlier there, I don't remember).
Closes#1244.
This is just a little preparation for extracting a `Repo` trait that's
implemented by both `ReadonlyRepo` and `MutableRepo`. The `index()`
function in that trait will of course have to return the same type in
both implementations, and that type will be `&dyn Index`.
Even though we don't know the details yet, we know that we want to
make the index pluggable like the commit and opstore
backends. Defining a trait for it should be a good step. We can refine
the trait later.
By separating the value spaces change ids and commit ids, we can
simplify lookup of a prefix. For example, if we know that a prefix is
for a change id, we don't have to try to find matching commit ids. I
think it might also help new users more quickly understand that change
ids are not commit ids.
This commit is a step towards that separation. It allows resolving
change ids by using hex digits from the back of the alphabet instead
of 0-f, so 'z'='0', 'y'='1', etc, and 'k'='f'. Thanks to @ilyagr for
the idea. The regular hex digits are still allowed.
Git's HEAD ref is similar to other refs and can logically have
conflicts just like the other refs in `git_refs`. As with the other
refs, it can happen if you run concurrent commands importing two
different updates from Git. So let's treat `git_head` the same as
`git_refs` by making it an `Option<RefTarget>`.
Add a new git.auto-local-branch config option. When set to false, a
remote-tracking branch imported from Git will not automatically create a
local branch target. This is implemented by a new GitSettings struct
that passes Git-related settings from UserSettings.
This behavior is particularly useful in a co-located jj and Git repo,
because a Git remote might have branches that are not of everyday
interest to the user, so it does not make sense to export them as local
branches in Git. E.g. https://github.com/gitster/git, the maintainer's
fork of Git, has 379 branches, most of which are topic branches kept
around for historical reasons, and Git developers wouldn't be expected
to have local branches for each remote-tracking branch.
I've preferred "working-copy commit" over "checkout" for a while
because I think it's clearer, but there were lots of places still
using "checkout". I've left "checkout" in places where it refers to
the action of updating the working copy or the working-copy commit.
`SimpleOpHeadsStore` currently stores its files in
`.jj/repo/op_heads/simple_op_heads/`. The `.jj/repo/op_heads/type`
file indicates the type of op-heads backend. If that contains
"simple_op_head_store", we use the `SimpleOpHeadsStore`
backend. There's no need for the `simple_op_heads` directory to also
indicate the type of backend in its name. I kept just the `heads` in
the name to make it less redundant with the parent directory (which is
`op_heads)`. We could alternatively call the directory `values` or
similar.
Make op resolution a closed operation, powered by a callback provided by the
caller which runs under an internal lock scope. This allows for greatly
simplifying the internal lifetime structuring.
- branches has the signature branches([needle]), meaning the needle is optional (branches() is equivalent to branches("")) and it matches all branches whose name contains needle as a substring
- remote_branches has the signature remote_branches([branch_needle[, remote_needle]]), meaning it can be called with no arguments, or one argument (in which case, it's similar to branches), or two arguments where the first argument matches branch names and the second argument matches remote names (similar to branches, remote_branches(), remote_branches("") and remote_branches("", "") are all equivalent)
I don't think Workspace::load() should be permissive in that regard.
WorkspaceLoader could provide such function, but I feel it's more like
CLI business. CLI can also look for parent '.git' directory to suggest
'jj init --git-repo=..' if needed.
We already have `create_random_commit()`, which returns a
`CommitBuilder`. Most callers directly write that to a
`MutableRepo`. That currently returns a `Commit`, but I'm about to
make it propagate errors from the backend. That would add an
`unwrap()` to this sequence, making it longer. Let's create a simple
helper for these callers to simplify this common pattern.
When you're done with the `CommitBuilder`, you're going to have to
call `write_to_repo()`, passing it a mutable `MutableRepo`
reference. It's a bit simpler to pass that reference when we create
the `CommitBuilder` instead, so that's what this patch does.
A drawback of passing in the mutable reference when we create the
builder is that we can't have multiple unfinished `CommitBuilder`
instance live at the same time. We don't have any such use cases yet,
and it's not hard to work around them, so I think this change is worth
it.
The next commit will introduce a newtype for -m/--message argument which
can be converted Into<String>.
Since CommitBuilder is a thin wrapper, code bloat caused by generic parameters
wouldn't matter. I have another set of commits that makes all builder methods
accept Into/IntoIterator, which will remove some of .clone() calls from tests.
I ran an upgraded Clippy on the codebase. All the changes seem to be
about using variables directly in format strings instead of passing
them as separate arguments.
This will be a building block of 'parents(base)' revset. 'base---' will
be .filter_by_generation(3..4) for example. I think 'ancestors(base)' can
also have an optional generation parameter, but I haven't considered any
particular syntax yet.
This basically transforms 's1 & (f() | s2)' to
's1.iter().filter(all && f || s2)'. Still the predicate part includes "all",
the filter function doesn't need to load commit data for every entry since
's1.iter().filter(all)' is tested first. To optimize "all" predicate out,
maybe we can add a wrapper that returns '|_: &IndexEntry| true'.
Instead of inserting AsFilter(_) node, I could add a recursive is_filter()
function. That would also work so long as the height of RevsetExpression tree
is limited. I chose node insertion just for ease of snapshot testing.
@yuja asked on #701 about the difference between the state in the
`git_export_view` and what we have in `mut_repo.view()`. It's true
that the branches in `mut_repo.view().git_refs()` should match what we
wrote to disk. We can therefore remove the on-disk storage and
simplify quite a bit. For now, I create the `last_export_view` from
the `mut_repo.view().git_refs()` before calling
`export_changes()`. I'll clean up a bit more next.
I think this is correct even considering e.g. undo. Let's consider
what would happen in a non-colocated Git repo (not because tricky
cases cannot happen there but because the explicit exports and imports
make it easier to discuss, and more cases can occur). If the user
moved a branch and then did `jj git export`, `jj undo`, and then `jj
git export` again, we would think on the second export that we should
perform the same changes to the Git repo, which should have no effect.
This patch also fixes the bug we were forced to work around in the
test case in the previous patch.
This removes one of our uses of Thrift.
This fixes the bugs shown by the tests added in the previous patch by
checking that the git branches we're about to update have not been
updated by git since our last export. If they have, we fail those
branches. The user can then re-import from the git repo and resolve
any conflicts before exporting again.
I had to update the `test_export_import_sequence` to make it
pass. That shows a new bug, which I'll fix next. The problem is that
the exported view doesn't get updated on import, so we would try to
export changes compared to an earlier export, even though we actually
knew (because of the `jj git import`) that the state in git had
changed.
If you update a branch using regular `git` (or some Git-based tool)
between two `jj git export`, we will overwrite that change if you had
also changed the branch in jj land. There's a similar problem if you
delete the branch in jj land. Let's have a test for that. I'm going to
make us not overwrite it soon. This patch adds a test for those cases,
plus many other cases in consistent way. Since the new test covers
some cases tested by existing tests, I removed those tests.
It seems that we didn't have a test for this simple case. I wrote this
test case while working on #111 but I don't know why I didn't push it
back then.
A new FileType, GitSubmodule is added which is ignored. Files or
directories having this type are not added to the work queue and
are ignored in snapshot. Submodules are not created by jujutsu
when resetting or checking out a tree, they should be currently
managed using git.
Because a unary negation node '~y' is more primitive than the corresponding
difference node 'x~y', '~y' is easier to deal with while rewriting the tree.
That's the main reason to add RevsetExpression::NotIn node.
As we have a NotIn node, it makes sense to add an operator for that. This
patch reuses '~' token, which I feel intuitive since the other set operators
looks like bitwise ops. Another option is '!'.
The unary '~' operator has the highest precedence among the set operators,
but they are lower than the ranges. This might be counter intuitive, but
useful because a prefix range ':x' can be negated without parens.
Maybe we can remove the redundant infix operator 'x ~ y', but it isn't
decided yet.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
Follows up c5ed3e1477. Now change/commit ids are resolved at the same
precedence, which means there are at least three types of ambiguity.
I don't think we would need to discriminate these.
Because the use of the change id is recommended, any operation should abort
if a valid change id happens to match a commit id. We still try the commit
id lookup first as the change id lookup is more costly.
Ambiguous change/commit id is reported as AmbiguousCommitIdPrefix for now.
Maybe we can merge AmbiguousCommit/ChangeIdPrefix errors into one?
Closes#799
The CLI will load aliases from config, insert them one by one, and warn if
declaration part is invalid. That's why RevsetAliasesMap is a public struct
and needs to be instantiated by the caller.
To reduce conflicts between branches like `main` and `main/sub`, it's
better to first delete refs in git that have been deleted in jj, and
then add/update refs that have been added/updated in jj.
Since we now write a (partial) view object of the exported branches to
disk (since 7904474320), we can safely skip exporting some
branches. We already skip conflicted branches. This commit makes us
also skip branches that we fail to write to the backing Git repo,
instead of failing the whole operation (after possibly updating some
Git refs).
I made the `export_refs()` function return the branches that
failed. We should probably make that a struct later and have a
separate field for branches that we skipped due to conflicts.
Closes#493.
This adds a test for attempting to export both a branch called `main`
and one called `main/sub` (#493), as well as for exporting a branch
with an empty string as name (reported directly to me by @lkorinth).
The expression 'x ~ empty()' is identical to 'x & file(".")', but more
intuitive.
Note that 'x ~ empty()' is slower than 'x & file(".")' since the negative
intersection isn't optimized right now. I think that can be handled as
follows: 'x ~ filter(f)' -> 'x & filter(!f)' -> 'filter(!f, x)'
There are no "non-normal" files, so "normal" is not needed. We have
symlinks and conflicts, but they are not files, so I think just "file"
is unambiguous.
I left `testutils::write_normal_file()` because there it's used to
mean "not executable file" (there's also a `write_executable_file()`).
I left `working_copy::FileType::Normal` since renaming `Normal` there
to `File` would also suggest we should rename `FileType`, and I don't
know what would be a better name for that type.
We currently get the hostname and username from the `whoami` crate. We
do that in lib crate, without giving the caller a way to override
them. That seems wrong since it might be used in a server and
performing operations on behalf of some other user. This commit makes
the hostname and username configurable, so the calling crate can pass
them in. If they have not been passed in, we still default to the
values from the `whoami` crate.
When we export branches to Git, we didn't update our own record of
Git's refs. This frequently led to spurious conflicts in these refs
(e.g. #463). This is typically what happened:
1. Import a branch pointing to commit A from Git
2. Modify the branch in jj to point to commit B
3. Export the branch to Git
4. Update the branch in Git to point to commit C
5. Import refs from Git
In step 3, we forgot to update our record of the branch in the repo
view's `git_refs` field. That led to the import in step 5 to think
that the branch moved from A to C in Git, which conflicts with the
internal branch target of B.
This commit fixes the bug by updating the refs in the `MutableRepo`.
Closes#463.
As I said in the previous patch, I don't know why I made the initial
export to Git a no-op. Exporting everything makes more sense to
(current-)me. It will make it slightly easier to skip exporting
conflicted branches (#463). It also lets us remove a `jj export` call
from `test_templater.rs`.
To fix#463, I think we want to skip conflicted branches when we
export instead of erroring out. It seems we didn't have test case for
the current behavior, so let's add one.
This is a test case for #463. It's not exactly the same case, but I'm
confident that the root cause is the same (that the
`.jj/repo/git_export_operation_id` doesn't include the git refs we
just updated).
These calls often appear in expressions long enough that not having to
qualify it means that we can sometimes avoid wrapping a line. I
noticed because IntelliJ told me that `test_git.rs` had some
unnecessary qualificiations (the function was already imported there).
The `testutils` module should ideally not be part of the library
dependencies. Since they're used by the integration tests (and the CLI
tests), we need to move them to a separate crate to achieve that.
If you remove all refs from the backing Git repo and then run `jj git
import`, we would see that all commits disappeared from the Git repo,
so we would remove them from the jj repo too. However, we do that by
doing a history walk from old heads to the new heads, which includes
the root commit when the new heads is an empty set. That means that we
mark the root commit as abandoned, which led to a crash in
`rewrite.rs` (when we try pick the root commit's first parent to use
as parent for rebased commits).
I was trying to create a reproduction script for #412, but the script
ran into another bug first. The script removed all the local and
remote branches from the backing Git repo. I noticed that we would
then try to abandon all commits. We should still count Git HEAD's
target as visible and not try to abandon it. This patch fixes that.
Since 'merges()' just filters the candidates set per item, it doesn't need
a candidates argument. Perhaps, 'merges(x)' could be a predicate to select
merge commits within a subgraph 'x', but I don't know if that would be
useful.
Since d56ae79d3f, `WorkingCopy` no longer reads `.gitignores`
directly from `$HOME/.gitignore`, so we don't need the workaround to
prevent it in the tests.
More workspace-derived parameters will be added, and I don't think wrapping
with Option for each makes sense because all parameters should be available
if workspace exists.
Let WorkspaceCommandHelper clone it. WorkspaceCommandHelper could return
workspace_id by reference, but doing that would introduce noisy .clone()
calls and lifetime mess.
This changes `RepoLoader` to take a map of functions that load a
specific type of backend, keyed by the backend type. The backend type
is read from `.jj/repo/store/backend`.
The `ReadonlyRepo::init_*()` functions were unused or used only in
tests. Let's remove them, thereby making the repo less aware of
specific backend implementations.
In many of these places, we don't need an owned value, so using a
reference means we don't force the caller to clone the value. I really
doubt it will have any noticeable impact on performance (I think these
are all once-per-repo paths); it's just a little simpler this way.
This moves the logic for handling the root commit when writing commits
from `CommitBuilder` into the individual backends. It always bothered
me a bit that the `commit::Commit` wrapper had a different idea of the
number of parents than the wrapped `backend::Commit` had.
With this change, the `LocalBackend` will now write the root commit in
the list of parents if it's there in the argument to
`write_commit()`. Note that root commit itself won't be written. The
main argument for not writing it is that we can then keep the fake
all-zeros hash for it. One argument for writing it, if we were to do
so, is that it would make the set of written objects consistent, so
any future processing of them (such as GC) doesn't have to know to
ignore the root commit in the list of parents.
We still treat the two backends the same, so the user won't be allowed
to create merges including the root commit even when using the
`LocalBackend`.
I feel the original -------/+++++++ pair is slightly confusing because
each half can be a separator by itself. I don't know what character other
than '-'/'+' is preferred, but let's pick '%' (for "mod") per @martinvonz
suggestion.
`wc_commit` seems clearer than `checkout` and not too much longer. I
considered `working_copy` but it was less clear (could be the path to
the working copy, or an instance of `WorkingCopy`). I also considered
`working_copy_commit`, but that seems a bit too long.
This will be a basic building block of 'jj log PATH'. The implementation
is naive, but works fine for small repos like jj. For mid-size repos,
there would be various areas which need to be optimized.
Otherwise a file could be created out of the working copy directory.
This only works for untracked symlinks and sequentially "added" symlinks
and files. For "removed" and "modified" entries, the parent directories are
considered valid and fs::remove_file() will be called. This also doesn't
prevent race conditions caused by concurrent checkouts.
New create_parent_dirs() would be slightly slower than the original because
it traverses directories from the root whereas fs::create_dir_all() does that
from the leaf and exits when reached to a directory.
One advantage of our conflict marker style (compared to the usual
3-way markers) is that they provide the user with the diff between the
base and one side so the user doesn't have to do that in their head
(which is how I use 3-way markers anyway). However, since we currently
always use the "first" side for the diff, that diff can be larger than
if we had picked the other side, which makes the marker style worse
than the usual 3-way markers. This has bothered me many times and it's
about time we fix it.
The `CommitBuilder::store` field is used only in
`CommitBuilder::write_to_repo()`, but we can easily get access to the
`Store` from the `repo` argument there, so let's remove the field.
When rebasing commits after rewrites, we also update all workspaces'
checkouts. If the new commit is closed, we create a new commit on
top. Since we're hoping to remove the open/closed concept, we need a
new condition. I considered creating a new commit on top if the change
ID was different from before the rewrite. However, that would make at
least `jj split` more complicated because it makes the first commit
keep the change ID but it wants the second commit to be checked
out. This patch instead creates the new commit on top only when the
original commit was abandoned.
This patch makes us treat special files (e.g. Unix sockets) as absent
when snapshotting the working copy. We can consider later reporting
such files back to the caller (possibly via callback) so it can inform
the user about them.
Closes#258
If a commit's author field has the placeholder user/email values
(i.e. "(no name configured)" and "(no email configured)"), and they
have now configured their email and username, they probably want us to
update the author field with the new information, so that's what this
patch does. Thanks to durin42@ for the suggestion on #322.
We don't even have any settings that affect the repo, so there's no
point in passing the settings. I think this was a leftover from before
we separated out the "workspace" concept; now we no longer create a
working-copy commit when we initialize a repo (we do that when we
attach the workspace).
The request to show the log output with more recent commits at the
bottom comes up once in a while (among Mercurial users, and now also
for jj from @arxanas). It's pretty easy to implement by adding an
adapter to the current `RevsetGraphIterator`. It works by first
collecting all nodes and edges into a vector and then yielding them in
reverse order and with reversed edges. That means it's no longer lazy,
but that seems fine since the feature is optional. Also, it's only the
subset of nodes that are in the selected revset that will be
collected.
Making the CLI use the new iterator adapter will come in a later
patch.
I think I copied the name `write_tree()` from Git, but I find it quite
confusing, since it's not clear if it write a tree to the working copy
or reads the working copy and writes a tree to the store (it's the
former).
Now that I'm using GitHub PRs instead of pushing directly to the main
branch, it's quite annoying to have to abandon the old commits after
GitHub rebases them. This patch makes it so we compare the remote's
previous heads to the new heads and abandons any commits that were
removed on the remote. As usual, that means that descendants get
rebased onto the closest remaining commit.
This is half of #241. The other half is to detect rewritten branches
and rebase on top.
Let's say we have a simple history like this:
```
B C D
\|/
A
```
Branch `main` initially points to commit B. Two concurrent operations
then move the branch to commits C and D. When the two concurrent
operations get merged, the branch will be recorded as pointing to
"C+D-B". If a subsequent operation now abandons commit B, we would
update the "removed" side of the branch conflict. That seems a little
dishonest. I think the reason I did it that way was in order to not
keep B visible back when having it present in the "removed" side would
keep it visible (before 33bf6ce1d5).
I noticed this issue while working on #241 because
`test_import_refs_reimport()` started failing. That test case is
pretty much exactly the case above.
This patch makes room for sparse patterns in the `TreeState` proto
message. We also start setting that value to a list of just the
pattern `.` when we create new working copies. Old working copies
without the sparse patterns are also interpreted as having that single
pattern. Note that this absence of sparse patterns is different from a
present list of no patterns. The latter is a valid state and means
that no paths are included in the sparse checkout.
The `DescendantRebaser` keeps a map of branches from the source
commit, so it gets efficient lookup of branches to update when a
commit has been rebased. This map was not kept up to date as we
rebased. That could lead to branches getting left on hidden
intermediate commits. Specifically, if a commit with a branch was
rewritten by some command, and an ancestor of it was also rewritten,
then we'd only update the branch only the first step and not update it
again when rebasing onto the rewritten ancestor.
When a directory is missing in one merge input (base or one side), we
would consider that a merge conflict. This patch changes that so we
instead merge trees by treating the missing tree as empty.
This introduces a `connected(x)` function, which is simply the same as
`x:x`. It's occasionally useful if `x` is a long expression. It's also
useful as a building block for `root(x)` (coming soon).
We do it for all the other kinds of objects already. It's useful to
have the path for backends that store objects by path (we don't have
any such backends yet). I think the reason I didn't do it from the
beginning was because we had separate `RepoPath` types for files and
directories back then.
We depend on comparing the workspace root with the Git repo's path to
know if we're sharing the working copy with it. For that to work
reliably, we need the paths to be canonicalized, so that's what this
patch tries to do.
There was a TODO about adding a test case for a delete/modify conflict
in a branch target that got resolved by abandoning a commit. The
resolution is to delete the branch. That case couldn't happend with
our old evolution-based mechanism for tracking rewrites (because we
couldn't un-prune a commit then).
If we have recorded in `MutableRepo` that commits have been abandoned
or rewritten, we should always rebase descendants before committing
the transaction (otherwise there's no reason to record the
rewrites). That's not much of a risk in the CLI because we already
have that logic in a central place there (`finish_transaction()`), but
other users of the library crate could easily miss it. Perhaps we
should automatically do any necessary rebasing we commit the
transaction in the library crate instead, but for now let's just have
a check for that to catch such bugs.
It's unusual for the current commit to have descendants, but it can
happen. In particular, it can easily happen when you run `jj new`. You
probably don't want to abandon it in those cases.
The library crate shouldn't look up the user's `$HOME` directory
(maybe the library is used by a server process), so let's have the
caller pass it into the library crate instead.
We no longer need the commit ID, so we shouldn't make the callers pass
it. This lets us simplify several tests, because they no longer to
create commits just to check out a tree in the working copy.
We used to use the value to detect races, but we use the tree ID and
the operation ID these days, so we don't need the commit ID.
By changing this, we can avoid creating some commit IDs in tests,
which is why I tackled this issue now.
There are only two callers of `LockedWorkingCopy::check_out()`. One is
in `commands.rs`. That caller already checks after taking the lock
that the old commit ID is as expected. The other caller is
`WorkingCopy::check_out()`. We can simply move the check to that level
since it's the only caller that cares now.
We resolve checkouts in favor of the first-committed operation (which
is more likely to have managed to update the working copy). The test
case has been flaky on GitHub lately. I've run it 1000 times on my
machine without failure. I don't know if GitHub's machines are just
faster in some way (SSD, maybe) that makes them finish the two
operations in the test in the same millisecond. Let's add a
1-millisecond sleep to see if that helps. If it doesn't, then maybe
the issue is that the clock has lower precision (or their clocks can
go backwards?).
`LockedWorkingCopy::discard()` shouldn't result in changes to the
on-disk state, but `LockedWorkingCopy::check_out()` may have already
written a state file, which is surprising. The changes also remain in
memory, which is also surprising. Let's fix both of those issues.
One of the .gitignore tests writes a tree from the working copy
twice. However, it discards the `LockedWorkingCopy` instance after the
first write, so the second write shouldn't really see the changes from
the first write. It does see them because we don't clear them in
memory (and we also surprisingly write them to disk). I'm about to fix
that, so the test needs to be fixed first.