This partially reverts the change in 30fb7995c2 "view: make local/remote
branches iterator yield RemoteRef instead of RefTarget." As I'm going to add
a diff function for RemoteRef pairs, we'll need a generic version of the
merge-join iterator anyway.
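For illustration, a generic merge-join over two key-sorted iterators could be
built on itertools (a sketch, not necessarily how it will be implemented):

```rust
use itertools::{EitherOrBoth, Itertools as _};

// Sketch: merge-join two iterators sorted by key, yielding the key and the
// optional left/right values. Works for (name, RefTarget) pairs as well as
// (name, RemoteRef) pairs.
fn merge_join_by_key<K: Ord, L, R>(
    left: impl IntoIterator<Item = (K, L)>,
    right: impl IntoIterator<Item = (K, R)>,
) -> impl Iterator<Item = (K, Option<L>, Option<R>)> {
    left.into_iter()
        .merge_join_by(right, |(k1, _), (k2, _)| k1.cmp(k2))
        .map(|entry| match entry {
            EitherOrBoth::Both((k, l), (_, r)) => (k, Some(l), Some(r)),
            EitherOrBoth::Left((k, l)) => (k, Some(l), None),
            EitherOrBoth::Right((k, r)) => (k, None, Some(r)),
        })
}
```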
We need to let async-ness propagate up from the backend because
`block_on()` doesn't like to be called recursively. The conflict
materialization code is a good place to make async because it doesn't
depend on anything that isn't already async-ready.
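A contrived sketch of the recursion problem (`futures::executor::block_on()`
detects reentrance and panics):

```rust
use futures::executor::block_on;

async fn read_object() -> u32 {
    42
}

fn main() {
    // Fine: one block_on() at the outermost sync/async boundary.
    assert_eq!(block_on(read_object()), 42);

    // Panics at runtime: block_on() refuses to run from within another
    // executor, so any sync wrapper called from async code must itself
    // become async instead.
    // block_on(async { block_on(read_object()) });
}
```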
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
We can provide a more actionable error message than "not fast-forwardable". If
the push was fast-forwardable, "jj branch track" should be able to merge the
remote branch without conflicts, so the added step would be minimal.
Although this is logically correct, the error message is a bit cryptic. It's
probably better to reject the push if non-tracking remote branches exist.
#1136
We'll use remote_ref.tracking_target() to classify the push action, but not all
callers of local_remote_branches() need tracking_target() instead of target.
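Roughly, the classification helper behaves like this (a sketch with stand-in
types; the real definitions differ in detail):

```rust
// Stand-in types; the real RefTarget/RemoteRef are richer than this.
struct RefTarget();

impl RefTarget {
    fn absent_ref() -> &'static RefTarget {
        static ABSENT: RefTarget = RefTarget();
        &ABSENT
    }
}

struct RemoteRef {
    target: RefTarget,
    tracking: bool, // stand-in for the tracking state
}

impl RemoteRef {
    fn is_tracking(&self) -> bool {
        self.tracking
    }

    // The target used when classifying push actions: a non-tracking remote
    // ref contributes no target, as if the remote branch were absent.
    fn tracking_target(&self) -> &RefTarget {
        if self.is_tracking() {
            &self.target
        } else {
            RefTarget::absent_ref()
        }
    }
}
```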
This means that the commits previously pinned by remote branches are no longer
abandoned. I think that's more correct since "push" is the operation to
propagate the local view to the remote, and uninteresting commits should have been
locally abandoned.
Since I'm going to make git::push_branches() update the repo view internally,
it should fail fast if the remote name is reserved. Before, the problem was
detected in git::import_refs().
Since pushed remote branches will share common base targets with their local
counterparts,
these branches should be marked as tracking. git::push_branches() will handle
that. It looks ugly that the public GitBranchPushTargets type keeps "force"-d
branches as a separate set, but we'll need to rework that anyway when we
implement --force-with-lease behavior. So let's leave it for now.
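For reference, the shape described above is roughly this (field names and
stand-in types are approximate):

```rust
use std::collections::HashSet;

type CommitId = Vec<u8>; // stand-in for the real commit id type

// Stand-in for the per-branch old/new target pair.
pub struct BranchPushUpdate {
    pub old_target: Option<CommitId>,
    pub new_target: Option<CommitId>,
}

// The public type as described above: "force"-pushed branches are kept in a
// separate set until the --force-with-lease rework.
pub struct GitBranchPushTargets {
    pub branch_updates: Vec<(String, BranchPushUpdate)>,
    pub force_pushed_branches: HashSet<String>,
}
```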
Some of the git::push_updates() tests have been migrated to the new function.
I left a couple of basic tests for git::push_updates() because push_updates()
will be used to implement a low-level "jj git push-refs" command.
I made import_refs() not preserve commits referenced by remote branches at
520f692a46 "git: on import_refs(), don't preserve old branches referenced by
remote refs." The idea is that remote branches are weak, and commits referenced
by these refs can be freely rewritten by future local changes without moving
the refs. I don't think that's wrong, but 520f692a46 also made "new" remote
changes be abandoned by old remote refs. This problem occurs only when
git.auto-local-branch is off.
I think there are two ways to fix the problem:
a. pin non-tracking remote branches just like local refs
b. pin newly fetched refs in addition to local refs
This patch implements (b) because it's simpler and more obvious that the
fetched commits would never be abandoned immediately.
This adds support for custom `jj` binaries to use custom working-copy
backends. It works in the same way as with the other backends, i.e. we
write a `.jj/working_copy/type` file when the working copy is
initialized, and then we let that file control which implementation to
use (see previous commit).
I included an example of a (useless) working-copy implementation. I
hope we can figure out a way to test the examples some day.
This makes `Workspace::load()` look for a new `.jj/working_copy/type` file
in order to load the right working copy implementation, just like
`Repo::load()` picks the right backends based on `.jj/store/type`,
`.jj/op_store/type`, etc. We don't write the file yet, and we don't
have a way of adding alternative working copy implementations, so it
will always be `LocalWorkingCopy` for now.
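A minimal sketch of the type-file lookup (the function name and error handling
here are illustrative, not the exact API):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Read .jj/working_copy/type. Repos created before the type file existed
// fall back to the local-disk implementation ("local" is assumed here).
fn working_copy_type(working_copy_path: &Path) -> io::Result<String> {
    match fs::read_to_string(working_copy_path.join("type")) {
        Ok(s) => Ok(s.trim().to_string()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok("local".to_string()),
        Err(e) => Err(e),
    }
}
```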
Our internal working copy implementations at Google will need the
commit so they can walk history backwards until they get to a "public"
commit. They'll then use that to tell build tools and virtual file
systems to present that as a base.
I'm not sure if we'll need to update `reset()` too. It's currently
only used by `jj untrack`, which doesn't change the commit's parent,
so it wouldn't affect any history walks.
`ReadonlyRepo::init()` takes callbacks for initializing each kind of
backend. We called these things like `op_store_initializer`. I found
that confusing because it is not an `OpStoreFactory` (which is for
loading an existing backend). This patch tries to clarify that by
renaming the arguments and adding types for each kind of callback
function.
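For illustration, the distinction could be captured with dedicated types along
these lines (stand-in names; the real aliases differ in detail):

```rust
use std::path::Path;

// Stand-ins for the real backend and error types.
trait OpStore {}
struct BackendInitError;
struct BackendLoadError;

// An *initializer* creates a brand-new backend at the given path
// (used by ReadonlyRepo::init()).
type OpStoreInitializer = dyn Fn(&Path) -> Result<Box<dyn OpStore>, BackendInitError>;

// A *factory* loads an already-initialized backend (used when opening a repo).
type OpStoreFactory = dyn Fn(&Path) -> Result<Box<dyn OpStore>, BackendLoadError>;
```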
This patch adds MutableRepo::track_remote_branch() as we'll probably need to
track the default branch on "jj git clone". untrack_remote_branch() is also
added for consistency.
We could instead migrate the storage types to (local_branches, remote_views),
but that would be more involved and break forward compatibility with little
benefit. Maybe we can do that later when we introduce remote tags.
The state field isn't saved yet. git import/export code paths are migrated,
but the new tracking state is always calculated based on the
git.auto-local-branch setting. So the tracking state is effectively a global flag.
As we don't know whether the existing remote branches have been merged into
local branches, we assume that remote branches are "tracking" so long as the
local counterparts exist. This means an existing locally-deleted branch won't
be pushed without re-tracking it. I think it's rare to leave locally-deleted
branches around for long. For a "git.auto-local-branch = false" setup, users
might have to untrack branches if they've manually "merged" remote branches and
want to continue that workflow. I considered using the git.auto-local-branch
setting in the migration path, but I don't think that would give a better
result. The setting may be toggled after the branches got merged, and I'm
planning to change its default to off for better Git interop.
Implementation-wise, the state enum could be a simple bool. It's an enum just
because I originally considered packing the "forgotten" concept into it. I have
no idea which will be better for future extension.
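Concretely, the state might look like this (a sketch; the real definition may
differ):

```rust
/// Whether the remote branch is merged into the local counterpart.
/// A bool would do today; the enum leaves room for future variants such as
/// a possible "forgotten" state.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum RemoteRefState {
    /// Not merged into the local branch (not tracked).
    New,
    /// Merged into (tracked by) the local branch.
    Tracking,
}
```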
It's going to be easier to define a `LockedWorkingCopy` trait if it
doesn't need to borrow from `WorkingCopy`, so let's remove the
reference we currently have and have
`LockedLocalWorkingCopy::finish()` return the new `LocalWorkingCopy`
instead.
I think the main disadvantage is that we now have to remember to
replace the old `LocalWorkingCopy` instance by the new one, whereas
the compiler would remind us before this commit. We could make
`start_modification()` take an owned `self`, but that would be a bit
annoying to work with when we have the instance stored in a field.
I'm about to make `LockedLocalWorkingCopy` not borrow from
`LocalWorkingCopy`. That will make it easier to forget to update any
`LocalWorkingCopy` variables when the modifications have been
committed. This patch introduces a wrapper around
`LockedLocalWorkingCopy` to help prevent that.
Thanks to Yuya for the suggestion.
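The wrapper might look roughly like this (all names here are hypothetical
stand-ins):

```rust
// Hypothetical stand-ins for the types discussed above.
struct LocalWorkingCopy;
struct LockedLocalWorkingCopy;
struct OperationId;
struct Workspace {
    working_copy: LocalWorkingCopy,
}

impl LockedLocalWorkingCopy {
    fn finish(self, _operation_id: OperationId) -> LocalWorkingCopy {
        LocalWorkingCopy
    }
}

// The wrapper owns the lock and, on finish(), stores the fresh
// LocalWorkingCopy back into the workspace, so a stale instance can't
// accidentally be kept around.
struct LockedWorkspaceWorkingCopy<'a> {
    workspace: &'a mut Workspace,
    locked_wc: LockedLocalWorkingCopy,
}

impl LockedWorkspaceWorkingCopy<'_> {
    fn finish(self, operation_id: OperationId) {
        self.workspace.working_copy = self.locked_wc.finish(operation_id);
    }
}
```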
`LocalWorkingCopy::check_out()` can be expressed using the planned
`WorkingCopy` trait, so it doesn't need to be in the `WorkingCopy` trait
itself. I wasn't sure if I should make it a free function in
`working_copy`, but I ended up moving it onto `Workspace`.
Since set_remote_branch_target() is called while merging refs, its tracking
state shouldn't be reinitialized. The other callers are migrated to a new
setter to keep the story simple.
This isn't important, but I'm going to change remote_targets to store RemoteRef
instead of RefTarget, so I went ahead and changed the other field types as well.
We could fix do_git_clone() instead, but it seemed a bit weird that the
git_repo_path is relative to the store path which is unknown to callers.
Fixes #2374
There's a subtle behavior change. Unlike the original remove_remote_branch(),
the remote_views entry is not discarded when the branches map becomes empty.
The reasoning here is that the remote view can be added/removed when the remote
is added/removed respectively, though that's not implemented yet. Since the
serialized data cannot represent an empty remote, such a view may generate a
non-unique content hash.
These functions depend heavily on the underlying data structure, and I haven't
decided on an abstract View API for accessing per-remote data types. Let's use
the underlying data type for now.
I'm planning to add support for untracked remote branches, and under that
model, there will be many remote branches without local counterparts. That's
the main reason why remote branches are grouped by remote, not by branch name.
The added helper functions will be used by simple_op_store and view.
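The per-remote grouping means the view data takes roughly this shape (a sketch
with stand-in types, not the exact storage format):

```rust
use std::collections::BTreeMap;

// Stand-ins for the real RefTarget/RemoteRef types.
struct RefTarget();
struct RemoteRef {
    target: RefTarget,
    // tracking state etc.
}

// Branches grouped per remote, so a remote branch can exist without any
// local counterpart.
struct RemoteView {
    branches: BTreeMap<String, RemoteRef>, // branch name -> remote ref
}

struct View {
    local_branches: BTreeMap<String, RefTarget>, // branch name -> target
    remote_views: BTreeMap<String, RemoteView>,  // remote name -> its view
}
```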
get_branch() would need to reconstruct the remote_targets map if we migrate
the underlying data structure to per-remote views. Let's remove the method as
it is only used in tests.
It seems pretty clear from the context. Turns out we only use the
function in a test case. Maybe we don't even need it. It's easy to
provide it, though.
The `TreeStateError` type is specific to the current local-disk
working-copy backend, so it should not be part of the generic
working-copy interface I'm trying to create.
I think some of the error variants in `CheckoutError` are too
specific to the local-disk implementation. Let's merge them and make
them less specific, so it's easier to define a reasonable trait for
the working copy.
As I'm going to split branches into a per-remote map, .get_branch(name) will
need to gather remote branches by name to construct the remote_targets map.
Let's instead
iterate local/remote branches separately. I also migrated diffing of the other
kinds of refs to filter out unchanged entries upfront.
As we now diff incoming git refs against our known remote branches, the problem
described in the comment no longer occurs. test_branch_forget_fetched_branch()
passes, and the inline comments in the test are still valid.
As we need to build a set of all branch names anyway, we can also put old/new
targets there. InvalidGitName is moved to the caller since the diff function no
longer converts RefName to a "refs/" string.
The idea is that the "remote" refs could have been "op restore"-d whereas
view.git_refs() will never be. The next commit will update known_remote_refs
to be constructed from the current remote branches.
Instead of building these lists in a single loop, we could load new git_refs
to the view first, and then build diffs of the remote refs. I considered that,
but I feel it would be a bit awkward to update refs before importing commits
to the view.
The "remote" refs are stored in BTreeMap since merging order should be stable.
As I'm going to add separate lists of changed git_refs/remote_refs, it'll
become a bit unclear which one we should check for reserved remotes. The
diff might also be reorganized as a list of (remote, name, kind, old_target,
new_target) where remote == "git" means the git-tracking branch. In this
data structure, the notion of reserved remote name would be lost.
I'm going to rewrite `TreeDiffIterator` to fetch one level (depth) of
the tree at a time and concurrently. One step towards that is to
convert the iterator to a `Stream`. I'd like to do that by making the
current `Iterator` implementation call the new `Stream`
implementation. However, we can't call `futures::executor::block_on()`
on a future that itself calls `futures::executor::block_on()` (as
`Store::read_tree()` does), so the first step is to bubble up the
async-ness a bit. This patch does that by fetching both sides of the
diff concurrently. That should give close to a 2x speedup on
high-latency backends. (It doesn't help with our backend at Google,
however, because we have a daemon process that does some speculative
prefetching that usually downloads the child trees anyway.)
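The concurrent fetch amounts to something like this (assuming a hypothetical
async `read_tree_async()` method on `Store`):

```rust
// Sketch: read both sides of the diff concurrently rather than one after
// the other. read_tree_async() is a hypothetical async method on Store.
async fn read_both_trees(
    store: &Store,
    dir: &RepoPath,
    id1: &TreeId,
    id2: &TreeId,
) -> BackendResult<(Tree, Tree)> {
    futures::try_join!(
        store.read_tree_async(dir, id1),
        store.read_tree_async(dir, id2),
    )
}
```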
`futures::stream::Stream::collect()` requires a collection that
implements `Default` and `Extend`, and I would like to be able to
collect a stream of trees.
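For illustration, any builder implementing those two traits can be the target
of `collect()` (a self-contained toy, not the real tree type):

```rust
use futures::executor::block_on;
use futures::stream::{self, StreamExt as _};

// Toy builder standing in for a collection of trees.
struct Builder<T> {
    values: Vec<T>,
}

impl<T> Default for Builder<T> {
    fn default() -> Self {
        Builder { values: Vec::new() }
    }
}

impl<T> Extend<T> for Builder<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        self.values.extend(iter);
    }
}

fn main() {
    // Stream::collect() only needs Default + Extend on the target type.
    let builder: Builder<u32> = block_on(stream::iter([1, 2, 3]).collect());
    assert_eq!(builder.values, vec![1, 2, 3]);
}
```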
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
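So the read side of the backend trait becomes async while writes stay
synchronous, roughly like this (using `async_trait`; signatures are
approximate and the types are stand-ins):

```rust
use async_trait::async_trait;

// Sketch against stand-in types (RepoPath, TreeId, Tree, etc.).
#[async_trait]
pub trait Backend: Send + Sync {
    // Reads are async so callers can issue many of them concurrently.
    async fn read_tree(&self, path: &RepoPath, id: &TreeId) -> BackendResult<Tree>;
    async fn read_commit(&self, id: &CommitId) -> BackendResult<Commit>;

    // Writes stay synchronous: with a write-behind daemon, they return as
    // soon as the object has been hashed locally.
    fn write_tree(&self, path: &RepoPath, contents: &Tree) -> BackendResult<TreeId>;
    fn write_commit(&self, contents: &Commit) -> BackendResult<CommitId>;
}
```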
I've tried to measure the performance impact. The largest difference I've been
able to measure was on `jj diff --ignore-working-copy -s --from v5.0 --to
v6.0` in the Linux repo, where the time increased from 749 ms to 773 ms
(3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
This effectively undoes d8a313cdd4, which is no longer needed since
we just changed that error handling. It should make it easier to share
some of the current if/else blocks.
Before this patch, when updating to a commit that has a file that's
currently an ignored file on disk, jj would crash. After this patch,
we instead leave the conflicting files or directories on disk. We
print a helpful message about how to inspect the differences between
the intended working copy and the actual working copy, and how to
discard the unintended changes.
Closes #976.
I'm about to add handling of parent dirs that are existing ignored
files, so it's better to have it in one place. The only functional
difference should be that we now create parent directories for git
submodules. I don't think that matters.
It's about time we make the working copy a pluggable backend like we
have for the other storage. We will use it at Google for at least two
reasons:
* To support our virtual file system. That will be a completely
separate working copy backend, which will interact with the virtual
file system to update and snapshot the working copy.
* On local disk, we need to tell our build system where to find the
paths that are not in the sparse patterns. We plan to do that by
wrapping the standard local working copy backend (the one moved in
this commit), writing a symlink that points to the mainline commit
where the "background" files can be read from.
Let's start by renaming the existing implementation to
`local_working_copy`.
I've added a boolean flag to the store to ensure that the migration never runs
more than once after the view gets "op restore"-d. I'll probably reorganize the
branches structure to support non-tracking branches later, but updating the
storage format in a single commit would be too involved.
If jj is downgraded, these "git" remote refs would be exported to the Git repo.
Users might have to remove them manually.
I'm going to migrate "refs/heads/" branches to .remote_targets["git"]. This
commit will simplify the story as we won't have to exclude "refs/remotes/git/"
refs when diffing or renaming/removing a remote.
Since both has_id() and resolve_prefix() do binary search, their costs are
practically the same. I think has_id() would complete with fewer ops, but such
a level of optimization wouldn't be needed here. More importantly, this ensures
that unreachable commits aren't imported by GitBackend::read_commit().
One problematic scenario is that we have commits imported by old jj, and all
of their descendant commits are created by jj. Therefore import_head_commits()
wouldn't reach the old ancestor commits.
This change might bury a real bug, but I don't have a better alternative. Maybe
we can remove this hack after a couple of jj releases, and add a debug command
that imports all reachable Git commits from all historical heads.
Closes #2343
As we can set HEAD to an arbitrary ref by using .reference_symbolic(), we don't
have to manage a ref that can also be valid as a branch name.
Fixes #1495
I'll add a workaround for the root parent issue #1495 there. We can pass in
the wc parent id instead of the wc_commit object, but we might want to use
wc_commit.id() to generate a unique placeholder ref name.
While debugging git issues, I often ended up creating a deadlock by adding
debug prints. It's also not obvious that git::export_refs() works even when the
git_repo() has already been locked, whereas git::import_refs() wouldn't. Let's
consolidate lock handling in the backend implementation.
I think most users who change the set of immutable heads away from
`trunk() | tags()` are going to also want to change the default log
revset to include the newly mutable commit and to exclude the newly
immutable commits. So let's update the default log revset to use
`immutable_heads()` instead.
`test_templater` changed because we have overridden the set of
immutable commits there, so `jj log` now includes the remote branch.