There's a subtle behavior change. Unlike the original remove_remote_branch(),
remote_views entry is not discarded when the branches map becomes empty. The
reasoning here is that the remote view can be added/removed when the remote
is added/removed respectively, though that's not implemented yet. Since the
serialized data cannot represent an empty remote, such view may generate
non-unique content hash.
These functions depend heavily on the underlying data structure, and I haven't
decided abstract View API to access to per-remote data types. Let's use the
underlying data type for now.
I'm planning to add support for untracked remote branches, and under that
model, there will be many remote branches without local counterparts. That's
the main reason why remote branches are grouped by remote, not by branch name.
The added helper functions will be used by simple_op_store and view.
Summary: This allows `workspace forget` to forget multiple workspaces in a
single action; it now behaves more consistently with other verbs like `abandon`
which can take multiple revisions at one time.
There's some hoop-jumping involved to ensure the oplog transaction description
looks nice, but as they say: small conveniences cost a lot.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Id91da269f87b145010c870b7dc043748
get_branch() would need to reconstruct the remote_targets map if we migrate
the underlying data structure to per-remote views. Let's remove the method as
it is only used in tests.
It seems pretty clear from the context. Turns out we only use the
function in a test case. Maybe we don't even need it. It's easy to
provide it, though.
The `TreeStateError` type is specific to the current local-disk
working-copy backend, so it should not be part of the generic
working-copy interface I'm trying to create.
I think some of the errors variants in `CheckoutError` are too
specific to the local-disk implementation. Let's merge them and make
them less specific, so it's easier to define a reasonable trait for
the working copy.
Summary: Workspaces are most useful to test different versions (commits) of
the tree within the same repository, but in many cases you want to check out a
specific commit within a workspace.
Make that trivial with a `--revision` option which will be used as the basis
for the new workspace. If no `-r` option is given, then the previous behavior
applies: the workspace is created with a working copy commit created on top of
the current working copy commit's parent.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I23549efe29bc23fb9f75437b6023c237
Before this patch, it was an error to run `jj config set --user foo
'[1]'` twice. But it's only been broken since the previous commit
because '[1]' was interpreted as a string before then.
As I'm going to split branches into per-remote map, .get_branch(name) will need
to gather remote branches by name to construct remote_targets map. Let's instead
iterate local/remote branches separately. I also migrated diffing of the other
kinds of refs to filter out unchanged entries upfront.
Now we have a separate map for "git" tracking remote, we can always preserve
the last imported/exported git_refs. The option to restore git-tracking refs
has been removed. Perhaps, --what can be reorganized as --local and --remote
<NAME>.
As we now diff incoming git refs against our known remote branches, the problem
described in the comment no longer occurs. test_branch_forget_fetched_branch()
passes, and the inline comments in the test are still valid.
As we need to build a set of all branch names anyway, we can also put old/new
targets there. InvalidGitName is moved to caller since the diff function no
longer converts RefName to "refs/" string.
The idea is that the "remote" refs could have been "op restore"-d whereas
view.git_refs() will never be. The next commit will update known_remote_refs
to be constructed from the current remote branches.
Instead of building these lists in a single loop, we could load new git_refs
to the view first, and then build diffs of the remote refs. I considered that,
but I feel it would be a bit awkward to update refs before importing commits
to the view.
The "remote" refs are stored in BTreeMap since merging order should be stable.
As I'm going to add separate lists of changed git_refs/remote_refs, it'll
become a bit unclear which one we should check for reserved remotes. The
diff might also be reorganized as a list of (remote, name, kind, old_target,
new_target) where remote == "git" means the git-tracking branch. In this
data structure, the notion of reserved remote name would be lost.
I'm going to rewrite `TreeDiffIterator` to fetch one level (depth) of
the tree at a time and concurrently. One step towards that is to
convert the iterator to a `Stream`. I'd like to do that by making the
current `Iterator` implementation call the new `Stream`
implementation. However, we can't call `futures::executor::block_on()`
on a future that itself calls `futures::executor::block_on()` (as
`Store::read_tree()` does), so the first step is to bubble up the
async-ness a bit. This patch does that by fetching both sides of the
diff concurrently. That should give close to a 2x speedup on
high-latency backends. (It doesn't help with our backend at Google,
however, because we have a daemon process that does some speculative
prefetching that usually downloads the child trees anyway.)
`futures::stream::Stream::collect()` requires a collection that
implements `Default` and `Extend`, and I would like to to be able to
collect a stream of trees.
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).