I'm going to add a Merge method that removes a negative/positive term pair,
and swap_remove() is the easiest way to do that. The order of the conflicted
ref targets doesn't matter.
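A minimal sketch of the idea, assuming a simplified representation with separate `adds` and `removes` vectors. The helper names `cancel_one_pair` and `simplify` are hypothetical, not jj's API. `Vec::swap_remove()` is O(1) because it moves the last element into the vacated slot, which is also why it reorders the remaining terms:

```rust
/// Hypothetical helper: cancels one term that appears in both `adds` and
/// `removes`. Returns true if a pair was removed.
fn cancel_one_pair<T: PartialEq>(adds: &mut Vec<T>, removes: &mut Vec<T>) -> bool {
    for i in 0..removes.len() {
        if let Some(j) = adds.iter().position(|add| *add == removes[i]) {
            // swap_remove() is O(1): the last element moves into the hole,
            // so the remaining terms may be reordered.
            adds.swap_remove(j);
            removes.swap_remove(i);
            return true;
        }
    }
    false
}

/// Repeatedly cancels pairs until no add matches any remove.
fn simplify<T: PartialEq>(adds: &mut Vec<T>, removes: &mut Vec<T>) {
    while cancel_one_pair(adds, removes) {}
}
```

For example, simplifying adds `['a', 'b', 'c']` against removes `['a']` leaves adds as `['c', 'b']`. That reordering is acceptable only because term order carries no meaning for conflicted ref targets.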
Many callers use interleaved iterators, and recently-added serialization code
is built on top of that, so I think it's better to store terms in that format.
map() functions no longer use MergeBuilder as we know the mapped values are
ordered properly. flatten() and simplify() are reimplemented to work with the
interleaved values. The other changes are trivial.
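A minimal sketch of the interleaved layout, assuming a plain `Vec` in place of jj's real storage: positive terms sit at even indices and negative terms at odd indices, so `[add0, remove0, add1]` holds one more add than removes. The `from_removes_adds`, `adds`, `removes`, and `map` methods here are illustrative, not copied from jj_lib:

```rust
/// Hypothetical merge stored as [add0, remove0, add1, ..., addN].
#[derive(Clone, Debug, PartialEq)]
struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    fn from_removes_adds(removes: Vec<T>, adds: Vec<T>) -> Self {
        // There is always exactly one more add than removes.
        assert_eq!(adds.len(), removes.len() + 1);
        let mut values = Vec::with_capacity(adds.len() + removes.len());
        let mut adds = adds.into_iter();
        for remove in removes {
            values.push(adds.next().unwrap());
            values.push(remove);
        }
        values.push(adds.next().unwrap());
        Merge { values }
    }

    /// Positive terms live at even indices.
    fn adds(&self) -> impl Iterator<Item = &T> {
        self.values.iter().step_by(2)
    }

    /// Negative terms live at odd indices.
    fn removes(&self) -> impl Iterator<Item = &T> {
        self.values.iter().skip(1).step_by(2)
    }

    /// Mapping each slot in place keeps the add/remove positions intact,
    /// so no builder is needed to reorder the result.
    fn map<U>(&self, f: impl FnMut(&T) -> U) -> Merge<U> {
        Merge { values: self.values.iter().map(f).collect() }
    }
}
```

Because `map()` transforms each slot in place, the output keeps the same add/remove positions, which is the property that lets the real `map()` functions skip `MergeBuilder`.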
The motivation for this is so we can easily skip calling the function
if the user has opted out of the propagation of abandoned commits we
usually do (#2504). However, it seems like a good piece of code to
extract regardless of that feature.
One fewer git2 API use in the CLI.
The function name GitBackend::init_colocated() is a bit odd, but we need to
specify the work-tree path, not the ".git" repo path. So we can't eliminate
the notion of the working copy path anyway.
What makes rebase_to_dest_parent a good candidate for the jj_lib::rewrite module:
- It is used in both obslog and interdiff, which is a sign that it may belong in a lower layer.
- It only returns CommandError through implicit conversion from TreeMergeError, never explicitly.
- It only uses jj_lib::rewrite functions.
This will make it a little faster to update the working copy at Google
once we've made `MergedTree::diff_stream()` fetch trees
concurrently. (It only makes it a little faster because we still fetch
files serially.)
I'm going to implement a `Stream`-based version optimized for
high-latency (RPC-based) commit backends. So far, that implementation
is about 20% slower in the Linux repo when running `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. I think that's almost
entirely because the algorithm is different, not because it's async per
se.
This commit adds a `Stream`-based version of `MergedTree::diff()` that
just wraps the regular iterator in a stream. I updated `jj diff` to use
it. I couldn't measure any difference on the command above in the
Linux repo. I think that means we can safely use the same
`Stream`-based interface regardless of backend, even if we end up
needing two different implementations of the `Stream`. We would then
be using the wrapped iterator from this commit for local backends, and
the new implementation for remote backends. But ideally we can make
the remote-friendly implementation fast enough that we don't need two
implementations.
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream`. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
We had similar code in two places for restoring paths from one tree to
another. Let's reuse it instead.
I put the new function in the `rewrite` module. I'm not sure if that's
the right place. Maybe it belongs in `tree`?
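A rough sketch of what restoring paths means, assuming a tree is modeled as a flat `BTreeMap` from path to content id. The `Tree` alias and `restore_paths` function are hypothetical; jj's real trees are nested and content-addressed:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a tree: path -> content id.
type Tree = BTreeMap<String, String>;

/// Copies the entries at `paths` from `source` into a copy of `destination`.
/// A path missing from `source` is deleted from the result, so each restored
/// path ends up looking exactly as it does in the source tree.
fn restore_paths(source: &Tree, destination: &Tree, paths: &[&str]) -> Tree {
    let mut result = destination.clone();
    for &path in paths {
        match source.get(path) {
            Some(value) => {
                result.insert(path.to_owned(), value.clone());
            }
            None => {
                result.remove(path);
            }
        }
    }
    result
}
```

Paths that are not listed keep their destination value, which is what lets both former call sites share one helper.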
Since gix::Repository::config_snapshot() borrows the repo instance, it has to
be allocated in the caller's stack frame. That's why GitBackend::git_config() is removed.
My gut feeling is that gitoxide aims to be more transparent than libgit2. We'll
need to know more about the underlying Git data model.
Random comments on gix API:
* gix::Repository provides an API similar to git2::Repository, but has fewer
"convenient" functions. For example, we need to use .find_object() +
.try_to/into_<kind>() instead of .find_<kind>().
* gix::Object, Blob, etc. own raw data as bytes. gix::object and gix::objs
types provide high-level views on such data.
* Tree building is pretty low-level compared to git2.
* gix leverages bstr (i.e. bytes) extensively.
It's probably not difficult to migrate git::import/export_refs(). It might
help eliminate the startup overhead of libssl initialization. The gix-based
GitBackend appears to be a bit faster, but that wouldn't matter in practice.
#2316
Otherwise, the initialized repo could have a different work-dir path than the
load()-ed one. libgit2 appears to do some normalization somewhere, but gix
won't.
I've enabled the "index" component from the "basic" feature set, which would
be needed to implement colocated repo functionality. The doc suggests that
a library shouldn't activate "max-performance-safe", but our crate is also
an application so it would be okay to enable the feature. We'll need "parallel"
anyway to make GitBackend Sync.
https://docs.rs/gix/latest/gix/#feature-flags
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
I want to fix error propagation before I start using async in this
code. This makes the diff iterator propagate errors from reading tree
objects.
Errors include the path and don't stop the iteration. The idea is that
we should be able to show the user an error inline in diff output if
we failed to read a tree. That's going to be especially useful for
backends that can return `BackendError::AccessDenied`. That error
variant doesn't yet exist, but I plan to add it, and use it in
Google's internal backend.
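A sketch of the error-propagation shape described above, assuming a hypothetical `read_tree` callback and a simplified `TreeReadError` (jj's real diff iterator and error types differ). Each item carries its path plus a `Result`, so one failed read becomes an inline error and iteration continues with the next path:

```rust
/// Hypothetical error that remembers which path failed to load.
#[derive(Debug, PartialEq)]
struct TreeReadError {
    path: String,
    message: String,
}

/// Yields one item per path. A failed read becomes an `Err` for that path
/// only; it does not end the iteration.
fn diff_entries<'a>(
    paths: &'a [&'a str],
    read_tree: impl Fn(&str) -> Result<String, String> + 'a,
) -> impl Iterator<Item = (&'a str, Result<String, TreeReadError>)> + 'a {
    paths.iter().map(move |&path| {
        let result = read_tree(path).map_err(|message| TreeReadError {
            path: path.to_owned(),
            message,
        });
        (path, result)
    })
}
```

A caller rendering diff output can print `Ok` entries normally and show each `Err` inline next to its path, for example for an access-denied tree.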
Reasons to introduce this alias:
* Reduces complexity of a type, to silence Clippy warnings in the
future if we use this type as a type parameter
* The type is used quite frequently, so it makes sense to have a name
for it
* It's easier to visually scan for the end of the type when you don't
have to match opening and closing angle brackets
I'm going to add `MergedTreeValue` as an alias for
`Merge<Option<TreeValue>>`, but we already have a type by that name in
`merged_tree`. This patch renames it away, to make room for the new
alias. I used `MergedTreeVal` for this borrowing version to be a bit
like how `str` is a borrowed version of `String`.
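A sketch of how the alias and the borrowed type relate, using hypothetical stand-ins for `TreeValue` and `Merge`. The `MergedTreeVal` variants and the `to_merge()` helper shown here are assumptions for illustration:

```rust
/// Hypothetical stand-ins; only the shape of the types matters here.
#[derive(Clone, Debug, PartialEq)]
enum TreeValue {
    File { id: u32, executable: bool },
}

#[derive(Clone, Debug, PartialEq)]
struct Merge<T>(Vec<T>);

/// Owned alias: shorter to write and easier to scan than the full type.
type MergedTreeValue = Merge<Option<TreeValue>>;

/// Borrowed counterpart, named the way `str` relates to `String`.
#[derive(Debug)]
enum MergedTreeVal<'a> {
    Resolved(Option<&'a TreeValue>),
    Conflict(MergedTreeValue),
}

impl MergedTreeVal<'_> {
    /// Converts to the owned form, cloning a resolved value if present.
    fn to_merge(&self) -> MergedTreeValue {
        match self {
            MergedTreeVal::Resolved(value) => Merge(vec![(*value).cloned()]),
            MergedTreeVal::Conflict(merge) => merge.clone(),
        }
    }
}
```

The resolved case avoids cloning until the owned form is actually needed, which is the main reason to keep a borrowing variant.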
Since "jj git fetch --branch" supports glob patterns, users would expect that
"jj git push --branch glob:.." also works.
The error handling bits are copied from "branch" subcommands. We might want to
extract them into a common helper function, but I haven't figured out a reasonable
boundary point yet.
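A minimal sketch of how a `glob:` prefix could be parsed and matched, assuming a hypothetical `BranchPattern` type and a toy matcher that only understands `*` and `?`. jj's actual pattern type and glob syntax may differ:

```rust
/// Hypothetical `--branch` argument: an exact name by default, or a glob
/// when prefixed with `glob:`.
#[derive(Debug, PartialEq)]
enum BranchPattern {
    Exact(String),
    Glob(String),
}

impl BranchPattern {
    fn parse(arg: &str) -> Self {
        match arg.strip_prefix("glob:") {
            Some(glob) => BranchPattern::Glob(glob.to_owned()),
            None => BranchPattern::Exact(arg.to_owned()),
        }
    }

    fn matches(&self, name: &str) -> bool {
        match self {
            BranchPattern::Exact(exact) => exact == name,
            BranchPattern::Glob(glob) => glob_match(glob.as_bytes(), name.as_bytes()),
        }
    }
}

/// Toy matcher: `*` matches any run of bytes, `?` matches exactly one byte.
fn glob_match(pattern: &[u8], text: &[u8]) -> bool {
    match (pattern.first(), text.first()) {
        (None, None) => true,
        (Some(b'*'), _) => {
            // Either `*` matches nothing, or it consumes one byte and stays.
            glob_match(&pattern[1..], text)
                || (!text.is_empty() && glob_match(pattern, &text[1..]))
        }
        (Some(b'?'), Some(_)) => glob_match(&pattern[1..], &text[1..]),
        (Some(p), Some(t)) if p == t => glob_match(&pattern[1..], &text[1..]),
        _ => false,
    }
}
```

With this shape, `push` can select branches with the same `matches()` check that `fetch` uses, so the two commands accept the same syntax.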
AFAICT, all callers of `Merge::to_file_merge()` are already well
prepared for working with executable files. It's called from these
places:
* `local_working_copy.rs`: Materialized conflicts are correctly
updated using `Merge::with_new_file_ids()`.
* `merge_tools/`: Same as above.
* `cmd_cat()`: We already ignore the executable bit when we print
non-conflicted files, so it makes sense to also ignore it for
conflicted files.
* `git_diff_part()`: We print all conflicts with mode "100644" (the
mode for regular files). Maybe it's best to use "100755" for
conflicts that are unambiguously executable, or maybe it's better to
use a fake mode like "000000" for all conflicts. Either way, the
current behavior seems fine.
* `diff_content()`: We use the diff content in various diff
formats. We could add more detail about the executable bits in some
of them, but I think the current output is fine. For example,
instead of our current "Created conflict in my-file", we could say
"Created conflict in executable file my-file" or "Created conflict
in ambiguously executable file my-file". That's getting verbose,
though.
So, I think all we need to do is to make `Merge::to_file_merge()` not
require its inputs to be non-executable.
Closes #1279.
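A simplified sketch of the relaxed behavior, assuming a hypothetical two-variant `TreeValue` and a flat list of terms (not jj's real `Merge::to_file_merge()` signature). Every present term must still be a regular file, but the executable bit is now ignored instead of rejected:

```rust
/// Hypothetical simplified tree value.
#[derive(Clone, Debug, PartialEq)]
enum TreeValue {
    File { id: String, executable: bool },
    Symlink(String),
}

/// Returns the file id of every term if all present terms are regular
/// files, or None if any term is something else (e.g. a symlink).
fn to_file_merge(terms: &[Option<TreeValue>]) -> Option<Vec<Option<String>>> {
    terms
        .iter()
        .map(|term| match term {
            None => Some(None),
            // The executable bit is ignored here; the old behavior was
            // to accept only `executable: false`.
            Some(TreeValue::File { id, executable: _ }) => Some(Some(id.clone())),
            Some(_) => None,
        })
        .collect()
}
```

Collecting into `Option<Vec<_>>` short-circuits to `None` on the first non-file term.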
Resolved states are the most common, and the current format is pretty
verbose. Let's print it as if `Merge` were an enum with `Resolved` and
`Conflicted` variants instead.
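A sketch of such a `Debug` impl, assuming a simplified `Merge` that stores its terms in one `Vec` where a single term means resolved. It uses `Formatter::debug_tuple()` so the output mimics a derived enum:

```rust
use std::fmt;

/// Hypothetical simplified merge; one value means "resolved".
struct Merge<T> {
    values: Vec<T>,
}

impl<T: fmt::Debug> fmt::Debug for Merge<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print as if Merge were `enum { Resolved(T), Conflicted(Vec<T>) }`.
        if let [value] = self.values.as_slice() {
            f.debug_tuple("Resolved").field(value).finish()
        } else {
            f.debug_tuple("Conflicted").field(&self.values).finish()
        }
    }
}
```

With this, `Merge { values: vec![1] }` prints as `Resolved(1)` and a three-term merge as `Conflicted([1, 2, 3])`. Using `debug_tuple()` also keeps `{:#?}` pretty-printing working.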
Since local/remote branches are now of different types, it doesn't make much
sense to dispatch merging through RefName. Let's add merge_<kind>() methods
instead.
MutableRepo handles merging of the other kind of refs internally, and the
merge function is short enough to inline. I also removed early returns since
most callers provide non-identical ref targets, and merge_ref_targets() should
be cheap if the inputs can be trivially merged.
This partially reverts the change in 30fb7995c2 "view: make local/remote
branches iterator yield RemoteRef instead of RefTarget." As I'm going to add
a diff function for RemoteRef pairs, we'll need a generic version of the
merge-join iterator anyway.
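A sketch of a generic merge-join iterator over two key-sorted inputs, such as two `BTreeMap` iterators. The `MergeJoin` and `merge_join` names are hypothetical; the point is that the value types `A` and `B` are independent, so the same iterator could serve `RefTarget` or `RemoteRef` pairs. It assumes each input is sorted by key with no duplicate keys:

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

struct MergeJoin<L: Iterator, R: Iterator> {
    left: Peekable<L>,
    right: Peekable<R>,
}

impl<K, A, B, L, R> Iterator for MergeJoin<L, R>
where
    K: Ord,
    L: Iterator<Item = (K, A)>,
    R: Iterator<Item = (K, B)>,
{
    type Item = (K, Option<A>, Option<B>);

    fn next(&mut self) -> Option<Self::Item> {
        // Compare the smallest remaining key on each side.
        let ordering = match (self.left.peek(), self.right.peek()) {
            (None, None) => return None,
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (Some((lk, _)), Some((rk, _))) => lk.cmp(rk),
        };
        match ordering {
            Ordering::Less => {
                let (k, a) = self.left.next().unwrap();
                Some((k, Some(a), None))
            }
            Ordering::Greater => {
                let (k, b) = self.right.next().unwrap();
                Some((k, None, Some(b)))
            }
            Ordering::Equal => {
                let (k, a) = self.left.next().unwrap();
                let (_, b) = self.right.next().unwrap();
                Some((k, Some(a), Some(b)))
            }
        }
    }
}

fn merge_join<K: Ord, A, B>(
    left: impl IntoIterator<Item = (K, A)>,
    right: impl IntoIterator<Item = (K, B)>,
) -> impl Iterator<Item = (K, Option<A>, Option<B>)> {
    MergeJoin { left: left.into_iter().peekable(), right: right.into_iter().peekable() }
}
```

Each key is yielded once, with `None` on whichever side lacks it, which is the shape a diff over two sets of refs needs.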
We need to let async-ness propagate up from the backend because
`block_on()` doesn't like to be called recursively. The conflict
materialization code is a good place to make async because it doesn't
depend on anything that isn't already async-ready.
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
We can provide a more actionable error message than "not fast-forwardable". If
the push was fast-forwardable, "jj branch track" should be able to merge the
remote branch without conflicts, so the added step would be minimal.
Although this is logically correct, the error message is a bit cryptic. It's
probably better to reject push if non-tracking remote branches exist.
#1136
We'll use remote_ref.tracking_target() to classify push action, but not all
callers of local_remote_branches() need tracking_target() instead of target.
This means that the commits previously pinned by remote branches are no longer
abandoned. I think that's more correct since "push" is the operation to
propagate local view to remote, and uninteresting commits should have been
locally abandoned.
Since I'm going to make git::push_branches() update the repo view internally,
it should fail fast if the remote name is reserved. Before, the problem was
detected in git::import_refs().
Since pushed remote branches will share the common base targets with locals,
these branches should be marked as tracking. git::push_branches() will handle
that. It looks ugly that the public GitBranchPushTargets type keeps "force"-d
branches as a separate set, but we'll need to rework that anyway when we
implement --force-with-lease behavior. So let's leave it for now.
Some of the git::push_updates() tests have been migrated to the new function.
I left a couple of basic tests for git::push_updates() because push_updates()
will be used to implement a low-level "jj git push-refs" command.
I made import_refs() not preserve commits referenced by remote branches at
520f692a46 "git: on import_refs(), don't preserve old branches referenced by
remote refs." The idea is that remote branches are weak, and commits referenced
by these refs can be freely rewritten by future local changes without moving
the refs. I don't think that's wrong, but 520f692a46 also made "new" remote
changes be abandoned by old remote refs. This problem occurs only when
git.auto-local-branch is off.
I think there are two ways to fix the problem:
a. pin non-tracking remote branches just like local refs
b. pin newly fetched refs in addition to local refs
This patch implements (b) because it's simpler and more obvious that the
fetched commits would never be abandoned immediately.
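A sketch of option (b), assuming refs are modeled as hypothetical name-to-commit-id maps. The pinned heads are all local ref targets plus any remote ref target that is new or moved in this fetch. The `pinned_heads` function is illustrative, not `git::import_refs()` itself:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the commit ids that must not be abandoned by the import.
fn pinned_heads(
    local_refs: &BTreeMap<String, String>,
    old_remote_refs: &BTreeMap<String, String>,
    new_remote_refs: &BTreeMap<String, String>,
) -> BTreeSet<String> {
    // Local refs are always pinned.
    let mut pinned: BTreeSet<String> = local_refs.values().cloned().collect();
    // Also pin remote refs that were just fetched or moved, so newly
    // fetched commits are never abandoned immediately.
    for (name, target) in new_remote_refs {
        if old_remote_refs.get(name) != Some(target) {
            pinned.insert(target.clone());
        }
    }
    pinned
}
```

Unchanged remote refs stay weak under this rule, which matches the intent of 520f692a46.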
This adds support for custom `jj` binaries to use custom working-copy
backends. It works in the same way as with the other backends, i.e. we
write a `.jj/working_copy/type` file when the working copy is
initialized, and then we let that file control which implementation to
use (see previous commit).
I included an example of a (useless) working-copy implementation. I
hope we can figure out a way to test the examples some day.
This makes `Workspace::load()` look at a new `.jj/working_copy/type` file
in order to load the right working copy implementation, just like
`Repo::load()` picks the right backends based on `.jj/store/type`,
`.jj/op_store/type`, etc. We don't write the file yet, and we don't
have a way of adding alternative working copy implementations, so it
will always be `LocalWorkingCopy` for now.
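A sketch of the dispatch, assuming a hypothetical `WorkingCopy` trait with a single method and a map of factories keyed by type name. Falling back to `local` when the file is missing (`None`) is an assumption made here for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical minimal working-copy interface.
trait WorkingCopy {
    fn name(&self) -> &str;
}

struct LocalWorkingCopy;

impl WorkingCopy for LocalWorkingCopy {
    fn name(&self) -> &str {
        "local"
    }
}

type WorkingCopyFactory = Box<dyn Fn() -> Box<dyn WorkingCopy>>;

/// Picks an implementation from the contents of `.jj/working_copy/type`.
/// Assumption: a missing file means the local implementation.
fn load_working_copy(
    factories: &HashMap<String, WorkingCopyFactory>,
    type_file_contents: Option<&str>,
) -> Result<Box<dyn WorkingCopy>, String> {
    let name = type_file_contents.map(str::trim).unwrap_or("local");
    match factories.get(name) {
        Some(factory) => Ok(factory()),
        None => Err(format!("unknown working copy type: {name}")),
    }
}
```

A custom binary would register extra entries in the factory map before loading, analogous to how alternative store backends are selected.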
Our internal working copy implementations at Google will need the
commit so they can walk history backwards until they get to a "public"
commit. They'll then use that to tell build tools and virtual file
systems to present that as a base.
I'm not sure if we'll need to update `reset()` too. It's currently
only used by `jj untrack`, which doesn't change the commit's parent,
so it wouldn't affect any history walks.
`ReadonlyRepo::init()` takes callbacks for initializing each kind of
backend. We called these things `op_store_initializer` and the like. I
found that confusing because such a callback is not an `OpStoreFactory`
(which is for loading an existing backend). This patch tries to clarify
that by renaming the arguments and adding types for each kind of
callback function.
This patch adds MutableRepo::track_remote_branch() as we'll probably need to
track the default branch on "jj git clone". untrack_remote_branch() is also
added for consistency.
We could instead migrate the storage types to (local_branches, remote_views),
but that would be more involved and break forward compatibility with little
benefit. Maybe we can do that later when we introduce remote tags.
The state field isn't saved yet. git import/export code paths are migrated,
but new tracking state is always calculated based on git.auto-local-branch
setting. So the tracking state is effectively a global flag.
As we don't know whether the existing remote branches have been merged into
local branches, we assume that remote branches are "tracking" as long as the
local counterparts exist. This means an existing locally-deleted branch won't
be pushed without re-tracking it. I think it's rare to leave locally-deleted
branches around for long. For a "git.auto-local-branch = false" setup, users might have
to untrack branches if they've manually "merged" remote branches and want to
continue that workflow. I considered using the git.auto-local-branch setting in the
migration path, but I don't think that would give a better result. The setting
may be toggled after the branches got merged, and I'm planning to change its
default to off for better Git interop.
Implementation-wise, the state enum could be a simple bool. It's an enum just
because I originally considered packing the "forgotten" concept into it. I have
no idea which will be better for future extension.
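A sketch of the migration rule, assuming a single remote and branch names as plain strings; `migrate_states` is hypothetical, and the enum variant names are assumptions (as noted above, a bool would also work):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Tracking state of a remote branch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RemoteRefState {
    New,
    Tracking,
}

/// An existing remote branch is considered "tracking" as long as a local
/// branch of the same name exists.
fn migrate_states(
    local_branches: &BTreeSet<String>,
    remote_branches: &[&str],
) -> BTreeMap<String, RemoteRefState> {
    remote_branches
        .iter()
        .map(|&name| {
            let state = if local_branches.contains(name) {
                RemoteRefState::Tracking
            } else {
                RemoteRefState::New
            };
            (name.to_owned(), state)
        })
        .collect()
}
```

A locally deleted branch therefore migrates as `New`, which is why it won't be pushed until it is tracked again.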
It's going to be easier to define a `LockedWorkingCopy` trait if it
doesn't need to borrow from `WorkingCopy`, so let's remove the
reference we currently have and have
`LockedLocalWorkingCopy::finish()` return the new `LocalWorkingCopy`
instead.
I think the main disadvantage is that we now have to remember to
replace the old `LocalWorkingCopy` instance with the new one, whereas
before this commit the compiler would remind us. We could make
`start_modification()` take an owned `self`, but that would be a bit
annoying to work with when we have the instance stored in a field.
I'm about to make `LockedLocalWorkingCopy` not borrow from
`LocalWorkingCopy`. That will make it easier to forget to update any
`LocalWorkingCopy` variables when the modifications have been
committed. This patch introduces a wrapper around
`LockedLocalWorkingCopy` to help prevent that.
Thanks to Yuya for the suggestion.
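A sketch of both patterns with hypothetical, heavily simplified types. `finish()` consumes the lock and returns a fresh `LocalWorkingCopy`, and a `LockedWorkspace` wrapper holds `&mut` to the slot where the working copy lives, so finishing writes the new instance back automatically. None of this is jj's exact API:

```rust
/// Hypothetical working copy that only remembers its checked-out tree.
#[derive(Debug, PartialEq)]
struct LocalWorkingCopy {
    tree_id: u32,
}

/// Owns its state instead of borrowing from `LocalWorkingCopy`.
struct LockedLocalWorkingCopy {
    tree_id: u32,
}

impl LocalWorkingCopy {
    fn start_modification(&self) -> LockedLocalWorkingCopy {
        LockedLocalWorkingCopy { tree_id: self.tree_id }
    }
}

impl LockedLocalWorkingCopy {
    fn set_tree(&mut self, tree_id: u32) {
        self.tree_id = tree_id;
    }

    /// Returns the new working copy; the caller must store it.
    fn finish(self) -> LocalWorkingCopy {
        LocalWorkingCopy { tree_id: self.tree_id }
    }
}

/// Borrows the slot holding the working copy, so committing the
/// modification replaces the old instance and can't be forgotten.
struct LockedWorkspace<'a> {
    slot: &'a mut LocalWorkingCopy,
    locked: LockedLocalWorkingCopy,
}

impl<'a> LockedWorkspace<'a> {
    fn new(slot: &'a mut LocalWorkingCopy) -> Self {
        let locked = slot.start_modification();
        LockedWorkspace { slot, locked }
    }

    fn locked_wc(&mut self) -> &mut LockedLocalWorkingCopy {
        &mut self.locked
    }

    fn finish(self) {
        *self.slot = self.locked.finish();
    }
}
```

The `&mut` borrow also stops other code from reading the stale working copy while the modification is in progress.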
`LocalWorkingCopy::check_out()` can be expressed using the planned
`WorkingCopy` trait, so it doesn't need to be in the `WorkingCopy` trait
itself. I wasn't sure if I should make it a free function in
`working_copy`, but I ended up moving it onto `Workspace`.