Currently, if the user modifies a modify/delete conflict, we always
consider the result resolved. That happens because we materialize the
missing side of the conflict as an empty string but when we parse the
conflict, we expect only the number of sides in the input
conflict. For example, if the input is a regular modify/delete
conflict with one remove and one add, the materialized markers will
have one remove and two adds (one of them empty), but when we try to
parse it, we expect one remove and only one add. When we fail to parse
it, we consider it resolved.
This commit fixes the bug by using
`conflicts::Conflict<Option<TreeValue>>` and keeping track of which
sides were supposed to be empty. We could have fixed the bug without
switching to `conflicts::Conflict`, but we want to switch anyway, and
the fix happens naturally when switching.
For support for tree-level conflicts (#1624), I'm probably going to
introduce a `MergedTree` type representing a set of trees to
merge. That will be similar to `Tree`, but instead of having values of
type `TreeValue`, it will have values that can represent a single
state or a conflict. The `TreeValue` type itself will eventually lose
its `Conflict` variant.
To prepare for that, this commit introduces a `Conflict<T>` type. That
type is intended to be close to what the future
`MergedTree::path_value()`, `MergedTree::entries()`, etc. The next few
commits will replace most current uses of `backend::Conflict` by this
new `conflicts::Conflict` type. They will use `Option<TreeValue>` as
type parameter. Unlike the current `backend::Conflict` type, the
explicit tracking of `None` values will let us better preserve the
ordering and tying it to the tree-level conflict's order.
Suppose many override entries share the same parent directories, it should
be cheaper to look up the tree_cache from leaf than root. I also think
recursion is easier to follow than for loop.
I made a typo and got something like this:
```
Error: Commit or change id prefix "wl" is ambiguous
```
Since we can tell commit ids from change ids these days, let's make
the error message say which kind of id it is. Changing that also kind
of forced me to make a special error for empty strings. Otherwise we
would have to arbitrarily say that an empty string is a commit id or
change id. A specific error message for empty strings seems helpful,
so that's probably for the better anyway.
This prepares for allowing the base tree to be a conflict at the
root-tree level (#1624).
We could remove the `Conflict` variant completely. I tried doing that
and it slowed down `jj diff` by ~3% in the Linux repo with a clean
working copy with only mtime bumped on all files.
I don't know why I made it return an owned value. It seems like an
unnecessary restriction that the value implements `Clone`, so let's
return a reference instead.
We can't get rid of the other "impl Index"es because .as_composite() must
return a real reference type. Maybe we could turn CompositeIndex into an
owned wrapper, but I don't know if that would be worth the effort.
It might sound scary to add public .mutable_index() accessor, but I think
it's okay because immutable MutableIndex reference has no more power than
Index.
This allows us to implement Index for lifetime-bound type such as
CompositeIndex<'_>.
The idea is that .as_composite() is equivalent to .as_index(), but for the
implementation type. I'm going to add "impl Index for CompositeIndex" to
clean up index references passed to revset engine.
This handles the basic case of where the matcher says that a whole
subtree is not matched. In the Linux repo, That's already enough to
speed up `jj --ignore-working-copy files samples` from 298 ms to 129
ms.
Note that one test changed because the new `trivial_merge()` is more
strict than the old algorithm. I don't think that's a problem because
5-way conflicts are not very common, and I prefer to be strict now and
possibly relax it later if we decide that we would prefer that.
All call paths already check before calling the function that the
condition is true. One caller - `tree::try_resolve_file_conflict()` -
checks it itself. The other caller -
`conflicts::materialize_merge_result()` - doesn't, but its callers
have checked it via `extract_file_conflict_as_single_hunk()`.
The deleted comment about empty strings seems to be obsolete since
e48ace56d1. The caller pads the inputs with empty strings since that
commit.
I think we should ideally change this function's signature to make it
impossible to call it with bad inputs, and I hope to get back to that
soon.
We already resolve merge conflicts between hunks, trees, and refs, and
maybe more. They each have their own code for the handling trivial
merges (where the output is equal to one of the inputs). They look
surprisingly different. This commit adds a generic function for doing
that. Curiously, this new implementation uses implements it in yet
another way (basically using a multi-set).
I've added a helper function because the construction of the range expression
is a bit noisy. It could be a Repo method, but I don't want to make it a
default implementation of the trait method.
revset::walk_revs() let the caller handle RevsetEvaluationError since the
evaluation engine may error out even with such a trivial query. For now, most
callers just .unwrap() the error as before.
Otherwise, "jj init --git-repo ." would create extra table files per commit,
and merge them.
I considered adding an explicit GitBackend method to be called from
git::import_refs(), but the call order matters. The method should be invoked
before calling store.get_commit(..) or mut_repo.add_head(..). Since commits
are likely to be loaded from the head, we can instead make read_commit()
import ancestor metadata at all.
Alternatively, we could make a Git commit hidden until it's inserted into
the extra table. It's rather big change, and I wouldn't like to do that
without thinking more thoroughly.
I'm going to extract a helper function that converts git2::Commit to
backend::Commit struct, and the commit id can also be obtained from the
git2::Commit object.
My first attempt was to fix up corrupted index when merging, but it turned
out to be not easy because the self side may contain corrupted data. It's
also possible that two concurrent commit operations have exactly the same
view state (because change id isn't hashed into commit id), and only the
table heads diverge.
#924
GitBackend will reuse this lock to not assign multiple change ids to a
single commit. We could add a separate lock file that covers the section
from get_head() to save_table(), but I think reusing the table lock is good
enough.
This bug concerns the way `import_refs` that gets called by `fetch` computes
the heads that should be visible after the import.
Previously, the list of such heads was computed *before* local branches were
updated based on changes to the remote branches. So, commits that should have
been abandoned based on this update of the local branches weren't properly
abandoned.
Now, `import_refs` tracks the heads that need to be visible because of some ref
in a mapping keyed by the ref. If the ref moves or is deleted, the
corresponding heads are updated.
Fixes#864
Now that we return the written commit from `write_commit()`, let's
make the timestamps match what was actually written, accounting for
the whole-second precision and the adjustment we do to avoid
collisions.
The internal backend at Google doesn't let you write any value you
want for in the committer field. The `Store` type still caches the
value it attempted to write, which gets a little weird when the
written value is not what we tried to write. We should use the value
the backend actually wrote. However, we don't know if the backend
changed anything without reading the value back, which is often
wasteful. This commit changes the API to return the written value.
I only changed the signature of `write_commit()` for now. Maybe we
should make a similar change to `write_tree()`.
This has several advantages:
* Makes it possible to downcast to non-Git custom backends (might be
useful at Google, but we haven't needed it yet)
* Lets us access more specific functionality on the `GitBackend`,
making it possible to access the `git2::Repository` without
creating a copy of it.
* Removes the dependency on Git from the backend
In large repos, the unique prefixes can get somewhat long (~6 hex
digits seems typical in the Linux repo), which makes them less useful
for manually entering on the CLI. The user typically cares most about
a small set of commits, so it would be nice to give shorter unique ids
to those. That's what Mercurial enables with its
`experimental.revisions.disambiguatewithin` config. This commit
provides an implementation of that feature in `IdPrefixContext`.
In very large repos, it can also be slow to calculate the unique
prefixes, especially if it involves a request to a server. This
feature becomes much more important in such repos.
I would like to copy Mercurial's way of abbreviating ids within a
user-configurable revset. We would do it for both commit ids and
change ids. For that feature, we need a place to keep the set of
commits the revset evaluates to. This commit adds a new
`IdPrefixContext` type which will eventually be that place. The new
type has functions for going back and forth between full and
abbreviated ids. I've updated the templater to use it.
When creating `RevsetExpression` programmatically, I think we should
use commit ids instead of symbols in the expression. This commit adds
a check for that by using a `SymbolResolver` that always errors
out.
I would eventually want the `SymbolResolver` to be customizable (in
custom `jj` binaries), so we want to make sure we always use the
customized version of it.
I left `RevsetExpression::resolve()` unchanged. I consider that to be
for programmatically created expressions.
I'd like to make the symbol resolution more flexible, both so we can
support customizing it (in custom `jj` binaries) and so we can use it
for resolving short prefixes within a small revset.
I plan to add `revsets.short-prefixes` and `revsets.immutable` soon,
and I think `[revsets]` seems like reasonable place to put them. It
seems consistent with our `[templates]` section. However, it also
suffers from the same problem as that section, which is that the
difference between `[templates]` and `[template-aliases]` is not
clear. We can decide about about templates and revsets later.
The current behavior was introduced by 20eb9ecec1 "git: don't abandon
HEAD commit when it loses a branch." While the change made HEAD mutation
behavior more consistent with a plain ref operation, HEAD can also move on
checkout, and checkout shouldn't be considered a history rewriting operation.
I'm not saying the new behavior is always correct, but I think it's safer
than losing old HEAD branch. I also think this change will help if we want
to extract HEAD management function from git::import_refs().
Fixes#1042.
Allows automatic recovery when encountering stale lockfiles, and more
efficient blocking rather than polling for fresh ones. The previous
implementation is preserved for other platforms.
There were two issues on my end:
1. `known_hosts` doesn't seem to be recognized
2. SSH Agent is ignored despite running
A workaround for 1. is to set the HOME environment variable on Windows, so I added a hint to suggest this. Ideally we would add a `certificate_check` callback to the remote callbacks, but the git2 crate doesn't expose whether the certificate check already succeeded, which makes it useless for this purpose (as we'd be prompting users to accept a certificate even though that certificate is already known to be valid).
As for 2., I changed the behavior from "check SSH Agent if some env variables exist" to "check SSH Agent and only fail if some env variables exist". On Windows SSH Agent doesn't use these env variables (but trying to communicate with it will still work), so now Windows properly works with SSH Agent.
When using a sparse working copy (e.g. with no files at all) and
updating the working copy from the root commit to a commit with
millions of files, we shouldn't have to walk the parts of the diff
that doesn't match the sparse patterns. However, we still do the full
walk because our `Tree::diff()` currently doesn't care about what the
matcher tells us to visit, it only filters out unwanted files after
visiting them. This commit fixes that for the special (but common)
case of matching nothing in a directory.
I tried also adding special handling for when the matcher says that we
should only visit a few entries, but it wasn't clearly better in the
cases I tested it on. I'll keep that patch around and might send it if
I find some cases where it helps.
We could add `walk.descendants(root_positions)` method, and apply
`.filter_by_generation(range)`, but queue-based `.descendants()` would be
slower than the one using reachable set. So I didn't add such method.
I also considered reimplementing non-lazy version of this function without
using the current RevWalkGenerationRange, but it appears the current iterator
version performs well even if we have to do .collect_vec() and .reverse().
This helps to extract a trait that abstracts CompositeIndex and descendants
map. Since the entry type E is a newtype wrapper, there wouldn't be runtime
cost.
I'm going to add a RevWalk method to walk descendants with generation filter,
which will use this helper method. RevWalk::take_until_roots() uses .min()
instead of .last() since RevWalk shouldn't know the order of the input set.
Before, the number of the generations to track would increase at each merge
point. This was really bad for queries like ':@--' in merge-heavy history,
but I didn't notice the problem because ancestors query is lazy and
the default log template is slow. Since I'm going to reuse RevWalk for
'roots++:' queries, which can't be lazy, I need to fix this problem first.
As we don't have a revset expression to specify exact generation range,
gen.end is initialized to either 1 or close to u32::MAX. So, this change
means long-lived generation ranges will eventually be merged into one.
The substitution rule and tests are copied from ancestors/parents. The backend
logic will be reimplemented later. For now, it naively repeats children().
This is a minimal change to replace Children with Descendants. A generation
parameter could be added to RevsetExpression::DagRange, but it's not needed
as of now.
I'm going to add generation parameter to Children/DagRange nodes, and
'Children { .. }' will be substituted to 'DagRange { .., gen: 1 }'. This
commit helps future code move.
Lifetime bounds of the arguments are unnecessarily restricted. It appears
walk_ancestors_until_roots() captures arguments lifetime on rustc 1.64.0.
I think the problem will go away if walk_*() functions are extracted to
RevWalk methods where input arguments will become less generic.
It doesn't matter, but can simplify the function interface. I'll probably
extract this function to RevWalk so the descendants with/without generation
filter can be tested without using revset API.
It no longer needs to be on the `Index` trait, thereby removing the
last direct use of `IndexEntry` in the trait (it's still used
indirectly in `walk_revs()`).
Now we have 4 callers, I concluded this is common enough to add an
extension method. Still I think it's preferred to define config items in
src/config/*.toml if possible. It will catch typo of config keys.
A chain of 4 billion commits is a lot, but it's not out of the
question, so let's support it. The current default index will not be
able to handle that many commits, so I let that still use 32-bit
integers.
New ResolvedExpression enum ensures that the evaluation engine doesn't have
to know the symbol resolution details. In this commit, I've moved Filter
and NotIn resolution to resolve_visibility(). Implicit All/VisibleHeads
resolution will be migrated later.
It's tempting to combine resolve_symbols() and resolve_visibility() to get
rid of panic!()s, but the resolution might have to be two passes to first
resolve&collect explicit commit ids, and then substitute "all()" with
"(:visible_heads())|commit_id|..". It's also possible to apply some tree
transformation after symbol resolution.
This makes it clear what should be resolved at resolve_symbols(). Symbol
is a bit special while parsing function arguments, but it's no different
than the other unresolved references at expression level.
I'm thinking of transforming RevsetExpression to a enum dedicated for
the evaluation stage. To help the migration, I want to remove the use of
the RevsetExpression builder API from the evaluation engine.
Fewer virtual dispatch is also better.
The `heads()` revset function with one argument is the counterpart to
`roots()`. Without arguments, it returns the visible heads in the
repo, i.e. `heads(all())`. The two use cases are quite different, and
I think it would be good to clarify that the no-arg form returns the
visible heads, so let's split that out to a new `visible_heads()`
function.
This basically removes hidden 'all() &' from union/negation of filters. To
achieve that, I have two options: 1. add separate evaluation path (like the
one this commit introduced), or 2. wrap "all()" revset to override predicate
as Box::new(|_| true) function. I took the former since it's less ad-hoc.
We can add an explicit RevsetExpression node to branch between evaluate()
and evaluate_predicate(), but I don't think it would simplify the
implementation at this point. We might need such node if we want to resolve
"all()" at resolve_symbols(). It might be even better to extract a subset of
RevsetExpression enum, which only contains evaluatable nodes.
The cost of 'all() &' isn't significant for most filters. '~merges()' is
the exception. For jj repo,
revsets/:v0.3.0 & (author(martinvonz) | committer(martinvonz))
--------------------------------------------------------------
base 1.06 11.2±0.04m
new 1.00 10.5±0.05m
revsets/~merges()
-----------------
base 1.69 750.0±8.47µ
new 1.00 444.1±3.50µ
I think this is more readable, and apparently it produces slightly better code
maybe because the compiler can determine that there are no unwanted markers.
Since filter is slow in general, its input set should be minimized. This has
measurable impact on artificial query like '~(v0.4.0..) & author(_)'. If it
were evaluated as a difference of sets, all commits would have to be loaded.
Since resolve_symbols() now removes Present(_) node, it make sense to
handle symbol resolution error there. That's why I added a "pre" callback
to try_transform_expression().
Perhaps, "operation" scope (#1283) can be implemented in a similar way,
(but somehow need to resolve operation id and call repo.reload_at(op).)
This makes it clear that RevsetExpression::Present node is noop at the
evaluation stage.
RevsetEvaluationError::StoreError is unused right now, but I'm not sure if
it should be removed. It makes some sense that evaluate() can propagate
StoreError as it has access to the store.
The `Repo` is a higher-level type that the index shouldn't have to
know about. With this change, a custom revset implementation should be
able evaluate the revset on a server without knowing which repo it
refers to.
We already pass a `CompositeIndex` to
`default_revset_engine::evaluate()` so let's use that wherever we
currently use `repo.index()`. That will help us remove the `repo`
argument, and it will also let us internal types (like `IndexEntry`)
in the index methods we call.
I'm about to replace the `&dyn Repo` argument by several smaller
types, and it's easier to collect those in a single context type than
to pass them separately as arguments.
I also moved `revset_for_commit_ids()` and `take_latest_revset()` onto
the new type because it was easy. `build_predicate_fn()` and
`has_diff_from_parent()` ran into some lifetime issue when I tried.
The `public_heads()` revset only contains the root commit in
practice. I'm not sure what we want to do about phases, but since we
don't have any real support for them yet, let's just remove this
revset. I didn't update the changelog because we don't seem to have
documented the revset function (and it seems unlikely that users who
found out about it found it useful enough to use it when they could
just use `root`).
The `ProtoOpStore` was separated out to simplify the migration from
Thrift. Now that the `ThriftOpStore` is gone, we can inline
`ProtoOpStore` as the TODO says.
Inline diffs on multi-byte UTF-8 characters would match individual
bytes, causing garbled diffs in some cases. For example, replacing
`⊢` with `⊣`, which differ in the final byte only, caused the
diff to display a diff of the bytes instead the character.
This commit uses a workaround present in Mercurial by treating all
bytes 0x80 and above as word characters, causing any multi-byte
character to be treated as a word and not segmented.
https://www.mercurial-scm.org/repo/hg/file/6.3.3/mercurial/patch.py#l51
This serves the role of limit() in Mercurial. Since revsets in JJ is
(conceptually) an unordered set, a "limit" predicate should define its
ordering criteria. That's why the added predicate is named as "latest".
Closes#1110