There were two issues on my end:
1. `known_hosts` doesn't seem to be recognized
2. SSH Agent is ignored despite running
A workaround for 1. is to set the HOME environment variable on Windows, so I added a hint to suggest this. Ideally we would add a `certificate_check` callback to the remote callbacks, but the git2 crate doesn't expose whether the certificate check already succeeded, which makes it useless for this purpose (as we'd be prompting users to accept a certificate even though that certificate is already known to be valid).
As for 2., I changed the behavior from "check SSH Agent if some env variables exist" to "check SSH Agent and only fail if some env variables exist". On Windows SSH Agent doesn't use these env variables (but trying to communicate with it will still work), so now Windows properly works with SSH Agent.
When using a sparse working copy (e.g. with no files at all) and
updating the working copy from the root commit to a commit with
millions of files, we shouldn't have to walk the parts of the diff
that doesn't match the sparse patterns. However, we still do the full
walk because our `Tree::diff()` currently doesn't care about what the
matcher tells us to visit, it only filters out unwanted files after
visiting them. This commit fixes that for the special (but common)
case of matching nothing in a directory.
I tried also adding special handling for when the matcher says that we
should only visit a few entries, but it wasn't clearly better in the
cases I tested it on. I'll keep that patch around and might send it if
I find some cases where it helps.
We could add `walk.descendants(root_positions)` method, and apply
`.filter_by_generation(range)`, but queue-based `.descendants()` would be
slower than the one using reachable set. So I didn't add such method.
I also considered reimplementing non-lazy version of this function without
using the current RevWalkGenerationRange, but it appears the current iterator
version performs well even if we have to do .collect_vec() and .reverse().
This helps to extract a trait that abstracts CompositeIndex and descendants
map. Since the entry type E is a newtype wrapper, there wouldn't be runtime
cost.
I'm going to add a RevWalk method to walk descendants with generation filter,
which will use this helper method. RevWalk::take_until_roots() uses .min()
instead of .last() since RevWalk shouldn't know the order of the input set.
Before, the number of the generations to track would increase at each merge
point. This was really bad for queries like ':@--' in merge-heavy history,
but I didn't notice the problem because ancestors query is lazy and
the default log template is slow. Since I'm going to reuse RevWalk for
'roots++:' queries, which can't be lazy, I need to fix this problem first.
As we don't have a revset expression to specify exact generation range,
gen.end is initialized to either 1 or close to u32::MAX. So, this change
means long-lived generation ranges will eventually be merged into one.
The substitution rule and tests are copied from ancestors/parents. The backend
logic will be reimplemented later. For now, it naively repeats children().
This is a minimal change to replace Children with Descendants. A generation
parameter could be added to RevsetExpression::DagRange, but it's not needed
as of now.
I'm going to add generation parameter to Children/DagRange nodes, and
'Children { .. }' will be substituted to 'DagRange { .., gen: 1 }'. This
commit helps future code move.
Lifetime bounds of the arguments are unnecessarily restricted. It appears
walk_ancestors_until_roots() captures arguments lifetime on rustc 1.64.0.
I think the problem will go away if walk_*() functions are extracted to
RevWalk methods where input arguments will become less generic.
It doesn't matter, but can simplify the function interface. I'll probably
extract this function to RevWalk so the descendants with/without generation
filter can be tested without using revset API.
It no longer needs to be on the `Index` trait, thereby removing the
last direct use of `IndexEntry` in the trait (it's still used
indirectly in `walk_revs()`).
Now we have 4 callers, I concluded this is common enough to add an
extension method. Still I think it's preferred to define config items in
src/config/*.toml if possible. It will catch typo of config keys.
A chain of 4 billion commits is a lot, but it's not out of the
question, so let's support it. The current default index will not be
able to handle that many commits, so I let that still use 32-bit
integers.
New ResolvedExpression enum ensures that the evaluation engine doesn't have
to know the symbol resolution details. In this commit, I've moved Filter
and NotIn resolution to resolve_visibility(). Implicit All/VisibleHeads
resolution will be migrated later.
It's tempting to combine resolve_symbols() and resolve_visibility() to get
rid of panic!()s, but the resolution might have to be two passes to first
resolve&collect explicit commit ids, and then substitute "all()" with
"(:visible_heads())|commit_id|..". It's also possible to apply some tree
transformation after symbol resolution.
This makes it clear what should be resolved at resolve_symbols(). Symbol
is a bit special while parsing function arguments, but it's no different
than the other unresolved references at expression level.
I'm thinking of transforming RevsetExpression to a enum dedicated for
the evaluation stage. To help the migration, I want to remove the use of
the RevsetExpression builder API from the evaluation engine.
Fewer virtual dispatch is also better.
The `heads()` revset function with one argument is the counterpart to
`roots()`. Without arguments, it returns the visible heads in the
repo, i.e. `heads(all())`. The two use cases are quite different, and
I think it would be good to clarify that the no-arg form returns the
visible heads, so let's split that out to a new `visible_heads()`
function.
This basically removes hidden 'all() &' from union/negation of filters. To
achieve that, I have two options: 1. add separate evaluation path (like the
one this commit introduced), or 2. wrap "all()" revset to override predicate
as Box::new(|_| true) function. I took the former since it's less ad-hoc.
We can add an explicit RevsetExpression node to branch between evaluate()
and evaluate_predicate(), but I don't think it would simplify the
implementation at this point. We might need such node if we want to resolve
"all()" at resolve_symbols(). It might be even better to extract a subset of
RevsetExpression enum, which only contains evaluatable nodes.
The cost of 'all() &' isn't significant for most filters. '~merges()' is
the exception. For jj repo,
revsets/:v0.3.0 & (author(martinvonz) | committer(martinvonz))
--------------------------------------------------------------
base 1.06 11.2±0.04m
new 1.00 10.5±0.05m
revsets/~merges()
-----------------
base 1.69 750.0±8.47µ
new 1.00 444.1±3.50µ
I think this is more readable, and apparently it produces slightly better code
maybe because the compiler can determine that there are no unwanted markers.
Since filter is slow in general, its input set should be minimized. This has
measurable impact on artificial query like '~(v0.4.0..) & author(_)'. If it
were evaluated as a difference of sets, all commits would have to be loaded.
Since resolve_symbols() now removes Present(_) node, it make sense to
handle symbol resolution error there. That's why I added a "pre" callback
to try_transform_expression().
Perhaps, "operation" scope (#1283) can be implemented in a similar way,
(but somehow need to resolve operation id and call repo.reload_at(op).)
This makes it clear that RevsetExpression::Present node is noop at the
evaluation stage.
RevsetEvaluationError::StoreError is unused right now, but I'm not sure if
it should be removed. It makes some sense that evaluate() can propagate
StoreError as it has access to the store.
The `Repo` is a higher-level type that the index shouldn't have to
know about. With this change, a custom revset implementation should be
able evaluate the revset on a server without knowing which repo it
refers to.
We already pass a `CompositeIndex` to
`default_revset_engine::evaluate()` so let's use that wherever we
currently use `repo.index()`. That will help us remove the `repo`
argument, and it will also let us internal types (like `IndexEntry`)
in the index methods we call.
I'm about to replace the `&dyn Repo` argument by several smaller
types, and it's easier to collect those in a single context type than
to pass them separately as arguments.
I also moved `revset_for_commit_ids()` and `take_latest_revset()` onto
the new type because it was easy. `build_predicate_fn()` and
`has_diff_from_parent()` ran into some lifetime issue when I tried.
The `public_heads()` revset only contains the root commit in
practice. I'm not sure what we want to do about phases, but since we
don't have any real support for them yet, let's just remove this
revset. I didn't update the changelog because we don't seem to have
documented the revset function (and it seems unlikely that users who
found out about it found it useful enough to use it when they could
just use `root`).
The `ProtoOpStore` was separated out to simplify the migration from
Thrift. Now that the `ThriftOpStore` is gone, we can inline
`ProtoOpStore` as the TODO says.
Inline diffs on multi-byte UTF-8 characters would match individual
bytes, causing garbled diffs in some cases. For example, replacing
`⊢` with `⊣`, which differ in the final byte only, caused the
diff to display a diff of the bytes instead the character.
This commit uses a workaround present in Mercurial by treating all
bytes 0x80 and above as word characters, causing any multi-byte
character to be treated as a word and not segmented.
https://www.mercurial-scm.org/repo/hg/file/6.3.3/mercurial/patch.py#l51
This serves the role of limit() in Mercurial. Since revsets in JJ is
(conceptually) an unordered set, a "limit" predicate should define its
ordering criteria. That's why the added predicate is named as "latest".
Closes#1110
There are no remaining places where we iterate over a revset and need
the `IndexEntry`s, so we can now make `Revset::iter()` yield
`CommitId`s instead.
I'm about to make `Revset::iter()` yield just `CommitId`s, but the
tests in `test_default_revset_graph_iterator.rs` need an `IndexEntry`
iterator so they can pass it into `RevsetGraphIterator::new()`. This
commits prepares for the change by adding a
`RevsetImpl::iter_graph_impl()` that returns `RevsetGraphIterator`,
keeping `InternalRevset` still hidden within the revset engine. We
could instead have made that (and `ToPredicateFn`) visible to tests. I
can't say which is better.