Both user and programmatic expressions use the same .evaluate() function now.
optimize() is applied globally after symbol resolution. The order shouldn't
matter, but it might be nicer because union of commit refs could be rewritten
to a single Commits(Vec<CommitId>) node.
This helps add library API that takes resolved revset expressions. For example,
"jj absorb" will first compute annotation within a user-specified ancestor range
such as "mutable()". Because the range expression may contain symbols, it should
be resolved by caller.
There are two ideas to check resolution state at compile time:
<https://github.com/martinvonz/jj/pull/4374>
a. add RevsetExpressionWrapper<PhantomState> and guarantee inner tree
consistency at public API boundary
b. parameterize RevsetExpression variant types in a way that invalid variants
can never be constructed
(a) is nice if we want to combine "resolved" and "unresolved" expressions. The
inner expression types are the same, so we can just calculate new state as
Resolved & Unresolved = Unresolved. (b) is stricter as the compiler can
guarantee invariants. This patch implements (b) because there are no existing
callers who need to construct "resolved" expression and convert it to "user"
expression.
.evaluate_programmatic() now requires that the expression is resolved.
This will become safe as I'm going to add static check that the expression does
never contain CommitRef(_)s. We could make Present(_) unconstructible, but the
existence of Present node is harmless.
I'm going to add RevsetExpression<State> type parameter, but the existing tree
transformer can't rewrite nodes to different state because the input and the
output must be of the same type. (If they were of different types, we couldn't
reuse the input subtree by Rc::clone().) The added visitor API will handle
state transitions by mapping RevsetExpression::<St1>::<Kind> to
RevsetExpression::<St2>::<Kind>.
CommitRef and AtOperation nodes are processed by specialized methods because
these nodes will depend on the State type. OTOH, Present node won't be
State-dependent, so it's inspected by the common fold_expression() method.
An input expression is not taken as an &Rc<RevsetExpression> but a &_ because
we can't reuse the allocation behind the Rc.
I originally considered adding deny-list-based implementation, but the Windows
compatibility rules are super confusing and I don't have a machine to find out
possible aliases. This patch instead adds directory equivalence tests.
In order to test file entity equivalence, we first need to create a file or
directory of the requested name. It's harmless to create an empty .jj or .git
directory, but materializing .git file or symlink can temporarily set up RCE
situation. That's why new empty file is created to test the path validity. We
might want to add some optimization for safe names (e.g. ASCII, not contain
"git" or "jj", not contain "~", etc.)
That being said, I'm not pretty sure if .git/.jj in sub directory must be
checked. It's not safe to cd into the directory and run "jj", but the same
thing can be said to other tools such as "cargo". Perhaps, our minimum
requirement is to protect our metadata (= the root .jj and .git) directories.
Despite the crate name (and internal use of std::fs::File),
same_file::is_same_file() can test equivalence of directories. This is
documented and tested, so I've removed my custom implementation, which was
slightly simpler but lacks Windows support.
If new file would overwrite an existing regular file, the file path is skipped.
It makes sense to apply the same rule to existing symlinks. Without this patch,
check out would fail if an existing path was a dead symlink or a symlink to
a directory.
I'm not sure if this was attackable before, but it should be better to not
try to remove file across symlinks.
The disk_path is now returned from create_parent_dirs() to clarify that the
path is identical.
This should be safer than relying on file open error. It's scary to continue
processing if the file was a symlink.
I'll add a few more sanity checks to remove_old_file(), so it's extracted as a
function.
I'm going to add "checked" version of to_fs_path(), but all callers can't be
migrated to it. For example, an error message should be produced even if the
path is malformed.
This patch also adds error variants to propagate InvalidRepoPathError. They
don't use ::Other { .. } so the errors can be distinguished in tests.
I'm going to replace the current .evaluate_programmatic() which does minimal
commit-ref resolution. The new .evaluate_programmatic() will be implemented on
a "resolved" expression.
For the same reason as the previous patch. It's nice if root() is considered
a "resolved" expression. With this change, most of the evaluate_programmatic()
callers won't have to do symbol resolution at all.
I'm going to add RevsetExpression<State = Resolved|User> type parameter to
detect API misuse at compile time. VisibleHeads is similar to All, and appears
in generic expression substitution function where a concrete State type
shouldn't be known.
This ensures that a symbol-resolved at_operation() expression won't be resolved
again when it's intersected with another expression, for example.
# in CLI
let expr1 = parse("at_operation(..)").resolve_user_symbol();
# in library
let expr2 = RevsetExpression::ancestors().intersection(&expr1);
expr2.evaluate_programmatic()
This will help construct file content based on diff hunks. For example, "jj
absorb" will first calculate annotation of the source parent (within mutable
ancestors), calculate diff, then "squash" hunks into ancestor commits of the
surrounding ranges.
This unblocks the use of TestBackend in long-running processes such as fuzzer.
It should also be safer because TempDir doesn't guarantee that the path is never
reused.
TestBackendData instances persist in memory right now, but they should be
discarded when the corresponding temp_dir gets dropped. The added struct will
manage the TestBackendData mapping.
We had documented that we support `git.auto-local-bookmark` but we
don't. The documentation has been incorrect since d9c68e08b1. This
patch fixes it by adding support for `git.auto-local-bookmark` with
fallback to the old/current `git.auto-local-branch`.
.
We might want to calculate (commit_id, range) pairs of consecutive lines in
order to "absorb" changes, for example.
This should also be cheaper since Vec<u8> doesn't have to be allocated per line.
Perhaps, get_same_line_map() could return an iterator, but implementing an
iterator to be "pull"-ed is much harder than writing a function to "push",
especially when lifetime is involved.
This function was short, and this change makes it clear that !.is_empty() was
redundant. Duplicated doc comment is also removed. I feel the inline comment is
easier to follow here.
It no longer makes sense to initialize Source line_map and build
HashMap<Commit, Source> in one function. Let's extract the line_map
initialization to a function instead.
All intermediate nodes are changed to RevWalk of Result<IndexPosition, _> type
to pass BackendError around from filter predicates. Leaf ancestors/descendants
computation is unchanged, and mapped to Result at revset_engine layer. This is
simpler than converting all RevWalk impls to Result<_, _>.
We'll need to propagate error from predicate function, so .filter() will no
longer be usable. .map() will be used in order to wrap infallible ancestry
lookup with Ok(_).
Some RevsetImpl methods are migrated to .map() as example.
I don't see measurable performance difference, but VecDeque is theoretically
simpler than BTreeSet. The input is sorted, so we never do random insertion.
This also allows some minor optimizations to be performed, such as
avoiding recomputation of the connected target set when
`MoveCommitsTarget::Roots` is used since the connected target set is
identical to the target set (all descendants of the roots).
is_empty() could also return Result<bool, _>, but I think the current definition
is also good. If an error occurred, revset.iter() would return at least one
item, so it's not empty.
Let's say we're updating one parent of a merge:
```
E E'
/|\ /|\
B C D -> B C D'
\|/ \|/
A A
```
When rebasing `E` to create `E'` there, we do that by merging the
changes compared to the auto-merged parents. The auto-merged parents
before is `B+(C-A)+(D-A)`, and after it's `B+(C-A)+(D'-A)`. Then we
rebase the diff, which gives us `E' = B+(C-A)+(D'-A) + (E -
(B+(C-A)+(D-A))) = D' + (E - D')`.
However, we currently don't do quite that simplification because we
first resolve conflicts when possible in the two auto-merged parent
trees (before and after). That rarely makes a difference to the
result, but it's wasteful to do it. It does make a difference in some
cases where our merge algorithm is lossy, which currently is only the
"A+(A-B)=A" case. I added a test case showing where it does make a
difference. It's a non-obvious cases but I think the new behavior is
more correct (the old behavior was a conflict).
This was added at f5f61f6bfe "revset: resolve 'HEAD@git' just like other
pseudo @git branches." As I said in this patch, there was no practical use case
of the HEAD@git symbol.
Suppose we implement colocated workspaces/worktrees #4436, there may be multiple
Git HEAD revisions. This means HEAD can no longer be abstracted as a symbol of
the "git" remote.
Custom backends may rely on networking or other unreliable implementations to support revsets, this change allows them to return errors cleanly instead of panicking.
For simplicity, only the public-facing Revset and RevsetGraph types are changed in this commit; the internal revset engine remains mostly unchanged and error-free since it cannot generally produce errors.
Clippy 1.83 (currently in beta) detects more cases of unneeded lifetimes,
namely in trait implementation declarations. Since this lint is warn by
default, we need to fix those instances to get a clean CI.
This is required when performing `rebase -s a -s b` where "b" is a
descendant of "a". Both "a" and "b" should be regarded as the roots of
the target set and be rebased onto the new destination.
This allows for `RebaseOptions` to be respected. This will be used when
migrating `rebase --source`/`rebase --branch` to use `move_commits` to
respect the `--before`/`--after` options.
The `coalesce` function takes a list of revsets and returns the commits in the
first revset in the list which evalutes to a non-empty set of commits.
It can be used to display fallbacks if a certain commit cannot be found,
e.g. `coalesce(present(user_configured_trunk), builtin_trunk)`.
A new module is added to jj_lib which exposes a function
get_annotation_for_file. This annotates the given file line by line with
commit information according to the commit that made the most recent
change to the line.
Similarly, a new command is added to the CLI called `jj file annotate` which
accepts a file path. It then prints out line by line the commit
information for the line and the line itself. Specific commit
information can be configured via the templates.annotate_commit_summary
config variable
This appears to be a bit faster if there are tons of unchanged ranges.
```
group new old
----- --- ---
bench_diff_git_git_read_tree_c 1.00 58.5±0.12µs 1.07 62.7±0.60µs
bench_diff_lines/modified/10k 1.00 34.2±0.72ms 1.08 37.0±1.09ms
bench_diff_lines/modified/1k 1.00 3.1±0.08ms 1.12 3.5±0.01ms
bench_diff_lines/reversed/10k 1.00 28.0±0.15ms 1.01 28.4±0.51ms
bench_diff_lines/reversed/1k 1.00 616.0±16.20µs 1.00 617.0±9.29µs
bench_diff_lines/unchanged/10k 1.00 3.5±0.04ms 1.10 3.9±0.06ms
bench_diff_lines/unchanged/1k 1.00 328.4±4.44µs 1.07 352.0±1.41µs
```
One particular use case for these is escape sequences -- and to that
end, I'm also adding `\e` as a shorthand for `\x1b`.
Change-Id: Id000000040ea6fd8e2d720219931485960c570dd
The Ord implementation didn't conform to Eq (which compares "self.others"
literally), and we don't need Ord to insert non-overlapping ranges before the
current range.
This isn't fancy, but I couldn't find a better abstraction. Note that
MutableRepo::base_repo() isn't removed as it has to return &Arc<_>, whereas
<ReadonlyRepo as Repo>::base_repo() can't return &Arc<_>.
The id.shortest() template prints a warning and falls back to repo-global
resolution. This seems better than erroring out. There are a few edge cases
in which the short-prefixes resolution can fail unexpectedly. For example, the
trunk() revision might not exist in operations before "jj git clone".
The idea is that the disambiguation index can be loaded from a repo which is
different from the symbol resolution context.
Suppose we add at_operation(op, expr) revset, a symbol inside at_operation()
expression will have to be resolved within that operation, whereas the
disambiguation index is cached globally by WorkspaceCommandHelper. We could
build temporary disambiguation index for each at-op repo, but that would be
complicated implementation-wise, and wouldn't be useful. For example, a query
"x | at_operation(@-, x)" might be resolved to "xy | at_operation(@-, xz)"
if disambiguation index were reloaded for the @- operation. Instead, the
short change ID "x" can be disambiguated to "xy", then resolved to the
corresponding commit IDs at each operation.
This unblocks reuse of a symbol resolver instance for a different repo view
specified by at_operation() revset. See later commits for details. It's also
easier to handle error if there is a single function that can fail.
This removes an invalid View state from the root operation.
Note that the root index will have to be reindexed in order to resolve "root()"
in the root operation. I don't think this would practically matter, so this
patch doesn't bump the index version to invalidate the existing indexes.
See also 48a9f9ef56 "repo: use Transaction for creating repo-init operation."
See the next patch for why. It might look odd that OpStore depends on the root
CommitId, but that seems okay because OpStore manages Views, and a View is
basically a set of CommitIds.
For #3673, we will have aliases such as:
```toml
'upload(revision)' = [
["fix", "-r", "$revision"],
["lint", "-r", "$revision"],
["git", "push", "-r", "$revision"],
]
```
Template aliases:
1) Start as Config::Value
2) Are converted to String
3) Are placed in the alias map
4) Expand to a TemplateExpression type via expand_defn.
However, command aliases:
1) Start as Config::Value
2) Are converted to Vec<Vec<String>>
3) Are placed in an alias map
4) Do not expand
Thus, AliasesMap will need to support non-string values.
We're likely to use the right (or new) context lines in rendered diffs, but
it's odd that the hunks iterator choose which context hunk to return. We'll
also need both contents to calculate left/right line numbers.
Since the hunk content types are the same, I also split enum DiffHunk into
{ kind, contents } pair.
This change made some diff benches slow, maybe because the generated code
becomes slightly worse due to the added abstraction? I'll revisit the
performance problem later. There are a couple of ways to mitigate it.
```
group new old
----- --- ---
bench_diff_git_git_read_tree_c 1.02 61.0±0.23µs 1.00 59.7±0.38µs
bench_diff_lines/modified/10k 1.00 41.6±0.24ms 1.02 42.3±0.22ms
bench_diff_lines/modified/1k 1.00 3.8±0.07ms 1.00 3.8±0.03ms
bench_diff_lines/reversed/10k 1.29 23.4±0.20ms 1.00 18.2±0.26ms
bench_diff_lines/reversed/1k 1.05 517.2±5.55µs 1.00 493.7±59.72µs
bench_diff_lines/unchanged/10k 1.00 3.9±0.10ms 1.08 4.2±0.10ms
bench_diff_lines/unchanged/1k 1.01 356.8±2.33µs 1.00 353.7±1.99µs
```
(I don't get stable results on my noisy machine, so the results would vary.)