This serves the role of limit() in Mercurial. Since revsets in JJ is
(conceptually) an unordered set, a "limit" predicate should define its
ordering criteria. That's why the added predicate is named as "latest".
Closes#1110
There are no remaining places where we iterate over a revset and need
the `IndexEntry`s, so we can now make `Revset::iter()` yield
`CommitId`s instead.
One of the remaining places we depend on index positions is when
creating a `ChangeIdIndex`. This moves that into the revset engine
(which is coupled to the commit index implementation) by adding a
`Revset::change_id_index()` method. We will also use this function
later when add support for resolving change id prefixes within a small
revset.
The current implementation simply creates an in-memory index using the
existing `IdIndex` we have in `repo.rs`.
The custom implementation at Google might do the same for small
revsets that are available on the client, but for revsets involving
many commits on the server, it might use a suboptimmal implementation
that uses longer-than-necessary prefixes for performance reasons. That
can be done by querying a server-side index including changes not in
the revset, and then verifying that the resulting commits are actually
in the revset.
The function is only used in tests, so it doesn't belong in
`default_revset_engine`. Also, it's not specific to that
implementation, so I rewrote as a revset evaluation.
I'd like to be able to pass a `self` of `type `&ReadonlyRepo` to
functions that take a `&dyn Repo`. For that, we need `ReadonlyRepo`
itself to implement `Repo` instead of having `Arc<ReadonlyRepo>`
implement it. I could have solved it in a different way, but the `Arc`
requirement seems like an unnecessary constraint.
The index position is specific to the default index implementation and
we don't want to use it in outside of there. This commit removes the
use of it as a key for nodes in the graphlog.
I timed it on the git.git repo using `jj log -r 'all()' -T commit_id`
(the worst case I can think of) and it slowed down from ~2.02 s to
~2.20 s (~9%).
For large repos, it's useful to be able to use shorter change id and
commit id prefixes by resolving the prefix in a limited subset of the
repo (typically the same subset that you'd want to see in your default
log output). For very large repos, like Google's internal one, the
shortest unique prefix evaluated within the whole repo is practically
useless because it's long enough that the user would want to copy and
paste it anyway.
Mercurial supports this with its `revisions.disambiguatewithin` config
(added in https://www.mercurial-scm.org/repo/hg/rev/503f936489dd). I'd
like to add the same feature to jj. Mercurial's implementation works
by attempting to resolve the prefix in the whole repo and then, if the
prefix was ambiguous, it resolves it in the configured subset
instead. The advantage of doing it that way is that there's no extra
cost of resolving the revset defining the subset if the prefix was not
ambiguous within the whole repo. However, there are two important
reasons to do it differently in jj:
* We support very large repos using custom backends, and it's probably
cheaper to resolve a prefix within the subset because it can all be
cached on the client. Resolving the prefix within the whole repo
requires a roundtrip to the server.
* We want to be able to resolve change id prefixes, which is always
done in *some* revset. That revset is currently `all()`, i.e. all
visible commits. Even on local disk, it's probably cheaper to
resolve a small revset first and then resolve the prefix within that
than it is to build up the index of all visible change ids.
We could achieve the goal by letting each revset engine respect the
configured subset, but since the solution proposed above makes sense
also for local-disk repos, I think it's better to do it outside of the
revset engine, so all revset engines can share the code.
This commit prepares for the new functionality by moving the symbol
resolution out of `Index::evaluate_revset()`.
We want to allow customization of the revset engine, so it can query
server indexes, for example. The current revset implementation will be
our default implementation for now. What's left in the `revset` module
after this commit is mostly parsing code.
To be able to make e.g. `jj log some/path` perform well on cloud-based
repos, a custom revset engine needs to be able to see the paths to
filter by. That way it is able pass those to a server-side index. This
commit helps with that by effectively converting `jj log -r foo
some/path` into `jj log -r 'foo & file(some/path)'`.
The type doesn't seem to provide any benefit. I don't think I had a
good reason for creating it in the first place; it was probably just
unfamiliarity with Rust.
By separating the value spaces change ids and commit ids, we can
simplify lookup of a prefix. For example, if we know that a prefix is
for a change id, we don't have to try to find matching commit ids. I
think it might also help new users more quickly understand that change
ids are not commit ids.
This commit is a step towards that separation. It allows resolving
change ids by using hex digits from the back of the alphabet instead
of 0-f, so 'z'='0', 'y'='1', etc, and 'k'='f'. Thanks to @ilyagr for
the idea. The regular hex digits are still allowed.
Git's HEAD ref is similar to other refs and can logically have
conflicts just like the other refs in `git_refs`. As with the other
refs, it can happen if you run concurrent commands importing two
different updates from Git. So let's treat `git_head` the same as
`git_refs` by making it an `Option<RefTarget>`.
Add a new git.auto-local-branch config option. When set to false, a
remote-tracking branch imported from Git will not automatically create a
local branch target. This is implemented by a new GitSettings struct
that passes Git-related settings from UserSettings.
This behavior is particularly useful in a co-located jj and Git repo,
because a Git remote might have branches that are not of everyday
interest to the user, so it does not make sense to export them as local
branches in Git. E.g. https://github.com/gitster/git, the maintainer's
fork of Git, has 379 branches, most of which are topic branches kept
around for historical reasons, and Git developers wouldn't be expected
to have local branches for each remote-tracking branch.
I've preferred "working-copy commit" over "checkout" for a while
because I think it's clearer, but there were lots of places still
using "checkout". I've left "checkout" in places where it refers to
the action of updating the working copy or the working-copy commit.
- branches has the signature branches([needle]), meaning the needle is optional (branches() is equivalent to branches("")) and it matches all branches whose name contains needle as a substring
- remote_branches has the signature remote_branches([branch_needle[, remote_needle]]), meaning it can be called with no arguments, or one argument (in which case, it's similar to branches), or two arguments where the first argument matches branch names and the second argument matches remote names (similar to branches, remote_branches(), remote_branches("") and remote_branches("", "") are all equivalent)
We already have `create_random_commit()`, which returns a
`CommitBuilder`. Most callers directly write that to a
`MutableRepo`. That currently returns a `Commit`, but I'm about to
make it propagate errors from the backend. That would add an
`unwrap()` to this sequence, making it longer. Let's create a simple
helper for these callers to simplify this common pattern.
When you're done with the `CommitBuilder`, you're going to have to
call `write_to_repo()`, passing it a mutable `MutableRepo`
reference. It's a bit simpler to pass that reference when we create
the `CommitBuilder` instead, so that's what this patch does.
A drawback of passing in the mutable reference when we create the
builder is that we can't have multiple unfinished `CommitBuilder`
instance live at the same time. We don't have any such use cases yet,
and it's not hard to work around them, so I think this change is worth
it.
The next commit will introduce a newtype for -m/--message argument which
can be converted Into<String>.
Since CommitBuilder is a thin wrapper, code bloat caused by generic parameters
wouldn't matter. I have another set of commits that makes all builder methods
accept Into/IntoIterator, which will remove some of .clone() calls from tests.
I ran an upgraded Clippy on the codebase. All the changes seem to be
about using variables directly in format strings instead of passing
them as separate arguments.
This basically transforms 's1 & (f() | s2)' to
's1.iter().filter(all && f || s2)'. Still the predicate part includes "all",
the filter function doesn't need to load commit data for every entry since
's1.iter().filter(all)' is tested first. To optimize "all" predicate out,
maybe we can add a wrapper that returns '|_: &IndexEntry| true'.
Instead of inserting AsFilter(_) node, I could add a recursive is_filter()
function. That would also work so long as the height of RevsetExpression tree
is limited. I chose node insertion just for ease of snapshot testing.
Because a unary negation node '~y' is more primitive than the corresponding
difference node 'x~y', '~y' is easier to deal with while rewriting the tree.
That's the main reason to add RevsetExpression::NotIn node.
As we have a NotIn node, it makes sense to add an operator for that. This
patch reuses '~' token, which I feel intuitive since the other set operators
looks like bitwise ops. Another option is '!'.
The unary '~' operator has the highest precedence among the set operators,
but they are lower than the ranges. This might be counter intuitive, but
useful because a prefix range ':x' can be negated without parens.
Maybe we can remove the redundant infix operator 'x ~ y', but it isn't
decided yet.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
Follows up c5ed3e1477. Now change/commit ids are resolved at the same
precedence, which means there are at least three types of ambiguity.
I don't think we would need to discriminate these.
Because the use of the change id is recommended, any operation should abort
if a valid change id happens to match a commit id. We still try the commit
id lookup first as the change id lookup is more costly.
Ambiguous change/commit id is reported as AmbiguousCommitIdPrefix for now.
Maybe we can merge AmbiguousCommit/ChangeIdPrefix errors into one?
Closes#799
The CLI will load aliases from config, insert them one by one, and warn if
declaration part is invalid. That's why RevsetAliasesMap is a public struct
and needs to be instantiated by the caller.
The expression 'x ~ empty()' is identical to 'x & file(".")', but more
intuitive.
Note that 'x ~ empty()' is slower than 'x & file(".")' since the negative
intersection isn't optimized right now. I think that can be handled as
follows: 'x ~ filter(f)' -> 'x & filter(!f)' -> 'filter(!f, x)'
We currently get the hostname and username from the `whoami` crate. We
do that in lib crate, without giving the caller a way to override
them. That seems wrong since it might be used in a server and
performing operations on behalf of some other user. This commit makes
the hostname and username configurable, so the calling crate can pass
them in. If they have not been passed in, we still default to the
values from the `whoami` crate.
These calls often appear in expressions long enough that not having to
qualify it means that we can sometimes avoid wrapping a line. I
noticed because IntelliJ told me that `test_git.rs` had some
unnecessary qualificiations (the function was already imported there).
The `testutils` module should ideally not be part of the library
dependencies. Since they're used by the integration tests (and the CLI
tests), we need to move them to a separate crate to achieve that.
Since 'merges()' just filters the candidates set per item, it doesn't need
a candidates argument. Perhaps, 'merges(x)' could be a predicate to select
merge commits within a subgraph 'x', but I don't know if that would be
useful.
More workspace-derived parameters will be added, and I don't think wrapping
with Option for each makes sense because all parameters should be available
if workspace exists.
This moves the logic for handling the root commit when writing commits
from `CommitBuilder` into the individual backends. It always bothered
me a bit that the `commit::Commit` wrapper had a different idea of the
number of parents than the wrapped `backend::Commit` had.
With this change, the `LocalBackend` will now write the root commit in
the list of parents if it's there in the argument to
`write_commit()`. Note that root commit itself won't be written. The
main argument for not writing it is that we can then keep the fake
all-zeros hash for it. One argument for writing it, if we were to do
so, is that it would make the set of written objects consistent, so
any future processing of them (such as GC) doesn't have to know to
ignore the root commit in the list of parents.
We still treat the two backends the same, so the user won't be allowed
to create merges including the root commit even when using the
`LocalBackend`.
`wc_commit` seems clearer than `checkout` and not too much longer. I
considered `working_copy` but it was less clear (could be the path to
the working copy, or an instance of `WorkingCopy`). I also considered
`working_copy_commit`, but that seems a bit too long.
This will be a basic building block of 'jj log PATH'. The implementation
is naive, but works fine for small repos like jj. For mid-size repos,
there would be various areas which need to be optimized.
The `CommitBuilder::store` field is used only in
`CommitBuilder::write_to_repo()`, but we can easily get access to the
`Store` from the `repo` argument there, so let's remove the field.
We don't even have any settings that affect the repo, so there's no
point in passing the settings. I think this was a leftover from before
we separated out the "workspace" concept; now we no longer create a
working-copy commit when we initialize a repo (we do that when we
attach the workspace).
This introduces a `connected(x)` function, which is simply the same as
`x:x`. It's occasionally useful if `x` is a long expression. It's also
useful as a building block for `root(x)` (coming soon).
Most tests need a repo but don't need a working copy. Let's have a
function for setting up a test repo. But first, let's free up the name
`init_repo()` by renaming it to `init_workspace()` (which is also more
accurate).
Because we record each workspace's checkout in the repo view, we can
-- unlike other VCSs -- let the user refer to any workspace's checkout
in revsets. This patch adds syntax for that, so you can show the
contents of the checkout in workspace "foo" with `jj show foo@`. That
won't automatically commit that workspace's working copy, however.
The recent e5dd93cbf7, whose description says "cleanup: make Vec
inside CommitId etc. non-public", made all ID types in the `backend`
module *except* for `CommitId` non-public :P This patch makes
If you rewrite a commit that's also available on some remote, you'll
currently see both the old version and the new version in the view,
which means they're divergent. They're not logically divergent (the
rewritten version should replace the old version), so this is a UX
bug. I think it indicates that the set of current heads should be
redefined to be the *desired* heads. That's also what I had suspected
in the TODO removed by this change. I think another indication that
we should hide the old heads even if they have e.g. a remote branch
pointing to them is that we don't want them to be rebased if we
rewrite an ancestor.
So that's what I decided to do: let the view's heads be the desired
heads. The user can still define revsets for showing non-current
commits pointed to by e.g. remote branches.
With this change, you can do e.g. `heads(remote_branches())`. That
should currently be the same as `public_heads()`, except that we don't
yet remove public heads when remote branches have been updated. Having
this support should be generally useful, but I may use it in the short
term specifically for depending less on the public heads, until I get
around to keeping them up to date.
Now that we remove hidden heads whenever a transaction commits,
`non_obsolete_heads()` should always be the same as `all_heads()`,
except during a transaction. I don't think we depend on the difference
even during a transaction. Let's simplify a bit by removing the revset
function `all_heads()` and renaming `non_obsolete_heads()` to
`heads()`. This is part of issue #32.
There were some tests that discarded a transaction only because it
used to be easier to do that than to commit and reload the repo. We
get the new repo back when we commit the transaction these days, so
now it's often easier to commit the transaction instead.
This adds support for having conflicting git refs in the view, but we
never create conflicts yet. The `git_refs()` revset includes all "add"
sides of any conflicts. Similarly `origin/main` (for example) resolves
to all "adds" if it's conflicted (meaning that `jj co origin/main` and
many other commands will error out if `origin/main` is
conflicted). The `git_refs` template renders the reference for all
"adds" and adds a "?" as suffix for conflicted refs.
The reason I'm adding this now is not because it's high priority on
its own (it's likely extremely uncommon to run two concurrent `jj git
refresh` and *also* update refs in the underlying git repo at the same
time) but because it's a building block for the branch support I've
planned (issue #21).
This patch makes it so we attempt to resolve a symbol as the
non-obsolete commits in a change id if all other resolutions
fail.
This addresses issue #15. I decided to not require any operator for
looking up by change id. I want to make it as easy as possible to use
change ids instead of commit ids to see how well it works to interact
mostly with change ids instead of commit ids (I'll try to test that by
using it myself).
I'd like to experiment with mostly using change ids instead of commit
ids on the CLI. Then it needs to be easy to refer to the non-obsolete
commits in a change, which means we probably don't want to require any
operators (i.e. a plain change id should resolve to the non-obsolete
commits in the change). This patch prepares for letting a change id
resolve to (possibly) many commits.
This adds a `git_refs()` revset that includes all commits pointed to
by a git ref. It's not very useful yet because the graph log doesn't
use the right type of edges for non-contiguous commits.
This lets you use the same operator as we currently have for ancestors
and descendants (`,,`) to also specify a DAG range. That's what
Mercurial uses the `::` operator for and what Git has `git log
--ancestry-path` for.
I really liked the idea of having the operators for parents and
ancestors (etc.) look similar, but that turned out to be problematic
when we want to add an infix operator for a DAG range (hg's `::`
revset operator and git's `--ancestry-path` flag). Let's say we chose
`:*:` as the operator. Part of the problem is how to parse `foo:*:bar`
without eagerly parsing the `foo:`. It would also be nicer to use
exactly the same operator as prefix, postfix, and infix. Since the
"parents" operator can be repeated, we can't have it be just `:` and
the "ancestors" operator be `::`. We could make the "ancestors"
operator be something like `*:*` (or anything symmetric with the `:`
symbol on the inside). However, at that point, the operator is getting
ugly and hard to type. Another option would be to use `:` for
ancestors and `::` for parents, but that is counterintuitive and get
annoying if you want to repeat it. So it seems that the best option is
to simply pick different symbols for parents/children and
ancestors/descendants/range.
This patch changes the ancestors/descendants operators to both be
`,,`. I'm not at all attached to that particular symbol. I suspect
we'll change it later.
This adds `children(<set>)` and `<set>:` for the children of the given
set, and `descendants(<set>)` and `<set>:*` for the descendants of the
given set. The children and descendants are filtered to be among
ancestors of non-obsolete commits. I haven't added a way of overriding
that yet.
The tests don't need any complex set up (no repo necessary), so they
can be in the `revset` module itself. I'm sure we'll need to split up
that module later (at least separate out the parsing), but that's a
separate problem.
This change adds a `non_obsolete_heads(<set>)` revset, which walks up
ancestors of the input set until it gets to a non-obsolete and
non-pruned commit. That's what we do by default in `jj log`
(i.e. without `--all`). Now we can make `jj log` use revsets and teach
it a `-r` option!
This adds `parents(foo)` and `ancestors(foo)` as alternative ways of
writing `:foo` and `*:foo`.
I haven't added support for for whitespace yet; the parsing is very
strict. The error messages will also need to be improved later.
This patch adds initial support for a DSL for specifying revisions
inspired by Mercurial's "revset" language. The initial support
includes prefix operators ":" (parents) and "*:" (ancestors) with
naive parsing of the revsets. Mercurial uses postfix operator "^" for
parent 1 just like Git does. It uses prefix operator "::" for
ancestors and the same operator as postfix operator for descendants. I
did it differently because I like the idea of using the same operator
as prefix/postfix depending on desired direction, so I wanted to apply
that to parents/children as well (and for
predecessors/successors). The "*" in the "*:" operator is copied from
regular expression syntax. Let's see how it works out. This is an
experimental VCS, after all.
I've updated the CLI to use the new revset support.
The implementation feels a little messy, but you have to start
somewhere...