This command is similar to Mercurial's revset benchmarking command. It
lets you pass in a file containing revsets. I also included a file
with some revsets to test on the git.git repo. I put it in `testing/`,
which doesn't seem perfect. I'm happy to hear suggestions for better
places, or we can move it later if we find a better place.
Note that these tests don't clear caches between each run (or even
between tests), so revsets that rely on filtering commit data that's
not indexed appear faster than they typically are in reality.
I suspect the `jj bench walkrevs` command was from before we had
support for revsets. Now there doesn't seem to be any reason to have a
specific command for only range revsets (`foo..bar`), so let's replace
it by a command for benchmarking an arbitrary revset.
The `jj bench` commands are mostly meant for developers, so let's hide
the command from help and behind a `bench` feature flag. The feature
flag avoids bloating the binary with the `criterion` dependencies,
which was the reason I removed the command in 18c0b97d9d.
This just backs out commit 18c0b97d9d without making any changes,
except for resolving conflicts.
I want a way to benchmark different revsets on e.g. the Git Core repo
or the Linux repo.
There are no remaining places where we iterate over a revset and need
the `IndexEntry`s, so we can now make `Revset::iter()` yield
`CommitId`s instead.
There should be no problem evaluating the revset against base_repo and
collecting commit objects from (mut_)repo, but it seemed a bit odd.
In rebase examples other than the "new --insert-after", we could switch to
tx.repo(). However, I think the use of tx.base_repo() makes it clear that
there's no data dependency on the previous mutation.
I'd like to be able to pass a `self` of type `&ReadonlyRepo` to
functions that take a `&dyn Repo`. For that, we need `ReadonlyRepo`
itself to implement `Repo` instead of having `Arc<ReadonlyRepo>`
implement it. I could have solved it in a different way, but the `Arc`
requirement seems like an unnecessary constraint.
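As a rough sketch of why the `Arc` requirement is a constraint (the trait and struct below are simplified, hypothetical stand-ins, not the real `Repo` and `ReadonlyRepo` definitions): once the trait is implemented on the struct itself, a method that only has `&self` can hand itself to a function taking `&dyn Repo`, and an `Arc<ReadonlyRepo>` still works through a deref.

```rust
use std::sync::Arc;

// Simplified stand-in for the real trait; `name()` is a hypothetical method.
trait Repo {
    fn name(&self) -> String;
}

// Simplified stand-in for the real struct.
struct ReadonlyRepo {
    name: String,
}

// Implemented on the struct itself rather than on `Arc<ReadonlyRepo>`.
impl Repo for ReadonlyRepo {
    fn name(&self) -> String {
        self.name.clone()
    }
}

fn describe(repo: &dyn Repo) -> String {
    format!("repo: {}", repo.name())
}

impl ReadonlyRepo {
    // `self` is a plain `&ReadonlyRepo` here, and it coerces to `&dyn Repo`.
    fn describe_self(&self) -> String {
        describe(self)
    }
}

// Callers that hold an `Arc` only need to deref it.
fn describe_shared(repo: &Arc<ReadonlyRepo>) -> String {
    describe(&**repo)
}
```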
In most cases, we just need to access the commit backend and then the
shorter `base()` works. I noticed because I wanted to implement `Repo`
on `ReadonlyRepo` instead of on `Arc<ReadonlyRepo>` and then these
uses failed.
The index position is specific to the default index implementation and
we don't want to use it outside of there. This commit removes the
use of it as a key for nodes in the graphlog.
I timed it on the git.git repo using `jj log -r 'all()' -T commit_id`
(the worst case I can think of) and it slowed down from ~2.02 s to
~2.20 s (~9%).
I think requests to reset the author came up twice in the last week,
so let's just add support for it. I copied git's behavior of resetting
the name, email, and timestamp. The flag name is also from git.
We need 1.64 to bump `clap` to `4.1`. We don't really need to upgrade
to that, but being on an older version causes minor confusions like
#1393. Rust 1.64 is very close to 6 months old at this point.
For large repos, it's useful to be able to use shorter change id and
commit id prefixes by resolving the prefix in a limited subset of the
repo (typically the same subset that you'd want to see in your default
log output). For very large repos, like Google's internal one, the
shortest unique prefix evaluated within the whole repo is practically
useless because it's long enough that the user would want to copy and
paste it anyway.
Mercurial supports this with its `revisions.disambiguatewithin` config
(added in https://www.mercurial-scm.org/repo/hg/rev/503f936489dd). I'd
like to add the same feature to jj. Mercurial's implementation works
by attempting to resolve the prefix in the whole repo and then, if the
prefix was ambiguous, it resolves it in the configured subset
instead. The advantage of doing it that way is that there's no extra
cost of resolving the revset defining the subset if the prefix was not
ambiguous within the whole repo. However, there are two important
reasons to do it differently in jj:
* We support very large repos using custom backends, and it's probably
cheaper to resolve a prefix within the subset because it can all be
cached on the client. Resolving the prefix within the whole repo
requires a roundtrip to the server.
* We want to be able to resolve change id prefixes, which is always
done in *some* revset. That revset is currently `all()`, i.e. all
visible commits. Even on local disk, it's probably cheaper to
resolve a small revset first and then resolve the prefix within that
than it is to build up the index of all visible change ids.
We could achieve the goal by letting each revset engine respect the
configured subset, but since the solution proposed above makes sense
also for local-disk repos, I think it's better to do it outside of the
revset engine, so all revset engines can share the code.
This commit prepares for the new functionality by moving the symbol
resolution out of `Index::evaluate_revset()`.
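To make the intended lookup order concrete, here is a minimal sketch over plain hex strings. All names are hypothetical, and the fallback to the whole repo when the subset has no match is an assumption of this sketch, not something decided above.

```rust
// Outcome of resolving a prefix within some set of ids (hypothetical type).
#[derive(Debug, PartialEq)]
enum PrefixResolution {
    NoMatch,
    SingleMatch(String),
    Ambiguous,
}

// Resolve `prefix` within `ids` by linear scan; a real index would do better.
fn resolve_prefix(prefix: &str, ids: &[&str]) -> PrefixResolution {
    let mut matches = ids.iter().filter(|id| id.starts_with(prefix));
    match (matches.next(), matches.next()) {
        (None, _) => PrefixResolution::NoMatch,
        (Some(id), None) => PrefixResolution::SingleMatch((*id).to_string()),
        (Some(_), Some(_)) => PrefixResolution::Ambiguous,
    }
}

// Try the small, cheap subset first. Falling back to the whole repo on
// "no match" is an assumption made for this example.
fn resolve_in_subset_first(prefix: &str, subset: &[&str], all: &[&str]) -> PrefixResolution {
    match resolve_prefix(prefix, subset) {
        PrefixResolution::NoMatch => resolve_prefix(prefix, all),
        found => found,
    }
}
```

With ids `abc123`, `abd456`, and `ab9999` in the whole repo and only `abd456` in the subset, the prefix `ab` is ambiguous in the whole repo but resolves to `abd456` when the subset is consulted first.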
The callers don't need to hold on to the revset expression once it's
been evaluated, and having an owned expression (well, an expression
with shared ownership) will avoid a clone in the next commit.
A mapped template is basically a combined function that takes context: &C,
extracts Vec<O>, and formats each item with Template<C>. It cannot be cleanly
turned into a function of (&C) -> Vec<Template<()>> type. So list-like methods
are implemented on Box<dyn ListTemplate<C>> instead.
I'm going to add a trait that provides .join() -> Box<dyn Template>.
wrap_template() should handle it transparently, but the current interface
would require excessive boxing.
This involves a little hack to insert a lambda parameter 'x' to be used at
keyword position. If the template language were dynamically typed (and were
interpreted), the .map() implementation would be simpler. I considered that, but
the interpreter version has its own warts (late error reporting, static objects
that are hard to cache, etc.), and I don't think the current template engine is
complex enough to rewrite from scratch.
.map() returns a template, which can't be join()-ed. This will be fixed later.
A lambda expression will be allowed only in .map() operation. The syntax is
borrowed from Rust closure.
In Mercurial, a map operation is implemented by context substitution. For
example, 'parents % "{node}"' prints parents[i].node for each. There are two
major problems: 1. the top-level context cannot be referred to from the inner
map expression. 2. context of different types inserts arbitrarily-named keywords
(e.g. a dict type inserts "{key}" and "{value}", but how could we know?)
These issues should be avoided by using explicitly named parameters.
  parents.map(|parent| parent.commit_id ++ " " ++ commit_id)
                                                  ^^^^^^^^^ global keyword
A downside is that we can't reuse a template fragment in a map expression. Suppose
we have -T commit_summary; -T 'parents.map(commit_summary)' doesn't work.
# only usable as a top-level template
'commit_summary' = 'commit_id.short() ++ " " ++ description.first_line()'
Another problem is that a lambda expression might be confused with an alias
function.
# .map(f) doesn't work, but .map(g) does
'f(x)' = 'x'
'g' = '|x| x'
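Since the syntax is borrowed from Rust closures, the scoping argument can be shown in plain Rust. This is only an analogy with a hypothetical, simplified `Commit` type, not template engine code: the closure parameter names the item, and the outer context stays reachable inside the closure.

```rust
// Hypothetical, simplified commit type for illustration only.
struct Commit {
    commit_id: String,
    parents: Vec<Commit>,
}

// Rust analogue of
//   parents.map(|parent| parent.commit_id ++ " " ++ commit_id)
// where the trailing `commit_id` refers to the outer (top-level) commit.
fn render_parents(commit: &Commit) -> Vec<String> {
    commit
        .parents
        .iter()
        .map(|parent| format!("{} {}", parent.commit_id, commit.commit_id))
        .collect()
}
```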
The `jj debug` commands are hidden from help and are described as
"Low-level commands not intended for users", but e.g. `jj debug
completion` is intended for users, and should be visible in the help
output.
By using one letter for the path type before and one letter for path
type after, we can encode much more information than just the current
'M'/'A'/'R'. In particular, we can indicate new and resolved
conflicts. The color still encodes the same information as before. The
output looks a bit weird after many years of using `hg status`. It's a
bit more similar to the `git status -s` format with one letter for the
index and one with the working copy. Will we get used to it and find
it useful?
I'm going to add a lambda expression, and the current type-error message
wouldn't work for the lambda type. I also renamed "argument" to "expression"
as the expect_<type>() helper may be called against any expression node.
This is similar to the structure of RevsetParseError. It's unlikely we would
need to discriminate parsing errors, so let's avoid wasting time on naming
things.
In the templater, it's easier to handle an invalid format string at the parsing stage, so
I want to build formatting items upfront. Since the formatting items borrow
the input string by reference, we need to manually convert them to the owned
variants.
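The borrowed-to-owned conversion can be sketched with a toy format parser. This is not the chrono API; the `Item`/`OwnedItem` types and the `%x` syntax handling below are hypothetical stand-ins that only illustrate parsing eagerly (so errors surface at parse time) and then detaching the result from the input string.

```rust
// Borrowed item, as a parser over `&str` would naturally produce.
#[derive(Debug, PartialEq)]
enum Item<'a> {
    Literal(&'a str),
    Spec(char),
}

// Owned counterpart that can outlive the input string and be cached.
#[derive(Debug, PartialEq)]
enum OwnedItem {
    Literal(String),
    Spec(char),
}

// Parse the whole format string upfront so that an invalid string is
// reported here rather than when the first timestamp is formatted.
fn parse_format(format: &str) -> Result<Vec<Item<'_>>, String> {
    let mut items = Vec::new();
    let mut rest = format;
    while !rest.is_empty() {
        if let Some(after) = rest.strip_prefix('%') {
            let spec = after
                .chars()
                .next()
                .ok_or_else(|| "dangling '%' at end of format".to_string())?;
            items.push(Item::Spec(spec));
            rest = &after[spec.len_utf8()..];
        } else {
            let end = rest.find('%').unwrap_or(rest.len());
            items.push(Item::Literal(&rest[..end]));
            rest = &rest[end..];
        }
    }
    Ok(items)
}

// Manual conversion from the borrowed items to the owned variants.
fn into_owned(items: Vec<Item<'_>>) -> Vec<OwnedItem> {
    items
        .into_iter()
        .map(|item| match item {
            Item::Literal(text) => OwnedItem::Literal(text.to_owned()),
            Item::Spec(spec) => OwnedItem::Spec(spec),
        })
        .collect()
}
```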
While measuring the overhead of the interpreter version of the template engine, I
noticed the templater spends some time in chrono. I don't think this would
matter in practice, but it's easy to cache the formatting items.
% jj log -r'all()' -T'".\n"' --no-graph | wc -l
2996
% hyperfine --warmup 3 --runs 20 "jj log --ignore-working-copy -r 'all()' -Tshow --no-graph"
(original)
Time (mean ± σ): 120.0 ms ± 18.7 ms [User: 97.5 ms, System: 22.5 ms]
Range (min … max): 96.7 ms … 144.1 ms 20 runs
(new)
Time (mean ± σ): 106.2 ms ± 12.3 ms [User: 86.1 ms, System: 20.1 ms]
Range (min … max): 96.3 ms … 130.4 ms 20 runs
Regarding the template engine rewrites, I'm not yet sure that the interpreter
version is strictly better. It's simpler, but could make some caching story
difficult. So I'm not gonna replace the engine anytime soon.
We want to allow custom revset engines to define their own graph
iterator. This commit helps with that by adding a
`Revset::iter_graph()` function that returns an abstract iterator.
The current `RevsetGraphIterator` can be configured to skip or include
transitive edges. It skips them by default and we don't expose an option
in the CLI. I didn't bother including that functionality in the new
`iter_graph()` either. At least for now, it will be up to the
implementation whether it includes such edges (it would of course be
free to ignore the caller's request even if we added an option for it
in the API).
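The shape of such an abstract iterator might look roughly like the following. The types and the signature are simplified assumptions for illustration (the real `Revset` trait and its item type differ); the point is only that callers see a boxed iterator and the implementation decides which edges to produce.

```rust
// Simplified, hypothetical stand-ins for the real types.
type CommitId = String;

#[derive(Debug, PartialEq)]
struct GraphEdge {
    target: CommitId,
}

trait Revset {
    // Callers only see an abstract iterator of (node, edges) pairs.
    fn iter_graph<'a>(&'a self) -> Box<dyn Iterator<Item = (CommitId, Vec<GraphEdge>)> + 'a>;
}

// Toy implementation: a linear history listed newest first.
struct LinearRevset {
    ids: Vec<CommitId>,
}

impl Revset for LinearRevset {
    fn iter_graph<'a>(&'a self) -> Box<dyn Iterator<Item = (CommitId, Vec<GraphEdge>)> + 'a> {
        Box::new(self.ids.iter().enumerate().map(move |(i, id)| {
            // Each commit has an edge to the next (older) one, if any.
            let edges: Vec<GraphEdge> = self
                .ids
                .get(i + 1)
                .map(|parent| GraphEdge { target: parent.clone() })
                .into_iter()
                .collect();
            (id.clone(), edges)
        }))
    }
}
```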
@joyously found `o` confusing because it's a valid change id prefix. I
don't have much preference, but `●` seems fine. The "ascii",
"ascii-large", and "legacy" graph styles still use "o".
I didn't change `@` since it seems useful to have that match the
symbol used on the CLI. I don't think we want to have users do
something like `jj co ◎-`.
I'm about to make the default (non-working-copy) node symbol be a
unicode symbol, but we only want that when using a unicode graph, so
users with a terminal that doesn't support unicode can get plain ASCII
output by setting e.g. `ui.graph.style = "ascii"`.
We don't want custom index implementations to have to conform to the
same kind of stats as the default implementation. This commit also
makes the command error out on non-default index types.
I broke the commands in a27da7d8d5 and thought I just fixed it in
c7cf914694a8. However, as I added a test, I realized that I made it
only reindex the commits since the previous operation. I meant for the
command to do a full reindexing of the repo. This fixes that.
I'm thinking of rewriting the evaluation part as a simple interpreter. It
will increase the runtime cost (about a few microseconds per entry I suppose),
but will greatly reduce the complexity of generic property function chaining.
The extracted template_builder module is the part I'm going to reimplement.
I broke `jj debug reindex` in a27da7d8d5. From that commit, we no
longer delete the pointer to the old index, so nothing happens when we
reload the index. This commit fixes that, and also makes the command
error out if run on a repo with a non-default index type.
Not all index implementations may want to store the readonly index
implementation in an Arc. Exposing the Arc in the interface is also
problematic because `Arc<IndexImpl>` cannot be cast to `Arc<dyn
Index>`.
Unlike Mercurial, this isn't a template keyword/function, but a config knob.
Exposing graph_width to templater wouldn't be easy, and I don't think it's
better to handle terminal wrapping in template.
I'm not sure if patch content should be wrapped, so this option only applies
to the template output for now.
Closes #1043
I'm going to add $COLUMNS override, and it should work even if ioctl() on tty
failed. This means that the return type has to be (Option<u16>, Option<u16>).
Since we don't use the row count, I decided to drop it.
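A minimal sketch of the resulting lookup, with hypothetical function names: a valid `$COLUMNS` value wins, and otherwise the result of the tty query (which may have failed) is used. Treating an unparsable `$COLUMNS` as absent is an assumption of this sketch.

```rust
// Pure helper so the precedence is easy to see and test.
fn term_width(columns_env: Option<&str>, tty_width: Option<u16>) -> Option<u16> {
    columns_env
        .and_then(|value| value.trim().parse::<u16>().ok())
        .or(tty_width)
}

// Thin wrapper that reads the real environment variable. `tty_width` is
// whatever the ioctl()-based query returned, or `None` if it failed.
fn term_width_from_env(tty_width: Option<u16>) -> Option<u16> {
    let columns = std::env::var("COLUMNS").ok();
    term_width(columns.as_deref(), tty_width)
}
```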
The parameter order follows indent()/label() functions, but this might be
a bad idea because fill() is more likely to have optional parameters. We can
instead add template.fill(width) method as well as .indent(prefix). If we take
this approach, we'll probably need to add string.fill()/indent() methods,
and/or implicit cast at method resolution. The good thing about the method
syntax is that we can add string.refill(), etc. for free, without inventing
generic labeled template functions.
For #1043, I think it's better to add a config like ui.log-word-wrap = true.
We could add term_width/graph_width keywords to the templater, but the
implementation would be more complicated, and is difficult to use for the
basic use case. Unlike Mercurial, our templater doesn't have a context map
to override the graph_width stub.
wrap_bytes() is similar to textwrap::wrap(), but can process arbitrary bytes.
More importantly, it guarantees that byte offsets can be reconstructed from
the split slices. This allows us to interleave push/pop_label()s with split
text fragments.
We could calculate byte offsets upfront, but using slice API is more
convenient. That's why I didn't add inner function returning Vec<Range>.
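The offset guarantee can be illustrated with a much simplified wrapper. This is not the real wrap_bytes(): it only splits on ASCII spaces, counts width in bytes, and never breaks a long word. It does show that every returned line is a subslice of the input, so its byte offset can be recovered afterwards.

```rust
// Greedy wrap over bytes. Every returned line borrows from `text`.
fn wrap_bytes(text: &[u8], width: usize) -> Vec<&[u8]> {
    let mut lines = Vec::new();
    let mut line_start = 0; // start of the current line
    let mut line_end = 0; // end of the last word placed on the current line
    let mut has_word = false;
    let mut pos = 0;
    while pos < text.len() {
        if text[pos] == b' ' {
            pos += 1;
            continue;
        }
        let word_start = pos;
        while pos < text.len() && text[pos] != b' ' {
            pos += 1;
        }
        if !has_word {
            line_start = word_start;
            has_word = true;
        } else if pos - line_start > width {
            // The word doesn't fit: close the current line, start a new one.
            lines.push(&text[line_start..line_end]);
            line_start = word_start;
        }
        line_end = pos;
    }
    if has_word {
        lines.push(&text[line_start..line_end]);
    }
    lines
}

// Recover the byte offset of `slice` within `base`. Only meaningful if
// `slice` really is a subslice of `base`.
fn offset_in(base: &[u8], slice: &[u8]) -> usize {
    slice.as_ptr() as usize - base.as_ptr() as usize
}
```

For the input `aaa bb cc` and a width of 6, this yields the lines `aaa bb` and `cc` at byte offsets 0 and 7.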
The new word-wrap function will be implemented in two passes. The first pass
splits the byte slice into lines, and the second pass inserts "\n" based on that
while interleaving push/pop_label() calls and text fragments. Since the second
pass combines multiple data sources, byte indices are more convenient than
slices there.
It's getting confusing since we now have a list property type.
expand/normalize_list() functions aren't renamed since they are also applied
to a list of function arguments.
A list type isn't so useful without a map operation, but List<CommitId>
is at least printable. Maybe we can experiment with it to craft a map
operation.
If a map operation is introduced, this keyword might be replaced with
"parents.map(|commit| commit.commit_id)", where parents is of List<Commit>
type, and the .map() method will probably return List<Template>.
The argument order is different from Mercurial's indent() function. I think
indent(prefix, content) is more readable for lengthy content. However,
indent(content, prefix, ...) might be better if we want to add an optional
firstline_prefix argument.
Template functions like indent() or fill() need to manipulate labeled
output. Since indent() is line oriented, it could be implemented as a
post-processing filter. OTOH, fill()/wrap() inserts additional "\n"s. If we
do that as a post process, colorized text could be split into multiple lines,
and would mess up graph log output. By using FormatRecorder, we can apply
text formatting in between labels.
I thought we could disallow text wrapping of labeled template fragments, but
the example in #1043 suggests that we do want to wrap(whole_template_output)
rather than simple description.wrap().
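The record-then-replay idea can be sketched as follows. The `Op` type and the `<label>`/`</>` rendering are hypothetical simplifications (the real FormatRecorder works against a formatter, not a string); the sketch only shows how a text transformation, here an indent, can be applied to the text fragments while label operations are replayed in their original positions.

```rust
// Recorded formatter operations (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    PushLabel(String),
    PopLabel,
    Text(String),
}

// Replay recorded ops, inserting `prefix` at the start of each line.
// Labels are rendered as `<name>` and `</>` purely for visualization.
fn replay_indented(ops: &[Op], prefix: &str) -> String {
    let mut out = String::new();
    let mut at_line_start = true;
    for op in ops {
        match op {
            Op::PushLabel(label) => {
                out.push('<');
                out.push_str(label);
                out.push('>');
            }
            Op::PopLabel => out.push_str("</>"),
            Op::Text(text) => {
                // Line state carries over between fragments, so a line split
                // across several labeled fragments is prefixed only once.
                for line in text.split_inclusive('\n') {
                    if at_line_start {
                        out.push_str(prefix);
                    }
                    out.push_str(line);
                    at_line_start = line.ends_with('\n');
                }
            }
        }
    }
    out
}
```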
In `git_fetch()`, any glob present in `globs` is an "allow" mark. Using
`&[]` to represent an "allow-all" may be misleading, as it could
indicate that no branch (only the git HEAD) should be fetched.
By using an `Option<&[&str]>`, it is clearer that `None` means that
all branches are fetched.
Using &[String] forces the caller to materialize owned strings if they
have only references, which is costly. Using &[&str] makes it cheap
if the caller owns strings as well.
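The difference between `None` and an empty slice can be sketched like this. The helper names are hypothetical and the matcher only understands a trailing `*`, which is a simplification for the example and not how `git_fetch()` matches globs.

```rust
// Toy glob matcher: only a trailing `*` wildcard is supported here.
fn glob_matches(glob: &str, branch: &str) -> bool {
    match glob.strip_suffix('*') {
        Some(prefix) => branch.starts_with(prefix),
        None => glob == branch,
    }
}

// `None` means "fetch all branches"; `Some(&[])` allows nothing.
fn should_fetch(branch: &str, globs: Option<&[&str]>) -> bool {
    match globs {
        None => true,
        Some(globs) => globs.iter().any(|glob| glob_matches(glob, branch)),
    }
}

// A caller that owns `String`s can borrow them without cloning the data.
fn borrow_all(owned: &[String]) -> Vec<&str> {
    owned.iter().map(String::as_str).collect()
}
```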
This eliminates ambiguous parsing between "func()" and "expr ()".
I chose "++" as template concatenation operator in case we want to add
bit-wise negate operator. It's also easier to find/replace than "~".
Here we know each field will never be empty, but separate(" ", foo, bar)
looks slightly better than 'foo ++ " " ++ bar'. Implicit template concatenation
will be disabled soon.
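For comparison, here is a string-based sketch of the two forms. It assumes separate() skips empty fields, which is what the remark about fields never being empty suggests; the function below is a hypothetical stand-in operating on plain strings rather than templates.

```rust
// Join the non-empty parts with `sep`.
fn separate(sep: &str, parts: &[&str]) -> String {
    parts
        .iter()
        .copied()
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join(sep)
}

// Plain concatenation, the equivalent of 'foo ++ " " ++ bar'.
fn concat_with_space(foo: &str, bar: &str) -> String {
    format!("{foo} {bar}")
}
```

When both fields are non-empty the two agree; they differ only when a field is empty, where plain concatenation leaves a stray separator.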
To be able to make e.g. `jj log some/path` perform well on cloud-based
repos, a custom revset engine needs to be able to see the paths to
filter by. That way it is able to pass those to a server-side index. This
commit helps with that by effectively converting `jj log -r foo
some/path` into `jj log -r 'foo & file(some/path)'`.
Since there's no easy API to snapshot the stale working copy without releasing
the lock, we have to compare the tree ids after reacquiring the lock. We could
instead manually snapshot and rebase the working-copy commit, but that would
require more copy-pasted code.
Closes #1310
I plan to make `RepoLoader::init()` return a `Result`, which means
that `WorkspaceLoader::load()` will need to return more kinds of
errors. Making it return `WorkspaceLoadError` is a good start. By also
extracting a function for converting `WorkspaceLoadError` to
`CommandError`, we can reuse the handling of `PathError` in
`cli_util`.
So the caller can print a commit summary.
It's getting less clear why cli_util::update_working_copy() takes a repo
argument. It might be better to extract a helper struct that operates on
repo + workspace (minus CLI stuff), and move it to the lib crate.
The outermost "op-log" label isn't moved to the default template. I think
it belongs to the command's formatter rather than the template.
Old bikeshedding items:
- "current_head", "is_head", or "is_head_op"
=> renamed to "current_operation"
- "templates.op-log" vs "templates.op_log" (the whole template is labeled
as "op-log")
=> renamed to "op_log"
- "template-aliases.'format_operation_duration(time_range)'"
=> renamed to 'format_time_range(time_range)'
The type doesn't seem to provide any benefit. I don't think I had a
good reason for creating it in the first place; it was probably just
unfamiliarity with Rust.
I was thinking of replacing `RevsetIterator` by a regular
`Iterator<Item=IndexEntry>`. However, that would make it easier to
pass in an iterator that produces revisions in a non-topological order
into `RevsetGraphIterator`, which would produce unexpected results (it
would result in nodes that are not connected to their parents, if
their parents had already been emitted). I think it makes sense to
instead pass in a revset into `RevsetGraphIterator`.
Incidentally, it will also be useful to have the full revset available
in `RevsetGraphIterator` if we rewrite the algorithm to be more
similar to Mercurial's and Sapling's algorithm, which involves asking
the revset if it contains parent revisions.
This basically undoes d6c6cdb45c "templater: store type-erased version of
commit/change id." Since they are looked up differently, they should preserve
the original types.
FWIW, I'm thinking of making the repo parameter generic over Arc<ReadonlyRepo>
and &MutableRepo. It will allow us to cache a parsed commit_summary template.
Now it's ready to split template_parser/templater into base template functions
and "commit" templater. I think Signature and Timestamp are basic types, so
they aren't moved to CommitTemplatePropertyKind. Perhaps, a duration type from
OpTemplate will also be added to CoreTemplatePropertyKind.
The idea is that a derived language will do wrap_<core_type>() as
DerivedProperty::Core(CoreProperty::<Type>(property)). This could be dealt
with by some From<CoreProperty> trait impls, but the resulting code looked like
a mess, and compile errors would be cryptic. I think this is somewhat similar
to serde::Serializer API.
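The wrapping scheme might look roughly like this. The enum variants, the trait, and its methods are simplified, hypothetical stand-ins for illustration; only the nesting pattern DerivedProperty::Core(CoreProperty::<Type>(property)) is taken from the description above.

```rust
// Property kinds known to the core template language (simplified).
#[derive(Debug, PartialEq)]
enum CoreProperty {
    String(String),
    Integer(i64),
}

// Property kinds of a derived language, embedding the core ones.
#[derive(Debug, PartialEq)]
enum CommitProperty {
    Core(CoreProperty),
    CommitId(String),
}

// The core builder calls wrap_<core_type>() without knowing the concrete
// property enum of the derived language.
trait TemplateLanguage {
    type Property;
    fn wrap_string(&self, value: String) -> Self::Property;
    fn wrap_integer(&self, value: i64) -> Self::Property;
}

struct CommitTemplateLanguage;

impl TemplateLanguage for CommitTemplateLanguage {
    type Property = CommitProperty;

    fn wrap_string(&self, value: String) -> CommitProperty {
        CommitProperty::Core(CoreProperty::String(value))
    }

    fn wrap_integer(&self, value: i64) -> CommitProperty {
        CommitProperty::Core(CoreProperty::Integer(value))
    }
}

impl CommitTemplateLanguage {
    // Types added by the derived language get their own wrap functions.
    fn wrap_commit_id(&self, value: String) -> CommitProperty {
        CommitProperty::CommitId(value)
    }
}
```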
I also rejected the idea of abstracting property types over Box<dyn>. Maybe
it's okay for method dispatching and extraction of some basic types, but it
wouldn't work if we want to implement comparison operators for any compatible
types.
wrap_commit_or_change_id() and wrap_shortest_id_prefix() will be moved to
the CommitTemplateLanguage. I'll add impl_wrap_fns() macro after splitting
the modules.
The "core" template parser wouldn't know how to dispatch property of types
added by a derived language. For example, CommitOrChangeId/ShortestIdPrefix
will be moved to the "commit" templater.
This trait will provide ways to dispatch keyword/method nodes, and wrap
TemplateProperty object with a dedicated "Property" enum.
build_keyword() and context parameter "I"/"C" have been migrated to it.