This PR ships a series of optimizations for the semantic search engine,
mostly focused on removing invalid states, optimizing requests to
OpenAI, and reducing token usage.
Release Notes (Preview-Only):
- Added eager incremental indexing in the background on a debounce.
- Added a local embeddings cache to reduce redundant calls to OpenAI.
- Moved to an Embeddings Queue model, which ensures optimal batch sizes
at the token level and atomic file & document writes (see the sketch
after these notes).
- Adjusted OpenAI Embedding API requests to use the provided backoff
delays during rate limiting.
- Removed flush races between the file-parsing step and the embedding
queue steps.
- Moved truncation to the parsing step, reducing the probability that
OpenAI encounters bad data.
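As a rough illustration of the queue's token-level batching, here is a
minimal sketch; the `Span` and `EmbeddingQueue` names and the
`max_batch_tokens` budget are assumptions for illustration, not the
actual types in this PR.

```rust
// Hedged sketch: flush the pending batch before adding a span would push
// the request over a per-request token budget, so each request to the
// embedding provider stays near the optimal batch size.
struct Span {
    text: String,
    token_count: usize,
}

#[derive(Default)]
struct EmbeddingQueue {
    pending: Vec<Span>,
    pending_tokens: usize,
    max_batch_tokens: usize,
}

impl EmbeddingQueue {
    /// Enqueue a span, returning a full batch when one becomes ready.
    fn push(&mut self, span: Span) -> Option<Vec<Span>> {
        let mut ready = None;
        if !self.pending.is_empty()
            && self.pending_tokens + span.token_count > self.max_batch_tokens
        {
            ready = Some(std::mem::take(&mut self.pending));
            self.pending_tokens = 0;
        }
        self.pending_tokens += span.token_count;
        self.pending.push(span);
        ready
    }

    /// Flush whatever remains (e.g. when indexing finishes).
    fn flush(&mut self) -> Vec<Span> {
        self.pending_tokens = 0;
        std::mem::take(&mut self.pending)
    }
}
```

With `max_batch_tokens` set to the provider's per-request limit, each
returned batch can be sent as a single embeddings request.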
This should have no user-visible impact.
For vim `.` to repeat, it's important that actions are replayable.
Currently, `editor::MoveDown` *sometimes* moves the cursor down and
*sometimes* selects the next completion. For replay, we need to be able
to separate the two.
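A minimal sketch of that split, using hypothetical names rather than the
editor's real actions: resolve the editor state into a distinct action
up front, so the recording for `.` captures which behavior actually ran.

```rust
// Hedged sketch (hypothetical names): instead of one action that behaves
// differently depending on editor state, resolve the state into a
// distinct, replayable action before it is recorded.
enum ReplayableAction {
    MoveDown,
    SelectNextCompletion,
}

fn resolve_down_key(completion_menu_open: bool) -> ReplayableAction {
    // Decide before dispatch: the recorded action then replays the same
    // behavior even if the completion menu state differs at replay time.
    if completion_menu_open {
        ReplayableAction::SelectNextCompletion
    } else {
        ReplayableAction::MoveDown
    }
}
```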
Because of the way we set up tools that add rows inside the toolbar, it
is complicated to tighten up the spacing inside the toolbar. This PR
just reverts the changes I made previously. To let tools of non-equal
height descend from the toolbar, we'll need to properly add rows below
it instead of rendering search inside of it.
Release Notes:
- Preview: Fixed an issue where search filters were partially cut off
in the UI.
Fixes `movement::find_boundary` to work on the buffer, not on display
points.
The user-visible impact is that the "until end of word" commands now
correctly go to the end of a soft-wrapped word (instead of to the first
character of the wrapped line).
It also fixes a bug where the callback passed to these methods was
called with the content of inlay hints.
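A rough sketch of the approach, with a simplified signature rather than
the real `movement::find_boundary` one: scanning the underlying buffer
text means soft-wrap points and inlay hints never appear in the
characters passed to the callback.

```rust
// Hedged sketch (simplified signature): scan consecutive characters of
// the buffer text, not the display text, so wraps and inlay hints are
// invisible to the boundary callback. `start` is a char-aligned byte
// offset into `buffer_text`.
fn find_boundary(
    buffer_text: &str,
    start: usize,
    mut is_boundary: impl FnMut(char, char) -> bool,
) -> usize {
    let mut prev = None;
    for (offset, ch) in buffer_text[start..].char_indices() {
        if let Some(prev_ch) = prev {
            // The boundary lies between `prev_ch` and `ch`; return the
            // buffer offset of `ch`.
            if is_boundary(prev_ch, ch) {
                return start + offset;
            }
        }
        prev = Some(ch);
    }
    buffer_text.len()
}
```

For example, a callback like `|left, right| left.is_alphanumeric() &&
!right.is_alphanumeric()` lands at the end of the current word even when
the word is soft-wrapped on screen.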
Release Notes:
- Fixed finding the end of a word on soft-wrapped lines.
### Background
Currently, our CRDT uses three different types of timestamps:
| clock type | representation | purpose |
|-----|----------------|----------|
| `Local` | replica id + u32 | uniquely identifies operations |
| `Lamport` | replica id + u32 | provides a consistent total ordering for all operations |
| `Global` | N local clocks | fully defines the partial ordering between all concurrent operations |
All text operations include *each* type of timestamp, and every
`Fragment` in a buffer's fragment tree contains both a local and a
Lamport timestamp.
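To make the table concrete, here is a sketch of the three timestamp
shapes; the field names are illustrative, not the exact definitions in
the `clock` module.

```rust
// Hedged sketch of the three timestamp shapes; illustrative only.
type ReplicaId = u16;

/// `Local`: replica id + a per-replica counter; uniquely identifies an
/// operation.
struct Local {
    replica_id: ReplicaId,
    value: u32,
}

/// `Lamport`: replica id + a Lamport counter; comparing (value, replica_id)
/// yields a consistent total order across all operations.
struct Lamport {
    replica_id: ReplicaId,
    value: u32,
}

/// `Global`: one local clock per replica (a version vector); defines the
/// partial order between concurrent operations.
struct Global(Vec<Local>);
```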
### Change
An operation can be uniquely identified by its Lamport timestamp, so we
don't really need a concept of a local timestamp. In this PR, I've
removed local timestamps entirely; version vectors (`clock::Global`) now
store vectors of *Lamport* timestamps.
Eliminating local timestamps reduces the memory footprint of a buffer by
four bytes per fragment, reduces the size of our `UpdateBuffer` RPC
messages, and reduces the amount of data we need to store in our
database for channel buffers. It also makes our CRDT a bit easier to
understand, IMO, because there is now only one scalar value that we
increment per replica.
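For illustration, a simplified sketch of the post-change shape (not the
real `clock` types): one Lamport counter per replica, with the version
vector recording the highest Lamport value observed from each replica.

```rust
// Hedged sketch of the post-change model: the version vector stores a
// single scalar per replica, the highest observed Lamport value.
type ReplicaId = u16;

#[derive(Clone, Copy)]
struct Lamport {
    replica_id: ReplicaId,
    value: u32,
}

#[derive(Default)]
struct Global {
    /// Highest Lamport value observed from each replica, indexed by
    /// replica id.
    values: Vec<u32>,
}

impl Global {
    fn observe(&mut self, timestamp: Lamport) {
        let ix = timestamp.replica_id as usize;
        if self.values.len() <= ix {
            self.values.resize(ix + 1, 0);
        }
        self.values[ix] = self.values[ix].max(timestamp.value);
    }

    /// True if this version already includes the given operation.
    fn observed(&self, timestamp: Lamport) -> bool {
        self.values
            .get(timestamp.replica_id as usize)
            .map_or(false, |&value| value >= timestamp.value)
    }
}
```

Because each replica now increments only a single Lamport counter, one
entry per replica is enough to summarize everything that replica has
produced.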
It's possible I'm missing something here though. @as-cii, @nathansobo
it'd be good to get your 👀