2022-10-11 22:25:54 +00:00
[package]
name = "rope"
version = "0.1.0"
edition = "2021"
2023-01-18 20:28:02 +00:00
publish = false
2024-01-27 12:51:16 +00:00
license = "GPL-3.0-or-later"
2024-01-23 16:40:30 +00:00
2024-03-05 17:01:17 +00:00
[lints]
workspace = true
2022-10-11 22:25:54 +00:00
[lib]
path = "src/rope.rs"
[dependencies]
2024-01-31 02:41:29 +00:00
arrayvec = "0.7.1"
log.workspace = true
2024-10-30 09:59:03 +00:00
rayon.workspace = true
2023-04-25 00:41:55 +00:00
smallvec.workspace = true
2024-02-06 19:41:36 +00:00
sum_tree.workspace = true
Fix caret movement issue for some special characters (#10198)
Currently in Zed, certain characters require pressing an arrow key twice
to move the caret past them. For example: "❤️" and "y̆".
The reason for this is as follows:
Currently, Zed uses `chars` to distinguish different characters, and
calling `chars` on `y̆` will yield two `char` values: `y` and `\u{306}`,
and calling `chars` on `❤️` will yield two `char` values: `❤` and
`\u{fe0f}`.
Therefore, consider the following scenario (where ^ represents the
caret):
- what we see: ❤️ ^
- the actual buffer: ❤ \u{fe0f} ^
After pressing the left arrow key once:
- what we see: ❤️ ^
- the actual buffer: ❤ ^ \u{fe0f}
After pressing the left arrow key again:
- what we see: ^ ❤️
- the actual buffer: ^ ❤ \u{fe0f}
Thus, two left arrow key presses are needed to move the caret past one
visible character. This PR fixes this bug (or is it actually a feature?).
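The two-`char` decomposition behind this behaviour can be reproduced with the standard library alone. The sketch below is only an illustration, not code from this PR; `scalar_count` and `code_points` are hypothetical helper names introduced here.

```rust
/// Counts the Unicode scalar values in `s`, i.e. the items that
/// `str::chars` iterates over. (Hypothetical helper for illustration.)
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Lists the scalar values of `s` in `U+XXXX` notation so the hidden
/// combining mark or variation selector becomes visible.
/// (Hypothetical helper for illustration.)
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}
```

Calling `scalar_count("y\u{306}")` or `scalar_count("\u{2764}\u{fe0f}")` returns 2 in both cases, even though each string is displayed as one character. Treating such a sequence as a single caret stop requires grapheme cluster segmentation, which is presumably why this commit is the one that adds the `unicode-segmentation` dependency.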
I have tried to keep the scope of code modifications as minimal as
possible. In this PR, Zed handles such characters as follows:
- what we see: ❤️ ^
- the actual buffer: ❤ \u{fe0f} ^
After pressing the left arrow key once:
- what we see: ^ ❤️
- the actual buffer: ^ ❤ \u{fe0f}
Or after pressing the delete key:
- what we see: ^
- the actual buffer: ^
Please note that currently, different platforms and software handle
these special characters differently, and even the same software may
handle these characters differently in different situations. For
example, in my testing on Chrome on macOS, GitHub treats `y̆` as a
single character, just like in this PR; however, in Rust Playground,
`y̆` is treated as two characters, and pressing the delete key does not
delete the entire `y̆` character, but instead deletes `\u{306}` to yield
the character `y`. Both treat `❤️` as a single character: pressing the
delete key deletes the entire `❤️` character.
This PR is based on the principle of making changes with the smallest
impact on the code, and I think that deleting the entire character with
the delete key is more intuitive.
Release Notes:
- Fix caret movement issue for some special characters
---------
Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
Co-authored-by: Thorsten <thorsten@zed.dev>
Co-authored-by: Bennet <bennetbo@gmx.de>
2024-04-10 19:01:25 +00:00
unicode-segmentation.workspace = true
2024-02-06 19:41:36 +00:00
util.workspace = true
2022-10-11 22:25:54 +00:00
[dev-dependencies]
2024-07-28 08:52:39 +00:00
ctor.workspace = true
env_logger.workspace = true
2024-02-06 19:41:36 +00:00
gpui = { workspace = true, features = ["test-support"] }
2023-04-25 00:41:55 +00:00
rand.workspace = true
2024-02-06 19:41:36 +00:00
util = { workspace = true, features = ["test-support"] }
2024-07-23 19:38:47 +00:00
criterion = { version = "0.5", features = ["html_reports"] }
Reduce memory usage to represent buffers by up to 50% (#10321)
This should help with some of the memory problems reported in
https://github.com/zed-industries/zed/issues/8436, especially the ones
related to large files (see:
https://github.com/zed-industries/zed/issues/8436#issuecomment2037442695),
by **reducing the memory required to represent a buffer in Zed by
~50%.**
### How?
Zed's memory consumption is dominated by the in-memory representation of
buffer contents.
On the lowest level, the buffer is represented as a
[Rope](https://en.wikipedia.org/wiki/Rope_(data_structure)) and that's
where the most memory is used. The layers above — buffer, syntax map,
fold map, display map, ... — basically use "no memory" compared to the
Rope.
Zed's `Rope` data structure is itself implemented as [a `SumTree` of
`Chunks`](https://github.com/zed-industries/zed/blob/8205c52d2bc204b8234f9306562d9000b1691857/crates/rope/src/rope.rs#L35-L38).
An important constant at play here is `CHUNK_BASE`:
`CHUNK_BASE` is the maximum length of a single text `Chunk` in the
`SumTree` underlying the `Rope`. In other words: It determines into how
many pieces a given buffer is split up.
By changing `CHUNK_BASE` we can adjust the level of granularity
with which we index a given piece of text. The theoretical maximum is the
length of the text; the theoretical minimum is 1. The sweet spot is
somewhere in between, where memory use and the performance of write and
read access are both acceptable.
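To make the granularity trade-off concrete, here is a minimal sketch that cuts a string into pieces of at most a given byte length and lets you count the pieces. It is not Zed's actual chunking code: `split_into_chunks` and its `max_len` parameter are hypothetical, and the real `Rope` stores its chunks in a `SumTree` with per-chunk summaries rather than in a flat `Vec`.

```rust
/// Splits `text` into pieces of at most `max_len` bytes, cutting only at
/// `char` boundaries so that every piece is valid UTF-8.
/// Hypothetical helper for illustration, not Zed's implementation.
fn split_into_chunks(text: &str, max_len: usize) -> Vec<&str> {
    // A UTF-8 encoded `char` takes at most 4 bytes, so this guarantees
    // that every iteration below makes progress.
    assert!(max_len >= 4, "max_len must fit any UTF-8 encoded char");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        let mut end = rest.len().min(max_len);
        // Back up until the cut no longer falls inside a multi-byte char.
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let (head, tail) = rest.split_at(end);
        chunks.push(head);
        rest = tail;
    }
    chunks
}
```

For a 1024-byte ASCII text, a limit of 16 produces 64 pieces while a limit of 64 produces 16. Fewer pieces means fewer tree entries and less per-piece bookkeeping, which is the memory side of the trade-off; the cost is that each edit touches a larger piece.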
We started with `16` as the `CHUNK_BASE`, but that wasn't the result of
extensive benchmarks, more the first reasonable number that came to
mind.
### What
This changes `CHUNK_BASE` from `16` to `64`. That reduces memory
usage, trading it for a slight reduction in performance in certain
benchmarks.
### Benchmarks
I added a benchmark suite for `Rope` to determine whether we'd regress
in performance as `CHUNK_BASE` goes up. I went from `16` to `32` and
then to `64`. `32` increased performance and reduced memory usage.
`64` showed a slight drop in performance in one benchmark, increases in
the others, and substantial memory savings.
| `CHUNK_BASE` from `16` to `32` | `CHUNK_BASE` from `16` to `64` |
|-------------------|--------------------|
| ![chunk_base_16_to_32](https://github.com/zed-industries/zed/assets/1185253/fcf1f9c6-4f43-4e44-8ef5-29c1e5d8e2b9) | ![chunk_base_16_to_64](https://github.com/zed-industries/zed/assets/1185253/d82a0478-eeef-43d0-9240-e0aa9df8d946) |
### Real World Results
We tested this by loading a 138 MB `*.tex` file (parsed as plain text)
into Zed and measuring allocations in `Instruments.app`.
#### standard allocator
Before, with `CHUNK_BASE: 16`, the memory usage was ~827MB after loading
the buffer.
| `CHUNK_BASE: 16` |
|---------------------|
| ![memory_consumption_chunk_base_16_std_alloc](https://github.com/zed-industries/zed/assets/1185253/c1e04c34-7d1a-49fa-bb3c-6ad10aec6e26) |
After, with `CHUNK_BASE: 64`, the memory usage was ~396MB after loading
the buffer.
| `CHUNK_BASE: 64` |
|---------------------|
| ![memory_consumption_chunk_base_64_std_alloc](https://github.com/zed-industries/zed/assets/1185253/c728e134-1846-467f-b20f-114a582c7b5a) |
#### `mimalloc`
Zed uses `MiMalloc` by default, and it seems to be pretty aggressive when
it comes to growing memory. Whereas the std allocator would go up to
~800MB, `MiMalloc` would jump straight to 1024MB.
I also can't get `MiMalloc` to work properly with `Instruments.app` (it
always shows 15MB of memory usage) so I had to use these `Activity
Monitor` screenshots:
| `CHUNK_BASE: 16` |
|---------------------|
| ![memory_consumption_chunk_base_16_mimalloc](https://github.com/zed-industries/zed/assets/1185253/1e6e05e9-80c2-4ec7-9b0e-8a6fa78836eb) |
| `CHUNK_BASE: 64` |
|---------------------|
| ![memory_consumption_chunk_base_64_mimalloc](https://github.com/zed-industries/zed/assets/1185253/8a47e982-a675-4db0-b690-d60f1ff9acc8) |
### Release Notes
Release Notes:
- Reduced memory usage for files by up to 50%.
---------
Co-authored-by: Antonio <antonio@zed.dev>
2024-04-09 16:07:53 +00:00
[[bench]]
name = "rope_benchmark"
harness = false