mirror of
https://github.com/zed-industries/zed.git
synced 2025-01-09 02:44:49 +00:00
f128cf4a33
Closes https://linear.app/zed-industries/issue/Z-352/high-memory-usage-from-fs-scanning-if-project-contains-symlinks-that

### Background

Currently, when you open a project, Zed eagerly scans the directory, building an in-memory representation of all of the files and directories within. This scanning includes all git-ignored files and follows any symlinks. When any directory changes on disk, Zed recursively rescans it in order to keep its in-memory representation up-to-date. When collaborating, all of these files are replicated to all guests.

Right now, there are some performance problems associated with the maintenance of this filesystem state:

* For various reasons, some projects contain symlinks that point out to large folders like `$HOME`, which itself contains many symlinks that point to the same large directory. When these projects are opened, the worktree scans endlessly, using more and more memory.
* Some git-ignored directories (like `target` in a Rust project) contain *many* more files than are actually tracked in the git repository. These files often change as a result of saving (e.g. because the compiler runs). Keeping all of these paths in memory isn't useful to the user, and causes significant CPU usage on every save. Most importantly, when collaborating, sending all of these changes to guests can be slow, and can delay all other RPC messages.

### Change

This PR changes the worktree's filesystem-scanning logic to be *lazy* about scanning two types of directories:

* git-ignored directories
* "external" directories (those that are canonically located outside of the worktree root, but accessed via symlinks)

The laziness works as follows. When, during a recursive scan, a directory is found that falls into one of the above two categories, that directory is marked as "unloaded". The directory might later be scanned if some explicit operation is performed within it (like opening a buffer or creating a file), if any collaborator expands that directory in their project panel, or if an LSP requests that it be watched.

### Results

When collaborating on the `zed` folder:

| metric | before | after |
|--------|--------|-------|
| # `worktree_entries` in collab db initially | 154,763 | 77,679 |
| # `worktree_entries` in collab db after 5 saves | 181,952 | 77,679 (nothing new to scan) |
| app memory footprint (host) | 260 MB | 228.5 MB |

The reduction in database entries is a win, because reading and writing to the `worktree_entries` table is one of the most expensive things that the `collab` server does. There's also generally lower background CPU usage after every save, because we don't need to recursively rescan directories inside of `target`.

### Limitations

We still end up scanning some unnecessary directories (like `target/debug/build/zed-b612db829aeac16e/out`) because the LSP instructs us to watch those.

### To do:

* [x] Expand parent directories of any path opened via LSP
* [x] Avoid creating orphaned entries when FS events happen inside of unscanned directories
* [x] Scan any newly-non-ignored directories after gitignore changes
* [x] Emit correct events for newly-discovered paths when expanding dirs
* [x] GC the set of expanded directory ids when dirs are removed
* [x] Don't include "external" entries in file-finder
* [x] Expand any directories watched by LSP
* [ ] manual testing and profiling

### Release Notes:

- Fixed a bug where Zed would use excessive memory when a project folder contained symlinks pointing to directories outside of the project.
- Reduced Zed's memory and CPU usage when working in folders containing many git-ignored files.
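
As an illustration of the lazy-scanning rule described under "Change", here is a minimal sketch. It is not the code from this PR. The names `DirInfo`, `ScanState`, `Worktree`, `scan`, `expand`, and `sample_worktree` are hypothetical, and an in-memory map stands in for the real filesystem so that the example stays self-contained. It also assumes that explicitly expanding a directory loads only that one level, leaving any lazy subdirectories unloaded.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical view of one directory, standing in for real filesystem queries.
struct DirInfo {
    /// Where the directory really lives once symlinks are resolved.
    canonical: PathBuf,
    /// Whether a gitignore rule matches this directory.
    git_ignored: bool,
    /// Child directories, addressed by their path inside the worktree.
    subdirs: Vec<PathBuf>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanState {
    Loaded,
    Unloaded,
}

struct Worktree {
    root: PathBuf,
    fs: BTreeMap<PathBuf, DirInfo>,
    entries: BTreeMap<PathBuf, ScanState>,
}

impl Worktree {
    /// A directory is scanned lazily if it is git-ignored or "external",
    /// i.e. its canonical location is outside the worktree root.
    fn is_lazy(&self, info: &DirInfo) -> bool {
        info.git_ignored || !info.canonical.starts_with(&self.root)
    }

    /// Recursive scan. Lazy directories are recorded as `Unloaded` and not
    /// descended into, unless `force` is set for this particular directory.
    fn scan(&mut self, dir: &Path, force: bool) {
        let (lazy, subdirs) = match self.fs.get(dir) {
            Some(info) => (self.is_lazy(info), info.subdirs.clone()),
            None => return,
        };
        if lazy && !force {
            self.entries.insert(dir.to_path_buf(), ScanState::Unloaded);
            return;
        }
        self.entries.insert(dir.to_path_buf(), ScanState::Loaded);
        for sub in &subdirs {
            // Children are never forced, so nested lazy dirs stay unloaded.
            self.scan(sub, false);
        }
    }

    /// Explicit request (opening a buffer, expanding in the project panel,
    /// an LSP watch) that loads a previously unloaded directory.
    fn expand(&mut self, dir: &Path) {
        self.scan(dir, true);
    }

    fn state(&self, dir: &str) -> Option<ScanState> {
        self.entries.get(Path::new(dir)).copied()
    }
}

fn dir_info(canonical: &str, git_ignored: bool, subdirs: &[&str]) -> DirInfo {
    DirInfo {
        canonical: PathBuf::from(canonical),
        git_ignored,
        subdirs: subdirs.iter().map(|s| PathBuf::from(*s)).collect(),
    }
}

/// Made-up project: `target` is git-ignored, `home` is a symlink out of the root.
fn sample_worktree() -> Worktree {
    let mut fs = BTreeMap::new();
    fs.insert(
        PathBuf::from("/proj"),
        dir_info("/proj", false, &["/proj/src", "/proj/target", "/proj/home"]),
    );
    fs.insert(PathBuf::from("/proj/src"), dir_info("/proj/src", false, &[]));
    fs.insert(
        PathBuf::from("/proj/target"),
        dir_info("/proj/target", true, &["/proj/target/debug"]),
    );
    fs.insert(
        PathBuf::from("/proj/target/debug"),
        dir_info("/proj/target/debug", true, &[]),
    );
    fs.insert(PathBuf::from("/proj/home"), dir_info("/home/user", false, &[]));
    let mut tree = Worktree {
        root: PathBuf::from("/proj"),
        fs,
        entries: BTreeMap::new(),
    };
    tree.scan(Path::new("/proj"), false);
    tree
}
```

In this sketch, the initial scan records `/proj/target` and `/proj/home` as unloaded without visiting their contents, so `/proj/target/debug` has no entry at all until `expand` is called on `/proj/target`. Because an external directory is never descended into automatically, a symlink to a large folder such as `$HOME` cannot trigger an unbounded scan.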

* ..
* src
* Cargo.toml