mirror of
https://github.com/zed-industries/zed.git
synced 2025-01-24 19:10:24 +00:00
49371b44cb
This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding.

The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high-dimensional vector, which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from.

Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers.

Remaining todo:

* [x] Change `provider` to `model` within the zed hosted embeddings database (as it's currently a combo of the provider and the model in one name)

Release Notes:

- N/A

---------

Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Conrad Irwin <conrad@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
Co-authored-by: Antonio <antonio@zed.dev>
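The normalization and dot-product comparison described in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not Zed's implementation: the `EmbeddedChunk` record, the `normalize`, `dot`, and `search` helpers, and the three-dimensional vectors are all assumptions made up for the example, and real embeddings would come from an embedding model.

```python
import math
from dataclasses import dataclass


@dataclass
class EmbeddedChunk:
    # Hypothetical record: where the chunk came from plus its unit vector.
    path: str
    byte_range: tuple
    vector: list


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(index, query_vector, limit=3):
    """Return the `limit` chunks most similar to the query, best first."""
    query = normalize(query_vector)
    ranked = sorted(index, key=lambda chunk: dot(chunk.vector, query), reverse=True)
    return ranked[:limit]


# Made-up vectors standing in for model output (assumption for illustration).
index = [
    EmbeddedChunk("docs/needle.md", (0, 120), normalize([3.0, 4.0, 0.0])),
    EmbeddedChunk("src/main.py", (40, 310), normalize([0.0, 1.0, 5.0])),
]
best = search(index, [6.0, 8.0, 0.1], limit=1)[0]
print(best.path, best.byte_range)
```

Because every stored vector already has length 1, ranking needs only a dot product per chunk, with no division at query time.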
43 lines
1.7 KiB
Markdown
# Searching for a needle in a haystack

When you have a large amount of text, it can be useful to search for a specific word or phrase. This is often referred to as "finding a needle in a haystack." In this markdown document, we're "hiding" a key phrase for our text search to find. Can you find it?
## Instructions

1. Use the search functionality in your text editor or markdown viewer to find the hidden phrase in this document.
2. Once you've found the **phrase**, write it down and proceed to the next step.

Honestly, I just want to fill up plenty of characters so that we chunk this markdown into several chunks.
## Tips

- Relax
- Take a deep breath
- Focus on the task at hand
- Don't get distracted by other text
- Use the search functionality to your advantage
## Example code

```python
def search_for_needle(haystack, needle):
    # Membership test: True when needle occurs in haystack, False otherwise.
    return needle in haystack
```
```javascript
function searchForNeedle(haystack, needle) {
  return haystack.includes(needle);
}
```
## Background

When creating an index for a book or searching for a specific term in a large document, the ability to quickly find a specific word or phrase is essential. This is where search functionality comes in handy. However, one should _remember_ that the search is only as good as the index that was built. As they say, garbage in, garbage out!
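To make the point about index quality concrete, here is a small sketch of a word-level inverted index, assuming a simple lowercase tokenizer. The `build_index` and `lookup` helpers and the sample documents are hypothetical names invented for this illustration; a word that never made it into the index cannot be found, however the query is phrased.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, word):
    """Return the sorted names of documents that contain `word`."""
    return sorted(index.get(word.lower(), set()))


# Sample documents made up for the example.
documents = {
    "haystack.md": "Searching for a needle in a haystack",
    "tips.md": "Relax and take a deep breath",
}
index = build_index(documents)
print(lookup(index, "needle"))     # found: the word was indexed
print(lookup(index, "pitchfork"))  # empty: the word was never indexed
```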
## Conclusion

Searching for a needle in a haystack can be a challenging task, but with the right tools and techniques, it becomes much easier. Whether you're looking for a specific word in a document or trying to find a key piece of information in a large dataset, the ability to search efficiently is a valuable skill to have.