Qdrant DevContainer for File Embeddings
by questmapping
This project provides a development container setup for running Qdrant with file embeddings. It includes everything needed to index and search text documents using vector similarity search.
View on GitHub
Last updated: N/A
Qdrant DevContainer for File Embeddings
This project provides a development container setup for running Qdrant with file embeddings. It includes everything needed to index and search text documents using vector similarity search.
Prerequisites
- Docker Desktop must be running before starting the devcontainer
- VS Code with the Remote - Containers extension
- Internet connection (for downloading dependencies)
Getting Started
- Ensure Docker Desktop is running on your system
- Open this folder in VS Code
- Click the green "Reopen in Container" button in the bottom right corner
- Or press
F1
and type "Dev Containers: Reopen in Container"
- Or press
Project Structure
qdrant_server_devcontainer/ ├── .devcontainer/ │ ├── devcontainer.json │ └── Dockerfile ├── requirements.txt ├── ingest.py └── data/ # Place your text files here
Usage
- Place your text files in the
data/
directory - The container will automatically start Qdrant
- After the container is built You should be able to access Qdrant at
http://localhost:6333
- Run the ingestion script manually from within the container:
python ingest.py
Features
- Qdrant vector database running in the background
- Automatic file indexing using sentence-transformers
- Python environment with all necessary dependencies
- VS Code Python extension pre-installed
Technical Details
- Qdrant runs on a dynamically assigned port (check the output panel after container build)
- Uses
all-MiniLM-L6-v2
for text embeddings - Creates a collection named "local-docs" with cosine similarity
- Supports text files (.txt), markdown files (.md), and PDF files (.pdf) in the data directory
Troubleshooting
-
If the container fails to start:
- Ensure Docker Desktop is running
- Check that no other process is using the dynamically assigned port
- Verify all dependencies are properly installed
-
If files aren't being indexed:
- Check that files are in the
data/
directory - Verify file extensions are supported (currently .txt, .md, .pdf)
- Ensure files are readable by the container
- Check that files are in the
License
MIT License
TODO
- handle giant PDFs efficiently,
- extract text per page using parallel processing,
- embed and push each chunk as it’s ready,
- support GPU embedding if torch.cuda.is_available()?
- add support for epub files