Quick Presets
Workload Preset
Custom
Support Bot
Internal Knowledge Base
Coding Assistant
Research Agent
Vector DB Vendor Preset
Custom / generic
Qdrant
Weaviate
Pinecone
Corpus Size (GB of source docs)
1 GB
10 GB
100 GB
500 GB
1 TB
5 TB
Chunk Size (tokens)
256
384
512
768
1,024
Chunk Overlap (tokens)
0
64
128
256
Bytes per Token
3
4
5
6
Embedding Model
OpenAI text-embedding-3-large · 3072d · $0.13 / 1M tok
OpenAI text-embedding-3-small · 1536d · $0.02 / 1M tok
Self-hosted E5 / BGE class · 1024d · infra-only
Self-hosted MiniLM class · 768d · infra-only
Vector Precision
FP32
FP16
INT8
Binary
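The corpus, chunking, embedding, and precision inputs above are enough for a rough estimate of raw vector storage. The following is a minimal sketch, not the calculator's actual formula: the function names are hypothetical, and it assumes decimal gigabytes, a fixed sliding-window stride of chunk size minus overlap, and no index or metadata overhead.

```python
import math

# Bytes stored per vector dimension for each precision option.
# Binary is assumed to use 1 bit per dimension.
BYTES_PER_DIM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "Binary": 1.0 / 8.0}


def estimate_chunks(corpus_gb, chunk_tokens, overlap_tokens, bytes_per_token):
    """Approximate chunk count for a sliding window with a fixed overlap."""
    if overlap_tokens >= chunk_tokens:
        # e.g. chunk size 256 with overlap 256 would never advance
        raise ValueError("overlap must be smaller than chunk size")
    total_tokens = corpus_gb * 1e9 / bytes_per_token  # assumes decimal GB
    stride = chunk_tokens - overlap_tokens            # new tokens per chunk
    return math.ceil(total_tokens / stride)


def raw_vector_gb(num_chunks, dims, precision):
    """Raw embedding storage in GB, before index, metadata, or replication."""
    return num_chunks * dims * BYTES_PER_DIM[precision] / 1e9


chunks = estimate_chunks(corpus_gb=10, chunk_tokens=512,
                         overlap_tokens=64, bytes_per_token=4)
print(chunks, round(raw_vector_gb(chunks, 1536, "FP32"), 2))
```

For a 10 GB corpus at 4 bytes per token, 512-token chunks, and 64-token overlap, this gives roughly 5.6 million chunks and about 34 GB of FP32 vectors at 1536 dimensions. Metadata per chunk, index type, and replication factor would add to that figure.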
Metadata per Chunk (KB)
0.25 KB
0.5 KB
1 KB
2 KB
4 KB
Index Type
HNSW · fast recall · higher RAM
IVF · lower RAM · lower recall
DiskANN-like · disk-heavy · lower RAM
Flat scan · exact · expensive at scale
Vector DB Operating Model
Managed
Self-hosted
Hybrid
Replication Factor
1×
2×
3×
Target Query Load (QPS)
1
10
50
100
500
1,000
Top-K Retrieved
5
10
20
50
Reranking
API
Self-hosted
None
Generator Model Size
3B
7B
13B
30B
70B
Generator Precision
FP16
BF16
INT8
INT4
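Generator model size and precision together set a lower bound on the memory needed for the weights alone. A minimal sketch, assuming parameters times bytes per parameter and ignoring KV cache, activations, and quantization scale overhead; `weight_memory_gb` is a hypothetical helper, not part of the calculator.

```python
# Bytes per parameter for each generator precision option.
BYTES_PER_PARAM = {"FP16": 2.0, "BF16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(params_billions, precision):
    """Approximate weight memory in decimal GB (weights only)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


for size in (3, 7, 13, 30, 70):
    print(f"{size}B FP16: {weight_memory_gb(size, 'FP16'):.1f} GB, "
          f"INT4: {weight_memory_gb(size, 'INT4'):.1f} GB")
```

Under these assumptions a 70B model needs about 140 GB for FP16 weights and about 35 GB at INT4. Real deployments need additional headroom that grows with prompt length, output length, and concurrent requests.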
Avg Prompt Tokens
800
1,500
3,000
6,000
Avg Output Tokens
150
300
600
1,200
Deployment Preference
Cloud
Hybrid
On-Prem