sentence-transformers
$
npx mdskill add Orchestra-Research/AI-Research-SKILLs/sentence-transformersGenerates high-quality embeddings for semantic similarity and retrieval tasks
- Solves semantic search, clustering, and RAG embedding generation
- Uses PyTorch, Transformers, and 5000+ pre-trained models
- Leverages multilingual and domain-specific embedding models
- Delivers fast, local embedding generation for production use
SKILL.md
.github/skills/sentence-transformersView on GitHub ↗
---
name: sentence-transformers
description: Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Sentence Transformers, Embeddings, Semantic Similarity, RAG, Multilingual, Multimodal, Pre-Trained Models, Clustering, Semantic Search, Production]
dependencies: [sentence-transformers, transformers, torch]
---
# Sentence Transformers - State-of-the-Art Embeddings
Python framework for sentence and text embeddings using transformers.
## When to use Sentence Transformers
**Use when:**
- Need high-quality embeddings for RAG
- Semantic similarity and search
- Text clustering and classification
- Multilingual embeddings (100+ languages)
- Running embeddings locally (no API)
- Cost-effective alternative to OpenAI embeddings
**Metrics**:
- **15,700+ GitHub stars**
- **5000+ pre-trained models**
- **100+ languages** supported
- Based on PyTorch/Transformers
**Use alternatives instead**:
- **OpenAI Embeddings**: Need API-based, highest quality
- **Instructor**: Task-specific instructions
- **Cohere Embed**: Managed service
## Quick start
### Installation
```bash
pip install sentence-transformers
```
### Basic usage
```python
from sentence_transformers import SentenceTransformer
# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Generate embeddings
sentences = [
"This is an example sentence",
"Each sentence is converted to a vector"
]
embeddings = model.encode(sentences)
print(embeddings.shape) # (2, 384)
# Cosine similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
```
## Popular models
### General purpose
```python
# Fast, good quality (384 dim)
model = SentenceTransformer('all-MiniLM-L6-v2')
# Better quality (768 dim)
model = SentenceTransformer('all-mpnet-base-v2')
# Best quality (1024 dim, slower)
model = SentenceTransformer('all-roberta-large-v1')
```
### Multilingual
```python
# 50+ languages
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
# 100+ languages
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')
```
### Domain-specific
```python
# Legal domain
model = SentenceTransformer('nlpaueb/legal-bert-base-uncased')
# Scientific papers
model = SentenceTransformer('allenai/specter')
# Code
model = SentenceTransformer('microsoft/codebert-base')
```
## Semantic search
```python
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')
# Corpus
corpus = [
"Python is a programming language",
"Machine learning uses algorithms",
"Neural networks are powerful"
]
# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
# Query
query = "What is Python?"
query_embedding = model.encode(query, convert_to_tensor=True)
# Find most similar
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits)
```
## Similarity computation
```python
# Cosine similarity
similarity = util.cos_sim(embedding1, embedding2)
# Dot product
similarity = util.dot_score(embedding1, embedding2)
# Pairwise cosine similarity
similarities = util.cos_sim(embeddings, embeddings)
```
## Batch encoding
```python
# Efficient batch processing
sentences = ["sentence 1", "sentence 2", ...] * 1000
embeddings = model.encode(
sentences,
batch_size=32,
show_progress_bar=True,
convert_to_tensor=False # or True for PyTorch tensors
)
```
## Fine-tuning
```python
from sentence_transformers import InputExample, losses
from torch.utils.data import DataLoader
# Training data
train_examples = [
InputExample(texts=['sentence 1', 'sentence 2'], label=0.8),
InputExample(texts=['sentence 3', 'sentence 4'], label=0.3),
]
train_dataloader = DataLoader(train_examples, batch_size=16)
# Loss function
train_loss = losses.CosineSimilarityLoss(model)
# Train
model.fit(
train_objectives=[(train_dataloader, train_loss)],
epochs=10,
warmup_steps=100
)
# Save
model.save('my-finetuned-model')
```
## LangChain integration
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-mpnet-base-v2"
)
# Use with vector stores
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(
documents=docs,
embedding=embeddings
)
```
## LlamaIndex integration
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(
model_name="sentence-transformers/all-mpnet-base-v2"
)
from llama_index.core import Settings
Settings.embed_model = embed_model
# Use in index
index = VectorStoreIndex.from_documents(documents)
```
## Model selection guide
| Model | Dimensions | Speed | Quality | Use Case |
|-------|------------|-------|---------|----------|
| all-MiniLM-L6-v2 | 384 | Fast | Good | General, prototyping |
| all-mpnet-base-v2 | 768 | Medium | Better | Production RAG |
| all-roberta-large-v1 | 1024 | Slow | Best | High accuracy needed |
| paraphrase-multilingual | 768 | Medium | Good | Multilingual |
## Best practices
1. **Start with all-MiniLM-L6-v2** - Good baseline
2. **Normalize embeddings** - Better for cosine similarity
3. **Use GPU if available** - 10× faster encoding
4. **Batch encoding** - More efficient
5. **Cache embeddings** - Expensive to recompute
6. **Fine-tune for domain** - Improves quality
7. **Test different models** - Quality varies by task
8. **Monitor memory** - Large models need more RAM
## Performance
| Model | Speed (sentences/sec) | Memory | Dimension |
|-------|----------------------|---------|-----------|
| MiniLM | ~2000 | 120MB | 384 |
| MPNet | ~600 | 420MB | 768 |
| RoBERTa | ~300 | 1.3GB | 1024 |
## Resources
- **GitHub**: https://github.com/UKPLab/sentence-transformers ⭐ 15,700+
- **Models**: https://huggingface.co/sentence-transformers
- **Docs**: https://www.sbert.net
- **License**: Apache 2.0
More from Orchestra-Research/AI-Research-SKILLs
- academic-plottingGenerates publication-quality figures for ML papers from research context. Given a paper section or description, extracts system components and relationships to generate architecture diagrams via Gemini. Given experiment results or data, auto-selects chart type and generates data-driven figures via matplotlib/seaborn. Use when creating any figure for a conference paper.
- ara-compilerCompiles any research input — PDF papers, GitHub repositories, experiment logs, code directories, or raw notes — into a complete Agent-Native Research Artifact (ARA) with cognitive layer (claims, concepts, heuristics), physical layer (configs, code stubs), exploration graph, and grounded evidence. Use when ingesting a paper or codebase into a structured, machine-executable knowledge package, building an ARA from scratch, or converting research outputs into a falsifiable, agent-traversable form.
- ara-research-managerRecords research provenance as a post-task epilogue, scanning conversation history at the end of a coding or research session to extract decisions, experiments, dead ends, claims, heuristics, and pivots, and writing them into the ara/ directory with user-vs-AI provenance tags. Use as a session epilogue — never during execution — to maintain a faithful, auditable trace of how a research project actually evolved.
- ara-rigor-reviewerPerforms ARA Seal Level 2 semantic epistemic review on Agent-Native Research Artifacts, scoring six dimensions (evidence relevance, falsifiability, scope calibration, argument coherence, exploration integrity, methodological rigor) and producing a constructive, severity-ranked report with a Strong Accept-to-Reject recommendation. Use after Level 1 structural validation passes, when an ARA needs an objective epistemic critique before publication or release.
- autogpt-agentsAutonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI automation systems.
- autoresearchOrchestrates end-to-end autonomous AI research projects using a two-loop architecture. The inner loop runs rapid experiment iterations with clear optimization targets. The outer loop synthesizes results, identifies patterns, and steers research direction. Routes to domain-specific skills for execution, supports continuous agent operation via Claude Code /loop and OpenClaw heartbeat, and produces research presentations and papers. Use when starting a research project, running autonomous experiments, or managing a multi-hypothesis research effort.
- awq-quantizationActivation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
- blip-2-vision-languageVision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.
- brainstorming-research-ideasGuides researchers through structured ideation frameworks to discover high-impact research directions. Use when exploring new problem spaces, pivoting between projects, or seeking novel angles on existing work.
- constitutional-aiAnthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.