mixedbread ai mxbai-rerank-large-v1

Open-source high-performance reranking model supporting 90+ languages, outperforms Cohere rerank-v3 on BEIR benchmarks, with ONNX optimization.

mixedbread ai's mxbai-rerank-large-v1 is an open-source, high-performance reranking model released in early 2024. It performs strongly on BEIR benchmarks, surpassing even the well-known Cohere rerank-v3, while remaining fully open source and free for commercial use.

Core Features

Open Source & Commercial-Friendly

  • Apache 2.0 License: Fully open-source with no commercial restrictions
  • Self-hosting: Complete control over data and deployment environment
  • No API Costs: No API fees after self-deployment
  • Community-driven: Active open-source community support

Excellent Performance

  • BEIR Average NDCG@10: 0.536 - outperforms Cohere rerank-v3
  • 90+ Language Support: Extensive multilingual coverage
  • Context Length: 512 tokens
  • Efficient Inference: ONNX optimized version for faster inference

Technical Optimization

  • ONNX Runtime Support: Efficient cross-platform deployment
  • Quantized Versions: INT8 quantized models reduce memory footprint
  • Batch Processing: Efficient bulk request handling
  • GPU Acceleration: CUDA-accelerated inference support

Performance Benchmarks

BEIR Benchmark Results

Performance on BEIR (Benchmarking Information Retrieval):

  • Average NDCG@10: 0.536 (beats Cohere rerank-v3's 0.528)
  • NFCorpus: 0.372
  • MS MARCO: 0.395
  • TREC-COVID: 0.801
  • ArguAna: 0.618
  • SciFact: 0.742
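
For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes linear gain and derives the ideal ranking from the same list of labels; the relevance labels are hypothetical and are not taken from BEIR. Benchmark scores such as those above are averages of this per-query value over a full dataset.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels for one query, in the order the reranker returned them
ranked_labels = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(ranked_labels), 3))
```

A perfectly ordered list scores 1.0, and the score drops as relevant documents slip to lower positions.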

Multilingual Performance

Strong performance on MIRACL multilingual retrieval benchmarks:

  • Supports 90+ languages including Chinese, Japanese, Korean
  • Maintains robust performance on non-English languages
  • Excellent cross-lingual retrieval capabilities

Technical Architecture

Model Design

  • Base Architecture: Cross-encoder based on DeBERTa-v3
  • Parameters: ~435M (large version)
  • Context Window: 512 tokens
  • Training Data: Trained on large-scale multilingual datasets

Optimized Versions

mixedbread ai provides multiple optimized versions:

  • Standard PyTorch: Highest accuracy
  • ONNX: Cross-platform deployment, 30% inference speedup
  • Quantized: INT8 quantization, 50% memory reduction, 50% speedup
  • TensorRT: Ultimate performance on NVIDIA GPUs

Use Cases

Target Users

  • Cost-conscious Startups: No API costs, self-deployment
  • Privacy-focused Enterprises: Fully private deployment
  • Open-source Projects: Require open commercial licensing
  • Research Institutions: Academic research and experimentation
  • RAG Developers: Building retrieval-augmented generation systems

Typical Scenarios

  1. Private RAG Systems: Enterprise internal knowledge base retrieval
  2. Multilingual Search: Search optimization for global products
  3. Academic Literature Retrieval: Research paper and document search
  4. E-commerce Search: Product search and recommendation systems
  5. Customer Service: Knowledge retrieval for intelligent support

Deployment Options

Hugging Face Integration

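from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    'mixedbread-ai/mxbai-rerank-large-v1'
)
tokenizer = AutoTokenizer.from_pretrained(
    'mixedbread-ai/mxbai-rerank-large-v1'
)
model.eval()  # inference mode: disables dropout

# Example inputs (placeholders; substitute your own query and candidates)
query = "What is machine learning?"
documents = ["doc1", "doc2", "doc3"]

# Reranking: score every (query, document) pair with the cross-encoder
pairs = [[query, doc] for doc in documents]
inputs = tokenizer(pairs, padding=True, truncation=True,
                   return_tensors='pt', max_length=512)
with torch.no_grad():
    # squeeze(-1) keeps a 1-D tensor even when there is a single document
    scores = model(**inputs).logits.squeeze(-1)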

ONNX Deployment

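import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mixedbread-ai/mxbai-rerank-large-v1')
session = ort.InferenceSession("mxbai-rerank-large-v1.onnx")
# Reuses query/documents from above; ONNX Runtime takes NumPy arrays keyed by input name
enc = tokenizer([[query, doc] for doc in documents], padding=True,
                truncation=True, max_length=512, return_tensors='np')
inputs = {i.name: enc[i.name] for i in session.get_inputs()}
scores = session.run(None, inputs)[0].squeeze(-1)  # assumes the first output holds the logits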

Docker Deployment

docker pull mixedbreadai/mxbai-rerank-large-v1
docker run -p 8080:8080 mixedbreadai/mxbai-rerank-large-v1

Framework Integration

RAG Framework Integration

  • LangChain: As custom Reranker
  • LlamaIndex: NodePostprocessor integration
  • Haystack: Use via CrossEncoderRanker

Vector Database Pairing

  • Pinecone: Second-stage reranking
  • Qdrant: Hybrid search optimization
  • Milvus: Vector retrieval post-processing
  • Weaviate: Semantic search enhancement

Comparison with Competitors

vs Cohere Rerank v3

  • ✅ Open-source and free, no API costs
  • ✅ Better BEIR benchmark performance
  • ✅ Complete self-deployment control
  • ⚖️ Need to manage infrastructure yourself

vs Jina Reranker v3

  • ✅ Fully open-source, community-driven
  • ⚖️ Similar language coverage (90+ vs 100+)
  • ✅ More optimization versions (ONNX, TensorRT)
  • ⚖️ Comparable performance, each with advantages

vs Voyage Rerank 2

  • ✅ Open-source and free
  • ➖ Shorter context length (512 tokens vs 16K)
  • ✅ No vendor lock-in
  • ➖ Need to handle availability and scalability yourself

vs BGE Reranker

  • ⚖️ Both open-source models
  • ✅ May be better on English tasks
  • ✅ Provides multiple optimization versions (ONNX, etc.)
  • ⚖️ Chinese performance may be slightly weaker than BGE

Best Practices

1. Hardware Selection

  • CPU Inference: Use ONNX quantized version, 4-core CPU sufficient
  • GPU Inference: Recommend NVIDIA T4 or higher, use TensorRT version
  • Memory Requirements: Standard version needs 4GB, quantized 2GB
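
As a rough sanity check on memory figures like these, the weights alone take about (parameter count × bytes per parameter). The sketch below uses a round, illustrative parameter count rather than the model's exact size. Activations, tokenizer buffers, and runtime overhead come on top of this, so real memory use is higher than the weights-only estimate.

```python
# Bytes needed to store one parameter at common precisions
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Weights-only memory estimate in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Illustrative round figure, not the exact size of any particular model
example_params = 500_000_000
for precision, nbytes in BYTES_PER_PARAM.items():
    print(precision, weight_memory_gb(example_params, nbytes), "GB")
```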

2. Performance Optimization

  • Use ONNX Runtime to accelerate inference
  • Enable batch processing for multiple queries
  • Use quantized version to balance speed and accuracy
  • Use TensorRT on GPU for ultimate performance

3. Candidate Set Size

  • Recommended: 100-300 candidates
  • Maximum: 1000 candidates
  • Real-time Apps: 50-100 candidates
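
One way to apply these limits is to cap the candidate list before scoring and then feed the reranker in fixed-size batches. The following is a minimal sketch: the helper names and the default batch size of 32 are assumptions for illustration, and the candidates are assumed to arrive already ordered by first-stage score.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def plan_batches(candidates: Sequence[T], max_candidates: int = 300,
                 batch_size: int = 32) -> List[List[T]]:
    """Keep the first max_candidates items and split them into batches."""
    return list(batched(candidates[:max_candidates], batch_size))


# 1000 first-stage hits capped at 300, then scored 32 pairs at a time
plan = plan_batches(list(range(1000)))
print(len(plan), "batches; last batch has", len(plan[-1]), "items")
```

Each batch can then be tokenized and passed through the model in one forward call, which keeps memory use bounded regardless of how many hits the first stage returns.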

4. Deployment Strategy

  • Small Scale: Single GPU instance sufficient
  • Medium Scale: Load balancing + multiple inference instances
  • Large Scale: Kubernetes + auto-scaling

Cost Advantages

Self-hosting Cost Estimate

Assuming 1M reranking requests per month:

Infrastructure Costs:

  • AWS t3.large: ~$60/month (CPU version)
  • AWS g4dn.xlarge: ~$300/month (GPU version)

Compare to API Services:

  • Cohere Rerank: ~$100-500/month (depending on usage)
  • Voyage Rerank: ~$80-400/month

Savings: roughly 50-80% on infrastructure costs at medium to large scale
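
The comparison can be reproduced with simple arithmetic. The sketch below uses only the monthly figures listed above and a hypothetical helper, `savings_pct`. It covers infrastructure cost only, and the result depends heavily on which end of the API price range applies.

```python
def savings_pct(self_hosted_usd: float, api_usd: float) -> float:
    """Percentage saved by self-hosting relative to the API bill.

    A negative result means self-hosting costs more than the API.
    """
    return 100.0 * (api_usd - self_hosted_usd) / api_usd


# Monthly figures quoted in this section (USD)
cpu_instance, gpu_instance = 60.0, 300.0
cohere_low, cohere_high = 100.0, 500.0

print("CPU vs Cohere:", savings_pct(cpu_instance, cohere_low), "to",
      savings_pct(cpu_instance, cohere_high))
print("GPU vs Cohere:", savings_pct(gpu_instance, cohere_low), "to",
      savings_pct(gpu_instance, cohere_high))
```

With these inputs the CPU instance is cheaper across the whole quoted API range, while the GPU instance only pays off toward the upper end of it.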

Community & Support

Open Source Community

  • GitHub: Active issue and PR discussions
  • Discord: mixedbread ai official Discord channel
  • Hugging Face: Model page discussion area
  • Documentation: Detailed usage docs and examples

Model Updates

  • Regular performance improvement releases
  • Quick response to community feedback
  • Continuous benchmarking and optimization

Considerations

Suitable For

  • ✅ Budget-limited projects
  • ✅ Enterprises with data privacy requirements
  • ✅ Scenarios requiring customization
  • ✅ Teams with DevOps capabilities

May Not Be Suitable For

  • ❌ Small teams without ops capabilities
  • ❌ Teams that need out-of-the-box SLA guarantees
  • ❌ Extreme low-latency requirements (<10ms)
  • ❌ Teams that prefer zero maintenance

Quick Start

1. Install Dependencies

pip install transformers torch

2. Download Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    'mixedbread-ai/mxbai-rerank-large-v1'
)
tokenizer = AutoTokenizer.from_pretrained(
    'mixedbread-ai/mxbai-rerank-large-v1'
)

3. Rerank Documents

import torch

query = "What is machine learning?"
documents = ["doc1", "doc2", "doc3"]

pairs = [[query, doc] for doc in documents]
inputs = tokenizer(pairs, padding=True, truncation=True,
                   return_tensors='pt', max_length=512)

# Disable gradient tracking for inference; squeeze(-1) keeps one score per document
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1).tolist()

# Sort by score, highest first
ranked_docs = sorted(zip(documents, scores),
                     key=lambda x: x[1], reverse=True)
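
The raw scores are unbounded logits, which can be awkward to threshold. A common follow-up step, sketched below, is to map them to the 0-1 range with a sigmoid and keep only the best few documents. This assumes the model emits a single relevance logit per pair; the `top_k` helper and its `min_score` cutoff are hypothetical, and the logits in the usage line are made-up values rather than model output.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_k(documents: Sequence[str], logits: Sequence[float], k: int,
          min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return up to k (document, score) pairs, highest score first.

    Scores are sigmoid-transformed logits; pairs below min_score are dropped.
    """
    if len(documents) != len(logits):
        raise ValueError("documents and logits must have the same length")
    scored = [(doc, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored[:k] if pair[1] >= min_score]


# Made-up logits for three documents
best = top_k(["doc1", "doc2", "doc3"], [0.0, 2.0, -1.0], k=2)
print(best)
```

Because the sigmoid is monotonic, the ordering is identical to sorting by raw logits; only the scale changes.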

Summary

mixedbread ai's mxbai-rerank-large-v1 is a strong open-source reranking model that outperforms the commercial Cohere rerank-v3 on BEIR benchmarks. Its Apache 2.0 license, 90+ language support, and range of optimized versions (ONNX, quantization, TensorRT) make it well suited to budget-limited projects and those with data privacy requirements. It requires self-managed deployment and operations, but for teams with the technical capability it offers a strong balance of performance, cost, and flexibility. For startup RAG applications and enterprise private search systems alike, it is worth serious consideration.
