Unboxing Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning

News, February 14, 2024

On February 1st, 2024, we released Nomic Embed - a truly open, auditable, and performant text embedding model. While the performance of an embedding model is often taken into consideration when evaluating it for production deployment, other factors including the memory, storage, and bandwidth requirements of the embeddings are also important to consider.

For example, storing the embeddings of a large dataset for a RAG app is often quite costly.

That's why we're excited to introduce Nomic Embed v1.5, which improves on Nomic Embed by giving developers the flexibility to explicitly trade off performance and embedding footprint.

Nomic Embed v1.5

We train nomic-embed-text-v1.5 with Matryoshka Representation Learning to enable variable embedding dimensions in a single model.

Nomic Embed v1.5 supports any embedding dimension between 64 and 768 as well as binary embeddings.

Training and Evaluation

We finetuned nomic-embed-text-unsupervised on our nomic-embed-text finetuning dataset. You can replicate the model and openly access the data in the nomic-ai/contrastors repository.

Nomic Embed v1.5 outperforms text-embedding-3-small at both 512 and 768 embedding dimensions.

At an embedding dimension of 512, we outperform text-embedding-ada-002 while achieving a 3x memory reduction. At an embedding dimension of 64, a 12x memory reduction compared to nomic-embed-text-v1, our model performs similarly to all-MiniLM-L6-v2.
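The memory reductions quoted above follow directly from the ratio of embedding widths. The short sketch below works through that arithmetic, assuming uncompressed float32 storage (4 bytes per value), the 1536-dimensional output of text-embedding-ada-002, and a hypothetical corpus of one million vectors. The helper name embedding_bytes is illustrative and not part of any Nomic library.

```python
def embedding_bytes(num_vectors: int, dim: int, bytes_per_value: float = 4.0) -> float:
    """Raw storage for num_vectors embeddings of width dim (float32 by default)."""
    return num_vectors * dim * bytes_per_value


NUM_VECTORS = 1_000_000  # hypothetical corpus size, chosen only for illustration

ada_002 = embedding_bytes(NUM_VECTORS, 1536)   # text-embedding-ada-002 width
nomic_768 = embedding_bytes(NUM_VECTORS, 768)  # full-size nomic-embed-text
nomic_512 = embedding_bytes(NUM_VECTORS, 512)
nomic_64 = embedding_bytes(NUM_VECTORS, 64)

# Ratios of storage footprints; these depend only on the dimensions.
reduction_vs_ada = ada_002 / nomic_512   # 1536 / 512
reduction_vs_v1 = nomic_768 / nomic_64   # 768 / 64

print(f"512 dims vs ada-002: {reduction_vs_ada:.0f}x smaller")
print(f"64 dims vs 768 dims: {reduction_vs_v1:.0f}x smaller")
print(f"768-dim float32 corpus: {nomic_768 / 1e9:.2f} GB")
```

Index overhead, metadata, and any quantization applied by a vector store are ignored here, so real deployments will differ from these raw figures.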

We found that our 768-dimensional performance is similar to nomic-embed-text-v1, while the model also supports variable embedding dimensions.

We did not evaluate text-embedding-3-small at lower dimensions because running the evaluation is prohibitively expensive and time-consuming as the model weights are not public.

You can explore the similarities and differences between different nomic-embed-text-v1.5 embedding dimensions using the custom mapping feature in the map below:

[Interactive map comparing nomic-embed-text-v1.5 embedding dimensions]

Matryoshka Representation Learning

Matryoshka Representation Learning is the technique that enables our model to have a variable embedding dimension.

Similar to the Matryoshka nesting dolls, we explicitly train our model to learn nested representations at different embedding dimensions. This allows us to truncate embeddings from the full size to reduce their memory footprint while retaining performance. For a more detailed explanation, please refer to the Matryoshka Representation Learning paper and the excellent blog post by Aniket Rege.
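To make the truncation step concrete, here is a minimal sketch of shortening full-size embeddings on the client side. It assumes the embeddings are already available as a NumPy array and that keeping the leading components followed by L2 re-normalization is sufficient. The helper truncate_embeddings is hypothetical, the input is random stand-in data rather than real model output, and the model's own documentation may prescribe additional steps before truncation.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-normalize the result."""
    if not 1 <= dim <= embeddings.shape[1]:
        raise ValueError("dim must be between 1 and the full embedding width")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
full = rng.normal(size=(4, 768))        # stand-in for full-size embeddings
small = truncate_embeddings(full, 256)  # nested 256-dimensional representation

print(small.shape)
print(np.linalg.norm(small, axis=1))    # each row has unit length
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable on the shortened vectors.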

Usage with Nomic Embedding API

Sign up to Nomic Atlas.

With the Nomic Embedding API, you can request embeddings at a chosen dimensionality using curl:

curl https://api-atlas.nomic.ai/v1/embedding/text \
   -H "Authorization: Bearer $NOMIC_API_KEY" \
   -H "Content-Type: application/json" \
   -d '{ "model": "nomic-embed-text-v1.5",
         "texts": ["Nomic AI introduces Nomic Embed", "#keepAIOpen"],
         "task_type": "search_document",
         "dimensionality": 256}'

and in the official Nomic Python Client with

from nomic import embed
import numpy as np

output = embed.text(
   texts=[
       "Who is Laurens van der Maaten?",
       "What is dimensionality reduction?",
   ],
   model='nomic-embed-text-v1.5',
   task_type="search_document",
   dimensionality=256,
)

print(output['usage'])

embeddings = np.array(output['embeddings'])

print(embeddings.shape)

# to get binary embeddings
embeddings = (embeddings > 0).astype(int)
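Storing the thresholded values as Python integers does not by itself save space. One way to realize the savings is to pack eight dimensions into each byte and compare vectors by Hamming distance. The sketch below assumes NumPy only; pack_binary and hamming_distance are hypothetical helper names, and the input is random stand-in data rather than API output.

```python
import numpy as np


def pack_binary(embeddings: np.ndarray) -> np.ndarray:
    """Threshold at zero and pack each row into bytes (8 dimensions per byte)."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary embeddings."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)
floats = rng.normal(size=(2, 256))  # stand-in for 256-dimensional embeddings
packed = pack_binary(floats)

print(packed.shape)   # 256 bits per row stored in 32 bytes
print(hamming_distance(packed[0], packed[1]))
```

A smaller Hamming distance indicates more similar binary embeddings, so it can serve as a cheap first-pass ranking signal.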
