February 14, 2024

Unboxing Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning


On February 1st, 2024, we released Nomic Embed, a truly open, auditable, and performant text embedding model. Performance is usually the first thing teams weigh when evaluating an embedding model for production deployment, but the memory, storage, and bandwidth requirements of the embeddings it produces matter too.

For example, storing the embeddings of a large dataset for a retrieval-augmented generation (RAG) app can be expensive.

That's why we're excited to introduce Nomic Embed v1.5, which improves on Nomic Embed by letting developers explicitly trade off performance against embedding footprint.
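To put the storage concern in numbers, here is a back-of-the-envelope sketch. It assumes float32 storage (4 bytes per value), a hypothetical corpus of 100 million chunks, and ignores vector-index overhead; the helper name embedding_storage_bytes is illustrative, not part of any Nomic library.

```python
def embedding_storage_bytes(num_vectors: int, dim: int, bits_per_value: int = 32) -> int:
    """Raw bytes needed for num_vectors embeddings of size dim (no index overhead)."""
    bytes_per_vector = (dim * bits_per_value + 7) // 8  # pad each vector to whole bytes
    return num_vectors * bytes_per_vector


if __name__ == "__main__":
    num_chunks = 100_000_000  # hypothetical corpus size
    for dim in (768, 512, 256, 64):
        gigabytes = embedding_storage_bytes(num_chunks, dim) / 1e9
        print(f"{dim:>4} dims, float32: {gigabytes:6.1f} GB")
    # Binary embeddings store one bit per dimension instead of 32.
    binary_gb = embedding_storage_bytes(num_chunks, 768, bits_per_value=1) / 1e9
    print(f" 768 dims, binary:  {binary_gb:6.1f} GB")
```

Under these assumptions, 768-dimensional float32 embeddings take about 307.2 GB, 64 dimensions about 25.6 GB, and 768-dimensional binary embeddings about 9.6 GB.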

Nomic Embed v1.5

We trained nomic-embed-text-v1.5 with Matryoshka Representation Learning to enable variable embedding dimensions in a single model.

Nomic Embed v1.5 supports any embedding dimension between 64 and 768 as well as binary embeddings.
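With a Matryoshka-trained model, a smaller embedding is essentially a prefix of the full one. The sketch below shows the generic recipe: keep the first d values, then L2-normalize so dot products remain cosine similarities. The helper resize_embeddings is hypothetical, and Nomic's own server-side post-processing may differ in detail; random vectors stand in for real model outputs.

```python
import numpy as np

MIN_DIM, MAX_DIM = 64, 768  # supported range stated for nomic-embed-text-v1.5


def resize_embeddings(full: np.ndarray, dim: int) -> np.ndarray:
    """Hypothetical helper: truncate full-size embeddings to `dim` and re-normalize.

    Assumes each row of `full` is a 768-d Matryoshka embedding whose leading
    coordinates carry the most important information.
    """
    if not MIN_DIM <= dim <= MAX_DIM:
        raise ValueError(f"dim must be in [{MIN_DIM}, {MAX_DIM}], got {dim}")
    prefix = full[:, :dim]
    norms = np.linalg.norm(prefix, axis=1, keepdims=True)
    return prefix / np.clip(norms, 1e-12, None)  # unit length, so dot product = cosine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.standard_normal((3, MAX_DIM))  # stand-in for real 768-d embeddings
    small = resize_embeddings(full, 256)
    print(small.shape, np.linalg.norm(small, axis=1))
```

Because truncation is just slicing, you can store full-size embeddings once and derive smaller ones later without re-embedding the corpus.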

Training and Evaluation

We fine-tuned nomic-embed-text-unsupervised on our nomic-embed-text fine-tuning dataset. You can replicate the model and openly access the data in the nomic-ai/contrastors repository.

Nomic Embed v1.5 outperforms text-embedding-3-small at both 512 and 768 embedding dimensions.

At an embedding dimension of 512, we outperform text-embedding-ada-002 (1536 dimensions) while using a third of the memory. At 64 dimensions, a 12x memory reduction compared to nomic-embed-text-v1 (768 dimensions), our model performs similarly to all-MiniLM-L6-v2.

At the full 768 dimensions, nomic-embed-text-v1.5 performs similarly to nomic-embed-text-v1 while additionally supporting variable embedding dimensions.

We did not evaluate text-embedding-3-small at lower dimensions because its model weights are not public, which makes running the evaluation prohibitively expensive and time-consuming.

You can explore the similarities and differences between nomic-embed-text-v1.5 embeddings at different dimensions using the custom mapping feature in the accompanying interactive map.


Matryoshka Representation Learning

Matryoshka Representation Learning is the technique that enables our model to have a variable embedding dimension.

Like Matryoshka nesting dolls, our model is explicitly trained to learn nested representations at different embedding dimensions. This lets us truncate embeddings from the full size to reduce their memory footprint while retaining most of their performance. For a more detailed explanation, see the original Matryoshka Representation Learning paper (Kusupati et al., 2022) and Aniket Rege's excellent blog post on the topic.
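The core idea of the training objective can be sketched in a few lines: apply the same loss to several nested prefixes of each embedding and sum the results, so the leading coordinates must be useful on their own. This NumPy sketch assumes an in-batch contrastive (InfoNCE) loss, equal weights, a temperature of 0.05, and the nested dimensions 64, 128, 256, 512, 768; all of these are illustrative choices, not a description of the actual training code in contrastors.

```python
from typing import Optional, Sequence

import numpy as np

NESTED_DIMS = (64, 128, 256, 512, 768)  # assumed nesting, for illustration only


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


def info_nce(queries: np.ndarray, docs: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss: row i of `queries` should match row i of `docs`."""
    logits = l2_normalize(queries) @ l2_normalize(docs).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # diagonal entries are the positive pairs


def matryoshka_loss(
    queries: np.ndarray,
    docs: np.ndarray,
    dims: Sequence[int] = NESTED_DIMS,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of the contrastive loss over nested prefixes of the embeddings."""
    if weights is None:
        weights = [1.0] * len(dims)
    return sum(w * info_nce(queries[:, :d], docs[:, :d]) for w, d in zip(weights, dims))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    queries = rng.standard_normal((8, 768))
    docs = queries + 0.1 * rng.standard_normal((8, 768))  # positives close to their queries
    print(matryoshka_loss(queries, docs))
```

In real training this loss would be computed on model outputs and backpropagated; minimizing it at every prefix is what makes truncated embeddings usable.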


Usage with Nomic Embedding API

With the Nomic Embedding API, you can set the embedding dimensionality directly in the request, for example with curl:

curl https://api-atlas.nomic.ai/v1/embedding/text \
-H "Authorization: Bearer $NOMIC_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "model": "nomic-embed-text-v1.5",
"texts": ["Nomic AI introduces Nomic Embed", "#keepAIOpen"],
"task_type": "search_document",
"dimensionality": 256}'

You can do the same with the official Nomic Python client:

# requires a Nomic API key, e.g. configured beforehand with `nomic login`
from nomic import embed
import numpy as np

output = embed.text(
    texts=[
        "Who is Laurens van der Maaten?",
        "What is dimensionality reduction?",
    ],
    model="nomic-embed-text-v1.5",
    task_type="search_document",
    dimensionality=256,
)
print(output["usage"])

embeddings = np.array(output["embeddings"])
print(embeddings.shape)  # (2, 256)

# to get binary embeddings, keep only the sign of each dimension
binary_embeddings = (embeddings > 0).astype(int)
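The (embeddings > 0) step above yields one integer per dimension, which does not yet save memory. Binary embeddings pay off when packed eight dimensions per byte and compared with Hamming distance. The helpers below are hypothetical and use random vectors in place of real embeddings; a production system would more likely rely on a vector database's native binary index.

```python
import numpy as np


def pack_binary(embeddings: np.ndarray) -> np.ndarray:
    """Sign-binarize float embeddings and pack 8 dimensions per byte."""
    return np.packbits(embeddings > 0, axis=1)


def hamming_distances(query_bits: np.ndarray, corpus_bits: np.ndarray) -> np.ndarray:
    """Number of differing bits between one packed query and each packed corpus row."""
    xor = np.bitwise_xor(corpus_bits, query_bits)  # broadcasts the query over all rows
    return np.unpackbits(xor, axis=1).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corpus = rng.standard_normal((1000, 256))  # stand-in for 256-d embeddings
    query = corpus[42] + 0.1 * rng.standard_normal(256)
    corpus_bits = pack_binary(corpus)  # shape (1000, 32): 32 bytes per vector
    query_bits = pack_binary(query[None, :])[0]
    nearest = np.argsort(hamming_distances(query_bits, corpus_bits))[:5]
    print(nearest)
```

A common pattern is to use the cheap Hamming search to shortlist candidates, then rescore that shortlist with the full-precision embeddings.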
