Importing data

How to go about importing data in bulk, including file and image assets.

Media Library available

The recommended way to import data is the Command Line Interface (CLI). Run sanity datasets import --help for a quick summary of syntax and options. Alternatively, you can use one of our client libraries and handle the import yourself.

Validation is client-side only

Avoid unexpected webhook and function invocations

Import using the CLI

The Sanity import tool operates on newline-delimited JSON (NDJSON) files: each line in the file is a valid JSON object containing one document you want to import.

Documents should follow the structure of your data model. Most importantly, each document requires a _type attribute. The _id field is optional but helpful if you want to create references, or if you want to re-import your data later and replace the documents from an earlier import. An _id in Sanity is usually a GUID, but any string containing only letters, numbers, hyphens, and underscores is valid.
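
The requirements above can be checked before you start an import. The following is a minimal sketch, not part of the import tool: checkDocument and ID_PATTERN are hypothetical names, and the pattern is derived only from the description in this section, so treat it as an assumption rather than the authoritative rule for document IDs.

```javascript
// Pattern derived from the description above (letters, numbers, hyphens,
// underscores). This is an assumption, not the authoritative ID rule.
const ID_PATTERN = /^[A-Za-z0-9_-]+$/

// Returns a list of problems found in a document. An empty list means the
// document passed these deliberately minimal checks.
function checkDocument(doc) {
  const problems = []
  if (typeof doc._type !== 'string' || doc._type.length === 0) {
    problems.push('missing _type')
  }
  // _id is optional, so only validate it when it is present.
  if (doc._id !== undefined && !(typeof doc._id === 'string' && ID_PATTERN.test(doc._id))) {
    problems.push('invalid _id')
  }
  return problems
}

console.log(checkDocument({_id: 'movie_123', _type: 'movie'})) // []
console.log(checkDocument({_id: 'movie 123'})) // [ 'missing _type', 'invalid _id' ]
```

Running a check like this over every line of your file before importing surfaces malformed documents early instead of partway through an import.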

During import, all references are automatically set to weak, then flipped to strong after all documents are in place. This ensures that you can import documents that reference other documents in any order you like.

Assets (images and files) are stored using references in Sanity. To make it easy to import these and refer to them within your documents, you can use a special _sanityAsset property where you would normally put a _ref. For instance, let's say you want your document to end up like this:

{
  "_id": "movie_123",
  "_type": "movie",
  "title": "Rogue One",
  "poster": {
    "_type": "image",
    "asset": {
      "_ref": "image_234",
      "_type": "reference"
    }
  }
}

This is what your ready-to-import document should look like:

{
  "_id": "movie_123",
  "_type": "movie",
  "title": "Rogue One",
  "poster": {
    "_type": "image",
    "_sanityAsset": "image@file:///local/path/to/rogue-one-poster.jpg"
  }
}

However, NDJSON uses the newline character as its delimiter, so your file must contain exactly one document per line, like this:

{"_id": "movie_123", "_type": "movie", "title": "Rogue One", "poster": {"_type": "image", "_sanityAsset": "image@file:///local/path/to/rogue-one-poster.jpg"}}
{"_id": "another_movie", "_type": "movie"}
{"_id": "yet_another_movie", "_type": "movie"}

Note that you need to prefix the asset URL with a type declaration – either image@ or file@.

If your asset is on the Internet, use image@https://example.com/path/to/rogue-one-poster.jpg instead of image@file:///local/path/to/rogue-one-poster.jpg.
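
If your source data lives in a script rather than a file, you can generate the NDJSON yourself. The sketch below shows one way to do it in Node.js. The helper names toNdjson and sanityAsset, and the output file name movies.ndjson, are illustrative and not part of any Sanity tooling.

```javascript
// Serialize an array of documents to NDJSON: one JSON object per line.
// JSON.stringify without an indent argument never emits raw newlines,
// so each document stays on a single line.
function toNdjson(documents) {
  return documents.map((doc) => JSON.stringify(doc)).join('\n') + '\n'
}

// Build the special asset string described above; type is 'image' or 'file'.
function sanityAsset(type, url) {
  return `${type}@${url}`
}

const documents = [
  {
    _id: 'movie_123',
    _type: 'movie',
    title: 'Rogue One',
    poster: {
      _type: 'image',
      _sanityAsset: sanityAsset('image', 'https://example.com/path/to/rogue-one-poster.jpg'),
    },
  },
  {_id: 'another_movie', _type: 'movie'},
]

const ndjson = toNdjson(documents)
process.stdout.write(ndjson)
// To write a file instead (hypothetical file name):
// require('node:fs').writeFileSync('movies.ndjson', ndjson)
```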

Once you have prepared your ndjson file, you can run the import using the Sanity CLI.

What should I import?

Why is _updatedAt updated after import?

Changes to the _updatedAt field

When you import documents that reference assets or other documents, Sanity initially preserves the value of the _updatedAt field of these documents.

However, references in documents are first imported as weak references, and strengthened later in the import process. To strengthen references, patch mutations are submitted for the containing documents.

These patches run in new transactions, and each one sets _updatedAt to the time the patch executes successfully. Documents without references keep their original _updatedAt.

Handling existing documents

The import will fail if an incoming document already exists in the dataset. Two options let you change this behavior:

  • --replace: Overwrite existing documents. This is most useful when your imported data specifies _id, because it lets you re-import documents that you got wrong in an earlier pass.
  • --missing: Only create documents that don't already exist, and leave the rest alone.

The import will also fail if an asset is unavailable. This typically happens if the file isn't at the given path on your local system or the asset URL returns 404. You can tell the import not to fail on a missing asset by passing the --allow-failing-assets option.

Import using a client library

If you prefer not to use our CLI import tool, you may of course do the import yourself with help from one of our client libraries.

There are some common pitfalls to keep in mind:

Concurrency

While you may have thousands of documents to import, you shouldn't trigger thousands of requests in parallel. Doing so is likely to exceed API rate limits and cause requests to fail. We advise you to use a queue with a reasonably low concurrency.

Use a library to keep your import below our API rate limit:

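// CommonJS require works with p-queue v6; newer versions are ESM-only (use import)
const {default: PQueue} = require('p-queue')

// `client` is assumed to be a configured Sanity client instance.
// Run one task at a time and start at most one task per 40 ms window
// (about 25 requests per second).
const queue = new PQueue({
  concurrency: 1,
  interval: 1000 / 25,
  intervalCap: 1 // without intervalCap, interval has no effect
})

// `doc` stands for the document you want to create
queue.add(() => client.create(doc))
queue.add(() => client.patch('id').inc({visits: 1}).commit())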

API usage limits

Importing large data sets can quickly cause a lot of requests, especially if you import a single document per request. It is usually a good idea to send multiple mutations within a single transaction.
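
As a rough illustration of batching, the sketch below groups documents and commits each group as one transaction. It assumes client is a configured @sanity/client instance. The function names and the default batchSize of 100 are illustrative choices, not documented recommendations.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// `client` is assumed to be a configured @sanity/client instance.
// `batchSize` is an illustrative value; tune it for your own data.
async function importInTransactions(client, documents, batchSize = 100) {
  for (const batch of chunk(documents, batchSize)) {
    const transaction = client.transaction()
    for (const doc of batch) {
      transaction.create(doc)
    }
    // One request per batch instead of one request per document.
    await transaction.commit()
  }
}
```

Batches are committed one after another here, which also keeps concurrency low. If you want several batches in flight, combine this with a queue as described under Concurrency.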

Mutation size limits

While it's a good idea to send multiple mutations per transaction, you need to make sure that the byte size of each request stays within our limits.
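
One way to respect a size limit is to group documents by their serialized size instead of by count. The sketch below is an assumption-laden illustration: batchByByteSize is a hypothetical helper, and maxBytes is a placeholder you must set from the documented request limit, leaving headroom for the mutation wrapper around each document.

```javascript
// Group documents into batches whose combined serialized size stays at or
// under `maxBytes`. `maxBytes` is a placeholder: look up the actual request
// size limit and leave headroom for the request envelope.
function batchByByteSize(documents, maxBytes) {
  const batches = []
  let current = []
  let currentBytes = 0
  for (const doc of documents) {
    const docBytes = Buffer.byteLength(JSON.stringify(doc), 'utf8')
    // Start a new batch when adding this document would exceed the budget.
    if (current.length > 0 && currentBytes + docBytes > maxBytes) {
      batches.push(current)
      current = []
      currentBytes = 0
    }
    // A single document larger than maxBytes still gets its own batch.
    current.push(doc)
    currentBytes += docBytes
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

Each returned batch can then be sent as one transaction.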

Mutation visibility

A Sanity client will use the visibility mode of sync by default, which means that it will wait for the documents to be searchable before returning. This should not be necessary when importing large datasets, so we recommend you use deferred. If you have a lot of documents, it can take a little while for them to be searchable, but the import job will move along much faster.
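
With the JavaScript client, the visibility mode can be passed as an option on the mutation call. The following is a minimal sketch, assuming client is a configured @sanity/client instance. createDeferred is a hypothetical wrapper name.

```javascript
// `client` is assumed to be a configured @sanity/client instance.
// Passing visibility: 'deferred' asks the API to respond without waiting
// for the document to become searchable.
function createDeferred(client, doc) {
  return client.create(doc, {visibility: 'deferred'})
}
```

The same option should also be accepted when committing a transaction or patch, for example transaction.commit({visibility: 'deferred'}). Check the client documentation for the version you use.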

References

If you are referring to one document from another, they either need to be imported in the right order, or the reference needs to be flagged as weak by setting the _weak property to true. After importing, you probably want to remove the weak property in order to prevent referenced documents from being deleted.
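
If import order is hard to control, you can mark every reference as weak before sending the documents. The sketch below is one possible approach: withWeakReferences is a hypothetical helper that treats any object with a string _ref as a reference.

```javascript
// Return a deep copy of `value` in which every reference object
// (any object with a string `_ref`) also carries `_weak: true`.
// The input is not modified.
function withWeakReferences(value) {
  if (Array.isArray(value)) return value.map(withWeakReferences)
  if (value === null || typeof value !== 'object') return value
  const copy = {}
  for (const [key, child] of Object.entries(value)) {
    copy[key] = withWeakReferences(child)
  }
  if (typeof copy._ref === 'string') copy._weak = true
  return copy
}

const movie = {
  _id: 'movie_123',
  _type: 'movie',
  director: {_type: 'reference', _ref: 'person_1'},
}
console.log(withWeakReferences(movie).director)
// { _type: 'reference', _ref: 'person_1', _weak: true }
```

Once all documents are in place, the _weak property can be removed again with a patch, for example client.patch('movie_123').unset(['director._weak']).commit(), assuming client is a configured @sanity/client instance.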

Assets

Since assets (e.g., files and images) in Sanity are stored using references, you'll need to upload the assets first and put the returned document ID in your reference.
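
The upload-then-reference flow can be sketched as follows, assuming client is a configured @sanity/client instance. uploadImageField is a hypothetical helper name, and body can be a Buffer or a readable stream such as fs.createReadStream('/local/path/to/rogue-one-poster.jpg').

```javascript
// `client` is assumed to be a configured @sanity/client instance.
// Uploads an image and returns an image field value that references the
// newly created asset document.
async function uploadImageField(client, body, filename) {
  const assetDocument = await client.assets.upload('image', body, {filename})
  return {
    _type: 'image',
    asset: {_type: 'reference', _ref: assetDocument._id},
  }
}
```

The returned object can be assigned to a field such as poster before the document itself is created. For non-image files, the asset type would be 'file' and the field _type would be file.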

With this in mind, do check out our client libraries documentation to see how to perform mutations.
