@Marcos Duarte @WhiteFang_Jr @praveenks30#120294 @ash7 @OverclockedClock @BP

Hey, just leaving this here for some of y'all I saw were interested in MongoDB Atlas, in case you're still interested, or for anyone else who comes across this issue. From what I can tell @Logan M , you are correct that Atlas doesn't currently support creating a search index via client libraries/API/mongosh. At least not for the M0 free cluster or the M2/M5 shared clusters (reference). It looks like it may be possible for dedicated clusters, but I haven't tried that yet.

However, I've found one weird solution that seems to work. You can use the mongodbatlas Terraform provider to provision a mongodbatlas_search_index resource. The only issue with this (other than it being a pain in the ass if you aren't already using Terraform lol) is that you can't create a collection with Terraform, and although provisioning a search index without an existing collection returns a 200, it will fail to actually create the index. That means you need to run some kind of external script to create an empty collection before provisioning the index.

Here's what I came up with --------->
  1. Initialize the mongodbatlas provider
Plain Text
[main.tf or providers.tf]

terraform {
    required_providers {
        mongodbatlas = {
            source = "mongodb/mongodbatlas"
            version = "1.11.1"
        }
    }
}

provider "mongodbatlas" {
    public_key = var.atlas_public_key
    private_key = var.atlas_private_key
    region = var.atlas_region
}
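For completeness, the `var.*` values referenced above need declarations somewhere in your config. A sketch (the names match the references above; marking the keys `sensitive` keeps them out of plan output):

```hcl
# [variables.tf] -- hypothetical companion file for the provider block above

variable "atlas_public_key" {
    type      = string
    sensitive = true
}

variable "atlas_private_key" {
    type      = string
    sensitive = true
}

variable "atlas_region" {
    type = string
}
```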


  2. Do everything else you need for your configuration (i.e. create project, cluster, users, whitelist, db, etc.).
  3. Add a null_resource to run your script to create the collection
Plain Text
[main.tf]

resource "null_resource" "create_collection" {
    provisioner "local-exec" {
        command = "python create_collection.py ${local.admin_user_connection_string_var}"
    }

    depends_on = [mongodbatlas_advanced_cluster.default_provisioned_cluster, mongodbatlas_database_user.admin_user]
}
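One gotcha with interpolating `${local.admin_user_connection_string_var}` into the command: if the database user's password contains characters like `@` or `/`, it has to be percent-escaped before it goes into the connection string, or the client will mis-parse the URI. PyMongo's docs recommend `urllib.parse.quote_plus` for this. A minimal sketch with hypothetical credentials (the real ones would come from your Terraform config):

```python
from urllib.parse import quote_plus

# Hypothetical values for illustration; in practice these come from Terraform.
user = "admin_user"
password = "p@ss/word"
host = "cluster0.ab1cd.mongodb.net"

# quote_plus escapes '@' and '/' so the URI parses correctly
conn = f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/default_db"
print(conn)
# → mongodb+srv://admin_user:p%40ss%2Fword@cluster0.ab1cd.mongodb.net/default_db
```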


There are lots of ways to do this; I opted to use PyMongo because I'm using Python to run the Terraform config in the first place. But you could just as easily write a bash script that uses mongosh or some other client library to create the collection.

Plain Text
[create_collection.py]

import sys

import certifi
from pymongo.mongo_client import MongoClient

def create_collection(full_connection_string: str):
    # certifi supplies a CA bundle so the TLS handshake with Atlas succeeds
    client = MongoClient(host=full_connection_string, tlsCAFile=certifi.where())
    db = client["default_db"]
    # the search index needs an existing collection to attach to,
    # so create an empty one up front
    result = db.create_collection("index_collection")
    print(f"index_collection: {result}")

if __name__ == "__main__":
    full_connection_string = sys.argv[1]
    create_collection(full_connection_string)
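One caveat: `local-exec` runs again whenever the null_resource is recreated, and PyMongo raises `CollectionInvalid` if the collection already exists, which would fail the apply. A small guard makes the step idempotent. Sketched here against a stub class so the snippet is self-contained; with PyMongo you'd pass the real `Database` object instead:

```python
def ensure_collection(db, name: str) -> bool:
    """Create `name` only if it doesn't already exist; return True if created."""
    if name in db.list_collection_names():
        return False
    db.create_collection(name)
    return True

# Minimal stub standing in for pymongo.database.Database in this sketch.
class FakeDB:
    def __init__(self):
        self._cols = []
    def list_collection_names(self):
        return list(self._cols)
    def create_collection(self, name):
        self._cols.append(name)

db = FakeDB()
print(ensure_collection(db, "index_collection"))  # first run creates it: True
print(ensure_collection(db, "index_collection"))  # second run is a no-op: False
```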


  4. Once you've got that set up, you can just create a mongodbatlas_search_index for the fields you want:
Plain Text
[main.tf]

resource "mongodbatlas_search_index" "advanced_search_index" {
    project_id = mongodbatlas_project.atlas_project.id
    cluster_name = mongodbatlas_advanced_cluster.default_provisioned_cluster.name
    collection_name = "index_collection"
    database = "default_db"
    mappings_dynamic = true
    mappings_fields = <<-EOF
    {
        "embedding": {
            "dimensions": 1536,
            "similarity": "euclidean",
            "type": "knnVector"
        }
    }
    EOF

    name = "advanced_search_index"
    depends_on = [null_resource.create_collection]
}
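For reference, the `mappings_dynamic`/`mappings_fields` arguments above correspond to the index definition document that Atlas stores. Spelling it out in Python makes it easy to sanity-check the JSON before burying it in an HCL heredoc (the 1536 dimensions match OpenAI's ada-002 embeddings, as in the resource above):

```python
import json

# The index definition the Terraform resource above encodes.
index_definition = {
    "mappings": {
        "dynamic": True,
        "fields": {
            "embedding": {
                "dimensions": 1536,        # ada-002 embedding size
                "similarity": "euclidean",
                "type": "knnVector",
            }
        },
    }
}

# json.dumps round-trips cleanly, so this is safe to paste into the heredoc.
print(json.dumps(index_definition["mappings"]["fields"], indent=4))
```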
TL;DR - Although not ideal, it is possible to programmatically provision a MongoDB Atlas search index for both shared and dedicated clusters using Terraform. Why Terraform is the only way to accomplish this (at least for shared clusters) is beyond me, but I honestly think that MongoDB, and Atlas in particular, has some real potential in this space. @Logan M I would be very curious to hear your thoughts on this, or from anyone who has experience with Atlas in this context. Thanks all!
thank you for reaching out ser! it's much appreciated. I'll be perfectly honest, I'm a Python and LLM/RAG noob, so this workaround may be a bit too intense for me; I'm sticking with ChromaDB for now, but please let me know if a simpler solution is in the works πŸ™
Yea great detective work @Murk418 ! But overall, I think this is likely over the heads of most devs using llamaindex πŸ˜… I don't think this process could be easily integrated into the vector db itself in llama index sadly πŸ€”
If mongodb ever makes this easier, I would be happy to use it ❀️
Yeah, I can definitely see how this would be more effort than it's worth for most. Mongo's free cluster just happened to fit perfectly with the rest of my infra, so I figured I would share in case anyone else came across this limitation. Hopefully they'll start rolling out some new updates soon that could make them more of a viable option. Supposedly their upcoming MongoDB.local London conference is going to be "announcement-packed", so keeping my fingers crossed for some added support πŸ˜…