-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
245: add image comments index and image comments filter #267
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JesseLeung97, save that one comment I have in there about awaiting the Mongo aggregation, this code looks good and it's all behaving well in my tests. That said I started to dig into the docs a bit and I'm a bit concerned about adding a text search index in prod (it seems like it's really expensive and can impact write speed).
I should have done more homework before setting you off on this, but it got me curious about Atlas Search (we do use MongoDB Atlas, so that would be an option). It sounds far more robust and offers a lot more functionality than Mongo DB's $text search, but I'm also concerned and unsure of what the RAM implications are for the Lucene indexes it manages. I'm also concerned about locking us into Atlas Search as it's tied to Atlas and would prevent moving to a self-managed or local MongoDB instance down the road if that was ever desirable. It also just might be overkill for what we need.
At any rate, I think we should pause on this and meet to talk about this and some of the other tasks you're looking to take on. I'd be curious if you have an opinion on Atlas Search and whether you took a look at it while implementing the $text search. Here's one blog post that compares the difference but I'm sure there are more.
I'll reach out via email and try to get a meeting set up.
@@ -124,7 +124,7 @@ export class ImageModel { | |||
hasNext: false, | |||
} as AggregationOutput<ImageSchema>; | |||
} | |||
return await MongoPaging.aggregate(Image.collection, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JesseLeung97, can you elaborate on the warning you were seeing and where it was coming from? It looks like the MongoPaging.aggregate()
function is asynchronous, and this would make sense as it's a DB query, so I do think awaiting it is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what my lsp is saying.
Here's the source code. It seems weird that it would be synchronous but the types.d.ts file isn't declaring it async.
// src/@types/mongo-cursor-pagination.d.ts
const defaultExport: {
mongoosePlugin: any;
aggregate: <T extends Model>(
model: T['schema'],
input: AggregationInput,
) => AggregationOutput<T>;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I think that defaultExport.aggregate
return type needs to be wrapped in a Promise
type. mongo-cursor-pagination doesn't include Typescript types so we wrote these ones ourselves and this must have been an oversight on our part. I think the following should resolve that warning:
const defaultExport: {
mongoosePlugin: any;
aggregate: <T extends Model>(
model: T['schema'],
input: AggregationInput,
) => Promise<AggregationOutput<T>>;
};
Perhaps not surprisingly, MongoDB pushes for using Atlas Search over $text and $regex in most cases unless you're self-hosting your DB. Their decision framework is helpful, although probably a bit biased. Having read that I'm now leaning towards Atlas Search slightly because it does more performant and scalable:
and...
The main downsides are (a) getting ourselves locked into Atlas and (b) not being able to create Atlas Search Indexes from code (e.g. w/ Mongoose) or infra as code. I'm not totally positive on this last point but I think you have to create indexes via the MongoDB Atlas user interface. That said, our MongoDB Atlas setup is already outside of our Serverless/Cloudformation config files so anyone trying to spin up Animl on their own would have to manually muck around with the AWS console and MongoDB Atlas app anyhow. There also seem to be some indexing pitfalls with Atlas Search we should be aware of and try to avoid. Setting alerts would be a good idea too. |
…45-comment_filter
I think I ended up reading a bunch of the same resources as you listed (docs, decision tree, some stack overflow threads). The text index stores an entry per word in the comment in RAM which may get expensive depending on the volume of comments you expect to see in the database. Looking at the metrics for the past two months, it seems like on average, the instances are using 25% - 40% of their configured 2gb of memory. I think there's space now for an index but I am also concerned because it seems like it will grow endlessly as more comments are added. The deciding factor is whether this growth would force us to scale the instances in 2 months or 2 years. Like you mentioned, Atlas indexes to my understanding are far more feature-rich and scalable because they're hosted separately to the DB instances (on lucene instances?). Some things that would be nice to implement if we go with an atlas index are fuzzy finding and auto completion for searches. It's probably overkill to be honest, but because we already pay for Atlas, I think that's less of a factor in the decision. I took a quick look around and I found some IAC examples of creating Atlas indexes using CDK or Terraform Overall, I am also leaning towards an Atlas index. For other teams who are considering using Animl but don't want to use Atlas, we could provide a small doc about how to update it to use the standard text index instead. |
@JesseLeung97 thanks for doing some reading on Atlas vs text search and weighing in; I think let's move forward with the Atlas indexes and see how it goes. Good find with the IAC templates. We want to move away from Serverless to something else in the near future so this is great to flag. For now though, I like the idea of just using the Atlas UI with a JSON config for the index and committing it somewhere. In general I need to write up some documentation for standing up new Animl instances so I can better track the pieces that live outside of Serverless and steps to set them up. |
Context
Implements tnc-ca-geo/animl-frontend#245
Add comments filter so users can filter images by the content of comments on the image. Implemented using the
$search
$text
operator in MongoDB. Note: the$search
filter needs to be part of the first$match
of the pipeline.https://www.mongodb.com/docs/manual/reference/operator/query/text/#mongodb-query-op.-text
Fixes
Noticed the LSP was showing a warning that
await
had no effect onMongoPaging.aggregate
Related PRs
animl-frontend