feat: Add proxy for meilisearch host when ran out of the cluster. #156

Merged
merged 1 commit into from
May 12, 2025

Conversation

angonz
Contributor

@angonz angonz commented May 2, 2025

Hi all,
I would like to propose this addition.

Multi-site K8s deployments might benefit from having a single Meilisearch server, installed outside the K8s cluster (which I would recommend).
However, the ingress will still route MEILISEARCH_HOST to Caddy.

In this case a proxy for each site to the server might be useful.

@openedx-webhooks

Thanks for the pull request, @angonz!

This repository is currently maintained by @openedx/openedx-k8s-harmony-maintainers.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label May 2, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions May 2, 2025
@angonz angonz changed the title from "feat: Add proxy for meilisearch host when run out of the cluster." to "feat: Add proxy for meilisearch host when ran out of the cluster." May 2, 2025
@mphilbrick211 mphilbrick211 moved this from Needs Triage to Ready for Review in Contributions May 6, 2025
@mphilbrick211 mphilbrick211 requested a review from a team May 6, 2025 20:43
@bradenmacdonald
Contributor

What's the advantage of using a proxy server over just setting the MEILISEARCH_PUBLIC_URL directly to the URL of the search server? That would provide faster search results for end users by eliminating the proxy in the middle.

Also, since Meilisearch is so light on resources, we've been recommending one Meilisearch instance per Open edX instance. I'm curious about your reasons for recommending one large "shared" Meilisearch instance? I know both options are supported, just looking for reasons one way or the other.

@angonz
Contributor Author

angonz commented May 9, 2025

Sure! We have a couple of design principles:

  • Avoid using K8s for anything with persistent volumes, especially databases
  • Put databases in a private subnet

We have ES as a service from AWS, but they don't offer Meilisearch, so we installed a stand-alone instance. As it is not a critical data store (indexes can be recreated if lost), we don't care about HA.
So what we have is not a "large" shared instance, but a "small" shared instance :-) which is more than enough for many LMSs.
As the server ended up in a private subnet, the public URL is not reachable from outside. Usually it wouldn't be needed, but I saw that the MFEs access Meilisearch directly and not through the backend. I don't know why it is this way, because ES didn't need to be accessed from outside. But anyway. That's why we needed a proxy in the LMS to access Meilisearch.

@bradenmacdonald
Contributor

bradenmacdonald commented May 9, 2025

As the server ended up in a private subnet, the public URL is not reachable from outside. Usually it wouldn't be needed, but I saw that the MFEs access Meilisearch directly and not through the backend. I don't know why it is this way, because ES didn't need to be accessed from outside.

More modern search engines like Meilisearch and TypeSense belong to a newer generation than Elasticsearch, and they are highly optimized for speed and "search as you type" results. One of the ways they achieve that is by having the user's browser retrieve results directly from the search engine as they type each character of the search query, rather than routing the search request through a proxy.

With Elasticsearch we didn't have much choice, and it was necessary to proxy the user's search requests through the LMS server in order to enforce permissions (don't allow users to search courses that they don't have access to). This could actually place a significant load on the edxapp/LMS servers, tying up web app processes while they simply wait for Elasticsearch to return results and then passing those results back to the user.

With Meilisearch, we've been able to make the search engine itself able to enforce permissions, so that each user can connect directly to the search index and still only query course content that they have access to. (At least for the CMS; we have yet to develop a more sophisticated LMS search.) This means that we can follow the best practice of making the Meilisearch server public and avoid using the LMS/CMS as a proxy.
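For readers wondering how a search engine can enforce per-user permissions on direct browser connections: Meilisearch documents a "tenant token" mechanism, in which the backend signs a short-lived JWT that embeds the search rules for one user. The sketch below uses only the Python standard library to show the general shape of such a token. The index name `courses`, the field `access_id`, and the key values are placeholders, and it is an assumption that the Open edX implementation builds its tokens this way.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_tenant_token(api_key: str, api_key_uid: str,
                      search_rules: dict, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT shaped like a Meilisearch tenant token.

    api_key / api_key_uid identify the parent API key (placeholders here);
    search_rules restricts what the token holder may query.
    """
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "searchRules": search_rules,
        "apiKeyUid": api_key_uid,
        "exp": int(time.time()) + ttl_seconds,  # expiry as a Unix timestamp
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        api_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_input + "." + _b64url(signature)


# Hypothetical usage: the backend issues a token limited to two access IDs,
# and the browser then sends it straight to the search server.
example_token = make_tenant_token(
    "parent-api-key",
    "00000000-0000-4000-8000-000000000000",
    {"courses": {"filter": "access_id IN [1, 2]"}},
)
```

Because the rules travel inside the signed token, the search server can check them itself, which is why no application server has to sit between the browser and the search index.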

You do still need some kind of load balancer or proxy to handle HTTPS because Meilisearch doesn't provide HTTPS, only HTTP. But it can be a very simple proxy like Caddy that just handles HTTPS and it doesn't need to do any other request filtering.

That's why we needed a proxy in the LMS to access Meilisearch.

I would strongly recommend using a simple proxy like AWS ELB, Caddy, or nginx (see Meilisearch docs) rather than using the LMS as a proxy, because there is no need to tie up LMS resources / worker processes for simple proxying. I guess that's what this PR is doing though right?

@angonz
Contributor Author

angonz commented May 9, 2025

Thanks for the explanation!! Looks like it was a good decision to move to Meilisearch.

there is no need to tie up LMS resources / worker processes for simple proxying

I see your point, but if you run Meilisearch in a pod you will also use the ingress and Caddy to proxy. It's just the same, but with an external Meilisearch server.
Indeed, Harmony will proxy MEILISEARCH_HOST to Caddy even if RUN_MEILISEARCH is false, which will not work because there is no else branch in the Caddyfile condition.

Anyway, I can do this in a separate plugin.

@bradenmacdonald
Contributor

bradenmacdonald commented May 9, 2025

Yes, that's reasonable. What I was advising against is using the edx-platform/edxapp Django app as a proxy like this. It's totally fine to use Caddy as a proxy.

And I'm actually fine with merging this PR as is. Could you maybe just add some more comments explaining how this setup differs from the default? And confirm you've tested this? I also wonder if we need to consider the case of people who use Meilisearch Cloud because they'll have RUN_MEILISEARCH False but they also don't want/need a proxy.

@angonz
Contributor Author

angonz commented May 12, 2025

Sure.
By default, MEILISEARCH_HOST is set to "meilisearch.{{ LMS_HOST }}" by Tutor. When Harmony is enabled, it will create a proxy to Caddy, as with any other subdomain. But if RUN_MEILISEARCH is false, there will be nothing in the Caddyfile to handle these requests.
This change creates a block in the Caddyfile that sends these requests to the configurable MEILISEARCH_URL. The benefit is that you can keep your Meilisearch server in a private subnetwork, only accessible by Caddy via the internal URL.
I have tested this setup successfully in production and it works very well.
Users of Meilisearch Cloud, or of any Meilisearch server with a public URL reachable from the Internet, can set MEILISEARCH_PUBLIC_URL to the server address. If this URL is not a subdomain of the LMS_HOST, the traffic will go directly from the frontend to that server, without reaching either the nginx-ingress or the Caddy pod.
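The subdomain condition above can be checked mechanically. The following is a minimal sketch, assuming the only thing that matters is whether the public URL's hostname sits under LMS_HOST. The function name and the example hosts are hypothetical and not part of Harmony or Tutor.

```python
from urllib.parse import urlsplit


def is_subdomain_of_lms_host(public_url: str, lms_host: str) -> bool:
    """Return True if the URL's hostname is a strict subdomain of lms_host."""
    host = (urlsplit(public_url).hostname or "").lower()
    # Require a leading dot so "evil-lms.example.com" does not match
    # "lms.example.com".
    return host.endswith("." + lms_host.lower())


# Default-style URL under the LMS host: expected to pass through the cluster.
via_cluster = is_subdomain_of_lms_host(
    "https://meilisearch.lms.example.com", "lms.example.com"
)
# Externally hosted URL: the browser would talk to that server directly.
direct = not is_subdomain_of_lms_host(
    "https://ms-yourinstanceid.meilisearch.io", "lms.example.com"
)
```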

To summarize:

  • If you run Meilisearch in the K8s cluster, set RUN_MEILISEARCH=true.
  • If you run Meilisearch on a public server, including Meilisearch Cloud, set RUN_MEILISEARCH=false and MEILISEARCH_PUBLIC_URL to the public address (e.g. https://ms-yourinstanceid.meilisearch.io). The server must support HTTPS with its own certificates.
  • If you run Meilisearch on a private server, set RUN_MEILISEARCH=false and MEILISEARCH_URL to the internal name or IP address, including the port number (e.g., http://meilisearch.internal:7700). The server doesn't need SSL certificates; they will be managed by cert-manager. Note: leave MEILISEARCH_PUBLIC_URL unset, so that it defaults to https://meilisearch.{{LMS_HOST}}.
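The three cases can also be written down as a small lookup. This is only an illustrative sketch: the mode names ("in-cluster", "public", "private") and the helper function are hypothetical, while the setting names are the ones mentioned in this thread.

```python
from typing import Optional


def meilisearch_settings(mode: str, address: Optional[str] = None) -> dict:
    """Map a deployment mode to the Tutor settings described in the summary."""
    if mode == "in-cluster":
        # Meilisearch runs as a pod inside the K8s cluster.
        return {"RUN_MEILISEARCH": True}
    if mode == "public":
        # Public server (e.g. Meilisearch Cloud); it must serve HTTPS itself.
        if not address or not address.startswith("https://"):
            raise ValueError("a public server needs an https:// address")
        return {"RUN_MEILISEARCH": False, "MEILISEARCH_PUBLIC_URL": address}
    if mode == "private":
        # Private server reached by Caddy; MEILISEARCH_PUBLIC_URL stays unset.
        if not address:
            raise ValueError("a private server needs an internal address")
        return {"RUN_MEILISEARCH": False, "MEILISEARCH_URL": address}
    raise ValueError(f"unknown mode: {mode!r}")


private_settings = meilisearch_settings("private", "http://meilisearch.internal:7700")
```

In the private case the sketch deliberately leaves MEILISEARCH_PUBLIC_URL out of the result, mirroring the note that it should keep its default.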

@bradenmacdonald
Contributor

That's perfect - thanks! But can you move that info into the Configuration Reference section of the README as part of this PR? I don't think too many people will see it if it's just a comment on this PR. Then I'll merge it.

@angonz angonz force-pushed the angonz/add-meilisearch-proxy branch from b1092b0 to 2e4ad9a Compare May 12, 2025 17:30
Contributor

@bradenmacdonald bradenmacdonald left a comment

Thanks so much, especially for the nice documentation write-up!

@bradenmacdonald bradenmacdonald merged commit bfdcca6 into openedx:main May 12, 2025
3 checks passed
@github-project-automation github-project-automation bot moved this from Ready for Review to Done in Contributions May 12, 2025
@angonz angonz deleted the angonz/add-meilisearch-proxy branch May 14, 2025 19:42