Cachalot in case of importing data #140

Open
babylonlin opened this issue Nov 26, 2019 · 4 comments

@babylonlin

I am using django-import-export to import data, and I am wondering how Cachalot behaves during an import.

Since django-import-export fetches the object from the database for each imported row in order to compare differences, will Cachalot slow down the import process? Does anyone have experience with this?
If so, how can I disable Cachalot in code during the import?

@Andrew-Chen-Wang
Collaborator

Andrew-Chen-Wang commented Feb 10, 2020

@babylonlin Hi, cachalot works at the table level: it caches query results and invalidates everything cached for a table whenever that table is written to, rather than tracking individual objects or queries the way cache-machine or cacheops do. In the long run, cachalot is what's best, but I don't fully understand what import-export does or what you're doing with it.

Are you saying you're importing data one record/row at a time? If so, you want to disable cachalot for that ONE table by going to your settings.py and setting CACHALOT_ONLY_CACHABLE_TABLES so that it lists only the tables you do want cached:

https://django-cachalot.readthedocs.io/en/latest/quickstart.html#cachalot-only-cachable-tables
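
For illustration, a minimal sketch of what this could look like in settings.py. The table names are hypothetical placeholders. It also shows CACHALOT_UNCACHABLE_TABLES, a related cachalot setting that lists the tables to exclude instead; either setting should be enough on its own, so treat this as a starting point rather than a recommended configuration.

```python
# settings.py (fragment). All table names below are hypothetical placeholders.

# Allow-list approach: only these tables are cached. Any query touching a
# table that is not listed here (such as the import target) is not cached.
CACHALOT_ONLY_CACHABLE_TABLES = frozenset((
    "shop_category",
    "shop_product",
))

# Deny-list alternative: cache everything except the listed tables.
# "django_migrations" is kept because cachalot excludes it by default.
CACHALOT_UNCACHABLE_TABLES = frozenset((
    "django_migrations",
    "shop_importedrow",
))
```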

@babylonlin
Author

The import process compares the existing data with the incoming data, shows the user the differences, and lets the user confirm the import. That's why it works one record at a time.

Thanks for pointing out the CACHALOT_ONLY_CACHABLE_TABLES option. I can change the setting before the import and re-enable it afterwards. (I know there is a risk of inconsistency, but maybe it's the only way.)

@Andrew-Chen-Wang
Collaborator

Hmm, instead of doing one record at a time, may I recommend something different? I strongly advise against your current plan of unsetting and resetting the option, since CACHALOT_ONLY_CACHABLE_TABLES is a frozenset! Once your application is running, you can't change django.conf.settings.CACHALOT_ONLY_CACHABLE_TABLES on the fly.
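
To illustrate the frozenset point, here is a small plain-Python sketch (no Django required). The variable stands in for the value from settings.py, and the table names and the try_remove helper are made up for the example. It only shows that a frozenset cannot be modified in place; Django's documentation separately advises against altering settings at runtime.

```python
# Stand-in for the value defined in settings.py; table names are hypothetical.
CACHALOT_ONLY_CACHABLE_TABLES = frozenset({"shop_product", "shop_category"})


def try_remove(tables, name):
    """Attempt an in-place removal and report whether it was possible."""
    try:
        tables.remove(name)  # frozenset has no remove(): raises AttributeError
    except AttributeError:
        return False
    return True


removed = try_remove(CACHALOT_ONLY_CACHABLE_TABLES, "shop_product")
print(removed)  # False: the frozenset is unchanged
```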

My recommendation: use a separate table to store all your imported data. Since cachalot invalidates its cache per table, writes to that staging table won't cause the performance degradation you'd get from single-record updates on the main table. If you store the imported data in that secondary table and page through it with Django pagination, you can use bulk_create or bulk_update to your advantage, which makes this a much simpler task.
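
A rough sketch of the batching idea, written in plain Python so it runs without Django. The batched and import_in_batches helpers are hypothetical names, and written.extend is only a stand-in for the real write; in a Django project each batch would presumably go to something like bulk_create or bulk_update on the target model instead.

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_batches(rows, write_batch, size=500):
    """Call `write_batch` once per batch and return the number of calls."""
    calls = 0
    for batch in batched(rows, size):
        # In Django this could be, e.g., Model.objects.bulk_create(batch).
        write_batch(batch)
        calls += 1
    return calls


written = []  # stand-in for the destination table
calls = import_in_batches(range(1200), written.extend, size=500)
print(calls, len(written))  # 3 1200
```

The point of the design is that the number of write operations grows with the number of batches rather than the number of rows.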

That secondary table could have attributes/columns:

  • id: BigAutoField, primary_key=True
  • fields... (these fields should mirror the target table. If you are importing data for several different tables, use PostgreSQL and its JSONField without a GIN index. Do not index the JSONField)
  • Optional: user, ForeignKey(User, on_delete=models.CASCADE)

The open question is whether to use a ForeignKey or a GenericForeignKey. Ignore some of the hateful comments online about why GenericForeignKey is bad: once you implement the bulk_update it won't really matter, since GFK's performance degradation isn't a concern if you simply use Django signals and check sender == Blah in the receiver (with from .models import Blah; sender is a parameter of the receiver).

The downside of a method like mine is a potentially large data transfer cost if you use a cloud provider. However, if the data must be persisted and/or is large, the user's end won't be able to handle it.

If you need me to clarify, ask away!

@Andrew-Chen-Wang
Collaborator

In general, cachalot will see any INSERTs, since it essentially "hacks" into the Django ORM in order to cache query results and invalidate them on writes. It's a nice cheap move that allows all queries to be cached. It's great :D

However, I'm closing this now since there seems to be no big issue surrounding this. Open it up and ping me if you need some more assistance. I recommend you join our Slack chat for faster comms: https://join.slack.com/t/cachalotdjango/shared_invite/enQtOTMyNzI0NTQzOTA3LWViYmYwMWY3MmU0OTZkYmNiMjBhN2NjNjc4OWVlZDNiMjMxN2Y3YzljYmNiYTY4ZTRjOGQxZDRiMTM0NWE3NGI
