Improve the frontend of the website and add new features #33

Open
Smritigit1202 opened this issue Jan 2, 2025 · 5 comments

@Smritigit1202

New features to add:

  1. Home page
  • Banners
  • Categorization of events (music, performing and visual arts, coding, tech talks, etc.)
  • Upcoming events
  • Online events
  • Featured hubs
  2. Menu bar
  • Find events
  • Search bar
  • Login / sign up
  • Create events
  • All Hubs
  3. All Hubs section (for each hub in the list)
  • Description
  • Social media posts
  • Coordinators and their contacts
  • Blogs page
@codeblech
Owner

  • search would be cool. it'll be interesting to see how you implement it.

  • hub descriptions would be a nice addition too.

  • coordinators and their contacts
    how would we source these? and how would they stay updated?

  • blogs page
    this was one of the initial goals to have a blog/writeup section under the PageTurnerSociety Hub's page

  • home page
    can you write more about what you've planned for the home page?

@Smritigit1202
Author

Thanks!
For the home page, the main ideas are:

  • We could show the upcoming events of the different societies, so that viewers stay up to date with the latest events.
  • We could categorize events (music, performing and visual arts, coding, tech talks, etc.).

@codeblech
Owner

codeblech commented Jan 4, 2025

this is a great idea fr

  • how are we determining which posts contain upcoming events?
  • how are we categorising the events?

i think we can use an llm call for it. what ideas did you have?
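
A rough sketch of what handling the result of such an llm call could look like, assuming the model is asked to reply with a small JSON object. The field names (`is_event`, `category`, `event_date`), the `PostLabel` type, the `parse_label` helper, and the extra "other" category are all made up for illustration; only the first four categories come from this thread, and no specific llm provider is assumed.

```python
import json
from dataclasses import dataclass
from typing import Optional

# First four categories are from the issue; "other" is an assumed fallback.
CATEGORIES = ("music", "performing and visual arts", "coding", "tech talks", "other")


@dataclass
class PostLabel:
    is_event: bool
    category: Optional[str]    # one of CATEGORIES, or None for non-event posts
    event_date: Optional[str]  # whatever date string the model returned, if any


def parse_label(reply: str) -> PostLabel:
    """Turn the model's JSON reply (hypothetical format) into a PostLabel."""
    data = json.loads(reply)
    if not bool(data.get("is_event", False)):
        return PostLabel(False, None, None)
    category = data.get("category")
    if category not in CATEGORIES:
        # Do not trust free-form output: unknown labels fall back to "other".
        category = "other"
    return PostLabel(True, category, data.get("event_date"))
```

Validating the category against a fixed list keeps the home page filters predictable even when the model invents a new label.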

@Smritigit1202
Author

Here’s the plan to make this happen:

  1. Scrape Instagram content. We can use the Instagram Graph API, or a browser automation tool like Selenium or Playwright, to grab posts from public profiles, and then extract the images.
  2. Run OCR to detect text. We’ll use an OCR tool like Tesseract to pull out any text from the images, looking specifically for date-like patterns (e.g., “Jan 1, 2025”).
  3. Detect dates. We’ll use an NLP tool (like spaCy or Duckling) to check whether the matched text is actually a date.
  4. Store the results. We can then set up a database to hold the extracted data.

Is it feasible? Yep! With OCR, NLP, and a backend, this is totally doable.
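
As a minimal sketch of the "date-like patterns" step, assuming the OCR text is already available as a plain string (the OCR call itself is left out): the `find_dates` helper and the regex below are illustrative only, cover just the "Jan 1, 2025" style, and rely on English month abbreviations.

```python
import re
from datetime import datetime

# Matches dates written like "Jan 1, 2025" or "December 28 2024".
DATE_PATTERN = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+(\d{1,2}),?\s+(\d{4})\b",
    re.IGNORECASE,
)


def find_dates(ocr_text: str) -> list[datetime]:
    """Return every valid 'Mon D, YYYY' style date found in the text, in order."""
    found = []
    for month, day, year in DATE_PATTERN.findall(ocr_text):
        try:
            # Only the 3-letter month prefix is captured, so %b can parse it.
            found.append(datetime.strptime(f"{month.title()} {day} {year}", "%b %d %Y"))
        except ValueError:
            # e.g. "Feb 31, 2025" looks like a date but is not a real one.
            continue
    return found
```

A regex pass like this is cheap to run before any heavier date parser, but it will miss numeric formats and dates without a year.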

Some issues that we may face:

  • Instagram’s API is rate-limited and needs authentication.
  • Dates can be ambiguous (e.g., “01/04” can mean January 4 or April 1, depending on locale).
  • Some non-event posts also contain dates.
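
To make the ambiguity concrete, here is a small sketch that lists every valid reading of a numeric date. The `interpretations` helper is hypothetical, and the year is assumed to be supplied by the caller (for example, taken from the post's timestamp).

```python
from datetime import datetime


def interpretations(text: str, year: int) -> set[datetime]:
    """Return every valid calendar reading of a 'NN/NN' date string."""
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            readings.add(datetime.strptime(f"{text}/{year}", fmt))
        except ValueError:
            pass  # this reading is not a valid calendar date
    return readings
```

If the result has more than one element, the post could be flagged for manual review or resolved with a fixed day-first convention.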

There will be some challenges, but we can start with this approach and keep improving it. Even if KWoC ends, we can keep working on it.

What do you think? Should we go for it?

@codeblech
Owner

good research!
i've used instagram graph api and it is kinda shitty but it does work. sometimes, it is blocked by their own bot detection lol.

browser automation should be the last resort. if you do use that, do not use selenium. it is outdated. use playwright or nodriver or hrequests.

instead of ocr, using an llm would be much simpler. we already have a lot of images to process, but they can be processed in smaller batches to avoid rate limits. after all of the previous posts are processed, there won't be much traffic, because there aren't many events in this college anyway.
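
a minimal sketch of the batching idea, assuming some `classify` function that sends one image to the llm and returns its label. that function, the batch size, and the pause length are all placeholders, since the real rate limits depend on whichever provider gets used.

```python
import time
from typing import Callable, Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    image_paths: list[str],
    classify: Callable[[str], str],  # hypothetical: one llm call per image
    batch_size: int = 10,            # assumed value, tune to the real limit
    pause_seconds: float = 60.0,     # assumed value, tune to the real limit
) -> dict[str, str]:
    """Classify every image, sleeping between batches to stay under a rate limit."""
    results: dict[str, str] = {}
    batches = list(batched(image_paths, batch_size))
    for index, batch in enumerate(batches):
        for path in batch:
            results[path] = classify(path)
        if index < len(batches) - 1:
            time.sleep(pause_seconds)  # no pause needed after the last batch
    return results
```

once the backlog is done, the same function can run on just the new posts, where a single small batch would usually be enough.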
