Add "Advanced Python DS Ecosystem" course materials #3

Draft
wants to merge 47 commits into
base: main
from

Conversation

ccauet
Member

@ccauet ccauet commented Oct 9, 2023

No description provided.

@ccauet ccauet self-assigned this Oct 9, 2023
@ccauet
Member Author

ccauet commented Oct 30, 2023

@clstaudt we created some new content for a customer covering OOP, Poetry, databases, Polars, and dashboards.

Would you be interested in looking over the material and giving some feedback before we merge?

@clstaudt
Collaborator

@ccauet Certainly. There might even be some thematic overlap with new material I am building.

@clstaudt clstaudt marked this pull request as ready for review October 30, 2023 18:58
@clstaudt
Collaborator

clstaudt commented Oct 30, 2023

Polars material

Not quite in the familiar form of notebooks from Data Science Learning Paths yet:

  • Each notebook should open with a title followed by a bit of introductory text (e.g. when and why should I use polars instead of pandas?)
  • The code blocks probably need more instruction / explanation (e.g. split up bigger code blocks and explain them step by step)

Technical:

  • format all code with black
  • stick to the notebook file name scheme
  • write temporary data to a separate, gitignored data folder, not to the notebook folder
  • index notebook ape-advanced-python-ds-ecosystem-2day.ipynb is duplicated
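
The point about temporary data could look roughly like the sketch below. The folder location `../data` and the helper name `data_path` are assumptions for illustration, not names taken from the PR; the folder itself would be listed in `.gitignore`.

```python
from pathlib import Path

# Assumption: notebooks live in <repo>/notebooks and temporary files go to
# <repo>/data, which is listed in .gitignore (for example as a line "data/").
DATA_DIR = Path("..") / "data"


def data_path(filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Return a path inside the data folder, creating the folder if needed."""
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / filename
```

A notebook cell would then write to `data_path("some_file.parquet")` instead of a bare file name, so nothing temporary ends up next to the notebooks.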

Nice to have: a comparison of the PySpark and Polars APIs, since they look very similar.

@clstaudt
Collaborator

clstaudt commented Oct 30, 2023

Object Oriented Programming material

"... allows us to organize code around real-world entities."

Really? A bit vague and misleading. Start with the idea of grouping data and logic together.
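
A minimal sketch of what "grouping data and logic together" could mean in an introductory cell. The `Rectangle` class is a made-up example, not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class Rectangle:
    """The data (width, height) and the logic operating on it live together."""

    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    def scale(self, factor: float) -> "Rectangle":
        # Return a new object instead of mutating this one
        return Rectangle(self.width * factor, self.height * factor)


rect = Rectangle(2.0, 3.0)
print(rect.area())  # 6.0
print(rect.scale(2).area())  # 24.0
```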

"Python uses access modifiers to define the visibility of attributes and methods, helping to encapsulate data and ensure that unwanted changes cannot be made from outside the class."

C++ and Java have access modifiers that enforce visibility rules. The conventions explained here are usually not called access modifiers in Python. Suggestion:

"In Python, naming conventions are used to indicate the intended visibility and accessibility of attributes and methods, rather than strict access modifiers."

Also, this is not strictly correct:

"Can only be accessed within the defining class, denoted by a prefix of double underscore"
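
Why this is not strictly correct: a double-underscore prefix triggers name mangling rather than real access control, so the attribute stays reachable from outside under its mangled name. A small sketch with a hypothetical `Account` class:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # public
        self._audit_log = []  # single underscore: internal by convention only
        self.__pin = "1234"  # double underscore: stored as _Account__pin


acct = Account(100)

# Access under the original name fails outside the class ...
try:
    acct.__pin
    blocked = False
except AttributeError:
    blocked = True

# ... but the attribute is still reachable under its mangled name.
mangled_pin = acct._Account__pin
```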

Instead of:

"Polymorphism is the ability of interacting with different objects, from different classes, through a common interface (methods)."

... consider:

"Polymorphism is the ability of objects from different classes to be treated as instances of the same class through a common interface (methods)."

  • Mention the term "duck typing" here.
  • Abstract base classes: The main use case in my view goes unmentioned, namely ensuring that any subclass of the abstract class implements certain methods.
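
Both points can be sketched in one short cell. The class names are invented for illustration: the first half shows duck typing (no shared base class, only a shared method), the second shows an abstract base class refusing to instantiate a subclass that forgot a required method.

```python
from abc import ABC, abstractmethod


# Duck typing: CsvSource and ApiSource are unrelated classes that
# happen to offer the same read() method.
class CsvSource:
    def read(self):
        return "rows from csv"


class ApiSource:
    def read(self):
        return "rows from api"


def load_all(sources):
    # Works with any object that has a read() method
    return [source.read() for source in sources]


# Abstract base class: every concrete subclass must implement read().
class Source(ABC):
    @abstractmethod
    def read(self):
        ...


class BrokenSource(Source):
    pass  # forgets to implement read()


results = load_all([CsvSource(), ApiSource()])

try:
    BrokenSource()
    abc_error = None
except TypeError as err:
    # Raised at instantiation time, before any method is called
    abc_error = str(err)
```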

Nice to have: A more elaborate example where an OOP design really makes code elegant and easy to manage. For example the state machine design pattern -> check out https://github.com/clstaudt/cpp-patterns/blob/main/State/music.py
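
To give an idea of the shape of such an example, here is a minimal sketch of the state pattern. It is written from scratch for this thread and is not a copy of the linked file; all class and method names are assumptions.

```python
class State:
    """Base state: by default an event leaves the state unchanged."""

    def press_play(self):
        return self

    def press_stop(self):
        return self


class Stopped(State):
    def press_play(self):
        return Playing()


class Playing(State):
    def press_play(self):
        return Paused()

    def press_stop(self):
        return Stopped()


class Paused(State):
    def press_play(self):
        return Playing()

    def press_stop(self):
        return Stopped()


class Player:
    """Delegates every event to the current state object."""

    def __init__(self):
        self.state = Stopped()

    def press_play(self):
        self.state = self.state.press_play()

    def press_stop(self):
        self.state = self.state.press_stop()

    @property
    def status(self):
        return type(self.state).__name__


player = Player()
history = [player.status]
for event in (player.press_play, player.press_play, player.press_stop):
    event()
    history.append(player.status)
```

The `Player` contains no `if`/`elif` chain over states; adding a new state means adding one class.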

Nice to have: A practical example of how OOP is used in a data science library. For example, scikit-learn Estimators and Transformers. Exercise: Build your own Estimator...
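
A possible starting point for that exercise, sketched without importing scikit-learn so it runs anywhere. It only follows the library's conventions (`fit` returns `self`, learned attributes end in an underscore); `MeanRegressor` is a made-up name, and a real solution would more likely inherit from `sklearn.base.BaseEstimator` and `RegressorMixin`.

```python
class MeanRegressor:
    """Predicts the mean of the training targets for every sample."""

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        # Learned attributes get a trailing underscore by scikit-learn convention
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows est.fit(X, y).predict(X)

    def predict(self, X):
        return [self.mean_ for _ in X]


X_train = [[1.0], [2.0], [3.0]]
y_train = [10.0, 20.0, 30.0]
predictions = MeanRegressor().fit(X_train, y_train).predict([[4.0], [5.0]])
print(predictions)  # [20.0, 20.0]
```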

@clstaudt
Collaborator

1. Development of Python Packages with Poetry

material missing or not linked in the TOC?

@clstaudt
Collaborator

clstaudt commented Oct 30, 2023

Working with Databases

ORM: SQLAlchemy

  • missing a notebook title and introductory text (e.g. What is an ORM?...)
  • terminology: it is a "pyramid scheme" but a "database schema"
  • nice to have: an Entity Relationship Diagram for the database schema (one should probably start a DB design by sketching one)
  • query output: are the log outputs meant to be displayed?

NoSQL databases with PyMongo

  • start with notebook title and introduction
  • What does NoSQL mean and why do I need that?

Pandas + SQL(Alchemy)

This explains pandas + SQL. If we are already using SQLAlchemy to interact with the DB, should I write raw SQL queries to read data into pandas or rather something like this?

import pandas as pd

# User (ORM model) and Session (sessionmaker) are assumed to be defined
# earlier in the notebook.

# Using the session in a with statement
with Session() as session:
    # Inserting data
    sample_users = [
        User(name="Alice", age=30),
        User(name="Bob", age=25),
        User(name="Charlie", age=35),
    ]
    session.add_all(sample_users)
    session.commit()

    # Querying data (attributes are loaded here, while the session is open)
    users_query = session.query(User).all()

# Convert the query result to a pandas DataFrame
df = pd.DataFrame(
    [(user.id, user.name, user.age) for user in users_query],
    columns=["ID", "Name", "Age"],
)

@clstaudt
Collaborator

clstaudt commented Oct 30, 2023

streamlit

Would be great to have a streamlit example here, but this particular demo may be too German for this repo...

Nice to have: Demo that shows off a lot of the interactive stuff you can do with streamlit.

@clstaudt clstaudt added the enhancement New feature or request label Jan 31, 2024
@clstaudt clstaudt marked this pull request as draft August 30, 2024 11:24
4 participants