DT-32 Histogram of clicks and impressions #46

vanesssalai · 2024-12-24T08:53:44Z

Added a query for clicks and impressions on mentors, grouped by industry. The results are currently displayed as a table of the top 5 and bottom 5 on the 'Clicks & Impressions' page.

linear · 2024-12-24T08:53:47Z

DT-32 Histogram of clicks and impressions

wei2912 · 2024-12-25T12:11:19Z

dashboards/clicks_and_impressions.py

+# helper to extract industry
+def extract_industry(params):
+    industries = params.get("industries", [])
+    if isinstance(industries, list) and industries:
+        return industries[0]
+    return None
+
+df_processed["industries"] = df_processed["parsed_params"].apply(extract_industry)


From what I understand of this code, a click/impression on a mentor is counted as Industry X if the search query parameters include Industry X. However, this isn't necessarily the case: the URL visit alone isn't sufficient, and one needs the Elasticsearch data on individual mentors to obtain their industry (see #42).

wei2912

Grouping by industry is useful, but it would require additional code to load the Elasticsearch data and then perform a JOIN query matching the recorded document IDs with the mentor profiles.
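A minimal sketch of that JOIN, assuming the mentor profiles have already been exported from Elasticsearch into a table with one row per (document ID, industry). The table and column names (mentor_event, mentor_industry, doc_id, event_type) and the sample data are hypothetical. It uses Python's built-in sqlite3 so it runs without extra dependencies; the SQL is plain conditional aggregation and should carry over to DuckDB, though it has not been checked against the project's schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the project's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mentor_event (website_event_id TEXT, doc_id TEXT, event_type TEXT);
-- One row per (mentor document, industry), e.g. flattened from an Elasticsearch export.
CREATE TABLE mentor_industry (doc_id TEXT, industry TEXT);
""")
conn.executemany(
    "INSERT INTO mentor_event VALUES (?, ?, ?)",
    [
        ("e1", "m1", "impression"),
        ("e2", "m1", "click"),
        ("e3", "m2", "impression"),
        ("e4", "m3", "impression"),
    ],
)
conn.executemany(
    "INSERT INTO mentor_industry VALUES (?, ?)",
    [("m1", "Finance"), ("m2", "Finance"), ("m2", "Healthcare"), ("m3", "Tech")],
)

# Attribute each event to the industries of the mentor it was recorded on,
# instead of to the industry filter that happened to be in the search URL.
rows = conn.execute("""
SELECT i.industry,
       SUM(CASE WHEN e.event_type = 'click' THEN 1 ELSE 0 END) AS clicks,
       SUM(CASE WHEN e.event_type = 'impression' THEN 1 ELSE 0 END) AS impressions
FROM mentor_event AS e
JOIN mentor_industry AS i ON i.doc_id = e.doc_id
GROUP BY i.industry
ORDER BY i.industry
""").fetchall()
print(rows)  # [('Finance', 1, 2), ('Healthcare', 0, 1), ('Tech', 0, 1)]
```

With this layout a mentor listed under several industries is counted once in each, so per-industry totals can add up to more than the overall total.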

For now, I would like to just have the following implemented:

Filter to production events only

Each tracked event has a unique ID, website_event_id, with two fields, data_key and string_value (for now, the other ..._value fields aren't used). Each pair corresponds to a single row in the table:

Filtering by env = 'production' is required to exclude development events from the results.
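A sketch of how the key/value rows described above could be pivoted into one row per event and then filtered to production. It assumes env is stored as a data_key named 'env', and the other key names ('record_id', 'event_type') and the sample rows are hypothetical. It uses sqlite3 for portability; the MAX(CASE ...) pivot is standard SQL and should also work in DuckDB.

```python
import sqlite3

# Assumed layout: one row per (website_event_id, data_key, string_value) pair.
# The data_key names 'env', 'record_id' and 'event_type' are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_data (website_event_id TEXT, data_key TEXT, string_value TEXT)"
)
conn.executemany(
    "INSERT INTO event_data VALUES (?, ?, ?)",
    [
        ("e1", "env", "production"), ("e1", "record_id", "m1"), ("e1", "event_type", "click"),
        ("e2", "env", "development"), ("e2", "record_id", "m1"), ("e2", "event_type", "click"),
        ("e3", "env", "production"), ("e3", "record_id", "m2"), ("e3", "event_type", "impression"),
    ],
)

# Pivot the key/value rows into one row per event, then keep production events only.
production_events = conn.execute("""
SELECT website_event_id, record_id, event_type
FROM (
    SELECT website_event_id,
           MAX(CASE WHEN data_key = 'env' THEN string_value END) AS env,
           MAX(CASE WHEN data_key = 'record_id' THEN string_value END) AS record_id,
           MAX(CASE WHEN data_key = 'event_type' THEN string_value END) AS event_type
    FROM event_data
    GROUP BY website_event_id
) AS events
WHERE env = 'production'
ORDER BY website_event_id
""").fetchall()
print(production_events)  # [('e1', 'm1', 'click'), ('e3', 'm2', 'impression')]
```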

Calculate the number of clicks and impressions on a per-record basis
Plot a histogram of these clicks/impressions (to see whether particular mentors receive a very high number of clicks/impressions)
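A minimal sketch of the per-record counts and the histogram binning, assuming the production events have already been reduced to (record_id, event_type) pairs. The sample data and the helper names per_record_counts and histogram are hypothetical. Records that appear with no clicks are given a count of 0 so they land in the lowest bin; records with no events at all would still be missing and would need to come from the mentor table.

```python
from collections import Counter

# Hypothetical input: (record_id, event_type) pairs for production events only.
events = (
    [("m1", "impression")] * 2 + [("m1", "click")]
    + [("m2", "impression")]
    + [("m3", "impression")] * 5 + [("m3", "click")] * 2
    + [("m4", "impression")]
)

def per_record_counts(events, event_type):
    """Count events of one type per record; records seen without that type get 0."""
    counts = Counter(record for record, kind in events if kind == event_type)
    return {record: counts[record] for record, _ in events}

def histogram(counts, bin_width):
    """Map each bin's lower edge to the number of records whose count falls in it."""
    bins = Counter((n // bin_width) * bin_width for n in counts.values())
    return dict(sorted(bins.items()))

impressions = per_record_counts(events, "impression")
clicks = per_record_counts(events, "click")
print(histogram(impressions, bin_width=2))  # {0: 2, 2: 1, 4: 1}
print(histogram(clicks, bin_width=1))       # {0: 2, 1: 1, 2: 1}
```

A plotting library's histogram function could do the binning directly; the explicit version just makes it easy to see which bin a heavily clicked mentor ends up in.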

wei2912 · 2024-12-25T12:27:16Z

dashboards/clicks_and_impressions.py

+df = conn.query("select * from website_event;")
+
+df_processed = df.copy(deep=True)
+#
+df_processed["url"] = "/?" + df["url_query"].astype(str)
+df_processed["query_params"] = df_processed["url"].apply(extract_query_params)
+df_processed["parsed_params"] = df_processed["query_params"].apply(process_query_params)


Since 9ef75bd has been merged, please use DuckDB SQL queries instead to retrieve the initial dataframe. Some sample code is available at #48.

vanesssalai · 2024-12-29T14:12:48Z

As mentioned above, I updated the query to use DuckDB SQL. It now groups by mentor_name, since grouping by the id field creates 2 different rows for some mentors.

wei2912 · 2025-01-04T13:41:51Z

Assigning to @JaCh23 for review.

JaCh23 · 2025-01-07T05:29:23Z

> As mentioned above, I updated the query to use DuckDB SQL it now will group by mentor_name as the id field will create 2 different rows for some mentors

@vanesssalai I'd much rather we look into this more deeply. We should try to join by ID for data-integrity purposes; otherwise we may run into issues now or later on (e.g. two different mentors with the same name being accidentally grouped together). Can you revise the query to use the ID?
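To illustrate the risk described above, a small sketch with hypothetical mentor and mentor_click tables in which two distinct mentors share a name. Table names, columns and data are assumptions for illustration, and sqlite3 stands in for DuckDB so the snippet runs without extra dependencies.

```python
import sqlite3

# Hypothetical tables: mentors 1 and 2 are different people with the same name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mentor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mentor_click (mentor_id INTEGER);
INSERT INTO mentor VALUES (1, 'Jordan Lee'), (2, 'Jordan Lee'), (3, 'Sam Ong');
INSERT INTO mentor_click VALUES (1), (1), (2), (3);
""")

# Grouping by name silently merges mentors 1 and 2 into one row.
by_name = conn.execute("""
SELECT m.name, COUNT(*) FROM mentor_click AS c
JOIN mentor AS m ON m.id = c.mentor_id
GROUP BY m.name ORDER BY m.name
""").fetchall()

# Grouping by id keeps them apart; the name is carried along only for display.
by_id = conn.execute("""
SELECT m.id, m.name, COUNT(*) FROM mentor_click AS c
JOIN mentor AS m ON m.id = c.mentor_id
GROUP BY m.id, m.name ORDER BY m.id
""").fetchall()

print(by_name)  # [('Jordan Lee', 3), ('Sam Ong', 1)]
print(by_id)    # [(1, 'Jordan Lee', 2), (2, 'Jordan Lee', 1), (3, 'Sam Ong', 1)]
```

If the same mentor really does appear under two IDs, that is a separate data issue worth fixing upstream rather than masking it by grouping on name.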

Also, could you attach some screenshots of the visual outputs in this thread? Thanks!

vanesssalai · 2025-01-13T12:15:57Z

Updated the SQL query to group by mentor data. Here are the screenshots:

JaCh23 · 2025-01-27T04:12:55Z

LGTM

JaCh23

LGTM

Query for clicks and impressions based on industry

e756c8e

wei2912 linked an issue Dec 24, 2024 that may be closed by this pull request

Histogram of clicks and impressions #18

Open

wei2912 self-requested a review December 24, 2024 09:09

wei2912 assigned wei2912 and vanesssalai and unassigned wei2912 Dec 24, 2024

wei2912 reviewed Dec 25, 2024

wei2912 requested changes Dec 25, 2024

vanessa lai added 2 commits December 29, 2024 20:28

Updated query to use DuckDB SQL

5bfb800

Changed sql query to fit the metrics checked

af98fe3

Added historgram and bar chart for impressions and clicks

57e8060

vanesssalai force-pushed the vanessa-branch branch from 4f47d81 to 57e8060 January 1, 2025 08:56

Update imports, removed bare except

fae4e5e

wei2912 requested a review from JaCh23 January 4, 2025 13:40

Update SQL query to join mentor by id

8ca3a6f

JaCh23 approved these changes Jan 27, 2025