Week 3. Jan. 24: Sampling, Bias, and Causal Inference with Deep Learning - Orienting #7
Comments
In addition to solving modeling issues (such as capturing complex nonlinear relationships and achieving double robustness), does deep learning have advantages over traditional econometric models in the data input process? For example, many causal mechanisms are highly contextual, but those contexts get simplified away during modeling. Would a deep learning approach import all potential variables into the model from the beginning and let the model select and reduce dimensionality, or would it only consider the main variables, as econometrics typically does?
The chapter "The Datome - Finding, Wrangling and Encoding Everything as Data" goes through various data types and how to encode them into input-layer units, mostly by turning them into matrices or vectors. However, many cultural products (poetry, novels, artworks, music, films) carry meaning in their whole or in their internal relations. For example, Monet's Impression, Sunrise can be read pixel by pixel, but much of its information lies in large areas of pure color, that is, in the image as a whole and in the relations between pixels. Would it be possible to bring that relational information into deep learning models, for instance by using both the pixels and a pixel graph (relations between pixels) as input layers? Would it improve model performance?
How do we balance the benefits of introducing noise / data augmentation during training without compromising data quality (the "garbage in, garbage out" problem)? And how can we differentiate outliers from rare events, staying robust to noise without overfitting to outliers?
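A minimal sketch of the kind of noise injection this question is about, assuming simple Gaussian perturbations on numeric features; the noise scale and clipping band are illustrative choices, not values from the readings:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_with_noise(X, noise_scale=0.05, clip_sigma=3.0):
    """Add small Gaussian noise per feature, then clip extreme values.

    noise_scale controls how far augmented points can drift from the
    originals; clip_sigma keeps augmented samples inside a +/- 3 std
    band so noise injection does not manufacture new "outliers".
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    X_aug = X + rng.normal(0.0, noise_scale * sigma, size=X.shape)
    return np.clip(X_aug, mu - clip_sigma * sigma, mu + clip_sigma * sigma)

X = rng.normal(size=(100, 5))                      # stand-in feature matrix
X_train = np.vstack([X, augment_with_noise(X)])    # originals plus augmented copies
```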
Why is negative sampling a resampling method rather than a data augmentation method? Negative sampling seems to use the existing data to create new data points that are rare in the real world, and those newly created data points seem to augment the existing sample.
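A minimal sketch of random negative sampling for word-context pairs, assuming a toy corpus; the uniform sampling distribution is an illustrative simplification (word2vec-style negative sampling weights words by frequency):

```python
import random

random.seed(0)
vocab = ["cat", "dog", "sat", "mat", "ran", "park"]
positive_pairs = [("cat", "sat"), ("dog", "ran"), ("sat", "mat")]  # observed co-occurrences

def sample_negatives(word, k=2):
    """Pair the target word with k randomly drawn words that were NOT
    observed as its context. No new observations are invented; existing
    vocabulary items are re-used with a negative label."""
    observed = {c for w, c in positive_pairs if w == word}
    candidates = [w for w in vocab if w != word and w not in observed]
    return [(word, random.choice(candidates), 0) for _ in range(k)]

training_data = [(w, c, 1) for w, c in positive_pairs]
for w, _ in positive_pairs:
    training_data += sample_negatives(w)
```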
Chapter 5 gives us an overview of how to represent, process, and encode various forms of data, including text, images, and audio. While everything can be data and therefore be represented for deep learning, how should we interpret the results? For example, SVD allows us to select k topics and project words into a lower-dimensional semantic space. However, how do we ensure that the reduced dimensions capture "meaningful" semantic relationships? People may have different perceptions of what is meaningful given the same text, image, or audio recording. Would the resulting topics be more objective or "accurate" than the perspectives of a majority of people?
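A minimal sketch of the SVD-based projection this question refers to (latent semantic analysis), assuming scikit-learn; the toy corpus and k = 2 are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the senate passed the budget bill",
    "the court ruled on the new law",
    "the team won the championship game",
    "the striker scored in the final match",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)               # sparse document-term matrix
svd = TruncatedSVD(n_components=2, random_state=0)   # k = 2 latent "topics"
doc_topics = svd.fit_transform(tfidf)                # documents in the reduced space

# Inspecting the top-weighted terms per component is the usual (and inherently
# subjective) way to judge whether a dimension is "meaningful".
terms = vectorizer.get_feature_names_out()
for i, component in enumerate(svd.components_):
    top = component.argsort()[::-1][:3]
    print(f"topic {i}:", [terms[j] for j in top])
```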
In human learning, even incorrect examples usually follow some logical pattern, making them useful for understanding decision boundaries, much like well-designed distractors in multiple-choice questions. In random negative sampling, however, if we generate completely irrelevant or nonsensical negatives, do such "wildly wrong" negative samples diminish their usefulness for helping deep models learn to predict?
When constructing a cross-modality joint model, will the integration of heterogeneous inputs lead to an imbalance of information between modalities (for example, the visual modality's representation being so strong that it overshadows the contribution of the text modality)? How can we detect and correct this problem?
In Chapter 5, I found the concept of representing text as vectors, as in Word2Vec, interesting. How are embeddings created for languages with distinct grammar and context, such as English and Korean? What if the two languages frequently interact within the same corpus? Additionally, how can embeddings be effectively generated for low-resource languages? Solving this could contribute to research involving underrepresented languages; what research is currently being done in this area?
Chapter 5 discusses different ways to encode data, from sparse (low-level, raw data) to dense (high-level, processed representations). How do encoding choices such as one-hot encoding, TF-IDF, and neural embeddings (e.g., Word2Vec, BERT) impact the effectiveness of deep learning models when handling high-dimensional text data? In what scenarios might sparse representations be preferable over dense representations, and vice versa?
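A minimal sketch contrasting a sparse and a dense encoding of the same sentences, assuming scikit-learn and gensim are available; the corpus and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

sentences = [["deep", "learning", "encodes", "text"],
             ["sparse", "vectors", "count", "words"],
             ["dense", "embeddings", "capture", "similarity"]]

# Sparse, high-dimensional: one weight per vocabulary term, mostly zeros
tfidf = TfidfVectorizer().fit_transform([" ".join(s) for s in sentences])
print(tfidf.shape)      # (3 documents, vocabulary size)

# Dense, low-dimensional: each word mapped to a learned 50-d vector,
# documents represented by averaging their word vectors
w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50, seed=0)
doc_vecs = np.array([np.mean([w2v.wv[w] for w in s], axis=0) for s in sentences])
print(doc_vecs.shape)   # (3 documents, 50 dimensions)
```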
In machine learning, prediction sampling aims to balance class distributions, whereas inferential sampling minimizes sampling bias. Why do machine learning models prioritize learning equally from all categories (as in prediction sampling) rather than focusing on representative samples (as in inferential sampling)? What are the implications of this difference for generalization, performance, and fairness? What are the potential trade-offs of using prediction sampling to balance classes? Could it lead to overfitting or reduced performance on real-world distributions?
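A minimal sketch of one common way to rebalance classes for prediction, assuming scikit-learn; the data are synthetic and the roughly 90/10 imbalance is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.1).astype(int)   # ~10% positive (rare) class

# Reweight the loss so rare-class errors count more during training.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
clf = LogisticRegression(class_weight={0: weights[0], 1: weights[1]}).fit(X, y)

# Equivalent shortcut. Either way, the model's objective is changed, not the
# underlying population distribution, which is exactly the tension with
# inferential (representative) sampling raised above.
clf2 = LogisticRegression(class_weight="balanced").fit(X, y)
```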
For text classification tasks with custom categories, how does BERT compare to newer, larger models, whether closed-source options or open-source ones that require more powerful setups? Fine-tuning these large models for specific tasks often needs a lot of data and high-end resources (or a lot of API credits), which can make the process expensive and less accessible. Are there strategies to make it more efficient, such as preserving the fine-tuned knowledge while resetting the model's memory or reducing the need for extensive computing power? Is it possible to deploy LoRA or QLoRA for these kinds of tasks? How can we decide between a smaller, more accessible model like BERT and a larger, more advanced model, considering both quality of results and cost?
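A minimal sketch of parameter-efficient fine-tuning with LoRA, assuming the Hugging Face transformers and peft libraries; the model name, target modules, and rank are illustrative defaults, not recommendations from the readings:

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# The base encoder's weights stay frozen; only small low-rank adapter
# matrices inserted into the attention projections are trained.
base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4
)

lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,                       # scaling factor for the adapter output
    target_modules=["query", "value"],   # attention projections to adapt
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()       # typically well under 1% of the full model
```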
How does the choice between sparse and dense data representations influence the design and performance of deep learning models, particularly when applied to domains like text, images, or graphs?
Considering that resampling techniques like undersampling, oversampling, and data augmentation are often used with large samples, how effective are these methods at addressing class imbalance in small to medium datasets? Additionally, can concepts like bagging and boosting be adapted for neural networks trained on multiple undersampled, oversampled, or augmented sub-datasets?
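A minimal sketch of a bagging-style ensemble over resampled subsets, assuming scikit-learn; small MLPs stand in for whatever networks one would actually use, and the subset sizes are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

models = []
for seed in range(5):
    # Bootstrap a balanced sub-dataset: sample each class with replacement
    X0, y0 = resample(X[y == 0], y[y == 0], n_samples=150, random_state=seed)
    X1, y1 = resample(X[y == 1], y[y == 1], n_samples=150, random_state=seed)
    Xs, ys = np.vstack([X0, X1]), np.concatenate([y0, y1])
    models.append(MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                random_state=seed).fit(Xs, ys))

# Bagging: average the members' predicted probabilities
proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```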
Are there possibilities for studying human imagination by leveraging models pre-trained on images? The intuition is that, although people reason in language, many people think and imagine in pictures. Not everyone is a visual thinker, but I wonder whether transfer learning from image-processing models to data on human thoughts and imagination could simulate more of the activity in the human mind.
This relates to the possibility readings: how might the trade-offs inherent in data abstraction and representation impact ethical decision-making in domains like healthcare or criminal justice? Could prioritizing certain features over others introduce or reinforce biases? Furthermore, as data become more abstract and less interpretable after feature selection and processing, how do we trace back through the decision-making process and justify the decisions?
When we modify our samples with over- and under-sampling, are there procedures to check that we are not creating bias in the samples or damaging their randomness?
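A minimal sketch of one such check, assuming scipy: compare each feature's distribution before and after resampling with a two-sample Kolmogorov-Smirnov test. The synthetic data and the implicit significance threshold are illustrative:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # original sample
idx = rng.choice(len(X), size=1500, replace=True)    # naive oversampling
X_resampled = X[idx]

for j in range(X.shape[1]):
    stat, p = ks_2samp(X[:, j], X_resampled[:, j])
    # A very small p-value flags a feature whose distribution the
    # resampling step has visibly distorted.
    print(f"feature {j}: KS statistic={stat:.3f}, p-value={p:.3f}")
```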
I’m curious about the structure of embedding spaces when using multimodal data. For instance, image embeddings often capture pixel-level spatial features, while text embeddings encode grammatical and semantic relationships. How would these embedding spaces be structured, and what methods are used to align them effectively across modalities? Additionally, it would be fascinating to explore how multimodal embedding spaces align with human mental spaces, potentially offering insights into how humans integrate information from different sensory and linguistic modalities and then form concepts and ideas.
I found the part about text data and Word2Vec very interesting. I was wondering whether there are ways to detect more complex uses of language (I'm not sure if this is the right way to phrase it) than just distributional semantics. For example, writers tend to use figurative language. In those cases, can word vectors even pick this up? Would you need some other technique for extracting meaning? Perhaps BERT or other deep learning models can do this?
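A minimal sketch of how a contextual model such as BERT gives the same word different vectors in literal versus figurative use, assuming the Hugging Face transformers library; the sentences are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "the surgeon repaired the patient's broken heart",   # (more) literal use
    "she sang with a broken heart",                      # figurative use
]

vectors = []
for s in sentences:
    inputs = tokenizer(s, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]    # (tokens, 768)
    # grab the hidden state of the token "heart" in this particular context
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    vectors.append(hidden[tokens.index("heart")])

# Unlike a static Word2Vec vector, these two "heart" vectors differ; their
# cosine similarity quantifies how much context shifted the meaning.
cos = torch.nn.functional.cosine_similarity(vectors[0], vectors[1], dim=0)
print(cos.item())
```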
When modeling multimodal data, how can high-level features of different modalities be effectively integrated to avoid information loss?
When training deep learning models on images, is it always recommended to encode images as RGB color values, or should we reduce the dimensionality of the data by dropping the color channels?
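A minimal sketch of the two encoding choices, assuming torchvision; the file path is hypothetical, and whether dropping color helps depends on whether hue actually carries signal for the task:

```python
from PIL import Image
from torchvision import transforms

img = Image.open("example.jpg")  # hypothetical image path

to_rgb = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                        # (3, 224, 224): keeps color channels
])

to_gray = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=1),  # (1, 224, 224): drops hue and saturation
    transforms.ToTensor(),
])

x_rgb, x_gray = to_rgb(img), to_gray(img)
print(x_rgb.shape, x_gray.shape)
```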
Chapter 5 mentioned that dense representations are higher-level abstractions, often pre-processed using algorithms (e.g., PCA, embeddings) or transfer learning. Given that abstraction is inherently lossy, how can researchers decide which features to prioritize when encoding data for deep learning tasks?
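A minimal sketch of one way to reason about that lossiness, assuming scikit-learn: inspect how much variance each principal component retains before deciding how far to compress. The synthetic data and the 95% cutoff are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Keep the smallest number of components that preserves 95% of the variance;
# everything beyond that is the information the abstraction discards.
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"{k} components retain {cumulative[k - 1]:.1%} of the variance")
```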
How can we effectively merge the unique features of different modalities (text's grammatical structure, images' spatial and color relationships, audio's temporal patterns) into a common deep learning framework without losing critical modality-specific context or introducing bias? In other words, what strategies or trade-offs should researchers consider when aligning these distinct representations into a single embedding space for downstream tasks?
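A minimal sketch of one common alignment strategy (CLIP-style contrastive learning), assuming PyTorch; the encoders are stand-ins (random tensors here) and the temperature is an illustrative value:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Pull each image embedding toward its paired text embedding and push it
    away from all other texts in the batch (and vice versa), so both
    modalities end up in one shared embedding space."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 image-text pairs already projected to a common 128-d space
img_emb = torch.randn(8, 128)
txt_emb = torch.randn(8, 128)
print(contrastive_alignment_loss(img_emb, txt_emb))
```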
Post your questions here about: "The Datome - Finding, Wrangling and Encoding Everything as Data" and "When Big Data is Too Small - Sampling, Crowd-Sourcing and Bias", chapters 5 and 7 of Thinking with Deep Learning; and "Deep learning for causal inference" by Bernard Koch, Tim Sainburg, Pablo Geraldo, Jiang Song, Yizhou Sun, and Jacob G. Foster.