Week 4: Jan. 31: Text Learning, Transformers, and Interpretability - Orienting #10
According to Chapter 11, a possible use for LLMs is to simulate political actors and other social situations in social science research. This would be built off "properly [conditioned] language models to simulate a particular demographical group." This is an interesting use case: in the real world, there are often budgetary or ethical constraints that prevent us from running certain experiments. Will LLM-based experiments expand the horizon of what can (or cannot) be done? How do we interpret or reproduce results that we learn from LLM-based experiments?
In this chapter, we discussed many applications of LLMs to social science research. I am curious how LLMs might in turn influence the "social science world." For example, as LLMs are increasingly used to simulate human agents and interactions, will they decrease the number of surveys and experiments? In other words, could they change or add something to the paradigm of social science research? Moreover, as LLMs increasingly shape human life, will they still be able to simulate humans well after such heavy use? As an extreme instance, could they become biased by the large amount of text they themselves have generated on social media?
The book chapter talks about how LLMs can exhibit "uniformity biases," systematically underrepresenting the variance in responses compared to human populations. To address this, the authors suggest modeling agents as "quantum entanglements" of multiple vectors to better represent unique combinations of experiences (p. 41). But how could we redesign LLMs to better simulate the kind of cognitive diversity that drives innovation and even scientific breakthroughs? If unusual life histories and cross-disciplinary experiences are key to innovation, what does that mean for rethinking the architecture of LLMs, beyond just training on more diverse data?
I was struck by the fact that semantic meaning can be mathematically represented as relations/proximity in vector space, rather than as a dictionary-style term-definition -- and can then be reproduced according to contextual specificities through self-attention (i.e., going from a vector to a matrix).
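A minimal sketch of the "meaning as proximity in vector space" idea, using made-up 3-dimensional vectors (all numbers are illustrative, not from a real model):

```python
import numpy as np

# Toy static embeddings: each word is a point in vector space (values are made up)
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.0, 0.8]),
}

def cosine(u, v):
    """Cosine similarity: close to 1.0 = similar meaning, close to 0.0 = unrelated."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(emb["king"], emb["queen"]))  # high: semantically close
print(cosine(emb["king"], emb["apple"]))  # low: semantically distant
```

Self-attention then replaces each static vector with a weighted mix of the other token vectors in the sequence, which is the "vector to matrix" step: the whole sentence is processed as a matrix of token vectors rather than one word at a time.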
LLMs, powered by cutting-edge techniques like self-attention mechanisms and the Transformer architecture, capture context deeply and have established themselves as innovative tools across research fields. In particular, with the expansion into multimodal models, they are unlocking new possibilities for complex social and cultural analysis. These advancements have undeniably opened pathways that were previously inaccessible; however, caution is needed when accepting the results generated by LLMs. We must carefully consider which specific factors to be mindful of and how much we can trust these analyses. My question is: is there a clear standard or framework for accepting LLM-based findings, the way we have F1 score, accuracy, and precision for interpreting and accepting machine learning results?
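There is no single agreed-upon framework, but one common practice is to validate LLM outputs against human-coded labels on the same items, using the same metrics as for any classifier. A small sketch with hypothetical labels (assuming scikit-learn is available):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, cohen_kappa_score

# Hypothetical example: binary labels assigned by human coders vs. an LLM on the same texts
human = [1, 0, 1, 1, 0, 0, 1, 0]
llm   = [1, 0, 1, 0, 0, 1, 1, 0]

print("precision:", precision_score(human, llm))
print("recall:   ", recall_score(human, llm))
print("F1:       ", f1_score(human, llm))
print("kappa:    ", cohen_kappa_score(human, llm))  # chance-corrected agreement with the human coders
```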
How do we formalize the uniformity bias in LLMs, where responses implicitly reflect a "crowd of language speakers" and tend to veer toward the centroid of the system? Does this imply that token representations in a given context converge toward a single point in the latent space, leading to low variance? If so, how does this phenomenon interact with the structure of the model's embedding space, and what role does the attention mechanism play in preserving or mitigating such convergence?
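One way to make the "centroid" intuition concrete is to embed many responses and compare how dispersed they are around their mean. A rough sketch with random stand-in vectors (real response embeddings from humans and from the model would replace them):

```python
import numpy as np

def dispersion(vectors):
    """Mean distance of response embeddings from their centroid --
    a crude proxy for how much variance a set of responses covers."""
    centroid = vectors.mean(axis=0)
    return np.linalg.norm(vectors - centroid, axis=1).mean()

rng = np.random.default_rng(0)
human_embs = rng.normal(0, 1.0, size=(200, 768))  # stand-in for embedded human survey answers
llm_embs   = rng.normal(0, 0.4, size=(200, 768))  # stand-in for embedded LLM answers (tighter cluster)

print("human dispersion:", dispersion(human_embs))
print("LLM dispersion:  ", dispersion(llm_embs))  # smaller => responses huddle near the centroid
```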
How can we build efficient LLMs for low-resource languages (such as some African languages)? What strategies (perhaps transfer learning) can be used to alleviate the problem of data scarcity?
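One common transfer-learning recipe is to take a multilingual pretrained model and continue language-model pretraining on a small monolingual corpus before fine-tuning on the downstream task. A rough sketch assuming the Hugging Face transformers and datasets libraries; the corpus file, model choice, and hyperparameters are placeholders:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "xlm-roberta-base"  # multilingual model covering ~100 languages
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical small monolingual corpus for the target language
corpus = load_dataset("text", data_files={"train": "swahili_corpus.txt"})
tokenized = corpus.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-swahili",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    # Masked-language-model objective for continued pretraining
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True),
)
trainer.train()
```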
Large Language Models (LLMs) can simulate complex social behaviors, but evaluating their validity is challenging when real-world data is limited or nonexistent. Existing methods rely on dataset comparisons and task vector analysis, yet these approaches fail in cases of emergent phenomena, counterfactual simulations, and underrepresented populations. Standard statistical metrics also overlook nuances in social dynamics, requiring alternative measures of plausibility and coherence. Given these limitations, how can we assess the validity of LLM-generated social simulations without real-world comparative data, particularly for emergent or counterfactual behaviors?
The article discusses the application of large language models (LLMs) in social simulation and policy analysis, but LLMs reason mainly over existing text data, while social science research often focuses on dynamically changing social structures and individual decision-making processes. In this case, can LLMs effectively simulate changes in social institutions, the evolution of group behavior, or the long-term impact of policy interventions?
I am wondering what the criteria are for choosing which model to use in LLM-based agent-based modeling, and how researchers are expected to justify their choice of model.
The chapter demonstrates several successful applications of LLMs in social science research, particularly in modeling aggregate behaviors such as political voting patterns and group interactions. While these applications show impressive accuracy in predicting collective outcomes, how can social scientists effectively combine LLM-based analysis with traditional research methods to develop more comprehensive understandings of social phenomena? What are the complementary strengths of each approach?
When the next word is predicted based on prior/later words, what happens when two words have the same probability of appearing? Is there a safety net that kicks in when this happens? Or is this case not a concern because, with such a huge number of dimensions, it is unlikely to occur?
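For what it's worth, the model itself only outputs a probability distribution over the vocabulary; the decoding strategy decides what to do with ties. A tiny sketch of the two usual cases (greedy vs. sampling), with made-up logits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the model assigns identical logits to two candidate tokens
logits = np.array([2.0, 2.0, 0.1])  # e.g. tokens "dog", "cat", "car" (made up)
probs = np.exp(logits) / np.exp(logits).sum()  # softmax

# Greedy decoding: argmax simply breaks the tie deterministically (lowest index wins)
print(np.argmax(probs))  # 0

# Sampling-based decoding: either tied token is drawn with equal probability,
# so a tie is not an error case -- it just splits the probability mass.
samples = rng.choice(len(probs), size=10, p=probs)
print(samples)
```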
How do large language models handle multilingual inputs, and what are the challenges associated with training and optimizing them for diverse linguistic structures?
How can we begin to evaluate these social simulacra? It reminds me of the life2vec paper from week 1 -- perhaps we could use an actual sequence of life events for the agents and compare some outcome?
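A minimal sketch of that comparison idea, with hypothetical event sequences and a simple overlap score (a real evaluation would need a more principled metric):

```python
from difflib import SequenceMatcher

# Hypothetical event sequences: what an LLM agent "did" vs. what the real person did
real      = ["finish_school", "first_job", "move_city", "marriage", "job_change"]
simulated = ["finish_school", "first_job", "marriage", "move_city", "job_change"]

# Ratio of matching subsequences: 1.0 = identical life trajectories
similarity = SequenceMatcher(None, real, simulated).ratio()
print(f"sequence overlap: {similarity:.2f}")
```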
How would you describe the family of transformer models? I know there is now a whole family of transformers. What are their relationships? What are the innovations from one to the next? And what commonalities do they share beyond the transformer architecture itself?
LLMs exhibit uniformity bias. If LLMs implicitly reflect the "majority" of language speakers, do they inherently resist cultural and linguistic innovation? Specifically, how might this affect the long-term dynamics of linguistic change? If a society has long-term, large-scale interaction with this kind of model, how would it shift people's ideology? Would people become more uniform in their conceptual spaces?
How does self-attention in LLMs work in "sharing information from their local context"? And how does it relate to deep learning architectures that excel at sequential data, like RNNs?
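A minimal single-head sketch of scaled dot-product self-attention (no masking, no batching, random weights). Unlike an RNN, nothing is processed step by step: every token attends to every other token in one matrix operation, which is what lets information from the whole context be shared in parallel.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                     # project each token vector
    scores = Q @ K.T / np.sqrt(K.shape[1])               # every token scores every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over the sequence
    return weights @ V                                    # context-mixed representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                              # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)                # (5, 16): one updated vector per token
```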
The application of LLMs seems to rely heavily on researchers' existing knowledge, context, assumptions, and training data from mainstream society/culture. So how can they be used to discover novel variables or social changes?
LLMs are very promising for research. They can dive deep into data that may be impenetrable for a human researcher and give insights that we can interpret, for example through topic modeling. But how do we make sure the results we get are reliable and replicable? What kinds of uncertainty measures can we use to separate "good" insights from "bad" ones?
Powerful LLMs like ChatGPT were once criticized for poor performance on simple tasks such as counting the words in a paragraph or the number of "r"s in the word "strawberry." Is this related to features of the transformer architecture? GPT-4 is better than GPT-3.5 at simple math. How was that improvement achieved if the problem lies with the architecture?
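Part of the usual explanation is tokenization: the model sees subword token IDs, not characters. The split below is purely hypothetical (the exact pieces depend on the tokenizer), but it illustrates why character counting is indirect for the model:

```python
# Illustrative only: a BPE-style tokenizer typically breaks "strawberry" into
# subword pieces rather than letters (the exact split is tokenizer-dependent).
hypothetical_tokens = ["str", "aw", "berry"]  # what the model receives, as opaque IDs

# Counting 'r' at the character level is trivial...
print("strawberry".count("r"))  # 3

# ...but the model never sees individual characters, so "how many r's?" must be
# inferred indirectly from token-level statistics, which is one reason this is hard.
print(sum(piece.count("r") for piece in hypothetical_tokens))  # 3, but hidden inside tokens
```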
The article mentions that cognitive and social simulations can help explore social phenomena and produce the 'wisdom of crowds.' I was wondering whether the ideas generated by LLMs primarily cater to mainstream views, potentially 'crowding out' alternative perspectives and reinforcing an echo-chamber effect.
I am a bit confused by the different methods for chain-of-thought prompting. How do multiple CoTs have their outputs scored? Is it based on user feedback or other metrics? Do Tree of Thoughts and Graph of Thoughts come at the cost of longer computation time and more resources? What is the point of aggregating chains instead of selecting the one with the best score?
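On the aggregation question: self-consistency-style aggregation marginalizes over the reasoning errors that any single "best-scoring" chain might contain. A toy sketch with hypothetical final answers extracted from several sampled chains:

```python
from collections import Counter

# Hypothetical final answers extracted from several independently sampled
# chain-of-thought completions for the same question.
chain_answers = ["42", "42", "17", "42", "17", "42"]

# Self-consistency: aggregate by majority vote instead of trusting any single chain.
answer, votes = Counter(chain_answers).most_common(1)[0]
print(answer, f"({votes}/{len(chain_answers)} chains agree)")
```

Tree of Thoughts and Graph of Thoughts do typically cost more compute, since they score and expand intermediate reasoning states rather than a single linear chain.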
Post your questions here about: “Language Learning with Large Language Models”, chapter 11 in Thinking with Deep Learning.