Week 5. Feb. 7: Transformers for Multi-Agent Simulation - Possibilities #12
For the social media simulation paper: the LLM helps simulate how social interaction unfolds, but social media data come with problems of selection bias and visibility. For example, people who hold certain partisan opinions or personas are more willing to speak on social media, and the number of times people post is far from normally distributed. The simulated sample has only 500 agents, so I am curious how this bias could be reduced.
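One way to make the concern concrete, as a minimal sketch rather than a fix: assume posting propensity is heavy-tailed and correlated with partisan lean (both assumptions chosen for illustration), then compare the "visible" opinion mean with an inverse-propensity-weighted one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 500 agents, each with a partisan lean in [-1, 1] and a
# posting propensity drawn from a heavy-tailed (log-normal) distribution,
# mimicking the fact that a small share of users produces most posts.
n_agents = 500
lean = rng.uniform(-1, 1, n_agents)
propensity = rng.lognormal(mean=0.0, sigma=1.5, size=n_agents)
# Assume (for illustration only) that more extreme agents post somewhat more.
propensity *= 1 + 0.5 * np.abs(lean)

# Naive estimate of "visible" opinion: weight each agent by how often they post.
visible_mean = np.average(lean, weights=propensity)

# Bias-corrected estimate: reweight by inverse posting propensity so quiet
# agents count as much as prolific ones (analogous to inverse probability
# weighting in survey research).
corrected_mean = np.average(lean, weights=1.0 / propensity)

print(f"visible opinion mean:   {visible_mean:+.3f}")
print(f"corrected opinion mean: {corrected_mean:+.3f}")
```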
“Jury learning: Integrating dissenting voices into machine learning models.” This paper introduces jury learning, a machine learning approach designed to address disagreement in ground-truth labels by explicitly integrating diverse perspectives through a jury-based system. It poses the question: whose voices, and whose labels, does a machine learning algorithm learn to imitate? Jury learning allows practitioners to define and model specific juror compositions, ensuring that underrepresented groups are accounted for in predictions, unlike traditional supervised learning, which relies on majority voting. I wonder what types of machine learning tasks would benefit most from jury learning, and in which tasks it might have limited effectiveness.
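For readers who want the mechanics, here is a minimal sketch of the jury-learning idea: predict each individual juror's label, sample juries matching a specified composition, and take the median verdict. The paper trains a single model that predicts individual annotators' labels; the toy `predict_label` stub and the group names below are illustrative stand-ins, not the paper's architecture.

```python
import random
import statistics

def predict_label(text: str, juror: dict) -> float:
    """Toy stand-in for a per-annotator predictor: toxicity score in [0, 1]."""
    base = 0.6 if "idiot" in text.lower() else 0.1   # crude content signal
    shift = 0.2 if juror["group"] == "frequent_target" else 0.0
    return min(1.0, base + shift)

def jury_predict(text, annotator_pool, composition, jury_size=12, n_juries=50):
    """Sample many juries matching a demographic composition and aggregate."""
    outcomes = []
    for _ in range(n_juries):
        jury = []
        for group, share in composition.items():
            members = [a for a in annotator_pool if a["group"] == group]
            jury += random.sample(members, k=max(1, round(share * jury_size)))
        # The median of the predicted juror labels decides the jury verdict.
        outcomes.append(statistics.median(predict_label(text, j) for j in jury))
    return statistics.mean(outcomes)

pool = [{"id": i, "group": g} for i in range(200)
        for g in (["frequent_target"] if i % 4 == 0 else ["other"])]
score = jury_predict("you are an idiot", pool,
                     composition={"frequent_target": 0.5, "other": 0.5})
print(f"jury toxicity score: {score:.2f}")
```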
“Generative agents: Interactive simulacra of human behavior.” This article demonstrates a framework for creating computational agents that simulate believable human behavior. These agents remember past experiences, reflect on them, and use that information to plan and adapt their actions in a simulated environment. Through experiments, the paper shows that these agents produce more realistic behaviors than baseline models. The study highlights key innovations in memory retrieval, reflection, and autonomous interaction, enabling AI-driven characters to behave more naturally in virtual worlds. I am curious how these agents resolve conflicting memories or information, and what the ethical implications are of making them increasingly lifelike.
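As a reference point for the memory question: the paper describes retrieval as combining recency, importance, and relevance. A rough sketch of such a scoring function is below; the decay rate, the weights, and the memory record format are illustrative assumptions, not the paper's exact values.

```python
import time

def retrieval_score(memory, query_embedding, now, decay=0.995,
                    w_recency=1.0, w_importance=1.0, w_relevance=1.0):
    """Score one memory for retrieval from recency, importance, and relevance.

    `memory` is assumed to be a dict with:
      - "last_access": unix timestamp of last retrieval
      - "importance":  a 0-1 score (the paper has an LLM rate importance)
      - "embedding":   a unit-normalized vector for the memory text
    """
    hours_since = (now - memory["last_access"]) / 3600
    recency = decay ** hours_since                       # exponential decay
    relevance = sum(a * b for a, b in zip(memory["embedding"],
                                          query_embedding))  # cosine (unit vecs)
    return (w_recency * recency
            + w_importance * memory["importance"]
            + w_relevance * relevance)

def retrieve(memories, query_embedding, k=3):
    """Return the k memories with the highest combined score."""
    now = time.time()
    return sorted(memories,
                  key=lambda m: retrieval_score(m, query_embedding, now),
                  reverse=True)[:k]
```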
For the social media simulation paper, I'm curious to what extent the effect of the bridging algorithm can be replicated in the real world. The study uses simulation, which by nature excludes some sensory elements of the interactions between agents, elements that might be very important in everyday human interaction (if you think in terms of symbolic interactionism). How can future studies address this issue? Could multimodal models help address it?
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
While the paper does not attempt to establish causality in any sense, acknowledging the limited empirical evidence on the causal effect of social media algorithms on polarization, how would agent-based modeling, especially its ability to create counterfactuals with the aid of LLMs, make the causal effect identifiable?
For the paper "Out of one, many: Using language models to simulate human samples," the authors suggested a way of using GPT-3 to generate samples for study. The high accuracy of GPT-3 samples to real human responses make me think of the validity of silicon samples. GPT-3 was originally trained on historical texts, so it sounds like tautology to use it to simulate the voting behavior (which might be already in its database). However, whether we can use concurrent events text to make it simulate future voting behavior would be questionable. (Furthermore, if it is able to predict, whether the prediction would influence the real voting behavior should be considered more.) |
Park et al. (2023) introduce generative agents, AI-driven software entities designed to simulate human-like behavior through memory, reflection, and planning, enabling them to autonomously interact, form relationships, and coordinate activities in virtual environments. Their architecture, tested in the Smallville simulation, demonstrates emergent social behaviors such as information diffusion and collaboration, while ethical concerns highlight risks like parasocial relationships and potential biases in agent interactions. Given that generative agents can autonomously exhibit complex social behaviors, how can their architecture be refined to ensure alignment with ethical guidelines and societal values, preventing unintended consequences such as misinformation spread or bias reinforcement?
(Park et al., 2023)
“In Silico Sociology: Forecasting COVID-19 Polarization with Large Language Models”
“Simulating social media using large language models to evaluate alternative news feed algorithms.” I was thinking about how I should interpret the results and wondering whether traits of the content itself, rather than how the algorithm introduces it, could have affected the toxicity of online communication. Posts that are liked or shared by both parties may stand in contrast to posts that arouse conflict, since cross-party sharing suggests the issue was significant to both sides. However, if a post is shared by only one side, it may be 'milder' news that is less controversial or not as 'hot'. If so, wouldn't this kind of milder news, which only triggers the interest of a smaller population, naturally leave less to be mad or fight about?
"Jury Learning: Integrating Dissenting Voices into Machine Learning Models" |
Argyle et al., 2023 |
Borah, A., & Mihalcea, R. (2024). Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions The paper explores how LLMs may perpetuate implicit gender biases when engaging in multi-agent interactions. Through the task assignment framework, LLMs simulate social decision-making (i.e., assign tasks to 4 individuals: 2F/2M characters, 2 stereotypically male tasks and 2 stereotypically female tasks in different domains) and show that biases increase after interactions (i.e., agents more likely to assign tasks by gender stereotypicality). Authors propose two bias mitigation strategies: supervised fine-tuning (FT) and self-reflection (SR) with in-context examples. Results show that combining FT and SR reduces bias most effectively, but larger models may overcorrect and introduce new biases. This approach can be used to model the emergence and persistence of any social biases in group interactions and simulate decision-making processes in consequential contexts. Researchers could ‘pilot’ interventions in these controlled environments. The study shows that biases increase after multi-agent interactions, but is it really due to conformity effects like groupthink & stereotype threat in humans? Given that LLMs have alien intelligence - why would they fall into that trap? Or could it be just an artifact of aligning responses to dominant patterns in the training data? |
For “In Silico Sociology,” we rely on the LLM's release date as a natural limit on its knowledge. I wonder if we could artificially "delete" knowledge within the model to simulate temporally segmented or historical agents.
Although Park et al. (2023) address the ethical and societal implications of generative agents, I remain deeply concerned: at what point should AI agents be considered a form of 'life'? If AI agents exhibit long-term memory, self-reflection, and social interaction, how should we determine their legal and moral status?
In Jury Learning: Integrating Dissenting Voices into Machine Learning Models, the authors propose a jury-like system to simulate the decision-making process, allowing for more dissent and more diverse systems of classification. However, I am interested in how we can minimize bias in the original annotators on whom the jurors are based, and how we can reduce biases within the jury selection process (considering how complicated jury selection can be in the real world).
Based on the "Machine Theory of Mind" paper, How does the Machine Theory of Mind (ToMnet) adapt to rapidly changing behaviors in agents, especially in dynamic environments where agent strategies evolve? Can ToMnet predict such evolutionary changes, or does it require retraining to understand new behavioral patterns? In scenarios where ToMnet encounters completely novel agent behaviors not present in the training data, what strategies could be employed to enhance its predictive accuracy without extensive retraining? |
How can we account for unexpected exogenous shocks, such as pandemics, wars, or economic crises, that reshape political discourse in ways that past data alone cannot predict? Would a hybrid approach, integrating LLM-based reconstructions with dynamic real-time feedback mechanisms, improve predictive validity in forecasting polarization or even conflicts?
In "In Silico Sociology: Forecasting COVID-19 Polarization with Large Language Models," the study finds that LLMs trained on text published through 2019 were able to predict the politicization of COVID-19 responses with significant accuracy, suggesting that existing ideological structures prefigured public attitudes. What are the potential implications of this finding for the study of political polarization and the role of elite framing? Could this method be applied to other emerging political issues to forecast future polarization?
In “Simulating social media using large language models to evaluate alternative news feed algorithms,” the authors tested how different news feed algorithms shape the quality of online conversations using a combination of LLMs and ABM. They found that exposure to opposing political views through a "bridging algorithm" helps reduce toxicity. This supports previous findings that homophily drives online discussion toxicity. However, recent research has found that acrophily (a preference for extreme content) drives toxicity too. Can we further divide in-group views and opposing views into two levels (extreme vs. non-extreme) and test their effects under the current ABM framework? What are the possible technical barriers?
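A rough sketch of what the proposed two-by-two split could look like inside an ABM loop: tag each feed item by in-group vs. out-group and by extremity, then compare mean reply toxicity across the four conditions. The `condition()` helper, the extremity threshold, and the record format are all assumptions for illustration, not part of the paper's framework.

```python
from collections import defaultdict

def condition(agent, post, extremity_threshold=0.7):
    """Label one exposure as ingroup/outgroup x extreme/moderate."""
    group = "ingroup" if post["party"] == agent["party"] else "outgroup"
    level = "extreme" if post["extremity"] >= extremity_threshold else "moderate"
    return f"{group}/{level}"

def tally_toxicity(interactions):
    """interactions: list of (agent, post, reply_toxicity) from the simulation."""
    sums, counts = defaultdict(float), defaultdict(int)
    for agent, post, tox in interactions:
        c = condition(agent, post)
        sums[c] += tox
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}

# e.g. {'ingroup/extreme': 0.41, 'outgroup/moderate': 0.18, ...}
```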
[“Jury learning: Integrating dissenting voices into machine learning models”] Jury learning highlights how AI models often reinforce dominant-group perspectives by relying on majority-vote labels, which can obscure the voices of marginalized communities. This raises the question: how can jury learning be leveraged to make AI decision-making more equitable, particularly in ensuring that marginalized communities' voices are not overruled by dominant-group perspectives? Specifically, could counterfactual jury analysis help quantify whose opinions are systematically excluded in AI moderation systems and provide a mechanism to amplify underrepresented voices in content moderation and policy decisions?
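The counterfactual jury idea could be operationalized roughly as follows: score the same items under two jury compositions and count how often the moderation decision flips. This sketch assumes a `jury_predict` function like the toy one sketched earlier in the thread; the threshold and composition dictionaries are placeholders, not anything from the paper.

```python
def decision_flips(items, pool, composition_a, composition_b,
                   jury_predict, threshold=0.5):
    """Items whose moderation decision changes between two jury compositions."""
    flips = []
    for text in items:
        a = jury_predict(text, pool, composition_a) >= threshold
        b = jury_predict(text, pool, composition_b) >= threshold
        if a != b:
            flips.append(text)
    return flips

# Comparing a majority-vote-like composition against one that guarantees seats
# for a marginalized group quantifies whose voices change moderation outcomes.
```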
The article The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery presents a groundbreaking framework for fully automated scientific research using large language models (LLMs). The AI Scientist, as described, is capable of generating novel research ideas, designing and executing experiments, writing scientific papers, and even conducting peer reviews. The system leverages advanced LLMs like GPT-4 and Claude Sonnet to perform these tasks iteratively, building on previous discoveries to refine future research. The authors demonstrate the system's effectiveness in machine learning subfields such as diffusion modeling, transformer-based language modeling, and learning dynamics, producing papers at a remarkably low cost of around $15 per paper. The framework also includes an automated reviewer that achieves near-human performance in evaluating paper quality, further streamlining the research process.

This method could significantly extend social science analysis by automating the generation and testing of hypotheses, particularly in areas where large datasets and complex models are involved. For example, the AI Scientist could be used to explore patterns in social behavior, economic trends, or political dynamics by iteratively generating and testing theories against real-world data. The system's ability to rapidly produce and evaluate research ideas could accelerate the pace of discovery in fields like sociology, economics, and political science, where traditional research methods can be time-consuming and resource-intensive.

To pilot this approach in social science, one could use datasets such as the General Social Survey (GSS) or World Values Survey (WVS), which contain rich, longitudinal data on social attitudes, behaviors, and demographic information. These datasets are publicly available and cover a wide range of topics, making them ideal for testing the AI Scientist's ability to generate and evaluate social science hypotheses. For example, the system could be tasked with exploring trends in political polarization, the impact of economic inequality on social trust, or the relationship between education and social mobility. The AI Scientist could iteratively generate hypotheses, test them against the data, and produce papers summarizing its findings, all while refining its approach based on feedback from the automated reviewer.

The implementation would involve feeding the AI Scientist the GSS or WVS datasets, along with a set of initial research questions or themes. The system would then autonomously generate hypotheses, design statistical models, and analyze the data to produce insights. The results could be validated against existing social science literature, and the system could be further refined to improve its accuracy and relevance. This approach could democratize social science research, making it more accessible and efficient, while also uncovering new insights that might be overlooked by traditional methods.
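Since the comment above describes a concrete pilot, here is a speculative sketch of what the outer loop might look like on a GSS extract. The `llm` callable, the `gss_extract.csv` path, and the correlation-only "experiment" are all assumptions chosen for illustration; the actual AI Scientist pipeline (idea generation, code-level experiments, paper writing, automated review) is far richer than this.

```python
import pandas as pd
from scipy import stats

def propose_pair(llm, df, history):
    """Ask the model which two columns are worth testing, given prior results."""
    cols = ", ".join(df.columns)
    reply = llm(f"Columns available: {cols}. Prior findings: {history}. "
                "Pick two columns whose association is worth testing. "
                "Answer only as 'col_x,col_y'.")
    x, y = [c.strip() for c in reply.split(",")[:2]]
    return x, y

def test_association(df, x, y):
    """Toy 'experiment': a Pearson correlation on the two chosen variables."""
    clean = df[[x, y]].dropna()
    r, p = stats.pearsonr(clean[x], clean[y])
    return {"x": x, "y": y, "r": round(r, 3), "p": round(p, 4)}

def run_pilot(llm, csv_path="gss_extract.csv", n_rounds=5):
    """Generate-test-review loop over a local GSS extract (hypothetical file)."""
    df = pd.read_csv(csv_path)
    history = []
    for _ in range(n_rounds):
        x, y = propose_pair(llm, df, history)
        result = test_association(df, x, y)
        review = llm(f"Act as a skeptical reviewer. Result: {result}. "
                     "Is this finding novel and sound? Answer briefly.")
        history.append({"result": result, "review": review})
    return history
```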
Pose a question about one of the following articles:
“In Silico Sociology: Forecasting COVID-19 Polarization with Large Language Models” by Austin C. Kozlowski, Hyunku Kwon, and James A. Evans. This paper demonstrates how LLMs can serve as a tool for sociological inquiry by enabling accurate simulation of respondents from specific social and cultural contexts. Applying LLMs in this capacity, we reconstruct the public opinion landscape of 2019 to examine the extent to which the future polarization over COVID-19 was prefigured in existing political discourse. Using an LLM trained on texts published through 2019, we simulate the responses of American liberals and conservatives to a battery of pandemic-related questions.
“Generative agents: Interactive simulacra of human behavior.” Park, Joon Sung, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. UIST. This paper introduces “generative agents,” advanced computational software agents capable of simulating realistic human behaviors such as daily routines, artistic creation, and social interactions, leveraging large language models for enhanced believability. These agents, which can remember, plan, and reflect using natural language, are showcased in an interactive environment inspired by “The Sims,” demonstrating their ability to autonomously perform complex social behaviors, like organizing and attending a Valentine's Day party, thereby offering new possibilities for interactive applications and realistic behavior simulation.
“Out of one, many: Using language models to simulate human samples.” Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. 2023. Political Analysis. This paper explores the potential of language models to serve as accurate proxies for diverse human subpopulations in social science research. By creating "silicon samples" conditioned on sociodemographic backstories, they demonstrate GPT-3's remarkable "algorithmic fidelity", showing its ability to emulate complex human response patterns across various groups, thereby offering a novel and valuable tool for gaining deeper insights into human attitudes and societal dynamics.
“Simulating social media using large language models to evaluate alternative news feed algorithms.” Törnberg, Petter, Diliara Valeeva, Justus Uitermark, and Christopher Bail. 2023. arXiv. This paper investigates the use of Large Language Models (LLMs) and Agent-Based Modeling as tools to simulate social media environments, aiming to understand how different news feed algorithms influence the quality of online conversations.
“Jury learning: Integrating dissenting voices into machine learning models” Gordon, Mitchell L., Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S. Bernstein. 2022. CHI. This paper introduces “jury learning,” a novel machine learning approach that addresses the challenge of reconciling diverse societal perspectives in tasks such as toxicity detection. Unlike traditional supervised ML that relies on majority voting, jury learning incorporates the metaphor of a jury to define which groups and in what proportions influence the classifier’s predictions.
“The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery” by C. Lu, C. Lu, R. Tjarko Lange, J. Foerster, J. Clune, and D. Ha. 2024. This paper explores how to generate a scientific environment by role-playing all of the institutional perspectives associated with collective scientific advance.