Commit f69e442

alyssachvasta authored and copybara-github committed
April update of README

GitOrigin-RevId: 256e46603b522f59f0f315e53780180d55236e19
1 parent b85f3f7 commit f69e442

README.md (1 file changed: 64 additions & 44 deletions)

# Sensemaker tools

# Overview

Jigsaw’s [Sensemaker tools](https://medium.com/jigsaw/making-sense-of-large-scale-online-conversations-b153340bda55) help make sense of large-scale online conversations, leveraging LLMs to categorize statements, and summarize statements and voting patterns to surface actionable insights. There are currently three main functions:

* Topic Identification - identifies the main points of discussion. The level of detail is configurable, allowing the tool to discover: just the top-level topics; topics and subtopics; or the deepest level of topics, subtopics, and themes (sub-subtopics).
* Statement Categorization - sorts statements into topics defined by a user or from the Topic Identification function. Statements can belong to more than one topic.
* Summarization - analyzes statements and vote data to output a summary of the conversation, including an overview, themes discussed, and areas of agreement and disagreement.

Please see these [docs](https://jigsaw-code.github.io/sensemaking-tools) for a full breakdown of available methods and types. These tools are still in their beta stage.

# How It Works

## Topic Identification

Jigsaw’s Sensemaker tools provide an option to identify the topics present in the comments. The tool offers flexibility to learn:

* Top-level topics
* Both top-level and subtopics
* Sub-topics only, given a set of pre-specified top-level topics

## Statement Categorization

Categorization assigns statements to one or more of the topics and subtopics. These topics can either be provided by the user, or can be the result of the "topic identification" method described above.

Topics are assigned to statements in batches, asking the model to return the appropriate categories for each statement, and leveraging the Vertex API constrained decoding feature to structure this output according to a pre-specified JSON schema, to avoid issues with output formatting. Additionally, error handling has been added to retry in case an assignment fails.

## Summarization

The summarization is output as a narrative report, but users are encouraged to pick and choose which elements are right for their data (see example from the runner [here](https://github.com/Jigsaw-Code/sensemaking-tools/blob/521dd0c4c2039f0ceb7c728653a9ea495eb2c8e9/runner-cli/runner.ts#L54)) and consider showing the summarizations alongside visualizations (more tools for this coming soon).

### Introduction Section

Includes a short bullet list of the number of statements, votes, topics, and subtopics within the summary.

### Overview Section

The overview section summarizes the "Themes" sections for all subtopics, along with summaries generated for each top-level topic (these summaries are generated as an intermediate step, not shown to users, and can be thought of as intermediate "chain of thought" steps in the overall recursive summarization approach).

Currently the Overview does not reference the "Common Ground" and "Differences of Opinion" sections.

Percentages in the overview (e.g. "Arts and Culture (17%)") are the percentage of statements that are about this topic. Since statements can be categorized into multiple topics, these percentages can add up to more than 100%.

### Top 5 Subtopics

Sensemaker selects the top 5 subtopics by statement count, and concisely summarizes key themes found in statements within these subtopics. These themes are more concise than what appears later in the summary, to act as a quick overview.
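
The selection itself is a simple sort-and-slice over statement counts; a sketch with placeholder subtopic data:

```javascript
// Illustrative only: pick the 5 subtopics with the most statements.
const subtopicCounts = [
  { name: "Housing", count: 120 },
  { name: "Transit", count: 95 },
  { name: "Parks", count: 20 },
  { name: "Policing", count: 88 },
  { name: "Schools", count: 64 },
  { name: "Zoning", count: 41 },
];

const top5 = [...subtopicCounts]
  .sort((a, b) => b.count - a.count) // descending by statement count
  .slice(0, 5)
  .map((s) => s.name);

console.log(top5); // [ 'Housing', 'Transit', 'Policing', 'Schools', 'Zoning' ]
```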

### Topic and Subtopic Sections

Using the topics and subtopics from our "Topic Identification" and "Statement Categorization" features, short summaries are produced for each subtopic (or topic, if no subtopics are present).

For each subtopic, Sensemaker surfaces:

* The number of statements assigned to this subtopic.
* Prominent themes.
* A summary of the top statements where we find "common ground" and "differences of opinion", based on agree and disagree rates.
* The relative level of agreement within the subtopic, as compared to the average subtopic, based on how many comments end up in the "common ground" vs. "differences of opinion" buckets.

#### Themes

For each subtopic, Sensemaker identifies up to 5 themes found across statements assigned to that subtopic, and writes a short description of each theme. This section considers all statements assigned to that subtopic.

When identifying themes, Sensemaker leverages statement text and not vote information. Sensemaker attempts to account for differing viewpoints in how it presents themes.

#### Common Ground and Differences of Opinion

When summarizing "Common Ground" and "Differences of Opinion" within a subtopic, Sensemaker summarizes a sample of statements selected based on statistics calculated using the agree, disagree, and pass vote counts for those statements. For each section, Sensemaker selects statements with the clearest signals for common ground and disagreement, respectively. It does not use any form of text analysis (beyond categorization) when selecting the statements, and only considers vote information.

Because small sample sizes (low vote counts) can create misleading impressions, statements with fewer than 20 votes total are not included. This avoids, for example, a total of 2 votes in favor of a particular statement being taken as evidence of broad support, and included as a point of common ground, when more voting might reveal relatively low support (or significant differences of opinion).
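
The vote-based selection described above can be sketched as follows. The 20-vote floor comes from the text; the agree-rate threshold and the `selectCommonGround` helper are illustrative placeholders, not the library's exact statistics:

```javascript
// Hypothetical sketch of vote-based statement selection (not the actual code).
const MIN_VOTES = 20; // from the docs: exclude statements with fewer votes

function agreeRate(s) {
  const total = s.agrees + s.disagrees + s.passes;
  return total === 0 ? 0 : s.agrees / total;
}

// Keep well-voted statements with the clearest agreement signal,
// strongest first. The 0.8 threshold is an assumed placeholder.
function selectCommonGround(statements, threshold = 0.8) {
  return statements
    .filter((s) => s.agrees + s.disagrees + s.passes >= MIN_VOTES)
    .filter((s) => agreeRate(s) >= threshold)
    .sort((a, b) => agreeRate(b) - agreeRate(a));
}

const statements = [
  { id: "a", agrees: 18, disagrees: 1, passes: 1 },  // 90% agree, 20 votes
  { id: "b", agrees: 2, disagrees: 0, passes: 0 },   // unanimous, but only 2 votes
  { id: "c", agrees: 10, disagrees: 12, passes: 3 }, // contested
];
console.log(selectCommonGround(statements).map((s) => s.id)); // [ 'a' ]
```

Note how statement "b" is dropped despite 100% agreement: with only 2 votes it falls under the 20-vote floor, which is exactly the edge case the paragraph above guards against.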

For this section, Sensemaker provides grounding citations to show which statements the LLM referenced, and to allow readers to check the underlying text and vote counts.

#### Relative Agreement

Each subtopic is labeled as "high", "moderately high", "moderately low", or "low" agreement. For each subtopic, this is determined by counting *all* the comments that qualify as common ground comments, normalizing by the number of comments in that subtopic, and then comparing these normalized rates across subtopics.
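
A sketch of that normalization, with made-up data and a simplified two-bucket labeling (the real tool uses four labels and its own cutoffs):

```javascript
// Hypothetical sketch: normalize common-ground counts per subtopic, then
// label each subtopic relative to the others. Data and cutoffs are made up.
const subtopics = [
  { name: "Bike lanes", commonGroundComments: 12, totalComments: 40 },
  { name: "Parking", commonGroundComments: 2, totalComments: 50 },
  { name: "Sidewalks", commonGroundComments: 9, totalComments: 30 },
];

// Normalize: share of a subtopic's comments that qualify as common ground.
const rates = subtopics.map((s) => ({
  name: s.name,
  rate: s.commonGroundComments / s.totalComments,
}));
const mean = rates.reduce((acc, r) => acc + r.rate, 0) / rates.length;

// Compare subtopic to subtopic (two buckets shown for brevity).
const labeled = rates.map((r) => ({
  name: r.name,
  label: r.rate >= mean ? "high" : "low",
}));
console.log(labeled);
```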

### **LLMs Used and Custom Models**

In addition to models available through VertexAI’s Model Garden, users can int

### **Costs of Running**

LLM pricing is based on token count and is constantly changing. Here we list the token counts for a conversation with ~1000 statements. Please see [Vertex AI pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing) for an up-to-date cost per input token. As of April 10, 2025, the cost of running topic identification, statement categorization, and summarization was in total under $1 on Gemini 1.5 Pro.

Token counts for a 1000-statement conversation:

| | Topic Identification | Statement Categorization | Summarization |
| ----- | ----- | ----- | ----- |
| Input Tokens | 130,000 | 130,000 | 80,000 |
| Output Tokens | 50,000 | 50,000 | 7,500 |
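
To turn these token counts into a rough cost estimate, multiply by the current per-token prices from the Vertex AI pricing page. The prices below are placeholders for illustration, not real rates:

```javascript
// Token counts from the table above (1000-statement conversation, all stages).
const inputTokens = 130_000 + 130_000 + 80_000; // 340,000
const outputTokens = 50_000 + 50_000 + 7_500;   // 107,500

// PLACEHOLDER prices per 1M tokens; check the Vertex AI pricing page for
// the model and rates that actually apply to you.
const inputPricePerM = 1.25;
const outputPricePerM = 5.0;

const cost =
  (inputTokens / 1e6) * inputPricePerM + (outputTokens / 1e6) * outputPricePerM;
console.log(cost.toFixed(2)); // "0.96" with these placeholder prices
```

With these placeholder rates the total lands just under $1, consistent with the figure quoted above.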

### **Evaluations**

Our text summary consists of outputs from multiple LLM calls, each focused on summarizing a subset of comments. We have evaluated these LLM outputs for hallucinations both manually and using autoraters. Autorating code can be found in [evals/autorating](https://github.com/Jigsaw-Code/sensemaking-tools/tree/main/evals/autorating).

We have evaluated topic identification and categorization using methods based on the silhouette coefficient. This evaluation code will be published in the near future. We have also considered how stable the outputs are from run to run: comments are categorized into the same topic(s) ~90% of the time, and the identified topics also show high stability.
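
For reference, the silhouette coefficient for a single point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b): s = (b - a) / max(a, b). A minimal numeric sketch of that formula (the library's actual evaluation code is not yet published):

```javascript
// Generic silhouette score for one point, given precomputed mean distances.
function silhouette(a, b) {
  // a: mean distance to points in the same cluster
  // b: mean distance to points in the nearest other cluster
  if (a === 0 && b === 0) return 0; // degenerate case
  return (b - a) / Math.max(a, b);
}

console.log(silhouette(1, 4)); // 0.75: well matched to its own cluster
console.log(silhouette(4, 1)); // -0.75: likely assigned to the wrong cluster
```

Scores near +1 indicate well-separated topic assignments; scores near -1 suggest a comment would fit better in a different topic.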

## **Running the tools - Setup**

First make sure you have `npm` installed (`apt-get install npm` on Ubuntu-esque systems). Next, install the project modules by running:

`npm install`

### **Using the Default Models - GCloud Authentication**

A Google Cloud project is required to control quota and access when using the default models that connect to Model Garden. Installation instructions for all machines are [here](https://cloud.google.com/sdk/docs/install-sdk#deb).

For Linux, the GCloud CLI can be installed like:

`sudo apt install -y google-cloud-cli`

Then to log in locally, run:

`gcloud config set project <your project name here>`
`gcloud auth application-default login`

## **Example Usage - JavaScript**

Summarize Seattle’s $15 Minimum Wage Conversation.

```javascript
// Set up the tools to use the default Vertex model (Gemini Pro 1.5) and related authentication info.
const mySensemaker = new Sensemaker({
  defaultModel: new VertexModel(
// ...
console.log(topics);

// Summarize the conversation and print the result as Markdown.
const summary = mySensemaker.summarize(
  comments,
  SummarizationType.AGGREGATE_VOTE,
  topics,
  // Additional context:
  "This is from a conversation on a $15 minimum wage in Seattle"
);
console.log(summary.getText("MARKDOWN"));
```

## **CLI Usage**

There is also a simple CLI set up for testing. There are two tools:

* ./runner-cli/runner.ts: takes in a CSV representing a conversation and outputs an HTML file containing the summary. The summary is best viewed as an HTML file so that the included citations can be hovered over to see the original comment and votes.
* ./runner-cli/rerunner.ts: takes in a CSV representing a conversation, reruns summarization a number of times, and outputs each of the summaries in one CSV. This is useful for testing consistency.

## **Making Changes to the tools - Development**

### **Testing**

The documentation [here](https://jigsaw-code.github.io/sensemaking-tools) is the

## **Feedback**

If you have questions or issues with this library, please leave feedback [here](https://docs.google.com/forms/d/e/1FAIpQLSd6kScXaf0d8XR7X9mgHBgG11DJYXV1hEzYLmqpxMcDFJxOhQ/viewform?resourcekey=0-GTVtn872epNsEHtI2ClBEA) and we will reach out to you. Our team is actively evaluating Sensemaker performance and is aiming to share our results on this page in the future. Please note that performance results may vary depending on the model selected.

## **Cloud Vertex Terms of Use**

This library is designed to leverage Cloud Vertex, and usage is subject to the [Cloud Vertex Terms of Service](https://cloud.google.com/terms/service-terms) and the [Generative AI Prohibited Use Policy](https://policies.google.com/terms/generative-ai/use-policy).
