-
Notifications
You must be signed in to change notification settings - Fork 1.1k
/
TomsHardware-Relative-Sigma-500.names.txt
439 lines (383 loc) · 19.4 KB
/
TomsHardware-Relative-Sigma-500.names.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
1. Title of Database: Buzz prediction on Tom's Hardware - Relative Labeling - Threshold Sigma equals 500
2. Sources:
-- Creators :
François Kawala (1,2) and
Ahlame Douzal (1) and
Eric Gaussier (1) and
Eustache Diemert (2)
-- Institutions :
(1) Université Joseph Fourier (Grenoble I)
Laboratoire d'informatique de Grenoble (LIG)
(2) BestofMedia Group
-- Donor: BestofMedia ([email protected])
-- Date: May, 2013
3. Past Usage:
-- References :
Predicting Buzz Magnitude in Social Media (in submission (ECML-PKDD 13))
-- Predicted attribute :
Buzz. This attribute is boolean: 1 meaning `buzz observed', 0 meaning
`no buzz observed'. It is stored is the rightmost column.
-- Study results :
The results achieved are acceptable, nevertheless the unbalanced nature
of this dataset leaves some room for improvement. Using random forest
yields a F-1 score of around 0.80 for the Buzz class, when the data is
scaled and normalized. First order discrete difference over features may also
be considered as additional features.
4. Relevant Information Paragraph:
-- Observations :
Each instance covers height weeks of observation for a specific topic (eg.
overclocking...). Considering the couple weeks following this initial
observation; If there is at least 500 additional active discussions by
day (on average, with respect to the initial observation) then, the
predicted attribute Buzz is True.
Observations are Independent and identically distributed.
5. Number of Instances
-- Total number of instances : 7905
6. Number of Attributes
-- Total number of attributes : 96.
-- Time representation :
Each instance is described by 96 features, those describe the evolution
of 12 `primary features' through time. Hence each feature name is
postfixed with the relative time of observation. For instance, the value
of the feature `Nb_Active_Discussion' at time t is given in
'Nb_Active_Discussion_t'.
7. Attributes
-- Number of Created Discussions (NCD) (columns [0,7])
-- Type : Numeric, integers only
-- Description : This feature measures the number of discussions created
at time step t and involving the instance's topic.
-- Columns : From column 0 (NCD at relative time 0) to column 7 (NCD at
relative time 7)
-- Abbreviations : NCD_0, NCD_1, NCD_2, NCD_3, NCD_4, NCD_5, NCD_6, NCD_7
-- Statistics :
+---------+-----+-----+-------+-------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+-------+
| NCD_0 | 0 | 182 | 1.399 | 6.213 |
+---------+-----+-----+-------+-------+
| NCD_1 | 0 | 109 | 1.301 | 5.596 |
+---------+-----+-----+-------+-------+
| NCD_2 | 0 | 116 | 1.290 | 5.440 |
+---------+-----+-----+-------+-------+
| NCD_3 | 0 | 98 | 1.332 | 5.508 |
+---------+-----+-----+-------+-------+
| NCD_4 | 0 | 108 | 1.341 | 5.364 |
+---------+-----+-----+-------+-------+
| NCD_5 | 0 | 118 | 1.362 | 5.347 |
+---------+-----+-----+-------+-------+
| NCD_6 | 0 | 154 | 1.454 | 5.642 |
+---------+-----+-----+-------+-------+
| NCD_7 | 0 | 88 | 1.327 | 4.974 |
+---------+-----+-----+-------+-------+
-- Burstiness Level (BL) (columns [8,15])
-- Type : Numeric, defined on [0,1]
-- Description : The burstiness level for a topic z at a time t is
defined as the ratio of ncd and nad
-- Columns : From column 8 (BL at relative time 0) to column 15 (BL at
relative time 7)
-- Abbreviations : BL_0, BL_1, BL_2, BL_3, BL_4, BL_5, BL_6, BL_7
-- Statistics :
+---------+-----+-----+-------+-------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+-------+
| BL_0 | 0 | 1 | 0.145 | 0.288 |
+---------+-----+-----+-------+-------+
| BL_1 | 0 | 1 | 0.143 | 0.286 |
+---------+-----+-----+-------+-------+
| BL_2 | 0 | 1 | 0.143 | 0.285 |
+---------+-----+-----+-------+-------+
| BL_3 | 0 | 1 | 0.146 | 0.288 |
+---------+-----+-----+-------+-------+
| BL_4 | 0 | 1 | 0.149 | 0.291 |
+---------+-----+-----+-------+-------+
| BL_5 | 0 | 1 | 0.156 | 0.298 |
+---------+-----+-----+-------+-------+
| BL_6 | 0 | 1 | 0.184 | 0.323 |
+---------+-----+-----+-------+-------+
| BL_7 | 0 | 1 | 0.180 | 0.320 |
+---------+-----+-----+-------+-------+
-- Average Discussions Length (NAD) (columns [16,23])
-- Type : Numeric, integer.
-- Description : This features measures the number of discussions
involving the instance's topic until time t.
-- Columns : From column 16 (NAD at relative time 0) to column 23 (NAD at
relative time 7)
-- Abbreviations : NAD_0, NAD_1, NAD_2, NAD_3, NAD_4, NAD_5, NAD_6, NAD_7
-- Statistics :
+---------+-----+-----+-------+--------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+--------+
| NAD_0 | 0 | 217 | 3.185 | 11.985 |
+---------+-----+-----+-------+--------+
| NAD_1 | 0 | 202 | 3.096 | 11.688 |
+---------+-----+-----+-------+--------+
| NAD_2 | 0 | 201 | 3.077 | 11.438 |
+---------+-----+-----+-------+--------+
| NAD_3 | 0 | 189 | 3.215 | 11.783 |
+---------+-----+-----+-------+--------+
| NAD_4 | 0 | 210 | 3.198 | 11.603 |
+---------+-----+-----+-------+--------+
| NAD_5 | 0 | 194 | 3.173 | 11.240 |
+---------+-----+-----+-------+--------+
| NAD_6 | 0 | 170 | 3.198 | 10.698 |
+---------+-----+-----+-------+--------+
| NAD_7 | 0 | 185 | 3.006 | 9.809 |
+---------+-----+-----+-------+--------+
-- Author Increase (AI) (columns [24,31])
-- Type : Numeric, integers only
-- Description : This featurethe number of new authors interacting on
the instance's topic at time t (i.e. its popularity)
-- Columns : From column 24 (AI at relative time 0) to column 31 (AI at
relative time 7)
-- Abbreviations : AI_0, AI_1, AI_2, AI_3, AI_4, AI_5, AI_6, AI_7, AI_8
-- Statistics :
+---------+-----+-----+-------+-------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+-------+
| AI_0 | 0 | 158 | 2.313 | 7.244 |
+---------+-----+-----+-------+-------+
| AI_1 | 0 | 146 | 2.253 | 7.274 |
+---------+-----+-----+-------+-------+
| AI_2 | 0 | 149 | 2.360 | 7.404 |
+---------+-----+-----+-------+-------+
| AI_3 | 0 | 161 | 2.567 | 8.677 |
+---------+-----+-----+-------+-------+
| AI_4 | 0 | 169 | 2.578 | 8.309 |
+---------+-----+-----+-------+-------+
| AI_5 | 0 | 149 | 2.532 | 7.850 |
+---------+-----+-----+-------+-------+
| AI_6 | 0 | 156 | 2.465 | 6.944 |
+---------+-----+-----+-------+-------+
| AI_7 | 0 | 160 | 2.538 | 7.038 |
+---------+-----+-----+-------+-------+
-- Number of Atomic Containers (NAC) (columns [32,39])
-- Type : Numeric, integer
-- Description : This feature measures the total number of atomic
containers generated through the whole social media on the instance's topic until time t.
-- Columns : From column 32 (NAC at relative time 0) to column 39 (NAC at
relative time 7)
-- Abbreviations : NAC_0, NAC_1, NAC_2, NAC_3, NAC_4, NAC_5, NAC_6, NAC_7
-- Statistics :
+---------+-----+------+--------+---------+
| feature | min | max | mean | std |
+---------+-----+------+--------+---------+
| NAC_0 | 0 | 1639 | 26.268 | 101.293 |
+---------+-----+------+--------+---------+
| NAC_1 | 0 | 1966 | 25.925 | 102.825 |
+---------+-----+------+--------+---------+
| NAC_2 | 0 | 1651 | 26.318 | 98.330 |
+---------+-----+------+--------+---------+
| NAC_3 | 0 | 1491 | 26.822 | 97.665 |
+---------+-----+------+--------+---------+
| NAC_4 | 0 | 1734 | 26.373 | 96.521 |
+---------+-----+------+--------+---------+
| NAC_5 | 0 | 1657 | 26.103 | 93.847 |
+---------+-----+------+--------+---------+
| NAC_6 | 0 | 1403 | 27.139 | 85.966 |
+---------+-----+------+--------+---------+
| NAC_7 | 0 | 1707 | 26.463 | 79.881 |
+---------+-----+------+--------+---------+
-- Number of displays (ND) (columns [40,47])
-- Type : Numeric, integer
-- Description : This feature gives the number of time discussions
relying on the instance's topic has been displayed by users.
-- Columns : From column 40 (ND at relative time 0) to column 47
(ND at relative time 7)
-- Abbreviations : ND_0, ND_1, ND_2, ND_3, ND_4, ND_5, ND_6, ND_7
-- Statistics :
+---------+-----+--------+----------+-----------+
| feature | min | max | mean | std |
+---------+-----+--------+----------+-----------+
| ND_0 | 0 | 235271 | 3571.694 | 12547.297 |
+---------+-----+--------+----------+-----------+
| ND_1 | 0 | 164319 | 3321.016 | 10976.325 |
+---------+-----+--------+----------+-----------+
| ND_2 | 0 | 225099 | 3359.040 | 11783.297 |
+---------+-----+--------+----------+-----------+
| ND_3 | 0 | 203236 | 3673.377 | 12273.335 |
+---------+-----+--------+----------+-----------+
| ND_4 | 0 | 229433 | 3887.347 | 13357.621 |
+---------+-----+--------+----------+-----------+
| ND_5 | 0 | 226412 | 3926.186 | 13570.591 |
+---------+-----+--------+----------+-----------+
| ND_6 | 0 | 330561 | 4449.874 | 16843.320 |
+---------+-----+--------+----------+-----------+
| ND_7 | 0 | 245047 | 4659.202 | 15680.960 |
+---------+-----+--------+----------+-----------+
-- Contribution Sparseness (CS) (columns [48,55])
-- Type : Numeric, defined on [0,1]
-- Description : This feature is a measure of spreading of contributions
over discussion for the instance's topic at time t.
-- Columns : From column 48 (CS at relative time 0) to column 55
(CS at relative time 7)
-- Abbreviations : CS_0, CS_1, CS_2, CS_3, CS_4, CS_5, CS_6, CS_7
-- Statistics :
+---------+-----+-----+-------+-------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+-------+
| CS_0 | 0 | 1 | 0.011 | 0.051 |
+---------+-----+-----+-------+-------+
| CS_1 | 0 | 1 | 0.011 | 0.057 |
+---------+-----+-----+-------+-------+
| CS_2 | 0 | 1 | 0.012 | 0.050 |
+---------+-----+-----+-------+-------+
| CS_3 | 0 | 1 | 0.012 | 0.047 |
+---------+-----+-----+-------+-------+
| CS_4 | 0 | 1 | 0.013 | 0.057 |
+---------+-----+-----+-------+-------+
| CS_5 | 0 | 1 | 0.013 | 0.056 |
+---------+-----+-----+-------+-------+
| CS_6 | 0 | 1 | 0.016 | 0.066 |
+---------+-----+-----+-------+-------+
| CS_7 | 0 | 1 | 0.017 | 0.063 |
+---------+-----+-----+-------+-------+
-- Author Interaction (AT) (columns [56,63])
-- Type : Numeric, integer.
-- Description : This feature measures the average number of authors
interacting on the instance's topic within a discussion.
-- Columns : From column 56 (AT at relative time 0) to column 63
(AT at relative time 7)
-- Abbreviations : AT_0, AT_1, AT_2, AT_3, AT_4, AT_5, AT_6, AT_7
-- Statistics :
+---------+-----+-----+-------+-------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+-------+
| AT_0 | 0 | 105 | 3.771 | 8.753 |
+---------+-----+-----+-------+-------+
| AT_1 | 0 | 97 | 3.616 | 8.095 |
+---------+-----+-----+-------+-------+
| AT_2 | 0 | 104 | 3.783 | 8.651 |
+---------+-----+-----+-------+-------+
| AT_3 | 0 | 106 | 4.056 | 9.173 |
+---------+-----+-----+-------+-------+
| AT_4 | 0 | 107 | 3.773 | 8.286 |
+---------+-----+-----+-------+-------+
| AT_5 | 0 | 102 | 3.940 | 8.860 |
+---------+-----+-----+-------+-------+
| AT_6 | 0 | 104 | 3.970 | 8.543 |
+---------+-----+-----+-------+-------+
| AT_7 | 0 | 101 | 4.101 | 8.347 |
+---------+-----+-----+-------+-------+
-- Number of Authors (NA) (columns [64,71])
-- Type : Numeric, integer.
-- Description : This feature measures the number of authors interacting
on the instance's topic at time t.
-- Columns : From column 64 (NA at relative time 0) to column 71 (NA at
relative time 7)
-- Abbreviations : NA_0, NA_1, NA_2, NA_3, NA_4, NA_5, NA_6, NA_7
-- Statistics :
+---------+-----+-----+-------+--------+
| feature | min | max | mean | std |
+---------+-----+-----+-------+--------+
| NA_0 | 0 | 310 | 6.160 | 20.083 |
+---------+-----+-----+-------+--------+
| NA_1 | 0 | 312 | 5.998 | 20.124 |
+---------+-----+-----+-------+--------+
| NA_2 | 0 | 306 | 6.068 | 19.829 |
+---------+-----+-----+-------+--------+
| NA_3 | 0 | 310 | 6.357 | 20.786 |
+---------+-----+-----+-------+--------+
| NA_4 | 0 | 313 | 6.258 | 20.401 |
+---------+-----+-----+-------+--------+
| NA_5 | 0 | 322 | 6.240 | 19.615 |
+---------+-----+-----+-------+--------+
| NA_6 | 0 | 264 | 6.116 | 17.485 |
+---------+-----+-----+-------+--------+
| NA_7 | 0 | 309 | 6.100 | 16.541 |
+---------+-----+-----+-------+--------+
-- Average Discussions Length (ADL) (columns [72,79])
-- Type : Numeric, real.
-- Description : This feature directly measures the average length of a
discussion belonging to the instance's topic.
-- Columns : From column 72 (ADL at relative time 0) to column 19 (ADL at
relative time 7)
-- Abbreviations : ADL_0, ADL_1, ADL_2, ADL_3, ADL_4, ADL_5, ADL_6, ADL_7
-- Statistics :
+---------+-----+-----+--------+--------+
| feature | min | max | mean | std |
+---------+-----+-----+--------+--------+
| ADL_0 | 0 | 150 | 12.377 | 23.652 |
+---------+-----+-----+--------+--------+
| ADL_1 | 0 | 150 | 11.905 | 22.637 |
+---------+-----+-----+--------+--------+
| ADL_2 | 0 | 150 | 11.994 | 22.478 |
+---------+-----+-----+--------+--------+
| ADL_3 | 0 | 150 | 13.227 | 24.110 |
+---------+-----+-----+--------+--------+
| ADL_4 | 0 | 150 | 12.754 | 23.191 |
+---------+-----+-----+--------+--------+
| ADL_5 | 0 | 150 | 12.955 | 23.729 |
+---------+-----+-----+--------+--------+
| ADL_6 | 0 | 150 | 13.620 | 23.763 |
+---------+-----+-----+--------+--------+
| ADL_7 | 0 | 150 | 14.916 | 25.321 |
+---------+-----+-----+--------+--------+
-- Attention Level (measured with number of authors) (AS(NA))
(columns [80,87])
-- Type : Numeric, integer
-- Description : This feature is a measure of the attention payed to a
the instance's topic on a social media.
-- Columns : From column 80 (AS(NAC) at relative time 0) to column 87
(AS(NAC) at relative time 7)
-- Abbreviations : AS(NA)_0, AS(NA)_1, AS(NA)_2, AS(NA)_3, AS(NA)_4, AS(NA)_5,
AS(NA)_6, AS(NA)_7
-- Statistics :
+----------+-----+-------+-------+-------+
| feature | min | max | mean | std |
+----------+-----+-------+-------+-------+
| AS(NA)_0 | 0 | 0.107 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
| AS(NA)_1 | 0 | 0.126 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
| AS(NA)_2 | 0 | 0.129 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
| AS(NA)_3 | 0 | 0.159 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
| AS(NA)_4 | 0 | 0.144 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
| AS(NA)_5 | 0 | 0.166 | 0.003 | 0.010 |
+----------+-----+-------+-------+-------+
| AS(NA)_6 | 0 | 0.155 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
| AS(NA)_7 | 0 | 0.147 | 0.003 | 0.009 |
+----------+-----+-------+-------+-------+
-- Attention Level (measured with number of contributions) (AS(NAC))
(columns [88,95])
-- Type : Numeric, integer
-- Description : This feature is a measure of the attention payed to a
the instance's topic on a social media.
-- Columns : From column 28 (AS(NAC) at relative time 0) to column 34
(AS(NAC) at relative time 6)
-- Abbreviations : AS(NAC)_0, AS(NAC)_1, AS(NAC)_2, AS(NAC)_3, AS(NAC)_4,
AS(NAC)_5, AS(NAC)_6, AS(NAC)_7
-- Statistics :
+-----------+-----+-------+-------+-------+
| feature | min | max | mean | std |
+-----------+-----+-------+-------+-------+
| AS(NAC)_0 | 0 | 0.107 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_1 | 0 | 0.106 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_2 | 0 | 0.130 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_3 | 0 | 0.136 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_4 | 0 | 0.153 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_5 | 0 | 0.137 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_6 | 0 | 0.147 | 0.002 | 0.007 |
+-----------+-----+-------+-------+-------+
| AS(NAC)_7 | 0 | 0.179 | 0.002 | 0.008 |
+-----------+-----+-------+-------+-------+
-- Annotation (column 96)
-- Type : Numeric, integer: 0 or 1
-- Description : See 3. and 4.
-- Columns : 96
-- Buzz = 1
Non Buzz = 0
8. Missing Attribute Values:
-- There is not any missing values.
9. Class Distribution:
-- Positives instances (ie. Buzz) : 1100 (22 %)
-- Negative instances (ie. Non Buzz) : 6795 (78 %)