┏━━━━━━━━━━━━━━┓
┃ BIG_DATA ┃
┗━━━━━━━━━━━━━━┛
Differences from BI (by phase):
- Data storage + processing:
  - unstructured data ("Variety"): structured on the fly, dynamically
  - external data (vs internal)
  - usually a large amount of data ("Volume", "Velocity"). The Internet of Things will create even more.
- Data analysis:
  - predictive:
    - not the past: reporting (what), analysis (why)
    - not the present: monitoring
  - importance of data mining and machine learning: a subset of AI, where the algorithm improves with more data
    (more data changes FUNC itself, not just FUNC args)
  - large amount of data -> importance of algorithms and optimization
- Data visualization: focus on discovery, not dashboards
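A minimal sketch of the "FUNC itself, not FUNC args" point, assuming a plain least-squares line fit as a stand-in for machine learning. The names fit_line and make_samples, the ground truth y = 3x + 2 and the noise level are all hypothetical, chosen only for illustration. The data is not passed to the predictor as an argument; it is used to build the predictor.

```python
import random


def fit_line(points):
    """Least-squares line fit. Returns a new predictor (the FUNC) built from the data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # The returned function is different for every training set.
    return lambda x: slope * x + intercept


def make_samples(n, rng):
    """Hypothetical noisy observations of y = 3x + 2 (assumption for the demo)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3 * x + 2 + rng.gauss(0, 5)) for x in xs]


rng = random.Random(0)
small_model = fit_line(make_samples(5, rng))      # FUNC learned from little data
large_model = fit_line(make_samples(5000, rng))   # FUNC learned from much more data

# Same argument, two different functions. The true value at x = 10 is 32.
print(small_model(10), large_model(10))
```

With more samples the fitted line typically lands closer to the underlying relation, which is the sense in which the algorithm "improves with more data".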
Technologies (usually, and as a consequence):
- Rejection of SQL and RDBMS (NoSQL)
- MapReduce / Hadoop
- In-memory databases
- SSD, DRAM
- Cloud computing
Big data is not real-time, but it tries to become so by bringing the speed of NoSQL databases to Hadoop.
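The MapReduce model listed under Technologies can be sketched in a single process as a word count. This is only an illustration of the map / shuffle / reduce phases, not Hadoop's API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and a real cluster would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in order over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big volume"]))
```

Each map call depends only on its own document and each reduce call only on its own key, which is why the model spreads well across many machines.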