Repository for my talk at UCLA's DataFest 2021

Jeremy Guinta - Data Scientist; Litigation Consultant; Statistical Expert; Machine Learning; Lecturer

Jeremy has nearly 20 years’ experience in litigation consulting and complex data analysis. He specializes in complex data analysis, including statistical procedures for identifying trends and building predictive analyses. He also has expertise in machine learning and has developed machine learning processes to improve name matching and categorical descriptions of textual data. Over the course of his career, Jeremy has developed and delivered training courses on Statistics, Data Management and Analysis, R programming, SQL programming, and Data Visualization. Jeremy has authored articles on data privacy, data management, statistical analysis, and general areas where law, statistics, and data analysis intersect. Jeremy also teaches a college-level course in beginning and intermediate statistics at California State University, Los Angeles.

www.linkedin.com/in/jeremyguinta

Program Description

This two-part program covers my theory of data graphics and an introduction to machine learning. Both parts of my talk include hands-on learning and programming using R and RStudio.

Data Visuals

Everyone wants to be able to tell a story, and persuasive storytelling is an important part of an attorney’s role. This program is designed to teach proper data visualization techniques for telling a more powerful story. The program will cover the 4 Os (Observable, Original, Objective, and Open) of great graphics and how you can tell a better story using data visualizations. Specifically, the program will cover 1) how to present facts and information in pictorial form; 2) useful graphic techniques; 3) good and bad data visual methods, pitfalls, and other considerations when creating or analyzing a graph; and 4) the basics of using the ggplot2 package in R to make a great visual.
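As a rough illustration of the ggplot2 basics mentioned above, here is a minimal sketch using the built-in mtcars data. The choice of variables, the title, and the theme are assumptions made for this example; they are not taken from the talk's script.

```r
# Minimal ggplot2 sketch on the built-in mtcars data.
# Variable choice, labels, and theme are illustrative assumptions.
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                      # one point per car
  labs(
    title = "Heavier cars tend to get fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal()                     # strip non-data clutter

print(p)
```

Stating the takeaway in the title and labeling both axes with units is one simple way to keep a graphic readable without extra explanation.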

Machine Learning

Machine Learning is a series of highly sophisticated statistical techniques that are hard to understand and even harder to implement correctly and meaningfully. This program is designed to break open the black box of Machine Learning so participants can understand the basics: telling their story based on the data and the model through visuals, setting up their data, developing training and testing datasets, validating their model, and, most importantly, explaining the meaning of their model to a wider audience. This program will work through a very basic Machine Learning example using R.
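The train, test, and validate steps described above can be sketched in base R as follows. This is only an illustration on the built-in mtcars data: the 70/30 split, the seed, and the linear model on wt and hp are assumptions chosen for this example, not the model used in the talk.

```r
# Minimal train/test sketch on the built-in mtcars data (base R only).
# Split ratio, seed, and model formula are illustrative assumptions.
set.seed(2021)  # arbitrary seed so the split is reproducible

n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))  # assumed 70/30 split

train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit a simple linear model on the training rows only
fit <- lm(mpg ~ wt + hp, data = train)

# Validate on the held-out rows using root mean squared error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

print(coef(fit))  # coefficients are the part you explain to an audience
print(rmse)       # typical prediction error, in miles per gallon
```

Keeping the test rows out of the fitting step is what makes the error estimate meaningful; an error computed on the training rows would tend to look better than the model really is.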

Data

mtcars - Our favorite built-in dataset from R

indexes - Stock price indices taken from publicly available sources for

  1. NYSE (https://finance.yahoo.com/quote/%5ENYA?p=^NYA&.tsrc=fin-srch)
  2. Nasdaq (https://finance.yahoo.com/quote/%5EIXIC?p=^IXIC&.tsrc=fin-srch)
  3. Standard & Poor's Preferred Stock (https://finance.yahoo.com/quote/%5ESPPREF/)

miller_d - Stock price index for D preferred shares of Miller Energy (https://finance.yahoo.com/news/miller-energy-responds-notice-delisting-213036706.html)

UCI_Credit_Card.csv - Please see UCI_README

Software Requirements

  1. R (https://mran.microsoft.com/download) or (https://cran.r-project.org/bin/windows/base/). I prefer the MRAN version because it is optimized for multi-threading, but it is not required for this program.
  2. RStudio (https://rstudio.com/products/rstudio/download/)

Scripting

See 20210408_Script.r for the R code behind all of the PPTX graphics and the machine learning walkthrough. The script auto-loads all of the required packages.
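For readers unfamiliar with package auto-loading, one common pattern looks like the sketch below. It is not necessarily what 20210408_Script.r does: the helper name load_or_install, the package list, and the CRAN mirror are all assumptions for illustration.

```r
# A common auto-loading pattern; NOT taken from 20210408_Script.r.
# The package list and helper name are hypothetical.
pkgs <- c("ggplot2")

load_or_install <- function(pkg) {
  # Install only when the package is not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Attach the package by its name given as a string
  library(pkg, character.only = TRUE)
}

invisible(lapply(pkgs, load_or_install))
```

Checking with requireNamespace() first avoids reinstalling packages on every run.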