lusy/hora-de-decir-bye-bye

Hora de decir bye-bye

Aim of this project

Analyze the use of English in Spanish-language women's magazines (and the use of Spanish in English-language women's magazines) in the US.

Outline

  1. Get some data: scrape magazine articles from
  • siempremujer.com
  • latina.com
  2. Text pre-processing
  • convert to plain text
  3. Dictionary pre-processing
  • we want to annotate tokens in articles according to their language
  • use the OpenOffice dictionaries
  • we need simple word lists, so strip the dictionary annotations and convert to UTF-8
  • make a common Spanish dictionary (intersection of all Spanish dictionaries)
  • and region-specific Spanish dictionaries for the different countries
  4. Annotate text
  • look up all the words in the Spanish/English dictionaries
  • annotate language (multiple labels possible)
  5. Get n-grams with alternating language use
  6. Try to generalize some contexts where such n-grams appear
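
The dictionary, annotation, and n-gram steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names, the `es_MX`/`es_ES` region keys, and the toy word lists are all hypothetical. It assumes the OpenOffice dictionaries are Hunspell-style `.dic` files (a word count on the first line, then entries such as `casa/S` with affix flags after the slash), and it reads "region-specific" as "words a region has beyond the common set". Converting to UTF-8 would happen when the files are read, by opening them with whatever encoding each dictionary declares.

```python
import re


def clean_dic_lines(lines):
    """Turn Hunspell-style .dic lines into a plain set of lowercase words."""
    words = set()
    for line in lines:
        entry = line.strip()
        # Skip blank lines and the leading word-count line.
        if not entry or entry.isdigit():
            continue
        # Drop affix flags after "/" (e.g. "casa/S" -> "casa").
        word = entry.split("/", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def build_spanish_dictionaries(regional):
    """Return (common, specific) from a {region: word set} mapping.

    common   = words present in every regional dictionary
    specific = per region, the words not in the common set (an assumption)
    """
    common = set.intersection(*regional.values())
    specific = {region: words - common for region, words in regional.items()}
    return common, specific


# Runs of letters only, so "bye-bye" becomes two tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def annotate(text, dictionaries):
    """Label each token with every language whose word list contains it."""
    annotated = []
    for token in TOKEN_RE.findall(text):
        labels = frozenset(
            lang for lang, words in dictionaries.items() if token.lower() in words
        )
        annotated.append((token, labels))
    return annotated


def mixed_ngrams(annotated, n=3):
    """Yield n-grams holding an unambiguously English and an unambiguously Spanish token."""
    for i in range(len(annotated) - n + 1):
        window = annotated[i:i + n]
        # Only tokens with exactly one label count as evidence for a language.
        langs = {next(iter(labels)) for _, labels in window if len(labels) == 1}
        if {"en", "es"} <= langs:
            yield tuple(token for token, _ in window)


# Toy data standing in for real dictionary files.
es_mx = clean_dic_lines(["5", "hora/S", "de", "decir/RED", "no", "chamaco/S"])
es_es = clean_dic_lines(["5", "hora/S", "de", "decir/RED", "no", "chaval/S"])
common, specific = build_spanish_dictionaries({"es_MX": es_mx, "es_ES": es_es})
en = clean_dic_lines(["2", "bye", "no"])

dictionaries = {"es": common, "en": en}
annotated = annotate("Hora de decir bye-bye", dictionaries)
print(list(mixed_ngrams(annotated, n=3)))
# [('de', 'decir', 'bye'), ('decir', 'bye', 'bye')]
```

Tokens found in both languages, such as "no" in the toy lists, carry two labels and are ignored when deciding whether an n-gram switches language. Whether that is the right treatment for ambiguous words is a design choice the outline leaves open.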

Tools

  • web scraping: requests and pattern
  • natural language processing: NLTK
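
For the "convert to plain text" step, a rough sketch follows. It deliberately uses only Python's standard-library `html.parser` rather than the scraping tools listed above, so it is a stand-in technique and not how the project does it. The `TextExtractor` class and `html_to_text` function are hypothetical names, and real magazine pages would likely need extra rules to drop navigation, ads, and comments.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Hora de decir bye-bye</h1><p>Es <b>hora</b> de irse.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(page))
# Hora de decir bye-bye Es hora de irse.
```

Joining chunks with a space keeps words from adjacent elements apart, at the cost of an occasional stray space before punctuation that directly follows an inline tag.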
