This example shows how to classify URLs as phishy or normal using the Phishing Website Dataset. Because each URL is assigned to one of two groups (phishy or normal), this is a binary classification problem.
The following features are available in the dataset:
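As a minimal sketch of what binary classification means here: a classifier commonly produces a score between 0 and 1, and a threshold turns that score into one of the two labels. The `toLabel` helper, the 0.5 threshold, and the convention that higher scores mean phishy are assumptions made for illustration; they are not taken from this example's code.

```typescript
type Label = "phishy" | "normal";

// Hypothetical helper: maps a score in [0, 1] to one of the two classes.
// Assumes higher scores mean "phishy" and uses 0.5 as the default cut-off.
function toLabel(score: number, threshold: number = 0.5): Label {
  return score >= threshold ? "phishy" : "normal";
}

// Made-up scores for three URLs.
const scores: number[] = [0.92, 0.5, 0.08];
console.log(scores.map((s) => toLabel(s))); // phishy, phishy, normal
```

Raising the threshold makes the classifier more conservative about flagging a URL as phishy, at the cost of missing more phishing URLs.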
HAVING_IP_ADDRESS: Whether an IP address is used as an alternative to a domain name. {-1, 1}
URL_LENGTH: Whether the URL length is legitimate, suspicious or phishing. {1, 0, -1}
SHORTINING_SERVICE: Whether the URL uses a URL shortening service or not. {1, -1}
HAVING_AT_SYMBOL: Whether the URL contains the "@" symbol. {-1, 1}
DOUBLE_SLASH_REDIRECTING: Whether the URL contains double slash redirecting or not. {-1, 1}
PREFIX_SUFFIX: Whether the URL contains a prefix or suffix separated by "-". {-1, 1}
HAVING_SUB_DOMAIN: Whether the count of subdomains in the URL is legitimate, suspicious or phishing. {-1, 0, 1}
SSLFINAL_STATE: Whether the URL uses https and the issuer is trusted, uses https but the issuer is not trusted, or does not use https. {-1, 0, 1}
DOMAIN_REGISTERATION_LENGTH: Whether the domain expires in less than a year or not. {-1, 1}
FAVICON: Whether the favicon is loaded from an external domain or not. {-1, 1}
PORT: Whether the port is of preferred status or not. {-1, 1}
HTTPS_TOKEN: Whether the https token is part of the domain or not. {-1, 1}
REQUEST_URL: Whether the percentage of requests made to external domains falls in the legitimate or suspicious category. {-1, 1}
URL_OF_ANCHOR: Whether the percentage of URLs in anchor tags that reference an external domain or the page itself falls in the legitimate, suspicious or phishy category. {-1, 0, 1}
LINKS_IN_TAGS: Whether the percentage of links in meta, script and link tags that reference an external domain falls in the legitimate, suspicious or phishy category. {-1, 0, 1}
SFH: Whether the server form handler is empty or contains "about: blank", refers to a different domain, or is normal. {1, 0, -1}
SUBMITTING_TO_EMAIL: Whether the form submits information to email. {-1, 1}
ABNORMAL_URL: Whether the URL contains the host name or not. {-1, 1}
REDIRECT: Whether the URL redirects at most once, between 2 and 4 times, or more than 4 times. {0, 1}
ON_MOUSEOVER: Whether onMouseOver changes the status bar or not. {-1, 1}
RIGHTCLICK: Whether right click is disabled or not. {-1, 1}
POPUPWIDNOW: Whether the pop-up window contains a text field or not. {-1, 1}
IFRAME: Whether the page contains an iframe tag or not. {-1, 1}
AGE_OF_DOMAIN: Whether the age of the domain is less than 6 months or not. {-1, 1}
DNSRECORD: Whether there is a DNS record for the domain or not. {-1, 1}
WEB_TRAFFIC: Whether the website ranking is less than 100,000, greater than 100,000, or the site is not recognized by Alexa and/or has no web traffic. {-1, 0, 1}
PAGE_RANK: Whether the page rank is less than 0.2 or not. {-1, 1}
GOOGLE_INDEX: Whether the web page is indexed by Google or not. {-1, 1}
LINKS_POINTING_TO_PAGE: Whether the number of links pointing to the page is equal to 0, between 0 and 2, or greater than 2. {-1, 0, 1}
STATISTICAL_REPORT: Whether the host belongs to top phishing IPs or domains or not. {-1, 1}
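To make the encoding concrete, the sketch below checks one sample against the allowed values given above for a handful of the features. The `ALLOWED_VALUES` map, the `validateSample` helper, and the sample data are hypothetical and introduced only for illustration; they are not part of this example's code, and only a subset of the features is covered.

```typescript
// Allowed values for a subset of the features, copied from the feature list.
// ALLOWED_VALUES and validateSample are hypothetical names for illustration.
const ALLOWED_VALUES: Record<string, number[]> = {
  HAVING_IP_ADDRESS: [-1, 1],
  URL_LENGTH: [1, 0, -1],
  SHORTINING_SERVICE: [1, -1],
  HAVING_SUB_DOMAIN: [-1, 0, 1],
  SSLFINAL_STATE: [-1, 0, 1],
};

// Returns the names of features that are missing from the sample
// or hold a value outside the allowed set.
function validateSample(sample: Record<string, number>): string[] {
  const invalid: string[] = [];
  for (const name of Object.keys(ALLOWED_VALUES)) {
    const allowed = ALLOWED_VALUES[name];
    const value = sample[name];
    if (allowed === undefined || value === undefined || allowed.indexOf(value) === -1) {
      invalid.push(name);
    }
  }
  return invalid;
}

// A made-up sample: URL_LENGTH holds 2, which is not an allowed value.
const sample: Record<string, number> = {
  HAVING_IP_ADDRESS: -1,
  URL_LENGTH: 2,
  SHORTINING_SERVICE: 1,
  HAVING_SUB_DOMAIN: 0,
  SSLFINAL_STATE: 1,
};

console.log(validateSample(sample)); // only URL_LENGTH is reported
```

A check like this can catch malformed rows before they reach a model, since every feature is expected to take one of a small set of integer values.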
Prepare the environment:
$ npm install
# Or
$ yarn
To build and watch the example, run:
$ yarn watch