Skip to content

Conversation

@AndreMikulec
Copy link

TheSystematicInvestor@gmail

[Pull request] SIT File: fundamental.data.r R function: fund.data

Michael Kapler,

The following are the

  1. reasons
  2. code

supporting

  1. fixes
  2. enhancement

File: fundamental.data.r
R function: fund.data

PROBLEM #1

The situation exists that

    AAOI only has one quarter of information on the web page

When I run

outp <- fund.data31("AAOI", 20)

The following code is (eventually) executed,

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]

if( ncol(all.data) > n ) {

Code
all.data[, colSums(nchar(trim(all.data))) > 0]

accidentally converts the single column matrix
( because only one column of information on the web page ) into a vector.

So next

    if( ncol(all.data) > n ) {

fails with error

     Downloading http://uk.advfn.com/p.php?pid=financials&symbol=AAOI&btn=quarterly_reports
     Error in if (ncol(all.data) > n) { : argument is of length zero

This is because, "vectors" do not have a "ncol" attribute

This is the fix

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }

PROBLEM #2 ( RELATED TO PROBLEM #1 )

The situation exists that

            I may only want one quarter of information

When I run

            outp <- fund.data31("THM", 1) 

When requesting only one quarter of information ( using n=1 )
again,

This Code

           all.data = all.data[, colSums(nchar(trim(all.data))) > 0]

accidentally converts the single column matrix into a vector

It is tested by

           if( ncol(all.data) > n ) {

Instead it is directly returned from the function by

    return(all.data)

as a vector ( but this should be a matrix )

This is the fix

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }

ENHANCEMENT

One may want to capture the page title for reasons of

  1. make visually matching symbols to companies easier
  2. compare and contrast errors between the page titles and the in-page company names

This is the enhancement

    if ( keepHTMLTITLEtext == TRUE ) {
        # extract title from this page
        HTMLOPENTITLETAGposStart    <- regexpr(pattern="<title>", txt,ignore.case=TRUE)[1]
        HTMLCLOSETITLETAGposStart   <- regexpr(pattern="</title>",txt,ignore.case=TRUE)[1]
        HTMLOPENTITLETAGlength      <- nchar("<title>")
        HTMLTITLEtext <- substr(txt, HTMLOPENTITLETAGposStart  + HTMLOPENTITLETAGlength , HTMLCLOSETITLETAGposStart - 1 )
            }
    ...

            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
} else {
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
}

}

SIMPLE METHOD TO TEST

BELOW fund.data41 is actually fund.data function of
https://github.com/AndreMikulec/SIT/blob/hotfix/hotfix-vector_null_HTMLTitle/R/fundamental.data.r
AndreMikulec@0a14fea

library(SIT)

fund.data41 <- function
(
Symbol, # ticker
n=10, # number of periods
mode=c('quarterly','annual'), # periodicity
max.attempts=5, # maximum number of attempts to download before exiting
keepHTMLTITLEtext = FALSE # last row includes HTML TITLE text
)
{
all.data = c()
option.value = -1

start_date = spl('istart_date,start_date')
    names(start_date) = spl('quarterly,annual')

repeat {
    # download Quarterly Financial Report data
    if(option.value >= 0) {
        url = paste('http://uk.advfn.com/p.php?pid=financials&symbol=', Symbol, '&btn=', mode[1], '_reports&', start_date[mode[1]], '=', option.value, sep = '')    
    } else {
        url = paste('http://uk.advfn.com/p.php?pid=financials&symbol=', Symbol, '&btn=', mode[1], '_reports', sep = '')
    }

    cat('Downloading', url, '\n')

    #txt = join(readLines(url))     
    for(iattempt in 1:max.attempts) { 
        flag = T
        tryCatch({
            txt = join(readLines(url))
        }, interrupt = function(ex) {
            flag <<-  F
            Sys.sleep(0.1)
        }, error = function(ex) {
            flag <<-  F
            Sys.sleep(0.1)
        }, finally = {
            if(flag) break
        })
    }

if( len(grep('INDICATORS', txt, ignore.case = T)) == 0 ) {
cat('No Data Found for', Symbol, '\n')
return(all.data)
}

    if ( keepHTMLTITLEtext == TRUE ) {
        # extract title from this page
        HTMLOPENTITLETAGposStart    <- regexpr(pattern="<title>", txt,ignore.case=TRUE)[1]
        HTMLCLOSETITLETAGposStart   <- regexpr(pattern="</title>",txt,ignore.case=TRUE)[1]
        HTMLOPENTITLETAGlength      <- nchar("<title>")
        HTMLTITLEtext <- substr(txt, HTMLOPENTITLETAGposStart  + HTMLOPENTITLETAGlength , HTMLCLOSETITLETAGposStart - 1 )
            }

    # extract table from this page
    data = extract.table.from.webpage(txt, 'INDICATORS', hasHeader = T)
        colnames(data) = data[1,]
        rownames(data) = data[,1]
        data = data[,-1,drop=F]

    # only add not already present data
    add.index = which( is.na(match( colnames(data), colnames(all.data) )) )         
    all.data = cbind(data[,add.index,drop=F], all.data)

    # check if it is time to stop
    if(ncol(all.data) >= n) break
    if(option.value == 0)  break

    # extract option value to go to the next page
    temp = gsub(pattern = '<option', replacement = '<tr>', txt, perl = TRUE)
    temp = gsub(pattern = '</option>', replacement = '</tr>', temp, perl = TRUE)    
    temp = extract.table.from.webpage(temp, 'All amounts', hasHeader = T)

    temp = apply(temp,1,join)
    index.selected = grep('selected', temp)
    option.value = 0
    if( len(index.selected) )
        option.value = as.double( gsub('.*value=\'([0-9]*).*', '\\1', temp[index.selected]) ) 

    if(option.value > 0) {
        # can only get 5 time periods at a time
        option.value = option.value - 5
        option.value = max(0, option.value)     
    } else {
        break
    }
}

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
} else {
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
}

}


library(SIT)

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

outp <- fund.data41("THM", 10 )

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
2002/09 2002/12 2003/03 2003/06 2003/09 2003/12 2004/03 2004/06 2004/09
"" "" "" "" "" "" "" "" ""

RED: A PROBLEM ( FIXED )

outp <- fund.data41("AAOI", 10)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "% of leverage-to-industry"
[1] "0.0"

NEVER A PROBLEM

outp <- fund.data41("THM", 2)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
2004/06 2004/09
"" ""

A PROBLEM ( FIXED )

outp <- fund.data41("THM", 1)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
[1] ""


library(SIT)

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

outp <- fund.data41("THM", 10, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
2002/09
"Thomas Equipment Company Financial Information"
2002/12
"Thomas Equipment Company Financial Information"
2003/03
"Thomas Equipment Company Financial Information"
2003/06
"Thomas Equipment Company Financial Information"
2003/09
"Thomas Equipment Company Financial Information"
2003/12
"Thomas Equipment Company Financial Information"
2004/03
"Thomas Equipment Company Financial Information"
2004/06
"Thomas Equipment Company Financial Information"
2004/09
"Thomas Equipment Company Financial Information"

RED: A PROBLEM ( FIXED )

outp <- fund.data41("AAOI", 10, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
[1] "APPLIED OPTOELECTRONICS, INC. Company Financial Information"

NEVER A PROBLEM

outp <- fund.data41("THM", 2, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
2004/06
"Thomas Equipment Company Financial Information"
2004/09
"Thomas Equipment Company Financial Information"

A PROBLEM ( FIXED )

outp <- fund.data41("THM", 1, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
[1] "Thomas Equipment Company Financial Information"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant