Renaming a computed factor variable in Spark CC fails #57

@hongooi73

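## assumes the compute context has already been set to RxSpark
## create a small data frame with a factor column and copy it to HDFS
test <- data.frame(a=1:10, b=factor(letters[1:10]))
test <- dplyrXdf::copy_to_hdfs(test)

## destination composite Xdf in HDFS
test2 <- RxXdfData("test2", fileSystem=RxHdfsFileSystem(), createCompositeSet=TRUE)

## derive a new factor variable bNew as a copy of b
rxDataStep(test, test2, transformFunc=function(varlst) {
    varlst$bNew <- varlst$b
    varlst
})

## rename the computed factor bNew to z
names(test2) <- c("a", "b", "z")
names(test2)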

This results in the following error:

Job failed, the last 20 lines of the log are shown below:
    Error in try({ :  
      The factor variable 'bNew' exists in the local header but not the master header. 
    Traceback (most recent call last): 
      File "/usr/lib64/microsoft-r/3.3/lib64/R/library/RevoScaleR/pythonScripts/common//logScaleR.py", line 1, in <module> 
        from hdinsight_common import hdinsightlogging 
      File "/usr/local/lib/python2.7/dist-packages/hdinsight_common/hdinsightlogging.py", line 8, in <module> 
        import utilities 
      File "/usr/local/lib/python2.7/dist-packages/hdinsight_common/utilities.py", line 10, in <module> 
        import Constants 
      File "/usr/local/lib/python2.7/dist-packages/hdinsight_common/Constants.py", line 3, in <module> 
        from hdinsight_role_env import RoleEnv 
      File "/usr/lib/python2.7/dist-packages/hdinsight_role_env/RoleEnv.py", line 3, in <module> 
        from watchdog.observers import Observer   
    ImportError: No module named watchdog.observers 
    Error:  Error in try({ :  
      The factor variable 'bNew' exists in the local header but not the master header. 
      
    ======  ed1-hdi21 (Master HPA Process) has completed run at Tue Aug  8 20:37:55 2017  ====== 
    >  
    >  
For the complete log, please refer to the log file C:\Users\hongooi\AppData\Local\Temp\MRSArchive\MRSLog-c03c45e33775.log 

Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup) : 
  Error completing job on cluster:
Error in try({ : 
  The factor variable 'bNew' exists in the local header but not the master header.

Interestingly, creating a variable aNew from a, and then renaming aNew, works. Renaming b in the original Xdf file, rather than the derived one, also works.
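
For comparison, the two variants described above that do work can be sketched as follows. This is only an illustration, not a tested repro: it assumes the RxSpark compute context is already set and that `test` is the HDFS Xdf created earlier. The function names and the `test3` file name are hypothetical, introduced here to keep the two cases separate.

```r
## Variant 1: copy the numeric column a to aNew, then rename aNew.
## `test3` is a hypothetical output name, not part of the original report.
rename_numeric_copy <- function(test) {
    test3 <- RxXdfData("test3", fileSystem=RxHdfsFileSystem(), createCompositeSet=TRUE)

    rxDataStep(test, test3, transformFunc=function(varlst) {
        varlst$aNew <- varlst$a
        varlst
    })

    ## rename the computed numeric variable aNew to z
    names(test3) <- c("a", "b", "z")
    names(test3)
}

## Variant 2: rename the factor b directly in the original Xdf,
## without deriving a new file first.
rename_original_factor <- function(test) {
    names(test) <- c("a", "z")
    names(test)
}
```

Calling `rename_numeric_copy(test)` or `rename_original_factor(test)` in the Spark compute context should correspond to the two working cases, which suggests the failure is specific to factor variables created inside the `rxDataStep` transform.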
