model$fit trouble, data formatting mismatch or other possible issues #1503

@balapoura1

Hi,
I am an educator, and in my class we are teaching TensorFlow and Keras in R. I have a large dataset; for simplicity, I am attaching a small version.

I have been struggling to get things working. I was able to diagnose and resolve most issues step by step, but I cannot figure out the last part, where fit() is used to produce the final results. I keep getting format mismatch errors, and I have tried every diagnosis and solution suggested by GPT, to no avail.

Any help and suggestions are appreciated.

Taxi_Trip_Data_preprocessed - test.csv

I will copy my full code here and the error I receive:

install.packages("remotes") 
remotes::install_github("rstudio/reticulate", force = TRUE) 
remotes::install_github(sprintf("rstudio/%s", c("tensorflow", "keras"))) 
reticulate::miniconda_uninstall() # start with a blank slate
reticulate::install_miniconda() 
#keras::install_keras() 
keras::install_keras(method = "conda", conda = "auto") 
library(keras) 
tensorflow::install_tensorflow(conda = "auto", envname = "r-reticulate", version = "release") 
reticulate::use_condaenv(condaenv = "r-reticulate", conda = "auto", required = TRUE)

#model <- keras_model_sequential()




BigData <- read.csv('for privacy hidden the link on my personal computer!', header=TRUE)

BigData <- BigData [1:100000,]
# Cast dataframe as a matrix

# BigData$fare_category <- cut(
#  BigData$fare_amount,
#  breaks = c(0, 10, 30, Inf),
#  labels = c(1, 2, 3),
#  include.lowest = TRUE)

BigData$fare_category <- ifelse(BigData$fare_amount >= 1 & BigData$fare_amount <= 10, 1,
                           ifelse(BigData$fare_amount > 10 & BigData$fare_amount <= 30, 2, 3))

colnames(BigData) <- c("passenger_count", "trip_distance",  
                       "duration", "fare_amount", "fare_cat")
str(BigData)

BigData$passenger_count <- as.numeric(BigData$passenger_count)
#BigData$fare_category <- as.numeric(as.character(BigData$fare_category))


#BigData[, 1:4] <- lapply(BigData[, 1:4], as.numeric)
#BigData[, 5] <- lapply(BigData[, 5], as.factor)


#BigData <- as.matrix(BigData)

# Remove column names
#dimnames(BigData) = NULL


# Split for train and test data
set.seed(456)
indx <- sample(2,
               size=100000,
               replace = TRUE,
               prob = c(0.9, 0.1)) # Makes index with values 1 and 2


# Select only the feature variables
# Take rows with index = 1
x_train <- BigData[indx == 1, 1:3]
x_test <- BigData[indx == 2, 1:3]

y_test_actual <- BigData[indx == 2, 5]

y_train <- BigData[indx == 1, 5]
y_test <- BigData[indx == 2, 5]

library(keras) 
model <- keras_model_sequential() %>%
  layer_dense(name = "Layer1",  # Unique name for the first hidden layer
              units = 3,
              activation = "relu",
              input_shape = c(3)) %>%  # Input shape with 3 features
  layer_dense(name = "OutputL",  # Unique name for the output layer
              units = 3,
              activation = "softmax")  # Softmax over the 3 fare categories

summary(model)

model$compile(
  loss = "categorical_crossentropy",
  optimizer = "adam",
  metrics = list("accuracy")
)

x_train <- as.matrix(x_train)
y_train <- as.factor(y_train)


str(x_train)
str(y_train)

#x_train[, 1:3] <- lapply(x_train[, 1:3], as.numeric)
#y_test <- keras::to_categorical(y_test, 3)
#y_train <- to_categorical(as.numeric(y_train) - 1)
str(BigData)



#x_train <- as.matrix(x_train)
#x_train <- apply(x_train, 2, as.numeric)
#y_train <- keras::to_categorical(as.numeric(y_train) - 1, num_classes = 3)

# Convert features to numeric
#x_train <- apply(x_train, 2, as.numeric)

# Convert labels to integers
y_train <- as.numeric(y_train)
y_test <- as.numeric(y_test)


#y_train <- keras::to_categorical(y_train - 1, num_classes = 3)
#y_test <- keras::to_categorical(y_test - 1, num_classes = 3)


library(tensorflow)
x_train <- tf$convert_to_tensor(x_train, dtype = tf$float32)
y_train <- tf$convert_to_tensor(y_train, dtype = tf$int64)  # Or tf$int64 if needed

history <- model$fit(
  x = x_train,
  y = y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.1,
  verbose = 1)

summary(x_train)




The problem is at model$fit and the error is:
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
<truncated> 2.0 ..... .1.0 ] to EagerTensor of dtype int64

── R Traceback ────────────────────────────────────────────────────
    ▆
 1. └─tf$convert_to_tensor(y_train, dtype = tf$int64)
 2.   └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
See `reticulate::py_last_error()$r_trace$full_call` for more details.
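
One possible reading of this traceback, offered as a guess rather than a confirmed diagnosis: `as.numeric()` on a factor returns R doubles (1, 2, 3), and a double vector may not convert cleanly to an `int64` tensor. The base R sketch below only inspects the types involved; `fare_cat` is a small made-up vector standing in for the real column.

```r
# Hypothetical stand-in for the fare_cat column (values 1, 2, 3).
fare_cat <- c(1, 2, 3, 2, 1)

y_factor <- as.factor(fare_cat)
y_double <- as.numeric(y_factor)        # factor codes, stored as doubles
y_int0   <- as.integer(y_factor) - 1L   # integer codes shifted to 0, 1, 2

typeof(y_double)  # "double"
typeof(y_int0)    # "integer"
range(y_int0)     # 0 2
```

If that reading is right, zero-based integer labels like `y_int0` are what Keras's `"sparse_categorical_crossentropy"` loss expects for a 3-unit softmax output; whether that resolves this particular error is untested here.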

I have tried converting the data to 64-bit types, and I still receive an error at model$fit.
I have tried creating a categorical variable from fare_amount with 3 categories (low, med, high), with no progress.
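
For reference, here is a minimal sketch of the two ways of building the three categories that appear in the code above, run on a made-up `fare_amount` vector (not the real data). It suggests the nested `ifelse()` sends fares below 1 to category 3, while `cut()` with `include.lowest = TRUE` keeps them in the lowest bin.

```r
# Hypothetical fare amounts; the real column is BigData$fare_amount.
fare_amount <- c(0.5, 5, 10, 10.01, 30, 45)

# Nested ifelse as in the code above: anything below 1 falls through to 3.
cat_ifelse <- ifelse(fare_amount >= 1 & fare_amount <= 10, 1,
                     ifelse(fare_amount > 10 & fare_amount <= 30, 2, 3))

# cut() with intervals [0,10], (10,30], (30,Inf].
cat_cut <- cut(fare_amount,
               breaks = c(0, 10, 30, Inf),
               labels = c("low", "med", "high"),
               include.lowest = TRUE)

cat_ifelse            # 3 1 1 2 2 3
as.integer(cat_cut)   # 1 1 1 2 2 3
```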

I set x to a matrix and y to a vector, still with no progress.
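
As a sanity check on that step, the sketch below builds a tiny made-up data frame with the same column names and confirms the shapes and storage types of the resulting matrix and vector. The data frame `df` and its values are hypothetical.

```r
# Hypothetical miniature of the three feature columns and the label column.
df <- data.frame(passenger_count = c(1L, 2L, 1L, 4L),
                 trip_distance   = c(1.2, 3.4, 0.8, 7.5),
                 duration        = c(300, 900, 240, 1800),
                 fare_cat        = c(1, 2, 1, 3))

x <- as.matrix(df[, 1:3])   # numeric matrix, one row per trip
dimnames(x) <- NULL         # drop row and column names
y <- df$fare_cat            # plain numeric vector of labels

storage.mode(x)             # "double"
dim(x)                      # 4 3
stopifnot(is.numeric(x), nrow(x) == length(y))
```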

I have tried forcing X to numeric and Y to a categorical type, still with no progress.
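
Since the model is compiled with `"categorical_crossentropy"`, which expects one-hot targets with one column per class, here is a base R sketch of what that target shape looks like. It is an illustration on made-up labels, and it plays the same role as the commented-out `keras::to_categorical(y_train - 1, num_classes = 3)` line in the code above.

```r
# Hypothetical labels coded 1, 2, 3 as in fare_cat.
y <- c(1, 3, 2, 1)
num_classes <- 3

# Row i of the identity matrix is the one-hot vector for class i.
y_onehot <- diag(num_classes)[y, , drop = FALSE]

dim(y_onehot)       # 4 3
rowSums(y_onehot)   # 1 1 1 1
```

A matrix like `y_onehot` has the same number of columns as the output layer has units, which is the shape relationship the loss relies on.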

I am lost. I have tried using both GPT and Pilot to understand my data structure, review my code, and provide suggestions. I tried everything they suggested, still without success.

Unfortunately, I cannot switch to Python: my students have learned R for the whole semester. ANN is part of the coursework and the last topic in the class, so I have to make it work, help them learn, and make sure their code runs smoothly as well.

Any help will be appreciated.
