model$fit trouble, data formatting mismatch or other possible issues #1503

Hi,
I am an educator, and in my class I am teaching TensorFlow and Keras in R. I have a large dataset; for simplicity, I am attaching a small version.

I have been struggling to get this working. I was able to diagnose and resolve most of the issues step by step, but I cannot figure out the last part, where fit() is used to produce the final results. I keep getting format mismatch errors, and I have tried every diagnosis and solution suggested by GPT, to no avail.

Any help and suggestions are appreciated.

[Taxi_Trip_Data_preprocessed - test.csv](https://github.com/user-attachments/files/19621842/Taxi_Trip_Data_preprocessed.-.test.csv)

I will copy my full code here and the error I receive:
```r
install.packages("remotes") 
remotes::install_github("rstudio/reticulate", force = TRUE) 
remotes::install_github(sprintf("rstudio/%s", c("tensorflow", "keras"))) 
reticulate::miniconda_uninstall() # start with a blank slate
reticulate::install_miniconda() 
#keras::install_keras() 
keras::install_keras(method = "conda", conda = "auto") 
library(keras) 
tensorflow::install_tensorflow(conda = "auto", envname = "r-reticulate", version = "release") 
reticulate::use_condaenv(condaenv = "r-reticulate", conda = "auto", required = TRUE)

#model <- keras_model_sequential()




BigData <- read.csv('for privacy hidden the link on my personal computer!', header=TRUE)

BigData <- head(BigData, 100000)  # Keep at most the first 100,000 rows (no NA padding if the file is smaller)
# Cast dataframe as a matrix

# BigData$fare_category <- cut(
#  BigData$fare_amount,
#  breaks = c(0, 10, 30, Inf),
#  labels = c(1, 2, 3),
#  include.lowest = TRUE)

BigData$fare_category <- ifelse(BigData$fare_amount >= 1 & BigData$fare_amount <= 10, 1,
                           ifelse(BigData$fare_amount > 10 & BigData$fare_amount <= 30, 2, 3))

colnames(BigData) <- c("passenger_count", "trip_distance",  
                       "duration", "fare_amount", "fare_cat")
str(BigData)

BigData$passenger_count <- as.numeric(BigData$passenger_count)
#BigData$fare_category <- as.numeric(as.character(BigData$fare_category))


#BigData[, 1:4] <- lapply(BigData[, 1:4], as.numeric)
#BigData[, 5] <- lapply(BigData[, 5], as.factor)


#BigData <- as.matrix(BigData)

# Remove column names
#dimnames(BigData) = NULL


# Split for train and test data
set.seed(456)
indx <- sample(2,
               size = nrow(BigData),  # One index value per row
               replace = TRUE,
               prob = c(0.9, 0.1)) # Makes index with values 1 and 2


# Select only the feature variables
# Take rows with index = 1
x_train <- BigData[indx == 1, 1:3]
x_test <- BigData[indx == 2, 1:3]

y_test_actual <- BigData[indx == 2, 5]

y_train <- BigData[indx == 1, 5]
y_test <- BigData[indx == 2, 5]

library(keras) 
model <- keras_model_sequential() %>%
  layer_dense(name = "Layer1",  # Unique name for the first hidden layer
              units = 3,
              activation = "relu",
              input_shape = c(3)) %>%  # Input shape with 3 features
  layer_dense(name = "OutputL",  # Unique name for the output layer
              units = 3,
              activation = "softmax")  # Softmax over the 3 fare categories

summary(model)

model$compile(
  loss = "categorical_crossentropy",
  optimizer = "adam",
  metrics = list("accuracy")
)

x_train <- as.matrix(x_train)
y_train <- as.factor(y_train)


str(x_train)
str(y_train)

#x_train[, 1:3] <- lapply(x_train[, 1:3], as.numeric)
#y_test <- keras::to_categorical(y_test, 3)
#y_train <- to_categorical(as.numeric(y_train) - 1)
str(BigData)



#x_train <- as.matrix(x_train)
#x_train <- apply(x_train, 2, as.numeric)
#y_train <- keras::to_categorical(as.numeric(y_train) - 1, num_classes = 3)

# Convert features to numeric
#x_train <- apply(x_train, 2, as.numeric)

# Convert labels to integers
y_train <- as.numeric(y_train)
y_test <- as.numeric(y_test)


#y_train <- keras::to_categorical(y_train - 1, num_classes = 3)
#y_test <- keras::to_categorical(y_test - 1, num_classes = 3)


library(tensorflow)
x_train <- tf$convert_to_tensor(x_train, dtype = tf$float32)
y_train <- tf$convert_to_tensor(y_train, dtype = tf$int64)  # Or tf$int64 if needed

history <- model$fit(
  x = x_train,
  y = y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.1,
  verbose = 1)

summary(x_train)
```

The problem is at model$fit and the error is:

```
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
<truncated> 2.0 ..... .1.0 ] to EagerTensor of dtype int64

── R Traceback ────────────────────────────────────────────────────
    ▆
 1. └─tf$convert_to_tensor(y_train, dtype = tf$int64)
 2.   └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
See `reticulate::py_last_error()$r_trace$full_call` for more details.
```

I have tried converting the data to 64-bit types, and I still receive an error at model$fit.
I have also tried turning fare_amount into a categorical variable with 3 categories (low, med, high), with no progress.
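For comparison, here is a minimal base-R sketch of the two binning approaches that appear in the code above: the nested `ifelse()` and the commented-out `cut()` call. The `fare_amount` values are hypothetical toy data, not rows from the attached CSV. The sketch only shows that the two approaches can disagree for fares below 1, which the `ifelse()` version sends to category 3.

```r
# Hypothetical toy fares (not taken from the attached dataset).
fare_amount <- c(0.5, 4, 10, 10.5, 30, 75)

# Binning with cut(): [0, 10] -> low, (10, 30] -> med, (30, Inf] -> high.
fare_cat <- cut(
  fare_amount,
  breaks = c(0, 10, 30, Inf),
  labels = c("low", "med", "high"),
  include.lowest = TRUE
)

# Binning with the nested ifelse() from the post.
via_ifelse <- ifelse(fare_amount >= 1 & fare_amount <= 10, 1,
                     ifelse(fare_amount > 10 & fare_amount <= 30, 2, 3))

# Side-by-side view: the first fare (0.5) is "low" under cut()
# but falls through to category 3 under ifelse().
data.frame(fare_amount, cut_code = as.integer(fare_cat), ifelse_code = via_ifelse)
```

Which behaviour is wanted depends on how fares below 1 should be treated in the real data.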

I set x to a matrix and y to a vector, and still no progress.
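A minimal base-R sketch of the kind of check that can be run on the feature matrix before it is handed to Keras. The `features` data frame is a hypothetical stand-in for the three feature columns. One thing it illustrates: indexing a data frame with row numbers beyond `nrow()` (as `BigData[1:100000, ]` would do on a smaller file) pads the result with `NA` rows, which is worth ruling out.

```r
# Hypothetical stand-in for the three feature columns of BigData.
features <- data.frame(
  passenger_count = c(1L, 2L, 1L, 4L),
  trip_distance   = c(0.8, 3.1, 12.4, 2.2),
  duration        = c(5, 14, 38, 9)
)

# Indexing past the last row silently adds rows full of NA.
padded <- features[1:6, ]
has_na_rows <- any(is.na(padded))

# Convert to a plain numeric matrix with no dimnames.
x <- data.matrix(features)
storage.mode(x) <- "double"
dimnames(x) <- NULL

# Sanity checks before fitting: numeric matrix, no missing values, 3 features.
stopifnot(is.matrix(x), is.double(x), !any(is.na(x)), ncol(x) == 3)
```

If the last check fails on the real data, the missing values would need to be handled before any call to `fit()`.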

I have tried forcing X to numeric and Y to a categorical type, and still no progress.
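A minimal base-R sketch of the two label encodings that Keras classification losses generally expect. It assumes the labels are the values 1, 2, 3 stored as doubles, which is what `as.numeric()` produces. The `fare_cat` vector is hypothetical toy data. As far as I understand, `"categorical_crossentropy"` expects one-hot targets, while `"sparse_categorical_crossentropy"` expects zero-based integer class indices. The traceback suggests that double values such as `2.0` are refused when an `int64` tensor is requested, so an explicit `as.integer()` is shown here.

```r
# Hypothetical stand-in for BigData$fare_cat: class labels stored as doubles.
fare_cat <- c(1, 3, 2, 2, 1, 3)

# Zero-based integer labels (0, 1, 2), the form a sparse categorical loss expects.
y_int <- as.integer(factor(fare_cat, levels = c(1, 2, 3))) - 1L

# One-hot matrix (one row per sample, one column per class),
# the form "categorical_crossentropy" expects.
n_classes <- 3L
y_onehot <- matrix(0, nrow = length(y_int), ncol = n_classes)
y_onehot[cbind(seq_along(y_int), y_int + 1L)] <- 1

storage.mode(y_int)  # "integer"
dim(y_onehot)        # 6 3
```

Only one of the two forms should be passed to `fit()`, matched to the loss chosen in `compile()`.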

I am lost. I have used both GPT and Pilot to understand my data structure, review my code, and provide suggestions. I tried everything they suggested, still with no success.


Unfortunately, I cannot switch to Python: my students have learned R for the whole semester. ANN is part of the coursework and the last topic in the class, so I have to make this work, both to help them learn and so that they can run their code smoothly.

Any help will be appreciated.
