Hi,
I am an educator, and in my class we are teaching TensorFlow and Keras in R. I have a large dataset; for simplicity, I am attaching a small version.
I have been struggling to get things working. I was able to diagnose and resolve most issues step by step, but I cannot figure out the last part, where fit() is used to produce the final results. I keep receiving format mismatch errors, and I have tried every diagnosis and solution suggested by GPT, to no avail.
Any help and suggestions are appreciated.
Taxi_Trip_Data_preprocessed - test.csv
I will copy my full code here and the error I receive:
install.packages("remotes")
remotes::install_github("rstudio/reticulate", force = TRUE)
remotes::install_github(sprintf("rstudio/%s", c("tensorflow", "keras")))
reticulate::miniconda_uninstall() # start with a blank slate
reticulate::install_miniconda()
#keras::install_keras()
keras::install_keras(method = "conda", conda = "auto")
library(keras)
tensorflow::install_tensorflow(conda = "auto", envname = "r-reticulate", version = "release")
reticulate::use_condaenv(condaenv = "r-reticulate", conda = "auto", required = TRUE)
#model <- keras_model_sequential()
BigData <- read.csv('for privacy hidden the link on my personal computer!', header=TRUE)
BigData <- BigData [1:100000,]
# Cast dataframe as a matrix
# BigData$fare_category <- cut(
# BigData$fare_amount,
# breaks = c(0, 10, 30, Inf),
# labels = c(1, 2, 3),
# include.lowest = TRUE)
BigData$fare_category <- ifelse(BigData$fare_amount >= 1 & BigData$fare_amount <= 10, 1,
ifelse(BigData$fare_amount > 10 & BigData$fare_amount <= 30, 2, 3))
colnames(BigData) <- c("passenger_count", "trip_distance",
"duration", "fare_amount", "fare_cat")
str(BigData)
BigData$passenger_count <- as.numeric(BigData$passenger_count)
#BigData$fare_category <- as.numeric(as.character(BigData$fare_category))
#BigData[, 1:4] <- lapply(BigData[, 1:4], as.numeric)
#BigData[, 5] <- lapply(BigData[, 5], as.factor)
#BigData <- as.matrix(BigData)
# Remove column names
#dimnames(BigData) = NULL
# Split for train and test data
set.seed(456)
indx <- sample(2,
size=100000,
replace = TRUE,
prob = c(0.9, 0.1)) # Makes index with values 1 and 2
# Select only the feature variables
# Take rows with index = 1
x_train <- BigData[indx == 1, 1:3]
x_test <- BigData[indx == 2, 1:3]
y_test_actual <- BigData[indx == 2, 5]
y_train <- BigData[indx == 1, 5]
y_test <- BigData[indx == 2, 5]
library(keras)
model <- keras_model_sequential() %>%
layer_dense(name = "Layer1", # Unique name for the first hidden layer
units = 3,
activation = "relu",
input_shape = c(3)) %>% # Input shape with 3 features
layer_dense(name = "OutputL", # Unique name for the output layer
units = 3,
activation = "softmax") # Softmax over the 3 fare categories
summary(model)
model$compile(
loss = "categorical_crossentropy",
optimizer = "adam",
metrics = list("accuracy")
)
x_train <- as.matrix(x_train)
y_train <- as.factor(y_train)
str(x_train)
str(y_train)
#x_train[, 1:3] <- lapply(x_train[, 1:3], as.numeric)
#y_test <- keras::to_categorical(y_test, 3)
#y_train <- to_categorical(as.numeric(y_train) - 1)
str(BigData)
#x_train <- as.matrix(x_train)
#x_train <- apply(x_train, 2, as.numeric)
#y_train <- keras::to_categorical(as.numeric(y_train) - 1, num_classes = 3)
# Convert features to numeric
#x_train <- apply(x_train, 2, as.numeric)
# Convert labels to integers
y_train <- as.numeric(y_train)
y_test <- as.numeric(y_test)
#y_train <- keras::to_categorical(y_train - 1, num_classes = 3)
#y_test <- keras::to_categorical(y_test - 1, num_classes = 3)
library(tensorflow)
x_train <- tf$convert_to_tensor(x_train, dtype = tf$float32)
y_train <- tf$convert_to_tensor(y_train, dtype = tf$int64) # Or tf$int64 if needed
history <- model$fit(
x = x_train,
y = y_train,
epochs = 10,
batch_size = 32,
validation_split = 0.1,
verbose = 1)
summary(x_train)
The problem is at model$fit and the error is:
Error in py_call_impl(callable, call_args$unnamed, call_args$named) :
<truncated> 2.0 ..... .1.0 ] to EagerTensor of dtype int64
── R Traceback ────────────────────────────────────────────────────
▆
1. └─tf$convert_to_tensor(y_train, dtype = tf$int64)
2. └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
See `reticulate::py_last_error()$r_trace$full_call` for more details.
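One possible reading of this error, offered as a sketch rather than a confirmed diagnosis: `as.numeric()` on a factor returns a double vector, so the labels reach TensorFlow as floating-point values (the `2.0` and `1.0` in the message), and the conversion to `int64` is refused. The base-R snippet below uses a made-up label vector, `y_demo` (an assumption standing in for the real `fare_cat` column), to show the storage types involved and one way to obtain zero-based integer labels.

```r
# Hypothetical stand-in for the fare_cat column (values 1, 2, 3).
y_demo <- c(2, 1, 3, 1, 2)

# What the posted code does: factor -> as.numeric() yields doubles.
y_double <- as.numeric(as.factor(y_demo))
print(typeof(y_double))

# Alternative: keep integer storage and shift the codes to start at 0.
y_int <- as.integer(as.factor(y_demo)) - 1L
print(typeof(y_int))
print(y_int)
```

If this reading is right, zero-based integer labels like `y_int` would pair with the `"sparse_categorical_crossentropy"` loss, whereas `"categorical_crossentropy"` (used in the code above) expects a one-hot matrix with one column per class.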
I have tried converting the data to 64-bit types, and I still receive an error at model$fit.
I have tried binning fare_amount into three categories (low, medium, high), with no progress.
I set x to a matrix and y to a vector, with no progress.
I have tried forcing X to numeric and Y to a categorical variable, with no progress.
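The attempts above can be illustrated with a small base-R sketch. It uses a made-up fare vector, `fare_demo` (an assumption, not the real data), to compare the commented-out `cut()` binning with the nested `ifelse()` from the code, and then builds a one-hot label matrix by hand. Note that the two binning approaches disagree for fares below 1, which the `ifelse()` version sends to category 3.

```r
# Hypothetical fares chosen to include bin edges and a value below 1.
fare_demo <- c(0.5, 5, 10, 10.5, 30, 45)

# cut() with labels = FALSE returns integer bin codes 1, 2, 3.
cat_cut <- cut(fare_demo, breaks = c(0, 10, 30, Inf),
               labels = FALSE, include.lowest = TRUE)

# Nested ifelse() as in the posted code: anything outside [1, 30] becomes 3.
cat_ifelse <- ifelse(fare_demo >= 1 & fare_demo <= 10, 1,
              ifelse(fare_demo > 10 & fare_demo <= 30, 2, 3))

print(cat_cut)
print(cat_ifelse)

# One-hot encode the integer codes: row i of diag(3) is the indicator for class i.
one_hot <- diag(3)[cat_cut, , drop = FALSE]
print(dim(one_hot))
```

A matrix shaped like `one_hot` (one row per sample, one column per class) is the label format that `"categorical_crossentropy"` expects; `keras::to_categorical()` on zero-based integer codes should produce the same layout.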
I am lost. I have tried using both GPT and Pilot to understand my data structure, review my code, and provide suggestions. I tried everything they suggested, still without success.
Unfortunately, I cannot switch to Python: my students have learned R for the whole semester, and ANN is part of the coursework and the last topic in the class. I have to make this work so that I can help them learn and run their code smoothly as well.
Any help will be appreciated.