SynapseML version
1.0.4
System information
Describe the problem
Hello, folks!
I'm having trouble training a LightGBM model. The model fits up to a certain point and then training stalls. No error is reported; the job simply stops making progress and stays at the same point indefinitely, as you can see below. To improve fit performance, I've been using options such as numTasks, numThreads, numBatches, useSingleDatasetMode and useBarrierExecutionMode.
My dataset has about 418 million rows for training and 18 million for validation. It has about 21 features: 10 categorical and the rest continuous.
Databricks Cluster Configuration:
--- Single Node
--- 256 GB RAM | 32 Cores
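For a sense of scale, here is a rough back-of-the-envelope sketch of the numbers above. It assumes every feature value is held as an 8-byte double and that numBatches splits the training rows evenly; both are assumptions for illustration, not something measured on the cluster, and the real footprint during training will differ.

```python
# Figures taken from the post above.
TRAIN_ROWS = 418_000_000
VALID_ROWS = 18_000_000
NUM_FEATURES = 21
NUM_BATCHES = 500
NODE_RAM_GB = 256

# Assumption: every feature value is stored as an 8-byte double.
BYTES_PER_VALUE = 8


def raw_size_bytes(rows: int, features: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Size of a dense rows x features matrix, ignoring all overhead."""
    return rows * features * bytes_per_value


# Assumption: batches are of equal size.
rows_per_batch = TRAIN_ROWS // NUM_BATCHES
train_bytes = raw_size_bytes(TRAIN_ROWS, NUM_FEATURES)
valid_bytes = raw_size_bytes(VALID_ROWS, NUM_FEATURES)
total_gb = (train_bytes + valid_bytes) / 1e9  # decimal gigabytes

print(f"rows per batch: {rows_per_batch:,}")
print(f"raw train + validation size: {total_gb:.1f} GB of {NODE_RAM_GB} GB RAM")
```

This only bounds the raw feature matrix; Spark's own copies of the data and LightGBM's native buffers come on top of it.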
Do you have any idea why I'm running into this issue?
Code to reproduce issue
from synapse.ml.lightgbm import LightGBMRegressor

# Tuned hyperparameters for the regressor
dic_params_reg_model_0 = {
    'learningRate': 0.10686341357711826,
    'featureFraction': 0.9064118023259887,
    'maxBin': 5,
    'minDataInLeaf': 6,
    'numIterations': 53,
    'numLeaves': 147,
    'lambdaL2': 45.405492626469716,
    'lambdaL1': 0.0015480184927416942,
}

# train_0 is the training DataFrame; validation rows are flagged in 'validation_col'
model_cluster_0 = LightGBMRegressor(
    metric='mae', earlyStoppingRound=1, labelCol='target',
    dataTransferMode='streaming', numTasks=32, numThreads=32,
    validationIndicatorCol='validation_col', numBatches=500,
    useSingleDatasetMode=True, useBarrierExecutionMode=True,
).setParams(**dic_params_reg_model_0).fit(train_0)
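In case it helps with narrowing this down, one way to find which option triggers the stall would be to refit on a small sample with the execution flags toggled one combination at a time. The sketch below only enumerates those combinations; `flag_grid` is a hypothetical helper (not part of SynapseML), and treating `numBatches=0` as "no batching" is an assumption.

```python
from itertools import product

# Settings from the call above that are being varied; everything else stays fixed.
BASELINE = {
    "useSingleDatasetMode": True,
    "useBarrierExecutionMode": True,
    "numBatches": 500,
}


def flag_grid(baseline):
    """Hypothetical helper: yield every combination of the baseline value
    and its alternative for each setting."""
    alternatives = {
        "useSingleDatasetMode": [True, False],
        "useBarrierExecutionMode": [True, False],
        # Assumption: 0 disables batching.
        "numBatches": [baseline["numBatches"], 0],
    }
    keys = list(alternatives)
    for values in product(*(alternatives[k] for k in keys)):
        yield dict(zip(keys, values))


variants = list(flag_grid(BASELINE))
for v in variants:
    # Each variant would be passed as keyword arguments to LightGBMRegressor
    # and fitted on a small sample of train_0.
    print(v)
```

The first variant is the configuration from the call above, so it doubles as the control run.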
Other info / logs
Spark Configuration:
spark.master local[*, 8]
spark.databricks.cluster.profile singleNode
spark.driver.maxResultSize 150g
spark.jars.repositories https://mmlspark.azureedge.net/maven
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations