starting_point not used #1318
Hi, |
It is formatted as described in the Python file I mentioned. It is the version of FLAML on GitHub. The problem is that the program does not reset the starting hyperparameters to those in the starting_points file; it just uses the default in DATA.
…On Sat, Jul 20, 2024 at 10:30 AM Ranuga wrote:
Hi,
Check whether you are using the latest FLAML version and verify that starting_points is correctly formatted and supported; if issues persist, it may be something worth fixing.
|
Hi @gps1938 , thank you very much for your feedback. Could you please provide a complete code snippet for reproducing the issue? |
From automl.py:

starting_points: A dictionary or a str to specify the starting hyperparameter config for the estimators | default="static".
If str:
    - if "data", use data-dependent defaults;
    - if "data:path", use data-dependent defaults which are stored at path;
    - if "static", use data-independent defaults.
If dict, keys are the names of the estimators, and values are the starting hyperparameter configurations for the corresponding estimators. The value can be a single hyperparameter configuration dict or a list of hyperparameter configuration dicts.
In the following code example, we get starting_points from the `automl` object and use them in the `new_automl` object.
e.g.,
```python
from flaml import AutoML
from sklearn.datasets import load_iris  # needed for load_iris below

automl = AutoML()
X_train, y_train = load_iris(return_X_y=True)
automl.fit(X_train, y_train)
starting_points = automl.best_config_per_estimator
new_automl = AutoML()
new_automl.fit(X_train, y_train,
               starting_points=starting_points)
```
This fails; it does not use the starting points.
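For completeness, the string form described in the docstring above would be passed like this (a minimal sketch based only on that docstring; the `task` and `time_budget` values are arbitrary examples, not part of the original report):

```python
from flaml import AutoML
from sklearn.datasets import load_iris

X_train, y_train = load_iris(return_X_y=True)

automl = AutoML()
# "data" asks for data-dependent default starting configs;
# "static" would use data-independent defaults instead.
automl.fit(X_train, y_train, task="classification", time_budget=10,
           starting_points="data")
```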
|
Hi @gps1938 , check this:
```python
from flaml import AutoML
from sklearn.datasets import load_iris
import numpy as np


def test_fit_w_starting_point(as_frame=True, n_concurrent_trials=1):
    automl = AutoML()
    settings = {
        "max_iter": 3,
        "metric": "accuracy",
        "task": "classification",
        "log_training_metric": True,
        "n_jobs": 1,
        "model_history": True,
    }
    X_train, y_train = load_iris(return_X_y=True, as_frame=as_frame)
    if as_frame:
        # test drop column
        X_train.columns = range(X_train.shape[1])
        X_train[X_train.shape[1]] = np.zeros(len(y_train))
    automl.fit(X_train=X_train, y_train=y_train, n_concurrent_trials=n_concurrent_trials, **settings)
    automl_val_accuracy = 1.0 - automl.best_loss
    print("Best ML learner:", automl.best_estimator)
    print("Best hyperparameter config:", automl.best_config)
    print("Best accuracy on validation data: {0:.4g}".format(automl_val_accuracy))
    print("Training duration of best run: {0:.4g} s".format(automl.best_config_train_time))

    starting_points = automl.best_config_per_estimator
    print("starting_points", starting_points)
    print("loss of the starting_points", automl.best_loss_per_estimator)

    settings_resume = {
        "max_iter": 3,
        "metric": "accuracy",
        "task": "classification",
        "log_training_metric": True,
        "n_jobs": 1,
        "model_history": True,
        "log_type": "all",
        "starting_points": starting_points,
        "verbose": 5,
    }
    new_automl = AutoML()
    new_automl.fit(X_train=X_train, y_train=y_train, **settings_resume)
    new_automl_val_accuracy = 1.0 - new_automl.best_loss
    print("Best ML learner:", new_automl.best_estimator)
    print("Best hyperparameter config:", new_automl.best_config)
    print("Best accuracy on validation data: {0:.4g}".format(new_automl_val_accuracy))
    print("Training duration of best run: {0:.4g} s".format(new_automl.best_config_train_time))


test_fit_w_starting_point()
```
And the outputs:
The trial 1 config |
A simpler piece of code that recreates this issue -
Note that with smaller N (say 10) this is not reproduced. |
Hi @shlomosh , check this:
```python
import numpy as np
from flaml import AutoML
from sklearn.datasets import load_iris

N = 10
X_train, y_train = load_iris(return_X_y=True)
X_train = np.concatenate([X_train + 0.1 * i for i in range(N)], axis=0)
y_train = np.concatenate([y_train] * N, axis=0)

am1 = AutoML()
am1.fit(X_train, y_train, estimator_list=['lgbm'], time_budget=3, seed=11)

am2 = AutoML()
am2.fit(X_train, y_train, estimator_list=['lgbm'], time_budget=3, seed=11,
        starting_points=am1.best_config_per_estimator, verbose=5)

print(f"am1.best_loss: {am1.best_loss:.4f}")
print(f"am2.best_loss: {am2.best_loss:.4f}")
```
The output:
The |
You decreased the time_budget. Here is my log (when running with 60/30 time_budget) -
|
And N=10000 (with N=10 the issue is not reproducible). In my opinion the issue happens on large datasets because FLAML_sample_size is not included in the best_config_per_estimator dict. |
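If that hypothesis is correct, a possible workaround (a hedged sketch only: it assumes FLAML accepts a "FLAML_sample_size" entry inside a starting config and that `best_config` may carry that key when subsampling kicked in; neither is verified here) would be to copy the sample size back into the warm-start config, reusing the variables from the snippet above:

```python
from flaml import AutoML

# Reuses X_train, y_train and am1 from the snippet above.
starting_points = am1.best_config_per_estimator
best = am1.best_config  # assumed to include "FLAML_sample_size" when subsampling was used
if best and "FLAML_sample_size" in best:
    for name, cfg in starting_points.items():
        if cfg is not None:
            cfg.setdefault("FLAML_sample_size", best["FLAML_sample_size"])

am2 = AutoML()
am2.fit(X_train, y_train, estimator_list=['lgbm'], time_budget=30, seed=11,
        starting_points=starting_points, verbose=5)
```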
Hi @shlomosh, the starting_point is used. I don't see any issue in your output. Do you want to see |
I was expecting am1.best_loss >= am2.best_loss, given that am2 warm-starts from the best config of am1 and then improves on it (or not). Do I misunderstand this? |
I have looked at all lines containing starting_points in automl.py and I am not sure if this excerpt from automl.py

[starting_points: A dictionary or a str to specify the starting hyperparameter config for the estimators | default="data".
If str:
    - if "data", use data-dependent defaults;
    - if "data:path", use data-dependent defaults which are stored at path;
    - if "static", use data-independent defaults.
If dict, keys are the names of the estimators, and values are the starting hyperparameter configurations for the corresponding estimators. The value can be a single hyperparameter configuration dict or a list of hyperparameter configuration dicts.
In the following code example, we get starting_points from the `automl` object and use them in the `new_automl` object.]

is really coded in. To me, it looks like the new object uses the original "data" defaults rather than the newly found optimized parameters.
|
It's not guaranteed. |
The logs showed clearly that the starting point was used. Could you please provide detailed code and logs to explain your point? Thanks. |
I saw the logs. If this were C++ code, I would debug and look at the object to see if it contained the optimized params. I am not a Python expert, but I think debugging would give the best answer. When I use extensive data from a Kaggle competition, it takes the optimized object the same time to solve as the original object. If the object were using the optimized params, there should be a dramatic decrease in time to solve. That is why I think the optimized object should be looked at in debug mode.
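For what it's worth, something close to that inspection can be done in Python without a debugger by logging every trial to a file and comparing the first logged config with the starting point. This is only a sketch: `log_file_name` is a documented `fit` argument, but the exact fields in each JSON log record are an assumption and may differ across FLAML versions.

```python
import json

from flaml import AutoML
from sklearn.datasets import load_iris

X_train, y_train = load_iris(return_X_y=True)

am1 = AutoML()
am1.fit(X_train, y_train, task="classification", time_budget=5, estimator_list=["lgbm"])
starting_points = am1.best_config_per_estimator

am2 = AutoML()
am2.fit(X_train, y_train, task="classification", time_budget=5, estimator_list=["lgbm"],
        starting_points=starting_points, log_file_name="warm_start.log")

# Each log line is assumed to be a JSON record with a "config" field;
# if the warm start is honored, the first record should match the starting point.
with open("warm_start.log") as f:
    first_record = json.loads(f.readline())
print("starting point:", starting_points["lgbm"])
print("first trial   :", first_record.get("config"))
```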
|
As in automl.py:
```python
from flaml import AutoML

automl = AutoML()
X_train, y_train = Mydata  # placeholder for my own dataset
automl.fit(X_train, y_train)
starting_points = automl.best_config_per_estimator
```
Using this snippet, I get the same answer when using my starting_points with my optimized params; that is, it uses the internal defaults and starts retraining from scratch. My optimized params are not being used. Looking at automl.py, I cannot find any code that would incorporate the starting_points params into the estimator.
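One thing worth checking with a snippet like this (a hedged suggestion, not a confirmed diagnosis): `best_config_per_estimator` can contain `None` for estimators that were never tried, and such entries contribute no warm start. Printing the dict and restricting the second run to the estimators that actually have configs makes the warm start easier to spot in the verbose log:

```python
# Continuing from the snippet above (Mydata stands in for the user's own data).
print("starting_points:", starting_points)  # e.g. {'lgbm': {...}, 'xgboost': None, ...}

# Keep only the estimators that actually have a starting config.
usable = [name for name, cfg in starting_points.items() if cfg]

new_automl = AutoML()
new_automl.fit(X_train, y_train,
               estimator_list=usable,
               starting_points=starting_points,
               verbose=5)  # verbose=5, as used earlier in the thread, surfaces per-trial details
```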