
Fix bug in trainer.eval and add test cases for test_console_logger #1937

Merged · 6 commits merged into mosaicml:dev on Feb 1, 2023

Conversation

@eracah (Contributor) commented Feb 1, 2023

What does this PR do?

Hotfix for a ConsoleLogger bug caused by trainer.eval (when called with an eval_dataloader specified) not adding the evaluators to state.evaluators:

  1. Adds any passed-in evaluators to state.evaluators and then, once eval is done, removes them (see the sketch after this list).
  2. Adds tests to test_console_logger for the following use cases:
    • trainer.eval(eval_dataloader=...)
    • trainer.fit(eval_dataloader=...)
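
For reference, a minimal sketch of the pattern described in item 1, written as a stand-alone helper rather than the actual Trainer change (the function and argument names here are illustrative assumptions, not composer's API):

from typing import Any, Callable, List, Sequence


def run_eval_with_temporary_evaluators(
    state_evaluators: List[Any],
    passed_in_evaluators: Sequence[Any],
    run_eval: Callable[[], None],
) -> None:
    # Append the passed-in evaluators so callbacks such as ConsoleLogger can see
    # them on state.evaluators while eval runs.
    state_evaluators.extend(passed_in_evaluators)
    try:
        run_eval()
    finally:
        # Remove them afterwards so trainer.eval() leaves state.evaluators unchanged.
        for evaluator in passed_in_evaluators:
            state_evaluators.remove(evaluator)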

Tests

  1. modified console_logger unit tests as described above
  2. local manual test:
# NOTE: the imports below were not in the original snippet and are added so it runs
# as-is; the composer import paths are assumed.
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from composer import Trainer
from composer.core import Evaluator
from composer.models import mnist_model

# Build MNIST train/eval dataloaders plus two extra Evaluators that wrap the same
# eval dataloader under different labels.
transform = transforms.Compose([transforms.ToTensor()])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
val_set = datasets.MNIST("data", train=False, download=True, transform=transform)
train_dataloader = DataLoader(train_set, batch_size=128)
eval_dataloader = DataLoader(val_set, batch_size=16)
eval_dataloader2 = Evaluator(label='eval2', dataloader=eval_dataloader)
eval_dataloader3 = Evaluator(label='eval3', dataloader=eval_dataloader)
model = mnist_model(num_classes=10)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=[eval_dataloader, eval_dataloader2],
    max_duration="4ba",
    optimizers=[Adam(model.parameters())],
    eval_interval='2ba',
    log_to_console=True,
    # auto_log_hparams=True,
    progress_bar=False,
    # callbacks=[SpeedMonitor(), MemoryMonitor()],
)

trainer.fit()
trainer.eval(eval_dataloader=eval_dataloader3)
trainer.fit(eval_dataloader=eval_dataloader3, reset_time=True, eval_interval='2ba')

Results:

[batch=1/4]:
         Train epoch: 0
         Train trainer/global_step: 0
         Train trainer/batch_idx: 0
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.3045
         Train metrics/train/Accuracy: 0.0938
[batch=2/4]:
         Train trainer/global_step: 1
         Train trainer/batch_idx: 1
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.2840
         Train metrics/train/Accuracy: 0.1406
[Eval batch=1/625] Eval on eval data
[Eval batch=63/625] Eval on eval data
[Eval batch=126/625] Eval on eval data
[Eval batch=188/625] Eval on eval data
[Eval batch=251/625] Eval on eval data
[Eval batch=313/625] Eval on eval data
[Eval batch=375/625] Eval on eval data
[Eval batch=438/625] Eval on eval data
[Eval batch=500/625] Eval on eval data
[Eval batch=563/625] Eval on eval data
[Eval batch=625/625] Eval on eval data:
         Eval CrossEntropy: 2.3021
         Eval Accuracy: 0.0958
[Eval batch=1/625] Eval on eval2 data
[Eval batch=63/625] Eval on eval2 data
[Eval batch=126/625] Eval on eval2 data
[Eval batch=188/625] Eval on eval2 data
[Eval batch=251/625] Eval on eval2 data
[Eval batch=313/625] Eval on eval2 data
[Eval batch=375/625] Eval on eval2 data
[Eval batch=438/625] Eval on eval2 data
[Eval batch=500/625] Eval on eval2 data
[Eval batch=563/625] Eval on eval2 data
[Eval batch=625/625] Eval on eval2 data:
         Eval CrossEntropy: 2.3021
         Eval Accuracy: 0.0958
[batch=3/4]:
         Train epoch: 0
         Train trainer/global_step: 2
         Train metrics/eval/CrossEntropy: 2.3021
         Train metrics/eval/Accuracy: 0.0958
         Train metrics/eval2/CrossEntropy: 2.3021
         Train metrics/eval2/Accuracy: 0.0958
         Train trainer/batch_idx: 2
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.2447
         Train metrics/train/Accuracy: 0.2422
[batch=4/4]:
         Train trainer/global_step: 3
         Train trainer/batch_idx: 3
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.2098
         Train metrics/train/Accuracy: 0.3125
[Eval batch=1/625] Eval on eval data
[Eval batch=63/625] Eval on eval data
[Eval batch=126/625] Eval on eval data
[Eval batch=188/625] Eval on eval data
[Eval batch=251/625] Eval on eval data
[Eval batch=313/625] Eval on eval data
[Eval batch=375/625] Eval on eval data
[Eval batch=438/625] Eval on eval data
[Eval batch=500/625] Eval on eval data
[Eval batch=563/625] Eval on eval data
[Eval batch=625/625] Eval on eval data:
         Eval CrossEntropy: 2.2971
         Eval Accuracy: 0.0958
[Eval batch=1/625] Eval on eval2 data
[Eval batch=63/625] Eval on eval2 data
[Eval batch=126/625] Eval on eval2 data
[Eval batch=188/625] Eval on eval2 data
[Eval batch=251/625] Eval on eval2 data
[Eval batch=313/625] Eval on eval2 data
[Eval batch=375/625] Eval on eval2 data
[Eval batch=438/625] Eval on eval2 data
[Eval batch=500/625] Eval on eval2 data
[Eval batch=563/625] Eval on eval2 data
[Eval batch=625/625] Eval on eval2 data:
         Eval CrossEntropy: 2.2971
         Eval Accuracy: 0.0958
[Eval batch=1/625] Eval on eval3 data
[Eval batch=63/625] Eval on eval3 data
[Eval batch=126/625] Eval on eval3 data
[Eval batch=188/625] Eval on eval3 data
[Eval batch=251/625] Eval on eval3 data
[Eval batch=313/625] Eval on eval3 data
[Eval batch=375/625] Eval on eval3 data
[Eval batch=438/625] Eval on eval3 data
[Eval batch=500/625] Eval on eval3 data
[Eval batch=563/625] Eval on eval3 data
[Eval batch=625/625] Eval on eval3 data:
         Eval CrossEntropy: 2.2971
         Eval Accuracy: 0.0958
[batch=1/4]:
         Train epoch: 0
         Train trainer/global_step: 0
         Train metrics/eval/CrossEntropy: 2.2971
         Train metrics/eval/Accuracy: 0.0958
         Train metrics/eval2/CrossEntropy: 2.2971
         Train metrics/eval2/Accuracy: 0.0958
         Train metrics/eval3/CrossEntropy: 2.2971
         Train metrics/eval3/Accuracy: 0.0958
         Train trainer/batch_idx: 0
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.1587
         Train metrics/train/Accuracy: 0.2344
[batch=2/4]:
         Train trainer/global_step: 1
         Train trainer/batch_idx: 1
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.1629
         Train metrics/train/Accuracy: 0.2812
[batch=3/4]:
         Train trainer/global_step: 2
         Train trainer/batch_idx: 2
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.1253
         Train metrics/train/Accuracy: 0.2812
[batch=4/4]:
         Train trainer/global_step: 3
         Train trainer/batch_idx: 3
         Train trainer/grad_accum: 1
         Train loss/train/total: 2.0864
         Train metrics/train/Accuracy: 0.3516
[Eval batch=1/625] Eval on eval3 data
[Eval batch=63/625] Eval on eval3 data
[Eval batch=126/625] Eval on eval3 data
[Eval batch=188/625] Eval on eval3 data
[Eval batch=251/625] Eval on eval3 data
[Eval batch=313/625] Eval on eval3 data
[Eval batch=375/625] Eval on eval3 data
[Eval batch=438/625] Eval on eval3 data
[Eval batch=500/625] Eval on eval3 data
[Eval batch=563/625] Eval on eval3 data
[Eval batch=625/625] Eval on eval3 data:
         Eval CrossEntropy: 2.2776
         Eval Accuracy: 0.1502

What issue(s) does this change relate to?

Fixes CO-1732

@eracah marked this pull request as ready for review on February 1, 2023 20:44
@eracah requested a review from dakinggg on February 1, 2023 20:44
@dakinggg (Contributor) left a comment

LGTM, pending a final manual test

@mvpatel2000 (Contributor) left a comment

Can we add a test ensuring that calling eval doesn't add a new evaluator to state and leave it there? I think it works, but I would prefer a unit test verifying it in test_trainer_eval.py

@dakinggg (Contributor) commented Feb 1, 2023

Also, to address Mihir's question, maybe we can add a test that 1) creates a trainer with an evaluator, 2) calls eval with a passed-in evaluator, and 3) verifies that the original evaluator remains present on state.evaluators
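
A minimal sketch of such a test, assuming composer's public Trainer/Evaluator API and the SimpleModel / RandomClassificationDataset helpers from the test suite (those helper import paths are assumptions, not taken from this PR):

from torch.utils.data import DataLoader

from composer import Trainer
from composer.core import Evaluator
from tests.common import RandomClassificationDataset, SimpleModel  # assumed test helpers


def test_eval_leaves_state_evaluators_unchanged():
    # 1) Create a trainer with an evaluator already attached.
    original_evaluator = Evaluator(label='eval_orig',
                                   dataloader=DataLoader(RandomClassificationDataset()))
    trainer = Trainer(model=SimpleModel(),
                      train_dataloader=DataLoader(RandomClassificationDataset()),
                      eval_dataloader=original_evaluator,
                      max_duration='1ba')

    # 2) Call eval with a separate, passed-in evaluator.
    extra_evaluator = Evaluator(label='eval_extra',
                                dataloader=DataLoader(RandomClassificationDataset()))
    trainer.eval(eval_dataloader=extra_evaluator)

    # 3) The original evaluator is still on state, and the extra one was not left behind.
    labels = [evaluator.label for evaluator in trainer.state.evaluators]
    assert 'eval_orig' in labels
    assert 'eval_extra' not in labels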

@eracah (Contributor, Author) commented Feb 1, 2023

> Can we add a test ensuring that calling eval doesn't add a new evaluator to state and leave it there? I think it works, but I would prefer a unit test verifying it in test_trainer_eval.py

> Also, to address Mihir's question, maybe we can add a test that 1) creates a trainer with an evaluator, 2) calls eval with a passed-in evaluator, and 3) verifies that the original evaluator remains present on state.evaluators

Sure

@eracah (Contributor, Author) commented Feb 1, 2023

Ok added the unit test. We good to merge?

@mvpatel2000 (Contributor)

LGTM!

@eracah (Contributor, Author) commented Feb 1, 2023

> LGTM!

🙌

@eracah merged commit 78dab95 into mosaicml:dev on Feb 1, 2023