
Best Practice

Tasneem edited this page Jun 7, 2022 · 9 revisions

To get the most out of PS and obtain well-annotated output, we share some best practices for annotating digital pathology data.

1. Train DL first

Once the user has uploaded images and finished making patches, they navigate to the View Embed page. At this point there is no trained model, so the user should first train the deep learning model; this is unsupervised training. The training module in PS can be configured; details are available in the Train DL section of the Hyper-Parameter page.

The number of initial training epochs is set by the user or defaults to 100. The system monitors progress during training, and if at any point the loss drops below the minimum_loss defined for early stopping, training stops early.
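The early-stop behaviour described above can be sketched roughly as follows. This is a minimal illustration, not PS's actual training code: the function train_with_early_stop, the run_epoch callback, the fake_epoch stand-in and the 0.01 threshold are all assumptions made for the example. Only the default of 100 epochs and the minimum_loss name come from the text.

```python
from typing import Callable, List


def train_with_early_stop(
    run_epoch: Callable[[int], float],
    num_epochs: int = 100,       # default epoch count mentioned in the text
    minimum_loss: float = 0.01,  # assumed threshold, for illustration only
) -> List[float]:
    """Run up to num_epochs epochs; stop as soon as the loss drops below minimum_loss."""
    losses: List[float] = []
    for epoch in range(num_epochs):
        loss = run_epoch(epoch)  # hypothetical callback: trains one epoch, returns its loss
        losses.append(loss)
        if loss < minimum_loss:
            break                # early stop: the loss is already low enough
    return losses


def fake_epoch(epoch: int) -> float:
    """Toy stand-in for a real training epoch: the loss halves every epoch."""
    return 1.0 / (2 ** epoch)


history = train_with_early_stop(fake_epoch)
print(f"stopped after {len(history)} epochs, final loss {history[-1]:.4f}")
```

With the toy loss curve above, training ends well before the 100-epoch budget is used, which is the practical effect of setting a sensible minimum_loss.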

2. Embed Patches

After the model has been trained, the user should Embed Patches in 2D space; clusters are formed based on the number of classes defined in the project. A plot is then displayed to the user, and at this point initial labelling can begin.

3. Initial Labelling
  • Click Show Patches to see how the patches are distributed over the plot.
  • Lasso over small, homogeneous spots. These patches will mostly be of the same type, so labelling/annotating all of them at once is faster. Use the shortcut (A) to select all patches in the grid, press a number key (0, 1, 2, etc.) to choose a label, and hit Enter to apply it.
  • In the early labelling cycles, the user need not annotate every object within a lasso, and can instead select only the high-confidence ones that are quick to decide on.
  • After every labelling/annotating event the plot is refreshed, and the user can filter between labelled and unlabelled patches to continue labelling.
  • With the help of Show Patches, skip around the embedding space while annotating, so that the labels cover a good diversity of the data; this gives better training and helps improve separability.
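To make the lasso step concrete, the sketch below shows one common way to decide which embedded patches fall inside a drawn region: a ray-casting point-in-polygon test. It is an illustration only and not PS's implementation; the names point_in_polygon and lasso_select, the coordinates and the label value are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(embedding: Sequence[Point], lasso: Sequence[Point]) -> List[int]:
    """Return the indices of embedded patches that lie inside the lasso polygon."""
    return [i for i, p in enumerate(embedding) if point_in_polygon(p, lasso)]


# Hypothetical 2D embedding of four patches and a square lasso around two of them.
embedding = [(0.5, 0.5), (2.0, 2.0), (0.9, 0.1), (-1.0, 0.5)]
lasso = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

selected = lasso_select(embedding, lasso)

# Apply one label to everything selected, mirroring "select all, pick a label, Enter".
labels: Dict[int, int] = {i: 1 for i in selected}
print(selected, labels)
```

A small lasso over a homogeneous spot yields a short index list that can all take the same label, which is why labelling in bulk is fast.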
4. Re-Train DL and Embed Patches

Once the user has some data annotated/labelled, re-training the deep learning model and running Embed Patches again helps separate the clusters based on the annotations provided. During re-training, parameters such as prev_model_init, numepochs, num_epochs_earlystop, num_min_epochs and unlabeled_percent are taken into account, and model training converges faster. More details are in the Train DL section of the Hyper-Parameter page.

Once the plot has loaded and the user clicks Show Patches, the patches will be displayed in more homogeneous clusters, and the clusters of labelled data will be clearly separated. The user can also filter by Discordant and fix any annotation issues.

Lassoing over homogeneous regions at this stage makes annotation faster still.

The user then repeats steps 3 and 4 to reach a good result quickly. Note: there should be some balance. If the user has 1000 labelled patches and then annotates 10 more, re-training and embedding is not going to improve things. A good rule of thumb is that probably at least 50% (25%) more data is needed before re-training gives a noticeably better result.
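The rule of thumb above comes down to a simple ratio check. The helper below is a hypothetical sketch and not part of PS: the name should_retrain and its parameters are made up for illustration, and min_growth defaults to the 50% figure from the note (the alternative 25% figure can be passed explicitly).

```python
def should_retrain(
    labelled_at_last_train: int,
    labelled_now: int,
    min_growth: float = 0.5,  # 50% more labelled data, per the rule of thumb
) -> bool:
    """Return True when enough new labels have accumulated to justify re-training."""
    if labelled_at_last_train <= 0:
        # Nothing was labelled at the last training run, so any labels are new information.
        return labelled_now > 0
    growth = (labelled_now - labelled_at_last_train) / labelled_at_last_train
    return growth >= min_growth


# The example from the note: 1000 labelled patches, then 10 more.
print(should_retrain(1000, 1010))  # growth is 1%, far below the threshold
print(should_retrain(1000, 1500))  # growth is 50%, meets the threshold
```

Tracking the labelled count at the last training run makes it easy to tell when another pass through steps 3 and 4 is likely to pay off.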

PatchSorter Wiki

PS's Wiki is the complete documentation that explains how to use this tool and the reasoning behind it. Here is the catalogue of PS's wiki pages:

Home:

  1. PatchSorter Pages
  2. User Guide
  3. Frequently Asked Questions