ProCodesAnalysis

This is the repository for the development of ProCodes image analysis: specifically, performing segmentation on neurons via colorization using a U-Net, a type of machine learning model.
Example figures (in the repository): example inputs and the U-Net model architecture.

Before using any of the Python files, make sure everything in requirements.txt is installed:

    pip install -r requirements.txt
    # or, with conda:
    conda install --file requirements.txt

Also ensure that you have the following ready:

  • A directory of images saved as .pt files, split into at least two folders: one for training inputs and one for ground truth. Currently only 3-channel images are supported; height and width can be anything but must stay consistent within the same folder. Dimensions must be [3 x Width x Height] for both training and ground truth (see the sketch after this list).
  • A directory to store the model checkpoints
  • A CUDA-ready GPU. Not strictly necessary, but training already takes many hours with a GPU; without one it will be far too slow.
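
A minimal sketch of the expected layout and tensor shape; the folder and file names here are illustrative, not required:

    import torch

    # Illustrative layout -- any names work as long as training inputs and
    # ground truth live in separate folders:
    #   data/train/  img_000.pt, img_001.pt, ...
    #   data/truth/  img_000.pt, img_001.pt, ...

    img = torch.load('data/train/img_000.pt')   # hypothetical path
    assert img.ndim == 3 and img.shape[0] == 3  # [3 x Width x Height]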

utils.py

Contains a multitude of helper functions for various tasks. Feel free to import and use them outside this project as well: import utils. A short usage sketch follows the list.

  • otsu_threshold_channelwise, performs Otsu thresholding on an input image; the threshold can be scaled by a threshold factor
  • random_channel_pop, picks a random channel to remove from an image and returns the new image and the removed channel separately
  • load_tif, loads a .tif file from a given path and max-normalizes it
  • load_codebook, loads a CSV file from a given path as a pandas DataFrame
  • normalize_array and normalize_array_t, normalize a numpy array or a torch tensor, respectively
  • display_codebook, displays a codebook as a matplotlib heatmap, given a path to a CSV
  • fixed_blobs, creates a sparse, fixed 2-D grid array from a set of parameters
  • color_blobs, given an image, returns the same image and a new image in which only n blobs per channel are colored in
  • plot_loss, given an array of losses, plots loss vs. epochs and saves the figure
  • matching_pursuit, given an image, codebook values, and a number of iterations, runs matching pursuit to segment the image
  • three_channel_grayscale, given two numpy arrays of shape (width, height, channel), an image and a mask, returns a grayscale image that is colored wherever the mask applies (the "3-channel grayscale" data type)
  • classification_accuracy, given an image, a label, and a mask of all non-zero pixels of the label, computes the mean top-1 classification accuracy over all non-zero pixels
  • preprocess_and_create_3cg_data, given a path to the original data and a path for the output data, creates a new dataset with 3-channel grayscale images and blob properties
  • preprocess_main, wrapper for the function above; creates directories if needed
  • connected_components, grabs the connected components from an image
  • random_pixel_data, takes a path to the original data, a new data path, the output shape of the images, and a minimum component size; creates a new dataset where, in each channel, the largest component has a single pixel colored in
  • remove_outliers, given a numpy image, removes the outliers
  • make_plotable, turns a torch tensor, normally of shape [channels, w, h], into a matplotlib-plottable object
  • save_img, takes a torch tensor, plots it, and saves the image
  • channel_transform, creates a new training dataset from an existing one in which the channels of all the images are permuted
  • kaggle_hpa_renamer, renames the HPA dataset to be more manageable and writes a metadata file
  • removes_points, helper for create_synthetic_dataset_hpa; takes points (a list of all used points) and sample_space (all possible new points) and removes illegal points from the sample_space
  • generate_x_y, another helper for create_synthetic_dataset_hpa; generates a random positive mask
  • create_synthetic_dataset_hpa, takes a path to cell bodies, a path to outlines, a metadata path, a maximum number of images, the number of cells per channel, and an image shape, and creates a custom dataset from random combinations of individual cell bodies and outlines
  • np_to_torch_img, converts a numpy array (w, h, channels) to a torch tensor (channels, w, h)
  • hpa_kaggle_transform_data, takes all the necessary paths to transform the 3 types of HPA Kaggle data into a new dataset, similar to the 3-channel grayscale data
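
A hedged usage sketch for a few of the helpers above; the paths are illustrative and any extra keyword arguments are assumptions, so check the actual signatures in utils.py:

    import utils

    # Load a .tif from disk and max-normalize it (path is illustrative)
    img = utils.load_tif('data/raw/example.tif')

    # Channel-wise Otsu thresholding of the loaded image
    thresholded = utils.otsu_threshold_channelwise(img)

    # Load a codebook CSV as a pandas DataFrame and display it as a heatmap
    codebook = utils.load_codebook('codebook.csv')
    utils.display_codebook('codebook.csv')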

evaluation.py

Contains functions for evaluation and plotting; a short usage sketch follows this list.

  • Elementwise Accuracy elementwise_accuracy(img_1, img_2, ignore_zeros=False, deviation=0.001)
  • Mean Squared Error mse(img_1, img_2)
  • Structural Similarity Index ssim_err(img_1, img_2, channel_axis=None, img=False)
    • img is a boolean to decide whether the SSIM image is displayed
  • Identifiable Submatrices, class
   Finds all possible special matrices given a codebook.
   A special matrix in this case is a 6 x 4 matrix where:
   Condition 1:
       First 3 rows represent a smaller 3 x 4 subset where each 3 x 1 column is a different permutation
       (Can be in any order. However, the last column must always be [1 1 1])
           [[1 0 1 1]
            [0 1 1 1]
            [1 1 0 1]]
   Condition 2:
       Last 3 rows represent 3/4th of an identity matrix
           [[1 0 0 0]
            [0 1 0 0]
            [0 0 1 0]]
    self.clean_matrices maps a row order (a tuple of row indices) to the set of matrices that are possible for that order
    x = identifiable_submatrices('codebook.csv')
    x.clean_matrices[(0, 1, 3, 6, 5, 2)]
    ->
    [[array([[0., 1., 1., 1.], # The resulting array
             [1., 0., 1., 1.],
             [1., 1., 0., 1.],
             [1., 0., 0., 0.],
             [0., 1., 0., 0.],
             [0., 0., 1., 0.]]),
       array([10,  7,  0, 17])]] # The order of the columns

    # To get the actual row names
    [x.idx_to_row[i] for i in (0, 1, 3, 6, 5, 2)]
    ->
    ['NWS', 'VSVG', 'HSV', 'Ollas', 'S', 'FLAG']

    # To get the actual column names
    [x.idx_to_col[i] for i in [10,  7,  0, 17]]
    ->
    ['D7', 'D11', 'A2', 'F8']
  • Generating Outputs from Model generate_model_outputs(model_directory, model_list, input_img_list, img_size=(1, 3, 256, 256), output_path='', parallel=True)
  • Plot Outputs From HPA Model plot_different_outputs_HPA(filenames: list, checkpoint_path: str, data_path: str, plot_name: str ='example', dimensions: tuple = (512, 512))
  • Get Classification Accuracy for HPA Model hpa_classification_accuracy(test_files: list, checkpoint_path: str, test_path: str, cell_seg_path: str, metadata_path: str, dimensions: tuple = (512, 512))
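
A short sketch of the pixel-level metrics above, using random arrays as stand-ins for a model output and its ground truth; whether these functions expect numpy arrays or torch tensors is an assumption:

    import numpy as np
    from evaluation import elementwise_accuracy, mse, ssim_err

    pred = np.random.rand(3, 256, 256)    # stand-in for a model output
    truth = np.random.rand(3, 256, 256)   # stand-in for the ground truth

    print(elementwise_accuracy(pred, truth, ignore_zeros=True))
    print(mse(pred, truth))
    print(ssim_err(pred, truth, channel_axis=0))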

dataset.py

Python file necessary for building the dataset to run the model through. Makes use of PyTorch dataset module.
ProCodesDataModule needs data_path (a list in the form [path to train data, path to truth data]), batch_size (the size of the batches to return), and test_size (the proportion of the dataset set aside as the test set).
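
A minimal construction sketch, assuming the keyword names match the description above; whether it exposes the standard PyTorch Lightning DataModule interface is also an assumption:

    from dataset import ProCodesDataModule

    dm = ProCodesDataModule(
        data_path=['data/train/', 'data/truth/'],  # [train data, truth data]
        batch_size=4,
        test_size=0.2,   # 20% of the dataset held out as the test set
    )
    # train_loader = dm.train_dataloader()  # assumption: standard Lightning API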

unet_lightning.py

A Python callable that takes the arguments 'epochs', 'train_path', 'label_path', and 'model_path', plus the optional arguments 'batch_size', 'gpus', and 'checkpoint'. It runs a trainer on the U-Net for the given number of epochs using the train and label paths, and writes model checkpoints to the model path.
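
A hypothetical invocation, assuming argparse-style flags named after the arguments above (the actual flag syntax may differ; see unet_lightning.py):

    python unet_lightning.py --epochs 100 --train_path data/train/ \
        --label_path data/truth/ --model_path checkpoints/ \
        --batch_size 4 --gpus 1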
