Pixel Notebook
This wiki aims to briefly explain the function of, and requirements for, each cell of the Pixel Notebook. The cell numbers in this wiki should not be confused with the execution count (the number in brackets to the left of a cell), which indicates the cell's position in the execution order.
This cell imports the Python libraries the notebook needs. They support data manipulation, data visualization, clustering, and interactive widgets.
- Standard Libraries: `codecs` and `os` for basic Python operations.
- Data Manipulation: `pandas` (as `pd`) and `numpy` (as `np`) for data manipulation and mathematical operations.
- Data Visualization: `matplotlib` (as `mpl`), `matplotlib.pyplot` (as `plt`), and `seaborn` (as `sns`) for plotting and data visualization.
- Interactive Widgets: `ipywidgets` (as `widgets`) and `IPython.display` for creating interactive user interfaces.
- Clustering: `KMeans` from `sklearn.cluster` for machine learning clustering tasks.
- Data Preprocessing: `RobustScaler` and `MinMaxScaler` from `sklearn.preprocessing` for data scaling.
- `%matplotlib widget`: This magic command enables the interactive Matplotlib backend for Jupyter notebooks (provided by the `ipympl` package).
- Warning messages are optionally suppressed with the `warnings` module to keep the notebook output clean.
- The `from io import StringIO` line imports the `StringIO` class for reading and writing strings as file streams.
Initializes widgets for entering the working directory path and for selecting a CSV file from that directory. It also sets up a function to update the list of CSV files based on the given path.
- `image_path`: Text widget for entering the working directory path.
  - Type: Text
  - Placeholder: "Enter your working directory path"
- `csv_select`: Radio buttons to select the CSV file containing single-cell data.
  - Type: Radio Buttons
  - Options: Populated based on the files in the provided directory path.
Updates the list of available CSV files in the `csv_select` widget based on the directory path provided in `image_path`.
Displays the `image_path` and `csv_select` widgets side by side in an `HBox` layout.
The widgets are styled using a custom CSS style defined in the variable `style`.
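The path-update logic can be sketched with the standard library alone. The helper name `list_csv_files` is hypothetical, and offering only regular files whose names end in `.csv` (case-insensitive) is an assumption about what the notebook lists. In the notebook, the returned list would presumably be assigned to `csv_select.options`, for example from a handler registered with `image_path.observe(handler, names="value")`.

```python
import os
import tempfile


def list_csv_files(directory):
    """Return the sorted names of CSV files in `directory`.

    Hypothetical helper: returns an empty list when the path is not an
    existing directory, so a half-typed path does not raise an error.
    """
    if not os.path.isdir(directory):
        return []
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(".csv")
        and os.path.isfile(os.path.join(directory, name))
    )


# Usage example with a throwaway directory standing in for the working directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("cells_b.csv", "cells_a.CSV", "notes.txt"):
        with open(os.path.join(demo_dir, name), "w") as fh:
            fh.write("x\n")
    os.mkdir(os.path.join(demo_dir, "subdir.csv"))  # a directory, so it is skipped
    demo_files = list_csv_files(demo_dir)

print(demo_files)  # ['cells_a.CSV', 'cells_b.csv']
```

Returning an empty list for an invalid path keeps the radio buttons empty while the user is still typing, instead of raising inside the widget callback.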
Reads the selected single-cell CSV data into a pandas DataFrame and drops any columns with missing values.
- `PATH`: Stores the working directory path from `image_path`.
- `PX_DATA`: Stores the name of the selected CSV file from `csv_select`.
- `file_path`: Combines `PATH` and `PX_DATA` to form the complete file path.
- `image_df`: DataFrame holding the loaded and cleaned single-cell data.
The cell has no visual output; it populates the `image_df` DataFrame with the loaded data.
Columns containing any missing (`NaN`) value are dropped from `image_df` using `dropna(axis=1)`.
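A minimal sketch of the loading step is shown below. The function name `load_single_cell_data` is hypothetical, and building `file_path` with `os.path.join` is an assumption; the notebook may concatenate `PATH` and `PX_DATA` differently. The `dropna(axis=1)` call matches the description above.

```python
import os
import tempfile

import pandas as pd


def load_single_cell_data(path, filename):
    """Read a single-cell CSV and drop every column that contains a NaN.

    `path` and `filename` play the roles of PATH and PX_DATA; joining them
    with os.path.join is an assumption about how file_path is built.
    """
    file_path = os.path.join(path, filename)
    df = pd.read_csv(file_path)
    # axis=1 drops columns, not rows; the default how="any" removes a
    # column if even one of its values is missing.
    return df.dropna(axis=1)


# Usage example: column "ch2" has one missing value and is dropped.
csv_text = "ch1,ch2,ch3\n1,2,3\n4,,6\n"
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "cells.csv"), "w") as fh:
        fh.write(csv_text)
    image_df = load_single_cell_data(demo_dir, "cells.csv")

print(list(image_df.columns))  # ['ch1', 'ch3']
```

Dropping columns rather than rows keeps every pixel, which matters later when the cluster labels are reshaped back into an image of fixed size.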
Sets up widgets for entering the number of clusters, selecting channels for clustering, and choosing the scaling method for data standardization.
- `number_clusters`: Text box for entering the desired number of clusters.
  - Type: Text
  - Placeholder: "Desired number of clusters"
- `channels_cluster`: Multiple selection box for choosing which channels to use for clustering.
  - Type: Multiple Select
  - Options: Columns of the `image_df` DataFrame
- `scaler_select`: Radio buttons to choose the scaling method.
  - Type: Radio Buttons
  - Options: 'RobustScaler', 'MinMaxScaler'
Displays the `number_clusters`, `channels_cluster`, and `scaler_select` widgets horizontally.
For more information on data scaling methods, consult the Scikit-learn documentation and here.
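To illustrate how the two scaler options differ, the sketch below applies both to a single made-up channel with one outlier (illustrative values, not notebook data). `MinMaxScaler` maps the observed range onto [0, 1], so one bright outlier squeezes all other pixels toward 0. `RobustScaler` subtracts the median and divides by the interquartile range, so typical pixels keep their spread.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# One channel, five pixels, one outlier (illustrative values only).
# scikit-learn scalers expect a 2D array of shape (n_samples, n_features).
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

minmax = MinMaxScaler().fit_transform(values).ravel()
robust = RobustScaler().fit_transform(values).ravel()

# MinMax: (x - 1) / 99, so the four typical pixels all land below 0.04.
print(np.round(minmax, 3))
# Robust: (x - median 3) / IQR 2, so the typical pixels span -1 to 0.5.
print(robust)
```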
Performs KMeans clustering on the selected channels after scaling the data with the selected scaling method. The clustering labels are then added to the `image_df` DataFrame.
- `cluster_df`: A subset of `image_df` containing only the selected channels.
- `cluster_std`: The scaled version of `cluster_df`.
- `kmeans`: KMeans object from scikit-learn.
- Scaling: Uses either `RobustScaler` or `MinMaxScaler` based on user selection.
- KMeans Clustering: Uses the scikit-learn `KMeans` algorithm.
Prints a message indicating that clustering has been completed.
If only one channel is selected, `cluster_df` is reshaped to 2D, because scikit-learn's scalers and `KMeans` expect input of shape `(n_samples, n_features)`.
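The scaling-plus-clustering step can be sketched as follows. The function name `cluster_channels`, the `SCALERS` mapping, and the label column name `"cluster"` are hypothetical. Selecting channels with a list (`df[["ch1"]]`) is an alternative to the explicit reshape mentioned above, because it already yields a 2D DataFrame. Fixing `n_init` and `random_state` is a choice made here for reproducibility, not necessarily the notebook's setting.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Maps the scaler_select option strings to scikit-learn classes.
SCALERS = {"RobustScaler": RobustScaler, "MinMaxScaler": MinMaxScaler}


def cluster_channels(image_df, channels, n_clusters, scaler_name, label_col="cluster"):
    """Scale the selected channels, run KMeans, and store labels in image_df.

    label_col is a hypothetical column name; n_clusters may be the raw
    string from the number_clusters Text widget, so it is converted to int.
    """
    # Selecting with a list keeps a 2D DataFrame even for a single channel.
    cluster_df = image_df[list(channels)]
    cluster_std = SCALERS[scaler_name]().fit_transform(cluster_df)
    # Fixed n_init and random_state make the sketch reproducible.
    kmeans = KMeans(n_clusters=int(n_clusters), n_init=10, random_state=0)
    image_df[label_col] = kmeans.fit_predict(cluster_std)
    return kmeans


# Usage example: two well-separated groups of pixels in one channel.
demo_df = pd.DataFrame({"ch1": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]})
cluster_channels(demo_df, ["ch1"], "2", "RobustScaler")
print("Clustering completed:", demo_df["cluster"].tolist())
```

KMeans label numbers are arbitrary (cluster 0 in one run may be cluster 1 in another), so only the grouping, not the specific label values, should be interpreted.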
Sets up widgets for entering the dimensions of the image, specifically its width and height in pixels.
- `img_width`: Text box for entering the image width in pixels.
  - Type: Text
  - Placeholder: "Enter your image width in pixels"
- `img_height`: Text box for entering the image height in pixels.
  - Type: Text
  - Placeholder: "Enter your image height in pixels"
Displays the `img_width` and `img_height` widgets horizontally.
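Because Text widgets return strings, the entered dimensions must be converted to integers before they can be used for reshaping. A possible validation step is sketched below. The helper `parse_dimensions` is hypothetical, and it assumes one cluster label per pixel, so `width * height` must equal the number of labels.

```python
def parse_dimensions(width_text, height_text, n_pixels):
    """Convert the img_width / img_height Text-widget strings to ints.

    Hypothetical helper: raises ValueError early if the values are not
    positive integers or if width * height does not match n_pixels, the
    number of cluster labels (one per pixel) that will be reshaped.
    """
    width, height = int(width_text.strip()), int(height_text.strip())
    if width <= 0 or height <= 0:
        raise ValueError("Image width and height must be positive.")
    if width * height != n_pixels:
        raise ValueError(
            f"{width} x {height} = {width * height} pixels, "
            f"but there are {n_pixels} labels."
        )
    return width, height


print(parse_dimensions(" 512", "256 ", 512 * 256))  # (512, 256)
```

Checking the product here gives a clearer message than the generic error NumPy raises when a reshape size does not match.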
Reshapes the clustering labels to match the original image dimensions and then displays this image.
- `labels`: Array containing the KMeans cluster labels extracted from `image_df`.
- `image`: 2D array formed by reshaping the `labels` array, representing the clustered image.
Displays the clustered image in a 10x10 figure.
- The `labels` array is reshaped based on the user-provided image dimensions (`img_width` and `img_height`). Their product must equal the number of labels, otherwise the reshape fails.
- The Matplotlib parameter `savefig.pad_inches` is set to 0 to remove padding around the saved figure.
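The reshape and display can be sketched as below. The function names are hypothetical, and the sketch assumes the CSV rows are stored row by row (row-major order). If the data were exported column by column, `reshape(..., order="F")` with swapped dimensions would be needed instead.

```python
import numpy as np
import matplotlib.pyplot as plt


def labels_to_image(labels, width, height):
    """Reshape a flat array of per-pixel cluster labels into a 2D image.

    Assumes pixels are stored row by row (row-major order), so the result
    has `height` rows and `width` columns.
    """
    labels = np.asarray(labels)
    if labels.size != width * height:
        raise ValueError(f"Cannot reshape {labels.size} labels into {height} x {width}.")
    return labels.reshape(height, width)


def show_cluster_image(image):
    """Display the label image in a 10x10-inch figure with no save padding."""
    plt.rcParams["savefig.pad_inches"] = 0
    fig, ax = plt.subplots(figsize=(10, 10))
    # Nearest-neighbour interpolation keeps cluster boundaries crisp.
    ax.imshow(image, cmap="viridis", interpolation="nearest")
    ax.axis("off")
    return fig


image = labels_to_image([0, 0, 1, 1, 2, 2], width=3, height=2)
print(image.tolist())  # [[0, 0, 1], [1, 2, 2]]
```

In the notebook, the labels would come from the cluster column of `image_df`, and `show_cluster_image(image)` would render the result.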
Contains code for saving the generated clustered image as a PNG file.
No output unless the code is uncommented and executed.
To save the clustered image, uncomment the line and execute the cell.
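One way to write the label image to disk is sketched below. This sketch swaps in `matplotlib.image.imsave`, which writes exactly one output pixel per array element with no axes or margins, instead of the figure-level `savefig` the notebook appears to use. The function name and the default filename `clustered_image.png` are hypothetical.

```python
import os
import tempfile

import numpy as np
from matplotlib import image as mpimg


def save_cluster_image(image, path, filename="clustered_image.png"):
    """Write the 2D label array to a PNG, one image pixel per file pixel.

    Uses matplotlib.image.imsave instead of a figure-level savefig, so the
    file contains no axes, margins, or padding. The filename is hypothetical.
    """
    file_path = os.path.join(path, filename)
    mpimg.imsave(file_path, image, cmap="viridis")
    return file_path


with tempfile.TemporaryDirectory() as demo_dir:
    saved = save_cluster_image(np.array([[0, 1], [1, 2]]), demo_dir)
    saved_shape = mpimg.imread(saved).shape

print(saved_shape[:2])  # (2, 2): same height and width as the label array
```

Saving at native resolution keeps the PNG pixel-aligned with the original image, which is convenient for overlaying the clusters later.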
Calculates and prints the average, minimum, and maximum pixel values for each selected channel in each cluster. This information is useful for understanding the distribution of pixel values within each cluster.
- `generate_info`:
  - Calculates the mean, minimum, and maximum pixel values for a given channel in a specified cluster.
  - Returns a string containing these statistics.
- `info_strings`: List to store all the generated information strings for each cluster and channel.
Prints the statistics for each channel in each cluster.
This cell uses the `labels` array generated in previous cells and iterates over the range of unique cluster labels. For each cluster and channel, it calls the `generate_info` function to compute and print the statistics.
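A sketch of what `generate_info` and the surrounding loop might look like is shown below. The function's signature, the `"cluster"` label column, and the exact output format are assumptions; only the function name and the mean/min/max statistics come from the description above.

```python
import pandas as pd


def generate_info(df, cluster, channel, label_col="cluster"):
    """Return mean/min/max of `channel` for the pixels in `cluster` as a string.

    The signature and label_col name are assumptions for this sketch.
    """
    values = df.loc[df[label_col] == cluster, channel]
    return (
        f"Cluster {cluster}, {channel}: "
        f"mean={values.mean():.2f}, min={values.min():.2f}, max={values.max():.2f}"
    )


# Usage example with made-up values: four pixels in two clusters.
demo_df = pd.DataFrame({"ch1": [1.0, 3.0, 5.0, 7.0], "cluster": [0, 0, 1, 1]})
info_strings = []
for cluster in sorted(demo_df["cluster"].unique()):
    for channel in ["ch1"]:
        info = generate_info(demo_df, cluster, channel)
        info_strings.append(info)
        print(info)
```

Collecting the strings in `info_strings` as well as printing them lets the optional save cell write the same text to a file without recomputing anything.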
Contains code for saving the calculated statistics to a text file.
No output unless the code is uncommented and executed.
To save the statistics, uncomment the lines and execute the cell. The statistics will be saved in a text file named `kmeans_cluster_statistics.txt` in the specified path.
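Writing the statistics can be sketched with the standard library. The filename `kmeans_cluster_statistics.txt` comes from the description above; the function name `save_statistics` and the one-line-per-entry layout are assumptions.

```python
import os
import tempfile


def save_statistics(info_strings, path, filename="kmeans_cluster_statistics.txt"):
    """Write one statistics line per cluster/channel to a text file in `path`."""
    file_path = os.path.join(path, filename)
    with open(file_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(info_strings) + "\n")
    return file_path


# Usage example with a temporary directory standing in for the working directory.
with tempfile.TemporaryDirectory() as demo_dir:
    out = save_statistics(
        ["Cluster 0, ch1: mean=2.00", "Cluster 1, ch1: mean=6.00"], demo_dir
    )
    with open(out, encoding="utf-8") as fh:
        saved_lines = fh.read().splitlines()

print(saved_lines)
```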
Initializes widgets for selecting the channels and the type of correlation method (Spearman or Pearson) for which the correlation matrix will be computed.
- `correlation_channels`: Allows multiple selection of channels for which to calculate the correlation coefficients.
  - Type: SelectMultiple (options are the DataFrame columns)
- `correlation_select`: Allows selection of the correlation method to be used (Spearman or Pearson).
  - Type: RadioButtons
Displays the widgets for channel and correlation method selection.
The selected channels and correlation method will be used in the next cell to plot the correlation heatmap.
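The choice between the two methods matters when channels are related monotonically but not linearly. The sketch below uses made-up values where `ch2` is the cube of `ch1`. Pearson measures linear association and falls below 1, while Spearman correlates the ranks and reports a perfect 1.

```python
import pandas as pd

# Made-up data: ch2 rises monotonically with ch1, but not linearly (ch2 = ch1 ** 3).
demo_df = pd.DataFrame({"ch1": [1, 2, 3, 4, 5], "ch2": [1, 8, 27, 64, 125]})

pearson = demo_df.corr(method="pearson").loc["ch1", "ch2"]
spearman = demo_df.corr(method="spearman").loc["ch1", "ch2"]

print(f"Pearson:  {pearson:.3f}")   # below 1: the relationship is not linear
print(f"Spearman: {spearman:.3f}")  # 1.000: the ranks agree perfectly
```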
Generates a heatmap to visualize the correlations between the selected channels. The heatmap provides insight into how the channels are related to each other.
- `plot_correlation_heatmap(df, channels, method)`: Plots a correlation heatmap for the given DataFrame, channels, and correlation method.
- `transformed_data`: The original data after a square-root transformation.
- `scaled_data`: The transformed data, scaled with the previously selected scaler.
Displays the correlation heatmap.
The cell uses the square root transformation and scaling to prepare the data for correlation analysis.
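A sketch of the preparation and plotting steps is shown below, assuming non-negative intensities. The `prepare_correlation` helper, the `SCALERS` mapping, and the extra `scaler_name` keyword on `plot_correlation_heatmap` are assumptions; the notebook's function takes the scaler from the earlier widget instead. One property worth knowing: per-column scaling is a positive affine map, so it leaves both Pearson and Spearman coefficients unchanged, and the square root (strictly increasing) leaves Spearman unchanged. The usage example checks the first point.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Maps the scaler_select option strings to scikit-learn classes (assumed names).
SCALERS = {"RobustScaler": RobustScaler, "MinMaxScaler": MinMaxScaler}


def prepare_correlation(df, channels, method, scaler_name="RobustScaler"):
    """Square-root transform, scale, and correlate the selected channels."""
    channels = list(channels)
    # np.sqrt assumes non-negative intensities; negative values would become NaN.
    transformed_data = np.sqrt(df[channels])
    scaled_data = pd.DataFrame(
        SCALERS[scaler_name]().fit_transform(transformed_data),
        columns=channels,
        index=df.index,
    )
    return scaled_data.corr(method=method)


def plot_correlation_heatmap(df, channels, method, scaler_name="RobustScaler"):
    """Draw the correlation matrix as an annotated seaborn heatmap."""
    import seaborn as sns  # local import: prepare_correlation runs without seaborn

    corr = prepare_correlation(df, channels, method, scaler_name)
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, square=True, ax=ax)
    ax.set_title(f"{method.capitalize()} correlation")
    return fig


# Usage example with made-up, non-negative intensities.
demo_df = pd.DataFrame({"ch1": [1.0, 4.0, 9.0, 16.0, 25.0],
                        "ch2": [2.0, 3.0, 7.0, 8.0, 20.0]})
corr_robust = prepare_correlation(demo_df, ["ch1", "ch2"], "pearson", "RobustScaler")
corr_minmax = prepare_correlation(demo_df, ["ch1", "ch2"], "pearson", "MinMaxScaler")
# Per-column scaling is a positive affine map, so Pearson r is unchanged.
print(np.allclose(corr_robust.to_numpy(), corr_minmax.to_numpy()))  # True
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale comparable between heatmaps drawn for different channel selections.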
Contains code for saving the generated correlation heatmap to a PNG file.
No output unless the code is uncommented and executed.
To save the heatmap, uncomment the lines and execute the cell. The heatmap will be saved as a PNG file in the specified path.
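Saving the heatmap can be sketched as a thin wrapper around `Figure.savefig`. The function name, the default filename `correlation_heatmap.png`, and the 300 dpi default are assumptions. The usage example builds a stand-in figure with `matplotlib.figure.Figure` so it runs without a display; in the notebook, the heatmap figure would be passed instead.

```python
import os
import tempfile

from matplotlib.figure import Figure


def save_heatmap(fig, path, filename="correlation_heatmap.png", dpi=300):
    """Save a Matplotlib figure (e.g. the heatmap) as a PNG in `path`."""
    file_path = os.path.join(path, filename)
    # bbox_inches="tight" trims surplus white space around the axes and colorbar.
    fig.savefig(file_path, dpi=dpi, bbox_inches="tight")
    return file_path


# Usage example with a stand-in figure.
demo_fig = Figure(figsize=(2, 2))
demo_fig.add_subplot().plot([0, 1], [0, 1])
with tempfile.TemporaryDirectory() as demo_dir:
    saved = save_heatmap(demo_fig, demo_dir, dpi=50)
    heatmap_saved = os.path.getsize(saved) > 0

print(heatmap_saved)  # True
```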