Automatic interpretation of salmon scales using deep learning

Abstract

Determining the age structure of a fish species is important for understanding population and ecosystem dynamics and for stock assessment and management. For Atlantic salmon, age and other important biological information is collected from scale samples through manual qualitative interpretation. Reliable automatic methods are so far not widely utilised.
We use a state-of-the-art Convolutional Neural Network (CNN) architecture called EfficientNet and a novel data set consisting of 9056 images of salmon scales to train different CNNs for four prediction tasks. We consider two binary classification tasks, regarding the origin of the fish (wild/farmed) and the spawning state (spawner/non-spawner), as well as two regression tasks predicting the number of years spent in the river and in the sea. We take advantage of transfer learning by starting our training process with a CNN pre-trained on the open-access image database ImageNet. To further test the predictive performance of our two regression CNNs, a set of 150 additional salmon scale images was analyzed for river and sea age both by the CNNs and by six human expert readers.
We find that the CNNs perform well on the two binary classification tasks and on predicting sea age, while the prediction of river age is less accurate. Expert estimates of river age exhibit higher variance and lower levels of agreement than estimates of sea age, which may explain why this task is also more difficult for the CNN. We see substantial benefit in using transfer learning. Comparing the performance of the CNN to six expert readers using standard precision measures for age reading, we confirmed the high performance of the CNN in predicting sea age, well within the top of human expertise.
Automatic interpretation of scales offers a cost-efficient and effective way to investigate fish age and life-history traits, which may further support the management of these biological resources.

salmon-scale CNN results

Comparison of different metrics for predictions from salmon scales. Metrics from Greenland halibut otolith prediction are included for comparison. The metrics are from the validation set, except for the first row, which is from Greenland halibut and is calculated from the mean over pairs of right and left otoliths.

In the wild/farmed dataset there are 5427 wild salmon and 505 (8.5%) farmed salmon. Salmon classified as something else, such as unknown or trout, are not included in training.
In the spawning/non-spawning dataset there are 8835 non-spawning scales and 238 spawned scales (2.6%). Note: there are 422 spawners (4.6%), but images are missing for 184 of them, so these are not included in the training set.
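
The class weights listed in the results table are close to what inverse-frequency ("balanced") weighting gives for the counts above. The following is a minimal sketch, assuming the weights follow n_samples / (n_classes * class_count), the formula scikit-learn uses for `class_weight='balanced'`. The source does not state how the weights were actually computed, and the helper name `balanced_class_weights` and the label names are illustrative.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

# Counts quoted in the text above.
wild_farmed = balanced_class_weights({"farmed": 505, "wild": 5427})
spawning = balanced_class_weights({"non-spawner": 8835, "spawner": 238})

print({label: round(w, 2) for label, w in wild_farmed.items()})
print({label: round(w, 2) for label, w in spawning.items()})
```

With these counts the sketch gives about 5.87 and 0.55 for farmed and wild, and about 0.51 and 19.06 for non-spawner and spawner, compared with {0: 5.87, 1: 0.54} and {0: 0.5, 1: 19} in the table.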

(MAPE: Mean absolute percentage error)
(MCC: Matthews correlation coefficient)
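
The regression rows of the table report MSE, MAPE and ACC. The sketch below shows one way to compute them in plain Python. Two points are assumptions rather than facts from the source: MAPE is returned as a percentage (the table appears to mix fractions such as 0.141 with percentages such as 8.97), and ACC counts a prediction as correct when it rounds to the true age. The function name `regression_metrics` and the sample values are illustrative.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAPE (as a percentage) and rounded-prediction accuracy."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    mape = 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / n
    # Assumption: a prediction is correct when it rounds to the true age.
    acc = sum(round(p) == t for t, p in pairs) / n
    return mse, mape, acc

# Made-up ages and predictions, for illustration only.
y_true = [1, 2, 2, 3]
y_pred = [1.2, 1.9, 2.6, 3.1]
mse, mape, acc = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.3f} MAPE={mape:.1f}% ACC={acc:.2f}")
```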

Species	Predict	testLOSS	MSE	MAPE	ACC	MCC	#trained	activation	classWeights
Greenland Halibut(1)	age	x	2.65	0.124	0.262	x	8875	linear	x
Greenland Halibut(2)	age	-"-	2.82	0.136	0.294	x	8875	linear	x
Salmon	sea age	-"-	0.239	0.141	0.822	x	9056	linear	x
Salmon B4(12)	sea age	1.476	1.476	60.25	0.471	x	9056	linear	x
Salmon B4(13)	sea age	0.17	0.173	8.97	0.846	x	8286	linear	x
Salmon B4 v1.1.0	sea age	0.1570	0.1570	8.6405	0.8699	x	8286	linear	x
Salmon B4(14)patience20	sea age	0.158	0.158	7.88	0.863	x	8286	linear	x
Salmon B4(14)rerun(lr=0.00007)	sea age	0.158	0.158	7.1598	0.864	x	8286	linear	x
Salmon B4(14)rerun(lr=0.00007) seed=9	sea age	0.199	0.199	7.1524	0.863	x	8286	linear	x
Salmon B4(14)rerun(lr=0.00007) no weights	sea age	1.08	1.08	53.9	0.496	x	8286	linear	x
Salmon B4(15)path20batch16	sea age	x	x	x	x	x	8299	linear	x
Salmon	river age	-"-	0.431	0.252	0.585	x	6300	linear	x
Salmon B4(9)	river age	2.35	2.35	x	0.37	x	9056	linear	x
Salmon B4(11)	river age	0.359	0.359	19.58	0.618	x	6238	linear	x
Salmon B4 v1.1.0	river age	0.336	0.336	17.34	0.632	x	6238	linear	x
Salmon B4(16)patience20	river age	0.359	0.359	17.315	0.6297	x	6238	linear	x
Salmon B4(16) rerun(lr=0.00008)	river age	0.3237	0.3237	17.47	0.6371	x	6238	linear	x
Salmon B4(16) rerun(lr=0.00008) seed=9	river age	0.3884	0.3884	17.11	0.6339	x	6238	linear	x
Salmon B4(16x) rerun(lr=0.00008) no weights	river age	0.4896	0.4896	26.70	0.5347	x	6238	linear	x
Salmon missing_loss1	river & sea	9.4372	2.955	0.97	0.707	x	9056	linear	x
Salmon missing_loss2	river & sea	0.5915	2.992	0.974	0.707	x	9056	linear	x
Salmon missing_loss3	river & sea	2.0107	2.011	0.744	0.607	x	9056	linear	x
Salmon (3)	Spawned	0.113	x	x	0.964	x	9056	softmax	{0: 0.5, 1: 19}
Salmon (5)	Spawned	0.132	x	x	0.958	x	476	softmax	{0: 1, 1: 1}
Salmon (8)	spawned	0.6417	x	x	0.944	x	476	sigmoid	{0: 1, 1: 1}
Salmon (18)	spawned	x	x	x	x	x	9056	softmax	{0: 0.5, 1: 19}
Salmon (6)	Wild/farmed	0.155	x	x	0.9697	x	5917	softmax	{0: 5.87, 1: 0.54}
Salmon batch=8	Wild/farmed	0.187	x	x	0.967	x	5919	softmax	{0: 5.87, 1: 0.54}
Salmon (10)lr=0.0005	Wild/farmed	1.21	x	x	0.924	x	5919	softmax	{0: 5.87, 1: 0.54}
Salmon (4)	Wild/farmed	0.213	x	x	0.94	x	1010	softmax	{0: 1, 1: 1}
Salmon (7)	Wild/farmed	0.693	x	x	0.075	x	5919	sigmoid	{0: 5.87, 1: 0.54}
Salmon (17)	Wild/farmed	0.2057	x	x	0.96292	x	5919	softmax	{0: 5.87, 1: 0.54}

(1) is test-set
(2) is validation-set
(3) train/val/test size: 70, 15, 15
- Training-set (negative example, positive example): (4861, 129)
- Validation-set (negative example, positive example): (3541, 89) - 89/(3541+89) = 0.025, 1-0.025 = 0.975
(4) train/val/test size: 70, 15, 15
- val_acc: 0.9276
- class frequency: {vill:505, oppdrett:505} (vill = wild, oppdrett = farmed)
- CNN: efficientNet-B4, 380x380
- Training-set (negative example, positive example) (0,1), (1,0): (3772 2579) - 3772/(3772+2579) = 3772/6351=0.59
- Validation-set (negative example, positive example)(0,1), (1,0): (809 552) - 809/(552+809)= 0.59
- test-set (negative example, positive example)(0,1), (1,0): (809 552)
missing_loss1 - missing_mse(y_true, y_pred) in https://github.com/emoen/salmon-scale/blob/master/mse_missing_values.py
missing_loss2 - missing_mse2(y_true, y_pred) in https://github.com/emoen/salmon-scale/blob/master/mse_missing_values.py
missing_loss3 - classic mse with 2 outputs
(9) regression on river age contains missing values - encoded as -1
(10) identical to (6) but with lr=.0005 instead of the usual lr=.0001
(11) identical to (9) but without scales of unknown river age. Learning rate 0.0001
(12) without unknown using patience 5, on efficientNet B4. Resolution 380x380
(14) patience 20, batch size=12, lr=0.0001, efficientNet B4, dense(2) linear, tensorboard_path='./tensorboard_salmon_sea_uten_ukjent_patience_20'
(14) sea age: checkpoints_salmon_sea_uten_ukjent_patience_20
(16) river age: NB: forgot to set a new directory, so the directory is still checkpoints_salmon_sea_uten_ukjent. Patience 20
- rerun: batch size=12
(16x) river age: Same as (16) but with no weights. 150 epochs, 1600 steps per epoch and a batch size of 12: 150 * 1600 * 12 = 2,880,000 images seen over 150 epochs. 6246 images augmented by rotation through 360 degrees with mirroring, which gives 360 * 2 * 6246 = 4,497,120 possible images. The best epoch was epoch 122.
(17) farmed: tensorboard_farmed_uten_ukjent_patience_20
(18) Spawned: tensorboard_spawned_uten_ukjent_patience_20
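
The arithmetic in note (16x) can be checked with a short sketch. The variable names are illustrative, and the count of possible images assumes rotations in whole-degree steps, as the note's 360 * 2 factor implies.

```python
epochs = 150
steps_per_epoch = 1600
batch_size = 12
n_images = 6246

# Images presented to the network over the whole run.
images_seen = epochs * steps_per_epoch * batch_size

# Distinct augmented variants: 360 one-degree rotations,
# each with or without mirroring.
possible_variants = 360 * 2 * n_images

print(images_seen)
print(possible_variants)
print(round(images_seen / possible_variants, 2))
```

Under these assumptions the run presents fewer images than there are distinct augmented variants, a ratio of about 0.64.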

Note: val_acc is 0.7068 in almost every epoch (except the second epoch of missing_loss2 training).

Missing_loss1/2 use the same network, but with Dense(2, 'linear') so that it predicts both sea and river age.
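
One way to train a two-output regression when some targets are unknown is to mask the missing entries out of the loss. The following is a minimal sketch in plain Python, assuming missing ages are encoded as -1 as in note (9). It is not the `missing_mse` function from the linked repository, whose details are not given here, and `masked_mse` is a hypothetical name.

```python
MISSING = -1.0  # sentinel for an unknown age, as in note (9)

def masked_mse(y_true, y_pred):
    """Mean squared error over the targets that are not MISSING.

    y_true and y_pred are lists of [river_age, sea_age] pairs.
    """
    squared_errors = [
        (t - p) ** 2
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
        if t != MISSING
    ]
    # Avoid dividing by zero when every target in the batch is missing.
    return sum(squared_errors) / len(squared_errors) if squared_errors else 0.0

# Made-up batch: the second scale has an unknown river age.
y_true = [[3.0, 2.0], [MISSING, 1.0]]
y_pred = [[2.5, 2.0], [2.0, 1.5]]
print(masked_mse(y_true, y_pred))
```

By contrast, a plain MSE over both outputs treats -1 as a real target and penalises any prediction that differs from it.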

Farmed salmon:(4) Precision, recall and f1-score from scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html

	precision	recall	f1-score	support
oppdrett	0.95	0.93	0.94	76
vill	0.93	0.95	0.94	75
micro avg	0.94	0.94	0.94	151
macro avg	0.94	0.94	0.94	151
weighted avg	0.94	0.94	0.94	151

Confusion matrix:

class	oppdrett	vill
oppdrett	71	5
vill	4	71
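
The report for model (4) can be reproduced from its confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes (the scikit-learn convention). `per_class_scores` is an illustrative helper, not a scikit-learn function.

```python
def per_class_scores(matrix, labels):
    """Precision, recall, F1 and support per class from a square confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum (support)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, actual)
    return scores

# Confusion matrix for model (4): rows are true classes, columns are predictions.
matrix_4 = [[71, 5],
            [4, 71]]
scores_4 = per_class_scores(matrix_4, ["oppdrett", "vill"])
for label, (p, r, f1, support) in scores_4.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
```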

Farmed salmon:(7) Precision, recall and f1-score from scikit-learn:

	precision	recall	f1-score	support
oppdrett	0.08	1.00	0.14	67
vill	0.00	0.00	0.00	823
accuracy			0.08	890
macro avg	0.04	0.50	0.07	890
weighted avg	0.01	0.08	0.01	890

Confusion matrix:

class	oppdrett	vill
oppdrett	67	0
vill	823	0

Farmed salmon:(6) Precision, recall and f1-score from scikit-learn:

	precision	recall	f1-score	support
oppdrett	0.87	0.70	0.78	67
vill	0.98	0.99	0.98	823
accuracy			0.97	890
macro avg	0.92	0.85	0.88	890
weighted avg	0.97	0.97	0.97	890

confusion matrix:

class	oppdrett	vill
oppdrett	47	20
vill	7	816
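
The MCC column in the results table is empty, but the coefficient can be derived from a 2x2 confusion matrix. The sketch below does this for model (6), assuming rows are true classes and treating oppdrett as the positive class. `mcc_from_confusion` is an illustrative name.

```python
import math

def mcc_from_confusion(tp, fn, fp, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Model (6): the oppdrett row is (47, 20), the vill row is (7, 816).
mcc_6 = mcc_from_confusion(tp=47, fn=20, fp=7, tn=816)
accuracy_6 = (47 + 816) / (47 + 20 + 7 + 816)
print(f"accuracy={accuracy_6:.3f} MCC={mcc_6:.3f}")
```

Because most scales belong to one class, MCC is a more demanding summary than accuracy. For model (7), where every scale is predicted as oppdrett, the same function returns 0.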

Spawner:(3) Precision, recall and f1-score from scikit-learn

	precision	recall	f1-score	support
Not spawned	0.99	0.98	0.98	1326
spawned	0.35	0.46	0.40	35
accuracy			0.96	1361
macro avg	0.67	0.72	0.69	1361
weighted avg	0.97	0.96	0.97	1361

confusion matrix:

class	non spawned	spawned
non spawned	1296	30
spawned	19	16

Spawner:(5) Precision, recall and f1-score from scikit-learn

	precision	recall	f1-score	support
Not spawned	0.93	1.00	0.96	38
spawned	1.00	0.91	0.95	33
accuracy			0.96	71
macro avg	0.96	0.95	0.96	71
weighted avg	0.97	0.96	0.96	71

confusion matrix:

class	non spawned	spawned
non spawned	38	0
spawned	3	30

Spawner:(8) Precision, recall and f1-score from scikit-learn

	precision	recall	f1-score	support
Not spawned	0.90	1.00	0.95	38
spawned	1.00	0.88	0.94	33
accuracy			0.94	71
macro avg	0.95	0.94	0.94	71
weighted avg	0.95	0.94	0.94	71

confusion matrix:

class	non spawned	spawned
non spawned	38	0
spawned	4	29

Farmed salmon:(17) Precision, recall and f1-score from scikit-learn:

	precision	recall	f1-score	support
oppdrett	0.76	0.75	0.75	67
not oppdrett	0.98	0.98	0.98	823
accuracy			0.96	890
macro avg	0.87	0.86	0.87	890
weighted avg	0.96	0.96	0.96	890

Confusion matrix:

class	oppdrett	vill
oppdrett	50	17
vill	16	807

>>> df = pd.DataFrame({}, d2015.columns.values)
>>> df = df.append(d2015)
>>> df = df.append(d2016)
>>> df = df.append(d2017)
>>> df = df.append(d2018)
>>> df = df.append(d2016rb)
>>> df = df.append(d2017rb)
>>> len(df)
16601
>>> df.sjø.value_counts()
 2.0     7737
 1.0     3809
 3.0     2832
-1.0     1513
 4.0      486
 5.0      123
 6.0       59
 7.0       22
 8.0        9
 9.0        3
 11.0       1
 12.0       1
Name: sjø, dtype: int64
>>> df.smolt.value_counts()
-1.0    7923
 3.0    4900
 2.0    2937
 4.0     549
 1.0     216
 5.0      62
 6.0       8
Name: smolt, dtype: int64

Spawners in the dataset:

>>> d2015.gytarar.value_counts()
2-2-g-1                   3
2-3-g-1                   3
?-2-g-1-g                 1
2-2-g-2                   1
3-3-g-1                   1
2-2-g+                    1
?-2-g-1+                  1
3-2-g-1                   1
2-2-g-2 eller2-2-g-1-1    1
3-2-g-2                   1
Name: gytarar, dtype: int64
>>> d2015.gytarar.value_counts().sum()
14
>>> d2016.gytarar.value_counts().sum()
17
>>> d2017.gytarar.value_counts().sum()
24
>>> d2018.gytarar.value_counts().sum()
29
>>> d2016rb.gytarar.value_counts().sum()
112
>>> d2017rb.gytarar.value_counts().sum()
226
>>> 14+17+24+29+112+226
422

River age distribution:

>>> unique, counts = np.unique(all_smolt_age, return_counts=True)
>>> dict(zip(unique, counts))
{-1.0: 2827, 1.0: 195, 2.0: 2097, 3.0: 3528, 4.0: 377, 5.0: 45, 6.0: 4}

Sea age:

>>> unique, counts = np.unique(all_sea_age2, return_counts=True)
>>> dict(zip(unique, counts))
{1.0: 2323, 2.0: 4192, 3.0: 1443, 4.0: 235, 5.0: 64, 6.0: 31, 7.0: 8, 8.0: 2, 9.0: 1}
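
The distributions above give a simple reference point for the ACC column: the accuracy of always predicting the most common age. The sketch below uses the counts printed above and excludes the unknown river ages encoded as -1. `majority_baseline` is an illustrative helper.

```python
def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent known age."""
    known = {age: n for age, n in counts.items() if age != -1.0}
    return max(known.values()) / sum(known.values())

river_counts = {-1.0: 2827, 1.0: 195, 2.0: 2097, 3.0: 3528,
                4.0: 377, 5.0: 45, 6.0: 4}
sea_counts = {1.0: 2323, 2.0: 4192, 3.0: 1443, 4.0: 235, 5.0: 64,
              6.0: 31, 7.0: 8, 8.0: 2, 9.0: 1}

river_baseline = majority_baseline(river_counts)
sea_baseline = majority_baseline(sea_counts)
print(f"river age baseline: {river_baseline:.3f}")
print(f"sea age baseline:   {sea_baseline:.3f}")
```

The baselines come out at roughly 0.56 for river age and 0.51 for sea age.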
