Aquaculture and Aquaculture Genetics
With the global population expected to reach 9 billion by 2050 and no corresponding expansion of the world's natural resources, an adequate and sustainable food supply for all has become a pressing issue (Garcia and Rosenberg, 2010). For fish production, capture fisheries, for which there is evidence of commercial-scale practice even before the 1500s, long bore sole responsibility (Lackey, 2005). However, as the population grew and natural fish stocks declined, capture fisheries production stagnated. Commercial aquaculture, first attempted in Germany in 1733 (The Healthy fish, 2019) alongside the small-scale, isolated, and primitive fish farming practices that already existed, then gradually and successfully transformed fish production.
In the present context, aquaculture is affected throughout its production cycle by a complex series of factors, including the type of species, the type and intensity of the production system, water quality, temperature, feeds and feeding, health and disease management, stocking density, stress management, biosecurity measures, reproduction, harvesting, available human resources, economic concerns, and the legal framework (Losordo and Westerman, 1994, Pillay and Kutty, 2005, Moyo and Rapatsa, 2021).
The genetic background of aquatic organisms, both fish and shellfish (hereafter referred to as fish), is interconnected with most of the biological factors mentioned above. Because genetic factors, together with environmental factors, determine the phenotypes of cultured fish, considering them is crucial for successful and profitable aquaculture (Lutz, 2008). Accordingly, growth rate, survival rate, muscle ratio, feed conversion ratio, and breeding capacity are some of the traits directly affected by the genetic factors of an individual, population, or species of fish in culture (Wilkins, 1981, De Verdal et al., 2018).
The field of aquaculture genetics now encompasses the development of new aquaculture species, including hybridization; the production of transgenic fish; the application of genomic technologies and genetic engineering, such as genome sequencing, genome editing (e.g., CRISPR/Cas9), and gene knockouts, to acquire desired traits such as disease resistance, stress resistance, and high growth rates (e.g., AquAdvantage Salmon (AAS) (Sweet, 2019) and Oreochromis niloticus developed using CRISPR/Cas (Evans, 2018)); the assessment of genetic and allelic diversity by means of minisatellites, microsatellites, and single nucleotide polymorphisms (SNPs); linkage mapping; selective breeding, inbreeding, and interspecific crossbreeding; sex manipulation, gynogenesis, androgenesis, and cloning; marker-assisted selection and genomic selection; polyploidy; and even epigenetics and hologenomics applications (Dunham et al., 2000, Changadeya, 2003, Shen and Yue, 2019, Okoli et al., 2021).
However, limited or poorly annotated genomic data for aquatic organisms (Sundaram et al., 2017, Wargelius, 2019), delays in identifying, clarifying, and confirming trait-related fish genes (Okoli et al., 2021), and genome duplications (Glasauer and Neuhauss, 2014) remain constraints that can still be addressed and may be resolved in the coming years.
Artificial Intelligence and Machine Learning
Artificial intelligence, considered one of the major driving forces of the fourth industrial revolution, has been defined as ‘the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal’ (Ongsulee, 2017). Its implementation overlaps with various fields, including computer science, mathematics, neuroscience, philosophy, and psychology (Singh and Jain, 2018).
Machine learning is considered a subfield of artificial intelligence that involves constructing algorithms capable of detecting meaningful patterns in large data sets without being explicitly programmed to do so (Ongsulee, 2017, Aristodemou and Tietze, 2018). Machine learning models are usually fitted on training data before being applied to test data, to ensure the sensitivity and accuracy of their detections and/or predictions (Shalev-Shwartz and Ben-David, 2014). Accordingly, the machine learning life cycle comprises seven steps: data gathering, preparation, wrangling, analysis, model training, testing, and deployment (Javapoint, 2021a).
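The separation of training data from test data described above can be illustrated with a minimal sketch. The synthetic two-class data, the function names, and the choice of a one-nearest-neighbour classifier are all assumptions made for illustration and are not taken from any of the cited studies.

```python
import random

def make_toy_data(n_per_class=40, seed=1):
    """Generate hypothetical 2-D points for two well-separated classes."""
    rng = random.Random(seed)
    rows = []
    for label, centre in (("A", 0.0), ("B", 5.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(centre, 0.5), rng.gauss(centre, 0.5))
            rows.append((features, label))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as test data."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, features):
    """Return the label of the closest training example."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train, key=squared_distance)[1]

def accuracy(train, test):
    """Fraction of held-out examples whose label is predicted correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

if __name__ == "__main__":
    data = make_toy_data()
    train, test = train_test_split(data)
    print(f"train={len(train)} test={len(test)} "
          f"accuracy={accuracy(train, test):.2f}")
```

Because the test rows are never seen while the model is built, the reported accuracy estimates how the model would behave on new data rather than how well it memorized the training set.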
Machine learning is further classified into three major types: supervised, unsupervised, and reinforcement learning. Supervised learning comprises classification and regression algorithms, including Decision Tree (DT), Random Forest (RF), Linear Regression, Logistic Regression, k-nearest neighbors (KNN), and Support Vector Machine (SVM), whereas unsupervised learning comprises clustering and association algorithms (e.g., K-means and the Apriori algorithm). Reinforcement learning, in contrast, learns from the feedback it receives for each of its actions (e.g., within a Markov Decision Process) (Javapoint, 2021b, AnalyticsVidya, 2017). These types and algorithms are each suited to different applications.
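To make the contrast with supervised learning concrete, the following is a minimal sketch of K-means clustering, which groups unlabeled points without any target labels. The example points are hypothetical, and the deterministic farthest-first choice of starting centroids is a simplification assumed here so that the result is reproducible; practical implementations usually use randomized initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_centroids(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far (an assumed simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def k_means(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

if __name__ == "__main__":
    # Hypothetical measurements forming two loose groups.
    points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
              (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    labels, centroids = k_means(points, k=2)
    print(labels)
    print(centroids)
```

No labels are supplied at any point: the grouping emerges only from the distances between the points, which is what distinguishes clustering from classification.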
In the field of genetics, these algorithms can be employed in tasks such as the identification of transcription initiation sites (Ohler et al., 2002), promoter sites (Bucher, 1990), splicing sites (Degroeve et al., 2002), and enhancer sites (Heintzman et al., 2007) in genomic sequences, gene annotation (Picardi and Pesole, 2010), recognition of patterns in sequence-derived data (e.g., RNA-seq, DNase-seq, MNase-seq, FAIRE-seq, ChIP-seq), identification of functional relationships (Libbrecht and Noble, 2015), breed identification and classification (Seo et al., 2021), and many others.
Use of Machine Learning in Aquaculture Genetics
Genetic algorithms, a type of stochastic algorithm used as an adaptive search technique (Vafaie and De Jong, 1992, Mitchell, 1995, Shapiro, 1999, Gupta and Ghafir, 2012), have been deployed along with the algorithms mentioned above in studies applying machine learning to aquaculture genetics.
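As an illustration of how a genetic algorithm searches adaptively, the sketch below evolves a bit mask that selects a subset of markers. Everything in it is hypothetical: the set of "informative" marker indices, the toy fitness score, and the parameter values are assumptions chosen for demonstration and do not reproduce the method of any cited study.

```python
import random

N_MARKERS = 10
INFORMATIVE = {1, 4, 7}  # hypothetical indices of markers that carry signal

def fitness(mask):
    """Toy score: reward selected informative markers, penalise the rest."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras

def tournament(population, rng, size=3):
    """Pick the fittest of a few randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover of two parent masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(mask, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]

def run_ga(pop_size=30, generations=40, seed=0):
    """Evolve marker masks and return the best one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_MARKERS)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
        best = max(population, key=fitness)
    return best

if __name__ == "__main__":
    best = run_ga()
    selected = [i for i, bit in enumerate(best) if bit]
    print("selected markers:", selected, "fitness:", fitness(best))
```

Selection, crossover, and mutation are the stochastic operators that give the search its adaptive character, while elitism ensures that the best score never decreases from one generation to the next. In a real study the toy fitness function would be replaced by a measure such as the cross-validated accuracy of a model trained on the selected markers.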
Machine learning approaches have been employed successfully in the selective breeding of fish, especially for health management and disease-related identification and prediction. Notable examples include 'Predicting for disease resistance in aquaculture species using machine learning models' by Palaiokostas (2021), which used DT, SVM, RF, AdaBoost (adaptive boosting), and XGB (extreme gradient boosting) models to analyze the resistance of carp to koi herpesvirus; machine learning for genomic prediction aimed at identifying the resistance of gilthead sea bream (Sparus aurata) to photobacteriosis (Bargelloni et al., 2021); and the adoption of machine learning algorithms to develop a cost-effective and precise method of using SNPs from genome-wide association studies (GWAS) for genomic selection, with a special focus on disease resistance traits in Litopenaeus vannamei, salmon, and gilthead sea bream (Luo et al., 2021).
Parasites also play a major role in determining the success and profitability of aquaculture. Accordingly, a Random Forest approach has been applied to Lepeophtheirus salmonis, a parasite of salmonids, to develop DNA markers that distinguish L. salmonis populations (Jacobs et al., 2018). Moreover, Luo et al. (2021), mentioned above, also addressed Vibrio parahaemolyticus infection in Litopenaeus vannamei, while Lin et al. (2020) used machine learning procedures to study resistance to heterobothriosis, a parasitic disease caused by Heterobothrium okamotoi, in pufferfish. Additionally, Gautam et al. (2016) reported a machine learning-based model that predicts antimicrobial peptides in fish from genomic and proteomic data.
Commercially important phenotypes captured as large image data sets are analyzed with machine learning models through image analysis. For example, artificial neural networks (ANNs) have been deployed to obtain precise phenotypic data by image analysis for pearl oysters and Penaeus monodon, covering growth data in both and pearl quality data in pearl oysters, which helps to indicate the individuals with a good genetic composition for the trait concerned (Zenger, 2019).
Further, the genes of copepod-associated bacteriobiomes (CABs) have been analyzed using machine learning models in a meta-analysis-based approach to evaluate biogeochemical properties such as methanogenesis and nitrogen fixation, which can also affect aquaculture practices, mostly those carried out in open marine waters (Sadaiappan et al., 2021).
The following approach differs slightly from the previous applications because it considers the other aquatic organisms living in the farming water, including macroinvertebrates and bacteria, rather than the main cultured species. It can nevertheless be considered under aquaculture genetics, as these organisms affect aquaculture practices and the analysis involves genetic material. Supervised machine learning (SML) with DNA metabarcoding has been tried to evaluate the impacts of aquaculture practices on microfaunal and microfloral communities and on the water itself, using bacteria and ciliates as indicator organisms in a salmon farm (Fruehe et al., 2021).
Future Prospects
Aquaculture has become a crucial industry, and genetics undoubtedly plays a major role in shaping it. Machine learning approaches are increasingly being applied in aquaculture genetics and are expected to raise the industry to its next level. To support this development, machine learning approaches can further be used for genomic marker modeling, fecundity prediction, and expression fingerprinting of economically important aquaculture species using proteomics and transcriptomics data (Abdelrahman et al., 2017), as well as for gene identification, evolutionary genetics, and population genetics.
Acknowledgement
This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01441, Artificial Intelligence Convergence Research Center (Chungnam National University)) and partly by the Korea Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korea government (MOTIE).