
Towards Streamlining the Process of Building Machine Learning Models for your Artificial Intelligence Applications


Conference

2024 ASEE North Central Section Conference

Location

Kalamazoo, Michigan

Publication Date

March 22, 2024

Start Date

March 22, 2024

End Date

March 23, 2024

Page Count

11

DOI

10.18260/1-2--45644

Permanent URL

https://peer.asee.org/45644

Download Count

9


Paper Authors

Joseph George, Western Michigan University

Ajay Gupta, Western Michigan University

Alvis Fong, Western Michigan University


Abstract

On Amino Acid Modeling with Efficient Neural Architecture Search - An AutoML Approach

I. INTRODUCTION

State-of-the-art algorithms in the proteomics domain for de novo sequencing and peptide analysis report high accuracy: Tran et al. claim DeepNovo achieves 97.2 to 99.5% accuracy in reconstructing mouse antibody samples [1]. However, as the protein sequencing domain rapidly expands, new automated tools are required to scale this success, because these algorithms depend on labor-intensive hand-tuning of parameters. Neural Architecture Search (NAS) has shown great promise in the image classification and object detection domains in AutoML projects at Google and elsewhere. This paper assesses its performance in the proteomics domain by applying it to amino acid prediction.

II. PEPTIDE ANALYSIS AND AMINO ACID PREDICTION

De novo peptide sequencing is the pattern recognition of charged b- and y-ions in a mass spectrometry specimen. Manually, one can calculate the atomic mass units derived from the MS/MS spectrum: the calculations interpret the energy released by the b- and y-ions, which are then matched to residual amino acids [2]. Current machine learning approaches use manually designed architectures to learn the features of mass spectra for predicting peptides. These architectures have been trained to provide fast, efficient, and accurate coverage (see Figure 1) [1].

III. APPLICATION TO PROTEOMICS: AMINO ACID MODELING

Where previous approaches used neural architectures hand-tailored to their datasets [1], we use Efficient NAS (ENAS) for the automatic generation and training of custom models on a wide variety of datasets: about 7 low-resolution and 9 high-resolution datasets from the PRIDE Peptidome library [12], a proteomics repository with large, annotated datasets. These have been used in the construction and training of complex neural networks [3]-[8]. We assess ENAS's viability for identifying a spectrum's amino acids.
These amino acids represent key features for de novo sequencing. The features can then be used as labels for a spectrum dataset, which in turn can serve as input to a de novo peptide prediction algorithm.

IV. RESULTS

ENAS computes an order of magnitude faster than NAS. For results comparable with DeepNovo, we used an Escherichia coli dataset [9]. Without data optimization, ENAS derived a model that yielded a maximum amino acid identification rate of 70.6%. Over the course of 2 hours, a total of 2000 models were generated, and two of those models had an identification rate of at least 70%. In comparison, DeepNovo reported an amino acid identification precision of 52.3% for the same Escherichia coli dataset [1]. The full paper will report extensive results from low- and high-resolution datasets tested in DeepNovo.
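The generate-and-evaluate workflow described above (2000 candidate models, best identification rate tracked) can be sketched as a search loop. The paper uses ENAS, whose RNN controller and weight sharing are beyond a short sketch, so this stand-in uses plain random search over a hypothetical search space; `evaluate_architecture` is a stub for training a candidate and measuring its amino acid identification rate:

```python
import random

# Minimal generate-and-evaluate architecture search loop, in the spirit of
# the ENAS workflow described above. The search space, the random-search
# strategy, and the evaluation stub are illustrative assumptions only;
# the actual paper uses ENAS's controller with parameter sharing.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "hidden_units": [64, 128, 256],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    """Pick one option per hyperparameter from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate_architecture(arch, rng):
    """Stub: stands in for training the candidate model and measuring
    its amino acid identification rate on a validation spectrum set."""
    return rng.uniform(0.3, 0.75)

def search(num_models=2000, seed=0):
    """Generate num_models candidates and keep the best performer."""
    rng = random.Random(seed)
    best_arch, best_rate = None, 0.0
    for _ in range(num_models):
        arch = sample_architecture(rng)
        rate = evaluate_architecture(arch, rng)
        if rate > best_rate:
            best_arch, best_rate = arch, rate
    return best_arch, best_rate

best_arch, best_rate = search()
print(f"best identification rate: {best_rate:.3f}")
```

ENAS replaces the independent per-candidate training in `evaluate_architecture` with shared weights across candidates, which is what makes evaluating thousands of models in a couple of hours feasible.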

George, J., Gupta, A., & Fong, A. (2024, March), Towards Streamlining the Process of Building Machine Learning Models for your Artificial Intelligence Applications. Paper presented at 2024 ASEE North Central Section Conference, Kalamazoo, Michigan. 10.18260/1-2--45644

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2024 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015