Feature Engineering Over Unstructured Data
This study applies traditional artificial intelligence (TAI) methods to unstructured data analysis, an area that current AI research leaves underexplored. We introduce a graph-based framework that optimizes the feature space for TAI algorithms and improves accuracy and precision across a range of TAI applications. The approach builds on an extended De Bruijn graph, which makes path discovery in complex data structures more efficient. The framework is resilient, scalable, and modular, which supports its reliability and its application across diverse domains. We empirically validated the feature extractor on TAI models using features extracted from TIMP (tissue inhibitors of matrix metalloproteinases), where they yielded significantly higher accuracy than randomly extracted features. The model also identified glycine- and arginine-rich (GAR) motifs in protein sequences with high accuracy.
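For readers unfamiliar with the underlying structure, the following is a minimal sketch of a standard De Bruijn graph built over a character sequence, with edge counts that could serve as simple features. It does not reproduce the extended graph or the feature extractor described above. The function names, the choice of k, and the toy sequence are all hypothetical and used for illustration only.

```python
from collections import Counter, defaultdict


def de_bruijn_edges(sequence: str, k: int) -> Counter:
    """Count the edges of a standard De Bruijn graph.

    Each k-mer contributes one edge from its (k-1)-length prefix
    to its (k-1)-length suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    edges = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def adjacency(edges: Counter) -> dict:
    """Collapse the edge counts into a node -> set-of-successors map."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    return dict(graph)


# Hypothetical toy sequence, not taken from the TIMP or GAR data.
toy_sequence = "RGGRGGFGG"
edges = de_bruijn_edges(toy_sequence, k=3)
graph = adjacency(edges)
```

In this sketch, repeated k-mers show up as edges with a count above one, which is one plausible way a graph of this kind can expose recurring motifs. How the extended graph weights or traverses such edges is not specified here.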