Synthetic Data Pipeline for Active Acoustic Machine Learning
Developed during my SOAR Internship at Penn State ARL.
Overview
This project developed a synthetic data pipeline for active acoustic machine learning, making it possible to train models on unclassified SONAR data.
- Interfaced with Hardware-in-the-Loop (HiL) software to generate acoustic data.
- Addressed limitations in real-world data availability, particularly the scarcity of unclassified SONAR data.
- Integrated machine learning regression models that predict range and acoustic angle of arrival.
Technical Details
- Developed a Python application to interface with Hardware-in-the-Loop (HiL) simulation software for unmanned undersea vehicles (UUVs) over TCP.
- Designed and implemented scenarios that vary UUV pose, target pose, and navigation speed.
- Automated scenario execution in the HiL simulation mode to generate theoretical acoustic data.
- Processed the simulation output into individual acoustic pulses for model training.
- Validated ML models that predict acoustic range and angle of arrival on synthetic data. The working hypothesis is that pretraining on synthetic data improves performance once the model is fine-tuned on real-world data through transfer learning.
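The pulse-extraction step above can be sketched as follows. This is a minimal illustration, not the project's actual processing code: it assumes the simulation output is a single-channel sequence of samples, that pings are transmitted at a fixed interval, and that the recording begins at the first ping. The function name and parameters are hypothetical.

```python
from typing import List, Sequence


def split_into_pulses(
    samples: Sequence[float],
    sample_rate_hz: float,
    ping_interval_s: float,
) -> List[List[float]]:
    """Slice a continuous recording into one fixed-length window per ping.

    Assumption: pings repeat every `ping_interval_s` seconds and the
    recording starts at the first ping.
    """
    window = int(round(sample_rate_hz * ping_interval_s))
    if window <= 0:
        raise ValueError("ping interval must span at least one sample")
    # Drop any trailing partial window so every pulse has the same length.
    n_pulses = len(samples) // window
    return [list(samples[i * window:(i + 1) * window]) for i in range(n_pulses)]


# Toy recording: 10 samples at 4 Hz with a 1 s ping interval.
recording = [float(i) for i in range(10)]
pulses = split_into_pulses(recording, sample_rate_hz=4.0, ping_interval_s=1.0)
```

Equal-length windows keep the training inputs a uniform shape. A real pipeline might instead align each window to the recorded transmit time of the ping.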
Key Deliverables
- Comprehensive literature review on active acoustic machine learning.
- Programmatic scenario development across a range of preset parameters.
- Automated interface with HiL simulation software to streamline loading and execution of scenarios.
- A dataset of SONAR pulses with ground truth for range and acoustic angle of arrival.
- Machine learning regression models predicting range and acoustic angle of arrival for individual pulses.
- A final report summarizing findings, detailing pipeline usage, and providing recommendations for future work.
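Programmatic scenario development across preset parameters can be sketched as a simple parameter grid. The field names and values below are illustrative assumptions; the scenario format used by the HiL software is not described here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class Scenario:
    # Hypothetical fields chosen for illustration only.
    target_range_m: float
    target_bearing_deg: float
    uuv_speed_mps: float


def build_scenarios(
    ranges_m: Iterable[float],
    bearings_deg: Iterable[float],
    speeds_mps: Iterable[float],
) -> List[Scenario]:
    """Return one Scenario for every combination of the preset values."""
    return [
        Scenario(r, b, s)
        for r, b, s in product(ranges_m, bearings_deg, speeds_mps)
    ]


scenarios = build_scenarios(
    ranges_m=[100.0, 200.0],
    bearings_deg=[-30.0, 0.0, 30.0],
    speeds_mps=[1.0, 2.0],
)
```

Each Scenario also records the ground-truth values for the pulses it produces, which is one plausible way to keep labels attached to the data.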
Project Results
- Generated a large dataset of synthetic acoustic pulses covering a wide range of scenarios.
- Successfully interfaced with the Hardware-in-the-Loop simulation in real time using a Jupyter Notebook.
- The model predicted range well but struggled to learn meaningful information about the acoustic angle of arrival. The cause was low-fidelity simulation data that lacked phase offsets between the SONAR array’s hydrophones, which are critical for beamforming.
- With access to phase offsets across all SONAR array channels, the model could estimate the acoustic angle of arrival using an approach similar to beamforming algorithms. Higher-fidelity simulation data would resolve this limitation.
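The role of phase offsets can be illustrated with the textbook relation for two hydrophones receiving a narrowband plane wave: delta_phi = 2 * pi * d * sin(theta) / wavelength. The sketch below is not the project's model or the HiL software's output. The element spacing, frequency, and 1500 m/s sound speed are assumed values, and the function names are hypothetical.

```python
import math


def phase_from_angle(angle_deg: float, spacing_m: float, freq_hz: float,
                     sound_speed_mps: float = 1500.0) -> float:
    """Phase offset (radians) between two hydrophones for a far-field source."""
    wavelength_m = sound_speed_mps / freq_hz
    return 2.0 * math.pi * spacing_m * math.sin(math.radians(angle_deg)) / wavelength_m


def angle_from_phase(delta_phi_rad: float, spacing_m: float, freq_hz: float,
                     sound_speed_mps: float = 1500.0) -> float:
    """Invert the relation above to recover the angle of arrival in degrees."""
    wavelength_m = sound_speed_mps / freq_hz
    sin_theta = delta_phi_rad * wavelength_m / (2.0 * math.pi * spacing_m)
    if abs(sin_theta) > 1.0:
        raise ValueError("phase offset is not physically reachable for this spacing")
    return math.degrees(math.asin(sin_theta))


# Assumed setup: 30 kHz signal, half-wavelength spacing (0.025 m at 1500 m/s).
FREQ_HZ = 30_000.0
SPACING_M = 0.025

delta_phi = phase_from_angle(20.0, SPACING_M, FREQ_HZ)
recovered_deg = angle_from_phase(delta_phi, SPACING_M, FREQ_HZ)

# With identical phase on both channels the relation always returns broadside.
no_offset_deg = angle_from_phase(0.0, SPACING_M, FREQ_HZ)
```

When every channel carries the same phase, as in the low-fidelity data, the offset is zero for any target direction and the relation returns 0 degrees. That is consistent with the model having no angle information to learn from.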
The synthetic data pipeline serves as a proof of concept for future applications in active acoustic machine learning.