Description
The advent of next-generation radio telescopes, such as the Square Kilometre Array, promises to revolutionize radio astronomy by generating unprecedented data volumes that challenge traditional processing methods. Deep learning techniques have shown significant potential in tackling various radio analysis tasks, but their effectiveness is often hindered by the scarcity of large, balanced annotated datasets. Recent studies have addressed this limitation through self-supervised learning on unlabelled radio survey data, resulting in foundational radio vision models. These models typically require coding expertise for task adaptation, which limits their broader adoption among astronomers. A text-based interface could overcome this barrier by enabling task-specific queries through examples and customizable outputs.
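To make the idea of task-specific text queries concrete, below is a minimal sketch of what a single image-instruction sample might look like in a LLaVA-style conversation format. The field names, file path, identifier, and question wording are illustrative assumptions, not the actual schema of the dataset described here.

```python
# Hypothetical example of one radio image-instruction pair in a LLaVA-style
# conversation format; keys, paths, and the question/answer text are
# illustrative only and do not reflect the study's actual dataset.
radio_instruction_sample = {
    "id": "radio_000001",                    # assumed identifier scheme
    "image": "cutouts/radio_000001.png",     # assumed path to a radio cutout
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nDoes this radio image contain an extended "
                     "source? Answer yes or no.",
        },
        {
            "from": "gpt",
            "value": "Yes, the image contains an extended radio source.",
        },
    ],
}
```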
In this context, Large Language Models (LLMs) have transformed scientific research and daily life with their natural language interfaces and zero-shot learning capabilities. Yet deploying large-scale models remains resource-intensive and costly. This study investigates small-scale Vision-Language Models (VLMs) as AI assistants for radio astronomy, combining LLM capabilities with vision transformers for image processing. We fine-tuned the LLaVA VLM on a dataset of over 59,000 radio images and instruction queries, evaluating its performance on various radio benchmarks, including source morphology classification, extended source detection, and artifact identification. The resulting model demonstrates clear improvements on radio tasks compared to the base models, but it does not yet match the performance of dedicated vision models, underscoring the need to improve visual-textual alignment and the quality of the training dataset. This work marks a first step in quantifying the current effectiveness of VLMs in radio astronomy, laying a foundation for further developments.
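As an illustration of how such a fine-tuned VLM could be queried on one of the benchmark tasks (source morphology classification), the sketch below uses the Hugging Face LLaVA port. The checkpoint name, prompt wording, label set, and input file are assumptions for illustration and do not describe the exact model or evaluation protocol used in this work.

```python
# Minimal inference sketch using the Hugging Face LLaVA port.
# Checkpoint, prompt, morphology labels, and file name are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # base model; a radio fine-tune would be loaded here instead
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "USER: <image>\n"
    "Classify the morphology of the brightest radio source in this image. "
    "Choose one of: compact, extended, FR-I, FR-II, artifact.\n"
    "ASSISTANT:"
)
image = Image.open("radio_cutout.png").convert("RGB")

# Prepare inputs, move them to the model device, and cast floating-point
# tensors (the pixel values) to half precision to match the model weights.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

In this kind of setup, constraining the prompt to a fixed label vocabulary makes it straightforward to map the generated text back onto the classification benchmark's categories.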