Recent studies show that some toddlers with Autism Spectrum Disorder (ASD) do not respond to speech with high pitch and exaggerated intonation, known as infant-directed speech (IDS). Understanding the caregiver-child speech interaction system is critical for detecting early signs of ASD. Therefore, this study evaluates a signal processing pipeline with deep learning for classifying IDS and adult-directed speech (ADS) in four languages: English, Arabic, Spanish, and Chinese. Our pipeline classifies IDS and ADS with single-language and multi-language models trained on $3260$ ADS and IDS audio files. The single-language models achieved accuracies between $85\%$ and $94\%$ across all languages, and the multi-language model achieved an accuracy of $93\%$ on the combined dataset. These results indicate that ADS and IDS can be classified accurately. This study offers opportunities to develop caregiver-child interaction systems and expands the set of tools for early detection of ASD in toddlers.
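The abstract does not specify the pipeline's features or model, so the following is only an illustrative sketch of the underlying idea: IDS is characterized by higher, exaggerated pitch, which a classifier can exploit. Everything here is an assumption for illustration, not the study's method: the synthetic sine-wave "speech", the zero-crossing pitch estimate, and the 250 Hz decision threshold stand in for real acoustic features and a trained deep network.

```python
import math

SR = 16000  # assumed sample rate in Hz (not from the study)

def sine(freq_hz, dur_s=0.5, sr=SR):
    """Generate a synthetic tone as a stand-in for a speech recording."""
    return [math.sin(2 * math.pi * freq_hz * n / sr) for n in range(int(dur_s * sr))]

def zero_crossing_pitch(samples, sr=SR):
    """Crude F0 estimate: zero crossings per second divided by two."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:])
        if (a < 0 <= b) or (b < 0 <= a)
    )
    return crossings * sr / (2 * len(samples))

def classify(samples, threshold_hz=250.0):
    """Toy IDS/ADS rule: IDS tends toward higher pitch than ADS.
    The threshold is a hypothetical value, not taken from the paper."""
    return "IDS" if zero_crossing_pitch(samples) > threshold_hz else "ADS"

# A high-pitched tone is labeled IDS, a low-pitched one ADS:
print(classify(sine(400.0)))  # → IDS
print(classify(sine(120.0)))  # → ADS
```

A real pipeline would replace the pitch heuristic with learned acoustic features (e.g. spectral representations) fed to a deep network, but the decision structure, mapping an audio clip to an IDS/ADS label, is the same.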