AUTHOR=Maddieson Ian, Benedict Karl
TITLE=Demonstrating environmental impacts on the sound structure of languages: challenges and solutions
JOURNAL=Frontiers in Psychology
VOLUME=Volume 14 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2023.1200463
DOI=10.3389/fpsyg.2023.1200463
ISSN=1664-1078
ABSTRACT=Recent research has suggested that there are significant associations between aspects of the phonological properties of languages and the locations in which they are spoken. This paper outlines a strategy and analytic platform for assembling climatic and environmental data in juxtaposition with curated linguistic location and structure data. Preliminary analyses suggest that previously proposed language-environment relationships are statistically valid, but may be better placed in a broader framework of language types. In this paper we discuss approaches to the challenges of demonstrating such relationships and describe the development of publicly shared data and tools to address them. The provision of these data and tools is a major contribution of the current project. The paper starts with a discussion of the sample of just over 1000 languages we have compiled, representing about one-seventh of "living languages" according to the categorization in the Ethnologue. The sample aims to meet multiple criteria. It includes representatives of all language families with 20 or more members in the Ethnologue listing, as well as many members of smaller families and isolates. It aims in part to reflect language density by selecting multiple languages from areas where many are spoken, mainly in tropical regions not far from the equator, but extends this sample by including languages spoken in the widest diversity of environments, including desert, high-altitude, and high-latitude locations.
These are regions with low language density, and covering them results in the inclusion of some closely related languages, such as varieties in the Inuit and Saami stocks in northern latitudes, or languages found in hot desert regions in North Africa or south-western South America. Inclusion of these languages is considered crucial because variables encoding altitude, temperature, vegetation type, and seasonal variation have been put forward as influences on language structure, and some of these environmental variables tend to exhibit lower variance in high language density areas near the equator. This curated collection of languages is combined with a set of documented data processing and analysis scripts that automate the integration of linguistic and environmental data and provide preliminary analysis and visualization capabilities.