Web crawling of building-relevant information

Urban and regional planning as well as the spatial sciences require detailed information about the functional, morphological and socio-economic structure of the built environment. Building-relevant information such as building height, number of storeys, usage, age and condition is of great interest in urban modelling, as it serves as a basis for training, calibrating and validating various models, e.g. for mapping populations, estimating energy demands or assessing flood risks.
Acquiring data at the level of individual buildings is time-consuming and costly, as it usually relies on field observations and local knowledge. An alternative is the automatic analysis of user-generated data from sources such as OpenStreetMap, Mapillary and Wikimapia. In the context of buildings, Wikimapia is a promising resource because it contains usage information for a large number of buildings. In addition, users attach geocoded street view images, which offer further opportunities for image interpretation (computer vision or human-based computation approaches).

The Wikimapia API provides building-relevant data in a structured format. The goal was to develop a web crawler tool that exports this information and saves it to an output file. The focus is on building properties such as name, type and age, and on downloading the geocoded street view imagery available for some of the buildings. The tool first establishes a connection to Wikimapia. It also offers spatial and semantic filtering functions (e.g. via the coordinates of an extent or the selection of a specific building type). The tool was developed in Java using the java.io and java.net packages in the Eclipse Luna integrated development environment (IDE). An executable .jar file is available to the user and can be started from the command line with java -jar wikimapia.jar. Optional parameters can be added to restrict queries spatially and semantically. The program runs on Linux as well as Windows (tested on Windows 7).
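
The workflow described above (connect, filter by extent and building type, save the result) can be illustrated with a minimal sketch that uses only java.io and java.net. It is not the tool's actual source code. The endpoint URL, the query parameter names (function, key, bbox, category, format), the class name WikimapiaQuerySketch, the order of the command-line arguments and the output file name buildings.json are all assumptions made for illustration and should be checked against the Wikimapia API documentation.

```java
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

class WikimapiaQuerySketch {

    // Assumed endpoint; verify against the Wikimapia API documentation.
    static final String ENDPOINT = "http://api.wikimapia.org/";

    // Builds a request URL with a spatial filter (bounding box) and an
    // optional semantic filter (category). Parameter names are assumptions.
    public static String buildUrl(String apiKey, double lonMin, double latMin,
                                  double lonMax, double latMax, String category) {
        StringBuilder sb = new StringBuilder(ENDPOINT);
        sb.append("?function=box");
        sb.append("&key=").append(encode(apiKey));
        sb.append("&bbox=").append(lonMin).append(',').append(latMin)
          .append(',').append(lonMax).append(',').append(latMax);
        sb.append("&format=json");
        if (category != null && !category.isEmpty()) {
            sb.append("&category=").append(encode(category));
        }
        return sb.toString();
    }

    // URL-encodes a single query value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    // Opens an HTTP connection and returns the response body as text.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10000);
        conn.setReadTimeout(10000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    // Hypothetical argument layout: key lonMin latMin lonMax latMax [category]
    public static void main(String[] args) throws IOException {
        if (args.length < 5) {
            System.err.println("usage: <key> <lonMin> <latMin> <lonMax> <latMax> [category]");
            return;
        }
        String category = args.length > 5 ? args[5] : null;
        String url = buildUrl(args[0],
                Double.parseDouble(args[1]), Double.parseDouble(args[2]),
                Double.parseDouble(args[3]), Double.parseDouble(args[4]),
                category);
        String response = fetch(url);
        // Save the raw response; the file name is an arbitrary choice.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream("buildings.json"), "UTF-8")) {
            out.write(response);
        }
    }
}
```

Keeping the URL construction separate from the network call means the spatial and semantic filters can be checked without contacting the service. Downloading the attached imagery would follow the same pattern as fetch, reading bytes instead of text.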