The World Wide Web Conference (WWW2009) will be held in Madrid, Spain, April 20-24, 2009. I will be co-presenting two papers in the Developer track. The first paper is co-authored with Jason Hines and entitled “Query GeoParser: A Spatial-Keyword Query Parser Using Regular Expressions”. The second paper is co-authored with Christopher Adams and entitled “Creating Your Own Web-Deployed Street Map Using Open Source Software and Free Data”. Both papers will be presented on the afternoon of Friday, April 24. The schedule, as well as a link to the proceedings, can be found on the Developer’s Track page. The paper abstracts follow.
Query GeoParser: A Spatial-Keyword Query Parser Using Regular Expressions
Abstract: There has been a growing commercial interest in local information within Geographic Information Retrieval, or GIR, systems. Local search engines enable the user to search for entities that contain both textual and spatial information, such as Web pages containing addresses or a business directory. Thus, queries to these systems may contain both spatial and textual components—spatial-keyword queries. Parsing the queries requires breaking the query into textual keywords, and identifying components of the geo-spatial description. For example, the query ‘Hotels near 1567 Argyle St, Halifax, NS’ could be parsed as having the keyword ‘Hotels’, the preposition ‘near’, the street number ‘1567’, the street name ‘Argyle’, the street suffix ‘St’, the city ‘Halifax’, and the province ‘NS’. Developing an accurate query parser is essential to providing relevant search results. Such a query parser can also be utilized in extracting geographic information from Web pages.
One approach to developing such a parser is to use regular expressions. Our Query GeoParser is a simple but powerful regular expression-based spatial-keyword query parser. Query GeoParser is implemented in Perl and utilizes many of Perl’s capabilities in optimizing regular expressions. By starting with regular expression building blocks for common entities such as numbers and streets, and combining them into larger regular expressions, we are able to handle over 400 different cases while keeping the code manageable and easy to maintain. We employ the mark-and-match technique to improve the parsing efficiency. First, we mark numbers, city names, and states. Next, we use matching to extract keywords and geographic entities. The advantages of our approach include manageability, performance, and easy exception handling. Drawbacks include a lack of geographic hierarchy and the inherent difficulty in dealing with misspellings. We comment on our overall experience using such a parser in a production environment and what we have learnt, and suggest possible ways to deal with the drawbacks.
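As a rough illustration of the building-block idea described above, the following minimal sketch composes small named patterns into one larger query pattern. It is written in Python rather than Perl, the patterns and the parse_query helper are hypothetical and heavily simplified, and it does not reproduce the mark-and-match step or the actual Query GeoParser code.

```python
import re

# Hypothetical, heavily simplified building blocks (not the paper's patterns).
KEYWORDS = r"(?P<keywords>.+?)"
PREPOSITION = r"(?P<preposition>near|in|at|around)"
NUMBER = r"(?P<number>\d+)"
STREET = r"(?P<street>[A-Za-z]+(?: [A-Za-z]+)*?)"
SUFFIX = r"(?P<suffix>St|Ave|Rd|Blvd|Dr)\.?"
CITY = r"(?P<city>[A-Za-z]+(?: [A-Za-z]+)*)"
REGION = r"(?P<region>[A-Z]{2})"

# Larger expressions are assembled from the smaller ones.
ADDRESS = rf"{NUMBER} {STREET} {SUFFIX}, {CITY}, {REGION}"
QUERY = re.compile(rf"^{KEYWORDS} {PREPOSITION} {ADDRESS}$", re.IGNORECASE)


def parse_query(query):
    """Return a dict of named components, or None if the query does not match."""
    match = QUERY.match(query.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_query("Hotels near 1567 Argyle St, Halifax, NS"))
```

Because each entity lives in its own named pattern, supporting a new street suffix or preposition means editing one small block rather than the combined expression, which is one plausible reading of the manageability claim in the abstract.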
Creating Your Own Web-Deployed Street Map Using Open Source Software and Free Data
Abstract: Street maps are a key element of local search; they make the connection between the search results and the geography. Adding a map to your website can be done easily using an API from a popular local search provider. However, the list of restrictions is lengthy, and customization can be costly or impossible. It is possible to create a fully customizable web-deployed street map without sponsoring the corporate leviathans, at only the cost of your time and your server. Being able to freely style and customize your map is essential; it will distinguish your website from websites with shrink-wrapped maps that everyone has seen. Using open source software adds to the level of customizability: you will not have to wait two years for the next release and then maybe get the anticipated new feature or bug fix; you can make the change yourself. Using free data rids you of contracts, costly transactions, and hefty startup fees. As an example, we walk through creating a street map for the United States of America.
A Web-deployed street map consists of a server and a client. The server stores the map data, including any custom refinements. The client requests a portion of the map; the server renders that portion and returns it to the client, which in turn displays it to the user. The map data used in this example is the Tiger/LINE data. Tiger/LINE data covers the whole of the USA. Another source of free road network data is OpenStreetMap, which is not as complete as Tiger/LINE but includes additional data such as points of interest and streets for other countries. Sometimes the original data is not formatted in a manner that contributes to a good-looking, concise map. In such cases, data refinement is desirable. For instance, the performance and aesthetics of a map can be improved by transforming the street center lines to street polygons. For this task, we use the Python language, which has many extensions that make map data refinement easy. The rendering application employed is MapServer. MapServer allows you to specify a configuration file for your map, which consists of layers referencing geographical information, as well as the style attributes that specify how the layers are visualized. MapServer contains utilities to speed up the rendering process and organize similar data. On the front end, we need a web-page embeddable client that can process requests for map movements and scale changes in real time. In our experience, OpenLayers is the best tool for this task; it supports many existing protocols for requesting map tiles and is fast, customizable, and user friendly. Thus, deploying a street map service on the Web is feasible for individuals and not limited to big corporations.
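To make the center-line-to-polygon refinement mentioned above more concrete, here is a minimal pure-Python sketch that widens a single straight street segment into a rectangle. The segment_to_polygon function is hypothetical and is not taken from the paper; it assumes planar (projected) coordinates and a constant street width, and real road data with curves and intersections would more likely be handled with a geometry library.

```python
import math


def segment_to_polygon(start, end, width):
    """Return the four corners of a rectangle centered on the segment start-end.

    Assumes planar coordinates in the same units as `width` (an assumption of
    this sketch; longitude/latitude data would need to be projected first).
    """
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("segment has zero length")
    # Vector perpendicular to the segment, scaled to half the street width.
    nx = -(y2 - y1) / length * width / 2
    ny = (x2 - x1) / length * width / 2
    return [
        (x1 + nx, y1 + ny),
        (x2 + nx, y2 + ny),
        (x2 - nx, y2 - ny),
        (x1 - nx, y1 - ny),
    ]


if __name__ == "__main__":
    # A 10-unit street segment along the x-axis, drawn 2 units wide.
    print(segment_to_polygon((0, 0), (10, 0), 2))
```

Each center line segment becomes a closed shape that the renderer can fill, instead of a line that must be stroked at every zoom level.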