Finding anagrams of place names (in GB)
A little while ago Alasdair Rae asked if any one had combined an anagram engine with a list of place names.
Well, no one stepped forward so I thought it could be a fun project. And, it turns out it is quite fun though I got to think about data structures rather more than geography, but that is probably good for me.
I made the assumption that Alasdair was probably not interested in just permutations of letters but wanted actual words (such as would be used in a crossword clue). I also limited my search to single word anagrams as I can’t see a simple solution to finding multi word solutions.
First I stuffed the Ordnance Survey’s OpenNames data set into PostGIS (as who wants to be scanning hundreds of little csv files).
I then set up a GeoTool’s PostGIS datastore and grabbed the populated places.
Map<String, Object> params = new HashMap<String, Object>();
params.put(PostgisNGDataStoreFactory.DBTYPE.key, PostgisNGDataStoreFactory.DBTYPE.sample);
params.put(PostgisNGDataStoreFactory.USER.key, "username");
params.put(PostgisNGDataStoreFactory.PASSWD.key, "password");
params.put(PostgisNGDataStoreFactory.SCHEMA.key, "opennames");
params.put(PostgisNGDataStoreFactory.DATABASE.key, "osdata");
params.put(PostgisNGDataStoreFactory.HOST.key, "127.0.0.1");
params.put(PostgisNGDataStoreFactory.PORT.key, "5432");
DataStore ds = DataStoreFinder.getDataStore(params);
if (ds == null) {
throw new RuntimeException("No datastore");
}
SimpleFeatureSource fs = ds.getFeatureSource("opennames");
SimpleFeatureCollection features = fs.getFeatures(CQL.toFilter("type = 'populatedPlace'"));
I tried a naive approach of recursively finding every anagram possible from the name and looking each one up in a HashMap
of English words. Oddly, this took a
long time so I thought (and Googled) some more and came up with the much more
efficient way of sorting the letters in a word and using that as a key to all
words that contained those letters. Then I could sort each place name’s letters
and do a single lookup to find all the possible words that could be made with
those letters. That speeded things up nicely.
To build the lookup table I made use of Google’s HashMultimap
(from
Guava) which allows you to create a Map
of
Collections
keyed on a String
.
private Map<String, Collection<String>> dict;
public AnagramLookup() throws FileNotFoundException, IOException {
//change this to point to your dictionary (one word per line)
File f = new File("/usr/share/dict/british-english");
HashMultimap<String, String> indexedDictionary = HashMultimap.create();
try (BufferedReader buf = new BufferedReader(new FileReader(f))) {
String line;
// read each word in the dictionary
while ((line = buf.readLine()) != null) {
//strip out non letters
String word = line.toLowerCase().replaceAll("\\W", "");
//store the word against the sorted key
indexedDictionary.put(sort(word), word);
}
}
dict = indexedDictionary.asMap();
}
Then all that is left to do is to iterate each populated place, grab it’s name
and then remove all the non-letters and sort it’s letters and look the anagrams
in the HashMap
. The final trick is to remove the name itself if it appears in
the list of anagrams (i,e. the name itself is an English word).
try (SimpleFeatureIterator itr = features.features()) {
while (itr.hasNext()) {
SimpleFeature f = itr.next();
String name = (String) f.getAttribute("name1");
current = name.toLowerCase().replaceAll("\\W", "");
Collection<String> anagrams = getAnagrams(current);
if(anagrams!=null&&!anagrams.isEmpty()) {
//remove the name itself if it happens to be a word
anagrams.remove(current);
if(!anagrams.isEmpty()) {
results.put(name, new TreeSet<String>(anagrams));
}
}
}
}
Results
It turns out that there are 6 11 letter anagrams for the list of GB place names.
- Balnadelson - belladonnas
- Fortis Green - reforesting
- Gilling East - legislating
- Green Plains - spenglerian
- Morningside - modernising
- Sharrington - harringtons
- Stone Corner - cornerstone
A Spenglerian is “of or relating to the theory of world history developed by Oswald Spengler which holds that all major cultures undergo similar cyclical developments from birth to maturity to decay”. While a Harrington is “a man’s short lightweight jacket with a collar and a zipped front.”
Other highlights for cross word setters include Aimes Green as menageries and Westlinton as tinseltown.
I have posted the full list of anagrams and the code to generate the list.
See this follow up for world names.