...
There may be APIs available to handle this task, but we chose to do it by hand rather than spend time exploring and testing them. The idea is to find the diacritics and replace them with plain letters that sound similar. The steps below find a diacritic and replace it with a similar-sounding letter.
- Download a page that contains diacritics. For example, http://geoiptool.com/en/?IP=192.42.43.22 contains "Neuchâtel", where the 'a' carries a diacritic.
- On a UNIX machine, dump the page with the command-line tool 'xxd' and grep for the word, so that you get the hex dump of the letter you need, e.g.:
    xxd 'index.html?IP=192.42.43.22' | grep Neuch
- The output will look something like this:
    0001d00: 6c64 223e 4e65 7563 68e2 7465 6c3c 2f74 ld">Neuch?tel</t
- Each pair of hex digits on the left represents one character on the right, starting after the colon (":"). For example, 6c represents l and 64 represents d.
- Count along the characters to find the byte for the letter that did not display, which is "e2" in our case.
- Replace that letter with a pattern match on its hex value, written with the \x escape:
    # Replace every 0xe2 byte with a plain 'a'
    if ($city =~ m/\xe2/) { $city =~ s/\xe2/a/g; }
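
The steps above handle one byte at a time. As a rough sketch of how they might be generalized, the Perl below lists the non-ASCII bytes in a string (what the xxd step does by eye) and replaces them through a lookup table. The names `%ascii_for`, `high_bytes` and `strip_diacritics` are hypothetical, and only the `\xe2` entry comes from the example above; the other table entries are illustrative. It also assumes the page is in a single-byte encoding such as Latin-1, as the dump above suggests. In UTF-8 the same letter would be two bytes (c3 a2) and this approach would not apply as written.

```perl
use strict;
use warnings;

# Hypothetical lookup table: single byte => ASCII replacement.
# Only \xe2 is taken from the example; the rest are illustrative.
my %ascii_for = (
    "\xe2" => 'a',   # a with circumflex, as in the Neuchatel example
    "\xe9" => 'e',   # e with acute
    "\xe8" => 'e',   # e with grave
    "\xfc" => 'u',   # u with diaeresis
);

# Return the hex codes of all non-ASCII bytes in a string, in order.
sub high_bytes {
    my ($text) = @_;
    return map { sprintf('%02x', ord($_)) } $text =~ /([\x80-\xff])/g;
}

# Replace every byte found in the table; unmapped bytes are left unchanged.
sub strip_diacritics {
    my ($text) = @_;
    $text =~ s/([\x80-\xff])/exists $ascii_for{$1} ? $ascii_for{$1} : $1/ge;
    return $text;
}

my $city = "Neuch\xe2tel";
print join(' ', high_bytes($city)), "\n";   # prints: e2
print strip_diacritics($city), "\n";        # prints: Neuchatel
```

Running `high_bytes` over new input first shows which bytes still need an entry in the table, which keeps the table limited to letters that actually occur in the data.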
...