The data is arranged in a multi-page table, so I decided to write a Python script that converts it into an HTML map for easy display.
Here is the final output from the table published on 1/7/22:
UCIPD publishes the tables as a PDF, so they need to be parsed into a friendlier format before the data they contain can be processed.
I chose tabula because I just needed a quick way to parse tables from a PDF file, and it can process both local files and remote ones easily.
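Here is a minimal sketch of the extraction step. It assumes the tabula-py package, which also needs a Java runtime, and it uses the hypothetical helper name `pdf_to_csv`. `tabula.convert_into` accepts either a local path or a URL and writes the extracted tables directly to CSV. Passing `pages="all"` matters because tabula-py reads only the first page by default.

```python
def pdf_to_csv(source: str, csv_path: str) -> None:
    """Extract the tables on every page of a PDF (local path or URL) into one CSV file."""
    # Imported inside the function so this module can be loaded even where
    # tabula-py (and the Java runtime it depends on) is not installed.
    import tabula

    # pages="all": without it, tabula-py only extracts page 1.
    tabula.convert_into(source, csv_path, output_format="csv", pages="all")
```

For example, `pdf_to_csv("daily_log.pdf", "raw.csv")`, where both file names are placeholders.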
Initially, I converted the PDF data parsed by tabula directly into a Pandas dataframe. This caused problems: because the PDF spans multiple pages, the table's header row is repeated at the top of every page, which left duplicate header rows scattered through the data.
I solved this by converting the data to a CSV file first, then iterating through its rows and deleting every copy of the header row except the first, so the header is preserved and all later duplicates are removed.
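The header clean-up can be sketched with Python's standard `csv` module. This sketch assumes each repeated header is an exact copy of the first row. If tabula emitted copies with slightly different whitespace, the cells would need normalizing before the comparison. The names `drop_repeated_headers` and `clean_csv` are hypothetical.

```python
import csv


def drop_repeated_headers(rows):
    """Keep the first row (the header) and drop any later row identical to it."""
    if not rows:
        return []
    header = rows[0]
    return [header] + [row for row in rows[1:] if row != header]


def clean_csv(in_path, out_path):
    """Rewrite a CSV file without the header copies that appear on each PDF page."""
    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(drop_repeated_headers(rows))
```

Matching against the header, rather than dropping every duplicated row, keeps genuine log entries that happen to be identical to each other.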
With the duplicates removed, the CSV file containing the table can now be read into a Pandas dataframe for easy management and processing.
To minimize API calls for cost and efficiency (each call takes a non-trivial amount of time and may cost money), each unique address is geocoded only once: duplicate addresses are skipped, and the resulting coordinates are cached locally in a JSON file for future runs.
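A sketch of that caching logic follows. The post does not name the geocoding service, so the lookup is left as a caller-supplied function. `geocode_all`, the `geocode` callback, and the default `geocache.json` file name are all assumptions made for illustration.

```python
import json
import os


def geocode_all(addresses, geocode, cache_path="geocache.json"):
    """Return {address: [lat, lon] or None}, calling geocode() once per uncached address."""
    # Reuse coordinates saved by earlier runs, if the cache file exists.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    else:
        cache = {}
    # dict.fromkeys removes duplicate addresses while keeping first-seen order.
    for address in dict.fromkeys(addresses):
        if address in cache:
            continue  # already looked up: no API call needed
        # geocode is hypothetical: it should return (lat, lon), or None on failure.
        result = geocode(address)
        # Store lists so in-memory values match what JSON loads back.
        cache[address] = list(result) if result is not None else None
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)
    return cache
```

Failed lookups are cached as `null` too, so they are not retried on the next run. Deleting those entries from the JSON file would force a retry.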
As a last step before generating the map, two new columns, latitude and longitude, are appended to the dataframe. We now have all the information needed to generate a map.
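This assumes the cleaned CSV has been loaded with `pd.read_csv` and that the coordinates come from an address-to-`[lat, lon]` dictionary like the JSON cache described above. The address column's name depends on the PDF's headers, so it is passed in rather than guessed. `add_coordinates` is a hypothetical name.

```python
import pandas as pd


def add_coordinates(df: pd.DataFrame, coords: dict, address_col: str) -> pd.DataFrame:
    """Return a copy of df with latitude and longitude columns appended."""
    out = df.copy()
    # Unknown addresses and failed lookups (None) both become (None, None).
    pairs = out[address_col].map(lambda addr: coords.get(addr) or (None, None))
    out["latitude"] = pairs.map(lambda pair: pair[0])
    out["longitude"] = pairs.map(lambda pair: pair[1])
    return out
```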
Finally, the folium library generates an HTML file, with pins plotted from the new coordinate columns.
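A minimal folium sketch with hypothetical helper names follows. `marker_rows` skips rows whose coordinates are missing (`None` or NaN), and `build_map` centers the view on the average pin position. The zoom level of 15 is an arbitrary choice. The records can be produced from the dataframe with `df.to_dict("records")`.

```python
import math


def marker_rows(records, lat_key="latitude", lon_key="longitude"):
    """Yield (lat, lon, record) for each record that has usable coordinates."""
    for rec in records:
        lat, lon = rec.get(lat_key), rec.get(lon_key)
        # Rows that failed to geocode end up as None or NaN; skip them.
        if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
            continue
        yield lat, lon, rec


def build_map(records, out_path="map.html", popup_key=None):
    """Write an HTML map with one pin per geocoded record and return it."""
    # Imported here so marker_rows() can be used without folium installed.
    import folium

    points = list(marker_rows(records))
    if not points:
        raise ValueError("no rows with coordinates to plot")
    # Center the view on the average position of all pins.
    center = [
        sum(lat for lat, _, _ in points) / len(points),
        sum(lon for _, lon, _ in points) / len(points),
    ]
    fmap = folium.Map(location=center, zoom_start=15)
    for lat, lon, rec in points:
        # Optionally label each pin with one column's value (e.g. the incident type).
        popup = str(rec.get(popup_key, "")) if popup_key else None
        folium.Marker(location=[lat, lon], popup=popup).add_to(fmap)
    fmap.save(out_path)
    return fmap
```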