Deriving "Precinct" level results for Wisconsin

Deriving "Precinct" level results for Wisconsin

tl;dr: Over a few notebooks, we'll show how to join Wisconsin election results to ward(precinct) shapefiles

What we in Wisconsin call a ward is referred to as a precinct in some states or a voting district by the Census Bureau. Wards do not constitute election districts from which municipal officials are elected, and thus are not subject to the “one person, one vote” requirement which governs the formation of election districts. Instead, wards are intended to serve as administrative subunits that are aggregated into election districts of equal population. Cities, villages, and towns form municipal wards by combining whole census blocks...Once established, wards serve as the building blocks used by the legislature, counties, and cities in redistricting their respective election districts. -- Wisconsin Elections Commission

The OpenElections Project is compiling a set of standardized, precinct-level results for national and state level elections going back to 2000. It's a great project, and when finished the dataset will be very useful for journalists, academics, campaigns, and armchair political scientists.

The quality of the data OpenElections obtains varies greatly by state, and even within a state it can be an adventure - sometimes there's a nice spreadsheet in one county, but in some extreme cases volunteers have to go to County Clerk's offices and take photographs of election results and convert it by hand.

Wisconsin is on the easier end - The Wisconsin Elections Commission and its predecessor the Government Accountability Board have been good about providing statewide election results within a few weeks of an election, and in machine-readable formats, usually Excel. They're not consistent about the formatting of those Excel files but it's at least not too bad to reason about on a year-by-year basis.

Wards are the atomic base unit for most elections and all voters in a ward get the same ballot. (Well, mostly. See footnote at the bottom) Wards often have different polling places per ward, but in some cases multiple wards will vote at the same polling location. The districts built from the wards are not in a strict hierarchy: a State Assembly district made up of 50 wards might well be split by a Congressional district, with 25 wards in one district and 25 in another Congressional District. Wards are always contained in one County and in one municipality.

Because wards are built from whole Census blocks, demographic information is available for each ward.

Wards may get bigger or new wards may be created due to annexations and changes to municipal boundaries, but once created at the start of a new redistricting cycle, a ward is usually not deleted and they have stable ID numbers. The Legislative Technology Services Bureau publishes a map and shapefiles for the wards and gives each ward an equivient of a FIPS code that can be used for a database/GIS join. The maps are published twice a year to reflect municipal boundary changes. Election results need to be paired with the current shapefile for best accuracy - using a 2016 shapefile with the 2012 election results will be off on the boundaries of muncipalities, though most wards will still look the same. (One exception: when the entire municipality is deleted, the LTSB stops including the old wards for it)

Unfortunately for Wisconsin election data users, while Wisconsin votes by ward, it does not necessarily report by ward. Only cities of greater than 35,000 people are required to report each ward individually. Smaller municipalites are permitted to combine results into "reporting units" of multiple wards. All wards in a reporting unit must share the same districts, and because different elections cover districts, the reporting units may vary every election, though they're typically the same. There are usually around 3600 reporting units per election, and in the 2014 fall election there were 6,634 wards in Wisconsin.

The "reporting units" do not get any sort of geographical identifier like a FIPS code. In the reported data collected by the Elections Commission, the reporting unit is an unstructured string. However, there is a prescribed format for how clerks should name reporting units that in theory contains all of the information necessary to decide what wards are included in a reporting unit.

Sadly, the Reporting units are created by humans, and Wisconsin has 1,927 different clerks who are potentially creating reporting unit names, few of whom think about "how can a computer parse this", and not all of whom follow exactly the guidnance set out by the Elections Commission.

This notebook will first cover how to normalize reporting units. Then, we'll move on to joining GeoData. To normalize, we will use a mix of code and hand-editing to create a file that users of OpenElection data can reference, and lookup which wards (and which GeoIDs) are associated with a given row of the election results data. All of the notebooks can be found in a Github repo

We'll start with the preliminaries - bring in Pandas and set some blogging-friendly defaults

In [1]:
import pandas as pd
import re
import json
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

We'll start by bringing in the reporting units used for the Fall 2016 elections. Before we're done, we'll merge in reporting units from 2014 too - our end result will be a superset of multiple elections, but that's OK, if a reporting unit is named the same thing in different elections it will have the same wards.

In [2]:
reportingunits2016 = pd.read_excel("http://elections.wi.gov/sites/default/files/page/2016_general_election_reporting_units_xlsx_79857.xlsx")
In [3]:
reportingunits2016.head()
Out[3]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict
0 Adams County CITY OF ADAMS Ward 1-4 Congressional - District 3 Assembly - District 41 State Senate - District 14
1 Adams County CITY OF WISCONSIN DELLS Wards 5,9 Congressional - District 3 Assembly - District 41 State Senate - District 14
2 Adams County TOWN OF ADAMS Wards 1-3 Congressional - District 3 Assembly - District 41 State Senate - District 14
3 Adams County TOWN OF BIG FLATS Ward 1-2 Congressional - District 3 Assembly - District 72 State Senate - District 24
4 Adams County TOWN OF COLBURN Ward 1 Congressional - District 3 Assembly - District 72 State Senate - District 24
In [4]:
len(reportingunits2016)
Out[4]:
3638

We don't actually care about the different Districts in use here (we'll rediscover those in election results anyway.) The good clerks of Adams County above have done a nice job of following the instructions, and it looks like this might be relatively straightforward to parse with a regular expression. (You know what quote this calls for, of course)

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. --jwz

It will turn out that there are too many exceptions to reasonably parse with a regex, so we'll take the coward's way out: we'll parse a bunch with a regex, and just fix the rest by hand.

First, we'll try out our regex to try to cut the problem down to size. We'll blast over each row and then make a few new columns in our DataFrame

In [5]:
def process_reporting_unit(ward):
    x = re.search('((?:^\D+)(?P<hypen>(\d+)(?:\s*)-(?:\s*)(\d+$)))|((?:^\D+)(?P<single>(\d+$)))|((?:^\D+)(?P<amp>(\d+)(?:\s*)&(?:\s*)(\d+$)))|((?:^\D+)(?P<comma>(\d+)(?:\s*),(?:\s*)(\d+$)))', ward)
    if x is None:
        return {"type": "unmatched", "data": ward}
    elif x.group('single'):
        return {"type": "single", "data": x.group('single')}
    elif x.group('hypen'):
        return {"type": "hypen", "data": x.group('hypen')}
    elif x.group('amp'):
        return {"type": "amp", "data": x.group('amp')}
    elif x.group('comma'):
        return {"type": "comma", "data": x.group('comma')}
    else:
        raise Exception()
        pass

We'll use this regex to add a few columns to our data, using a handy trick from StackOverflow for returning multiple columns from a single pandas row

In [6]:
processed2016 = pd.concat([reportingunits2016,reportingunits2016.ReportingUnit.apply(lambda s: pd.Series(process_reporting_unit(s)))], axis=1)
In [7]:
processed2016.head()
Out[7]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict data type
0 Adams County CITY OF ADAMS Ward 1-4 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-4 hypen
1 Adams County CITY OF WISCONSIN DELLS Wards 5,9 Congressional - District 3 Assembly - District 41 State Senate - District 14 5,9 comma
2 Adams County TOWN OF ADAMS Wards 1-3 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-3 hypen
3 Adams County TOWN OF BIG FLATS Ward 1-2 Congressional - District 3 Assembly - District 72 State Senate - District 24 1-2 hypen
4 Adams County TOWN OF COLBURN Ward 1 Congressional - District 3 Assembly - District 72 State Senate - District 24 1 single

Let's look at a couple of examples

In [8]:
processed2016[processed2016['type']=='hypen'].head(5)
Out[8]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict data type
0 Adams County CITY OF ADAMS Ward 1-4 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-4 hypen
2 Adams County TOWN OF ADAMS Wards 1-3 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-3 hypen
3 Adams County TOWN OF BIG FLATS Ward 1-2 Congressional - District 3 Assembly - District 72 State Senate - District 24 1-2 hypen
5 Adams County TOWN OF DELL PRAIRIE Ward 1-3 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-3 hypen
6 Adams County TOWN OF EASTON Wards 1-2 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-2 hypen
In [9]:
processed2016[processed2016['type']=='comma'].head(5)
Out[9]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict data type
1 Adams County CITY OF WISCONSIN DELLS Wards 5,9 Congressional - District 3 Assembly - District 41 State Senate - District 14 5,9 comma
115 Brown County CITY OF DE PERE Wards 9,18 Congressional - District 8 Assembly - District 2 State Senate - District 1 9,18 comma
197 Brown County VILLAGE OF HOWARD Wards 1,12 Congressional - District 8 Assembly - District 89 State Senate - District 30 1,12 comma
426 Crawford County CITY OF PRAIRIE DU CHIEN Wards 2,7 Congressional - District 3 Assembly - District 96 State Senate - District 32 2,7 comma
616 Dane County CITY OF VERONA Wards 1,5 Congressional - District 2 Assembly - District 79 State Senate - District 27 1,5 comma

And overall, 157 that are something funky. That's not too bad. Let's take a look at what a few of those look like:

In [10]:
processed2016['type'].value_counts()
Out[10]:
single       2198
hypen        1236
unmatched     157
comma          47
Name: type, dtype: int64
In [11]:
processed2016[processed2016['type']=='unmatched'].head(5)
Out[11]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict data type
173 Brown County TOWN OF LEDGEVIEW Wards 1-3,8-10 Congressional - District 8 Assembly - District 88 State Senate - District 30 Wards 1-3,8-10 unmatched
198 Brown County VILLAGE OF HOWARD Wards 2,8,11 Congressional - District 8 Assembly - District 4 State Senate - District 2 Wards 2,8,11 unmatched
199 Brown County VILLAGE OF HOWARD Wards 3-4,6 Congressional - District 8 Assembly - District 89 State Senate - District 30 Wards 3-4,6 unmatched
202 Brown County VILLAGE OF HOWARD Wards 9-10,18 Congressional - District 8 Assembly - District 4 State Senate - District 2 Wards 9-10,18 unmatched
205 Brown County VILLAGE OF PULASKI Wards 1-3,6 Congressional - District 8 Assembly - District 6 State Senate - District 2 Wards 1-3,6 unmatched

Yeah, let's not try to figure all those variations out. Let's just make ourselves a nice dictionary to edit.

In [12]:
def convert(row):
    x = re.search('(^\D+)', row)
    print("\"%s\": \"%s \"," % (row, x.group(1).strip()) )
    
junk = processed2016[processed2016['type']=='unmatched'].ReportingUnit.map(lambda s: convert(s))
"Wards 1-3,8-10": "Wards ",
"Wards 2,8,11": "Wards ",
"Wards 3-4,6": "Wards ",
"Wards 9-10,18": "Wards ",
"Wards 1-3,6": "Wards ",
"Wards 7-8,10-12": "Wards ",
"Wards 5-6,10": "Wards ",
"Wards 7-9,14": "Wards ",
"Ward 3A": "Ward ",
"Ward 7A": "Ward ",
"Wards 1-4,6-7": "Wards ",
"Wards 1,9-10": "Wards ",
"Wards 2-3,5": "Wards ",
"Wards 4,6-8": "Wards ",
"Wards 1-3,6": "Wards ",
"Wards 15,18-19": "Wards ",
"Wards 1-4,9": "Wards ",
"Wards 3-4,12": "Wards ",
"Wards 1-2,4-5,7": "Wards ",
"Ward 1,3,5": "Ward ",
"Ward 2,4,6": "Ward ",
"Ward 1,3-6,15": "Ward ",
"Ward 7-10,12": "Ward ",
"Wards 14,16-17": "Wards ",
"Wards 1,5-6,11": "Wards ",
"Wards 2-4,12": "Wards ",
"Wards 1,3,5": "Wards ",
"Wards 7,12-13": "Wards ",
"Wards 1-4,7-11": "Wards ",
"Wards 1,3-7": "Wards ",
"Wards 1-6,22-24,29": "Wards ",
"Wards 7-10,18-21,25-27,30": "Wards ",
"Wards 11-17,28": "Wards ",
"Wards 1-3,7-8": "Wards ",
"Wards 4-6,9-12": "Wards ",
"Wards 13-16,20-22": "Wards ",
"Wards 17-19,30-32": "Wards ",
"Ward 9B": "Ward ",
"Ward 1A": "Ward ",
"Ward 2-3,5": "Ward ",
"Ward 1-5,10": "Ward ",
"Wards 1-3,8": "Wards ",
"Wards 5-6,9-10": "Wards ",
"Wards 1-6,8": "Wards ",
"Wards 3-4,22": "Wards ",
"Wards 17-18,21,23-26": "Wards ",
"Wards 19-20,27": "Wards ",
"Wards 12,20-21,24": "Wards ",
"Wards 1-2,6-7": "Wards ",
"Wards 1-2,4-5": "Wards ",
"Wards 1,3,5": "Wards ",
"Wards 2,4,6": "Wards ",
"Wards 1,3-5": "Wards ",
"Ward 15A": "Ward ",
"Ward 15B": "Ward ",
"Ward 22A": "Ward ",
"Ward 22B": "Ward ",
"Wards 2,8S": "Wards ",
"Ward 11S": "Ward ",
"Wards 1S,3S": "Wards ",
"Ward 1-2,5": "Ward ",
"Ward 3-4,6": "Ward ",
"Wards 13-18,20": "Wards ",
"Ward 5B": "Ward ",
"Wards 12-16,18": "Wards ",
"Wards 17,19-20": "Wards ",
"Wards 1A-3A": "Wards ",
"Wards 1B-3B": "Wards ",
"Wards 4-6,9-10": "Wards ",
"Wards 1-3,5,7-8": "Wards ",
"Wards 1,4-5,14": "Wards ",
"Wards 2,6,8,12-13": "Wards ",
"Wards 3,9-11": "Wards ",
"Wards 5,7B": "Wards ",
"Wards 6-7A": "Wards ",
"Ward 5-6,10": "Ward ",
"Wards 1-2,5": "Wards ",
"Wards 1,6-7": "Wards ",
"Ward 2A": "Ward ",
"Ward 2B": "Ward ",
"Wards 35,40,43": "Wards ",
"Wards 36,38,41": "Wards ",
"Wards 1-2,4": "Wards ",
"Ward 3S": "Ward ",
"Wards 9-10,12-13": "Wards ",
"Wards 11,14-15,17": "Wards ",
"Wards 16,18-19": "Wards ",
"Wards 1,17,20": "Wards ",
"Wards 2-4,11": "Wards ",
"Wards 9,13-14": "Wards ",
"Wards 10,12,15-16": "Wards ",
"Wards 19,21-22": "Wards ",
"Wards 1-4,6": "Wards ",
"Wards 1-3,13": "Wards ",
"Wards 4,6,14": "Wards ",
"Wards 5,7-9": "Wards ",
"Wards 1-2,9": "Wards ",
"Wards 1-4,15": "Wards ",
"Wards 1-2,7-9,11-14": "Wards ",
"Wards 3-6,10": "Wards ",
"Wards 1-4,6": "Wards ",
"Wards 6,9-10,15-17,20,23-25,28": "Wards ",
"Wards 11-14,21-22,26-27": "Wards ",
"Wards 9-10,32": "Wards ",
"Wards 11-14,28": "Wards ",
"Wards 23-24,26": "Wards ",
"Wards 1-2,8": "Wards ",
"Wards 1,8,10-11": "Wards ",
"Wards 2,5-7": "Wards ",
"Wards 3-4,9,16-17": "Wards ",
"Wards 1-5,7": "Wards ",
"Wards 1-3,13": "Wards ",
"Wards 1,3-4,10": "Wards ",
"Wards 2,5-8": "Wards ",
"Wards 1-2,5-6": "Wards ",
"Wards 1,3,5,9": "Wards ",
"Wards 2,4,10": "Wards ",
"Ward 1-3,7-9": "Ward ",
"Ward 4-6,10-11": "Ward ",
"Wards 1,7-8": "Wards ",
"Wards 2-3,9-11": "Wards ",
"Wards 1-2,4": "Wards ",
"Wards 3,6-7": "Wards ",
"Wards 5,8-9": "Wards ",
"Ward 1,8-11": "Ward ",
"Wards 7,9-11": "Wards ",
"Wards 3-4,8": "Wards ",
"Wards 1A-2,4,7": "Wards ",
"Ward 1B": "Ward ",
"Wards 3,14-15,30": "Wards ",
"Ward 5B": "Ward ",
"Wards 5A-6,8-9,23-29,31-35,38": "Wards ",
"Wards 10-13,21-22,36-37": "Wards ",
"Ward 22B": "Ward ",
"Ward 22A": "Ward ",
"Ward 23A": "Ward ",
"Ward 23B": "Ward ",
"Ward 25A": "Ward ",
"Ward 25B": "Ward ",
"Ward 28B": "Ward ",
"Ward 28A": "Ward ",
"Ward 29A": "Ward ",
"Ward 29B": "Ward ",
"Wards 1-2,7-10": "Wards ",
"Wards 1A,2-5": "Wards ",
"Ward 1B": "Ward ",
"Wards 1A-2": "Wards ",
"Ward 1B": "Ward ",
"Wards 1-2A": "Wards ",
"Wards 2B,2C": "Wards ",
"Wards 1-2,4,7": "Wards ",
"Wards 3,5-6": "Wards ",
"Wards 6,17,25-26": "Wards ",
"Wards 7,16,27": "Wards ",
"Wards 8,19,22-23": "Wards ",
"Wards 6-15,24,26-29": "Wards ",
"Wards 16-23,25": "Wards ",

Let's cut and paste this into a new cell and just fix it by hand. You won't see me do this, but this is just the output from above manually edited.

In [13]:
manual = {"Wards 1-3,8-10": "Wards 1,2,3,8,9,10",
"Wards 2,8,11": "Wards 2,8,11",
"Wards 3-4,6": "Wards 3,4,6",
"Wards 9-10,18": "Wards 9,10,18",
"Wards 1-3,6": "Wards 1,2,3,6",
"Wards 7-8,10-12": "Wards 7,8,10,11,12",
"Wards 5-6,10": "Wards 5,6,10",
"Wards 7-9,14": "Wards 7,8,9,14",
"Ward 3A": "Ward 3A",
"Ward 7A": "Ward 7A",
"Wards 1-4,6-7": "Wards 1,2,3,4,6,7",
"Wards 1,9-10": "Wards 1,9,10",
"Wards 2-3,5": "Wards 2,3,5",
"Wards 4,6-8": "Wards 4,6,7,8",
"Wards 1-3,6": "Wards 1,2,3,6",
"Wards 15,18-19": "Wards 15,18,19",
"Wards 1-4,9": "Wards 1,2,3,4,9",
"Wards 3-4,12": "Wards 3,4,12",
"Wards 1-2,4-5,7": "Wards 1,2,4,5,7",
"Ward 1,3,5": "Ward 1,3,5",
"Ward 2,4,6": "Ward 2,4,6",
"Ward 1,3-6,15": "Ward 1,3,4,5,6,15",
"Ward 7-10,12": "Ward 7,8,9,10,12",
"Wards 14,16-17": "Wards 14,16,17",
"Wards 1,5-6,11": "Wards 1,5,6,11",
"Wards 2-4,12": "Wards 2,3,4,12",
"Wards 1,3,5": "Wards 1,3,5",
"Wards 7,12-13": "Wards 7,12,13",
"Wards 1-4,7-11": "Wards 1,2,3,4,7,8,9,10,11",
"Wards 1,3-7": "Wards 1,3,4,5,6,7",
"Wards 1-6,22-24,29": "Wards 1,2,3,4,5,6,22,23,24,29",
"Wards 7-10,18-21,25-27,30": "Wards 7,8,9,10,18,19,20,21,25,26,27,30",
"Wards 11-17,28": "Wards 11,12,13,14,15,16,17,28",
"Wards 1-3,7-8": "Wards 1,2,3,7,8",
"Wards 4-6,9-12": "Wards 4,5,6,9,10,11,12",
"Wards 13-16,20-22": "Wards 13,14,15,16,20,21,22",
"Wards 17-19,30-32": "Wards 17,18,19,30,31,32",
"Ward 9B": "Ward 9B",
"Ward 1A": "Ward 1A",
"Ward 2-3,5": "Ward 2,3,5",
"Ward 1-5,10": "Ward 1,2,3,4,5,10",
"Wards 1-3,8": "Wards 1,2,3,8",
"Wards 5-6,9-10": "Wards 5,6,9,10",
"Wards 1-6,8": "Wards 1,2,3,4,5,6,8",
"Wards 3-4,22": "Wards 3,4,22",
"Wards 17-18,21,23-26": "Wards 17,18,21,23,24,25,26",
"Wards 19-20,27": "Wards 19,20,27",
"Wards 12,20-21,24": "Wards 12,20,21,24",
"Wards 1-2,6-7": "Wards 1,2,6,7",
"Wards 1-2,4-5": "Wards 1,2,4,5",
"Wards 1,3,5": "Wards 1,3,5",
"Wards 2,4,6": "Wards 2,4,6",
"Wards 1,3-5": "Wards 1,3,4,5",
"Ward 15A": "Ward 15A",
"Ward 15B": "Ward 15B",
"Ward 22A": "Ward 22A",
"Ward 22B": "Ward 22B",
"Wards 2,8S": "Wards 2,8S",
"Ward 11S": "Ward 11S",
"Wards 1S,3S": "Wards 1S,3S",
"Ward 1-2,5": "Ward 1,2,5",
"Ward 3-4,6": "Ward 2,4,6",
"Wards 13-18,20": "Wards 13,14,15,16,17,18,20",
"Ward 5B": "Ward 5B",
"Wards 12-16,18": "Wards 12,13,14,15,16,18",
"Wards 17,19-20": "Wards 17,19,20",
"Wards 1A-3A": "Wards 1A,2A,3A",
"Wards 1B-3B": "Wards 1B,2B,3B",
"Wards 4-6,9-10": "Wards 4,5,6,9,10",
"Wards 1-3,5,7-8": "Wards 1,2,3,5,7,8",
"Wards 1,4-5,14": "Wards 1,4,5,14",
"Wards 2,6,8,12-13": "Wards 2,6,8,12,13",
"Wards 3,9-11": "Wards 3,9,10,11",
"Wards 5,7B": "Wards 5,7B",
"Wards 6-7A": "Wards 6,7A",
"Ward 5-6,10": "Ward 5,6,10",
"Wards 1-2,5": "Wards 1,2,5",
"Wards 1,6-7": "Wards 1,6,7",
"Ward 2A": "Ward 2A",
"Ward 2B": "Ward 2B",
"Wards 35,40,43": "Wards 35,40,43",
"Wards 36,38,41": "Wards 36,38,41",
"Wards 1-2,4": "Wards 1,2,4",
"Ward 3S": "Ward 3S",
"Wards 9-10,12-13": "Wards 9,10,12,13",
"Wards 11,14-15,17": "Wards 11,14,15,17",
"Wards 16,18-19": "Wards 16,18,19",
"Wards 1,17,20": "Wards 1,17,20",
"Wards 2-4,11": "Wards 2,3,4,11",
"Wards 9,13-14": "Wards 9,13,14",
"Wards 10,12,15-16": "Wards 10,12,15,16",
"Wards 19,21-22": "Wards 19,21,22",
"Wards 1-4,6": "Wards 1,2,3,4,6",
"Wards 1-3,13": "Wards 1,2,3,13",
"Wards 4,6,14": "Wards 4,6,14",
"Wards 5,7-9": "Wards 5,7,8,9",
"Wards 1-2,9": "Wards 1,2,9",
"Wards 1-4,15": "Wards 1,2,3,4,15",
"Wards 1-2,7-9,11-14": "Wards 1,2,7,8,9,11,12,13,14",
"Wards 3-6,10": "Wards 3,4,5,6,10",
"Wards 1-4,6": "Wards 1,2,3,4,6",
"Wards 6,9-10,15-17,20,23-25,28": "Wards 6,9,10,15,16,17,20,23,24,25,28",
"Wards 11-14,21-22,26-27": "Wards 11,12,13,14,21,22,26,27",
"Wards 9-10,32": "Wards 9,10,32",
"Wards 11-14,28": "Wards 11,12,13,14,28",
"Wards 23-24,26": "Wards 23,24,26",
"Wards 1-2,8": "Wards 1,2,8",
"Wards 1,8,10-11": "Wards 1,8,10,11",
"Wards 2,5-7": "Wards 2,5,6,7",
"Wards 3-4,9,16-17": "Wards 3,4,9,16,17",
"Wards 1-5,7": "Wards 1,2,3,4,5,7",
"Wards 1-3,13": "Wards 1,2,3,13",
"Wards 1,3-4,10": "Wards 1,3,4,10",
"Wards 2,5-8": "Wards 2,5,6,7,8",
"Wards 1-2,5-6": "Wards 1,2,5,6",
"Wards 1,3,5,9": "Wards 1,3,5,9",
"Wards 2,4,10": "Wards 2,4,10",
"Ward 1-3,7-9": "Ward 1,2,3,7,8,9",
"Ward 4-6,10-11": "Ward 4,5,6,10,11",
"Wards 1,7-8": "Wards 1,7,8",
"Wards 2-3,9-11": "Wards 2,3,9,10,11",
"Wards 1-2,4": "Wards 1,2,4",
"Wards 3,6-7": "Wards 3,6,7",
"Wards 5,8-9": "Wards 5,8,9",
"Ward 1,8-11": "Ward 1,8,9,10,11",
"Wards 7,9-11": "Wards 7,9,10,11",
"Wards 3-4,8": "Wards 3,4,8",
"Wards 1A-2,4,7": "Wards 1A,2,4,7",
"Ward 1B": "Ward 1B",
"Wards 3,14-15,30": "Wards 3,14,15,30",
"Ward 5B": "Ward 5B",
"Wards 5A-6,8-9,23-29,31-35,38": "Wards 5A,6,8,9,22,23,24,25,26,27,28,29,31,32,33,34,35,38",
"Wards 10-13,21-22,36-37": "Wards 10,11,12,13,21,22,36,37",
"Ward 22B": "Ward 22B",
"Ward 22A": "Ward 22A",
"Ward 23A": "Ward 23A",
"Ward 23B": "Ward 23B",
"Ward 25A": "Ward 25A",
"Ward 25B": "Ward 25B",
"Ward 28B": "Ward 28B",
"Ward 28A": "Ward 28A",
"Ward 29A": "Ward 29A",
"Ward 29B": "Ward 29B",
"Wards 1-2,7-10": "Wards 1,2,7,8,9,10",
"Wards 1A,2-5": "Wards 1A,2,3,4,5",
"Ward 1B": "Ward 1B",
"Wards 1A-2": "Wards 1A,2",
"Ward 1B": "Ward 1B",
"Wards 1-2A": "Wards 1,2A",
"Wards 2B,2C": "Wards 2B,2C",
"Wards 1-2,4,7": "Wards 1,2,4,7",
"Wards 3,5-6": "Wards 3,5,6",
"Wards 6,17,25-26": "Wards 6,17,25,26",
"Wards 7,16,27": "Wards 7,16,27",
"Wards 8,19,22-23": "Wards 8,19,22,23",
"Wards 6-15,24,26-29": "Wards 6,7,8,9,10,11,12,13,14,15,24,26,27,28,29",
"Wards 16-23,25": "Wards 16,17,18,19,20,21,22,23,25"}

Now, we can just blast over the Data Frame and create a new column, merging or expanding where necessary. If we were unmatched, look in the 'wardmappings' dictionary and return what we found there. The end results will be a column named 'mapped', which has every reporting unit converted to a comma seperated list.

In [14]:
def lookup(row, ward_mappings):
    if row['type'] == 'hypen':
        search = re.search('(\d+)(?:\s*)-(?:\s*)(\d+$)', row['data'])
        if(search):
            return "Wards %s" % (",".join([str(x) for x in range(int(search.group(1)), int(search.group(2))+1)]))
    elif row['type'] == 'comma':
        search = re.search('(\d+)(?:\s*),(?:\s*)(\d+$)', row['data'])
        if(search):
            return "Wards %d,%d" % (int(search.group(1)), int(search.group(2)))
        return row['ReportingUnit']
    elif row['type'] == 'single':
        search = re.search('(\d+$)', row['data'])
        return "Ward %d" % (int(search.group(1), 10))
        return row['ReportingUnit']
    elif row['type'] == 'amp':
        search = re.search('(\d+)(?:\s*)&(?:\s*)(\d+$)', row['data'])
        if(search):
            return "Wards %d,%d" % (int(search.group(1)), int(search.group(2)))
        else:
            raise Exception()
    elif row['type'] == 'unmatched':
        return ward_mappings[row['ReportingUnit']]
              
In [15]:
processed2016['mapped'] = processed2016.apply(lookup, args=(manual,), axis=1)
processed2016.head()
Out[15]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict data type mapped
0 Adams County CITY OF ADAMS Ward 1-4 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-4 hypen Wards 1,2,3,4
1 Adams County CITY OF WISCONSIN DELLS Wards 5,9 Congressional - District 3 Assembly - District 41 State Senate - District 14 5,9 comma Wards 5,9
2 Adams County TOWN OF ADAMS Wards 1-3 Congressional - District 3 Assembly - District 41 State Senate - District 14 1-3 hypen Wards 1,2,3
3 Adams County TOWN OF BIG FLATS Ward 1-2 Congressional - District 3 Assembly - District 72 State Senate - District 24 1-2 hypen Wards 1,2
4 Adams County TOWN OF COLBURN Ward 1 Congressional - District 3 Assembly - District 72 State Senate - District 24 1 single Ward 1
In [16]:
processed2016[processed2016['ReportingUnit'] == 'Wards 1A-3A']
Out[16]:
County Muni ReportingUnit CongressionalDistrict AssemblyDistrict SenateDistrict data type mapped
2208 Monroe County TOWN OF LA GRANGE Wards 1A-3A Congressional - District 7 Assembly - District 70 State Senate - District 24 Wards 1A-3A unmatched Wards 1A,2A,3A

Now, let's add in reporting units from the OpenElection Data. We'll start with 2014 data.

In [17]:
openelex2014 = pd.read_csv("https://raw.githubusercontent.com/openelections/openelections-data-wi/master/2014/20141104__wi__general_ward.csv")
In [18]:
openelex2014.head()
Out[18]:
county ward office district total votes party candidate votes
0 Adams Town Of Adams Wards 1-3 Governor/Lieutenant Governor NaN 500 DEM Mary Burke John Lehman 233
1 Adams Town Of Adams Wards 1-3 Governor/Lieutenant Governor NaN 500 REP Scott Walker Rebecca Kleefisch 258
2 Adams Town Of Adams Wards 1-3 Governor/Lieutenant Governor NaN 500 IND Dennis Fehr No Candidate 1
3 Adams Town Of Adams Wards 1-3 Governor/Lieutenant Governor NaN 500 IND Robert Burke Joseph M. Brost 8
4 Adams Town Of Adams Wards 1-3 Governor/Lieutenant Governor NaN 500 IND Mary Jo Walters (Write-In) 0

The OpenElection data is similar to Election Commission reporting units data, except they've combined the Ward and City into one column. Our end goal is a generic dictionary that can work on multiple years, so to keep the data consistent we're going to split that back into two columns.

Unfortunately, there are some odditities in the 2014 data, so we'll special-case those when we split data into municipalities and reporting units.

In [19]:
openelex2014specialcases= {"Village Of Gilman Ward": {"Muni": "Village of Gilman", "ReportingUnit": "Ward 1"}, "Village Of Lublin Ward": {"Muni": "Village of Lublin", "ReportingUnit": "Ward 1"}}

def split_ward_col(ward, special_cases):
    
    if ward in special_cases:
      mcd = special_cases[ward]['Muni']
      repounit = special_cases[ward]['ReportingUnit']
    else:
      x = re.match('(.+?)(?P<ward>(?:Ward).*$)', ward)
      if x is None:
        #print(ward)
        #some counties use Wd instead of Ward
        x = re.match('(.+?)(?P<ward>(?:Wd).*$)', ward)
      mcd = x.group(1).strip()
      repounit = x.group('ward')
    handled = process_reporting_unit(repounit)
    handled['Muni'] = mcd
    handled['ReportingUnit'] = repounit
    return handled
In [20]:
openelexprocessed = pd.concat([openelex2014,openelex2014.ward.apply(lambda s: pd.Series(split_ward_col(s, openelex2014specialcases)))], axis=1)
In [ ]:
 

The OpenElections data has a row for each candidate in each reporting unit, and for now we're just looking to compute ward infomration, so we'll drop some columns and make ourselves a copy that we can de-depulicate.

In [21]:
hack = openelexprocessed[['county', 'ward', 'Muni', 'ReportingUnit', 'data', 'type']].copy()
In [22]:
dedupedOpenElex2014 = hack.drop_duplicates().copy()

As mentioned above, each election the reporting units might be different: depending on the ballots used and what's on the election, in one year a city might have a single reporting unit of wards 1,2,3,and 4, and the next year it might have two reporting units, one with wards 1 and 2, and the other with wards 3 and 4. So, we'll see if we've already got a reporting unit from 2016, and for everything we don't already have we'll build another manual dictionary. Hopefully, between the 2016 and the 2014 data, we'll have covered most possible combinations.

In [23]:
len(dedupedOpenElex2014[dedupedOpenElex2014['type']=='unmatched'])
Out[23]:
200
In [24]:
def checkforexists(row):
    if not row in manual:
        convert(row)
    
junk = dedupedOpenElex2014[dedupedOpenElex2014['type']=='unmatched']['ReportingUnit'].apply(checkforexists)
"Wards 1-3, 8-10": "Wards ",
"Wards 2, 8 & 11": "Wards ",
"Ward 3, 4 & 6": "Ward ",
"Ward 9, 10 & 18": "Ward ",
"Wards 1-3 & 6": "Wards ",
"Wards 1, 2, 3": "Wards ",
"Wards 7-8, 10-12": "Wards ",
"Ward 1, 2, 3": "Ward ",
"Wards 1 - 5 & 5S": "Wards ",
"Wards 1 - 2 & 2 S": "Wards ",
"Wards 1 - 4, 6 - 7": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Ward 1, 2 & 3": "Ward ",
"Ward 1, 2, 3,& 4": "Ward ",
"Wards 1,9,10": "Wards ",
"Wards 2,3,5": "Wards ",
"Wards 4,6,7,8": "Wards ",
"Wards 1 - 3 & 6": "Wards ",
"Wards 1, 2, 4, 5, 7": "Wards ",
"Wards 1 And 2": "Wards ",
"Ward 1, 3-6, 15": "Ward ",
"Ward 7-10, 12": "Ward ",
"Ward 1, 5, 6 & 11": "Ward ",
"Ward 2, 3, 4 & 12": "Ward ",
"Wards 15, 18, 19": "Wards ",
"Wards 1-5, 8-9": "Wards ",
"Wards 6-7, 14-18": "Wards ",
"Wards 3 - 4, 12": "Wards ",
"Wards 2,3,4": "Wards ",
"Wards 1-4, 7-11": "Wards ",
"Wards 1, 3-7": "Wards ",
"Ward 1, 2, 3": "Ward ",
"Wards 1, 3 & 5": "Wards ",
"Wards 7, 12 & 13": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1, 2, 3, 4 & 5": "Wards ",
"Wards 1-6, 22-24, 29": "Wards ",
"Wards 7-10, 18-21, 25-27, 30": "Wards ",
"Wards 11-17, 28": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1 - 3 And 7 - 8": "Wards ",
"Wards 4 - 6 And 9 - 12": "Wards ",
"Wards 13 - 16 And 20 - 22": "Wards ",
"Wards 17 - 19 And 30 - 32": "Wards ",
"Wards 23 - 26 And 27 - 29": "Wards ",
"Wards 1 + 2": "Wards ",
"Wards 3 + 4": "Wards ",
"Wards 5 + 7": "Wards ",
"Wards 8 + 9": "Wards ",
"Wards 9B": "Wards ",
"Wards 9, 9A, 10, 11 ,12": "Wards ",
"Wards 1, 2, 3": "Wards ",
"Wards 1, 2, 3": "Wards ",
"Wards 1, 2, 3": "Wards ",
"Wards 1, 2, 3, 4, 5": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1-2-3": "Wards ",
"Wards 1-2-3": "Wards ",
"Wards 1-2-3-4": "Wards ",
"Wards 1-2-3-4": "Wards ",
"Wards 2,3 & 5": "Wards ",
"Wards 1 - 5 & 10": "Wards ",
"Wards 5-6, 9, 12": "Wards ",
"Wards 1,2,3 & 8": "Wards ",
"Wards 1 And 2": "Wards ",
"Wards 3 And 4": "Wards ",
"Wards 5 And 6": "Wards ",
"Ward 1 - 6 & 8": "Ward ",
"Wards 3, 4, 22": "Wards ",
"Wards 17 - 18, 21, 23 - 25": "Wards ",
"Ward 1, 2 & 3": "Ward ",
"Wards 1, 2, 4, 5": "Wards ",
"Wards 12, 20, 21, 24": "Wards ",
"Wards 1,2,6 & 7": "Wards ",
"Wards 1, 2, & 3": "Wards ",
"Wards 1, 2, & 3": "Wards ",
"Wards 1, 2, & 3": "Wards ",
"Wards 1, 2, & 3": "Wards ",
"Wards 1, 3, 5": "Wards ",
"Wards 2, 4, 6": "Wards ",
"Wards 1, 2, & 3": "Wards ",
"Ward 1, 3 - 5": "Ward ",
"Wards 1S & 3S": "Wards ",
"Wards - 1, 2 & 5": "Wards - ",
"Wards - 3, 4 & 6": "Wards - ",
"Wards 2, 8S": "Wards ",
"Ward 1 B, 2 B, 3 B": "Ward ",
"Ward 1 A, 2 A, 3 A": "Ward ",
"Wards 1, 2 & 3": "Wards ",
"Wards 1-16, 5A, 18": "Wards ",
"Wards 1-3 &  5-8": "Wards ",
"Wards 1, 4, 5, 14": "Wards ",
"Wards 2, 6, 8, 12, 13": "Wards ",
"Wards 3, 9, 10, 11": "Wards ",
"Wd 1,2,5": "Wd ",
"Ward 5, 7B": "Ward ",
"Ward 6, 7A": "Ward ",
"Ward 8, 9, 10": "Ward ",
"Ward 19, 20, 21": "Ward ",
"Wards 1, 2 & 4": "Wards ",
"Ward 1, 2, 3": "Ward ",
"Wards 3, 4, 5": "Wards ",
"Wards 6, 7, 8": "Wards ",
"Wards 9, 10, 12, 13": "Wards ",
"Wards 11, 14, 15, 17": "Wards ",
"Wards 16, 18, 19": "Wards ",
"Ward 1, 2, 4, 16": "Ward ",
"Ward 3, 10, 11, 12, 15": "Ward ",
"Ward 6, 7, 8, 9": "Ward ",
"Ward 19, 21, 22, 23": "Ward ",
"Wards 1-4, 6": "Wards ",
"Wards 1 - 4 & 15": "Wards ",
"Ward 1-3, 13": "Ward ",
"Ward 4, 6, 14": "Ward ",
"Ward 5, 7-9": "Ward ",
"Wards 1-2 & 9": "Wards ",
"Wards 1,2,3,": "Wards ",
"Wards 1- 4, 6": "Wards ",
"Wards 5, 6, 10": "Wards ",
"Ward 7 - 9, 11 - 14": "Ward ",
"Wds 1,2,8": "Wds ",
"Wds 3,4,5,6,7": "Wds ",
"Wards 1,8,10 & 11": "Wards ",
"Wards 2,5,6 & 7": "Wards ",
"Wards 3,16,4,17 & 9": "Wards ",
"Wards 1 2": "Wards ",
"Wards 5 6": "Wards ",
"Wards 7 8 9": "Wards ",
"Wards 6, 9, 10, 15 - 17, 20, 23": "Wards ",
"Wards 11 - 14, 21, 22": "Wards ",
"Wards 23-24, 26": "Wards ",
"Wards 1 - 3 - 4 - 10": "Wards ",
"Wards 2 - 5 - 6 - 7 - 8": "Wards ",
"Wards 1, 2, 5, 6": "Wards ",
"Wards 9, 10, 11": "Wards ",
"Ward 1,3,5 & 9": "Ward ",
"Ward 2, 4 & 10": "Ward ",
"Ward 6,7 & 8": "Ward ",
"Wards 1, 2, 3, 7, 8, 9": "Wards ",
"Wards 4, 5, 6, 10, 11": "Wards ",
"Wards 1 - 3, 7 - 11": "Wards ",
"Wards 1,2,& 4": "Wards ",
"Wards 3,6,& 7": "Wards ",
"Wards  5,8,& 9": "Wards ",
"Ward 1, 8-11": "Ward ",
"Wards 7, 9 - 11": "Wards ",
"Wards 1, 2, 3, 4": "Wards ",
"Wards 2,3,4,5": "Wards ",
"Wards 4 And 5": "Wards ",
"Wards 9 And 10": "Wards ",
"Wards 11 And 12": "Wards ",
"Wards 13 And 14": "Wards ",
"Wards 15 And 16": "Wards ",
"Wards 3, 4 & 8": "Wards ",
"Wards 1 - 2 & 7 - 10": "Wards ",
"Wards 1, 2, 4, 7": "Wards ",
"Wards 3, 5, 6": "Wards ",
"Wards 1A, 2-5": "Wards ",
"Ward 1A And 2": "Ward ",
"Wards 1 And 2A": "Wards ",
"Wards 2B And 2C": "Wards ",
"Wards 1A, 2, 4, 7": "Wards ",
"Wards 3, 14, 15, 30": "Wards ",
"Wards 5A, 6, 8, 9, 23, 24, 25, 26, 27, 28, 29, 31": "Wards ",
"Wards 10, 11, 12, 13, 21, 22": "Wards ",
"Wards 8, 19, 22, 23": "Wards ",
"Wards 6 - 15 & 24 & 26": "Wards ",
"Wards 16 - 23 & 25 & 27": "Wards ",
In [25]:
manual2014 = {
"Wards 1-3, 8-10": "Wards 1,2,3,8,9,10",
"Wards 2, 8 & 11": "Wards 2,8,11",
"Ward 3, 4 & 6": "Ward 3,4,6",
"Ward 9, 10 & 18": "Ward 9,10,18",
"Wards 1-3 & 6": "Wards 1,2,3,6",
"Wards 1, 2, 3": "Wards 1,2,3",
"Wards 7-8, 10-12": "Wards 7,8,10,11,12",
"Ward 1, 2, 3": "Ward 1,2,3",
"Wards 1 - 5 & 5S": "Wards 1,2,3,4,5,5S",
"Wards 1 - 2 & 2 S": "Wards 1,2,2S",
"Wards 1 - 4, 6 - 7": "Wards 1,2,3,4,6,7",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Ward 1, 2 & 3": "Ward 1,2,3",
"Ward 1, 2, 3,& 4": "Ward 1,2,3,4",
"Wards 1,9,10": "Wards 1,9,10",
"Wards 2,3,5": "Wards 2,3,5",
"Wards 4,6,7,8": "Wards 4,6,7,8",
"Wards 1 - 3 & 6": "Wards 1,2,3,6",
"Wards 1, 2, 4, 5, 7": "Wards 1,2,4,5,7",
"Wards 1 And 2": "Wards 1,2",
"Ward 1, 3-6, 15": "Ward 1,3,4,5,6,15",
"Ward 7-10, 12": "Ward 7,8,9,10,12",
"Ward 1, 5, 6 & 11": "Ward 1,5,6,11",
"Ward 2, 3, 4 & 12": "Ward 2,3,4,12",
"Wards 15, 18, 19": "Wards 15,18,19",
"Wards 1-5, 8-9": "Wards 1,2,3,4,5,8,9",
"Wards 6-7, 14-18": "Wards 6,7,14,15,16,17,18",
"Wards 3 - 4, 12": "Wards 3,4,12",
"Wards 2,3,4": "Wards 2,3,4",
"Wards 1-4, 7-11": "Wards 1,2,3,4,7,8,9,10,11",
"Wards 1, 3-7": "Wards 1,3,4,5,6,7",
"Ward 1, 2, 3": "Ward 1,2,3",
"Wards 1, 3 & 5": "Wards 1,3,5",
"Wards 7, 12 & 13": "Wards 7,12,13",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1, 2, 3, 4 & 5": "Wards 1,2,3,4,5",
"Wards 1-6, 22-24, 29": "Wards 1,2,3,4,5,6,22,23,24,29",
"Wards 7-10, 18-21, 25-27, 30": "Wards 7,8,9,10,18,19,20,21,25,26,27,30",
"Wards 11-17, 28": "Wards 11,12,13,14,15,16,17,28",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1 - 3 And 7 - 8": "Wards 1,2,3,7,8",
"Wards 4 - 6 And 9 - 12": "Wards 4,5,6,9,10,11,12",
"Wards 13 - 16 And 20 - 22": "Wards 13,14,15,16,20,21,22",
"Wards 17 - 19 And 30 - 32": "Wards 17,18,19,30,31,32",
"Wards 23 - 26 And 27 - 29": "Wards 23,24,25,26,27,28,29",
"Wards 1 + 2": "Wards 1,2",
"Wards 3 + 4": "Wards 3,4",
"Wards 5 + 7": "Wards 5,7",
"Wards 8 + 9": "Wards 8,9",
"Wards 9B": "Wards 9B",
"Wards 9, 9A, 10, 11 ,12": "Wards 9,9A,10,11,12",
"Wards 1, 2, 3": "Wards 1,2,3",
"Wards 1, 2, 3": "Wards 1,2,3",
"Wards 1, 2, 3": "Wards 1,2,3",
"Wards 1, 2, 3, 4, 5": "Wards 1,2,3,4,5",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1-2-3": "Wards 1,2,3",
"Wards 1-2-3": "Wards 1,2,3",
"Wards 1-2-3-4": "Wards 1,2,3,4",
"Wards 1-2-3-4": "Wards 1,2,3,4",
"Wards 2,3 & 5": "Wards 2,3,5",
"Wards 1 - 5 & 10": "Wards 1,2,3,4,5,10",
"Wards 5-6, 9, 12": "Wards 5,6,9,12",
"Wards 1,2,3 & 8": "Wards 1,2,3,8",
"Wards 1 And 2": "Wards 1,2",
"Wards 3 And 4": "Wards 3,4",
"Wards 5 And 6": "Wards 5,6",
"Ward 1 - 6 & 8": "Ward 1,2,3,4,5,6,8",
"Wards 3, 4, 22": "Wards 3,4,22",
"Wards 17 - 18, 21, 23 - 25": "Wards 17,18,21,23,24,25",
"Ward 1, 2 & 3": "Ward 1,2,3",
"Wards 1, 2, 4, 5": "Wards 1,2,4,5",
"Wards 12, 20, 21, 24": "Wards 12,20,21,24",
"Wards 1,2,6 & 7": "Wards 1,2,6,7",
"Wards 1, 2, & 3": "Wards 1,2,3",
"Wards 1, 2, & 3": "Wards 1,2,3",
"Wards 1, 2, & 3": "Wards 1,2,3",
"Wards 1, 2, & 3": "Wards 1,2,3",
"Wards 1, 3, 5": "Wards 1,3,5",
"Wards 2, 4, 6": "Wards 2,4,6",
"Wards 1, 2, & 3": "Wards 1,2,3",
"Ward 1, 3 - 5": "Ward 1,3,4,5",
"Wards 1S & 3S": "Wards 1S,3S",
"Wards - 1, 2 & 5": "Wards 1,2,5",
"Wards - 3, 4 & 6": "Wards 3,4,6",
"Wards 2, 8S": "Wards 2,8S",
"Ward 1 B, 2 B, 3 B": "Ward 1B,2B,3B",
"Ward 1 A, 2 A, 3 A": "Ward 1A,2A,3A",
"Wards 1, 2 & 3": "Wards 1,2,3",
"Wards 1-16, 5A, 18": "Wards 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,5A,18",
"Wards 1-3 &  5-8": "Wards 1,2,3,5,6,7,8",
"Wards 1, 4, 5, 14": "Wards 1,3,4,14",
"Wards 2, 6, 8, 12, 13": "Wards 2,6,8,12,13",
"Wards 3, 9, 10, 11": "Wards 3,9,10,11",
"Wd 1,2,5": "Ward 1,2,5",
"Ward 5, 7B": "Ward 5,7B",
"Ward 6, 7A": "Ward 6,7A",
"Ward 8, 9, 10": "Ward 8,9,10",
"Ward 19, 20, 21": "Ward 19,20,21",
"Wards 1, 2 & 4": "Wards 1,2,4",
"Ward 1, 2, 3": "Ward 1,2,3",
"Wards 3, 4, 5": "Wards 3,4,5",
"Wards 6, 7, 8": "Wards 6,7,8",
"Wards 9, 10, 12, 13": "Wards 9,10,12,13",
"Wards 11, 14, 15, 17": "Wards 11,14,15,17",
"Wards 16, 18, 19": "Wards 16,18,19",
"Ward 1, 2, 4, 16": "Ward 1,2,3,16",
"Ward 3, 10, 11, 12, 15": "Ward 3,10,11,12,15",
"Ward 6, 7, 8, 9": "Ward 6,7,8,9",
"Ward 19, 21, 22, 23": "Ward 19,21,22,23",
"Wards 1-4, 6": "Wards 1,2,3,4,6",
"Wards 1 - 4 & 15": "Wards 1,2,3,4,15",
"Ward 1-3, 13": "Ward 1,2,3,13",
"Ward 4, 6, 14": "Ward 4,6,14",
"Ward 5, 7-9": "Ward 5,7,8,9",
"Wards 1-2 & 9": "Wards 1,2,9",
"Wards 1,2,3,": "Wards 1,2,3",
"Wards 1- 4, 6": "Wards 1,2,3,4,6",
"Wards 5, 6, 10": "Wards 5,6,10",
"Ward 7 - 9, 11 - 14": "Ward 7,8,9,11,12,13,14",
"Wds 1,2,8": "Wards 1,2,8",
"Wds 3,4,5,6,7": "Wards 3,4,5,6,7",
"Wards 1,8,10 & 11": "Wards 1,8,10,11",
"Wards 2,5,6 & 7": "Wards 2,5,6,7",
"Wards 3,16,4,17 & 9": "Wards 3,16,4,17,9",
"Wards 1 2": "Wards 1,2",
"Wards 5 6": "Wards 5,6",
"Wards 7 8 9": "Wards 7,8,9",
"Wards 6, 9, 10, 15 - 17, 20, 23": "Wards 6,9,10,15,16,17,20,23",
"Wards 11 - 14, 21, 22": "Wards 11,12,13,14,21,22",
"Wards 23-24, 26": "Wards 23,24,26",
"Wards 1 - 3 - 4 - 10": "Wards 1,3,4,10",
"Wards 2 - 5 - 6 - 7 - 8": "Wards 2,5,6,7,8",
"Wards 1, 2, 5, 6": "Wards 1,2,5,6",
"Wards 9, 10, 11": "Wards 9,10,11",
"Ward 1,3,5 & 9": "Ward 1,3,5,9",
"Ward 2, 4 & 10": "Ward 2,4,10",
"Ward 6,7 & 8": "Ward 6,7,8",
"Wards 1, 2, 3, 7, 8, 9": "Wards 1,2,3,7,8,9",
"Wards 4, 5, 6, 10, 11": "Wards 4,5,6,10,11",
"Wards 1 - 3, 7 - 11": "Wards 1,2,3,7,8,9,10,11",
"Wards 1,2,& 4": "Wards 1,2,4",
"Wards 3,6,& 7": "Wards 3,6,7",
"Wards  5,8,& 9": "Wards 5,8,9",
"Ward 1, 8-11": "Ward 1,8,9,10,11",
"Wards 7, 9 - 11": "Wards 7,9,10,11",
"Wards 1, 2, 3, 4": "Wards 1,2,3,4",
"Wards 2,3,4,5": "Wards 2,3,4,5",
"Wards 4 And 5": "Wards 4,5",
"Wards 9 And 10": "Wards 9,10",
"Wards 11 And 12": "Wards 11,12",
"Wards 13 And 14": "Wards 13,14",
"Wards 15 And 16": "Wards 15,16",
"Wards 3, 4 & 8": "Wards 3,4,8",
"Wards 1 - 2 & 7 - 10": "Wards 1,2,7,8,9,10",
"Wards 1, 2, 4, 7": "Wards 1,2,4,7",
"Wards 3, 5, 6": "Wards 3,5,6",
"Wards 1A, 2-5": "Wards 1A,2,3,4,5",
"Ward 1A And 2": "Ward 1A,2",
"Wards 1 And 2A": "Wards 1,2A",
"Wards 2B And 2C": "Wards 2B,2C",
"Wards 1A, 2, 4, 7": "Wards 1A,2,4,7",
"Wards 3, 14, 15, 30": "Wards 3,14,15,30",
"Wards 5A, 6, 8, 9, 23, 24, 25, 26, 27, 28, 29, 31": "Wards 5A,6,8,9,23,24,25,26,27,28,29,31",
"Wards 10, 11, 12, 13, 21, 22": "Wards 10,11,12,13,21,22",
"Wards 8, 19, 22, 23": "Wards 8,19,22,23",
"Wards 6 - 15 & 24 & 26": "Wards 6,7,8,9,10,11,12,13,14,15,24,26",
"Wards 16 - 23 & 25 & 27": "Wards 16,17,18,19,20,21,22,23,25,27",
}

Merge the 2016 and 2014 data into one dictionary, and then let's normalize all of the 2014 reporting units.

In [26]:
combinedmappings = manual.copy()
combinedmappings.update(manual2014)
In [27]:
dedupedOpenElex2014['mapped'] = dedupedOpenElex2014.apply(lookup, args=(combinedmappings,), axis=1)
In [28]:
dedupedOpenElex2014.head(10)
Out[28]:
county ward Muni ReportingUnit data type mapped
0 Adams Town Of Adams Wards 1-3 Town Of Adams Wards 1-3 1-3 hypen Wards 1,2,3
11 Adams Town Of Big Flats Wards 1 & 2 Town Of Big Flats Wards 1 & 2 1 & 2 amp Wards 1,2
22 Adams Town Of Colburn Ward 1 Town Of Colburn Ward 1 1 single Ward 1
33 Adams Town Of Dell Prairie Wards 1-3 Town Of Dell Prairie Wards 1-3 1-3 hypen Wards 1,2,3
44 Adams Town Of Easton Wards 1 & 2 Town Of Easton Wards 1 & 2 1 & 2 amp Wards 1,2
55 Adams Town Of Jackson Wards 1 & 2 Town Of Jackson Wards 1 & 2 1 & 2 amp Wards 1,2
66 Adams Town Of Leola Ward 1 Town Of Leola Ward 1 1 single Ward 1
77 Adams Town Of Lincoln Ward 1 Town Of Lincoln Ward 1 1 single Ward 1
88 Adams Town Of Monroe Ward 1 Town Of Monroe Ward 1 1 single Ward 1
99 Adams Town Of New Chester Ward 1-3 Town Of New Chester Ward 1-3 1-3 hypen Wards 1,2,3

Let's write out our combined special-case mappings for later. This dictionary will have both 2016 and 2014 data, but when we want to add 2012 data hopefully we'll be able to use it as a starting point.

In [29]:
with open("ward_mappings.json", "w") as special_case_output:
    json.dump(combinedmappings, special_case_output)

Let's also write out a copy of a CSV with normalized wards. This is only the 2014 data, and can be used as a lookup table for the real Open Elections data, which we'll do in our next notebook.

In [30]:
dedupedOpenElex2014.to_csv("2014_wards_normalized.csv", index=False, columns=('county', 'ward', 'Muni', 'ReportingUnit', 'mapped'))

* One note about wards. Earlier, we said that all voters in a ward get the same ballot. This is not true in the case of school board elections. School district boundaries do not follow municipal boundaries, so some wards are split between different districts. The Elections Commission calls these 'District Combos', and the Voter Registration system records which District Combo a voter lives in and which ballot they should get. (In fact, not only do school district boundaries not follow municipal boundaries, they don't even follow parcel boundaries. There are a few condos in Madison where the School District boundary runs through the condo. In that case, for purposes of "residency", the district that the bedroom falls into decides which district the voter lives in.)




This post was written entirely in the IPython notebook. You can download this notebook, or see a static view on nbviewer. The Facebook graphic courtesty of Wikipedia under a CC-BY-SA license