Birthplaces of the Nobel Laureates

Summary

Just look at the interactive map below! But:

Introduction

Someone showed me a map of the birthplaces of Nobel laureates. Unfortunately I can't find the map I saw, but it was something like the lower half of this one, except not animated, and without the inexplicable US/Europe focus. It was similar though, in that it did not differentiate between the different prizes. I wondered whether there are any geographical trends in the birth location of Nobel laureates. The ‘US + Europe (+ Japan)’ trend is quite clear from the totals map, but perhaps there are more interesting trends. For example, perhaps we should expect the Chemistry Prize to be tilted towards those born in places with significant chemical industries, such as (south-)west Germany or Switzerland.

I made a map for each category, using data from Wikidata (see the Data section). I excluded the Peace Prize for two reasons:

  1. I don't respect the Peace Prize [1];
  2. It has been awarded to organisations, and I didn't want to think about how to handle this.

I also made maps for the (technically not a ‘Nobel Prize’) economics Nobel, the Fields Medal, and the Turing Award.

Map

The map below is interactive. You can select the prize(s) you want to display by clicking on the legend on the right. You can pan/zoom the map. You can hover over dots to see the name of the laureate. There's no ‘fullscreen’ button, but if you want a fullscreen version click here.

Note that the map is slightly misleading about the number of laureates from big cities like NYC: they have more laureates than it might seem from the map. See the Plotting section for more details.

Data

Wikidata

To get the data for this, I briefly considered doing it manually, then realised it would take a long time. I've been aware of Wikidata for a while now, but never really had a good excuse to use it until now. Turns out it's fairly straightforward to build a query that gets the list of prizewinners for a given prize (in the example below, Fields Medal Q28835), gets the award year, date of birth, and birth location for each winner, and gets the country and coordinates of each birth location.

You can run the query below in the Wikidata query service, which I found to be surprisingly quick. To generate this for other prizes, change the item identifier Q28835 to that of the prize you want to map.

SELECT DISTINCT ?itemLabel (YEAR(?when) as ?date) (YEAR(?dob) as ?doby) ?birthplaceLabel ?countryLabel ?lat ?lon
WHERE {
  ?item p:P166 ?awardStat .
  ?awardStat ps:P166 wd:Q28835 .  # Q28835 = Fields Medal
  ?awardStat pq:P585 ?when .

  OPTIONAL { 
    ?item wdt:P19 ?birthplace .
    OPTIONAL {?birthplace wdt:P17 ?country . }
    OPTIONAL {
      ?birthplace p:P625 ?coords .
      ?coords psv:P625 ?coord .
      ?coord wikibase:geoLatitude ?lat .
      ?coord wikibase:geoLongitude ?lon .
    }
  }

  OPTIONAL { ?item wdt:P569 ?dob . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

This query might not be ‘optimal’: I basically stitched it together by looking at examples. There's a SPARQL tutorial which looks quite good. Next time I have a Wikidata-answerable question, I'll spend some time actually doing the tutorial and learning SPARQL properly.
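To avoid editing the identifier by hand for each prize, the query can be treated as a template. The following is a minimal Python sketch, not the code used for this post: `QUERY_TEMPLATE` and `build_query` are names made up for illustration, and the query text is the one above with Q28835 replaced by a placeholder.

```python
import re
from string import Template

# The query from above, with the prize identifier replaced by $prize_id.
# (SPARQL variables here use "?", so "$" is free to act as the placeholder marker.)
QUERY_TEMPLATE = Template("""\
SELECT DISTINCT ?itemLabel (YEAR(?when) as ?date) (YEAR(?dob) as ?doby) ?birthplaceLabel ?countryLabel ?lat ?lon
WHERE {
  ?item p:P166 ?awardStat .
  ?awardStat ps:P166 wd:$prize_id .
  ?awardStat pq:P585 ?when .

  OPTIONAL {
    ?item wdt:P19 ?birthplace .
    OPTIONAL { ?birthplace wdt:P17 ?country . }
    OPTIONAL {
      ?birthplace p:P625 ?coords .
      ?coords psv:P625 ?coord .
      ?coord wikibase:geoLatitude ?lat .
      ?coord wikibase:geoLongitude ?lon .
    }
  }

  OPTIONAL { ?item wdt:P569 ?dob . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
""")


def build_query(prize_id: str) -> str:
    """Return the prizewinner query for a Wikidata item identifier such as 'Q28835'."""
    # Only accept well-formed item identifiers, so nothing else ends up in the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", prize_id):
        raise ValueError(f"not a Wikidata item identifier: {prize_id!r}")
    return QUERY_TEMPLATE.substitute(prize_id=prize_id)


if __name__ == "__main__":
    print(build_query("Q28835"))  # Q28835 = Fields Medal
```

`build_query("Q28835")` reproduces the Fields Medal query, which can then be pasted into the query service. The identifiers of the other prizes still have to be looked up on Wikidata.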

I had to do a little bit of post-processing for two reasons. Firstly, some cities don't have coordinates: for example, Christiania (which is now Oslo). Secondly, this query sometimes generates duplicate entries. One reason this can happen is that a particular city is listed under multiple countries, and each country produces its own result row, which DISTINCT can't collapse because the country label differs. An example is Königsberg (now Kaliningrad), which is listed as part of Prussia, the German Empire, Nazi Germany, the Soviet Union, and the Russian Empire.

Another reason for duplicate entries is uncertainty about the laureate's date of birth (when was Isaac Bashevis Singer born?), or their place of birth. Economics laureate Wassily Leontief is listed on Wikidata as being born in both Munich and St Petersburg. From what I can tell he was actually born in Munich, but I ended up listing him as being born in St Petersburg, where he grew up and where his family are from. Similarly, I listed Paul Nurse, who was actually born in Norwich, as being born in London. Unfortunately this isn't necessarily consistent with the rest of the data, but I don't think it's too unreasonable.
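The mechanical part of this clean-up can be sketched in pandas. This is an illustration under assumptions rather than the exact steps I took: `clean_laureates` and `MANUAL_COORDS` are hypothetical names, the column names are those returned by the query above, and the Oslo coordinates are approximate. It simply keeps the first row for each laureate and award year, so judgment calls like Leontief's birthplace still have to be made by hand beforehand.

```python
import pandas as pd

# Hypothetical manual fixes: birthplace label -> approximate (lat, lon).
# Christiania is now Oslo, so Oslo's coordinates are used.
MANUAL_COORDS = {"Christiania": (59.91, 10.75)}


def clean_laureates(df, manual_coords=MANUAL_COORDS):
    """Fill in known missing coordinates and keep one row per laureate and award year."""
    df = df.copy()  # leave the caller's DataFrame untouched

    # Fill in coordinates for birthplaces that came back without any.
    for place, (lat, lon) in manual_coords.items():
        missing = (df["birthplaceLabel"] == place) & df["lat"].isna()
        df.loc[missing, "lat"] = lat
        df.loc[missing, "lon"] = lon

    # Rows that differ only in country, birth year or birthplace collapse to the
    # first one seen. Including the award year keeps repeat winners of one prize.
    df = df.drop_duplicates(subset=["itemLabel", "date"], keep="first")
    return df.reset_index(drop=True)
```

Because `keep="first"` picks whichever duplicate happens to come first, it's worth reordering or editing the rows for the ambiguous laureates before calling this.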

Wikidata seems potentially quite powerful. The query service at least seems great. The ‘Nobel laureate birthplace’ question might just be very well suited to Wikidata though: it's an area where I expect the data to be quite complete. I'm not sure how complete the data is in other cases, which might limit Wikidata's usefulness. That said it's a cool project. If anything, I think it should be more ambitious. I'd love to be able to easily get things like timeseries data based on Wikidata queries.

Plotting

I used Python's Plotly library (plotly.graph_objects) to plot the maps. Thanks to Bee Guan Teo's article on Towards Data Science for pointers on this.

I added a little bit of noise to locations such that stacked points can be distinguished if you zoom in, so if you see any people seemingly born in the sea, that's why!

The way I plotted this understates the number of Nobels that have come from a couple of major cities: particularly New York City, which is the birthplace of a huge number of laureates. NYC-proper is the birthplace of 13 Physics, 11 Medicine, 7 Economics, 5 Chemistry, and 2 Literature laureates; and 8 Turing Award winners, and 1 Fields Medallist. There are quite a few more who were born in what I assume is the NYC metro area. It would've made sense to make the marker area bigger rather than use opacity stacking, but I hope this warning still gets the picture across.
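For reference, the marker-area alternative could look something like the following. This is a sketch of the idea, not what the map above uses: `aggregate_by_place` is a hypothetical helper, and it assumes the same column names as the query output. It collapses each birthplace to a single row and scales the marker so that its area, rather than its diameter, grows with the number of laureates.

```python
import numpy as np
import pandas as pd


def aggregate_by_place(df, base_size=7):
    """Return one row per birthplace with a laureate count and a marker size."""
    # Rows without coordinates are dropped here, since groupby skips NaN keys by default.
    grouped = (
        df.groupby(["birthplaceLabel", "lat", "lon"], as_index=False)
          .agg(n=("itemLabel", "size"), names=("itemLabel", ", ".join))
    )
    # Plotly interprets marker size as a diameter by default, so taking the
    # square root of the count makes the marker *area* proportional to it.
    grouped["marker_size"] = base_size * np.sqrt(grouped["n"])
    return grouped
```

The result could then be passed to `go.Scattergeo` with `lon=grouped["lon"]`, `lat=grouped["lat"]`, `hovertext=grouped["names"]` and `marker=dict(size=grouped["marker_size"])`, in which case the random noise is no longer needed.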

My code is below:

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from collections import namedtuple

# I define some NamedTuples in a list called prizes here
# omitted for brevity.

# Collect one Scattergeo trace per prize.
all_data = []
for prize in prizes:
    filename = f"{prize.name}.csv"
    df = pd.read_csv(filename)
    data=go.Scattergeo(
        name=prize.graph_title,
        visible=True if prize.name == "physics" else "legendonly",
        lon=df["lon"] + 0.05*np.random.randn(len(df["lon"])),
        lat=df["lat"] + 0.05*np.random.randn(len(df["lat"])),
        hovertext=df["itemLabel"],
        mode="markers",
        marker=dict(
            symbol='circle',
            color=prize.color,
            size=7,
            opacity=0.3,
            )
        )
    all_data.append(data)

layout = dict(
    title="Nobel Laureate Birthplaces",
    geo=dict(
        scope="world",
        projection_type="robinson",
        showland=True,
        landcolor="#e6e6e6"
        ),
    margin=dict(b=20, l=20, r=20, t=40)
    )

fig = go.Figure(
        data=all_data,
        layout=layout
    )
fig.write_html("nobel_birthplace_map.html", auto_open=True)
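The `prizes` list is omitted above. For anyone trying to run the code, a stand-in could look like the sketch below. It is hypothetical: the field names are inferred from how the loop uses each entry (`prize.name`, `prize.graph_title`, `prize.color`), "physics" is the only name that appears in the code, and the other names, titles and colours are placeholders rather than the values behind the map.

```python
from collections import namedtuple

# Field names inferred from the plotting loop; everything else is a placeholder.
Prize = namedtuple("Prize", ["name", "graph_title", "color"])

prizes = [
    Prize(name="physics", graph_title="Physics", color="#1f77b4"),
    Prize(name="chemistry", graph_title="Chemistry", color="#2ca02c"),
    Prize(name="fields", graph_title="Fields Medal", color="#9467bd"),
]

# The loop reads one CSV per prize, named after the `name` field.
expected_files = [f"{prize.name}.csv" for prize in prizes]
```

Each entry's `name` has to match a CSV file exported from the query service, with at least the `lon`, `lat` and `itemLabel` columns.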

  1. Mostly based on who they've given the award to. I'm only half-joking when I say I consider the Peace Prize on par with the Eurovision. That said, I don't want to diminish the work of the majority of the Peace laureates, who have done extremely worthwhile things. Similarly, my respect for the Literature Prize was dented by the award given to Bob Dylan, who is probably one of the most overrated songwriters/musicians of all time.