There are all sorts of reasons that scientists might want to look at people’s names and infer their ethnicity, gender or age. Take public health researchers who want to measure health care disparities between ethnic groups. If they can use surnames to sort people, they could avoid having to collect race and ethnicity data from every patient. Some researchers have used first names to help facial recognition software improve its estimates of people’s ages and genders in photographs. And some have suggested that unpopular names might be correlated with juvenile delinquency.
But how much can you really tell from someone’s name? Pete Warden, an engineer and blogger, breaks down some of the techniques available to analyze names.
The U.S. Social Security Administration, for instance, releases lists of how popular first names are by gender and year of birth. Minnie was the fifth most popular girls’ name in 1880 and has nearly disappeared today; in 2012, the number five spot went to Ava. Gender is probably the easiest thing to infer from a name, Warden writes. While there are certainly exceptions, Mikes and Bobs tend to be men, while Sarahs and Sallies tend to be women. The next easiest thing to infer is ethnicity. The U.S. Census Bureau publishes a list of about 150,000 family names broken down by ethnicity. Warden writes:
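To make the gender idea concrete, here is a minimal sketch of how a name-frequency table can be turned into a guess. It assumes a table of female and male birth counts per first name; the names `NAME_COUNTS`, `prob_female` and `guess_gender`, the counts themselves and the 0.9 cutoff are all hypothetical placeholders invented for illustration, not real published figures or anyone’s actual method.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (female, male) birth counts per first name.
# Invented for illustration only; real work would load a published name list.
NAME_COUNTS: Dict[str, Tuple[int, int]] = {
    "sarah": (980, 20),
    "mike": (5, 995),
    "alex": (300, 700),
}


def prob_female(first_name: str) -> Optional[float]:
    """Share of recorded births with this name that were female, or None if unknown."""
    counts = NAME_COUNTS.get(first_name.strip().lower())
    if counts is None:
        return None
    female, male = counts
    total = female + male
    return female / total if total else None


def guess_gender(first_name: str, threshold: float = 0.9) -> str:
    """Return 'female' or 'male' only when the name is lopsided enough, else 'unknown'."""
    p = prob_female(first_name)
    if p is None:
        return "unknown"  # name not in the table
    if p >= threshold:
        return "female"
    if p <= 1 - threshold:
        return "male"
    return "unknown"  # ambiguous names such as Alex
```

With these toy counts, `guess_gender("Sarah")` gives `"female"`, `guess_gender("Mike")` gives `"male"`, and `guess_gender("Alex")` stays `"unknown"`. The threshold is the design choice that matters: raising it leaves more people unclassified but makes fewer wrong calls.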
Asian and Hispanic family names tend to be fairly unique to those communities, so an occurrence is a strong signal that the person is a member of that ethnicity. There are some confounding factors though, especially with Spanish-derived names in the Philippines. There are certain names, especially those from Germany and Nordic countries, that strongly indicate that the owner is of European descent, but many surnames are multi-racial. There are some associations between African-Americans and certain names like Jackson or Smalls, but these are also shared by a lot of people from other ethnic groups. These ambiguities make non-Hispanic and non-Asian measures more indicators than strong metrics, and they won’t tell you much until you get into the high hundreds for your sample size.
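Why does sample size matter so much? One common way to use a surname list is not to label each person but to average per-surname probabilities over a whole group. The sketch below assumes a table of P(group | surname); the table, its values and the function `estimate_composition` are hypothetical and invented for illustration, not taken from the Census list or from Warden.

```python
from typing import Dict, List

# Hypothetical P(group | surname) values. Invented placeholders,
# not figures from the Census surname list.
SURNAME_PROBS: Dict[str, Dict[str, float]] = {
    "nguyen": {"asian": 0.95, "white": 0.03, "black": 0.01, "hispanic": 0.01},
    "garcia": {"asian": 0.02, "white": 0.05, "black": 0.01, "hispanic": 0.92},
    "jackson": {"asian": 0.01, "white": 0.40, "black": 0.55, "hispanic": 0.04},
    "smith": {"asian": 0.01, "white": 0.72, "black": 0.23, "hispanic": 0.04},
}
GROUPS = ["asian", "white", "black", "hispanic"]


def estimate_composition(surnames: List[str]) -> Dict[str, float]:
    """Average P(group | surname) over every recognised surname in the sample."""
    totals = {g: 0.0 for g in GROUPS}
    matched = 0
    for name in surnames:
        probs = SURNAME_PROBS.get(name.strip().lower())
        if probs is None:
            continue  # unknown surnames are skipped rather than guessed
        matched += 1
        for g in GROUPS:
            totals[g] += probs[g]
    if matched == 0:
        return {g: 0.0 for g in GROUPS}
    return {g: totals[g] / matched for g in GROUPS}
```

For the toy sample `["Nguyen", "Garcia", "Smith", "Jackson"]` this returns an Asian share of 0.2475 and a white share of 0.30. A distinctive name like Nguyen contributes almost all of its weight to one group, while an ambiguous name like Jackson spreads its weight out, which is why estimates for groups with few distinctive surnames only settle down once the sample is large.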
Age is the hardest. While Minnie isn’t popular anymore, it’s still around. And many names, like Ava, tend to come back into fashion. Just like it’s rude to guess someone’s age to their face, it’s also probably a bad idea to guess it from their name.
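The weakness of age estimates shows up clearly if you turn birth counts into a probability distribution. This sketch assumes births per 20-year period for each name; the table, the period keys and the functions `birth_period_distribution` and `most_likely_period` are hypothetical, the numbers are invented, and a fuller model would also discount older cohorts for mortality, which this sketch ignores.

```python
from typing import Dict, Optional

# Hypothetical births per 20-year period, keyed by the period's first year.
# Invented numbers for illustration; mortality is ignored.
BIRTHS_BY_PERIOD: Dict[str, Dict[int, int]] = {
    "minnie": {1880: 900, 1900: 700, 1920: 300, 1940: 80, 1960: 30, 1980: 20, 2000: 15},
    "ava": {1900: 200, 1920: 150, 1940: 40, 1960: 20, 1980: 60, 2000: 600},
}


def birth_period_distribution(first_name: str) -> Dict[int, float]:
    """Normalise the counts into P(birth period | name)."""
    counts = BIRTHS_BY_PERIOD.get(first_name.strip().lower(), {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {period: n / total for period, n in counts.items()}


def most_likely_period(first_name: str) -> Optional[int]:
    """Start year of the single most probable birth period, or None if the name is unknown."""
    dist = birth_period_distribution(first_name)
    return max(dist, key=dist.get) if dist else None
```

In this toy data the most likely period for Minnie is 1880, yet it holds only about 44 percent of all Minnies, and Ava’s distribution has an early bump as well as a recent peak. Even the best single guess leaves most of the probability elsewhere.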
Using names to infer things like gender and ethnicity can be useful for all sorts of scientists.
Facebook has even used this information to determine how diverse the site’s users are. Facebook data scientist Cameron Marlow writes:
This is a tough question to answer because, unlike information such as gender or age, Facebook does not ask users to share their ethnicity or race on their profiles. In order to answer it, we focused on a single country with a large and diverse population—the United States. Comparing people’s surnames on Facebook with data collected by the U.S. Census Bureau, we are able to estimate the racial breakdown of Facebook users over the history of the site.
What Facebook found was that, since 2005, Asian/Pacific Islanders have been much more likely to be on Facebook than whites. White and black users are about even with each other, with Hispanics lagging slightly behind.
Overall, though, guessing from names is tricky. There will always be female Alexes and Chinese Smiths. Many immigrants change their names when they move, muddling the correlations. But if researchers can get good approximations, they could use them to figure out what’s going on with large groups of people without having to ask them.