Look No More: The Data-Driven Baby Name Generator

September 7, 2019

Very soon, I'm about to be blessed with a tiny being in the world who will call me uncle (I'm also a bit terrified, but that's a story for another time).

With the birth just around the corner, my sister and brother-in-law recently embarked on the all-important decision that will chart the course of my niece's life... choosing a first name.

I'm a believer that the ole folklore about names holding some special "power" has an element of truth. From inspiring a child to be one of a kind, to giving them a sense of identity, to smoothing over business deals, a name can have an outsized impact on a child's life.

So, of course, I wanted to throw my hat in the ring when it came to deciding my niece's fate. Unfortunately, names just so happen to be my kryptonite. So I turned to one of machine learning's more recent advances, Recurrent Neural Networks (RNNs), to lend a hand with some computational creativity.

Without Further Ado, The Names

Guidebook
Each column contains 25 names sampled from the trained RNN, grouped by degree of creativity as controlled by the model's temperature parameter. In other words, the lower the temperature (further to the left), the more recognizable the name.
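To make "temperature" a little more concrete: at each step the RNN outputs a probability for every possible next character, and the temperature reshapes that distribution before a character is drawn. The sketch below shows the standard way this is done (divide the log-probabilities by the temperature, then renormalize). I'm assuming textgenrnn does something equivalent under the hood; the function name `apply_temperature` is my own, not part of any library.

```python
import math


def apply_temperature(probs, temperature):
    """Reweight next-character probabilities by a sampling temperature.

    Each log-probability is divided by the temperature and the result is
    renormalized (a softmax). Temperatures below 1 sharpen the distribution
    toward the most likely characters; temperatures above 1 flatten it.
    Assumes every probability is strictly positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(p) / temperature for p in probs]
    top = max(logits)  # subtract the max before exponentiating for stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

For a toy distribution of `[0.6, 0.3, 0.1]` over three candidate letters, a temperature of 0.2 gives roughly `[0.970, 0.030, 0.0001]` (the favorite almost always wins, hence familiar names), while a temperature of 2.0 gives roughly `[0.473, 0.334, 0.193]` (long shots get a real chance, hence the more creative names).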

The Insights and Conclusion

An outsized proportion of the generated names begin with the letter A, especially at lower temperatures. My hypothesis (pending a closer look at how the algorithm is implemented) is that names beginning with A make up a large share of the underlying training data, and low temperatures amplify the most common patterns. In other words, everyone wants their kids to be at the start of the alphabet.

(Chart: Distinct Count of Names Registered with Social Security in 2018)
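If you want to check the "A" hypothesis yourself, a distinct count like the one charted above takes only a few lines. This is a minimal sketch, assuming the input is the SSA's national-level file for the year (e.g. `yob2018.txt`), where each line reads `Name,Sex,Count`; the function name is hypothetical. A name registered for both sexes is counted once.

```python
from collections import Counter


def distinct_names_by_first_letter(lines):
    """Count distinct names per first letter.

    Assumes each line follows the SSA national file layout `Name,Sex,Count`.
    Names that appear under both sexes are only counted once.
    """
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _sex, _count = line.split(",")
        names.add(name)
    return Counter(name[0].upper() for name in names)
```

With a few made-up rows such as `["Ava,F,10", "Ava,M,5", "Amir,M,7", "Olivia,F,12"]`, this returns `Counter({'A': 2, 'O': 1})`. On the real file you would pass in the lines from `open("yob2018.txt")`.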

While every name on the creative end (temperature = 2.0) was new to me, some actually have precedent in today's world. For example, Xibel is a username in the hit game World of Warcraft, Google searches show that quite a few Instagrammers share the name Shonte, and there's even an entry for it in Urban Dictionary.

What will my niece's name be, you ask? Well, among the names above, my sister paused on Merlin for a hot second (no doubt Harry Potter's influence)... before ultimately settling on Olivia.

Fret not, though. I'm sure I'll get a kick out of this for years to come, every time friends and family welcome a new life into the world.

Methodology and Credits

All source code, training data, and project notes can be found on my GitHub.

The RNN was trained on data kindly made public by the Social Security Administration. In particular, I used only the names registered on Social Security card applications in 2018, to keep things "fresh and hip."

The underlying code heavily leverages Max Woolf's textgenrnn package, available on GitHub.
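For the curious, here is roughly how a character-level generator like textgenrnn turns a trained model into a name: it starts from an empty string, asks the model for the next-character distribution, samples one character, appends it, and repeats until the model emits an end marker. This is a minimal sketch, not textgenrnn's actual code. `next_char_probs` is a hypothetical stand-in for the trained RNN, the `END` marker is an assumption, and temperature reweighting (omitted here) would be applied to each distribution just before sampling.

```python
import random

END = "\n"  # hypothetical end-of-name marker used by this sketch


def generate_name(next_char_probs, rng, max_len=20):
    """Build a name one character at a time.

    `next_char_probs(prefix)` is a hypothetical stand-in for the trained
    model: given the characters generated so far, it returns a dict mapping
    each candidate next character to its probability.
    """
    name = ""
    while len(name) < max_len:
        probs = next_char_probs(name)
        chars = list(probs)
        char = rng.choices(chars, weights=[probs[c] for c in chars], k=1)[0]
        if char == END:  # the model signals that the name is complete
            break
        name += char
    return name.capitalize()


def toy_model(prefix):
    """Hypothetical toy model that deterministically spells 'olivia'."""
    target = "olivia"
    if len(prefix) < len(target):
        return {target[len(prefix)]: 1.0}
    return {END: 1.0}
```

Calling `generate_name(toy_model, random.Random(0))` returns `"Olivia"`. A real model would return many candidates with nonzero probability at each step, which is where the variety (and the occasional Xibel) comes from. The `max_len` cap simply guards against a model that never emits the end marker.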

Curious to learn more about RNNs? I highly recommend reading Andrej Karpathy's blog post on RNNs, "The Unreasonable Effectiveness of Recurrent Neural Networks."