It seems like neural networks are taking over. We see their code open-sourced, they power our self-driving vehicles, we use them as programming assistants, and we’ve even let them rate our selfies.

Now, deep neural networks have analyzed 50,000 fonts, just because.

The man behind this crazy font fun is Erik Bernhardsson, a CTO at a small financial technology startup in New York City, according to his Twitter bio. He also likes large-scale machine learning and “other fun stuff.” This fun stuff includes writing a blog post about fonts that generated so much traffic, it crashed his site while he was sleeping. The post also magically appeared and disappeared on Hacker News.

I still managed to come across his side project, so let’s get to it. To start, he used a bunch of scripts built on Scrapy, an open-source, collaborative framework for extracting data from websites. The scripts started pulling down fonts, and a few days later, Bernhardsson had more than 50,000 fonts on his computer.

After some “number juggling,” he was able to scale all the characters down to 64×64 pixels, which allows them to be compared directly with each other. From there, each font gets a font vector, which is a vector in latent space that “defines” that font. That way, all fonts are embedded in a space where similar fonts have similar vectors, according to him.
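
To make “similar fonts have similar vectors” concrete, here is a minimal sketch of how closeness between font vectors could be measured. It is not Bernhardsson’s code: the font names and the four-dimensional vectors are made up for illustration, and cosine similarity is just one reasonable assumption about how to compare them.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, font_vectors):
    """Return the name of the font whose vector is closest to the query font."""
    candidates = [name for name in font_vectors if name != query]
    return max(
        candidates,
        key=lambda name: cosine_similarity(font_vectors[query], font_vectors[name]),
    )


# Hypothetical font vectors; a real model would learn these from the data.
FONT_VECTORS = {
    "serif_a": [0.9, 0.1, 0.0, 0.2],
    "serif_b": [0.8, 0.2, 0.1, 0.3],
    "script_a": [0.0, 0.9, 0.8, 0.1],
}

print(most_similar("serif_a", FONT_VECTORS))
```

In this toy data the two serif entries point in nearly the same direction, so looking up one returns the other rather than the script font.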

His simple neural network was built using Lasagne/Theano, and he wrote that it took “an insane amount of time to converge, probably because there’s so much data and parameters.” He said that after weeks of running, the model converged to something that “looks decent.”


In the gif above, you’ll see that the model has learned that many fonts use upper case characters for the lower case range. Bernhardsson said that new fonts can be generated from this model.
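
One way to picture generating a new font is to blend the vectors of two existing fonts and hand the result to the trained model. The sketch below shows only that blending step, as plain linear interpolation. The function name and sample values are hypothetical, and turning the blended vector back into character images would still need the trained network, which is not shown here.

```python
def interpolate(vec_a, vec_b, t):
    """Blend two font vectors: t=0 gives vec_a, t=1 gives vec_b."""
    if len(vec_a) != len(vec_b):
        raise ValueError("font vectors must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return [(1 - t) * a + t * b for a, b in zip(vec_a, vec_b)]


# Hypothetical two-dimensional vectors, purely for illustration.
halfway = interpolate([0.0, 2.0], [4.0, 6.0], 0.5)
print(halfway)
```

Stepping `t` from 0 to 1 in small increments would give a sequence of in-between vectors, each a candidate for a font that sits somewhere between the two originals.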

He concluded that there’s a lot of room for improvement, which is why he has made his code available on GitHub for anyone who wants to get their hands on some neural nets.