Markov chain paper title generator

Gene Stanley is a prolific and influential physicist who has been one of the leading pioneers of interdisciplinary science over the last several decades. His h-index is an obscene 111.

I work on econophysics, which he basically invented, so I encounter his papers regularly. After seeing a few dozen of his econophysics papers, their titles start to blur together. This is no surprise—he has published so much in econophysics that many of the same terms appear in his paper titles.

So I wrote a Markov title generator and trained it on Gene Stanley’s econophysics papers to see if it could generate Stanleyesque paper titles. See for yourself:

The Distribution of Commodity Price Fluctuations.

Correlation in Complex Organizations.

Scaling Behavior in the Random Matrix Theory Approach to the Indian Stock Index.

Statistical Regularities in Stock Prices.

I would totally read these papers. Some of them even sound pretty interesting. Sadly, they don’t actually exist.

Try generating your own paper titles with this Python library I wrote. It generates chains of arbitrary length and is optimized for clarity rather than speed. It’s based on shabda’s implementation on agiliq.com.

When run from the command line with an author name as its argument, it’ll scrape that author’s paper titles from Google Scholar. So

python markov.py "HE Stanley"

will scrape 300 of Stanley’s paper titles as the source corpus. This works best when the author name is given as first and middle initials followed by the last name, as shown. Caution: scraping Google Scholar violates its terms of service.

If you’re using a small corpus and long Markov chains, you’ll end up with lots of actual strings from the corpus, and no fake ones. If this happens, experiment with the second parameter to the constructor for the class “MarkovGenerator.”
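To make that trade-off concrete, here is a minimal sketch of a word-level Markov title generator. It is not the code from my library: the function names, the `order` parameter (the number of preceding words the chain conditions on), and the four toy titles are all made up for illustration. It measures how often a generated title is just a verbatim copy of a corpus title.

```python
import random
from collections import defaultdict

# Sentinel tokens marking the start and end of a title (hypothetical names).
START, END = "<s>", "</s>"

def build_model(titles, order):
    """Map each run of `order` consecutive words to the words seen after it."""
    model = defaultdict(list)
    for title in titles:
        words = [START] * order + title.split() + [END]
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            model[state].append(words[i + order])
    return model

def generate(model, order, rng, max_words=50):
    """Walk the chain from the start state until END is drawn (or a length cap)."""
    state = (START,) * order
    out = []
    while len(out) < max_words:
        nxt = rng.choice(model[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

def verbatim_fraction(titles, order, n=200, seed=0):
    """Fraction of n generated titles that already appear in the corpus."""
    rng = random.Random(seed)
    model = build_model(titles, order)
    corpus = set(titles)
    hits = sum(generate(model, order, rng) in corpus for _ in range(n))
    return hits / n

# Toy corpus, invented for this example (not real paper titles).
titles = [
    "scaling in stock prices",
    "scaling in firm growth",
    "correlations in stock returns",
    "fluctuations in firm growth",
]

for order in (1, 2, 3):
    print(order, verbatim_fraction(titles, order))
```

On this tiny corpus, conditioning on three words leaves only one possible continuation at every step after the first word, so every generated title is a copy of a real one. Conditioning on a single word lets the chain splice titles together at shared words like “in” and “stock.” A bigger corpus or a shorter chain both push toward more novel titles.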

See also SCIgen, which generates entire papers but cannot be seeded with a particular author or category. Code on GitHub.

See comments on Hacker News.