
    Generating text using Markov chains

    Markov chains can generate realistic-looking text when trained on a large body of text with plenty of repetition. This is a simple Python Markov text generator. It also includes utility functions for scraping any author’s Google Scholar page to generate paper titles that look like that author’s. Note that the scraping part violates Google’s Terms of Service, so you probably shouldn’t do it.

    Code on GitHub.
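
For readers who want the idea without opening the repository, here is a minimal sketch of word-level Markov text generation. It is not the code linked above: the function names `build_chain` and `generate`, the `order` parameter, and the whitespace tokenization are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()  # assumption: plain whitespace tokenization
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Repeated followers stay in the list, so frequent continuations
        # are proportionally more likely to be sampled later.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, returning at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state was never followed by anything in the training text.
            break
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output[:length])
```

Calling `generate(build_chain(corpus), length=12)` on a long enough `corpus` string returns a short pseudo-title. A larger `order` copies the source more faithfully, while a smaller one produces looser, more surprising text.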

    Timeseries analysis with R, Python, and Rpy2

    There are a lot of great tools for working with time series in both Python and R. But one task I run into frequently, and for which there isn’t a great solution, is getting time series data into a workable format to begin with. This is a tiny library for pre-processing time series.

    Code on GitHub.

    Analyzing the SSA’s names database

    The Social Security Administration provides a public-domain database of baby names. The number of babies given each name is aggregated by year, both for the entire country and for each state. This is a tiny set of utility functions for parsing the data and plotting interesting properties.

    Data from SSA.
    Code on GitHub.
    Related blog post.
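
As a rough illustration of the parsing step, the sketch below assumes the national files hold one comma-separated `name,sex,count` record per line, one file per year. The helper names `parse_year` and `counts_by_year` are hypothetical and are not taken from the linked code, and the sample rows in the usage note are made up, not real SSA figures.

```python
import csv
from collections import defaultdict


def parse_year(lines, year):
    """Yield (year, name, sex, count) tuples from `name,sex,count` lines."""
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, sex, count = row
        yield year, name, sex, int(count)


def counts_by_year(records, name, sex):
    """Return {year: count} for one name and sex, ready to hand to a plotting call."""
    totals = defaultdict(int)
    for year, rec_name, rec_sex, count in records:
        if rec_name == name and rec_sex == sex:
            totals[year] += count
    return dict(totals)
```

For example, with made-up rows, `counts_by_year(list(parse_year(["Ada,F,10", "Ada,M,1"], 1990)), "Ada", "F")` gives `{1990: 10}`. In practice `lines` would be an open file object for one year's file.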

    Working with World Bank indicators

    The World Bank has made its indicator dataset available for free online. These utility functions download the entire dataset and set up objects for working with data for specific countries or for specific indicators.

    Data on worldbank.org.
    Code on GitHub.
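
One way to picture the "objects for specific countries or specific indicators" idea is an in-memory store indexed both ways. This is only a sketch under assumptions: the `Observation` record, the `IndicatorStore` class, and its methods are hypothetical names, not the API of the linked code, and the download step is left out entirely.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    """One data point; the field layout is an assumption for this sketch."""
    country: str
    indicator: str
    year: int
    value: Optional[float]  # None marks a missing value


class IndicatorStore:
    """Index observations by country and by indicator for quick lookups."""

    def __init__(self, observations):
        self._by_country = defaultdict(list)
        self._by_indicator = defaultdict(list)
        for obs in observations:
            self._by_country[obs.country].append(obs)
            self._by_indicator[obs.indicator].append(obs)

    def series(self, country, indicator):
        """Return {year: value} for one country and indicator, skipping missing values."""
        return {
            obs.year: obs.value
            for obs in self._by_country.get(country, [])
            if obs.indicator == indicator and obs.value is not None
        }

    def countries_for(self, indicator):
        """Return the sorted country codes that have any record for an indicator."""
        return sorted({obs.country for obs in self._by_indicator.get(indicator, [])})
```

Keeping two indexes costs some extra memory but makes both kinds of query, per country and per indicator, a dictionary lookup instead of a scan over the whole dataset.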