NLP on a Billion Documents: Scalable Machine Learning with Apache Spark