Type-safe DataFrames: Speaking of datasets, version 1.6 introduces the new Dataset API, which brings compile-time type safety to DataFrames. The Dataset API extends the existing DataFrame API with static typing and user functions that run directly on Scala or Java types.
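To make the idea concrete, here is a minimal sketch of moving from an untyped DataFrame to a typed Dataset in Scala. It assumes a local Spark 1.6-style setup with `SQLContext`; the `Person` case class, its fields, and the sample rows are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local, in-process Spark context is acceptable for the demo.
    val conf = new SparkConf().setAppName("dataset-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings encoders and toDF() into scope

    // An ordinary DataFrame: columns are resolved by name at runtime.
    val df = Seq(Person("Ann", 34L), Person("Bob", 19L)).toDF()

    // A typed view over the same data: fields are checked by the compiler.
    val people = df.as[Person]

    // User functions operate directly on the Scala type.
    // A typo such as `_.agee` would fail at compile time, not at runtime.
    val adultNames = people.filter(_.age >= 21).map(_.name)

    adultNames.collect().foreach(println)
    sc.stop()
  }
}
```

The contrast is with a string-based expression such as `df.filter("agee >= 21")`, where a misspelled column name only surfaces when the job runs.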
For data scientists, Spark 1.6 improves the machine-learning pipeline. The Pipeline API can now save pipelines to persistent storage and reload them later. Spark 1.6 also broadens machine-learning algorithm coverage, adding univariate and bivariate statistics, bisecting k-means clustering, online hypothesis testing, and survival analysis. Separately, the release adds support for reading non-standard JSON data.
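Saving and reloading a pipeline might look roughly like the following sketch. It assumes every stage in the pipeline supports persistence (not all did in 1.6); the stage choices, column names, and the `/tmp/spark-pipeline-sketch` path are illustrative assumptions rather than anything prescribed by the release.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

object PipelinePersistenceSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local, in-process Spark context is acceptable for the demo.
    val conf = new SparkConf().setAppName("pipeline-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // A small text-classification pipeline: tokenize, hash to features, classify.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Hypothetical location; any path Spark can write to would do.
    val path = "/tmp/spark-pipeline-sketch"

    // Persist the (unfitted) pipeline definition, replacing any earlier copy.
    pipeline.write.overwrite().save(path)

    // Later, possibly in another job, rebuild the same pipeline from storage.
    val restored = Pipeline.load(path)
    println(s"Restored ${restored.getStages.length} stages")

    sc.stop()
  }
}
```

A fitted `PipelineModel` can be persisted in the same way, which lets a model trained in one job be loaded for scoring in another.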