Introduction

Apache Spark is a unified analytics engine for large-scale data processing. It is scalable, supports in-memory computing, and has become a unified platform for fast, lightweight processing of big data. Spark can run workloads such as real-time stream processing, machine learning, and interactive queries on a variety of storage systems and operating systems. For more information about Spark, see the official Spark documentation.

The Kunpeng BoostKit machine learning algorithm library is compatible with native Spark APIs. The only exception is the KNN algorithm: it was developed by Huawei and has no native Spark API counterpart, so no compatibility issue arises. The library provides optimized machine learning algorithms that significantly improve computing performance in big data algorithm scenarios, and it supports Kunpeng processors.
