Introduction
This document describes how to deploy a benchmarking system on the openEuler operating system (OS) running on Kunpeng 920 processors to measure the inference performance of search and recommendation models. It covers server and client environment setup and performance evaluation during the inference phase.
Models
ModelZoo is a collection of search and recommendation models. Currently, it includes five models: Wide_and_Deep, Deep Learning Recommendation Model (DLRM), Factorization Machine with Deep Neural Network (DeepFM), Domain Facilitated Feature Modeling (DFFM), and Deep Structured Semantic Model (DSSM).
Wide_and_Deep
The Wide_and_Deep model is a machine learning architecture proposed by Google for recommendation systems. It combines the strengths of a wide component (a linear model) and a deep component (a deep neural network). The linear model captures explicit relationships in sparse data by memorizing known feature combinations, while the deep neural network learns new potential feature interactions through generalization. This architecture can process both high-dimensional sparse features and low-dimensional dense features to facilitate personalized recommendation, and it is applicable to scenarios such as ad click-through rate (CTR) estimation.

- Wide component: Process cross-product transformations of sparse features through a linear layer.
- Deep component: Transform categorical and ID-based sparse features, represented by one-hot encoding, into low-dimensional vectors through an embedding layer. Feed these vectors into a multilayer perceptron (MLP) together with normalized dense features such as age and income.
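As an illustration of how the two components combine, the following plain-Python sketch joins a linear wide part and a tiny MLP deep part into one prediction. All weights and inputs are hypothetical toy values, not taken from the real model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def wide_part(cross_features, weights):
    # Linear layer over cross-product transformed sparse features.
    return sum(f * w for f, w in zip(cross_features, weights))

def deep_part(dense_inputs, layers):
    # Tiny MLP: each layer is (weight_rows, biases) with a ReLU activation.
    x = dense_inputs
    for rows, biases in layers:
        x = [max(0.0, sum(xi * w for xi, w in zip(x, row)) + b)
             for row, b in zip(rows, biases)]
    return x[0]

cross = [1.0, 0.0, 1.0]              # one-hot cross-product features (toy)
dense = [0.5, -0.2]                  # normalized dense features, e.g. age, income
layers = [([[0.3, -0.1], [0.2, 0.4]], [0.0, 0.1]),   # 2 inputs -> 2 units
          ([[0.5, -0.3]], [0.0])]                    # 2 inputs -> 1 unit
logit = wide_part(cross, [0.2, 0.1, -0.4]) + deep_part(dense, layers)
prob = sigmoid(logit)                # joint wide + deep prediction in (0, 1)
```

The two outputs are summed into a single logit and passed through a sigmoid, mirroring the joint objective of the original architecture.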
DLRM
DLRM is a deep learning recommendation model proposed by Facebook. This model is designed to process sparse features. It uses the embedding layer to convert high-dimensional sparse features into low-dimensional dense vectors, and captures complex relationships between features through the interaction layer. DLRM combines low-order and high-order feature interaction, uses the dot product to calculate feature combinations, and outputs prediction results through the multi-layer perceptron (MLP). DLRM is widely used in personalized services such as advertising and recommendation.
DLRM handles two categories of features. The first is discrete features of the category and ID types, which are usually one-hot encoded to generate sparse features. The second is numeric continuous features. Discrete features become particularly sparse after one-hot encoding, which makes them unsuitable for a deep learning model to learn from. Therefore, the discrete features are generally mapped to dense continuous values through embeddings.
After the embeddings are applied, all features, including discrete features and continuous features, can be further converted through the MLP, as shown in the triangle part in Figure 2. The features processed by the MLP then enter the interaction layer for feature crossing. The interaction layer takes the dot product on every two of the embedding results to implement feature crossing. Then, the crossed features are combined with the previous embedding results and sent to the MLP for the final output.
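The pairwise dot-product crossing performed by the interaction layer can be sketched as follows. The embedding values are hypothetical placeholders for the outputs of the embedding layer and bottom MLP:

```python
# DLRM-style interaction layer: dot product of every pair of embeddings (i < j).
def interact(embeddings):
    out = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            out.append(sum(a * b for a, b in zip(embeddings[i], embeddings[j])))
    return out

# Three 4-dimensional embeddings (one per feature) -> 3 pairwise crossings.
embs = [[1.0, 0.0, 2.0, 1.0],
        [0.5, 1.0, 0.0, 2.0],
        [1.0, 1.0, 1.0, 1.0]]
crossed = interact(embs)   # [e0 . e1, e0 . e2, e1 . e2]
# "crossed" is then concatenated with the embeddings and fed to the top MLP.
```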
DeepFM
DeepFM is a CTR model proposed in 2017. It is a recommendation system model that integrates a factorization machine (FM) and deep neural networks (DNNs). The model automates feature combination learning, removing the burden of manual feature engineering. Its FM component effectively captures second-order feature combinations, while the DNN component explores high-order feature crosses. DeepFM performs well on sparse data: it can memorize known feature combinations and generalize to new ones, which makes it applicable to scenarios such as CTR estimation and personalized recommendation.

Similar to other methods, one-hot encoding is performed on the sparse features, and then the sparse features are input into the embedding layer, while the dense features are normalized.
- FM:
  - Linear part: Weighted summation of raw features.
  - Second-order crossing: Second-order crosses between all feature pairs are captured through the inner product.
- DNN: MLP is used to extract high-order feature representations.
- Output prediction: Combine the outputs of FM and DNN, and generate the final recommendation probability or regression value.
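The FM second-order crossing described above can be sketched in plain Python. The snippet computes the term two ways: the naive sum over pairwise inner products, and the equivalent "square of sum minus sum of squares" identity commonly used to reduce the cost to O(kn). The feature values and latent vectors are hypothetical:

```python
# Naive O(k * n^2) form: sum of pairwise inner products of latent vectors.
def fm_pairwise(x, v):
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total

# Equivalent O(k * n) form: 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
def fm_fast(x, v):
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total

x = [1.0, 0.0, 2.0]                        # sparse feature values (toy)
v = [[0.1, 0.2], [0.3, -0.1], [0.2, 0.4]]  # latent vectors, k = 2 (toy)
assert abs(fm_pairwise(x, v) - fm_fast(x, v)) < 1e-9
```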
DFFM
DFFM is an enhanced recommendation algorithm that integrates domain awareness and feature modeling. By introducing domain information, DFFM emphasizes the importance of features from different domains in addition to modeling the crossing between features. The model uses a deep learning architecture to accurately capture user preferences and behavior patterns when processing cross-domain data, improving the accuracy and personalization of the recommendation system. It is especially applicable to multi-domain or cross-platform recommendation scenarios.

Features are classified into domain-agnostic feature Ea, domain-specific feature Ed, target item feature Et, and historical behavior feature Eh.
Domain-enhanced inner product processing is performed on Ea and Ed, and the results are fed into fully connected (FC) layers to generate domain-enhanced features. Ed and Et are then concatenated, and an attention weighting operation between Eh and the concatenated result generates the domain facilitated user behavior (DFUB) features. Finally, the domain-enhanced features and DFUB features are concatenated and fed into FC layers to produce the final result.
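A heavily simplified sketch of the attention weighting step is shown below: a query built from Ed and Et scores each historical behavior embedding in Eh, and a softmax-weighted sum pools them into a DFUB-style feature. The vectors and the query construction are hypothetical stand-ins for the model's learned projections:

```python
import math

def softmax(scores):
    m = max(scores)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_pool(query, history):
    # Score each behavior by its dot product with the query, then
    # pool the behaviors with the resulting softmax weights.
    weights = softmax([sum(q * h for q, h in zip(query, beh)) for beh in history])
    dim = len(history[0])
    return [sum(w * beh[d] for w, beh in zip(weights, history)) for d in range(dim)]

ed, et = [0.2, 0.1], [0.5, -0.3]             # toy domain and target-item features
query = [a + b for a, b in zip(ed, et)]      # simplistic stand-in for concat + projection
history = [[0.1, 0.4], [0.3, 0.2], [0.0, 0.1]]  # toy behavior embeddings (Eh)
dfub = attention_pool(query, history)        # weighted summary of user behaviors
```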
DSSM
DSSM is a semantic model based on deep networks. It predicts the CTR by mapping user features and item features into a semantic space of a common dimension and calculating their similarity.

After both the user features and item features pass through their embedding layers, the DNNs generate vector representations in the common-dimension semantic space, and the similarity of the two vectors is then calculated.
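The similarity calculation can be sketched as a cosine similarity between the two tower outputs. The vectors below are hypothetical stand-ins for the DNN outputs:

```python
import math

# Cosine similarity: dot product of the vectors divided by their norms.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

user_vec = [0.6, 0.8, 0.0]        # output of the user-side DNN (toy)
item_vec = [0.8, 0.6, 0.0]        # output of the item-side DNN (toy)
score = cosine(user_vec, item_vec)  # similarity in [-1, 1], higher = better match
```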
Test Procedure

