Jieping Ye


    For many applications, predicting users' intents helps the system provide appropriate solutions or recommendations, improving the user experience and bringing economic benefits. The main challenge of user intent prediction is the shortage of labeled training data, with some intents (labels) appearing only sparsely in the training set; this is a general problem for many real-world prediction tasks. To overcome data sparsity, we propose a masked-field pre-training framework. In pre-training, we exploit massive unlabeled data to learn useful feature interaction patterns: we mask a subset of field features and learn to predict them from the unmasked features. We then fine-tune the pre-trained model for the target intent prediction task. This framework can be used to train a variety of deep models. In the intent prediction task, each intent is relevant to only a subset of features. To address this, we propose a Field-Independent Transformer network, which generates a separate representation for each field and, for each intent, aggregates the relevant field representations with an attention mechanism. We evaluate our method on intent prediction datasets from customer service scenarios as well as on several public datasets. The results show that masked-field pre-training significantly improves prediction precision for deep models, and that the Field-Independent Transformer network trained with this framework outperforms state-of-the-art methods on user intent prediction.
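The masking step can be sketched as follows. This is a minimal illustration of the masked-field idea only: the field values, the `[MASK]` token, and the masking rate are assumptions, and a real implementation would operate on embedded field features rather than strings.

```python
import random

def mask_fields(row, mask_rate=0.3, mask_token="[MASK]"):
    """Randomly hide some field values; return the masked row and the
    hidden values, which become prediction targets during pre-training."""
    masked, targets = [], {}
    for i, value in enumerate(row):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = value  # model must recover this from unmasked fields
        else:
            masked.append(value)
    return masked, targets

random.seed(0)
row = ["age:25", "city:NYC", "device:ios", "query:refund"]
masked_row, targets = mask_fields(row)
```

During fine-tuning the same encoder is reused, with the masking head replaced by the intent prediction head.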
    Ride-sharing apps such as Uber and Didi Chuxing have played an important role in addressing users' transportation needs, which come not only in huge volumes but also in great variety. While some users prefer low-cost services such as carpooling or hitchhiking, others prefer pricier options such as taxi or premier services. Further analysis suggests that these preferences also vary with time and location. In this paper, we empirically analyze users' preferred services and propose a recommender system that recommends a service based on temporal, spatial, and behavioral features. Offline simulations show that our system achieves high prediction accuracy and reduces the user's effort in finding the desired service. Such a recommender system allows more precise scheduling for the platform and enables personalized promotions.
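The feature-based recommendation can be sketched as a per-service scorer over temporal, spatial, and behavioral signals. The feature names, weights, and the linear scoring model below are illustrative assumptions, not the paper's actual model.

```python
def rank_services(features, weights):
    """Score each service type as a weighted sum of request features
    and return service names ordered best-first."""
    scores = {
        svc: sum(w.get(f, 0.0) * v for f, v in features.items())
        for svc, w in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical request: morning airport trip from a frequent premier user.
features = {"hour": 8 / 24, "is_airport": 1.0, "past_premier_ratio": 0.7}
weights = {
    "premier": {"past_premier_ratio": 2.0, "is_airport": 0.5},
    "carpool": {"past_premier_ratio": -1.0, "hour": 0.3},
}
ranking = rank_services(features, weights)
```

In practice the weights would be learned from historical orders rather than hand-set.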
    Technology-design co-optimization methodologies for the resistive cross-point array are proposed for implementing machine learning algorithms on a chip. A novel read-and-write scheme is designed to accelerate the training process by realizing fully parallel weighted-sum and weight-update operations. Furthermore, the technology and design parameters of the resistive cross-point array are co-optimized for learning accuracy, latency, and energy consumption. In contrast to conventional memory design, a set of reverse scaling rules is proposed for the resistive cross-point array to achieve high learning accuracy. These include 1) a larger wire width, which reduces the IR drop on interconnects and thereby increases learning accuracy, and 2) the use of multiple cells per weight element, which alleviates the impact of device variations at an affordable cost in area, energy, and latency. The optimized resistive cross-point array with peripheral circuitry is implemented at the 65 nm node. Its performance is benchmarked for handwritten digit recognition on the MNIST database using gradient-based sparse coding. Compared to a state-of-the-art software approach running on a CPU, it achieves a >10^3 speed-up and a >10^6 improvement in energy efficiency, enabling real-time image feature extraction and learning.
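The effect of wire width on the weighted sum can be illustrated with a toy lumped model. This is a crude approximation invented for illustration, not the paper's circuit simulation: each cell's effective conductance is degraded by the series resistance of the wire segments between it and the driver, and a wider wire corresponds to a smaller per-segment resistance.

```python
import numpy as np

def crossbar_weighted_sum(G, V, wire_res=0.0):
    """Column currents of a resistive cross-point array.

    Ideal case: I_j = sum_i G[i, j] * V[i]. With wire_res > 0, a toy
    IR-drop model shrinks each cell's effective conductance by the
    accumulated series wire resistance along its row position."""
    if wire_res == 0.0:
        return G.T @ V
    segs = np.arange(1, G.shape[0] + 1)[:, None]  # wire segments per row
    G_eff = G / (1.0 + G * wire_res * segs)
    return G_eff.T @ V

G = np.full((4, 3), 1e-4)   # 4x3 array of 100 uS cells
V = np.ones(4)              # read voltages
ideal = crossbar_weighted_sum(G, V)
narrow = crossbar_weighted_sum(G, V, wire_res=50.0)  # thin wires
wide = crossbar_weighted_sum(G, V, wire_res=5.0)     # wider wires
```

Even in this toy model, the wider wire brings the computed currents closer to the ideal weighted sum, which is the intuition behind the reverse scaling rule.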
    Estimating missing values in visual data is a challenging problem in computer vision that can be cast as low-rank matrix approximation. Most recent studies use the nuclear norm as a convex relaxation of the rank operator. However, minimizing the nuclear norm shrinks all singular values simultaneously, so the rank cannot be well approximated in practice. In this paper, we propose a novel matrix completion algorithm based on Truncated Nuclear Norm Regularization (TNNR), which minimizes only the smallest N - r singular values, where N is the number of singular values and r is the rank of the matrix. In this way, the rank of the matrix is approximated better than with the nuclear norm. We further develop an efficient iterative procedure to solve the optimization problem, using the alternating direction method of multipliers and the accelerated proximal gradient line search method. Experimental results in a wide range of applications demonstrate the effectiveness of the proposed approach.
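The TNNR penalty itself is straightforward to compute from an SVD; a minimal sketch (the choice of r and the test matrix are illustrative):

```python
import numpy as np

def truncated_nuclear_norm(X, r):
    """Sum of the smallest min(m, n) - r singular values of X.

    Minimizing this quantity leaves the r dominant singular values
    (the rank-r structure) untouched, unlike the full nuclear norm,
    which shrinks all singular values at once."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return float(s[r:].sum())

# A rank-2 matrix: its truncated nuclear norm vanishes once r >= 2.
A = np.outer([1., 2., 3.], [1., 0., 1.]) + np.outer([0., 1., 1.], [2., 1., 0.])
```

With r = 0 the penalty reduces to the ordinary nuclear norm, which is why TNNR is a strictly tighter surrogate for rank.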
    Vehicle travel time estimation, or estimated time of arrival (ETA), is one of the most important location-based services (LBS). It is becoming increasingly important and is widely used as a basic service in navigation systems and intelligent transportation systems. This paper presents a novel machine learning solution for predicting vehicle travel time from floating-car data. First, we formulate ETA as a pure spatio-temporal regression problem over a large set of effective features. Second, we adapt several existing machine learning models to this regression problem. Furthermore, we propose a Wide-Deep-Recurrent (WDR) learning model to accurately predict the travel time along a given route at a given departure time; the wide linear model, deep neural network, and recurrent neural network are trained jointly to take full advantage of all three. We evaluate our solution offline on millions of historical vehicle trips, and have deployed it on Didi Chuxing's platform, which serves billions of ETA requests and benefits millions of customers per day. Our extensive evaluations show that the proposed deep learning algorithm significantly outperforms state-of-the-art learning algorithms, as well as the solutions provided by leading industry LBS providers.
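The three-branch structure can be sketched as a single forward pass. All shapes, the plain tanh-RNN cell, and the additive output head below are illustrative assumptions, not the production WDR architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def wdr_forward(wide_x, deep_x, link_seq, p):
    """Forward pass of a toy Wide-Deep-Recurrent ETA model:
    a linear wide branch over cross-product features, a small ReLU MLP
    over dense features, and an RNN over per-link route features,
    combined additively into one travel-time estimate."""
    wide_out = wide_x @ p["w_wide"]              # wide: memorization
    h = np.maximum(deep_x @ p["w_d1"], 0.0)      # deep: generalization
    deep_out = h @ p["w_d2"]
    state = np.zeros(p["w_hh"].shape[0])         # recurrent: road links in order
    for link in link_seq:
        state = np.tanh(link @ p["w_xh"] + state @ p["w_hh"])
    return float(wide_out + deep_out + state @ p["w_out"])

p = {"w_wide": rng.normal(size=4), "w_d1": rng.normal(size=(5, 8)),
     "w_d2": rng.normal(size=8), "w_xh": rng.normal(size=(6, 8)),
     "w_hh": rng.normal(size=(8, 8)) * 0.1, "w_out": rng.normal(size=8)}
eta = wdr_forward(rng.normal(size=4), rng.normal(size=5),
                  rng.normal(size=(3, 6)), p)
```

Joint training means gradients from one regression loss update all three parameter groups together.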
    Sparse learning enables dimension reduction and efficient modeling of high-dimensional signals and images, but may need to be tailored to specific applications and datasets. Here we use sparse learning to efficiently represent functional magnetic resonance imaging (fMRI) data from the human brain. We propose a novel embedded sparse representation (ESR) that identifies the most consistent dictionary atoms across different brain datasets via an iterative group-wise dictionary optimization procedure. In this framework, we introduce additional criteria that make the learned dictionary atoms more consistent across subjects. We successfully identify four common dictionary atoms that follow the external task stimuli with very high accuracy. After projecting the corresponding coefficient vectors back into the 3-D brain volume, the spatial patterns are also consistent with traditional fMRI analysis results. Our framework reveals common features of brain activation across a population and offers a new, efficient fMRI analysis method.
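The cross-subject consistency idea can be illustrated with a toy matching step. The correlation-based matching below is an assumption standing in for the paper's iterative group-wise optimization, and the "dictionaries" are synthetic.

```python
import numpy as np

def atom_consistency(dicts):
    """Score each atom of the first dictionary by its weakest
    best-match absolute correlation across the other subjects'
    dictionaries; atoms scoring near 1 are common to the group."""
    ref = dicts[0]
    scores = []
    for j in range(ref.shape[1]):
        a = ref[:, j] / np.linalg.norm(ref[:, j])
        worst = 1.0
        for D in dicts[1:]:
            corr = np.abs(a @ (D / np.linalg.norm(D, axis=0)))
            worst = min(worst, corr.max())
        scores.append(worst)
    return np.array(scores)

rng = np.random.default_rng(0)
shared = np.sin(np.linspace(0, 4 * np.pi, 40))   # a task-locked atom
D1 = np.column_stack([shared, rng.standard_normal(40)])
D2 = np.column_stack([rng.standard_normal(40), shared])
scores = atom_consistency([D1, D2])
```

The shared sinusoidal atom scores near 1 while the subject-specific noise atom does not, mimicking how common task-following atoms stand out in a population.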
    In practical applications, the purpose of object detection is to determine the spatial position of targets in an image while keeping computational overhead low. The PAIR competition dataset is characterized by imbalanced categories, low image quality, and inconsistent annotations. To address these issues, we first adopt an improved cross-entropy loss function and data augmentation to rebalance the data distribution, and incorporate extra datasets to mitigate the low image quality and annotation inconsistency. Second, because the competition focuses on object detection on embedded devices, we apply knowledge distillation to fine-tune a lightweight detection model, which uses MobileNetV3-Small as the backbone and SSDLite as the detector head. To improve detection performance on small targets, FPNLite is included so that low-level features can be utilized, and the TensorRT library is used to further accelerate inference. Our method achieved 3rd place on the final leaderboard as the fastest, lightest, and most computationally economical solution. Our code will soon be open-sourced.
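The distillation objective can be sketched in a few lines. The temperature and T-squared scaling follow the standard knowledge-distillation recipe; the exact loss used in the competition entry is an assumption.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions, scaled by T^2 so gradient magnitudes stay
    comparable across temperatures."""
    pt = softmax(np.asarray(teacher_logits) / T)
    ps = softmax(np.asarray(student_logits) / T)
    return float(-(pt * np.log(ps + 1e-12)).sum() * T * T)

teacher = [4.0, 1.0, -2.0]
matched = distill_loss(teacher, teacher)            # student mimics teacher
mismatched = distill_loss([-2.0, 1.0, 4.0], teacher)
```

The soft targets carry the teacher's inter-class similarity structure, which is what lets a small MobileNetV3-based student recover accuracy a hard-label loss alone would miss.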
    This thesis concentrates on the theory, implementation, and application of dimension reduction in data mining. Many real-world applications, such as text mining, image retrieval, face recognition, and microarray data analysis, involve high-dimensional data, where the dimension can run into the thousands. Traditional machine learning and data mining techniques are not effective on such high-dimensional data because of the so-called curse of dimensionality; a natural remedy is to apply dimension reduction as a pre-processing step. The first part of the thesis presents a dimension reduction technique for data in matrix form. The essence of the proposed algorithm is that it applies a bilinear transformation to the data. Such a transformation is particularly appropriate for data in matrix representation and often incurs lower computational costs than traditional algorithms. A natural application is image compression and retrieval, where each image is kept in its native matrix representation. Extensive experiments on image data show that the proposed algorithm outperforms traditional ones in computational time and space requirements, while maintaining competitive classification performance. The second part of the thesis focuses on generalizing classical Linear Discriminant Analysis (LDA) to undersampled data, where the data dimension is much greater than the number of data items. The optimization criterion in classical LDA fails when the scatter matrices are singular, which is the case for undersampled problems. A new optimization criterion applicable to undersampled problems has been developed, and the algorithms based on it have been shown to be very competitive in classification.
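The failure mode in the undersampled case is easy to demonstrate numerically: with n samples in d > n dimensions, the total scatter matrix has rank at most n - 1 and is therefore singular. The toy dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 50))   # 10 samples, 50 dimensions: undersampled
Xc = X - X.mean(axis=0)             # center the data
St = Xc.T @ Xc                      # 50x50 total scatter matrix
rank = int(np.linalg.matrix_rank(St))
# rank <= n - 1 = 9 << d = 50, so St (and the within-class scatter Sw)
# is singular and the classical LDA criterion involving Sw^{-1} is undefined.
```

This is exactly the situation where the generalized criterion of the second part is needed.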
The final part of the thesis considers the design of an efficient, incremental dimension reduction algorithm. An LDA-based incremental algorithm has been developed that, unlike other LDA-based algorithms, does not require the whole data matrix to reside in main memory, which is desirable for large datasets. More importantly, as new data items are inserted, the proposed algorithm constrains the computational cost through efficient incremental updating techniques. Experiments show that it remains competitive in classification while incurring much lower computational cost, especially when new data items arrive dynamically.
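The incremental flavor can be illustrated with a rank-one update of the total scatter matrix when a new item arrives. This uses a standard identity, shown as a stand-in for the thesis's full incremental LDA update, which must also maintain the between- and within-class scatter.

```python
import numpy as np

def add_sample(mean, St, n, x):
    """Update the running mean and total scatter matrix for one new
    sample x, via St_new = St + n/(n+1) * (x - mean)(x - mean)^T.
    Cost is O(d^2) per insertion, independent of how many items
    were seen before, and the data matrix is never revisited."""
    d = x - mean
    return mean + d / (n + 1), St + (n / (n + 1)) * np.outer(d, d), n + 1

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
mean = X[:5].mean(axis=0)
St = (X[:5] - mean).T @ (X[:5] - mean)       # batch scatter of first 5 items
mean, St, n = add_sample(mean, St, 5, X[5])  # incremental insert of the 6th
```

The incremental result matches the batch scatter computed from all six items, which is what makes dynamic insertion cheap.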
