Python
TideSwing Technology Co., Ltd. related projects 2020.07 - present
Recommender System Related
As the user base and the volume of dynamically submitted data grew, a personalized recommendation system was needed to better serve the differing preferences of individual users.
Worked on feature extraction based on business scenarios, and created user and text features with a two-tower model. Independently completed feature filtering, data preprocessing, model training, and system iteration to improve user click-through and retention rates.
· vector search · ranking models (such as DIN, DIEN) · tree models · contrastive learning
NLP Related
To match users based on their posts, the text must be parsed and processed: user intent and published content are analyzed along multiple dimensions, such as word-meaning understanding and semantic analysis, in order to design a more accurate matching mechanism.
Responsible for the entire project, from data collection and labeling through model training, testing, deployment, and iterative optimization. The project mainly involved named entity recognition, multi-label classification, topic modeling, word segmentation, new-word discovery, and other NLP tasks.
· BERT pretrained model · Fine-tuning · BiLSTM · LDA · CRF
User Profile
Users' interests and preferences are derived from their basic attributes and text information in order to construct a user profile.
Items matched to a user's text also need to be recommended according to that user's personal preferences. Independently built the user profile label system, covering label extraction from user data, storage and querying in a graph database, and regular updates, and integrated it into the recommendation system to improve accuracy.
· Label Extraction · Knowledge Graph · Graph Database
Sanger Institute University of Cambridge Prediction of Gene Essentiality 2019.08 - 2019.12
The goal was to predict gene essentiality from gene expression data by comparing different modeling strategies, an important preliminary analysis for gene-editing research.
Worked on the gene expression data provided by the laboratory, using machine learning models such as linear regression and lasso regression to predict the essentiality of the corresponding genes.
· Genetic Data Preprocessing · Linear Regression · Lasso Regression
Combining Imaging Biomarkers and AI on medical records to shed light on Alzheimer's disease
(MRes project) 09.2018 - 08.2019
Used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), including clinical data in the form of medical records, biological specimen data, genetic data, and imaging data. Predicted advanced Alzheimer's disease by analyzing the medical records and other biomarker data of different patients.
Combined machine learning and natural language processing techniques on imaging biomarkers and medical records to shed light on Alzheimer's disease and support earlier diagnosis.
· LDA Topic modelling · Data processing and analysis with Python
· Deep Learning using LSTM · Medical Knowledge Graph
· Entity Extraction · Intention Recognition