Nowadays, healthcare is one of the most active areas of the Fourth Industrial Revolution, and we at FPT Software cannot afford to stay on the sidelines. We have built many products and provided healthcare solutions to both domestic and international customers. Most recently, we delivered a proof-of-concept (PoC) solution and implemented a healthcare chatbot project under strict time constraints.
Our customers want to offer out-of-the-box dashboards on their platform so that their pharma/Med-tech business users can gain insight from the platform's data. They want to integrate a “Data Explore” module for conversational analytics into the dashboard, allowing business users to simply ask questions about their data. An AI agent then uses natural language processing (NLP) to translate the questions into queries and returns an instant answer. As a result, users do not need to browse and drill down through different charts, or write their own query code to pull data from different tables.
FPT Software had to complete a project that embeds the analytics module directly into the customers' existing dashboard portal, with the following desired features:
- Allow users to ask questions about their data in natural language.
- Build NLP capability to process the questions.
- Query multiple relevant tables and join the sub-results.
- Visualize the results as specific charts based on the questions/utterances.
Moreover, the customers required the project to be finished in only 5 weeks, ready for an important product demonstration meeting right afterward. This was a key opportunity for FPT Software to prove its capability in the field of AI.
Given the time constraints, we chose a machine learning approach to solve the customers' problem. Despite the recent popularity of deep learning, we decided to use statistical learning techniques because they require less training data and are easier to tune manually.
The analysis process consists of three stages: Training, Prediction, and Answer Extraction.
The input of the Training Stage is the training data. This stage aims to produce a trained model, which is used to predict the slots (labels) of the words in each new utterance, e.g., the intent slot, the filter slot, the table-to-query slot, and so on.
The training dataset is a collection of thousands of sentences annotated with their slot labels. The set of slots must be defined before training; adding a new slot requires retraining the model.
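To make the data format concrete, here is a minimal sketch of what annotated training examples might look like, assuming one label per token and a fixed label set. The slot names (`INTENT`, `FILTER`, `TABLE`, and `O` for "no slot"), the example sentences, and the `validate` helper are all hypothetical; the article does not publish the project's actual slots.

```python
# Hypothetical slot inventory; the project's real slot names are not published.
SLOTS = {"O", "INTENT", "FILTER", "TABLE"}

# Each example pairs the (preprocessed) tokens of an utterance with one slot label per token.
TRAINING_DATA = [
    (["show", "sales", "aspirin", "2020"], ["INTENT", "TABLE", "FILTER", "FILTER"]),
    (["count", "patients", "hanoi"], ["INTENT", "TABLE", "FILTER"]),
    (["list", "all", "products"], ["INTENT", "O", "TABLE"]),
]


def validate(dataset, slots=SLOTS):
    """Check that each example is aligned and uses only predefined slots."""
    for tokens, labels in dataset:
        if len(tokens) != len(labels):
            raise ValueError(f"misaligned example: {tokens}")
        unknown = set(labels) - slots
        if unknown:
            # The label set is fixed at training time; a new slot means retraining.
            raise ValueError(f"unknown slots {sorted(unknown)}: extend SLOTS and retrain")
    return True


print(validate(TRAINING_DATA))  # True
```

Keeping the slot inventory explicit makes the retraining requirement visible: an example tagged with a label outside `SLOTS` is rejected rather than silently introducing a class the model never learned.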
The Training Stage runs offline and includes the following steps:
- Preprocessing: Transform the training data into a form that can be used in the next steps. This includes lowercasing words and removing common words (a, an, the, etc.) and special characters.
- Feature Extraction: Based on the preprocessed data, we build features for each word by defining a feature template. As a result, each word is represented by a set of informative features and is manually tagged with its label name.
- Model Training: After the words in the training data are represented as features, we run a training algorithm on them to produce the trained model. This step is the most important one and requires the expertise of data scientists, who refine different templates and select the best ones to build the trained model.
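The three training steps can be sketched end to end as follows. The article does not name its training algorithm; hand-crafted feature templates like these are often paired with conditional random fields (CRFs), but to keep this sketch dependency-free it swaps in a simpler multiclass perceptron that labels each token independently. The stop-word list, feature names, slot labels, and example utterances are all hypothetical.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption, not the project's actual list).
STOP_WORDS = {"a", "an", "the", "of", "in", "me"}


def preprocess(utterance):
    """Lowercase, replace special characters with spaces, drop common words."""
    text = re.sub(r"[^a-z0-9\s]", " ", utterance.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def features(tokens, i):
    """Feature template: bias, current word, neighbouring words, digit flag."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = ["bias", "word=" + tokens[i], "prev=" + prev_w, "next=" + next_w]
    if tokens[i].isdigit():
        feats.append("is_digit")
    return feats


def predict_token(weights, labels, feats):
    # Choose the label whose (feature, label) weights sum highest.
    return max(labels, key=lambda lab: sum(weights[(f, lab)] for f in feats))


def train(examples, epochs=10):
    """Multiclass perceptron over per-token features.

    On each wrong guess, add 1 to the gold label's weight for every active
    feature and subtract 1 from the guessed label's weight.
    """
    data = [(preprocess(text), tags) for text, tags in examples]
    labels = sorted({lab for _, tags in data for lab in tags})
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, tags in data:
            if len(tokens) != len(tags):
                raise ValueError(f"labels must align with preprocessed tokens: {tokens}")
            for i, gold in enumerate(tags):
                feats = features(tokens, i)
                guess = predict_token(weights, labels, feats)
                if guess != gold:
                    for f in feats:
                        weights[(f, gold)] += 1.0
                        weights[(f, guess)] -= 1.0
    return weights, labels


# Labels refer to the tokens that remain after preprocessing.
examples = [
    ("Show the sales of Aspirin in 2020", ["INTENT", "TABLE", "FILTER", "FILTER"]),
    ("Count the patients in Hanoi", ["INTENT", "TABLE", "FILTER"]),
]
weights, labels = train(examples)
tokens = preprocess("Show me the patients in 2021")
print([predict_token(weights, labels, features(tokens, i)) for i in range(len(tokens))])
# ['INTENT', 'TABLE', 'FILTER']
```

Tuning here means editing `features`: adding or removing templates (suffixes, word shapes, wider context windows) and comparing results on held-out data, which matches the article's description of refining templates. The ASCII-only regex would also strip accented letters, so a multilingual deployment would need a different character filter.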
The input of the Prediction Stage is a new utterance, and its output is the predicted slots. The Prediction Stage runs online and uses the trained model produced by the Training Stage. It includes the following steps:
- Preprocessing: This step is the same as the Preprocessing step in the Training Stage, to keep the data consistent.
- Feature Extraction: We extract the features of each word in the preprocessed utterance.
- Prediction: Based on the word features from the Feature Extraction step and the trained model from the Training Stage, we predict which label each word belongs to.
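The online path can be sketched as below. The weight table `MODEL` is a hand-written, made-up stand-in for a trained model (a real system would load the Training Stage output), and the stop-word list, feature template, and slot labels are the same hypothetical ones used for illustration above. The key point it shows is that prediction reuses exactly the training-time preprocessing and features, then picks the highest-scoring label per word.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "in", "me"}

# Hypothetical weights standing in for the trained model: (feature, label) -> weight.
# A real model would be loaded from the Training Stage output; these values are made up.
MODEL = {
    ("word=show", "INTENT"): 2.0,
    ("word=count", "INTENT"): 2.0,
    ("word=sales", "TABLE"): 2.0,
    ("word=patients", "TABLE"): 2.0,
    ("word=aspirin", "FILTER"): 2.0,
    ("is_digit", "FILTER"): 2.0,
}
LABELS = ["O", "INTENT", "TABLE", "FILTER"]


def preprocess(utterance):
    # Identical to the Training Stage preprocessing, so features line up.
    text = re.sub(r"[^a-z0-9\s]", " ", utterance.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def features(tokens, i):
    # Same feature template as in training.
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    feats = ["bias", "word=" + tokens[i], "prev=" + prev_w, "next=" + next_w]
    if tokens[i].isdigit():
        feats.append("is_digit")
    return feats


def predict(utterance):
    """Return (token, slot) pairs for a new utterance."""
    tokens = preprocess(utterance)
    result = []
    for i, tok in enumerate(tokens):
        feats = features(tokens, i)
        # Unseen (feature, label) pairs score 0; ties go to the first label ("O").
        scores = {lab: sum(MODEL.get((f, lab), 0.0) for f in feats) for lab in LABELS}
        result.append((tok, max(scores, key=scores.get)))
    return result


print(predict("Show the sales of Aspirin in 2021, please!"))
# [('show', 'INTENT'), ('sales', 'TABLE'), ('aspirin', 'FILTER'), ('2021', 'FILTER'), ('please', 'O')]
```

If the online preprocessing drifted from the offline version (say, a different stop-word list), the model would see features it was never trained on, which is why the article stresses keeping the two steps identical.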
Answer Extraction stage
The input of the Answer Extraction Stage is the set of slots produced by the Prediction Stage. Once the slots are available, we apply rules to map them to SQL queries. We then run these queries against the database to retrieve the information that answers the question. Finally, the results are rendered as statistical charts and displayed within the customer's dashboard portal.
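A minimal sketch of rule-based slot-to-SQL mapping, using Python's built-in `sqlite3` with an in-memory database. The rule tables, the schema (`sales`, `products`), the "digits mean a year" filter rule, and the data are all hypothetical. Table and column names come only from whitelisted rules, while user-supplied filter values are bound as query parameters, and the query joins two tables in line with the multi-table requirement.

```python
import sqlite3

# Hypothetical rule tables. Table and column names come only from these
# whitelists, never from the user's text; filter values are bound as parameters.
INTENT_RULES = {"show": "SUM(s.amount)", "count": "COUNT(*)"}
TABLE_RULES = {"sales": "sales s JOIN products p ON s.product_id = p.id"}


def filter_rule(value):
    # Toy rule: numbers are years, anything else is a product name.
    if value.isdigit():
        return "s.year = ?", int(value)
    return "p.name = ?", value


def slots_to_sql(slots):
    """Map predicted slots to a parameterised SQL query."""
    sql = f"SELECT {INTENT_RULES[slots['INTENT']]} FROM {TABLE_RULES[slots['TABLE']]}"
    clauses, params = [], []
    for value in slots.get("FILTER", []):
        clause, param = filter_rule(value)
        clauses.append(clause)
        params.append(param)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# In-memory demo database (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (product_id INTEGER, year INTEGER, amount REAL);
INSERT INTO products VALUES (1, 'aspirin'), (2, 'insulin');
INSERT INTO sales VALUES (1, 2020, 100.0), (1, 2020, 50.0), (1, 2021, 70.0), (2, 2020, 30.0);
""")

sql, params = slots_to_sql({"INTENT": "show", "TABLE": "sales", "FILTER": ["aspirin", "2020"]})
print(sql)
print(conn.execute(sql, params).fetchone()[0])  # 150.0
```

The rows returned here would then be passed to the dashboard's charting layer, which this sketch leaves out. Keeping the mapping in plain rule tables means new intents or tables can be supported by adding entries rather than retraining, as long as they reuse existing slot types.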
With the above system and architecture, the project was presented at the customers' Leadership Team meeting. They were very satisfied with our professional attitude and highly appreciated FPT Software's capability. The PoC was successfully completed, and the customers have confirmed that they will continue to cooperate with FPT Software in the next phase of the product. Specifically, the PoC will be built into a complete module of their platform, with more stringent and specific requirements within the healthcare sector.
Benefits and value
- Gives business users a simple, intuitive way to understand their databases without in-depth knowledge and without writing code to query the data.
- Lets users ask questions about their data entirely in natural language, so the framework can also serve as a Q&A chatbot system.
- Demonstrates FPT Software's NLP capability by solving difficult customer problems in a very short time.