1. Introduction
With the gradual improvement of people's living standards, cars have entered ordinary households, and with them come all kinds of questions. When we run into a problem, we usually open "Baidu". However, we live in an era of rapid Internet development, and information is ever more abundant and complicated, so it is difficult to find truly effective and valuable information. The traditional way of searching for information can therefore no longer meet people's needs [1]. In the past two years, with the development of artificial intelligence, intelligent question answering has gradually received attention. We built an intelligent question answering system that can give effective answers to questions asked in Mandarin, which greatly facilitates our life. The most difficult problem in intelligent question answering is identifying the meaning of sentences: many different sentences express the same question, and how to recognize them and accurately map them to the correct answer is one of the criteria for judging the quality of a question answering system [2].
Automatic question answering systems began to be studied in the 1960s. In the 1990s, with the development of natural language processing technology and the application of semantic information, automatic question answering systems improved greatly, with performance rising from about 30% to more than 50%. In 1993, MIT released START on the Internet, which greatly increased the number and types of questions that could be answered and marked a staged breakthrough in automatic question answering technology [3]. Other countries have also invested in research, including the CLEF cross-language question answering evaluation and Japan's NTCIR [4]. There is also the question answering system FREYA, which is specifically designed for training semantic models. The development of China's intelligent question answering systems began in 1970, and China's first human-computer dialogue system was built in 1980 by a research institute of the Chinese Academy of Sciences. Intelligent question answering technology is becoming more and more mature; it is gradually replacing human customer service, freeing a large amount of labor, and can solve people's problems around the clock.
2. Corpus Construction
We collected the questions that car owners often encounter during daily use and expanded them to build a question-answer corpus. Questions with the same Id share the same semantics, and the first one is the standard question. There are 123 common questions in the corpus. The structure is shown in Table 1.
Table 1. Question and answer table.
3. TFIDF-Based FAQ System
3.1. TFIDF Value
TF-IDF stands for Term Frequency-Inverse Document Frequency. It was proposed by Salton [5] in 1973 and is a statistical weighting method. The number of times a given word appears in a document is defined as its term frequency. The inverse document frequency reflects how few of the documents contain the word: the fewer documents a word appears in, the higher its inverse document frequency. Multiplying the two parts gives the TFIDF value [6].
TF(t, d) = n(t, d) / Σ_k n(k, d), IDF(t) = log(N / df(t)), TFIDF(t, d) = TF(t, d) × IDF(t), where n(t, d) is the number of occurrences of term t in document d, N is the total number of documents, and df(t) is the number of documents containing t [7].
We use "jieba" to segment the questions in the corpus, extract the keywords and tag their parts of speech, then compute the term frequency and inverse document frequency to obtain the TFIDF value. The value can be interpreted as a weight between 0 and 1; sorting the TFIDF values, the larger the value, the more important the word. We take the questions as the rows of a matrix and the keywords as its columns, form a TFIDF matrix, and save it locally. When the user enters a question, the system segments it and computes its TFIDF vector. This vector is mapped into the same vector space, and the cosine of the inner-product space is used to compute its similarity with each row of the locally saved TFIDF matrix. Finally, the answer to the most similar question is returned to the user. The structural framework is shown below (Figure 1) [8].
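A minimal sketch of this retrieval step follows. The paper only names "jieba", so the use of scikit-learn's TfidfVectorizer and cosine_similarity, together with the example questions and answers, are assumptions.

```python
# Sketch of the TFIDF retrieval step (library choices and example data are assumptions).
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus_questions = ["怎么调节座椅?", "仪表盘在哪里?", "警示灯为什么亮了?"]  # hypothetical corpus
answers = ["answer 1", "answer 2", "answer 3"]

def tokenize(text):
    # jieba segments the Mandarin question into words
    return " ".join(jieba.cut(text))

# keep single-character Chinese words, which the default token pattern would drop
vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
tfidf_matrix = vectorizer.fit_transform(tokenize(q) for q in corpus_questions)

def answer(user_question):
    # Map the user question into the same TFIDF vector space
    query_vec = vectorizer.transform([tokenize(user_question)])
    # Cosine similarity against the saved question matrix
    sims = cosine_similarity(query_vec, tfidf_matrix)[0]
    return answers[sims.argmax()]

print(answer("座椅怎么调?"))
```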
4. Semantic Similarity Model Based on BILSTM-Siamese
4.1. Model Architecture
We convert a question pair (Q1, Q2) into two character lists of length 15. According to previous research, character-based models perform better, so we use pre-trained 100-dimensional character vectors to convert the Q1 and Q2 character lists into character vector matrices. A BILSTM with 32 units extracts the sequential features of each sentence, which are fed into a fully connected layer of 64 neurons that outputs high-level features. Finally, cosine similarity is used to compute the similarity of the two high-level feature vectors [9]. In the entire model, the left sub-network shares its weight parameters with the right sub-network, so when Q1 is exactly the same as Q2, the model outputs 1. The model structure is shown in Figure 2.
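A minimal Keras sketch of this architecture is given below; the sequence length, embedding dimension, BiLSTM units and dense width follow the text, while the framework choice, vocabulary size and activation are assumptions.

```python
# Sketch of the BILSTM-Siamese model; dimensions follow the text, other details are assumptions.
from tensorflow.keras import layers, Model

MAX_LEN, EMB_DIM, VOCAB = 15, 100, 3000   # VOCAB (character vocabulary size) is hypothetical

# Shared sub-network: character embedding -> BiLSTM(32) -> Dense(64)
char_input = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB, EMB_DIM)(char_input)   # pre-trained 100-d character vectors in the paper
x = layers.Bidirectional(layers.LSTM(32))(x)       # sequential features of the sentence
x = layers.Dense(64, activation="relu")(x)         # high-level features
encoder = Model(char_input, x)

q1 = layers.Input(shape=(MAX_LEN,))
q2 = layers.Input(shape=(MAX_LEN,))
v1, v2 = encoder(q1), encoder(q2)                  # left and right branches share all weights

# Cosine similarity of the two feature vectors: outputs 1 when Q1 and Q2 are identical
similarity = layers.Dot(axes=1, normalize=True)([v1, v2])
model = Model([q1, q2], similarity)
model.compile(optimizer="adam", loss="mse")
```

Because both branches call the same encoder object, the weight sharing described above is obtained automatically.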
4.2. Sample Construction
1) Within the same question Id, the different question variants are paired two by two to form synonymous sentence pairs (the positive sample set);
2) Using the 123 standard questions, TFIDF is used to retrieve the k questions closest to each question (excluding the question itself), and these are paired with it to form non-synonymous sentence pairs (the negative sample set); a sketch of this pair construction follows the list.
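The following sketch illustrates the pair construction; the helper names, the grouping dictionary and k = 5 are hypothetical.

```python
# Sketch of building positive and negative pairs from the grouped corpus (helper names are hypothetical).
from itertools import combinations

def build_positive_pairs(groups):
    # groups: {question_id: [standard question, variant 1, variant 2, ...]}
    pairs = []
    for questions in groups.values():
        pairs.extend(combinations(questions, 2))  # every two variants of one question are synonymous
    return pairs

def build_negative_pairs(standard_questions, retrieve_top_k, k=5):
    # retrieve_top_k(q, k): TFIDF retrieval returning the k closest standard questions (excluding q itself)
    pairs = []
    for q in standard_questions:
        for neighbor in retrieve_top_k(q, k):
            pairs.append((q, neighbor))  # close but non-synonymous questions make hard negatives
    return pairs

groups = {1: ["How to adjust the seat?", "How can the seat be adjusted?"],
          2: ["Where is the speedometer?", "What position is the speedometer in?"]}
print(build_positive_pairs(groups))
```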
4.3. Positive Sample Expansion
1) Expand the commonly used interrogative words with their synonyms: "how" → synonymous forms of "how"; "where" → "where", "what position"; "why" → synonymous forms of "why". The expanded variants are distinct in the original Mandarin but mostly collapse to the same wording in English. For example:
How to adjust the seat? Expanded to:
(two synonymous Mandarin variants of "How to adjust the seat?")
Where is the speedometer? Expanded to:
(five synonymous Mandarin variants of "Where is the speedometer?", such as "What position is the speedometer in?")
Why is the warning light on? Expanded to:
(three synonymous Mandarin variants of "Why is the warning light on?", such as "How come the warning light is on?")
2) Randomly exchange the positions of two words (a sketch of both expansion steps follows this list). For example:
How to adjust the seat? Expanded to:
(word-order variants of "How to adjust the seat", obtained by swapping two words at a time, …)
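A sketch of the two expansion steps is given below; the interrogative synonym table is a small hypothetical subset, since the exact Mandarin synonyms used in the paper are not specified.

```python
# Sketch of the positive-sample expansion; the synonym table is a small hypothetical subset.
import random
import jieba

INTERROGATIVE_SYNONYMS = {
    "怎么": ["如何", "怎样"],        # "how"
    "哪里": ["哪儿", "什么位置"],    # "where"
    "为什么": ["为何", "怎么会"],    # "why"
}

def expand_by_synonyms(question):
    # Replace an interrogative word with each of its synonyms
    variants = []
    for word, synonyms in INTERROGATIVE_SYNONYMS.items():
        if word in question:
            variants.extend(question.replace(word, s) for s in synonyms)
    return variants

def expand_by_swap(question, n=3):
    # Randomly exchange the positions of two words, n times
    words = list(jieba.cut(question))
    if len(words) < 2:
        return []
    variants = []
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        swapped = words[:]
        swapped[i], swapped[j] = swapped[j], swapped[i]
        variants.append("".join(swapped))
    return variants

print(expand_by_synonyms("怎么调节座椅?"))   # e.g. ['如何调节座椅?', '怎样调节座椅?']
print(expand_by_swap("怎么调节座椅?"))
```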
4.4. Negative Sample Expansion
The negative samples are expanded through the positive samples: if (q1, q2) is a negative pair, then any positive-sample variant of q1 paired with any positive-sample variant of q2 also forms a negative pair, as sketched below.
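A minimal sketch of this negative expansion, assuming a hypothetical positive_variants helper that returns a question together with its positive-sample variants:

```python
# Sketch of negative-sample expansion: cross the positive variants of the two sides of a negative pair.
def expand_negative_pair(q1, q2, positive_variants):
    # positive_variants(q) returns q itself together with its expanded synonymous variants
    return [(a, b) for a in positive_variants(q1) for b in positive_variants(q2)]

# Hypothetical usage: every variant of q1 paired with every variant of q2 remains non-synonymous.
variants = {"q1": ["q1", "q1 variant"], "q2": ["q2", "q2 variant"]}
print(expand_negative_pair("q1", "q2", lambda q: variants[q]))
```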
The sample set built is shown in Table 2.
The constructed sample set is divided into a training set and a test set according to a ratio of 7:3, as shown in Table 3.
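A short sketch of this split, assuming scikit-learn and placeholder sample pairs:

```python
# Sketch of the 7:3 split (scikit-learn is an assumption; the sample pairs below are placeholders).
from sklearn.model_selection import train_test_split

pairs  = [("q1a", "q1b"), ("q1a", "q2a"), ("q2a", "q2b"), ("q1b", "q2b")]  # hypothetical pairs
labels = [1, 0, 1, 0]                                                      # 1 = synonymous, 0 = not
train_pairs, test_pairs, y_train, y_test = train_test_split(
    pairs, labels, test_size=0.3, random_state=42)
```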
5. Experiment
The model is trained with a batch size of 128 for 30 epochs (iteration periods). Each layer of the model is followed by dropout, which speeds up the training iterations and prevents over-fitting.
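The corresponding training call might look as follows; the data arrays are random placeholders, the dropout rate and optimizer are assumptions, and model refers to the Siamese network sketched in Section 4.1 (with Dropout layers added after each layer of the encoder).

```python
# Sketch of the training call with the stated hyperparameters; the data arrays are random placeholders
# and 'model' is the compiled Siamese network from the Section 4.1 sketch.
import numpy as np

n, MAX_LEN, VOCAB = 10000, 15, 3000
x1 = np.random.randint(0, VOCAB, size=(n, MAX_LEN))  # left questions as character indices
x2 = np.random.randint(0, VOCAB, size=(n, MAX_LEN))  # right questions as character indices
y  = np.random.randint(0, 2, size=(n,))              # 1 = synonymous, 0 = not

model.fit([x1, x2], y, batch_size=128, epochs=30, validation_split=0.1)
```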
The performance of the model on the test set is shown in Table 4.
Accuracy: (19,894 + 8,590)/28,622 = 99.52%.
Recall: 8,590/8,590 = 100% (no synonymous pair was missed).
Precision: 8,590/(8,590 + 138) = 98.42% (the remaining 138 errors are non-synonymous pairs predicted as synonymous).
F1 = 2 × precision × recall/(precision + recall) = 99.20%.
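These figures can be reproduced as follows; note that attributing all 138 errors to false positives is inferred from the 100% recall and is therefore an assumption.

```python
# Reconstruction of the metric computation from the counts above; the split of the 138 errors into
# false positives is inferred from the 100% recall.
tp, tn, fp, fn = 8590, 19894, 138, 0

accuracy  = (tp + tn) / (tp + tn + fp + fn)            # ≈ 0.9952
precision = tp / (tp + fp)                             # ≈ 0.9842
recall    = tp / (tp + fn)                             # = 1.0
f1 = 2 * precision * recall / (precision + recall)     # ≈ 0.9920
print(accuracy, precision, recall, f1)
```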
6. FAQ System Structure
We use the data from Table 1 to build the QA corpus. As shown in Figure 3, for a user input question, TFIDF first recalls k questions as a candidate set, and then the BILSTM-Siamese semantic similarity model constructed above computes the semantic similarity between the input and each of the k candidates. Finally, the answer to the semantically most similar candidate question is returned.
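A sketch of this two-stage pipeline is shown below; it reuses the vectorizer, tfidf_matrix and tokenize names from the Section 3 sketch and the model from Section 4, and encode_chars (text to padded character indices of length 15) is a hypothetical helper.

```python
# Sketch of the two-stage pipeline: TFIDF recall of k candidates, then BILSTM-Siamese reranking.
# 'vectorizer', 'tfidf_matrix', 'tokenize' and 'model' come from the earlier sketches;
# 'encode_chars' is a hypothetical text-to-character-index helper.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def answer_question(user_question, questions, answers, k=5):
    # Stage 1: TFIDF recall of the k most similar corpus questions
    query_vec = vectorizer.transform([tokenize(user_question)])
    sims = cosine_similarity(query_vec, tfidf_matrix)[0]
    candidate_ids = np.argsort(sims)[::-1][:k]

    # Stage 2: semantic reranking with the BILSTM-Siamese model
    q1 = np.repeat(encode_chars(user_question)[None, :], len(candidate_ids), axis=0)
    q2 = np.stack([encode_chars(questions[i]) for i in candidate_ids])
    scores = model.predict([q1, q2]).ravel()
    best = candidate_ids[scores.argmax()]
    return answers[best]
```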
When we feed a question into the system, it runs through this pipeline and quickly outputs the answer to the most similar question, with a high corresponding accuracy rate, as shown in Figure 4.
7. Summary
This paper introduces an automatic question answering system model for the automotive field and its specific operating process, which improves the accuracy of answering questions. Some problems remain: the vocabulary of automobile proper nouns needs to be expanded, the various names of the same parts need to be unified, and the BILSTM model does not fully capture the interaction between the two sentences. These are the areas to be improved next.