KAWAHARA Takafumi, HASHIGUCHI Tomoya, YUMOTO Takayuki, OHSHIMA Hiroaki
J105-D(5) 322-336, May 1, 2022 Peer-reviewed
We propose a method for estimating the degree of injury from text documents that describe accidents. Each input document is assumed to consist of a few sentences. The method treats the estimation as a classification problem and solves it with machine learning techniques. The data are the accident records published in the Accident Information Data Bank System, and the text in the “Summary of the accident” field serves as the input. Each input text is converted into a distributed representation by BERT, a general-purpose language model; we use a model pre-trained on Japanese Wikipedia. To improve performance on the task of estimating the degree of injury, we introduce four ideas: (1) class weights, (2) ordinal classification, (3) multi-task learning, and (4) a fine-tuning model with token label estimation. We examined how using or omitting each idea affects the accuracy, Macro F1, RMSE, and confusion matrices for this task. The results show that Macro F1 and RMSE improve when (1) class weights and (2) ordinal classification are introduced, and that accuracy improves when (3) multi-task learning is introduced.
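
As a rough illustration of ideas (1) and (2) and of two of the reported metrics, the following minimal Python sketch may help. It is not the authors' implementation. The abstract does not state the weighting formula, the ordinal encoding, or the number of injury classes, so the inverse-frequency weights, the cumulative "is the label greater than j?" encoding, the four-class setting, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter


def class_weights(labels, num_classes):
    """Inverse-frequency weights (assumed formula): n / (num_classes * count_c)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def ordinal_encode(label, num_classes):
    """Encode class k as K-1 binary targets answering 'is the label > j?'."""
    return [1 if label > j else 0 for j in range(num_classes - 1)]


def ordinal_decode(probs, threshold=0.5):
    """Predicted class = number of binary outputs above the threshold."""
    return sum(1 for p in probs if p > threshold)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1; a class with no support and no predictions scores 0."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


def rmse(y_true, y_pred):
    """Root mean squared error over integer class indices (penalizes distant mistakes more)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy labels: 0 = least severe ... 3 = most severe (not from the paper).
train_labels = [0, 0, 0, 0, 1, 1, 2, 3]
weights = class_weights(train_labels, num_classes=4)
targets = ordinal_encode(2, num_classes=4)
predicted = ordinal_decode([0.9, 0.7, 0.2])
```

Under these assumptions, rare severe classes receive larger weights in the loss, and the ordinal encoding lets a model be penalized less for predicting a neighboring degree of injury than a distant one, which is the kind of error RMSE is sensitive to.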