
Please use the following identifier to cite this item: http://hdl.handle.net/10114/13580

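Title: Document Classification Using Visual Character Representations and Deep Learning (視覚文字表現と深層学習による文書分類法)
Author: Shimada, Daisuke (島田, 大輔)
Issue date: 2017-03-31
Publisher: Hosei University Graduate School of Science and Engineering / Graduate School of Engineering (法政大学大学院理工学・工学研究科)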
Abstract: Languages such as Chinese and Japanese have far more characters than most other languages, and their sentences consist of concatenated words with a wide variety of inflected forms, so appropriate word segmentation is quite difficult. In this study, we propose a new and efficient document classification technique for such languages. The proposed method combines a new “image-based character embedding” method with a character-level convolutional neural network trained with “wildcard training.” The first method encodes each character based on its visual structure and preserves that structure. The second treats some of the input characters as wildcards to prevent over-fitting of the classifier. We confirmed that our method outperformed conventional ones on Japanese document classification tasks without data pre-processing.
Keywords: document classification, deep learning, convolutional neural network, Japanese character
Description: Erratum in the header: (incorrect) 法政大学大学院理工学・工学研究科紀要; (correct) 法政大学大学院紀要. 理工学・工学研究科編
URI: http://hdl.handle.net/10114/13580
ISSN: 21879923
Appears in collection: 法政大学大学院紀要. 理工学・工学研究科編
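
The abstract describes wildcard training only as treating some input characters as wildcards to prevent over-fitting. The following is a minimal sketch of that idea under stated assumptions, not the author's implementation: the `apply_wildcards` helper, the `WILDCARD` placeholder symbol, and the independent per-character replacement rate are all hypothetical choices made for illustration. In an actual character-level CNN the replacement would more likely be applied to character embeddings than to raw symbols.

```python
import random

# Hypothetical placeholder symbol; the abstract does not specify how a
# wildcard is represented.
WILDCARD = "\u25a1"


def apply_wildcards(chars, rate, rng):
    """Return a copy of `chars` in which each character is independently
    replaced by WILDCARD with probability `rate` (an assumed scheme)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return [WILDCARD if rng.random() < rate else c for c in chars]


# Example: mask roughly 30% of the characters of a training sentence.
rng = random.Random(0)
text = list("深層学習による文書分類")
masked = apply_wildcards(text, 0.3, rng)
print("".join(masked))
```

Applying a fresh random mask each time a training sample is drawn means the classifier rarely sees exactly the same character sequence twice, which is one plausible reading of how such masking would discourage over-fitting.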


File         Description  Size       Format
15R4119.pdf               932.23 kB  Adobe PDF


