CNN-Based Chinese Text Classification: Applications and Scenarios

Chinese text is converted to vectors with word2vec, and those vectors are fed into a convolutional network for classification.
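The two stages described above can be illustrated with a minimal pure-Python sketch. It is not the code from train.py or text_cnn.py: the `EMBEDDINGS` table is a hypothetical stand-in for a trained word2vec model, the filter weights in `SPAM_FILTER` are hand-picked rather than learned, and the helper names `embed` and `conv_max_pool` are made up for illustration. It only shows how a token sequence becomes a fixed-size matrix and how one convolution filter with max-over-time pooling turns that matrix into a single feature.

```python
# -*- coding: utf-8 -*-
# Toy stand-in for a trained word2vec model: token -> vector (hypothetical values).
EMBEDDINGS = {
    "免费": [1.0, 0.0],
    "中奖": [0.9, 0.1],
    "会议": [0.0, 1.0],
    "明天": [0.1, 0.8],
}
DIM = 2
PAD = [0.0] * DIM  # used for padding and for out-of-vocabulary tokens


def embed(tokens, max_len):
    """Map tokens to vectors, then pad or truncate to exactly max_len rows."""
    rows = [EMBEDDINGS.get(tok, PAD) for tok in tokens[:max_len]]
    return rows + [PAD] * (max_len - len(rows))


def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide one filter over windows of rows, apply ReLU, keep the maximum."""
    width = len(kernel)
    activations = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        total = bias + sum(
            w * x
            for k_row, m_row in zip(kernel, window)
            for w, x in zip(k_row, m_row)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


# Hand-picked filter spanning two tokens; it responds to the first dimension.
SPAM_FILTER = [[1.0, -1.0], [1.0, -1.0]]

spam_feature = conv_max_pool(embed(["免费", "中奖"], max_len=4), SPAM_FILTER)
ham_feature = conv_max_pool(embed(["明天", "会议"], max_len=4), SPAM_FILTER)
```

In a real network many such filters of several widths would be learned during training, and their pooled features would feed a final classification layer.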

Applicable scenarios

- Spam filtering

- Sentiment analysis

How to run

1. Train: run python train.py

2. View summaries: run tensorboard --logdir /{PATH_TO_CODE}/runs/{TIME_DIR}/summaries/

3. Classify: run python eval.py
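The {TIME_DIR} placeholder in step 2 has to be filled in by hand. As a convenience, the following sketch finds the newest run directory and builds the summaries path to pass to tensorboard --logdir. It assumes that each run is stored in a directory whose name is a numeric Unix timestamp (as with the bundled run 1492954581); the function name `latest_summaries_dir` is hypothetical and is not part of the repository.

```python
import os


def latest_summaries_dir(runs_root):
    """Return runs_root/<newest timestamp>/summaries, or None if no run exists."""
    if not os.path.isdir(runs_root):
        return None
    # Assumption: run directories are named with numeric Unix timestamps.
    stamps = [
        name for name in os.listdir(runs_root)
        if name.isdigit() and os.path.isdir(os.path.join(runs_root, name))
    ]
    if not stamps:
        return None
    newest = max(stamps, key=int)  # compare as numbers, not as strings
    return os.path.join(runs_root, newest, "summaries")
```

Calling `latest_summaries_dir("runs")` from the code directory returns a relative path, or None when nothing has been trained yet.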

Notes

- You can specify your own file to classify.

- To measure accuracy, you must also specify a label file.
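Why a label file is needed for accuracy can be seen from how the metric is computed: each prediction is compared with the true label for the same input. The sketch below is an illustration only and is not taken from eval.py. The label file format is not described here, so the sketch assumes the labels have already been read into a list that lines up one-to-one with the predictions; the function name `accuracy` is hypothetical.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(expected):
        raise ValueError("predictions and labels must have the same length")
    if not expected:
        raise ValueError("at least one labelled example is required")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return float(correct) / len(expected)


# Hypothetical labels: 1 for spam, 0 for ham.
example_score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Without the true labels only the predictions themselves can be produced, which is why accuracy is unavailable when classifying an unlabelled file.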

Recommended environment

- Python 2.7.13

- TensorFlow 1.0.0

- gensim 1.0.1

- Ubuntu 16.04 64-bit

neal23333-zh_cnn_text_classify-master.zip (13.32MB, estimated 30 files), listed in archive order:

- zh_cnn_text_classify (folder)
- train.py (9KB)
- data_helpers.py (4KB)
- word2vec_helpers.py (3KB)
- eval.py (5KB)
- README.md (2KB)
- runs (folder)
- 1492954581 (folder)
- checkpoints (folder)
- model-300.meta (100KB)
- model-600.meta (100KB)
- model-400.meta (100KB)
- model-600.index (1009B)
- model-600.data-00000-of-00001 (2.26MB)
- model-500.data-00000-of-00001 (2.26MB)
- model-500.index (1009B)
- model-500.meta (100KB)
- model-400.data-00000-of-00001 (2.26MB)
- model-300.index (1009B)
- model-400.index (1009B)
- model-200.meta (100KB)
- model-300.data-00000-of-00001 (2.26MB)
- model-200.data-00000-of-00001 (2.26MB)
- model-200.index (1009B)
- checkpoint (697B)
- summaries (folder)
- train (folder)
- events.out.tfevents.1492954586.escenter11PC (14.84MB)
- dev (folder)
- events.out.tfevents.1492954586.escenter11PC (156KB)
- training_params.pickle (59B)
- trained_word2vec.model (845KB)
- prediction.csv (45KB)
- data (folder)
- ham_100.utf8 (58KB)
- spam_100.utf8 (44KB)
- .gitignore (14B)
- text_cnn.py (3KB)