Python Techniques for Renren Information Scraping and Data Mining

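Renren has tightened its security and now blocks content that the logged-in account has no permission to view. Information that the account can access is still scrapable, in the same way a browser would fetch it. The code runs on Ubuntu, Windows 7 and Windows XP. It targets Python 2.7 and uses igraph and pycairo for graph drawing. Ubuntu users can install igraph with apt-get install python-igraph (the python3-igraph package is built for Python 3, not 2.7). Win32 users need to download and install igraph and pycairo separately. If MySQL is used as the storage backend, the corresponding components must also be installed.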
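One way to read "scraping what the browser can access" is to reuse the session of a browser that is already logged in. You copy its Cookie header and send it with every request, so the scraper sees exactly what that account sees and nothing more. The sketch below assumes this approach. build_request, fetch and BROWSER_UA are hypothetical names that do not come from the repository, and any URL you pass in is your own placeholder. It uses only the standard library and runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7

# Hypothetical browser-like User-Agent; the exact string is an assumption.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/30.0 Safari/537.36")


def build_request(url, cookie):
    """Build a GET request carrying the Cookie copied from a logged-in browser.

    Assumption: `cookie` is the raw Cookie header value taken from the
    browser's developer tools; this is not the repository's own login code.
    """
    return Request(url, headers={"Cookie": cookie, "User-Agent": BROWSER_UA})


def fetch(url, cookie, timeout=10):
    """Fetch a page visible to that account and return the raw response bytes."""
    resp = urlopen(build_request(url, cookie), timeout=timeout)
    try:
        return resp.read()
    finally:
        resp.close()  # explicit close: Python 2 responses lack a context manager
```

Reusing the cookie does not get around Renren's permission checks. Pages the account cannot view still come back blocked, which matches the limitation described above.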

renren-master.zip (estimated: 36 files)
renren-master (folder)
.gitmodules (folder)
net_graph.py (987 B)
config (folder)
spider.ini (132 B)
grabrr.py (5 KB)
mysql.ini (1 KB)
spider.py (5 KB)
test_net_graph.png (58 KB)
repo_file.py (2 KB)
test_parse.py (12 KB)
downloader.py (4 KB)
settings.py (256 B)
parse.py (8 KB)
.gitignore (427 B)
get_info.py (2 KB)
requirement.txt (22 B)
repo_mysql.py (5 KB)
README.md (8 KB)
topic (folder)
jieba (folder)
posseg (folder)
prob_emit.py (3.07 MB)
char_state_tab.py (1.55 MB)
viterbi.py (1 KB)
__init__.py (3 KB)
prob_trans.py (252 KB)
prob_start.py (6 KB)
__init__.py (4 KB)
dict.txt (5.05 MB)
analyse (folder)
idf.txt (5.91 MB)
__init__.py (778 B)
README.md (9 KB)
finalseg (folder)
prob_emit.py (1.13 MB)
__init__.py (2 KB)
prob_trans.py (235 B)
prob_start.py (72 B)
demo.py (5 KB)
nstatus_nkeyword.png (40 KB)
jieba-master.zip (5 MB)
README.md (81 B)
Zip file size: 10.08 MB
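The listing includes net_graph.py and its sample output test_net_graph.png, and the setup notes name igraph and pycairo for drawing. The following is a hypothetical sketch of that step, not the repository's code. build_edges and plot_graph are illustrative names, and the input is assumed to be a dict that maps each user to a list of friends. Only plot_graph needs python-igraph and pycairo. The edge-building part is plain Python and runs on 2.7 and 3.

```python
def build_edges(friends):
    """Turn {user: [friend, ...]} into (names, edges) for igraph.Graph.

    Hypothetical input format. Vertices are sorted names. Each friendship
    becomes one undirected edge of vertex indices (smaller index first),
    and self-references are dropped.
    """
    names = sorted(set(friends) | {f for fs in friends.values() for f in fs})
    index = {name: i for i, name in enumerate(names)}
    edges = set()
    for user, fs in friends.items():
        for f in fs:
            if f != user:
                a, b = index[user], index[f]
                edges.add((min(a, b), max(a, b)))  # dedupe a->b and b->a
    return names, sorted(edges)


def plot_graph(names, edges, path="net_graph.png"):
    """Render the friend graph to a PNG file (requires python-igraph and pycairo)."""
    import igraph  # imported lazily so build_edges works without igraph installed
    g = igraph.Graph(n=len(names), edges=edges)
    g.vs["label"] = names
    # Fruchterman-Reingold force-directed layout; igraph writes PNG via cairo.
    igraph.plot(g, path, layout=g.layout("fr"), bbox=(600, 600))
    return g
```

For instance, build_edges({"a": ["b", "c"], "b": ["a"]}) yields the names ["a", "b", "c"] and the edges [(0, 1), (0, 2)]. The friendship listed from both sides collapses into a single undirected edge.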