1. Setting Up a Python 3 Web Crawler Environment: Anaconda and Scrapy

supercrys

Published 2018-04-27 21:33:04

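Python 3 is only the programming language used for crawler development; building crawlers also requires a number of other tools, such as an IDE and commonly used libraries. Based on my own experience, I recommend the following setup steps. The desktop environment is Windows 10.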

Installing Anaconda
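Anaconda is a highly integrated, Python-based data science platform that handles everything from crawler development to machine learning with ease. It ships with more than 250 data science packages and its own package manager, conda, so most dependencies, such as scikit-learn, SciPy and TensorFlow, can be installed with a single command.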

Anaconda download page

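Installing it is just a matter of following the prompts. The only thing to watch out for is the installation directory: its path should preferably contain only English characters and must not contain spaces. After installation, find the three icons shown in the figure below.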

[Figure: the three Anaconda application icons]

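These three applications are the ones used most often. After installation, Anaconda has already set up its own system environment and a Python 3 environment for us, so installing dependencies usually just means running conda commands directly in the Anaconda Prompt terminal.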

For example, the following command lists the configured environments and their paths (the active one is marked with *):

>conda env list
# conda environments:
#
base * D:\ProgramFiles\Anaconda
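The same information can also be checked from inside Python, which helps when an IDE may be running a different interpreter than Anaconda Prompt. This is an illustrative sketch rather than part of the original walkthrough: `conda_env_info` is a hypothetical helper, and it assumes that a conda environment root (including base) contains a `conda-meta` directory and that `conda activate`, as run by Anaconda Prompt, sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def conda_env_info():
    """Describe the interpreter running this script.

    Assumptions: a conda environment root (including base) contains a
    'conda-meta' directory, and `conda activate` sets CONDA_DEFAULT_ENV.
    """
    prefix = sys.prefix
    is_conda = os.path.isdir(os.path.join(prefix, "conda-meta"))
    active = os.environ.get("CONDA_DEFAULT_ENV")  # None if nothing is activated
    return {"prefix": prefix, "is_conda": is_conda, "active_env": active}


if __name__ == "__main__":
    info = conda_env_info()
    print("interpreter:", sys.executable)
    for key, value in info.items():
        print(f"{key}: {value}")
```

Run from Anaconda Prompt, `prefix` should match the `base` path listed by `conda env list`.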
The following command lists every python executable Windows can find, in search order:

>where python
D:\ProgramFiles\Anaconda\python.exe
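`where` prints every match, whereas Python's `shutil.which` returns only the first one. The hypothetical helper below approximates `where` by checking each PATH directory in turn. Treat it as a rough sketch: on Windows, `shutil.which` may also consider the current directory, so results can differ slightly from `where`.

```python
import os
import shutil


def all_on_path(name):
    """List executables matching `name` in each PATH directory, in PATH order."""
    found = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue
        hit = shutil.which(name, path=directory)
        if hit and hit not in found:  # drop duplicates from repeated PATH entries
            found.append(hit)
    return found


if __name__ == "__main__":
    print("first match:", shutil.which("python"))
    for path in all_on_path("python"):
        print(path)
```

If the first match is not the Anaconda `python.exe`, commands typed outside Anaconda Prompt may use a different Python.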
Check the version of the Python currently in use:

>python --version
Python 3.6.3 :: Anaconda custom (64-bit)
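If a crawler script should refuse to start on the wrong interpreter, the same check can be made in code. A small sketch; the `(3, 6)` threshold just mirrors the author's setup and is not a requirement stated by Scrapy, and `python_ok` is a hypothetical name.

```python
import platform
import sys

# Threshold chosen to match the Python 3.6 used in this post.
REQUIRED = (3, 6)


def python_ok(required=REQUIRED):
    """True if the running interpreter's (major, minor) is at least `required`."""
    return sys.version_info[:2] >= tuple(required)


if __name__ == "__main__":
    print("Python", platform.python_version(), platform.architecture()[0])
    print("meets", REQUIRED, ":", python_ok())
```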
List the packages installed in the current environment:

>conda list
# packages in environment at D:\ProgramFiles\Anaconda:
#
# Name Version Build Channel
_ipyw_jlab_nb_ext_conf 0.1.0 py36he6757f0_0
alabaster 0.7.10 py36hcd07829_0
anaconda custom py36h363777c_0
anaconda-client 1.6.14 py36_0
anaconda-navigator 1.8.3 py36_0
anaconda-project 0.8.0 py36h8b3bf89_0
asn1crypto 0.22.0 py36h8e79faa_1
astroid 1.5.3 py36h9d85297_0
astropy 2.0.2 py36h06391c4_4
babel 2.5.0 py36h35444c1_0
backports 1.0 py36h81696a8_1
backports.shutil_get_terminal_size 1.0.0 py36h79ab834_2
beautifulsoup4 4.6.0 py36hd4cc5e8_1
bitarray 0.8.1 py36h6af124b_0
bkcharts 0.2 py36h7e685f7_0
blaze 0.11.3 py36h8a29ca5_0
bleach 2.0.0 py36h0a7e3d6_0
bokeh 0.12.10 py36h0be3b39_0
boto 2.48.0 py36h1a776d2_1
bottleneck 1.2.1 py36hd119dfa_0
bzip2 1.0.6 vc14hdec8e7a_1 [vc14]
ca-certificates 2017.08.26 h94faf87_0
cachecontrol 0.12.3 py36hfe50d7b_0
certifi 2017.7.27.1 py36h043bc9e_0
cffi 1.10.0 py36hae3d1b5_1
chardet 3.0.4 py36h420ce6e_1
click 6.7 py36hec8c647_0
cloudpickle 0.4.0 py36h639d8dc_0
clyent 1.2.2 py36hb10d595_1
colorama 0.3.9 py36h029ae33_0
comtypes 1.1.2 py36heb9b3d1_0
conda 4.5.1 py36_0
conda-build 3.0.27 py36h309a530_0
conda-env 2.6.0 h36134e3_1
conda-verify 2.0.0 py36h065de53_0
console_shortcut 0.1.1 h6bb2dd7_3
contextlib2 0.5.5 py36he5d52c0_0
cryptography 2.0.3 py36h123decb_1
curl 7.55.1 vc14hdaba4a4_3 [vc14]
cycler 0.10.0 py36h009560c_0
cython 0.26.1 py36h18049ac_0
cytoolz 0.8.2 py36h547e66e_0
dask 0.15.3 py36h396fcb9_0
dask-core 0.15.3 py36hd651449_0
datashape 0.5.4 py36h5770b85_0
decorator 4.1.2 py36he63a57b_0
distlib 0.2.5 py36h51371be_0
distributed 1.19.1 py36h8504682_0
docutils 0.14 py36h6012d8f_0
entrypoints 0.2.3 py36hfd66bb0_2
et_xmlfile 1.0.1 py36h3d2d736_0
fastcache 1.0.2 py36hffdae1b_0
filelock 2.0.12 py36hd7ddd41_0
flask 0.12.2 py36h98b5e8f_0
flask-cors 3.0.3 py36h8a3855d_0
freetype 2.8 vc14h17c9bdf_0 [vc14]
get_terminal_size 1.0.0 h38e98db_0
gevent 1.2.2 py36h342a76c_0
glob2 0.5 py36h11cc1bd_1
greenlet 0.4.12 py36ha00ad21_0
h5py 2.7.0 py36hfbe0a52_1
hdf5 1.10.1 vc14hb361328_0 [vc14]
heapdict 1.0.0 py36h21fa5f4_0
html5lib 0.999999999 py36ha09b1f3_0
icc_rt 2017.0.4 h97af966_0
icu 58.2 vc14hc45fdbb_0 [vc14]
idna 2.6 py36h148d497_1
imageio 2.2.0 py36had6c2d2_0
imagesize 0.7.1 py36he29f638_0
intel-openmp 2018.0.0 hcd89f80_7
ipykernel 4.6.1 py36hbb77b34_0
ipython 6.1.0 py36h236ecc8_1
ipython_genutils 0.2.0 py36h3c5d0ee_0
ipywidgets 7.0.0 py36h2e74ada_0
isort 4.2.15 py36h6198cc5_0
itsdangerous 0.24 py36hb6c5a24_1
jdcal 1.3 py36h64a5255_0
jedi 0.10.2 py36hed927a0_0
jinja2 2.9.6 py36h10aa3a0_1
jpeg 9b vc14h4d7706e_1 [vc14]
jsonschema 2.6.0 py36h7636477_0
jupyter 1.0.0 py36h422fd7e_2
jupyter_client 5.1.0 py36h9902a9a_0
jupyter_console 5.2.0 py36h6d89b47_1
jupyter_core 4.3.0 py36h511e818_0
jupyterlab 0.27.0 py36h34cc53b_2
jupyterlab_launcher 0.4.0 py36h22c3ccf_0
lazy-object-proxy 1.3.1 py36hd1c21d2_0
libiconv 1.15 vc14h29686d3_5 [vc14]
libpng 1.6.32 vc14h5163883_3 [vc14]
libssh2 1.8.0 vc14hcf584a9_2 [vc14]
libtiff 4.0.8 vc14h04e2a1e_10 [vc14]
libxml2 2.9.4 vc14h8fd0f11_5 [vc14]
libxslt 1.1.29 vc14hf85b8d4_5 [vc14]
llvmlite 0.20.0 py36_0
locket 0.2.0 py36hfed976d_1
lockfile 0.12.2 py36h0468280_0
lxml 4.1.0 py36h0dcd83c_0
lzo 2.10 vc14h0a64fa6_1 [vc14]
markupsafe 1.0 py36h0e26971_1
matplotlib 2.1.0 py36h11b4b9c_0
mccabe 0.6.1 py36hb41005a_1
menuinst 1.4.10 py36h42196fb_0
mistune 0.7.4 py36h4874169_0
mkl 2018.0.0 h36b65af_4
mkl-service 1.1.2 py36h57e144c_4
mpmath 0.19 py36he326802_2
msgpack-python 0.4.8 py36h58b1e9d_0
multipledispatch 0.4.9 py36he44c36e_0
navigator-updater 0.1.0 py36h8a7b86b_0
nbconvert 5.3.1 py36h8dc0fde_0
nbformat 4.4.0 py36h3a5bc1b_0
networkx 2.0 py36hff991e3_0
nltk 3.2.4 py36hd0e0a39_0
nose 1.3.7 py36h1c3779e_2
notebook 5.0.0 py36hd9fbf6f_2
numba 0.35.0 np113py36_10
numexpr 2.6.2 py36h7ca04dc_1
numpy 1.13.3 py36ha320f96_0
numpydoc 0.7.0 py36ha25429e_0
odo 0.5.1 py36h7560279_0
olefile 0.44 py36h0a7bdd2_0
openpyxl 2.4.8 py36hf3b77f6_1
openssl 1.0.2l vc14hcac20b0_2 [vc14]
packaging 16.8 py36ha0986f6_1
pandas 0.20.3 py36hce827b7_2
pandoc 1.19.2.1 hb2460c7_1
pandocfilters 1.4.2 py36h3ef6317_1
partd 0.3.8 py36hc8e763b_0
path.py 10.3.1 py36h3dd8b46_0
pathlib2 2.3.0 py36h7bfb78b_0
patsy 0.4.1 py36h42cefec_0
pep8 1.7.0 py36h0f3d67a_0
pickleshare 0.7.4 py36h9de030f_0
pillow 4.2.1 py36hdb25ab2_0
pip 9.0.1 py36hadba87b_3
pkginfo 1.4.1 py36hb0f9cfa_1
ply 3.10 py36h1211beb_0
progress 1.3 py36hbeca8d3_0
prompt_toolkit 1.0.15 py36h60b8f86_0
psutil 5.4.0 py36h4e662fb_0
py 1.4.34 py36ha4aca3a_1
pycodestyle 2.3.1 py36h7cc55cd_0
pycosat 0.6.3 py36h413d8a4_0
pycparser 2.18 py36hd053e01_1
pycrypto 2.6.1 py36he68e6e2_1
pycurl 7.43.0 py36h086bf4c_3
pyflakes 1.6.0 py36h0b975d6_0
pygments 2.2.0 py36hb010967_0
pylint 1.7.4 py36ha4e6ded_0
pyodbc 4.0.17 py36h0006bc2_0
pyopenssl 17.2.0 py36h15ca2fc_0
pyparsing 2.2.0 py36h785a196_1
pyqt 5.6.0 py36hb5ed885_5
pysocks 1.6.7 py36h698d350_1
pytables 3.4.2 py36h71138e3_2
pytest 3.2.1 py36h753b05e_1
python 3.6.3 h9e2ca53_1
python-dateutil 2.6.1 py36h509ddcb_1
pytz 2017.2 py36h05d413f_1
pywavelets 0.5.2 py36hc649158_0
pywin32 221 py36h9c10281_0
pyyaml 3.12 py36h1d1928f_1
pyzmq 16.0.2 py36h38c27d9_2
qt 5.6.2 vc14h6f8c307_12 [vc14]
qtawesome 0.4.4 py36h5aa48f6_0
qtconsole 4.3.1 py36h99a29a9_0
qtpy 1.3.1 py36hb8717c5_0
requests 2.18.4 py36h4371aae_1
rope 0.10.5 py36hcaf5641_0
ruamel_yaml 0.11.14 py36h9b16331_2
scikit-image 0.13.0 py36h6dffa3f_1
scikit-learn 0.19.1 py36h53aea1b_0
scipy 0.19.1 py36h7565378_3
seaborn 0.8.0 py36h62cb67c_0
setuptools 36.5.0 py36h65f9e6e_0
simplegeneric 0.8.1 py36heab741f_0
singledispatch 3.4.0.3 py36h17d0c80_0
sip 4.18.1 py36h9c25514_2
six 1.11.0 py36h4db2310_1
snowballstemmer 1.2.1 py36h763602f_0
sortedcollections 0.5.3 py36hbefa0ab_0
sortedcontainers 1.5.7 py36ha90ac20_0
sphinx 1.6.3 py36h9bb690b_0
sphinxcontrib 1.0 py36hbbac3d2_1
sphinxcontrib-websupport 1.0.1 py36hb5e5916_1
spyder 3.2.4 py36h8845eaa_0
sqlalchemy 1.1.13 py36h5948d12_0
sqlite 3.20.1 vc14h7ce8c62_1 [vc14]
statsmodels 0.8.0 py36h6189b4c_0
sympy 1.1.1 py36h96708e0_0
tblib 1.3.2 py36h30f5020_0
testpath 0.3.1 py36h2698cfe_0
tk 8.6.7 vc14hb68737d_1 [vc14]
toolz 0.8.2 py36he152a52_0
tornado 4.5.2 py36h57f6048_0
traitlets 4.3.2 py36h096827d_0
typing 3.6.2 py36hb035bda_0
unicodecsv 0.14.1 py36h6450c06_0
urllib3 1.22 py36h276f60a_0
vc 14 h2379b0c_2
vs2015_runtime 14.0.25123 hd4c4e62_2
wcwidth 0.1.7 py36h3d5aa90_0
webencodings 0.5.1 py36h67c50ae_1
werkzeug 0.12.2 py36h866a736_0
wheel 0.29.0 py36h6ce6cde_1
widgetsnbextension 3.0.2 py36h364476f_1
win_inet_pton 1.0.1 py36he67d7fd_1
win_unicode_console 0.5 py36hcdbd4b5_0
wincertstore 0.2 py36h7fe50ca_0
wrapt 1.10.11 py36he5f5981_0
xlrd 1.1.0 py36h1cb58dc_1
xlsxwriter 1.0.2 py36hf723b7d_0
xlwings 0.11.4 py36hd3cf94d_0
xlwt 1.3.0 py36h1a4751e_0
yaml 0.1.7 vc14hb31d195_1 [vc14]
zict 0.1.3 py36h2d8e73e_0
zlib 1.2.11 vc14h1cdd9ab_1 [vc14]
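For a crawler, checking a handful of packages is often more useful than scanning this full listing. The sketch below uses `importlib.metadata`, which is only in the standard library from Python 3.8 onward; on the author's Python 3.6 the `importlib_metadata` backport from PyPI would be needed instead. It only sees packages that ship Python metadata, so non-Python conda packages from the list above, such as `openssl` or `zlib`, will report `None`. `installed_versions` is a hypothetical helper.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["requests", "lxml", "scrapy"]).items():
        print(f"{name:10} {ver or 'not installed'}")
```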
Installing Scrapy
Scrapy is one of the most commonly used crawling frameworks. The official website gives the following installation command:

conda install -c conda-forge scrapy
However, when I installed it this way, I got the following errors:

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/libssh2-1.8.0-vc14_2.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/noarch/hyperlink-17.3.1-py_0.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/pydispatcher-2.0.5-py36_0.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/yaml-0.1.7-vc14_0.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/win-64/qt-5.6.2-vc14_1.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
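All of these are `HTTP 000` errors, which means no HTTP response arrived at all, rather than the server returning an error code. One way to tell a broken connection apart from a problem with the packages themselves is to probe the channel URL directly. This is a hedged sketch, not something the original post does: `channel_reachable` is a hypothetical helper, the URL is taken from the error messages above, and the check only tells whether the server answers within the timeout, not how fast downloads are.

```python
import urllib.error
import urllib.request

# Channel URL taken from the conda error messages above.
CHANNEL = "https://conda.anaconda.org/conda-forge/"


def channel_reachable(url=CHANNEL, timeout=10.0):
    """Return True if the server sends any HTTP response within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (e.g. 403/404), so the network path works.
        return True
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError means a malformed URL.
        return False
```

If `channel_reachable()` returns `False`, the connection to conda-forge is the problem; if it returns `True`, simply retrying the install, as conda's message suggests, may be enough.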
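Some packages failed to download. The likely cause is that the channel used by that command (conda-forge) was too slow to reach, so the connections failed. I therefore switched to the following approach.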

First, check whether conda provides a scrapy package for the current Python version:

>conda search scrapy
Loading channels: done
# Name Version Build Channel
scrapy 0.16.4 py26_0 pkgs/free
scrapy 0.16.4 py27_0 pkgs/free
scrapy 0.24.4 py27_0 pkgs/free
scrapy 1.0.1 py27_0 pkgs/free
scrapy 1.0.3 py27_0 pkgs/free
scrapy 1.1.1 py27_0 pkgs/free
scrapy 1.1.1 py34_0 pkgs/free
scrapy 1.1.1 py35_0 pkgs/free
scrapy 1.1.1 py36_0 pkgs/free
scrapy 1.3.3 py27_0 pkgs/free
scrapy 1.3.3 py35_0 pkgs/free
scrapy 1.3.3 py36_0 pkgs/free
scrapy 1.4.0 py27h4eaa785_1 pkgs/main
scrapy 1.4.0 py35h054a469_1 pkgs/main
scrapy 1.4.0 py36h764da0a_1 pkgs/main
scrapy 1.5.0 py27_0 pkgs/main
scrapy 1.5.0 py35_0 pkgs/main
scrapy 1.5.0 py36_0 pkgs/main
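Reading the right row off this table by eye is easy, but the rule can be stated precisely: keep only builds tagged for your Python version (`py36…`) and take the highest version number. A sketch with a hypothetical `newest_build` helper; it assumes the four-column layout shown above and purely numeric dotted versions. In practice `conda install` makes this selection, along with dependency solving, for you.

```python
def newest_build(listing, py_tag="py36"):
    """Return (version, build, channel) of the newest row whose build starts
    with py_tag, or None. `listing` is text printed by `conda search <pkg>`."""
    best = None
    for line in listing.splitlines():
        parts = line.split()
        # Skip comments, the header and status lines such as "Loading channels: done".
        if line.startswith("#") or len(parts) != 4:
            continue
        _name, version, build, channel = parts
        if not build.startswith(py_tag):
            continue
        try:
            key = tuple(int(p) for p in version.split("."))
        except ValueError:
            continue  # non-numeric versions (e.g. "1.0.2o") are out of scope here
        if best is None or key > best[0]:
            best = (key, version, build, channel)
    return None if best is None else best[1:]


SAMPLE = """\
# Name Version Build Channel
scrapy 1.3.3 py36_0 pkgs/free
scrapy 1.4.0 py36h764da0a_1 pkgs/main
scrapy 1.5.0 py35_0 pkgs/main
scrapy 1.5.0 py36_0 pkgs/main
"""

print(newest_build(SAMPLE))  # ('1.5.0', 'py36_0', 'pkgs/main')
```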
My Python version is 3.6, and the last row of the list is the newest scrapy build for Python 3.6, so I installed it with the following command:

>conda install scrapy
Solving environment: done

## Package Plan ##

environment location: D:\ProgramFiles\Anaconda

added / updated specs:
- scrapy

The following packages will be downloaded:

package | build
---------------------------|-----------------
attrs-17.4.0 | py36_0 41 KB
pyasn1-0.4.2 | py36h22e697c_0 101 KB
hyperlink-18.0.0 | py36_0 62 KB
openssl-1.0.2o | h8ea7d77_0 5.4 MB
pyasn1-modules-0.2.1 | py36hd1453cb_0 86 KB
pytest-runner-4.2 | py36_0 12 KB
ca-certificates-2018.03.07 | 0 155 KB
scrapy-1.5.0 | py36_0 329 KB
automat-0.6.0 | py36hc6d8c19_0 67 KB
constantly-15.1.0 | py36_0 13 KB
cssselect-1.0.3 | py36_0 28 KB
incremental-17.5.0 | py36he5b1da3_0 25 KB
certifi-2018.4.16 | py36_0 143 KB
pydispatcher-2.0.5 | py36_0 18 KB
------------------------------------------------------------
Total: 6.4 MB

The following NEW packages will be INSTALLED:

attrs: 17.4.0-py36_0
automat: 0.6.0-py36hc6d8c19_0
constantly: 15.1.0-py36_0
cssselect: 1.0.3-py36_0
hyperlink: 18.0.0-py36_0
incremental: 17.5.0-py36he5b1da3_0
parsel: 1.4.0-py36_0
pyasn1: 0.4.2-py36h22e697c_0
pyasn1-modules: 0.2.1-py36hd1453cb_0
pydispatcher: 2.0.5-py36_0
pytest-runner: 4.2-py36_0
queuelib: 1.5.0-py36_0
scrapy: 1.5.0-py36_0
service_identity: 17.0.0-py36_0
twisted: 17.5.0-py36_0
w3lib: 1.19.0-py36_0
zope: 1.0-py36_0
zope.interface: 4.5.0-py36hfa6e2cd_0

The following packages will be UPDATED:

ca-certificates: 2017.08.26-h94faf87_0 --> 2018.03.07-0
certifi: 2017.7.27.1-py36h043bc9e_0 --> 2018.4.16-py36_0
openssl: 1.0.2l-vc14hcac20b0_2 --> 1.0.2o-h8ea7d77_0

Proceed ([y]/n)? y

After entering y, the installation continues:

Downloading and Extracting Packages
attrs 17.4.0################################################################################################### | 100%
pyasn1 0.4.2################################################################################################### | 100%
hyperlink 18.0.0############################################################################################### | 100%
openssl 1.0.2o################################################################################################# | 100%
pyasn1-modules 0.2.1########################################################################################### | 100%
pytest-runner 4.2############################################################################################## | 100%
ca-certificates 2018.03.07##################################################################################### | 100%
scrapy 1.5.0################################################################################################### | 100%
automat 0.6.0################################################################################################## | 100%
constantly 15.1.0############################################################################################## | 100%
cssselect 1.0.3################################################################################################ | 100%
incremental 17.5.0############################################################################################# | 100%
certifi 2018.4.16############################################################################################## | 100%
pydispatcher 2.0.5############################################################################################# | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
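The installation is complete. Finally, you can check whether Scrapy was installed successfully by running the following command and looking for scrapy in its output:

>conda list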
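Besides scanning the `conda list` output, you can confirm that Scrapy is actually importable from the Anaconda Python; Scrapy's own command-line tool also offers `scrapy version` for the same purpose. A minimal sketch with a hypothetical `scrapy_version` helper:

```python
def scrapy_version():
    """Return Scrapy's version string, or None if scrapy (or one of its
    dependencies) cannot be imported by this interpreter."""
    try:
        import scrapy
    except ImportError:
        return None
    return scrapy.__version__


if __name__ == "__main__":
    version = scrapy_version()
    if version:
        print("Scrapy", version)
    else:
        print("Scrapy is not importable from this Python; check which python you are running.")
```

With the installation above, this should report 1.5.0 when run with the Anaconda `python.exe`.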
————————————————

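Copyright notice: This is an original article by CSDN blogger "supercrys", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/crysdem/article/details/80112709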