Installing Scrapy, Scrapy-Splash, and Scrapy-Redis

Author: Internet

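Prerequisite: Docker is installed and working.

After installing Docker, add registry mirrors hosted in China to Docker's JSON configuration file (daemon.json). The Aliyun mirror address is account-specific, so you have to apply for your own.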

"registry-mirrors": [
    "https://********.mirror.aliyuncs.com",
    "https://registry.docker-cn.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn"
  ]
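The fragment above has to sit inside the top-level object of the configuration file, and Docker normally needs a restart before it takes effect. As a minimal sketch of merging the mirrors without overwriting other settings, the helper below works on the file's text. The function name add_registry_mirrors is hypothetical, and the file location (commonly /etc/docker/daemon.json on Linux; Docker Desktop exposes the same JSON in its settings) is an assumption not stated in the original post.

```python
import json

# Mirror URLs copied from the fragment above. The Aliyun entry is left out
# because its address is account-specific.
MIRRORS = [
    "https://registry.docker-cn.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn",
]


def add_registry_mirrors(config_text, mirrors):
    """Return daemon.json text with `mirrors` merged into "registry-mirrors"."""
    # An empty file is treated as an empty configuration object.
    config = json.loads(config_text) if config_text.strip() else {}
    existing = config.get("registry-mirrors", [])
    # Keep existing entries first and skip duplicates.
    merged = existing + [m for m in mirrors if m not in existing]
    config["registry-mirrors"] = merged
    return json.dumps(config, indent=2)


# Example: an existing configuration with one unrelated setting.
print(add_registry_mirrors('{"debug": false}', MIRRORS))
```

Writing the result back to the real file usually requires administrator rights, so the sketch deliberately stops at producing the new text.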

Other prerequisites:

PATH is already configured for Anaconda.

Scrapy installation

In JupyterLab, enter:

pip install scrapy
import scrapy
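If the import succeeds without an error, the install worked. A slightly more explicit check, sketched below with only the standard library, reports the installed version of a pip distribution or None when it is missing. The helper name installed_version is hypothetical, and the sketch assumes Python 3.8 or newer (for importlib.metadata).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# pip distribution names (not import names) go here; names for packages
# installed in later steps can be appended to the tuple.
for name in ("scrapy",):
    print(name, installed_version(name))
```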

Splash installation

In a terminal, enter:

docker run -p 8050:8050 scrapinghub/splash

On success, the command pulls the image, starts Splash, and prints output similar to the following:

Digest: sha256:b4173a88a9d11c424a4df4c8a41ce67ff6a6a3205bd093808966c12e0b06dacf
Status: Downloaded newer image for scrapinghub/splash:latest
2021-02-01 04:53:28+0000 [-] Log opened.
2021-02-01 04:53:29.033164 [-] Xvfb is started: ['Xvfb', ':846388905', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
2021-02-01 04:53:29.354172 [-] Splash version: 3.5
2021-02-01 04:53:29.420819 [-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2
2021-02-01 04:53:29.421057 [-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
2021-02-01 04:53:29.421504 [-] Open files limit: 1048576
2021-02-01 04:53:29.421711 [-] Can't bump open files limit
2021-02-01 04:53:29.441758 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2021-02-01 04:53:29.442007 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2021-02-01 04:53:29.616771 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2021-02-01 04:53:29.617331 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Webkit: enabled, Chromium: enabled
2021-02-01 04:53:29.618280 [-] Site starting on 8050
2021-02-01 04:53:29.618402 [-] Starting factory <twisted.web.server.Site object at 0x7f07b402c5c0>
2021-02-01 04:53:29.618800 [-] Server listening on http://0.0.0.0:8050
2021-02-01 04:54:39.943377 [-] "172.17.0.1" - - [01/Feb/2021:04:54:39 +0000] "GET / HTTP/1.1" 200 7675 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:40.007321 [-] "172.17.0.1" - - [01/Feb/2021:04:54:39 +0000] "GET /_ui/style.css HTTP/1.1" 200 2591 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:40.025381 [-] "172.17.0.1" - - [01/Feb/2021:04:54:39 +0000] "GET /_ui/main.js HTTP/1.1" 200 13055 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:42.573986 [-] "172.17.0.1" - - [01/Feb/2021:04:54:42 +0000] "GET /_ui/inspections/splash-auto.json HTTP/1.1" 200 177721 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:42.698853 [-] "172.17.0.1" - - [01/Feb/2021:04:54:42 +0000] "GET /_ui/favicon.ico HTTP/1.1" 200 4286 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:55:39.940831 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=55762)
2021-02-01 04:55:42.699230 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=55764)
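Once the log reports "Server listening on http://0.0.0.0:8050", the Splash web UI should answer on port 8050 of the host, as the "GET / ... 200" entries above suggest. The sketch below checks this from Python using only the standard library. The function name splash_is_up and the default address http://localhost:8050 are assumptions based on the -p 8050:8050 port mapping used above.

```python
import urllib.error
import urllib.request


def splash_is_up(base_url="http://localhost:8050", timeout=3.0):
    """Return True if an HTTP GET on the Splash root page answers with 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


print("Splash reachable:", splash_is_up())
```

A False result most often means the container is not running or the port mapping differs from the one shown in the docker run command.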

Stopping Splash

Stop the container first, then remove it. Take CONTAINER_ID from the output of the first command:

sudo docker ps -a
sudo docker stop CONTAINER_ID
sudo docker rm CONTAINER_ID
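The same stop-then-remove sequence can be scripted so that the container ID does not have to be copied by hand. The sketch below is only an illustration: stop_and_remove and list_cmd are hypothetical helper names, and it assumes the current user can run docker without sudo (otherwise prepend "sudo" to each command list). It relies on the docker ps options -a, -q, and --filter ancestor=IMAGE.

```python
import subprocess

IMAGE = "scrapinghub/splash"


def list_cmd(image=IMAGE):
    # -a includes stopped containers, -q prints only container IDs,
    # --filter ancestor=... keeps containers created from this image.
    return ["docker", "ps", "-a", "-q", "--filter", f"ancestor={image}"]


def stop_and_remove(image=IMAGE, runner=subprocess.run):
    """Stop, then remove, every container created from `image`.

    Returns the list of container IDs that were processed.
    """
    result = runner(list_cmd(image), capture_output=True, text=True, check=True)
    ids = result.stdout.split()
    for container_id in ids:
        runner(["docker", "stop", container_id], check=True)
        runner(["docker", "rm", container_id], check=True)
    return ids
```

Calling stop_and_remove() with no arguments runs the real docker commands; the runner parameter exists only so the logic can be exercised without Docker.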

Scrapy-Splash installation

In JupyterLab, enter:

pip install scrapy-splash

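This package cannot be imported under its pip name, because the hyphen in scrapy-splash is not valid in an import statement. The importable module is scrapy_splash.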
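After installation, scrapy-splash is usually wired into a Scrapy project through the project's settings.py rather than imported interactively. The fragment below is a sketch of that configuration following the scrapy-splash README. The SPLASH_URL value assumes Splash is running locally on port 8050 as started above, and which project it goes into is left open because the original post does not create one.

```python
# settings.py fragment for a Scrapy project that renders pages through Splash.

# Assumption: Splash was started with "-p 8050:8050" on the same machine.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```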

Scrapy-Redis installation

In JupyterLab, enter:

pip install scrapy-redis
import scrapy_redis
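Installing the package alone does not change how a spider schedules requests; scrapy-redis is enabled per project in settings.py. The fragment below is a minimal sketch of those settings. It assumes a Redis server is already reachable at redis://localhost:6379, which the original post does not cover. A project can have only one active DUPEFILTER_CLASS, so combining this with another extension's duplicate filter needs extra care.

```python
# settings.py fragment that moves scheduling and deduplication into Redis.

# Store the request queue in Redis so several spider processes can share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests through a Redis-backed fingerprint set.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints in Redis after the spider closes.
SCHEDULER_PERSIST = True

# Assumption: a local Redis server on the default port.
REDIS_URL = "redis://localhost:6379"
```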

Installing Scrapyd and related tools

pip install scrapyd

pip install scrapyd-client

pip install python-scrapyd-api
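To confirm that Scrapyd itself works after these installs, its JSON API can be queried once the server has been started with the scrapyd command. The sketch below uses only the standard library instead of python-scrapyd-api so that it runs without extra dependencies. The helper name scrapyd_status is hypothetical, and the address http://localhost:6800 assumes Scrapyd's default port on the local machine.

```python
import json
import urllib.error
import urllib.request


def scrapyd_status(base_url="http://localhost:6800", timeout=3.0):
    """Return the parsed daemonstatus.json reply, or None if unreachable."""
    url = f"{base_url}/daemonstatus.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, network error, or a non-JSON reply.
        return None


print(scrapyd_status())
```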

Scrapyrt installation (a lightweight alternative to Scrapyd)

pip install scrapyrt
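ScrapyRT exposes spiders over HTTP: when the scrapyrt command is run inside a Scrapy project, a GET request to its crawl.json endpoint with spider_name and url parameters runs the spider and returns the scraped items as JSON. The sketch below only builds such a request URL. The helper name crawl_url, the spider name example_spider, and the address http://localhost:9080 (ScrapyRT's default port on the local machine) are assumptions for illustration.

```python
from urllib.parse import urlencode


def crawl_url(spider_name, start_url, base="http://localhost:9080"):
    """Build a ScrapyRT crawl.json request URL for one spider and one page."""
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base}/crawl.json?{query}"


# "example_spider" is a placeholder; use a spider that exists in the project.
print(crawl_url("example_spider", "http://example.com/"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client while ScrapyRT is running.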

Source: https://blog.csdn.net/jelatinprotain/article/details/113506670