Scrapy spiders for news websites
- Install dependencies (pip install -r requirements.txt)
- Run spider
- Modify Scrapy settings if needed
scrapy runspider [SPIDER PATH] -a start_id=1000 -a end_id=1500 -o [OUTPUT_FILE]
scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl

URL: https://prachatai.com/print/[ARTICLE_ID]
**Arguments**:
- start_id: article ID at the start of the range to crawl
- end_id: article ID at the end of the range to crawl
scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl

URL: http://news.thaipbs.or.th/content/[ARTICLE_ID]
**Arguments**:
- start_id: article ID at the start of the range to crawl
- end_id: article ID at the end of the range to crawl
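
The URL patterns and ID arguments above can be illustrated with a small sketch. This is not the spiders' actual code: `article_urls` is a hypothetical helper, the `[ARTICLE_ID]` placeholder is rewritten as `{article_id}` for Python string formatting, and treating end_id as inclusive is an assumption. Scrapy passes `-a` values to the spider as strings, so the sketch converts them to integers first.

```python
from typing import Iterator

# URL patterns copied from this README, with [ARTICLE_ID] turned into a
# Python format placeholder.
PRACHATAI_URL = "https://prachatai.com/print/{article_id}"
THAIPBS_URL = "http://news.thaipbs.or.th/content/{article_id}"


def article_urls(template: str, start_id: str, end_id: str) -> Iterator[str]:
    """Yield one URL per article ID (hypothetical helper, not from the repo).

    Values given with -a arrive as strings, so they are converted here.
    Assumption: end_id is inclusive; the real spiders may behave differently.
    """
    first, last = int(start_id), int(end_id)
    for article_id in range(first, last + 1):
        yield template.format(article_id=article_id)


urls = list(article_urls(PRACHATAI_URL, "1000", "1002"))
for url in urls:
    print(url)
```

Under the inclusive assumption, start_id=1000 and end_id=1500 would cover 501 article pages.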
scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.jl

Output formats supported through Scrapy feed exports:
- .csv
- .jl (JSON Lines)
- .json
- .xml
scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.csv
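
A .jl output file holds one JSON object per line, so it can be read back with the standard library alone. The sketch below is a minimal reader. `read_json_lines` is a hypothetical helper, and the sample records use a made-up `id` field only for the demonstration, since the fields the spiders actually emit are not listed here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Union


def read_json_lines(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Load every non-empty line of a .jl file as one JSON object."""
    items = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                items.append(json.loads(line))
    return items


# Demonstration with placeholder records; real output would come from a
# file such as prachatai.jl written by the commands in this README.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jl"
    sample.write_text('{"id": 1000}\n\n{"id": 1001}\n', encoding="utf-8")
    items = read_json_lines(sample)

print(len(items))
```

To load real output, call `read_json_lines("prachatai.jl")` after running the spider.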