scrapy
https://github.com/scrapy/scrapy
Python
Scrapy, a fast high-level web crawling & scraping framework for Python.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
160 Subscribers
Help out
- Issues
- pydispatcher: find a better supported alternative
- Export to a folder instead of a file
- The engine doesn't wait for Spider generator parse before run process_spider_output
- LinkExtractor calls process_value before applying allow and deny
- Clarify multiple handlers execution order for the same signal
- Make the duplicate filter use bytes request fingerprints
- Support multiple download slots for the same request
- Support per-request download handler override
- Update README.rst
- Added error handling for the case when Brotli is not imported scrapy#4697
- Docs
- Python not yet supported