scrapy
https://github.com/scrapy/scrapy
Python
Scrapy, a fast high-level web crawling & scraping framework for Python.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported.
161 Subscribers
Help out
- Issues
  - Add line buffering to file `requests.seen` of `RFPDupeFilter`
  - scrapy.item.Field memory leak.
  - Make media pipeline storage more flexible
  - Requesting a site by its IP address instead of hostname raises OpenSSL.SSL.Error: [('SSL routines', '', 'tlsv1 alert internal error')]
  - DNSCACHE_ENABLED not respected when in Spider.custom_settings
  - Feed URL has been added to the stats.
  - Emit a warning if options -o or -O are specified when FeedExporter is disabled
  - Fixing the dirty reactor errors
  - test_utf16 fails on big-endian architectures
  - Refactor cookie handling in CookiesMiddleware
- Docs
  - Python not yet supported