marker
https://github.com/datalab-to/marker
Python
Convert PDF to markdown + JSON quickly with high accuracy
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
0 Subscribers
Help out
- Issues
  - I used this command "marker /path/to/input/folder --workers 4". The number of workers is 4. My %Cpu(s) is very high. If I increase the number of workers, will it not improve the speed?
  - single large PDF file: use a lot of memory and memory leak
  - optimizing python code to not get too many PIError: 429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'Resource has been exhausted (e.g. check quota).', 'status': 'RESOURCE_EXHAUSTED'}}. Retrying in 3 seconds... (Attempt 1/2)
  - Image paths
  - This command is identical to the markdown file without llm_serve
  - PDF maker v2 not work fine
  - Wrong result acquisition in Ollama
  - Support for Extracting Calculated Values Instead of Formulas from XLSX Files
  - About returning paragraph coordinates
  - Latest release broke table_of_contents (if use_llm=true)
- Docs
  - Python not yet supported