CiteSeerExtractor is a RESTful API for extracting information from scholarly documents. It can extract metadata, citations, and different parts of the full text from a PDF document and return the data in a format of your choice (e.g. XML, JSON, or BibTeX).
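
As a rough illustration of the request/response pattern described above, here is a minimal Python sketch of how a client might ask for one kind of extracted data in a chosen format. The base URL, the resource paths, the use of a single POST with an Accept header, and the BibTeX MIME type are all assumptions made for illustration; they are not taken from the CiteSeerExtractor documentation, so check the actual service for its real endpoints.

```python
import urllib.request

# HYPOTHETICAL: placeholder address for a locally run service,
# not a documented CiteSeerExtractor URL.
BASE_URL = "http://localhost:8080"

# MIME types for the output formats named in the text.
# ASSUMPTION: the service selects the format via the Accept header,
# and "application/x-bibtex" is the type it expects for BibTeX.
ACCEPT_TYPES = {
    "xml": "application/xml",
    "json": "application/json",
    "bibtex": "application/x-bibtex",
}


def build_extraction_request(pdf_bytes, resource, fmt="json", base_url=BASE_URL):
    """Build (but do not send) a POST request that uploads a PDF.

    `resource` is a HYPOTHETICAL path segment such as "citations" or
    "header" naming the kind of data wanted; `fmt` picks the output format.
    """
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return urllib.request.Request(
        url=f"{base_url}/{resource}",
        data=pdf_bytes,
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            "Accept": ACCEPT_TYPES[fmt],
        },
    )
```

Once the placeholders are replaced with the service's real address and paths, the request could be sent with `urllib.request.urlopen(request)` and the response body read as XML, JSON, or BibTeX text.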

CiteSeerExtractor incorporates a number of open-source tools, such as ParsCit, SVMHeaderParse, and PDFBox.

CiteSeerExtractor is open-source software, so feel free to get the code and modify it. If you make any improvements, please send them to us for inclusion.

You are free to try CiteSeerExtractor live on this website, but if you will be making a lot of requests, we ask that you download and run your own service (it just works!).

- Funded in part by the National Science Foundation