https://github.com/govscienceuseR

Tools for automated extraction and disambiguation of scientific resources cited in government documents:
- referenceExtract: Process PDFs and tag the citations/references found in them
- referenceClassify: Clean and classify citations by category (e.g., academic journal, agency document)
- indexBuild: Create a database of academic works to search against when disambiguating extracted citations
- referenceSearch: Search extracted citations against the indexed database of canonical citations to match and disambiguate them
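
To make the final matching step concrete, here is a minimal Python sketch of searching one extracted citation against a small index of canonical records. It is an illustration only, not the govscienceuseR API: the function names (`normalize`, `best_match`), the record fields, the similarity threshold, and the sample titles are all hypothetical, and plain string similarity from the standard library stands in for whatever matching the tools actually perform.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so formatting differences do not affect matching."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def best_match(extracted, index, threshold=0.85):
    """Return (record, score) for the most similar title in the index.

    The record is None when the best score falls below the (assumed) threshold.
    """
    query = normalize(extracted)
    best, best_score = None, 0.0
    for record in index:
        score = SequenceMatcher(None, query, normalize(record["title"])).ratio()
        if score > best_score:
            best, best_score = record, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Made-up canonical records, for illustration only.
index = [
    {"id": "W1", "title": "Groundwater depletion in the Central Valley"},
    {"id": "W2", "title": "Land subsidence and aquifer compaction"},
]

# A citation as it might come out of a PDF, with different casing and punctuation.
match, score = best_match("Groundwater Depletion in the Central Valley.", index)
no_match, low_score = best_match("Completely unrelated budget memo", index)
```

Normalizing both sides before comparing means that casing and trailing punctuation left over from PDF extraction do not block a match, while the threshold keeps unrelated text from being forced onto the nearest record.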
Check out a worked example applied to the science informing California’s Groundwater Sustainability Plans.