https://github.com/govscienceuseR

Tools for automated extraction and disambiguation of scientific resources cited in government documents.

govscienceuseR workflow

  1. referenceExtract: Process PDFs and tag the citations/references found in them
  2. referenceClassify: Clean and classify citations by category (e.g., academic journal, agency document)
  3. indexBuild: Create a database of academic work to search against for disambiguation of extracted citations
  4. referenceSearch: Search extracted citations against the indexed database of canonical citations to match and disambiguate them
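
To make the matching idea in step 4 concrete, here is a minimal sketch in Python of comparing an extracted citation title against a small index of canonical titles. It is an illustration only and not the referenceSearch implementation or API: the function names (`normalize`, `best_match`), the 0.85 similarity threshold, and the sample titles are all assumptions made for this example, and the similarity measure is the standard library's `difflib.SequenceMatcher` rather than whatever the package uses.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and replace punctuation with spaces.

    Hypothetical helper: extracted citations often differ from canonical
    records only in case, punctuation, or spacing.
    """
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return " ".join(kept.split())


def best_match(extracted: str, index: list[str], threshold: float = 0.85):
    """Return (canonical_title, score) for the closest index entry.

    Returns None when the index is empty or no entry reaches the
    (assumed) similarity threshold.
    """
    query = normalize(extracted)
    scored = [
        (SequenceMatcher(None, query, normalize(candidate)).ratio(), candidate)
        for candidate in index
    ]
    if not scored:
        return None
    score, title = max(scored)
    return (title, score) if score >= threshold else None


# Made-up canonical titles standing in for the database built in step 3.
canonical_index = [
    "Groundwater depletion in California's Central Valley",
    "Estimating aquifer recharge from streamflow records",
]

# A noisy extracted title should resolve to the first canonical entry.
match = best_match("groundwater depletion in Californias central valley.", canonical_index)

# An unrelated string should fall below the threshold and return None.
no_match = best_match("Annual budget report 2019", canonical_index)
```

A real index would hold far more records and more fields (authors, year, journal, DOI), so an exhaustive pairwise comparison like this one is only practical for small examples.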

Check out a worked example applied to the science informing California’s Groundwater Sustainability Plans.