AI-assisted access to scholarly content creates a two-sided compliance problem. On one side, systems like Claude can use Unpaywall-style open-access discovery to reach legal full-text copies. On the other, creators, publishers, and rights holders must ask whether they are paid, protected, and compliant when AI retrieves and reuses their work. This article brings those threads together in shared vocabulary, paired checklists, a maturity model, and a priority action sequence—without treating OA discovery as permission for unchecked exploitation, or treating every retrieval tool as piracy.
The Through-Line
Access ≠ reuse rights. Unpaywall is an OA discovery service, not a paywall bypass. Compliance lives in licenses, contracts, platform terms, data protection, and governance, not in the green tab alone (the indicator the Unpaywall browser extension shows when it finds an OA copy). Discovery, access, and reuse are separate layers; tier selection and vendor disclosure are separate again from copyright permission.
Shared Vocabulary
| Term | Meaning |
|---|---|
| Discovery | Finding a legal OA location (e.g., Unpaywall API for a DOI) |
| Access | Fetching bytes from that location |
| Reuse | Summarize, store, embed, train, redistribute |
| Compliance boundary | Where automation, scale, license, or ToS intervene |
| Bronze OA | Free to read; license metadata often absent |
| Provenance | Record of DOI, license, URL, tier, purpose, timestamp |
Use this vocabulary in RACI charts, DPIAs, and vendor questionnaires so legal, library, security, and engineering teams are not talking past each other.
Side A Checklist: AI Users, Builders, and Vendors
Discovery
- Use Unpaywall API with valid email and within documented rate limits (REST API)
- Contact OurResearch for commercial/high-volume data feed needs
- Do not route discovery through infringing mirrors or stolen credentials
- Treat `closed` status as no OA location, not a signal to bypass
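The discovery rules above can be sketched roughly as follows. This assumes Unpaywall's documented `GET https://api.unpaywall.org/v2/{doi}?email=...` lookup and the response fields `is_oa`, `oa_status`, and `best_oa_location`; the helper names `build_lookup_url` and `legal_oa_url` are hypothetical, and the sketch makes no network call.

```python
from typing import Optional
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_lookup_url(doi: str, email: str) -> str:
    """Build a lookup URL for one DOI; Unpaywall expects a contact email."""
    if "@" not in email:
        raise ValueError("a valid contact email is required")
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?{urlencode({'email': email})}"


def legal_oa_url(record: dict) -> Optional[str]:
    """Return the best OA URL from a parsed response, or None if closed."""
    if not record.get("is_oa") or record.get("oa_status") == "closed":
        # Closed means "no OA location". The agent stops here instead of
        # falling back to mirrors or borrowed credentials.
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")
```

Returning `None` for closed records keeps the "no bypass" rule in one place, so callers cannot quietly add a fallback source.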
Access
- Fetch only from returned OA URLs or separately authorized sources
- Respect robots.txt and publisher terms for automated access
- Do not inject institutional proxy credentials into agent fetchers
- Rate-limit domain fetches; avoid overnight corpus harvesting without review
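One way to express the access rules above in an agent fetcher is a small gate that checks robots.txt and enforces a per-domain delay before any request goes out. This is a minimal sketch using Python's standard `urllib.robotparser`; the class name `PoliteFetchGate`, the five-second interval, and the default-deny behaviour for domains without loaded rules are assumptions, not values from any publisher's terms.

```python
import time
from typing import Dict, Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteFetchGate:
    """Decide whether an automated fetch of a URL is allowed right now."""

    def __init__(self, user_agent: str, min_interval_s: float = 5.0):
        self.user_agent = user_agent
        self.min_interval_s = min_interval_s  # assumed value, tune per domain
        self._robots: Dict[str, RobotFileParser] = {}
        self._last_fetch: Dict[str, float] = {}

    def load_robots(self, domain: str, robots_txt: str) -> None:
        """Register the robots.txt body already retrieved for a domain."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[domain] = parser

    def may_fetch(self, url: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        domain = urlparse(url).netloc
        parser = self._robots.get(domain)
        if parser is None or not parser.can_fetch(self.user_agent, url):
            return False  # no rules loaded, or path disallowed: deny
        last = self._last_fetch.get(domain)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon after the previous fetch to this domain
        self._last_fetch[domain] = now
        return True
```

A robots.txt check does not cover publisher terms of use, so a gate like this complements contract review rather than replacing it.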
Reuse
- Capture `oa_status`, `license`, and `host_type` before retain/embed
- Default-deny bronze (`license: null`) for automated commercial reuse
- Enforce CC BY-NC and BY-ND rules in pipeline gates
- Map actions: read / summarize / store / embed / train / resell
- Never infer training rights from `is_oa: true`
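A pipeline gate for the reuse rules above might look like the sketch below. The function name `may_reuse`, the lowercase hyphenated license identifiers (such as `cc-by-nc`), and the choice to treat summarize and embed as derivative uses under ND licenses are all assumptions made for illustration; the actual mapping of actions to license terms is a decision for legal review.

```python
from typing import Optional

ACTIONS = {"read", "summarize", "store", "embed", "train", "resell"}
# Assumed conservative reading: these actions count as derivative uses.
DERIVATIVE = {"summarize", "embed"}
# Never granted automatically, whatever the license or is_oa value says.
NEEDS_EXPLICIT_REVIEW = {"train", "resell"}


def may_reuse(action: str, license_id: Optional[str], commercial: bool) -> bool:
    """Return True only when the action can pass without human review."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "read":
        return True  # access was already resolved at the discovery layer
    if action in NEEDS_EXPLICIT_REVIEW:
        return False
    if license_id is None:
        return False  # bronze or missing license metadata: default-deny
    lic = license_id.lower()
    if not (lic.startswith("cc-by") or lic == "cc0"):
        return False  # unrecognized license string: default-deny
    parts = set(lic.split("-"))
    if "nc" in parts and commercial:
        return False  # NonCommercial term blocks commercial pipelines
    if "nd" in parts and action in DERIVATIVE:
        return False  # NoDerivatives term blocks the assumed derivative uses
    return True
```

For example, `may_reuse("embed", "cc-by-nc", commercial=True)` is denied, while the same call with `commercial=False` passes.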
Platform and vendor
- Mandate Commercial/API tier for enterprise scholarly workflows
- Activate zero data retention (ZDR), and a business associate agreement (BAA) if protected health information (PHI) is involved, where required (legal/compliance)
- Complete vendor transparency questionnaire (retention, cache, subprocessors)
- Contractually prohibit unlicensed subscription PDF ingestion
Data protection
- Classify scholarly PDFs for personal data before upload
- Complete DPIA for RAG over external literature
- Block consumer-tier shadow AI for work research
Logging and incidents
- Log user/service, DOI, license, URL, tier, purpose, timestamp
- Maintain playbooks for wrongful ingestion and NC violations
- Reconstruct sessions during investigations—not only model outputs
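The logging fields listed above map naturally onto one structured record per retrieval. The sketch below is one possible shape, written as a JSON line so that a session can be rebuilt later by filtering on actor and time. The names `ProvenanceRecord`, `make_record`, and `to_jsonl` are hypothetical, and `tier` here means the vendor service tier, not the maturity tier.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    actor: str               # user or service account that triggered the fetch
    doi: str
    license: Optional[str]   # None when license metadata is absent
    url: str                 # the OA location actually fetched
    tier: str                # vendor tier used, e.g. "api" (assumed label)
    purpose: str
    timestamp: str           # ISO 8601, UTC


def make_record(actor: str, doi: str, license: Optional[str], url: str,
                tier: str, purpose: str,
                now: Optional[datetime] = None) -> ProvenanceRecord:
    """Create a record, stamping the current UTC time unless one is given."""
    now = now or datetime.now(timezone.utc)
    return ProvenanceRecord(actor, doi, license, url, tier, purpose,
                            now.isoformat())


def to_jsonl(record: ProvenanceRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping `license` nullable, rather than substituting a default, preserves the evidence that a document had no license metadata at the time it was fetched.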
Side B Checklist: Creators, Publishers, and Rights Holders
License clarity
- Attach explicit license to every OA article; minimize bronze ambiguity
- Embed machine-readable license metadata in HTML and PDF
- Publish AI/text-mining policy separate from OA reader access
- Understand Unpaywall `oa_status` for your portfolio (definitions)
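On the publisher side, a quick audit can confirm that article pages actually expose a machine-readable license. The sketch below assumes the license is declared with the standard HTML `rel="license"` link type on a `<link>` or `<a>` element; the class and function names are hypothetical, and PDF metadata is out of scope here.

```python
from html.parser import HTMLParser
from typing import List


class LicenseLinkParser(HTMLParser):
    """Collect href values of elements declared with rel="license"."""

    def __init__(self) -> None:
        super().__init__()
        self.license_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "license noopener"
        rels = (attr.get("rel") or "").lower().split()
        if "license" in rels and attr.get("href"):
            self.license_urls.append(attr["href"])


def find_license_urls(html: str) -> List[str]:
    """Return every license URL advertised in the given HTML document."""
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.license_urls
```

An empty result for an OA article page suggests that downstream tools will see the article as having no license, which is the bronze ambiguity the checklist aims to minimize.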
Payment and author rights
- Know OA model (gold/green/hybrid/bronze) and who paid APC
- Use SPARC or institutional addenda where negotiable (author rights)
- Align license choice (BY vs BY-NC) with funder policy and AI risk tolerance
- Do not assume APC purchased protection against license-compliant commercial RAG
Monitoring and enforcement
- Monitor bulk access and anomalous fetch patterns
- Participate in or offer licensing markets for AI/TDM (Copyright Office AI hub)
- Audit attribution in major AI products where feasible
- Coordinate metadata accuracy with discovery partners
Policy advocacy
- Distinguish OA discovery from piracy in internal training
- Engage funders on whether CC BY should remain mandatory for all disciplines
- Track jurisdiction-specific TDM and AI rules (EU, UK, U.S.)
Joint Governance Model
Sustainable AI-assisted scholarship requires a cross-functional forum—not siloed tool adoption.
Participants: legal, library/scholarly communication, IT/security, research office, compliance/privacy, engineering.
RACI example for literature retrieval agents:
| Activity | Legal | Library | Security | Engineering |
|---|---|---|---|---|
| Approve OA discovery use | C | R | I | A |
| License-aware pipeline design | C | C | I | R/A |
| Tier and ZDR configuration | I | I | R/A | C |
| Provenance logging | I | C | R | A |
| Incident response | R/A | C | R | C |
Cadence: annual review of OA/AI policy; trigger review on vendor term changes (consumer terms, usage policy).
Maturity Model
| Tier | Label | Indicators |
|---|---|---|
| 0 | Ad hoc | Paste PDFs into consumer chat; no provenance |
| 1 | Aware | Staff trained on OA vs paywall; no automation |
| 2 | Instrumented | Provenance logs; commercial tier enforced |
| 3 | Governed | License-aware pipelines; DPIA; vendor due diligence; library alignment |
| 4 | Optimized | Continuous audit; publisher/funder policy alignment; incident metrics |
Most organizations discovering AI literature tools in 2026 are Tier 0–1. Moving to Tier 3 before scaling agents is cheaper than retrofitting after a copyright or data-protection incident.
Priority Action Sequence
If you need a pragmatic order of operations:
- Inventory all AI scholarly workflows (approved and shadow)
- Enforce Commercial/API tier and ZDR where confidential or regulated data may appear
- Instrument Unpaywall/license metadata in every retrieval path
- Default-deny bronze and null-license automated reuse pending legal review
- Publish creator/publisher AI reuse guidance (authors, journals, library)
- Deploy vendor transparency questionnaire and contract updates
Real-World Example
A research university starts at Tier 0: faculty paste PDFs into consumer Claude. Over two quarters they move to Tier 3: library-mediated OA resolver documentation, API-tier Claude for approved projects, provenance fields in a pilot RAG for grant teams, and a SPARC addendum campaign for junior faculty. Discovery stays on Unpaywall and licensed databases—not infringing sites. Compliance becomes demonstrable in funder audits. Neither side of the OA/AI debate “wins”; governance connects them.
What Both Sides Can Agree On
- OA discovery infrastructure is legitimate scholarly tooling (Unpaywall)
- Piracy and credential abuse are out of bounds
- Licenses matter after access
- Transparency beats assumptions
- Education beats conclusory legal slogans
Where they disagree—CC BY vs commercial AI, training markets, bronze policy—belongs in policy and licensing forums, not in covert retrieval workarounds.
Conclusion
Both sides can be “right” at their layer and still produce organizational failure if nobody connects discovery to payment to protection. Unpaywall answers “where is a legal OA copy?” It does not answer “may we embed, sell, or train?” Claude’s terms answer “how may we use this API?” They do not answer “may we use this paper?” Creators answer “what did we license?” That answer alone does not control deployer behavior.
Governance is the connective tissue. Use the checklists above as living documents—versioned, owned, and reviewed when terms, licenses, or features change.
Ready to move from ad hoc AI literature use to governed retrieval? Contact me for a two-sided compliance assessment—discovery architecture, provenance design, tier and DPIA review, and a prioritized roadmap for your organization.
Relevant Sources
- FAQ Unpaywall (OurResearch): https://unpaywall.org/faq
- REST API Unpaywall (OurResearch): https://unpaywall.org/products/api
- Legal and compliance, Claude Code (Anthropic): https://docs.anthropic.com/en/docs/claude-code/legal-and-compliance
- Open Access (SPARC): https://sparcopen.org/open-access/
- Copyright and AI hub (U.S. Copyright Office): https://copyright.gov/ai/
- Regulation (EU) 2024/1689 (EUR-Lex): https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689
- Anthropic Transparency Hub (Anthropic): https://www.anthropic.com/transparency/system-trust-reporting
- CC BY 4.0 (Creative Commons): https://creativecommons.org/licenses/by/4.0/
