A Two-Sided Compliance Checklist: Governing AI Access to Scholarly Content

Eran Goldman-Malka · June 26, 2026

AI Compliance

AI-assisted access to scholarly content creates a two-sided compliance problem. On one side, systems like Claude can use Unpaywall-style open-access discovery to reach legal full-text copies. On the other, creators, publishers, and rights holders must ask whether they are paid, protected, and compliant when AI retrieves and reuses their work. This article brings those threads together in shared vocabulary, paired checklists, a maturity model, and a priority action sequence—without treating OA discovery as permission for unchecked exploitation, or treating every retrieval tool as piracy.

The Through-Line

Access ≠ reuse rights. Unpaywall is an OA discovery service, not a paywall bypass. Compliance lives in licenses, contracts, platform terms, data protection, and governance, not merely in the green tab that the Unpaywall browser extension shows when it finds an OA copy. Discovery, access, and reuse are separate layers; the choice of vendor service tier and vendor disclosure are separate again from copyright permission.

Shared Vocabulary

Term	Meaning
Discovery	Finding a legal OA location (e.g., Unpaywall API for a DOI)
Access	Fetching bytes from that location
Reuse	Summarize, store, embed, train, redistribute
Compliance boundary	Where automation, scale, license, or ToS intervene
Bronze OA	Free to read; license metadata often absent
Provenance	Record of DOI, license, URL, tier, purpose, timestamp

Use this vocabulary in RACI charts, data protection impact assessments (DPIAs), and vendor questionnaires so legal, library, security, and engineering teams are not talking past each other.

Side A Checklist: AI Users, Builders, and Vendors

Discovery

Use Unpaywall API with valid email and within documented rate limits (REST API)
Contact OurResearch for commercial/high-volume data feed needs
Do not route discovery through infringing mirrors or stolen credentials
Treat closed status as no OA location—not a signal to bypass
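
The discovery items above can be sketched in a few lines of Python. This is a minimal illustration, not a reference client: the endpoint shape (`https://api.unpaywall.org/v2/<doi>?email=<address>`) and the response fields `is_oa`, `oa_status`, and `best_oa_location` follow my reading of the public Unpaywall REST API documentation and should be verified against the current version. The helper names `build_lookup_url` and `pick_oa_location` are hypothetical. The sketch performs no network call, so rate limiting is left to the caller.

```python
from typing import Any, Optional
from urllib.parse import quote, urlencode

# Assumed base URL for single-DOI lookups; confirm against the current API docs.
UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_lookup_url(doi: str, email: str) -> str:
    """Build a single-DOI lookup URL that carries a contact email."""
    if "@" not in email:
        raise ValueError("a real contact email is required")
    return UNPAYWALL_BASE + quote(doi.strip(), safe="/") + "?" + urlencode({"email": email})


def pick_oa_location(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the best OA location, or None when the DOI has no OA copy."""
    if not record.get("is_oa"):
        return None  # closed means "no OA location", not "try another route"
    best = record.get("best_oa_location") or {}
    url = best.get("url_for_pdf") or best.get("url")
    if not url:
        return None
    return {
        "doi": record.get("doi"),
        "oa_status": record.get("oa_status"),
        "url": url,
        "license": best.get("license"),      # may be None, e.g. for bronze
        "host_type": best.get("host_type"),
    }
```

Returning `None` for a closed record keeps the "no OA location" outcome explicit, so downstream code has nothing to fetch and no reason to look for a workaround.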

Access

Fetch only from returned OA URLs or separately authorized sources
Respect robots.txt and publisher terms for automated access
Do not inject institutional proxy credentials into agent fetchers
Rate-limit domain fetches; avoid overnight corpus harvesting without review
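
One way to make the access rules enforceable is a small gate that every agent fetch must pass. The sketch below uses the standard library's `urllib.robotparser` and a per-domain minimum interval. The class name `PoliteFetchGate`, the user-agent string, and the interval value are assumptions for illustration. The gate only decides; it does not download anything, and it does not check publisher terms, which still need human review.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteFetchGate:
    """Decides whether a fetch may proceed now; performs no network I/O."""

    def __init__(self, user_agent: str, min_interval_s: float, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_interval_s = min_interval_s
        self.clock = clock  # injectable so the gate can be tested without sleeping
        self._robots: dict[str, RobotFileParser] = {}
        self._last_fetch: dict[str, float] = {}

    def load_robots(self, domain: str, robots_txt: str) -> None:
        """Register robots.txt text that the caller has already retrieved."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[domain] = parser

    def may_fetch(self, url: str) -> bool:
        domain = urlparse(url).netloc
        parser = self._robots.get(domain)
        if parser is None:
            return False  # unknown robots policy: deny until it is loaded
        if not parser.can_fetch(self.user_agent, url):
            return False
        now = self.clock()
        last = self._last_fetch.get(domain)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon for this domain
        self._last_fetch[domain] = now
        return True
```

Denying by default when no robots.txt has been loaded is a deliberately conservative choice; a team could relax it after review.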

Reuse

Capture oa_status, license, and host_type before retain/embed
Default-deny bronze (license: null) for automated commercial reuse
Enforce CC BY-NC and BY-ND rules in pipeline gates
Map actions: read / summarize / store / embed / train / resell
Never infer training rights from is_oa: true
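
A pipeline gate for the reuse rules above might look like the following. It is an illustrative policy, not legal advice and not a definitive mapping: the license strings (`cc-by`, `cc-by-nc`, `cc-by-nd`, `cc0`, `implied-oa`) are assumed to match the values Unpaywall reports, the function name `reuse_decision` is hypothetical, and sending every `train` request to review is my own conservative assumption rather than a rule stated by any license.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    REVIEW = "review"
    DENY = "deny"


# Action vocabulary taken from the checklist.
ACTIONS = {"read", "summarize", "store", "embed", "train", "resell"}


def reuse_decision(license_id: Optional[str], action: str, commercial: bool) -> Decision:
    """Illustrative gate; real rules need legal review per jurisdiction."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "read":
        return Decision.ALLOW  # reading a legal OA copy is access, not reuse
    if license_id is None:
        # Bronze or unlabeled: is_oa alone grants nothing beyond reading.
        return Decision.DENY if commercial else Decision.REVIEW
    lic = license_id.lower()
    parts = lic.split("-")
    if "nc" in parts and commercial:
        return Decision.DENY
    if "nd" in parts:
        return Decision.REVIEW  # whether an output is a derivative is a legal question
    if action == "train":
        return Decision.REVIEW  # assumption: never grant training rights automatically
    if lic.startswith("cc-by") or lic == "cc0":
        return Decision.ALLOW
    return Decision.REVIEW  # unrecognized license string
```

For example, `reuse_decision(None, "embed", commercial=True)` returns `Decision.DENY`, which is the default-deny behavior for bronze content in commercial pipelines.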

Platform and vendor

Mandate Commercial/API tier for enterprise scholarly workflows
Activate zero data retention (ZDR), plus a business associate agreement (BAA) if protected health information (PHI) is involved, where required (legal/compliance)
Complete vendor transparency questionnaire (retention, cache, subprocessors)
Contractually prohibit unlicensed subscription PDF ingestion

Data protection

Classify scholarly PDFs for personal data before upload
Complete DPIA for RAG over external literature
Block consumer-tier shadow AI for work research

Logging and incidents

Log user/service, DOI, license, URL, tier, purpose, timestamp
Maintain playbooks for wrongful ingestion and NC violations
Reconstruct sessions during investigations—not only model outputs
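
The logging fields listed above map naturally onto a small immutable record written as one JSON object per line. The sketch below is one possible shape, assuming Python dataclasses and a JSON Lines log; the names `ProvenanceRecord`, `make_record`, and `to_log_line`, and the example tier label `"api"`, are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    actor: str              # user or service identity
    doi: str
    license: Optional[str]  # None for bronze or unlabeled content
    url: str                # location the bytes were fetched from
    tier: str               # vendor service tier, e.g. "api" (hypothetical label)
    purpose: str            # e.g. "summarize" or "embed"
    timestamp: str          # ISO 8601, UTC


def make_record(actor: str, doi: str, license_id: Optional[str], url: str,
                tier: str, purpose: str,
                now: Optional[datetime] = None) -> ProvenanceRecord:
    """Stamp the record at creation time; `now` is injectable for tests."""
    now = now or datetime.now(timezone.utc)
    return ProvenanceRecord(actor, doi, license_id, url, tier, purpose, now.isoformat())


def to_log_line(record: ProvenanceRecord) -> str:
    """Serialize as one JSON object per line, keys sorted for stable diffs."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping the record frozen and writing it before any retain or embed step makes it easier to reconstruct a session later, rather than relying on model outputs alone.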

Side B Checklist: Creators, Publishers, and Rights Holders

License clarity

Attach explicit license to every OA article; minimize bronze ambiguity
Embed machine-readable license metadata in HTML and PDF
Publish AI/text-mining policy separate from OA reader access
Understand Unpaywall oa_status for your portfolio (definitions)

Payment and author rights

Know the OA model (gold/green/hybrid/bronze) and who paid the article processing charge (APC)
Use SPARC or institutional addenda where negotiable (author rights)
Align license choice (BY vs BY-NC) with funder policy and AI risk tolerance
Do not assume that paying an APC bought protection against license-compliant commercial RAG

Monitoring and enforcement

Monitor bulk access and anomalous fetch patterns
Participate in or offer licensing markets for AI/TDM (Copyright Office AI hub)
Audit attribution in major AI products where feasible
Coordinate metadata accuracy with discovery partners

Policy advocacy

Distinguish OA discovery from piracy in internal training
Engage funders on whether CC BY should remain mandatory for all disciplines
Track jurisdiction-specific TDM and AI rules (EU, UK, U.S.)

Joint Governance Model

Sustainable AI-assisted scholarship requires a cross-functional forum—not siloed tool adoption.

Participants: legal, library/scholarly communication, IT/security, research office, compliance/privacy, engineering.

RACI example for literature retrieval agents:

Activity	Legal	Library	Security	Engineering
Approve OA discovery use	C	R	I	A
License-aware pipeline design	C	C	I	R/A
Tier and ZDR configuration	I	I	R/A	C
Provenance logging	I	C	R	A
Incident response	R/A	C	R	C
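
If the RACI table is kept in version control alongside the checklists, a short consistency check can catch gaps when it is edited. The sketch below encodes the table above as data and applies a common RACI convention, assumed here rather than stated in this article: each activity has exactly one Accountable role and at least one Responsible role. The function name `raci_problems` is hypothetical.

```python
ROLES = ("Legal", "Library", "Security", "Engineering")

# Rows copied from the table above; cell order follows ROLES.
RACI = {
    "Approve OA discovery use": ("C", "R", "I", "A"),
    "License-aware pipeline design": ("C", "C", "I", "R/A"),
    "Tier and ZDR configuration": ("I", "I", "R/A", "C"),
    "Provenance logging": ("I", "C", "R", "A"),
    "Incident response": ("R/A", "C", "R", "C"),
}


def raci_problems(matrix: dict[str, tuple[str, ...]]) -> list[str]:
    """Return human-readable problems; an empty list means the matrix passes."""
    problems = []
    for activity, cells in matrix.items():
        if len(cells) != len(ROLES):
            problems.append(f"{activity}: expected {len(ROLES)} cells")
            continue
        letters = [set(cell.split("/")) for cell in cells]
        accountable = sum("A" in s for s in letters)
        responsible = sum("R" in s for s in letters)
        if accountable != 1:
            problems.append(f"{activity}: expected exactly one A, found {accountable}")
        if responsible < 1:
            problems.append(f"{activity}: no R assigned")
    return problems
```

Run against the table as written, the check reports no problems; it is meant as a guard for future edits, not as a judgment on how any organization should assign roles.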

Cadence: annual review of OA/AI policy; trigger review on vendor term changes (consumer terms, usage policy).

Maturity Model

Tier	Label	Indicators
0	Ad hoc	Paste PDFs into consumer chat; no provenance
1	Aware	Staff trained on OA vs paywall; no automation
2	Instrumented	Provenance logs; commercial tier enforced
3	Governed	License-aware pipelines; DPIA; vendor DD; library alignment
4	Optimized	Continuous audit; publisher/funder policy alignment; incident metrics

Most organizations discovering AI literature tools in 2026 are at maturity Tier 0–1 (a different sense of "tier" from the vendor service tier in the checklists). Moving to Tier 3 before scaling agents is cheaper than retrofitting after a copyright or data-protection incident.

Priority Action Sequence

If you need a pragmatic order of operations:

Inventory all AI scholarly workflows (approved and shadow)
Enforce Commercial/API tier and ZDR where confidential or regulated data may appear
Instrument Unpaywall/license metadata in every retrieval path
Default-deny bronze and null-license automated reuse pending legal review
Publish creator/publisher AI reuse guidance (authors, journals, library)
Deploy vendor transparency questionnaire and contract updates

Real-World Example

A research university starts at Tier 0: faculty paste PDFs into consumer Claude. Over two quarters they move to Tier 3: library-mediated OA resolver documentation, API-tier Claude for approved projects, provenance fields in a pilot RAG for grant teams, and a SPARC addendum campaign for junior faculty. Discovery stays on Unpaywall and licensed databases—not infringing sites. Compliance becomes demonstrable in funder audits. Neither side of the OA/AI debate “wins”; governance connects them.

What Both Sides Can Agree On

OA discovery infrastructure is legitimate scholarly tooling (Unpaywall)
Piracy and credential abuse are out of bounds
Licenses matter after access
Transparency beats assumptions
Education beats conclusory legal slogans

Where they disagree—CC BY vs commercial AI, training markets, bronze policy—belongs in policy and licensing forums, not in covert retrieval workarounds.

Conclusion

Both sides can be “right” at their layer and still produce organizational failure if nobody connects discovery to payment to protection. Unpaywall answers “where is a legal OA copy?” It does not answer “may we embed, sell, or train?” Claude’s terms answer “how may we use this API?” They do not answer “may we use this paper?” Creators answer “what did we license?” That alone does not control deployer behavior.

Governance is the connective tissue. Use the checklists above as living documents—versioned, owned, and reviewed when terms, licenses, or features change.

Ready to move from ad hoc AI literature use to governed retrieval? Contact me for a two-sided compliance assessment—discovery architecture, provenance design, tier and DPIA review, and a prioritized roadmap for your organization.

Relevant Sources

FAQ — Unpaywall — OurResearch — https://unpaywall.org/faq
REST API — Unpaywall — OurResearch — https://unpaywall.org/products/api
Legal and compliance — Claude Code — Anthropic — https://docs.anthropic.com/en/docs/claude-code/legal-and-compliance
Open Access — SPARC — https://sparcopen.org/open-access/
Copyright and AI hub — U.S. Copyright Office — https://copyright.gov/ai/
Regulation (EU) 2024/1689 — EUR-Lex — https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689
Anthropic Transparency Hub — Anthropic — https://www.anthropic.com/transparency/system-trust-reporting
CC BY 4.0 — Creative Commons — https://creativecommons.org/licenses/by/4.0/
