Are Users Compliant? Enterprise AI, Scholarly Retrieval, and the Obligations You Cannot Outsource

Eran Goldman-Malka · June 12, 2026

Frictionless AI retrieval creates a dangerous illusion: if Unpaywall found a legal open-access copy and Claude summarized it, the workflow must be compliant. It may not be. Organizations using Claude or similar tools for scholarly workflows remain responsible for lawful access, license compliance, data protection, and platform terms—even when OA discovery tools and AI vendors make retrieval feel automatic.

Fact: Anthropic’s Commercial Terms require customers to comply with applicable laws and the Usage Policy (legal and compliance). Customers—not the model—bear responsibility for how outputs are used, including human review where appropriate in high-risk domains (usage policy update).

Interpretation: Vendor compliance is bilateral. Anthropic governs your use of Claude; you govern your use of third-party scholarly content inside Claude. Passing a PDF to an API does not transfer copyright clearance any more than photocopying transfers ownership.

The Enterprise Compliance Stack

Deployers of AI scholarly workflows should maintain controls across six domains:

  • Platform tier: Is the workflow on a Commercial/API tier or a consumer tier?
  • Copyright and license: What is the OA status and license, and does the intended reuse fit within it?
  • Contractual: Do library licenses, publisher terms of service, and Unpaywall API terms permit the use?
  • Data protection: Does the content involve GDPR, HIPAA, or confidential research data?
  • Provenance: Can you reconstruct what was retrieved and why?
  • Governance: Who approves high-risk literature use cases?

Missing controls in any one domain can produce audit findings, even when every retrieved article was “open access.”
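
One way to make the six domains auditable is to keep a register that names an owner for each and reports the gaps. The sketch below is only an illustration. The domain keys, the `ControlRegister` class, and the idea of "owner assigned = control exists" are all hypothetical simplifications, and a real register would also record the control, the evidence, and the review date.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the six compliance-stack domains.
DOMAINS = (
    "platform_tier",
    "copyright_license",
    "contractual",
    "data_protection",
    "provenance",
    "governance",
)


@dataclass
class ControlRegister:
    """Hypothetical register mapping each domain to a named control owner."""
    owners: dict[str, str] = field(default_factory=dict)

    def assign(self, domain: str, owner: str) -> None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        self.owners[domain] = owner

    def gaps(self) -> list[str]:
        # Domains with no assigned owner are candidate audit findings.
        return [d for d in DOMAINS if not self.owners.get(d)]


register = ControlRegister()
register.assign("platform_tier", "IT security")
register.assign("copyright_license", "Library")
print(register.gaps())  # ['contractual', 'data_protection', 'provenance', 'governance']
```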

Commercial Versus Consumer Tier Traps

Fact: Anthropic’s consumer terms (Free, Pro, Max) differ from Commercial Terms on data retention and whether inputs may be used to improve models, subject to user settings (consumer terms update). Commercial API and Enterprise paths offer stronger defaults against training on customer content.

Fact: Zero Data Retention (ZDR) arrangements for the API limit how long customer data is stored at rest (ZDR documentation). ZDR has to be arranged for an organization; it does not apply automatically on any tier.

Risk: When employees paste paywalled or confidential PDFs into consumer-tier Claude for “quick summaries,” they bypass the enterprise DPA, retention controls, and license restrictions all at once. Shadow AI is a scholarly-compliance problem, not only a security problem.

Control: Mandate Commercial or API access for work-related scholarly workflows; block or monitor consumer endpoints where feasible; train researchers that OA discovery does not equal upload permission.
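
Below is a minimal sketch of the "block or monitor" control. It assumes requests pass through a gateway that already knows the caller's account tier (for example, from SSO or workspace metadata) and whether the request is work-related. The `Tier` values, `decide` function, and `mode` switch are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    """Account tier as reported by a hypothetical gateway (SSO/workspace metadata)."""
    CONSUMER = "consumer"              # Free, Pro, Max
    COMMERCIAL_API = "commercial_api"  # API under Commercial Terms
    ENTERPRISE = "enterprise"          # organization-managed commercial plan


APPROVED_FOR_WORK = {Tier.COMMERCIAL_API, Tier.ENTERPRISE}


def decide(tier: Tier, work_related: bool, mode: str = "block") -> str:
    """Return "allow", "flag" or "block" for one request under a hypothetical policy."""
    if not work_related or tier in APPROVED_FOR_WORK:
        return "allow"
    # Work-related use of a consumer tier: block where feasible, otherwise flag it.
    return "block" if mode == "block" else "flag"


print(decide(Tier.CONSUMER, work_related=True))                  # block
print(decide(Tier.CONSUMER, work_related=True, mode="monitor"))  # flag
```

The sketch keys the decision on account tier rather than hostname on purpose. Consumer plans and organization-managed plans can share the same web front end, so a hostname alone may not tell them apart.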

Confidentiality, GDPR, and Scholarly PDFs

Scholarly articles are not automatically free of personal data. Clinical trials, case reports, genetics studies, and social-science fieldwork may contain personal data subject to GDPR (jurisdiction-specific) or, in the United States, protected health information under HIPAA.

Fact: Anthropic offers HIPAA-ready API access with a Business Associate Agreement when ZDR is activated for the organization (legal and compliance — BAA section).

Risk analysis: Uploading an oncology PDF that contains patient-level data to a non-HIPAA, non-ZDR endpoint may constitute unauthorized processing, even if the article is OA. OA status addresses copyright access, not data-protection classification.

Deployers should:

  • Classify uploads before they reach the model
  • Complete DPIAs where systematic literature RAG processes external PDFs
  • Document lawful basis and transfer mechanisms for EU operations
  • Segregate identifiable human-subject data from general literature pipelines
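
The first and last bullets can be sketched as a pre-upload router. The indicator patterns and the "general"/"restricted" routes below are hypothetical, and a handful of regexes is a crude stand-in for a vetted DLP or classification service. The sketch shows where classification sits, before any text reaches the model, and does not show how to do it well.

```python
import re
from dataclasses import dataclass

# Hypothetical indicators of identifiable human-subject data.
PERSONAL_DATA_HINTS = {
    "case_report": re.compile(r"\bcase report\b", re.IGNORECASE),
    "date_of_birth": re.compile(r"\b(date of birth|DOB)\b", re.IGNORECASE),
    "record_number": re.compile(r"\b(MRN|medical record number)\b", re.IGNORECASE),
    "participant_id": re.compile(r"\b(patient|participant|subject)\s+(ID|#)\s*\w+", re.IGNORECASE),
}


@dataclass
class Classification:
    route: str          # "general" pipeline or segregated "restricted" pipeline
    reasons: list[str]  # which indicators matched


def classify_upload(text: str) -> Classification:
    hits = [name for name, pattern in PERSONAL_DATA_HINTS.items() if pattern.search(text)]
    # Any hit diverts the document away from the general literature pipeline.
    return Classification("restricted" if hits else "general", hits)


print(classify_upload("Case report: Patient ID 0042 presented with..."))
```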

Library Licenses Versus Open Access

Fact: Institutions often subscribe to journals under licenses that restrict systematic download, text mining without addendum, or sharing with third-party processors—including AI vendors.

Interpretation: Unpaywall finding an OA copy does not override a separate subscription agreement for the toll-access version an employee uploaded from a desktop. Conversely, absence of OA does not authorize uploading a subscription PDF to Claude if the license prohibits it.

Library and research offices should publish clear guidance: which retrieval paths are approved (OA resolver, interlibrary loan, licensed platform APIs) and which are not (paste into consumer chat).
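
Such guidance can be encoded as an upload decision that looks at the copy actually being sent, not at whether some OA version of the same DOI exists. This is a sketch under stated assumptions. The path names, the `ai_use_permitted` input (which someone must determine from the governing license), and the "refetch_oa" outcome are hypothetical.

```python
from typing import Optional

# Hypothetical identifiers for the approved retrieval paths.
APPROVED_PATHS = {"oa_resolver", "interlibrary_loan", "licensed_platform_api"}


def upload_decision(path: str, ai_use_permitted: Optional[bool],
                    oa_copy_url: Optional[str] = None) -> str:
    """Return "allow", "review", "refetch_oa" or "block" (hypothetical policy).

    ai_use_permitted is True/False when the governing license is known, or None
    when it is unclear (for example, a bronze copy with no stated license).
    """
    if path not in APPROVED_PATHS:
        # A file of unverified origin (e.g. from someone's desktop): if an OA copy
        # exists, fetch and assess that copy instead of trusting the uploaded file.
        return "refetch_oa" if oa_copy_url else "block"
    if ai_use_permitted is None:
        return "review"
    return "allow" if ai_use_permitted else "block"
```

Note that a desktop upload is never allowed directly in this sketch, even when an OA copy exists. That mirrors the point above that finding an OA copy does not clear the toll-access file someone already holds.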

Provenance and Audit Trails

Regulators and enterprise auditors increasingly ask not “What did the model say?” but “What did it read?”

Minimum provenance log per retrieval event:

  • User or service identity
  • DOI and Unpaywall oa_status
  • License string and host_type
  • Source URL fetched
  • Timestamp and workflow purpose
  • Model tier and retention flag (ZDR yes/no)
  • Output disposition (ephemeral vs stored in RAG)
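
The minimum log above maps naturally onto one record per retrieval event. In the sketch below, `oa_status`, `license` and `host_type` reuse the field names from Unpaywall's API responses. The `RetrievalEvent` class, the JSON Lines format, and the tier and disposition strings are assumptions made for illustration. The DOI and URL in the usage lines are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class RetrievalEvent:
    principal: str            # user or service identity
    doi: str
    oa_status: Optional[str]  # Unpaywall oa_status, e.g. "gold", "green", "bronze"
    license: Optional[str]    # license string of the OA location used, if any
    host_type: Optional[str]  # "publisher" or "repository"
    source_url: str           # URL actually fetched
    purpose: str              # workflow purpose
    model_tier: str           # e.g. "commercial_api", "enterprise", "consumer"
    zdr: bool                 # retention flag: ZDR in effect for this call
    disposition: str          # "ephemeral" or "stored_in_rag"
    timestamp: str = field(default_factory=_utc_now)

    def to_json_line(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def append_event(log_path: str, event: RetrievalEvent) -> None:
    # Append-only JSON Lines; a production log would need tamper-evident storage.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(event.to_json_line() + "\n")


event = RetrievalEvent(
    principal="svc-literature-agent", doi="10.1234/example", oa_status="gold",
    license="cc-by", host_type="publisher", source_url="https://example.org/article.pdf",
    purpose="systematic review", model_tier="commercial_api", zdr=True,
    disposition="ephemeral",
)
print(event.to_json_line())
```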

For systems that may qualify as high-risk under the EU AI Act, Article 13 of Regulation (EU) 2024/1689 requires providers to make them transparent enough for deployers to interpret and appropriately use their outputs, and to supply instructions for use (EUR-Lex, AI Act Service Desk Art. 13). Whether this applies depends on how the system is classified, which is another jurisdiction-specific analysis requiring counsel.

Anthropic’s Transparency Hub documents platform-side enforcement and legal process handling; it does not replace deployer-side logging of scholarly sources.

Incident Scenarios

NC-licensed corpus in a commercial product. An agent ingests CC BY-NC articles discovered via Unpaywall into a revenue-generating RAG application. Likely a license violation, regardless of lawful access.
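
A license gate at ingestion time would catch this scenario. The sketch assumes license strings in the lowercase, hyphenated style Unpaywall reports (such as "cc-by" or "cc-by-nc-nd"). Check the exact values against your resolver's output. Sending ND/SA variants, missing licenses, and publisher-specific terms to human review is a conservative choice made for this sketch, not a legal conclusion.

```python
from typing import Optional

# Assumed license strings that permit commercial reuse without further conditions
# beyond attribution; everything else is excluded or reviewed.
COMMERCIAL_OK = {"cc0", "pd", "public-domain", "cc-by"}


def commercial_ingest_decision(license: Optional[str]) -> str:
    """Return "ingest", "exclude" or "review" for a revenue-generating RAG index."""
    lic = (license or "").strip().lower()
    if "nc" in lic.split("-"):
        return "exclude"  # NonCommercial terms: keep out of commercial products
    if lic in COMMERCIAL_OK:
        return "ingest"
    return "review"       # missing, bronze/implied, publisher-specific, ND/SA variants
```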

Confidential proposal plus third-party PDFs. A researcher uploads a draft grant containing unpublished results and attaches licensed papers. Data leak plus potential copyright breach.

RAG index retains paywalled content. Scraper misclassifies a toll-access PDF as OA; embeddings persist after takedown. Provenance failure turns a fetch error into a sustained compliance debt.
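
Takedowns are only enforceable if every stored chunk can be traced back to its source DOI. Below is a toy, in-memory illustration of that idea. The `ProvenanceAwareIndex` class is hypothetical and stands in for whatever vector store is actually used. It purges all chunks for a DOI and blocks that DOI from being re-ingested.

```python
from collections import defaultdict


class ProvenanceAwareIndex:
    """Toy vector store with a DOI -> chunk-id map and a takedown blocklist."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}
        self.chunks_by_doi: dict[str, set[str]] = defaultdict(set)
        self.blocked: set[str] = set()

    def add(self, chunk_id: str, doi: str, vector: list[float]) -> None:
        if doi in self.blocked:
            raise PermissionError(f"{doi} is under takedown")
        self.vectors[chunk_id] = vector
        self.chunks_by_doi[doi].add(chunk_id)

    def purge_doi(self, doi: str) -> int:
        # Remove every embedding derived from this DOI and refuse re-ingestion.
        removed = self.chunks_by_doi.pop(doi, set())
        for chunk_id in removed:
            self.vectors.pop(chunk_id, None)
        self.blocked.add(doi)
        return len(removed)
```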

Consumer-tier clinical summarization. Hospital team uses Pro-tier Claude on OA cancer literature containing trial participant details. Potential HIPAA/GDPR exposure independent of OA status.

Real-World Example

A hospital research team uses Claude to summarize recent OA cancer literature. Trial reports include demographic tables with small-cell counts, which raise re-identification risk. Staff paste the papers into consumer-tier Claude because “the papers are open access.” Copyright access may be fine; PHI handling is not. Tier selection failed. Provenance is absent: during an investigation, the compliance team cannot reconstruct which articles entered which sessions.
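
With provenance logging in place, the investigation question becomes a simple query. The sketch assumes JSON Lines records carrying the provenance fields listed earlier plus a hypothetical `session_id` field. It groups DOIs by session in time order so reviewers can see what each session read and on which tier.

```python
import json
from collections import defaultdict


def articles_by_session(jsonl_lines):
    """Group (timestamp, doi, model_tier) tuples by session_id (hypothetical schema)."""
    sessions = defaultdict(list)
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        rec = json.loads(line)
        sessions[rec["session_id"]].append((rec["timestamp"], rec["doi"], rec["model_tier"]))
    # ISO 8601 timestamps in a single format sort chronologically as strings.
    return {sid: sorted(events) for sid, events in sessions.items()}
```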

Practical Checklist: Users and Deployers

  • Mandate Commercial/API tier for work-related scholarly workflows
  • Activate ZDR (and BAA if PHI) where required
  • Block or scan uploads of subscription-licensed PDFs without explicit rights
  • Maintain retrieval provenance logs with DOI, license, URL, tier
  • Train staff: OA discovery ≠ upload permission
  • Complete DPIA for RAG over external scholarly content
  • Maintain incident playbooks for wrongful ingestion or shadow AI use
  • Align literature agents with library licensing guidance
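
Several checklist items can be checked mechanically against provenance records. The sketch assumes each event is a dict that includes hypothetical `contains_phi`, `baa` and `commercial_product` flags alongside the logged fields, and that every event is work-related. It returns findings; it does not replace human review.

```python
def audit(event: dict) -> list[str]:
    """Return checklist findings for one retrieval event (hypothetical schema)."""
    findings = []
    if event.get("model_tier") == "consumer":
        findings.append("consumer tier used for work")
    if event.get("contains_phi") and not (event.get("zdr") and event.get("baa")):
        findings.append("PHI processed without ZDR and BAA")
    license_parts = (event.get("license") or "").lower().split("-")
    if event.get("commercial_product") and "nc" in license_parts:
        findings.append("NC-licensed content in commercial product")
    for key in ("doi", "source_url", "model_tier"):
        if not event.get(key):
            findings.append(f"provenance field missing: {key}")
    return findings
```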

Risks and Counterarguments

“Anthropic’s terms protect us.” They protect the customer–vendor relationship, not customer–publisher relationships.

“We only use OA papers.” OA does not eliminate NC restrictions, bronze ambiguity, PHI in articles, or API tier requirements.

“Provenance is engineering overhead.” It is cheaper than post-incident reconstruction under regulatory inquiry.

Conclusion

User compliance is operational, not theoretical. The tools make retrieval easy; governance makes it defensible. Publishers and rights holders face a mirror-image problem: legal OA increases readership while AI-scale aggregation erodes control they once exercised through access restrictions alone.


Enterprise AI governance for research workflows—tiering, logging, DPIA support. Contact me.


Relevant Sources

  1. Legal and compliance — Claude Code — Anthropic — https://docs.anthropic.com/en/docs/claude-code/legal-and-compliance
  2. Zero Data Retention — Anthropic — https://docs.anthropic.com/en/docs/build-with-claude/zero-data-retention
  3. Updates to Consumer Terms — Anthropic — https://www.anthropic.com/news/updates-to-our-consumer-terms
  4. Usage Policy Update — Anthropic — https://www.anthropic.com/news/usage-policy-update
  5. Transparency Hub — Anthropic — https://www.anthropic.com/transparency/system-trust-reporting
  6. Regulation (EU) 2024/1689 (AI Act) — EUR-Lex — https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689
  7. Article 13 — AI Act — EU AI Act Service Desk — https://ai-act-service-desk.ec.europa.eu/en/ai-act/article-13
  8. Negotiating Your Contract — Georgetown University Library — https://library.georgetown.edu/scholarly-communication/authors-rights-negotiate-your-contract
