Are Publishers Protected? Rights, Technical Controls, and the Limits of Open Access

Eran Goldman-Malka · June 17, 2026

Publishers and rights holders have real tools (copyright, license design, technical controls, and enforcement policies), but legal open access combined with AI-scale retrieval exposes gaps: content is free to read, yet reuse, aggregation, and model ingestion are difficult to monitor or monetize.

What Publishers Still Control After OA Publication

Fact: Open-access publication does not abolish copyright. Authors or publishers retain copyright subject to the license applied to the work. IOP Publishing’s guidance notes that CC BY lets others distribute, remix, and build upon the work, even commercially, so long as they credit the author, and that the licensor cannot revoke these freedoms while users follow the license terms (IOP CC guide).

Publishers typically retain interest in:

  • Version-of-record integrity and citation formatting
  • Branding, metrics, and analytics on publisher platforms
  • Subscription revenue on non-OA articles in hybrid journals
  • Enforcement against infringement outside license scope

Interpretation: OA shifts the default permission set—it does not eliminate the rights bundle, and it does not guarantee publisher visibility into AI-mediated reuse.

License Strategy in the AI Era

Publishers choose licenses with downstream consequences:

| License | Reader access | Commercial AI reuse | Derivatives |
| --- | --- | --- | --- |
| CC BY | Open | Permitted with attribution | Permitted |
| CC BY-NC | Open | Restricted (non-commercial) | With NC constraint |
| CC BY-ND | Open | Attribution required | No derivatives |
| Bronze / no license | Often free read | Ambiguous | Ambiguous |

Unpaywall surfaces license metadata when present in best_oa_location.license (data format). Bronze OA—free on publisher site without a clear license field—creates the widest ambiguity (oa_status definitions).
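A minimal sketch of how a publisher might map an Unpaywall-style record onto the table above. The field names `oa_status` and `best_oa_location.license` come from the Unpaywall data format. The lowercase license strings (such as "cc-by"), the function name, and the posture labels are assumptions for illustration and should be checked against live records.

```python
# Reuse posture per license, mirroring the license table.
# ASSUMPTION: license identifiers are lowercase strings such as "cc-by".
LICENSE_POSTURE = {
    "cc-by": ("permitted with attribution", "permitted"),
    "cc-by-nc": ("restricted (non-commercial)", "permitted with NC constraint"),
    "cc-by-nd": ("attribution required", "no derivatives"),
}


def reuse_posture(record: dict) -> dict:
    """Summarize reuse posture for one Unpaywall-style record (hypothetical helper)."""
    location = record.get("best_oa_location")
    if not location:
        # No OA copy at all: nothing to evaluate.
        return {"oa_status": record.get("oa_status"), "license": None,
                "commercial_ai_reuse": "no OA copy found",
                "derivatives": "no OA copy found"}

    license_id = (location.get("license") or "").lower()
    # Missing or unrecognized licenses fall into the ambiguous row (typical of bronze).
    commercial, derivatives = LICENSE_POSTURE.get(license_id, ("ambiguous", "ambiguous"))
    return {"oa_status": record.get("oa_status"), "license": license_id or None,
            "commercial_ai_reuse": commercial, "derivatives": derivatives}


if __name__ == "__main__":
    bronze = {"oa_status": "bronze", "best_oa_location": {"license": None}}
    print(reuse_posture(bronze))
```

Treating every unrecognized license string as ambiguous is a deliberately conservative choice: it surfaces records that need human review rather than guessing at permissions.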

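Jurisdiction-specific: Under Article 4 of the EU CDSM Directive (Directive (EU) 2019/790), rightholders may expressly reserve text-and-data-mining rights in certain contexts, for example by machine-readable means for content made publicly available online. That reservation speaks to mining and training copies, not necessarily to ordinary human reading of OA content, so publishers should not conflate the two and should confirm the scope of any reservation with counsel.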

Unpaywall Exposure: Benefit and Blind Spot

Fact: Unpaywall integrations appear across discovery systems, link resolvers, and library tools worldwide (integrations). For publishers, this increases lawful readership and citation reach for OA content.

Risk: The same metadata that helps libraries helps AI builders locate and fetch OA PDFs at machine speed—without publisher analytics, attribution audits, or royalty participation. A CC BY license may permit much of this legally while undermining subscription leverage on the non-OA corpus of a hybrid journal.

The SPARC Author Addendum illustrates how authors may retain repository and sharing rights that publishers would otherwise restrict (SPARC addendum). Publisher strategy must account for both author-retained rights and the machine-scale reuse that OA licenses enable.

Legal

  • Copyright enforcement for uses outside license scope
  • Contractual terms on publisher sites restricting automated access
  • Takedown procedures for infringing copies elsewhere

The U.S. Copyright Office Part 3 report emphasizes licensing markets and case-by-case fair use analysis for AI training (Part 3 PDF)—relevant to publishers considering whether to license corpora explicitly.

Technical

  • Rate limiting and bot management on PDF endpoints
  • Watermarking (variable effectiveness against text extraction)
  • API partnerships instead of anonymous scraping paths
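Rate limiting on PDF endpoints, the first item above, is often implemented as a per-client token bucket. The sketch below is one way to do that, not a description of any particular publisher's setup. The class name, the default capacity and refill rate, and the notion of a `client_id` (an IP address, API key, or similar) are all assumptions.

```python
import time
from typing import Dict, Optional, Tuple


class PdfRateLimiter:
    """Per-client token bucket for a PDF endpoint (illustrative sketch)."""

    def __init__(self, capacity: float = 5.0, refill_per_second: float = 0.5):
        self.capacity = capacity          # maximum burst size, in requests
        self.refill = refill_per_second   # tokens restored per second
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True if this request may proceed, consuming one token."""
        now = time.monotonic() if now is None else now
        tokens, last_seen = self._buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last_seen) * self.refill)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[client_id] = (tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = PdfRateLimiter(capacity=2, refill_per_second=1.0)
    print([limiter.allow("client-a", now=0.0) for _ in range(3)])
```

A token bucket tolerates short bursts from human readers while capping sustained machine-speed fetching. It does nothing against crawlers that rotate identities, which is one reason no single control is sufficient.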

Policy

  • Published AI and text-mining policies distinct from OA reader policies
  • Machine-readable license statements in HTML and PDF metadata
  • Crawler directives where appropriate (recognizing limits vs bad actors)
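Crawler directives are usually expressed in robots.txt, and a publisher can check what a given policy actually permits with Python's standard `urllib.robotparser`. In this sketch the user agent "ExampleAIBot", the host `publisher.example`, and the `/pdf/` path layout are hypothetical. The check only shows what a compliant crawler would conclude; it does not stop a crawler that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named AI crawler is kept off PDF paths,
# while all other agents may fetch everything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /pdf/

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    pdf_url = "https://publisher.example/pdf/10.1234/abc.pdf"
    print(may_fetch("ExampleAIBot", pdf_url))
    print(may_fetch("ExampleReader", pdf_url))
```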

No single control is sufficient. Defense in depth assumes some AI pipelines will comply and others will not.

Enforcement and Detection Challenges

Publishers face asymmetric visibility:

  • Cannot see embeddings in third-party vector databases
  • Cannot audit attribution in every AI-generated summary
  • May detect bulk download patterns or anomalous traffic
  • May rely on user reports, licensing deals, or litigation in egregious cases
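Of the items above, bulk download patterns are the one signal publishers can observe directly in their own logs. A minimal sketch, assuming access-log events have already been parsed into `(timestamp_seconds, client_id, path)` tuples sorted by time. The function name, the window length, and the threshold are illustrative assumptions, not recommended values.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple

Event = Tuple[float, str, str]  # (timestamp_seconds, client_id, path)


def flag_bulk_clients(events: Iterable[Event],
                      window_seconds: float = 60.0,
                      max_pdfs: int = 30) -> Set[str]:
    """Flag clients that fetch more than max_pdfs PDFs within any sliding window."""
    recent = defaultdict(deque)  # client_id -> timestamps of recent PDF fetches
    flagged: Set[str] = set()
    for timestamp, client_id, path in events:
        if not path.lower().endswith(".pdf"):
            continue  # only PDF fetches count toward the threshold
        window = recent[client_id]
        window.append(timestamp)
        # Drop fetches that have aged out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_pdfs:
            flagged.add(client_id)
    return flagged


if __name__ == "__main__":
    events = [(float(i), "crawler", f"/pdf/{i}.pdf") for i in range(5)]
    events += [(0.0, "reader", "/pdf/a.pdf"), (100.0, "reader", "/pdf/b.pdf")]
    print(flag_bulk_clients(sorted(events), window_seconds=10.0, max_pdfs=3))
```

This detects volume on the publisher's own site only. It says nothing about copies fetched from repositories or about what happens to the content after download.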

Part 2 of the U.S. Copyright Office AI report addresses copyrightability of outputs (Part 2 report)—relevant when AI-generated summaries circulate without clear linkage to the publisher VoR.

Real-World Example

A learned society publishes hybrid OA: authors paying APCs receive CC BY on the publisher VoR. Unpaywall indexes correctly. A startup builds commercial literature RAG for financial analysts, ingesting society OA articles with attribution fields populated—but society analytics show flat publisher-site traffic while the startup monetizes synthesis. License compliance may hold. Subscription strategy on toll-access articles suffers. Attribution audit at scale is impractical without cooperation from the deployer.

Gaps Publishers Should Plan For

  1. Bronze ambiguity — free read without license metadata in Unpaywall
  2. Green/version mismatch — repository preprints indexed as OA while VoR remains toll-access
  3. AI vendor opacity — limited disclosure of retrieval caching or training corpora
  4. License-compliant economic harm — CC BY commercial reuse without payment
  5. Cross-border enforcement — deployers and hosts in multiple jurisdictions
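Gap 2 can be checked mechanically against Unpaywall-style records. The sketch below assumes the `oa_status`, `best_oa_location.host_type`, and `best_oa_location.version` fields, with version values such as "submittedVersion", "acceptedVersion", and "publishedVersion". Those values and the function name are assumptions to verify against the Unpaywall data format before relying on them.

```python
# ASSUMPTION: these strings identify non-version-of-record copies.
NON_VOR_VERSIONS = {"submittedVersion", "acceptedVersion"}


def is_green_version_mismatch(record: dict) -> bool:
    """True when the best OA copy is a repository version that is not the VoR."""
    best = record.get("best_oa_location") or {}
    return (
        record.get("oa_status") == "green"
        and best.get("host_type") == "repository"
        and best.get("version") in NON_VOR_VERSIONS
    )


if __name__ == "__main__":
    record = {
        "oa_status": "green",
        "best_oa_location": {"host_type": "repository", "version": "acceptedVersion"},
    }
    print(is_green_version_mismatch(record))
```

Running a check like this across a journal's DOIs gives a rough count of articles where machine readers will most likely encounter a manuscript version rather than the publisher VoR.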

Practical Checklist: Publishers and Rights Holders

  • Attach explicit license to every OA article; minimize bronze ambiguity
  • Embed machine-readable license metadata in HTML and PDF
  • Publish AI/text-mining policy separate from OA reader policy
  • Monitor bulk access and anomalous fetch patterns
  • Participate in or offer licensing markets for AI and TDM uses
  • Educate authors on CC BY vs BY-NC tradeoffs before APC acceptance
  • Coordinate with Unpaywall/data partners on metadata accuracy
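For the machine-readable license item in the checklist, one common HTML pattern is a `<link rel="license">` element alongside a Dublin Core style rights statement. The sketch below renders both for an article page. The helper name and the choice of `dc.rights` as the meta name are assumptions, and PDF (XMP) metadata is out of scope here.

```python
from html import escape


def license_head_tags(license_url: str, rights_statement: str) -> str:
    """Render HTML head tags that declare an article's license (illustrative helper)."""
    return "\n".join([
        # rel="license" is a standard HTML link type pointing at the license terms.
        f'<link rel="license" href="{escape(license_url, quote=True)}">',
        # ASSUMPTION: a Dublin Core style meta name for the human-readable statement.
        f'<meta name="dc.rights" content="{escape(rights_statement, quote=True)}">',
    ])


if __name__ == "__main__":
    print(license_head_tags(
        "https://creativecommons.org/licenses/by/4.0/",
        "This article is licensed under CC BY 4.0.",
    ))
```

Emitting the license URL on every OA article page gives indexers a consistent signal to read, which addresses the bronze ambiguity item directly.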

Risks and Counterarguments

“Unpaywall undermines publishers.” It indexes OA locations publishers and repositories created. The stressor is scale of automated reuse under permissive licenses—not discovery itself.

“We can block all AI.” Technical blocks are partial; legal strategy and licensing markets matter more for lawful actors.

“CC BY was a mistake.” CC BY advances dissemination goals many funders mandate; the policy remedy is informed license choice and new markets—not mischaracterizing OA discovery as piracy.

Conclusion

Publisher protection is partial and license-dependent. OA clears access; it does not automatically preserve analytics, revenue, or control over AI aggregation. Many OA licenses—especially CC BY—permit reuse that is lawful yet economically harmful to publishers and authors, a gray zone distinct from piracy or circumvention.


Publisher or society board evaluating OA + AI risk? I advise on license architecture and detection strategy.


Relevant Sources

  1. Creative Commons Licences — IOP Publishing — https://publishingsupport.iopscience.iop.org/creative-commons-licences/
  2. Unpaywall Integrations — OurResearch — https://unpaywall.org/integrations
  3. Unpaywall Data Format — OurResearch — https://unpaywall.org/data-format
  4. Copyright and AI Part 3 — U.S. Copyright Office — https://copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  5. SPARC Author Addendum — SPARC — https://sparcopen.org/our-work/author-rights/sparc-author-addendum-text/
  6. CC BY 4.0 — Creative Commons — https://creativecommons.org/licenses/by/4.0/
  7. OA status definitions — Unpaywall Support — https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-
  8. Copyright and AI Part 2 — U.S. Copyright Office — https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf
