peer-reviewed journal (PubMed-indexed)
A scoping review maps AI applications across the full peer review pipeline—manuscript triage, reviewer matching, reviewer assistance, and editorial decision support—drawing on PubMed-indexed literature to characterize the governance landscape. AI integration is shown to redistribute accountability across review stages, raising unresolved questions about who bears responsibility for validation outcomes and exposing systematic gaps in risk governance frameworks.
RAND
A RAND analysis examines how non-governmental actors—corporations, professional associations, civil society organizations, and academic publishers—can supplement state regulatory capacity in managing risks from transformative AI systems. Disciplinary associations and scholarly publishers are identified as structurally significant non-state governance nodes whose rule-setting authority over AI practices in research ecosystems operates largely outside legislative mechanisms.
arXiv / Frontis.AI, Tsinghua University
NatureBench benchmarks coding agents against published state-of-the-art results from Nature-family journals, treating the reproducibility of top-tier empirical findings as the performance ceiling for agentic systems. Agents systematically fall short of matching published SOTA across evaluated domains, establishing a quantitative frontier between AI-assisted and AI-generated knowledge production with direct implications for computational reproducibility standards.
Scientific Reports / Nature
A computational audit of clinical registry metadata across glioma drug discovery pipelines identifies systematic incompleteness and inconsistency in translational phase records as a structural constraint on AI-driven inference. Validity of AI outputs is bounded not by computational capacity but by upstream registration quality, showing that AI systems inherit and amplify pre-existing methodological deficiencies rather than correcting them.
Global Philosophy / Springer Nature
Applying Gadamerian hermeneutics — understanding as experience rooted in a lived horizon — the paper interrogates the epistemic status of historical narrative generated by AI systems that lack an interpretive subject. AI-produced narrative cannot achieve interpretation in the hermeneutic sense, marking a qualitative epistemic gap between computational text generation and humanistic knowledge formation that bears directly on disciplinary legitimacy in the humanities.
Philosophy of Management / Springer
Applies three classical conditions of epistemic expertise — reliability, understanding, and accountability — to large language models, asking whether LLMs can qualify as agents whose judgment warrants genuine epistemic trust. Concludes that current LLMs fall short on all three dimensions, establishing a philosophical framework for delimiting the legitimate epistemic role of AI in scientific reasoning.
EPJ Data Science / Springer
Evaluates LLM-generated astronomical hypotheses against structured metrics of originality, plausibility, and scientific value across a data-driven astronomy benchmark. Models show limited but non-zero generative capacity, primarily through recombination of existing concepts rather than genuinely novel ideation, raising questions about AI's impact on the hypothesis-generation phase of scientific inquiry.
arXiv (Umeå University, Dept. of Science and Mathematics Education)
A conceptual framework from science education positions AI as a partner across three dimensions of scientific activity (learning about science, practicing it, and engaging with it socially), drawing on philosophy of science and science education theory to recast the researcher as an epistemic agent under AI conditions. "Vigilance", meaning sustained active scrutiny of AI-generated outputs, is proposed as the core epistemic principle separating productive augmentation from passive delegation, which risks substituting model artifacts for genuine world-directed knowledge.
arXiv
Standard AI pluralism discourse — diversity of values, users, and outputs — is challenged on the grounds that AI systems impose ontologies by determining what counts as an entity, a relation, a harm, and a legitimate form of knowledge. The hidden categorical framework embedded in each system structures the knowable space of science, rendering invisible whatever falls outside its representational scope.
npj Systems Biology and Applications (Nature)
MCP servers functioning as standardized AI-biology interfaces are deployed to automate the construction of multicellular mechanistic models, allowing biologists to specify hypotheses without writing computational code. The approach lowers the expertise threshold for systems-biology modeling and redistributes methodological agency within the discipline.
UK Government Office for Science
The UK Government Office for Science presents five AI development scenarios for 2030, spanning moderate integration through radical transformation of governance structures, research funding models, and knowledge production systems. The analysis argues that trajectory uncertainty demands policies robust to divergent outcomes rather than optimized for a single projected future.
npj Artificial Intelligence (Nature Portfolio)
A controlled experiment varied whether human teams were informed of an AI teammate's presence while holding AI competence constant at human level, measuring both task performance and physiological dynamics. Covert AI participation degraded team performance and altered physiological patterns relative to disclosed conditions, establishing that collaborative outcomes depend on epistemic context rather than AI capability.
arXiv
Authors from Cambridge and China Agricultural University empirically test the reliability of AI-assisted social science analysis with and without human oversight on a defined task. Human oversight remains a necessary condition for valid inference: without it, AI-assisted analysis systematically distorts results, raising the bar for epistemic accountability in automated research pipelines.
npj Digital Medicine (Nature Portfolio)
Uses large language models to automatically generate Common Data Elements for harmonizing 31 biomedical datasets, a task that previously required months of manual work by domain experts. Validation across these datasets shows LLMs can sharply accelerate research-data standardization, shifting decisions about how knowledge is categorized from domain experts to the model and restructuring biomedical research infrastructure itself.
arXiv
"System 0" is proposed as a pre-reflective mode of AI-mediated cognition that activates before conscious deliberation, operating upstream of Kahneman's Systems 1 and 2. AI does not supplement thinking but intercepts epistemic processes at a pre-reflexive level — a condition framed as "cognitive colonization" — restructuring how knowledge is produced before reflection begins.
Nature Medicine
General-purpose LLMs are benchmarked against specialized clinical AI platforms — including OpenEvidence and UpToDate Expert AI — across standard medical QA tasks, with all systems assessed under equivalent conditions. General-purpose models consistently outperform purpose-built clinical tools, which are being deployed at scale with minimal independent validation, revealing a systematic evidentiary gap between commercial adoption and empirical justification.
arXiv
Human oversight structures in AI-assisted social science pipelines are quantitatively evaluated across multiple task types to measure their effect on output reliability. Without structured human supervision, AI-assisted results exhibit systematic distortions; reliability is not an intrinsic property of the AI pipeline but is determined by how oversight responsibility is distributed within the methodological process.
UK Government (GOV.UK)
Evidence from researchers, clinicians, patients, and regulators is synthesized through a national call for evidence examining how AI is altering validation standards, accountability structures, and evidentiary requirements in UK healthcare. Findings identify persistent gaps between current AI tool deployment practices and scientific rigor requirements, establishing institutional consensus for governance frameworks suited to AI-era medical knowledge production.
arXiv
A cross-disciplinary survey of LLM integration in STEM and humanities research practice, examining structural transformation at the level of literature review, argumentation, and result interpretation rather than at the level of individual discovery. The central argument holds that LLMs function as epistemic agents that recalibrate disciplinary thresholds for what qualifies as adequate coverage, persuasive reasoning, and an original scholarly contribution.
European Commission
EU voluntary-to-mandatory code establishing labeling and disclosure requirements for AI-generated content in media and scientific publications distributed across EU member states, with explicit coverage of both text and imagery. Legally operationalizes the distinction between 'AI-assisted' and 'AI-generated' authorship, setting a regulatory precedent that redefines attribution standards in scholarly communication.
npj Digital Medicine
Two paired studies (a longitudinal survey of real-world AI use in medical education and a controlled experiment comparing AI-generated explanations against plausible misinformation) examine how AI reshapes clinical reasoning acquisition. AI explanations present a dual epistemic hazard: they can scaffold correct inference or entrench flawed mental models, and learners often cannot tell the two apart at the point of instruction.
arXiv
A standardized "Evaluation Card" format is proposed as an interpretive meta-layer over existing AI benchmarks, designed to unify evaluation reporting across institutions and benchmark suites. Without such a meta-format, cross-benchmark comparison remains structurally impractical, blocking reproducible scientific consensus on AI system capabilities.
UK Department for Science, Innovation & Technology (GOV.UK)
The UK Department for Science, Innovation and Technology sets out a sectoral AI adoption roadmap for life sciences — covering pharmaceuticals, biomedical research, and clinical trials — with specific infrastructure, regulatory, and procurement commitments. The plan illustrates how government policy actively narrows the normative space for AI in scientific research, institutionally privileging particular practices and architectures over alternatives.
Nature Human Behaviour
A standardized reporting checklist for LLM-based studies in behavioural science establishes minimum transparency requirements covering model specification, prompting protocols, sampling procedures, and reproducibility conditions. Without unified reporting standards, findings from LLM behavioural experiments remain methodologically incommensurable, undermining the cumulative knowledge-building that defines the scientific enterprise.
Issues in Science and Technology (National Academies / ASU)
An analysis of governance and funding structures for open research data infrastructure — repositories, archives, and access systems that underpin the global scientific ecosystem — finds that current financing models are structurally inadequate to ensure long-term sustainability. The resulting systemic fragility threatens to concentrate AI-amplified research advantages among institutions with privileged proprietary data access, deepening existing inequalities in scientific capacity.
NBER Working Paper
Investment flow data across sectors are used to characterize the macroeconomic nature of the current AI transition, distinguishing structural reorganization of capital allocation from cyclical demand patterns. The empirical investment signature points toward fundamental economic restructuring rather than a speculative wave, with direct implications for R&D funding trajectories and the redistribution of resources within scientific production systems.
Springer (AI & Society)
A comparative epistemological analysis examines three foundational metaphors — machine, organism, and language — as organizing frameworks shaping how AI model behavior is understood and legitimized within scientific practice. The linguistic paradigm, instantiated by large language models, is shown to most fundamentally destabilize classical norms of scientific objectivity, reproducibility, and mechanistic explanation.
Springer
A normative analysis evaluates the case for AI authorship in scientific publications against ICMJE criteria and broader epistemic norms of scholarly attribution, identifying conditions under which AI contribution may warrant formal recognition. Specific policy adaptations are proposed for journals, peer reviewers, and editors to close accountability gaps introduced by AI-assisted and AI-generated research outputs.
Springer (J. Computer-Aided Molecular Design)
A systematic benchmark compared reproducibility, pose-prediction accuracy, and failure modes of classical docking tools (including AutoDock Vina and Glide) against AI-driven methods (including DiffDock) across standardized protein–ligand datasets. AI approaches exhibited distinct and more frequent failure modes alongside lower cross-run reproducibility relative to classical pipelines, identifying critical validation gaps that must be addressed before AI docking is adopted in prospective drug discovery workflows.
Springer (Erkenntnis)
Convenience AI is analyzed as a distinct category of AI adoption in which systems are deployed primarily to reduce task friction rather than to extend epistemic capability, with particular attention to its effects on research methodology. The analysis argues that convenience-driven adoption introduces systematic biases in scientific decision-making that current AI ethics frameworks fail to theorize.
arXiv
ARA formalizes reproducibility assessment as structured reasoning over scientific documents, extracting a directed workflow graph (source→method→experiment→result links) and scoring reconstructability via structural and content metrics across 213 ReScience C papers. On the ReproBench benchmark, ARA reaches 60.71% accuracy versus 36.84% for prior systems, and 61.68% versus 43.56% on GoldStandardDB.
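The idea of a directed workflow graph with source→method→experiment→result links can be illustrated with a minimal sketch. This is not ARA's implementation or its scoring metric: the node names, the `stages` and `edges` dictionaries, and the `traceable_fraction` measure are all hypothetical, chosen only to show one simple structural check, namely what share of reported results can be traced back to a declared source.

```python
from collections import deque


def reachable(edges, starts):
    """Return every node reachable from `starts` by following directed edges (BFS)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def traceable_fraction(stages, edges):
    """Hypothetical structural score: share of result nodes reachable from a source node."""
    sources = [n for n, s in stages.items() if s == "source"]
    results = [n for n, s in stages.items() if s == "result"]
    if not results:
        return 0.0
    reached = reachable(edges, sources)
    return sum(r in reached for r in results) / len(results)


# Illustrative paper: two reported results, only one linked to an experiment.
stages = {
    "dataset": "source",
    "model": "method",
    "run1": "experiment",
    "table2": "result",
    "fig3": "result",
}
edges = {
    "dataset": ["model"],
    "model": ["run1"],
    "run1": ["table2"],
}

score = traceable_fraction(stages, edges)
print(score)
```

In this toy example, one of the two result nodes has no path from any source, so only half of the results are structurally traceable. A real system would presumably combine a check of this kind with content-level metrics, as the summary describes.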
Nature (Nature Human Behaviour)
A six-step theoretical feedback loop is proposed to explain how concentration of research on AI creates self-reinforcing scientific monocropping, drawing on bibliometric evidence and prior work on AI-induced cognitive homogenization. Cultural salience, institutional incentives, methodological convergence, and epistemic self-referentiality are shown to combine into meta-conformity that narrows intellectual breadth while creating an illusion of diversity.
Nature
An empirical analysis of 41.3 million papers across six disciplines used a pre-trained language model (F1 = 0.875) to identify AI-augmented works and measure effects on individual and collective scientific output. AI-augmented scientists publish 3.02× more papers and receive 4.84× more citations, yet collective topical coverage contracts by 4.63% and cross-researcher collaboration falls by 22%.