Recently, I had a conversation with a friend. We were discussing some recent research and the way papers get published.
At some point, he said something that caught me completely off guard:
“Computer science is basically a cancer on the body of science.”

My first reaction was irritation. As a computer science major, I felt almost personally attacked. Why would you say something like that?
But then I remembered that my friend is also a computer scientist. In fact, a better and more experienced one than I am.
He explained.
And I realized I'd been watching this happen for the past few years.
I just didn't have a name for it, but now I do —
The Death of the Hypothesis.

Finding Patterns Everywhere
Not long ago, making sense of a dataset was a serious undertaking. You needed Python, or R, or some enterprise software. And more importantly, you needed the training to use it. Graduate-school-level training, usually.
AI changed that.
Now anyone can drop a spreadsheet into Claude or Colab and get back a coherent analysis in seconds. The tools have gotten so good at writing the code that even the last technical barrier is gone. You don’t need to know statistics. You don’t need to know what you’re looking for.
You just need a dataset.
So I watch my peers open these tools, drop in spreadsheets, and type some version of the same thing:
Help me find patterns. Help me find meaning. And my personal favorite:
“What does this data prove?”

But what’s wrong with these questions, exactly?
Spurious Correlations
Take any dataset large enough, and go looking for patterns. You will find them.
Mathematics guarantees it. Here's why.
Flip a coin ten times and you'll probably get a few streaks — three heads in a row, two tails. That doesn't mean the coin is biased. It means genuine randomness is streakier than our intuition expects. Now flip it ten thousand times across a hundred different coins, and you'll find streaks that look almost impossible. Not because something is happening, but because with enough attempts, unlikely things become inevitable.
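You can see this in a few lines of code. The sketch below (a NumPy simulation; the seed and counts are just for illustration) flips a hundred fair coins a hundred times each and reports the longest streak anywhere in those ten thousand flips:

```python
import numpy as np

rng = np.random.default_rng(42)

def longest_streak(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# 100 fair coins, 100 flips each: ten thousand flips in total
flips = rng.integers(0, 2, size=(100, 100))
longest = max(longest_streak(row) for row in flips)
print(f"longest streak across all coins: {longest}")
```

A streak of ten or more is typical here, from coins that are perfectly fair. Nothing is happening; there were simply enough attempts.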
If you have a dataset with 100 variables and you test every possible pairwise relationship between them, you're running 4,950 comparisons. If each test has a 5% chance of producing a false positive, the conventional significance threshold in standard statistics, you'd expect to find around 250 "significant" patterns purely by chance.
This is the multiple comparisons problem. And it’s not an edge case. It’s what happens every time you let a tool loose on raw data without a prior question.
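You can watch the multiple comparisons problem happen on pure noise. The sketch below (NumPy again; the 0.28 cutoff is roughly the two-sided p < 0.05 threshold for a Pearson correlation with 50 observations) generates 100 columns of random numbers and counts how many of the 4,950 pairwise correlations come out "significant":

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 50, 100

# Pure noise: no real relationship exists between any two columns.
data = rng.standard_normal((n_obs, n_vars))

corr = np.corrcoef(data, rowvar=False)     # 100 x 100 correlation matrix
rows, cols = np.triu_indices(n_vars, k=1)  # the 4,950 unique pairs
# |r| > 0.28 corresponds roughly to p < 0.05 (two-sided) at n = 50
significant = np.abs(corr[rows, cols]) > 0.28
print(f"{significant.sum()} 'significant' patterns out of {significant.size} tests")
```

On a typical run this reports a couple of hundred hits. Every one of them is a false positive, because by construction there is nothing in the data to find.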
There’s even a website — Spurious Correlations — built entirely around this phenomenon. It documents statistically robust relationships between completely unrelated variables: per capita cheese consumption and deaths by bedsheet entanglement, divorce rates in Maine and margarine consumption. Correlation coefficients above 0.9, as real as they are meaningless.
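You can manufacture a spurious correlation of your own the same way. The sketch below (all names and numbers are made up for illustration) takes one ten-point series — a decade of some annual statistic, say — and hunts through ten thousand unrelated random series for the best match:

```python
import numpy as np

rng = np.random.default_rng(7)
n_years = 10

target = rng.standard_normal(n_years)                # "divorce rate", say
candidates = rng.standard_normal((10_000, n_years))  # ten thousand unrelated series

# Pearson correlation of the target with every candidate:
# standardize each series, then average the products.
t = (target - target.mean()) / target.std()
c = (candidates - candidates.mean(axis=1, keepdims=True)) / candidates.std(axis=1, keepdims=True)
r = (c @ t) / n_years

best = np.abs(r).max()
print(f"best |correlation| found: {best:.2f}")
```

With ten data points and ten thousand candidates, a near-perfect match (often above 0.9) is virtually guaranteed. It's the cheese-and-bedsheets mechanism in miniature: search wide enough and something will line up.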
So what happens when you drop a dataset into an AI tool and ask it to find patterns?
It finds them. Every time. Without exception.
And they look real. They’re coherent, well-structured, sometimes surprising in exactly the right way.
This is, I think, what’s behind the constant stream of LinkedIn posts that go something like:
“Claude helped me uncover hidden patterns in our sales data” or
“AI found connections in this dataset we never would have spotted ourselves.” The amazement is genuine, because the patterns really are there in the data.
But here’s the thing: finding patterns in data, on its own, is not a scientific result. Karl Popper spent his career making this clear:
data cannot prove anything. Not “insufficient data.” Data, in principle, no matter the volume, cannot establish that a relationship is real. It can only fail to disprove it.

Think about swans. For centuries, Europeans observed thousands of them — white, every single one. Then someone went to Australia. And one black swan didn’t just complicate the theory. It destroyed it. Completely. Instantly. No amount of white swans could have done what one black swan did in a single observation.
This is what falsifiability actually means.
A scientific hypothesis is not a claim you confirm with evidence. It’s a claim you expose to evidence that could destroy it.

Theory first. From that theory, a prediction:
if this is true, then under these conditions, we should observe X. You test it. If X doesn’t appear, the hypothesis is dead. If X does appear… you haven’t proven anything. You’ve only failed to kill it.
This is why your first statistics class teaches you that strange incantation:
we fail to reject the null hypothesis. Not “confirmed.” Not “proven.” Failed to reject.
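The incantation maps directly onto how a test is computed. Here is a minimal one-sample t-test, hand-rolled on made-up measurements (2.365 is the two-sided 5% critical value at 7 degrees of freedom):

```python
import numpy as np

# Hypothetical measurements; the null hypothesis says the true mean is 0.
sample = np.array([0.31, -0.24, 0.10, -0.41, 0.27, 0.02, -0.18, 0.06])

n = len(sample)
t_stat = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided critical value for alpha = 0.05 at df = n - 1 = 7
if abs(t_stat) < 2.365:
    print("fail to reject the null")  # NOT "the mean is proven to be zero"
else:
    print("reject the null")
```

Notice there is no branch that prints "the null is true." Even a t-statistic of exactly zero wouldn't prove the mean is zero; the data would merely have failed to kill that hypothesis.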
No number of white swans can convince us that black swans do not exist. But a single black swan is enough to disprove the theory that only white ones exist.
But does this mean we can’t explore data without a hypothesis?

Well, science actually has a name for this kind of exercise. It’s called Exploratory Data Analysis. And it’s a legitimate, established practice. You take a dataset, look for structure, notice what’s unexpected. No hypothesis required.
But the goal of EDA isn’t to find answers. It’s to find better questions, questions worth actually investigating.
EDA is where inquiry begins, not where it ends. The problem is that we’ve started treating it as both. Because when an AI surfaces a clean, confident, well-articulated pattern, it doesn’t feel like the beginning of a question. It feels like an answer.
Backwards Science

This, I think, is what my friend meant. Not that computer science is inherently destructive. But that it handed us tools so good at finding patterns in structured data that we stopped noticing when pattern-finding replaced something else entirely.
Because here’s what happens in practice: someone drops a dataset into a model, the model surfaces a correlation, and then a hypothesis appears.
This has a name: HARKing, Hypothesizing After the Results are Known. It’s a failure mode in science known enough to have earned its own acronym.
And AI doesn’t just make this easier; it makes it harder to notice. The pattern comes wrapped in precise language, confidence intervals, and structured reasoning. It feels like the result of a process.
It has the aesthetics of science, but without the thing that makes science work: the hypothesis was supposed to come first.

Not as a formality, but as a commitment, a declaration of what you expect to find before you look. The moment you skip that step and let the data suggest the question, you’re no longer testing reality.