Outrunning the AI Hype Train: How I Built a Swarm to Read Research Papers Before YouTube Does

I found out about Hermes the way most people find out about anything in AI now. From a YouTuber with a few hundred thousand views.

To my credit, I did not just watch the video. I went straight to the repo, opened NousResearch/hermes-agent/plugins/memory/holographic/, and started pulling on the thread. I traced it back to Holographic Reduced Representations, a vector-symbolic encoding from Tony Plate’s 1995 PhD thesis. A 30-year-old idea, now load-bearing in one of the most hyped agent harnesses of the year.

Plate 1995 (the algebra) → Ganesan et al. 2021 (made it work in deep learning) → Hermes 2026 (productionised it in an agent)

I’d dug in before most second brain bros knew what HRR stood for.

But here is what bothered me.

While co-founding AskRally, I was pushing on the edges, defining what was on the curve. I was testing new patterns and publishing the results of field experiments. To this day, I still catch the occasional customer using a default, purely AI-generated audience… unaware of our most advanced persona generation method. Mike and I were building while the world caught up. Now I was finding out about primitives when they were already popping off on the big AI accounts. The fact that I caught up faster than most viewers was not the win I wanted.

The edge I was looking for was not in the brand. The edge I was looking for was in the primitive underneath. The paper, the architecture, the 30-year-old PhD thesis, six to twelve months before any influencer wraps a thumbnail around it. By the time it is a YouTube video, the alpha is gone.

The problem. Scientific papers are a slog. They are not written for operators. They are written for other researchers who already have the context, and it takes real effort to dig the actual primitive out from under the literature review.

So I built FlowScout to do the slog for me. Read to the end if you want to be able to make a knowledge base of paper digests to use in your agentic work.

1. Start Small. One Paper, One Command.

The first move is the smallest one that works.

I built a slash command called /digest-paper. You give it a URL. An arxiv link, a PDF, an HTML page, anything. It returns a structured markdown digest with the parts an operator actually wants. The TLDR. The key takeaway in one sentence. The method, simplified. The figure most worth looking at, cropped and saved locally in markdown. The citations you would want to walk next. And the bit I obsess over: “what experts in this field overlook”, which is where the alpha tends to hide.

The command takes a --lens flag. Same paper, different reader. This is where you can add a bit of context on who you are, and what your team is working on. Helps make the paper digest more relevant for you. One paper. Different signal streams.

Why a single command for a single paper? Because composition. If you can read one paper reliably, the rest of this article is just chaining the primitive in different shapes. The whole stack collapses if /digest-paper is unreliable.

Under the hood, the command fans out eight parallel sub-agents in a single dispatch. Each one extracts a different field (TLDR, key takeaway, method, citations, what experts overlook, and so on) and their outputs compose into one structured markdown file with YAML frontmatter and a cropped figure of the most useful diagram in the paper.
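
To make the fan-out concrete, here is a minimal sketch in Python. It is not the FlowScout code. The field list is partly guessed (the article names only five of the eight fields), extract_field is a stand-in for a real sub-agent call, and the frontmatter keys are assumptions.

```python
import asyncio

# Hypothetical field names: the article names only some of the eight.
FIELDS = ["tldr", "key_takeaway", "method", "citations", "experts_overlook"]


async def extract_field(field: str, paper_text: str) -> tuple[str, str]:
    """Stand-in for one sub-agent. A real version would call an LLM here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return field, f"({field} for a {len(paper_text)}-character paper)"


async def digest_paper(url: str, paper_text: str, lens: str = "operator") -> str:
    # Fan out: one task per field, all dispatched together.
    results = await asyncio.gather(*(extract_field(f, paper_text) for f in FIELDS))
    # Compose: YAML frontmatter first, then one markdown section per field.
    lines = ["---", f"source: {url}", f"lens: {lens}", "---", ""]
    for field, text in results:
        lines += [f"## {field.replace('_', ' ').title()}", text, ""]
    return "\n".join(lines)


digest = asyncio.run(digest_paper("https://example.org/paper", "some paper text"))
print(digest)
```

asyncio.gather returns results in the order the tasks were passed in, so the sections come out in a stable order even though the extractions run concurrently.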

End state: one paper, six minutes, a digest dropped into memory/knowledge-sources/papers/, searchable forever. Works great if you have QMD set up.

2. Now Follow the Citations.

The thing about a paper is that it is a node in a graph. The references are the edges.

/citation-walk is what walks the graph for you. You give it a seed paper and a topic. It pulls every reference, scores each one against the topic, and digests the relevant ones in parallel. Each digest is itself a candidate seed for the next walk. The corpus grows outward from your starting point in concentric rings.
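
One ring of that walk can be sketched in a few lines of Python. This is an illustration under loud assumptions: the real scoring is presumably done by a model, so the word-overlap score below is a toy stand-in, the 0.5 threshold is invented, and the reference titles are made up.

```python
def score(reference_title: str, topic: str) -> float:
    """Toy relevance score: fraction of topic words found in the title."""
    topic_words = set(topic.lower().split())
    title_words = set(reference_title.lower().split())
    return len(topic_words & title_words) / len(topic_words) if topic_words else 0.0


def walk_once(references: list[str], topic: str, threshold: float = 0.5) -> list[str]:
    """Return the references worth digesting, most relevant first."""
    scored = [(score(title, topic), title) for title in references]
    keep = [(s, t) for s, t in scored if s >= threshold]
    return [t for s, t in sorted(keep, key=lambda pair: pair[0], reverse=True)]


# Made-up reference titles, for illustration only.
refs = [
    "Notes on holographic storage",
    "Agent memory with holographic binding",
    "A survey of image segmentation",
]
next_seeds = walk_once(refs, topic="holographic agent memory")
print(next_seeds)
```

Whatever survives the threshold becomes the seed list for the next ring, which is what makes the corpus grow outward.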

But the clever part is the four modes.

--broad covers the citation neighbourhood. Wide and shallow. Good for figuring out the shape of a field you have never read.

--deep follows the single most-relevant thread. Narrow and far. Good for chasing a specific mechanism through three or four generations of papers.

--canonical finds the foundational works. The papers that get cited by three or more digests already in your wiki. This is how you find the McClelland-McNaughton 1995 paper on Complementary Learning Systems, which everyone in modern agent memory keeps almost-citing but never names. The grandparent papers.

And then --orbit. This is the one that matters for operators.

Orbit mode takes the key takeaway from a paper, mutates it four different ways with an LLM, and runs those mutations as queries against the open literature. It finds papers connected by idea, not by citation. Papers that do not share a reference graph but are pointed at the same problem from a different angle. That is where the unbranded primitives live. That is where the alpha is, because nobody has connected the two yet. It found me a 1997 Olshausen paper on sparse coding that lives nowhere in the modern agent-memory citation graph, but is functionally the same idea as half the papers being written about activation steering in 2026. That is the kind of connection no YouTuber is making yet.
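
Two of these modes reduce to small mechanical steps, sketched here in Python. Treat it as a sketch, not the actual implementation: the "three or more digests" rule comes from the description above, but the four rewrite instructions are invented, and rewrite and search are placeholders for an LLM call and a literature search.

```python
from collections import Counter
from typing import Callable, Iterable


def canonical_works(digests: dict[str, list[str]], min_digests: int = 3) -> list[str]:
    """Works cited by at least `min_digests` digests already in the wiki."""
    counts = Counter()
    for cited in digests.values():
        counts.update(set(cited))  # a digest counts once per work, even if it repeats it
    return sorted(work for work, n in counts.items() if n >= min_digests)


# Hypothetical rewrite instructions: the article says only "four different ways".
MUTATIONS = [
    "restate in another field's vocabulary",
    "generalise the mechanism",
    "invert the claim",
    "describe the problem without the solution",
]


def orbit(
    takeaway: str,
    rewrite: Callable[[str, str], str],
    search: Callable[[str], Iterable[str]],
    known: set[str],
) -> list[str]:
    """Papers reachable by idea: search hits that are not in the citation graph."""
    found: list[str] = []
    for instruction in MUTATIONS:
        for paper in search(rewrite(takeaway, instruction)):
            if paper not in known and paper not in found:
                found.append(paper)
    return found


# Placeholder data and stand-in callables, for illustration only.
wiki = {
    "digest-a": ["work-x", "work-y"],
    "digest-b": ["work-x", "work-z"],
    "digest-c": ["work-x", "work-y", "work-x"],
    "digest-d": ["work-y"],
}
grandparents = canonical_works(wiki)


def fake_rewrite(takeaway: str, instruction: str) -> str:
    return f"{instruction}: {takeaway}"


def fake_search(query: str) -> list[str]:
    if query.startswith("invert"):
        return ["paper-in-graph", "lateral-paper"]
    return ["paper-in-graph"]


lateral = orbit("sparse codes aid recall", fake_rewrite, fake_search, known={"paper-in-graph"})
print(grandparents, lateral)
```

The filter against known is the whole point of orbit: anything already in the citation graph is dropped, so only the lateral finds come back.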

3. Chain Them. One Topic, One Command, Four Modes.

You can run citation-walk by hand. But the point of building a swarm is to stop running things by hand.

/research-cycle chains the four modes in sequence on a single topic. Broad first to map the neighbourhood. Canonical next to surface the foundational works. Deep after that to follow the most-relevant thread the broad walk uncovered. Orbit last to find the lateral connections. Then a longitudinal meta-digest that reads every prior cycle’s output on the same topic, so the narrative compounds across runs.
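
The chaining itself is simple enough to sketch in Python. The mode order is the one just described. Everything else is an assumption: run_mode is a hypothetical callable standing in for a citation-walk run, and the one-line synthesis is a placeholder for the meta-digest an LLM would write.

```python
from typing import Callable

MODE_ORDER = ["broad", "canonical", "deep", "orbit"]  # order given in the text


def research_cycle(
    topic: str,
    run_mode: Callable[[str, str, dict[str, list[str]]], list[str]],
    prior_syntheses: list[str],
) -> dict:
    found: dict[str, list[str]] = {}
    for mode in MODE_ORDER:
        # Each mode can see what earlier modes found, e.g. deep reuses broad.
        found[mode] = run_mode(mode, topic, found)
    papers = [p for mode in MODE_ORDER for p in found[mode]]
    # Stand-in for the meta-digest: a real one would read prior_syntheses in full.
    synthesis = f"cycle {len(prior_syntheses) + 1} on {topic}: {len(papers)} papers"
    return {"papers": papers, "synthesis": synthesis}


def fake_mode(mode: str, topic: str, found: dict[str, list[str]]) -> list[str]:
    """Stand-in walker that shows deep picking up a thread from broad."""
    if mode == "deep":
        return [f"deep-from-{found['broad'][0]}"]
    return [f"{mode}-paper"]


cycle = research_cycle("agentic memory", fake_mode, prior_syntheses=[])
print(cycle["synthesis"])
```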

The first complete cycle I ran was on agentic memory. Nineteen papers, a 2,726-word synthesis, about ninety minutes of wall-clock. The synthesis itself was the artifact I cared about most. It was not a list. It was a story. “Here is what the field believed in 2022. Here is what changed in 2024. Here is what is still a load-bearing assumption nobody has tested.”

That synthesis is the thing a YouTuber would make a year later if they were any good.

I had it in ninety minutes.

4. The Thing Nobody Tells You. It Will Break.

I am going to interrupt the ascending power curve to tell you what broke, because if I do not, you will hit the same wall.

The first version of /research-cycle was nested. The orchestrator dispatched a citation-walk sub-agent, which dispatched a digest-paper sub-sub-agent. Clean on paper. Disastrous in practice.

The figure images did not extract. The first ten papers in the run, zero graphs pulled. The summaries felt flat. I looked at the logs and saw the failure mode. The fix was unglamorous and important. I flattened the dispatch. The orchestrator now drives /digest-paper directly at the top level, one paper per parallel call, each call getting a fresh context window. Three out of three figures extracted on the next run.
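
The shape of the flattened dispatch, as a Python sketch. This is an analogy, not the harness: threads stand in for parallel sub-agent calls, a fresh dict stands in for a fresh context window, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def digest_paper(url: str) -> dict:
    """One top-level call per paper, starting from a clean slate."""
    context = {"url": url, "notes": []}  # fresh per call, nothing inherited
    context["notes"].append(f"digested {url}")
    return context


def run_flat(urls: list[str]) -> list[dict]:
    # The orchestrator calls digest_paper directly. No intermediate walker
    # sits between it and the paper, so nothing upstream leaks into the call.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(digest_paper, urls))  # results keep input order


results = run_flat(["paper-1", "paper-2", "paper-3"])
print([r["url"] for r in results])
```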

This is the kind of thing you only learn by running the loop. Contact with reality is where you grow.

5. Let Them Run.

Once the cycle was stable, I did the thing the publication is named after.

I wrapped the whole command in /loop /research-cycle and walked away.

/loop is a meta-command that just calls another command repeatedly. New topics, new cycles, until I tell it to stop. I let it run for two hours. Three cycles finished while I was on calls. I returned to over a hundred papers digested. A 5,716-word longitudinal synthesis that read like something a postdoc would have written after a month of immersion.

I was half expecting to see a partial run with errors. Instead the corpus looked good. This is the moment the publication name earns itself. The harness keeps the loop alive. Checkpoint files in experiences/citation-walk/ make every step observable. Trust comes from observability, not from staring at the terminal.
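
If you want the same kind of observability in your own loop, per-step checkpoint files are cheap to add. A minimal Python sketch, assuming one JSON file per step: the file names, payload shape, and paper counts are invented, and it writes to a temporary directory so it touches nothing real.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(root: Path, step: str, payload: dict) -> Path:
    """Write one JSON file per finished step so progress can be inspected."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{step}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


def completed_steps(root: Path) -> list[str]:
    """List finished steps by reading the directory, not the terminal."""
    return sorted(p.stem for p in root.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "experiences" / "citation-walk"
    save_checkpoint(root, "01-broad", {"papers": 7})      # made-up counts
    save_checkpoint(root, "02-canonical", {"papers": 3})
    done = completed_steps(root)
    reloaded = json.loads((root / "01-broad.json").read_text())

print(done, reloaded)
```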

This is also the moment my time-cost-of-research dropped by an order of magnitude. Now you go run one. Start with one paper. Read the digest. Then scale up as you hunt for an edge.

6. From Corpus to Claim.

A corpus is just a pile until you mine it.

/flow-frontier reads the entire wiki and looks for cross-paper theses. Not summaries. Theses. Things like, “the field has converged on X assumption, but here are two papers that quietly contradict it.” It searches for five gap-types. Convergence (everyone agrees, but the agreement is suspicious). Unstated-assumption (papers keep relying on a thing nobody has measured). Mechanism-gap (results without an explanation of why). Edge-of-consensus (the disagreement is shrinking but has not resolved). Direct contradiction (paper A and paper B cannot both be right).

The first run found twenty-three theses. Some were obvious in retrospect. Some I would never have spotted by hand. One of them: “across twenty-eight years of memory papers, the forget-gate gap recurs from LSTM 1997 through Memory-R1 2025, but nobody names it as a unified phenomenon.” That is a publishable observation. The swarm extracted it from a corpus I had not manually re-read.

Then /verify-thesis runs adversarial search against each thesis. It generates four queries designed to falsify the claim, runs them through Exa and WebSearch, scores every candidate paper as supports, contradicts, qualifies, or irrelevant, and writes the verdict back into the thesis file. If the thesis stays open after verification, it drafts an experiment design.
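
The last step, turning scored papers into a verdict, might look like this in Python. The four labels are the ones listed above. The decision rule and the verdict names are my own guess at a sensible default, not the rule /verify-thesis actually applies.

```python
from collections import Counter

LABELS = {"supports", "contradicts", "qualifies", "irrelevant"}  # from the text


def verdict(scored: list[tuple[str, str]]) -> str:
    """Collapse (paper, label) pairs into one verdict. Hypothetical rule."""
    counts = Counter(label for _, label in scored)
    unknown = set(counts) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if counts["contradicts"] > 0:
        return "contradicted"  # any direct counter-evidence wins
    if counts["supports"] > 0:
        return "supported"     # someone has already made the case
    return "open"              # nothing decisive: worth an experiment design


# Placeholder papers, for illustration only.
still_open = verdict([("paper-1", "irrelevant"), ("paper-2", "qualifies")])
refuted = verdict([("paper-1", "supports"), ("paper-3", "contradicts")])
print(still_open, refuted)
```

Checking contradictions first keeps the search adversarial: one solid counterexample outweighs any number of supporting papers.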

This is the layer that turns reading into research. Reading produces a corpus. Mining produces theses. Verification produces claims you could be first to test. For an operator, that is the actual edge. Not “I read this paper.” But “I have a falsifiable hypothesis nobody is testing yet, and I can ship a product around it before the YouTubers know it exists.”

7. Shipping the Feature Back Into the Second Brain.

Here is the part I did not see coming. A paper from the corpus shipped a feature in Flow OS, my personal second brain.

The paper was Adler 2026, Storage Is Not Memory. Its core argument is that most “memory systems” for LLMs treat storage and retrieval as separate concerns, when in fact the retriever should be load-bearing at write time too. One line in the digest stood out: “when /learn is about to write a new memory, run the candidate text through qmd query against the existing memory first and use the hit profile as a write-time signal.”

So I added that. Research to feature test.

When my /learn command is about to commit a candidate memory, it now runs qmd query (the hybrid BM25 + vector search inside my second brain) against the existing corpus first. If the top hit scores above 0.90, the candidate is dropped and an evidence link is written instead, pointing at the existing memory. Middle bands merge or down-route. Only candidates with no strong neighbour get written as new memories. If you want to install QMD on your second brain, point Claude at this.
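
The routing rule is easy to sketch in Python. Only the 0.90 cut-off comes from the description above. The 0.60 floor for the middle band and the action names are assumptions, and the score is passed in by the caller, so no qmd call is shown or implied.

```python
from typing import Optional


def route_candidate(
    top_score: Optional[float],
    drop_above: float = 0.90,   # from the text
    merge_from: float = 0.60,   # hypothetical lower edge of the middle band
) -> str:
    """Decide what to do with a candidate memory given its best existing match."""
    if top_score is None:
        return "write"                # no neighbour at all
    if top_score > drop_above:
        return "link"                 # drop the candidate, link to the existing memory
    if top_score >= merge_from:
        return "merge-or-down-route"  # middle band
    return "write"                    # no strong neighbour


decisions = [route_candidate(s) for s in (0.95, 0.90, 0.70, 0.20, None)]
print(decisions)
```

A score of exactly 0.90 falls in the middle band here, because the rule as stated is "above 0.90".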

That is what reading earlier buys you. Not bragging rights about HRR. A second brain that gets sharper while you sleep, and a private pile of falsifiable claims sitting six to twelve months ahead of the YouTubers.

The next HRR is already in the corpus somewhere. Whose corpus is the only question.

FlowScout is open-source here. Run it on whatever niche you are trying to find an edge in. Tell me what you find.