Sunday, April 12, 2026

A Journal

Last year my brother Tom described a project he started in high school. It was a school assignment: the teacher had the students keep a journal. Since then Tom has kept his up, off and on, with gaps lasting months to years. We had been discussing AI stuff, and I was describing how well the models read handwriting. I had discovered how good they were while doing research on ancestry.com. He wondered aloud what AI could do with the thousands of pages he'd written over the years. I grew up in near poverty, so my compulsion for saving things is obsessive at times. The thought of an organized workflow to archive his journal appealed to me on so many levels. I told him it could be done, but I didn't know the effort involved. Wouldn't know until I got a gander at the books themselves.

That was last year. Recently I brought it up again. He said he didn't have time to devote to such a project. I told him I would do it all. I wanted to test transcription on my local LLM machine.


Digitizing Decades: Bringing Tom's Journal into the Digital Age

Tom has kept a journal since high school. Off and on over the decades, with some gaps of months and at least one gap close to ten years. Still, more pages than most people ever put down.

The first step was scanning. For most of the notebooks I used a picture scanner. The two bound books — notebooks 7 and 9 — were a different story. The binding made it impossible to lay them flat, so I photographed each page with my phone. That introduced a new problem: the pictures needed to be cropped down to just the page. Partly for looks, but also because the transcription AI does much better with a clean image.

For the cropping I used the Google Photos app on a Samsung tablet. There's an edit feature that lets you drag the four corners freely to match the edges of the page — fitting the frame to whatever angle or shape the photo came out. Painstaking, corner by corner, page by page. But it works. The two bound books were done out of order, which is just how things went. As of this writing the transcription model is still working through them.

For the other notebooks, the picture scanner produced JPEGs that I collected into PDFs, one per notebook. I worked through them in no particular order: notebooks I, IIA, IIB, V, the green bound book, and eventually the two bound books. The big spiral notebook was the last one standing when I left for a Vegas trip. By that point scanning had become routine.

Early on I scanned two pages at a time, side by side. That turned out to be a problem — the transcription had a harder time sorting out two pages in one image. I adjusted as I went, and the later notebooks came out cleaner.

Getting the text out of the images took some experimenting. I tried Google's picture scanner app first, then Google Docs, Gemini, Claude — testing different tools to see what gave the best results. Eventually I set up a workflow on my local machine using a vision model built for reading handwriting. I had Claude write a Python script that processes a whole folder of images, sends them through the AI one by one, and saves everything to a single text file. Hands-free. I refined the script as I went — smoother file handling, better organization. For the two-page scans I updated the prompt to tell the AI to read left page first, then right, with a clear separator between them.
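For anyone curious what that kind of batch script looks like, here is a minimal sketch, not the actual script from this project. The function name, the prompt wording, the separator lines, and the `transcribe` callable (a stand-in for whatever call the local vision model expects) are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable

# Hypothetical prompts; the wording used in the real project is not given here.
SINGLE_PAGE_PROMPT = "Transcribe the handwriting on this page exactly as written."
TWO_PAGE_PROMPT = (
    "This image shows two notebook pages side by side. "
    "Transcribe the left page first, then the right page, "
    "with a line reading '=== RIGHT PAGE ===' between them."
)

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def transcribe_folder(
    folder: Path,
    output_file: Path,
    transcribe: Callable[[Path, str], str],
    two_page: bool = False,
) -> int:
    """Transcribe every image in `folder` into one text file; return the page count."""
    prompt = TWO_PAGE_PROMPT if two_page else SINGLE_PAGE_PROMPT
    # Sort by file name so pages are written in scan order
    # (assumes zero-padded names such as page_001.jpg).
    images = sorted(
        p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    with output_file.open("w", encoding="utf-8") as out:
        for image in images:
            # `transcribe` is a placeholder for the call to the vision model.
            text = transcribe(image, prompt)
            out.write(f"--- {image.name} ---\n{text.strip()}\n\n")
    return len(images)
```

Passing the model call in as a function keeps the folder handling separate from the model itself, so the same loop works whichever tool ends up doing the reading. Setting `two_page=True` switches to the left-then-right prompt for the side-by-side scans.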

The transcriptions aren't perfect. Handwriting never makes it easy. But they're better than you'd expect — good enough to work with, and cleanable where needed. Everything has been uploaded to Google Docs so Tom can get into it whenever he's ready.

One thing that turned up along the way: poetry scattered through the pages. Looks like all of it is original. I mentioned pulling it out separately, but haven't gone further than that.

A friend, Steve Mays, got interested in the project. He's done something similar and brought up Samuel Pepys, the 17th-century diarist who kept a journal for nearly ten years, over a million words. Hard not to think about that in the context of what Tom has.

For the AI processing of the content itself, the prompt I've been working with asks the model to approach it like an archivist — preserve the feeling behind the words, not just the words, and shape it into something readable, something a younger family member could connect with. That's the direction, anyway.

The whole thing has some weight to it. I'm going through treatments right now, and the week after my last session — which got postponed once when my counts were too low — is when I planned to get into it more. Felt like the right time to have something like this put together.
