The Process Generation Generation
Building a new process is an inherently creative act. That’s why it’s a perfect fit for generative AI.
It’s no secret why. GPT-3, DALL-E, Stable Diffusion, and Midjourney grant superpowers (as finicky as they may be) to produce humanity’s core creative outputs with simple prompts. If you have a keyboard, you can now “create” award-winning “art” with just a couple of keystrokes. AI is here, and its first mainstream consumer application is primarily as a toy.
CommonSim-1 is arguably even more impressive (and immediately applicable). It can be prompted with images, videos, and text to synthesize or replicate 3D objects and scenes.
But there was one recent demo that instantly blew my hair back. And it didn’t look like a toy or a game engine. Behold ACT-1.
Where DALL-E and Stable Diffusion are generating images, ACT-1 is generating processes. And that’s why I’m publishing on Automatter for the first time in over a year.
This is a profound moment. Computers have now progressed from process automation through process mining all the way to novel process generation.
Many enterprises also invest in process mining to feed their process automation initiatives. The premise of process mining is simple: software logs how people work and then articulates a discrete process, which can then be iterated and refined. This hasn’t historically been a focus for Automatter since we focus primarily on small firm productivity, but major players include Celonis and KYP.ai on the process mining side and UiPath and Automation Anywhere on the workflow automation side.
Whether you start at the end result and laboriously work backwards with a spreadsheet or a database to think through the process, or let a process miner capture it for you, you are still fundamentally starting with a human-designed process.
In the process generation paradigm, we can train models to classify and parse incoming data and then allow the models to natively define the ensuing process.
There are two big technological unlocks here:
The aforementioned GPT-3, which can generally produce coherent natural language responses to prompts.
Advanced approaches to few-shot learning (including transfer learning) can significantly reduce the amount of training data and supervision required to train new models – so much so that a single person with access to data on the scale of a single operations team could sort it out.
The combination of those two tools enables developers to daisy-chain models together without structured inputs and outputs.
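To make the daisy-chaining idea concrete, here is a minimal sketch in Python. The function names (`classify_email`, `draft_reply`) and their rule-based bodies are hypothetical stand-ins for prompted model calls; the point is only that plain text flows from one model to the next, with no schema or API contract in between.

```python
def classify_email(text: str) -> str:
    """Stand-in for a few-shot classifier: returns a plain-text label.
    In practice this would be a prompted LLM call, not keyword matching."""
    if "invoice" in text.lower():
        return "billing"
    return "general"

def draft_reply(text: str, category: str) -> str:
    """Stand-in for a generative model prompted with the upstream output."""
    return f"[{category}] Thanks for your note about: {text[:40]}"

def pipeline(text: str) -> str:
    # The glue is just passing text along -- no structured inputs or outputs.
    category = classify_email(text)
    return draft_reply(text, category)

print(pipeline("Please find attached the invoice for Q3."))
```

Because the interface between steps is natural language, swapping in a different model (or adding a third step) doesn’t require renegotiating a data format.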
That’s the promise of what Adept demonstrated. They taught a model to use a browser (and a handful of SaaS tools) and then manually prompted it. If you move one step back – start with the data, train a model to understand and parse that data, and then automate your process using that natural language prompt – you can basically speak a workflow into existence, connecting two products (or more!) without an API.
Thus my reaction in the moment:
Levity.ai is another company that has earned my attention. Assuming they achieve their vision, this would allow anyone (not just developers) to create models without structured inputs and outputs. Levity is heavily influenced by the previous generation of no-code tools. You don’t have to squint very hard to see how they’ll progress from model training to process generation – because they’re already doing it.
This brings the power of process mining to teams with few processes to mine in the first place. It also enables new process generation and automation for relatively small or infrequent tasks – including the operational side of closing a round of venture capital. Previously, an ops person might have spent hours painstakingly sending emails manually, each with unique details, closing volumes, and signature packets for every investor. A particularly enterprising or technical ops person could have developed a process in a spreadsheet or database and then written a script in Python, but they might spend more time developing and debugging it than they would have spent copying and pasting a few dozen times.
Now the script is a natural language prompt. With just a little bit of data and a strong prompt, you’ll be able to tell the model what you want, go get a cup of coffee, and come back to see results.
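A sketch of what “the script is a natural language prompt” might look like for the closing-email example above. Everything here is illustrative: `generate` is a hypothetical stand-in for a model call, and the investor records and prompt wording are invented for the example.

```python
# Hypothetical investor data -- in practice this would come from a
# spreadsheet or database the ops person already maintains.
investors = [
    {"name": "Ada", "amount": "$250,000", "packet": "packet-ada.pdf"},
    {"name": "Grace", "amount": "$100,000", "packet": "packet-grace.pdf"},
]

PROMPT = (
    "Write a short closing email to {name}, confirming their {amount} "
    "investment and attaching {packet}."
)

def generate(prompt: str) -> str:
    """Stand-in for an LLM call -- echoes the prompt so the sketch runs."""
    return f"DRAFT: {prompt}"

# One prompt, a few rows of data, a few dozen unique emails.
emails = [generate(PROMPT.format(**inv)) for inv in investors]
for email in emails:
    print(email)
```

The “program” is the prompt string; the loop and the data are the only parts the ops person still has to supply.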
Earlier this month, Ben Thompson interviewed Daniel Gross and Nat Friedman about AI and Nat said this:
There’s just this art in saying, “How are you going to handle hallucination? How are you going to handle errors? How does that make sense in your product? How fast does it need to be?” Which is one of the reasons I think images are doing great, by the way, is that images are fiction. Code is nonfiction, it’s testable, the tests have to pass, it can’t have a syntax error. Images, if there’s an extra stray pixel somewhere, it’s part of the art, there’s no error in a way.
My friend Diana Berlin picked out Nat’s key takeaway and termed it “Find the fiction.”
By asking yourself, “Where’s the fiction in what our users are trying to accomplish?”, you’ll find the spots where fuzzy matching is enough, and where surprise can be welcome.
It’s easy to think of processes, especially long-running or opaque processes, as rigid, permanent fixtures – nonfiction in Nat’s framing. But the dirty little secret of any process is that it’s all made up. No process is a law of nature.
A process is a collective fiction we agree on to make the world run a little more smoothly.
And now a friendly neighborhood ML model can help you write that happy story.
As a lapsed professional video editor, the AI video stuff is its own special rabbit hole that seems to come with substantially less difficult baggage re: intellectual property, the human drive to create, and the economic value of creative work. But I’m sure we’ll get there too. I look forward to wacky takes and easy dunks about how these models will replace artists and whether or not we should remove the scare quotes around “create” and “art.”
Incidentally, the man who came to prominence announcing that the next big thing will start out looking like a toy went all in on crypto right before these generative AI toys showed up and now he’s stuck with it. Legendary L for Chris no matter how many times he was on the Midas List.
All of these companies sell solutions at every step of the workflow automation value chain, as is the nature of large enterprise tech companies. UiPath started as robotic process automation and has tried to move backward into mining. Celonis made its name in mining and has pushed forward into automation. If you want to go deeper on this, see The Generalist’s teardown of UiPath’s S-1, which I contributed to and which led me to predict that Celonis would have an easier time integrating forward into automation than UiPath would have integrating backward into mining.