Be very careful, because if you look now, you might just get lost in the AI video slop of the future.
A few years ago a paper came out called "Attention Is All You Need". We didn't know it at the time, but it would have a profound effect on humanity, because the folks at OpenAI would take it and turn it into the predictive model now known as the GPT series. By training on a massive corpus of data, mostly internet data, these models learn to predict the next token (a character or word fragment) in a sequence, and that prediction, repeated over and over, starts to look like human intelligence.
What you get back from the text you put in is almost coherent, occasionally even clever, responses that mimic humans: paragraphs pulled from various sources or mashed together based on some semblance of proximity, whether that's a relationship within the given context or just the raw likelihood that if you say cat the next word might need to be fish… gotcha!
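If you want a feel for what "predict the next token" actually means, here's a minimal toy sketch in Python. It uses a made-up corpus and a simple bigram counter as a stand-in for the billions of learned parameters in a real transformer; nothing here is OpenAI's code, it just illustrates the cat-then-fish likelihood idea.

```python
from collections import Counter, defaultdict
import random

# Toy next-token prediction: count which word follows which in a tiny
# made-up corpus, then sample the next word in proportion to those counts.
# Real models like GPT learn these probabilities over billions of tokens
# with a transformer, not a lookup table; this only illustrates the idea.

corpus = "the cat saw the fish and the cat chased the fish".split()

follow_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follow_counts[current][nxt] += 1

def predict_next(word: str) -> str:
    """Sample a plausible next word given the current one."""
    counts = follow_counts[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights, k=1)[0]

print(predict_next("cat"))  # e.g. "saw" or "chased", weighted by frequency
```

Chain that sampling step token after token and you get text generation; swap the counting table for a huge neural network and you get something like GPT.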
Today, though, we've shifted from generating text to generating anything. The models got better and more advanced, and we developed new multimodal models that could process text, audio, video and images. Essentially the models are starting to reproduce some of the core functionality of the very limited parts of our brains that process and regurgitate this information.
So where does that get us now? Well, it gets us to something like Sora 2 or Veo 3.
What are these things? They are video generation models: AI that can generate any sort of short-form video from an input prompt. In time they will likely be able to generate hours of footage, if not continuous live content, but for now it's short-form given the processing power required. So you'd think, wow, this is pretty amazing: we could generate worlds to explore, the stars, we could look deep into atoms and try to understand how molecules work. We could generate all sorts of helpful informational content and inspire people.
Yet where are we? In very short order, Sora 2 has turned into a grab for a new attention-seeking social network that puts AI content first. What that means is we're focused on delivering content even more addictive than anything previously seen on TikTok or other networks.
You're now going to see an infinite stream of videos, generated by humans at first and then by AI algorithms in a recursive loop, all to keep you watching, all to keep you swiping, all for the purpose of making some new conglomerate money. Yeah, it's all about the money. We're not saving humanity; we're in the Matrix farm, in pods, being drip-fed brain-rotting sugar in the form of generative content. Don't take my word for it, here are some initial reviews from Vox, The Guardian and the hated Business Insider.
Listen, this was inevitable. The idea of being able to generate anything and then handing it over to humanity is worse than the raw, uncensored internet of the 90s. I thought that was bad, being a 10-year-old with access to anything, but this is much, much worse. If you can generate literally anything, then where are we headed? Not just celebrity fakes, but extreme violence, pornography, abuse. It's going to be very bad. The future of your attention and our children's attention is on the line. The future of humanity is literally on the line here.
If you are an early user of Sora 2 or related services, please be cautious. If you are prone to doomscrolling and swiping, step away. We don't have a handle on this yet and no way of stopping it, but we can refrain from using it ourselves. We need to know better.
If this is the future of attention, I don’t want to live in this future. Let us find a way to create something else.