All humans imagine, and all humans are certainly creative in their own way. There must have been many situations where a vivid visualization popped into your head, but it was far too complex to bring out on paper or as digital art.

Not everyone is skilled enough to pour their creativity out exactly onto paper or into digital art; it is time-consuming and demands a lot of skill. There are people who can just pull out a sheet or a drawing tablet, swiftly move their hand, and produce amazing art. And there are people (like me) who can’t even draw a straight line using a ruler (sigh!).

But worry no more: this is the rise of the generative language model era, and any wild imagination can be visualized just by phrasing it the right way. What exactly do I mean by a generative language model? Lemme explain.

Let me give you a prompt: “Anime, village, midnight, 1900s, lanterns”. You probably pictured something after reading that line, but will you be able to reproduce it? (It’s a rhetorical question, don’t answer it… unless you are way too good at art.) Artificial Intelligence, however, can.

Generative language models are AI models that can take any random line of text, make sense of it, and create their own illustrations based on that understanding. The most astonishing thing is that the results are remarkably accurate (in most cases).

The name is pretty self-explanatory: based on the language, i.e., the given phrase, the model can generate its own illustrations. Now how cool is that! It truly is the state of the art.

The important thing about such models is that they take a lot of time to train and are really expensive, so only a few organizations have built them. You might be wondering: why an article about this? Well, these models have taken the world by storm recently. There are a handful of such models out there, but we are going to look at just three: Stable Diffusion by Stability AI, DALL·E 2 by OpenAI, and Midjourney.

Now let’s go one by one!

Stable Diffusion

Stable Diffusion is a generative language model developed by Stability AI. Here is a little company background: Stability AI, founded by Emad Mostaque in 2020, is on a mission to create open-source AI projects.

Stable Diffusion is a diffusion model that can convert text to images. It was trained using LAION-5B, the largest freely accessible multi-modal dataset.

Okay, now you may ask, “What is a diffusion model?” Well, a diffusion model is a type of AI model that first adds random pixel values (noise) to an image, and then removes those noise values (de-noising) step by step based on conditional probability (the probability of an event occurring given the occurrence of a previous event), repeating the de-noising process until an actual image is formed.

A reasonable analogy: during school days, we did a similar process for art projects. We would apply glue to a paper or board in a particular design, sprinkle glitter over it in random fashion (noising), and then tap off the excess glitter (de-noising) to obtain the final output. Here we know exactly what we want, but the model works on probabilities, so this becomes an iterative process with complex computations, and is thus categorized under deep learning.
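The noise-then-de-noise idea can be sketched in a few lines of NumPy. This is a toy illustration, not the real model: a trained neural network would *predict* the noise at each step, whereas here we simply reuse the true noise to show that removing it recovers the image. All names and the variance schedule are my own assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 4x4 grid of pixel values in [0, 1].
x0 = rng.random((4, 4))

T = 50                                # number of diffusion steps
betas = np.linspace(1e-4, 0.2, T)     # variance schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

# Forward (noising) process: blend the image with Gaussian noise.
# After T steps the result is almost pure noise.
noise = rng.standard_normal(x0.shape)
x_T = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1.0 - alpha_bar[-1]) * noise

# Reverse (de-noising) process: a trained network would estimate the
# noise; here we cheat and use the true noise, so subtracting it
# recovers the original image almost exactly.
x0_hat = (x_T - np.sqrt(1.0 - alpha_bar[-1]) * noise) / np.sqrt(alpha_bar[-1])

print(np.allclose(x0, x0_hat))  # True: de-noising recovers the image
```

In a real diffusion model, the reverse process runs iteratively over all T steps, with the network's noise prediction conditioned on the text prompt.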

Since it is open source, I got hands-on with Stable Diffusion (knuckle cracks). Do you remember what I told you to imagine? Well, here is what Stable Diffusion imagined…

“Anime, village, midnight, 1900s, lanterns”

Beautiful, ain’t it? This is the power of generative language models.

All these mind-blowing demonstrations come at a cost. Stable Diffusion was trained using 256 A100 GPUs for about 150,000 GPU-hours, which altogether cost around $600,000. Now you know why only a few have tried building these models.
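The quoted figure is easy to sanity-check with back-of-the-envelope arithmetic. The roughly $4 per A100 GPU-hour cloud rate below is my assumption, not an official number:

```python
gpu_hours = 150_000        # reported A100 GPU-hours of training
rate_per_gpu_hour = 4.00   # assumed cloud price per A100-hour (USD)

# Total training cost at the assumed rate.
cost = gpu_hours * rate_per_gpu_hour
print(f"${cost:,.0f}")     # $600,000
```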


DALL·E 2

The recent sensation from OpenAI is DALL·E 2, and people were simply in awe after seeing the images it produced. OpenAI has already claimed its position in the field of AI, and mind-blowing creations are nothing new for them.

DALL·E 2 is a generative language model that, like Stable Diffusion, is based on a diffusion model, but it takes a different approach: it builds on CLIP. CLIP stands for Contrastive Language-Image Pre-training; rather than simply labeling images with fixed categories, it learns to match images with their natural-language captions by embedding both into a shared space, which is very useful for training.
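The matching idea behind CLIP can be sketched with hand-crafted vectors. This is purely illustrative: the real CLIP uses learned text and image encoders producing high-dimensional embeddings, while here I invent tiny 3-D vectors so that matching caption/image pairs point in similar directions.

```python
import numpy as np

# Toy stand-ins for CLIP's two encoders (hypothetical values).
text_emb = np.array([
    [0.9, 0.1, 0.0],   # "a lantern-lit village at midnight"
    [0.0, 1.0, 0.2],   # "a demon slayer with yellow lightning"
])
image_emb = np.array([
    [1.0, 0.0, 0.1],   # image A (village)
    [0.1, 0.9, 0.3],   # image B (lightning)
])

def normalize(v):
    # Scale each vector to unit length so dot products become cosines.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine similarity between every caption and every image.
sims = normalize(text_emb) @ normalize(image_emb).T

# Contrastive training pushes matching pairs to the highest similarity;
# here each caption's best match is its corresponding image.
best = sims.argmax(axis=1)
print(best)  # [0 1]
```

In training, CLIP maximizes the similarity of correct caption/image pairs while minimizing it for all mismatched pairs in the batch.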

DALL·E 2 would cost an estimated $131,604 to train on AWS, assuming a p3.16xlarge instance at on-demand market rates. It could be as low as roughly $40,000 if you already paid for reserved instances.
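For a rough sense of what that dollar figure implies, we can back out the instance-hours. The $24.48/hour on-demand rate for a p3.16xlarge (us-east-1, at the time) is my assumption:

```python
on_demand_rate = 24.48    # assumed USD/hour for a p3.16xlarge instance
estimated_cost = 131_604  # quoted training-cost estimate (USD)

# Implied total instance-hours at the assumed rate.
gpu_instance_hours = estimated_cost / on_demand_rate
print(round(gpu_instance_hours))  # ~5376 instance-hours
```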

We are not going to dive deep into DALL·E 2 because we already have an in-depth blog about it by Mrithika Sivakumar. Check out that blog to learn more about DALL·E 2.


Midjourney

Another generative language model on the list is Midjourney, and the company goes by the same name.

Midjourney is a small, self-funded, independent research lab on a mission of “exploring new mediums of thought and expanding the imaginative powers of the human species.”

Midjourney has been used in various areas recently, and the results it produces are simply fascinating. It has its own trademark style of art generation, which differentiates it from the other generative language models.

The model architecture and the cost associated with making the model has not been disclosed at the time of writing this blog.

Midjourney can be accessed through Discord, a social hangout platform. Currently, each user account gets a few free trial generations using the commands provided in the Midjourney Discord server; once the free trial ends, one can opt for subscription-based access to the model. Check the official Midjourney website for more details.

Here are some of the images I generated using Midjourney (obviously on the free trial xD):

Demon slayer zenitsu with yellow lightning around him

Binary Space

Futuristic Mario and bowser castle

Well, we saw all those beautiful artworks generated by AI models, yet they are not perfect. Even after so many hours of training, they still make mistakes, given the sheer amount of data in the world and the processing power required to perfect a model. Certainly, this is the rise of the generative language models; there is so much more room to explore, and one day we will see these models at their peak performance.

These models certainly have their own sense of humor. These ‘art’ificial intelligence models are surely artistic.

These examples show that generative language models are going to be prominent in the future, and there has already been debate about whether these models will replace artists. Here is my take on the topic: these models are far from perfect, and they are not going to replace human artists. Only humans can express feelings and emotions through art; these AI models can understand the meaning of a phrase, but they cannot give their own touch of authenticity to the art they make.

These AI models will support human artists in exploring various art forms and ideas, proving once again that AI is here to co-exist with humans, not overthrow them. Let’s embrace the future and create a better world to live in.
