Machine Learning Artwork (for non-Experts)
The purpose of this page is to give a general overview of how machine learning works, so you have a basic understanding of how to work with these tools. This isn't how any single system works - there's a lot more to it than that, and it's been simplified significantly to get to the point. It's also important to note that pieces here might be wrong for any particular tool (since we mostly don't have access to their inner workings), but they should remain generally correct - if not, update it :).
Overview[edit | edit source]
How would you ask an artist to draw an apple? Probably just describe it simply - "an apple, red, on a brown table" if you wanted to be specific. But regardless of how simple the explanation, you'd probably end up with roughly what you asked for. The artist can visualize the apple, knows what an apple is, and knows how an apple reacts - to light, to gravity, to someone holding it, etc.
What about when the artist has drawn the apple? How do you know it is, indeed, an apple? I bet you'd recognize most human-drawn drawings as an apple, regardless of how abstract or underdeveloped.
What if the artist hasn't seen an apple before? You can generally describe it, maybe sketch one yourself, and give them a general idea - it won't be perfect, but in the apple's case it might be good enough that they could continue.
Machine learning doesn't work like that. Calling it "artificial intelligence" doesn't get the point across correctly for non-experts. People ask questions like "well why doesn't it understand what a hand is?" or "why can't it just understand two people?" or "can we just load in the knowledge like The Matrix?".
Machine learning in this same situation needs:
- A function to generate art from some input text
- A function to check art to see what text it renders to, or to check how close it is to some input text
(and developers generally add their own "secret sauce")
Generally these functions are created by "training" a program with input data. Want to draw an apple? Load up hundreds, thousands, millions of images, and the program gets a sense of what the average "apple" is from this data. It can then use that sense to evaluate an image - is it getting closer to or further away from being an apple? If closer, keep that image, edit it slightly, and check again. It can do this very, very fast, which is what makes these kinds of tools possible.
A close analogy to having a human draw the same way is to ask them to draw an apple - but with their eyes closed, holding the brush in their teeth. After the drawing, you look at it and say "Yeah... that's 2% of the way to being an apple - do it again, just very slightly differently, and I'll let you know how you did. We'll do that over and over until you have what I think is an apple!". But even that glosses over a lot, as we'll discuss a bit below.
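As a toy illustration of that feedback loop, here is a sketch in Python. Everything here is made up for illustration: the "image" is just a short list of numbers, and the scorer compares against a hard-coded target instead of a trained model.

```python
import random

# Stand-in for what a trained model "recognizes" as an apple.
# A real system learns this from huge numbers of images; here it's hard-coded.
TARGET = [0.9, 0.1, 0.4, 0.7]

def score(image):
    """How close is this image to being an 'apple'? 1.0 = perfect match."""
    error = sum((a - b) ** 2 for a, b in zip(image, TARGET))
    return 1.0 / (1.0 + error)

def nudge(image, rng):
    """The 'blindfolded artist': change the image slightly, at random."""
    i = rng.randrange(len(image))
    changed = list(image)
    changed[i] += rng.uniform(-0.1, 0.1)
    return changed

rng = random.Random(42)                   # repeatable seed
image = [rng.random() for _ in TARGET]    # start from pure static
for _ in range(2000):
    candidate = nudge(image, rng)
    if score(candidate) > score(image):   # "that's 2% of the way there..."
        image = candidate                 # keep only changes that help
```

After enough loops the image ends up very close to the target, even though no single step "understood" what it was drawing - each step just made a random change and kept it if the score improved.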
So without getting into jargon like Neural Networks, let's just explore what this means, how it gets you "art", and how you can use that knowledge to succeed.
The Process[edit | edit source]
This is a super simplified diagram showing the process.
- We generate a starting image, and feed it into the process
- We do the Art Process
- We check the output image of the Art Process
- We send the output image back to the Art Process (instead of the starting seed image) to continue with
- This loops until we've done enough loops, and the image is done!
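Those steps can be sketched as a loop. This is a minimal sketch: every function name here is a made-up placeholder, and the "scorer" simply rewards brighter images so the example can run at all - a real tool's scorer is a trained model comparing the image to your prompt.

```python
import random

def make_seed_image(seed, size=8):
    """Step 1: repeatable static from a seed number."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]

def art_process(image, rng):
    """Step 2: propose a few slightly-different versions of the image."""
    return [[px + rng.uniform(-0.05, 0.05) for px in image] for _ in range(4)]

def check_output(variants, scorer):
    """Steps 3-4: score every version and send the best one back around."""
    return max(variants, key=scorer)

def generate(seed, scorer, steps=200):
    image = make_seed_image(seed)
    rng = random.Random(seed + 1)         # drives the random changes
    for _ in range(steps):                # step 5: loop until "done"
        image = check_output(art_process(image, rng), scorer)
    return image

# Toy "prompt": this scorer simply prefers brighter images.
brightness = sum
result = generate(42, brightness)
```

Run this and the final image is much "brighter" than the starting static - the loop drifted toward whatever the scorer rewards, which is the whole trick.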
Starting Seed Image[edit | edit source]
This basically gives the loop a starting point. Generally this is done by generating an image of pure static with a repeatable algorithm driven by a seed number. For a given image size, the same seed should always produce the same static.
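For example, a minimal sketch of repeatable static (real tools generate 2D noise in their own way, but the idea is the same):

```python
import random

def seed_static(seed, width=4, height=4):
    """Repeatable 'static': one grayscale value (0..1) per pixel."""
    rng = random.Random(seed)             # seeded generator => repeatable
    return [[rng.random() for _ in range(width)] for _ in range(height)]

# The same seed and size always give the exact same static...
assert seed_static(1234) == seed_static(1234)
# ...while a different seed gives different static.
assert seed_static(1234) != seed_static(5678)
```

This is why reusing a seed number in these tools lets you reproduce (or deliberately vary) a result.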
Art Process[edit | edit source]
The "art process" can be as simple as "randomly change the image slightly in a specific way", but generally is a bit smarter than that. For the purpose of this page it doesn't matter - the key is that it does things in a usually repeatable way. Maybe it does this just one way, or maybe it generates a bunch of alternatives - either way it makes its changes and passes them on to the Check Output process. At this point it doesn't know if it's done well or not - it just knows it did some stuff.
Check Output Process[edit | edit source]
This is what we really care about. This process looks at an image and attempts to determine how close it is to your original prompt. If given several images, it might pick the "best" one (closest to your prompt) and send it back to the Art Process for another run.
When you give it multiple prompts or words, it's going to use some system of rating the output to balance out the prompts ("apple:: tart::" is looking for the image that's 50% apple and 50% tart). It may or may not use natural language processing ("apple tart" could be treated as literally the same as "apple:: tart::", but a smarter algorithm would say "well apple tart actually means a specific thing, not just two random terms together").
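A sketch of that balancing act - the function and the weight numbers are made up for illustration; each real tool has its own weighting scheme:

```python
def combined_score(scores, weights):
    """Blend per-prompt scores (0..1) into one overall rating,
    weighting each prompt by its share of the total weight."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total

# "apple:: tart::" - equal weights, so each prompt counts for half.
# An image that's a perfect apple but not tart at all rates 0.5 overall.
print(combined_score([1.0, 0.0], [1, 1]))    # 0.5

# Weighting "apple" three times as heavily shifts the balance.
print(combined_score([1.0, 0.0], [3, 1]))    # 0.75
```

The practical consequence: adding a prompt term never comes for free - every term takes some share of the overall score away from the others.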
Either way, the key here is that it's just looking at the output from the art process and checking whether it's improving. If you're curious, you can compare this to something like Google Lens (which is trained on a vast data set and is much better at identifying objects in photos) to see a version of this technology hands-on, on your phone.
Image Done![edit | edit source]
Once you've done the loop enough times, or once the Check Output Process is confident enough that it has what you want, it ends! Here it might still run other algorithms for general cleanup / post-processing that the devs have added on to fix certain issues or add certain functionality.
Other Notes[edit | edit source]
The tool might change the art process and output process throughout (for example, it might finish slightly differently than it started, or use a different algorithm for going from pure static to blobs before switching to one for details, etc.). This is also a vast simplification (this can't be overstated), but it helps build understanding for the next part.
How Can I Use This Knowledge?[edit | edit source]
Knowing that the computer is just semi-randomly trying things, working toward an image it recognizes as your prompt, and doing so with the data it's been trained on...
You can start understanding why it can't "just draw hands in the right place". It doesn't know to look at only parts of the image, it doesn't "know" hands go at the end of arms, it doesn't "know" how many fingers a hand has... It just doesn't work the way a human drawing does. The developers can't teach it the way they would teach a person, and they can't load it with knowledge like in The Matrix. The developers have lots of knobs they can use to improve things (more data, tweaks to the drawing algorithms, added post-processing, etc.) - but as far as I'm aware there is no mechanism to teach a machine to draw anything perfectly the first time :).
The system just has an image recognition algorithm trying to find a "hand" in the image, while likely also trying to find a bunch of other things - and trying to meet all of your demands at once. So what is a user to do?
This is where prompt crafting comes in. Given that the model doesn't have a firm understanding of anything, how do we craft inputs that it reliably evaluates to certain features, styles, or otherwise? How do we reliably improve the "hands" aspect to get correct hands on figures, without having that prompt take over the rest of the image and ruin our result? That's effectively the purpose of this wiki - to help you get what you want, without things you don't want creeping in.