I don’t know if this community is intended for posts like this; if not, I’m sorry and I’ll delete this post ASAP…
So, I play TTRPGs (mostly online) and I’m a big fan of visual aids, so I wanted to create some images of my character in the new campaign I’m playing in. I don’t need perfect consistency, since people usually change a little over time. I only need the character to be recognizable in a couple of images that are usually viewed on their own rather than side by side, so nothing like the consistency you’d need for a comic book or something similar. So I decided to create a Textual Inversion following this tutorial, and it worked far better than expected. After fewer than 6 epochs I had enough consistency for my use case, and it still hadn’t started to overfit by the time I stopped the training around epoch 50.
Then my SO, who’s playing in the same campaign, asked me to do the same for their character. So we went through the motions, creating and filtering the images. On the first training attempt, the TI started to overfit halfway through the second epoch, so I lowered the learning rate by a factor of five and started another round. This time the TI started overfitting somewhere around epoch 8 without ever reaching consistency. The generated images alternate between a couple of similar yet distinguishable faces. To my eye, the training images are of similar or higher quality than the ones I used in the first set. Was I just lucky with my first TI and unlucky with the other two, and should I simply keep trying? Or is there something I should change (like the learning rate, which at 0.0002 still seems high to me, judging from other machine learning topics)?
That’s a great tip, thanks for posting it. Both methods seem useful to me; it just depends on the use case.