The "VTuber" and Why Artificial Intelligence has Limits

Serial Experiments Junferno.

0:00 VTuber Technology
21:05 Opinion

Join the Discord:
Secondary Discord invite if vanity invite expires:
Check out my other stuff on GitHub:

Footnotes (cut for time):
- Woody Bledsoe later went on to make contributions in early A.I. pattern recognition. An unknown amount of the research in the Facial Recognition Project is missing, and the research group the report was submitted to (the King-Hurley Research Group) was found to be a front for the CIA. In 2014, the CIA stated that it could neither confirm nor deny the existence or nonexistence of the report.
- Modern marker-based mocap systems can also use LEDs instead of reflective or retroreflective material, since LEDs emit their own light and are easier to detect.
- CodeMiko's mocap hardware involves a combination of an Xsens suit, a pair of Manus VR gloves, and a MOCAP Design facial tracking helmet (which is just a helmet attached to an iPhone X). The suit and gloves use various accelerometer and gyroscope sensors attached to the body to detect motion.
- Infrared light has a longer wavelength than visible light and is commonly used for night vision and surveillance, as warm bodies (including humans) radiate it naturally.
- PrimeSense, the company that designed the depth-sensing technology in the original Xbox Kinect, was acquired by Apple in 2013, and its technology is believed to underlie the TrueDepth camera. Thus, it can be assumed that TrueDepth's IR technology works similarly to the Kinect's.
- The stereo mesh solver from OpenCV can create a 3D surface mesh using an array of cameras at different angles to a face, similar to the Pixel's uDepth.
- Another regression model is logistic regression, which passes the hypothesis through a sigmoid function to bound it between 0 and 1 and then compares the result to a threshold, thus predicting a class (0 or 1) rather than a value from a continuous range. The sigmoid function is also a common activation function for neural networks.
- The cost function is sometimes referred to as the "loss function". These terms are generally interchangeable, though technically "cost" refers to the cost over the entire dataset whereas "loss" refers to the loss for one input-output pair. The cost is thus the average loss across the entire dataset.
- The first ConvNet model used for computer vision was by Yann LeCun et al. in 1989. LeCun was one of the first to train such a network with backpropagation, and the model was able to recognise handwritten digits. It would later be called LeNet.
- Facial recognition (classification) models can also use statistical approaches such as PCA (principal component analysis) or LDA (linear discriminant analysis) for dimension reduction.
- TensorFlow is a free and open-source software library for machine learning and artificial intelligence developed by the Google Brain team. Pre-trained models are available for use and further training via the TensorFlow Model Zoo.
- There are three basic machine learning paradigms: unsupervised learning, supervised learning, and reinforcement learning. Facial landmark detection uses supervised learning, in which the model is trained on labelled data.
- Landmarks are usually mapped to a 3D model via the model's bones (which the renderer uses as a guide to move the rest of the model). The movement does not have to match the captured landmarks exactly. For example, an avatar could have a different degree of skin elasticity.
- Both Hololive and Nijisanji grew out of their respective mobile AR apps: the hololive app for iOS and Android, and the Nijisanji app for iOS.
- HoloMyth debuted with five members: Gawr Gura, Ninomae Ina'nis, Calliope Mori, Amelia Watson, and Takanashi Kiara. They are the first wave of VTubers under Hololive English.
- Hololive has several other branches besides Hololive English. International branches include Hololive Indonesia and Hololive China (though Hololive China was shut down following political controversy). Holostars is the male-only branch.
- OpenSeeFace's models are built on MobileNetV3, a ConvNet designed for mobile phone CPUs. Models can be built on top of other models via transfer learning.
- Kizuna AI went on indefinite hiatus in February 2022 following a last livestream titled "hello, world 2022".
- The A.I. mentioned at the end refers to models as they are today and to the ELIZA effect, not to potential future models that take completely different approaches. An A.I. VTuber would be like an autocomplete framed as a human. Artificial consciousness is a nuanced philosophical topic that is only touched on briefly in this video.
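
The logistic regression and cost/loss footnotes above can be made concrete with a small sketch. This is not code from the video; it is a minimal illustration using only the Python standard library, and the names `sigmoid`, `predict`, `loss`, and `cost`, the 0.5 threshold, and the choice of log loss are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0..1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, features, threshold=0.5):
    """Return (probability, predicted class) for a single input.

    The linear hypothesis is passed through the sigmoid, then compared
    to a threshold (0.5 here is an assumed default) to pick class 0 or 1.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return p, (1 if p >= threshold else 0)

def loss(p, y):
    """Log loss for ONE input-output pair (assumes 0 < p < 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cost(weights, bias, dataset):
    """Cost for the WHOLE dataset: the average of the per-pair losses."""
    losses = [loss(predict(weights, bias, x)[0], y) for x, y in dataset]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # Hypothetical toy data: one feature per input, labels 0 or 1.
    data = [([0.0], 0), ([1.0], 1), ([2.0], 1)]
    print(predict([2.0], -1.0, [1.0]))
    print(cost([2.0], -1.0, data))
```

Here `loss` scores a single prediction while `cost` averages it over the dataset, which is the distinction the footnote draws. With all weights and the bias at zero, every prediction is 0.5, so the cost is ln 2 regardless of the labels.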

References (password: kizuna):

Photos courtesy of Wikimedia Commons, Fandom
Kizuna AI:
Gawr Gura:
Melty Milo, "Vtubers: Creatively Bankrupt":

Music tracklist:
