partly because it uses an
alternative approach that has
come of age in the past year or
so, called a diffusion model. The
network is still trained using
images, but it handles them
differently. It gradually and
deliberately destroys them
by adding noise.
A pristine image has a layer of noise added that degrades it slightly, then more noise is added, and so on, until the image is pure chaos. The AI, a neural network, watches this process and learns how to reverse it. It can then begin with an input that is nothing but noise and work, step by step, towards a photorealistic image – effectively un-destroying a new image into existence.
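The forward, destructive half of that process is easy to sketch in code: each step is just a weighted blend of the image with fresh random noise. The Python snippet below is a minimal illustration, assuming a simplified linear noise schedule; all the names and numbers are illustrative, not OpenAI's implementation.

```python
# Minimal sketch of how diffusion-model training data is made.
# The linear noise schedule and all names are simplifying assumptions,
# not OpenAI's code.
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, num_steps=1000):
    """Forward process: blend a clean image with Gaussian noise.

    At t = 0 the image is untouched; by t = num_steps it is
    essentially pure noise (the "pure chaos" stage).
    """
    alpha = 1.0 - t / num_steps              # fraction of signal that survives
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha) * image + np.sqrt(1.0 - alpha) * noise
    return noisy, noise

# The network is shown the noisy image and the step t, and is trained
# to predict the noise that was added. Generation runs the loop the
# other way: start from pure noise and repeatedly subtract the
# predicted noise, "un-destroying" a brand-new image.
clean = rng.random((64, 64, 3))              # stand-in for a training photo
noisy, target_noise = add_noise(clean, t=500)
```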
When applied to creating images from text descriptions, this approach demands far less computing power than the one used in DALL-E.
What’s more, the results are of higher quality. In a test of the software’s performance, human judges preferred GLIDE’s images to those from DALL-E 87 per cent of the time for photorealism, and 69 per cent of the time for how closely they matched the text input (arxiv.org/abs/2112.10741).
Although each GLIDE image
still takes 15 seconds to create
on an A100 graphics processing
unit (GPU) that costs upwards
of £10,000, the work represents
an important step forward, says
Malekmohamadi. “I’m glad to
see that this kind of research
direction is leading toward a
smaller model that could be
trained on less powerful
GPUs,” he says.
The method of destroying
data to train the AI may seem
counter-intuitive. “You take an
image that’s pristine and clear
and you take it all the way
down to the point where it’s
completely unrecognisable;
[the AI] is in fact learning the
opposite, which is taking
something that’s completely
unrecognisable and ‘restoring’
it back to pristine condition,”
says Mark Riedl at the Georgia
Institute of Technology in
Atlanta. He believes that diffusion models such as GLIDE will have a big impact on photo
editing. “Photoshop will
become neural,” he says.
The OpenAI researchers, who
weren’t available for interview,
say in their paper that GLIDE can struggle to produce realistic images for complex prompts. To address this, they added the ability to edit the initial images.
Users can ask GLIDE to create
“a cosy living room” and then
select a region of the resultant
picture and ask for more details,
such as “a painting of a corgi on
the wall”. Riedl believes that this
sort of process will one day be
seen in commercial software.
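One generic way to do this kind of region-based editing with a diffusion model is to run the usual denoising loop while clamping every pixel outside the selected region back to the original image. The Python sketch below shows that idea under stated assumptions: `denoise_step` stands in for a trained network (a dummy is supplied so the snippet runs), and this is a common inpainting trick rather than GLIDE's own editing code.

```python
# Generic masked-editing sketch; `denoise_step` is a hypothetical
# stand-in for a trained diffusion network, not GLIDE's API.
import numpy as np

rng = np.random.default_rng(1)

def inpaint(image, mask, denoise_step, num_steps=50):
    """Regenerate only the masked region, keeping the rest fixed.

    `mask` is True where the user asked for new content and False
    elsewhere. After every denoising step the unmasked pixels are
    reset to the original, so the model can only invent content
    inside the selected region.
    """
    x = rng.standard_normal(image.shape)      # start from pure noise
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)                # model removes a little noise
        x = np.where(mask, x, image)          # clamp pixels outside the mask
    return x

def dummy_denoise_step(x, t):
    # Stand-in for a trained network, only so the sketch runs end to end.
    return 0.9 * x

room = rng.random((64, 64, 3))                # the "cosy living room"
mask = np.zeros((64, 64, 3), dtype=bool)
mask[8:24, 8:24, :] = True                    # region for the corgi painting
edited = inpaint(room, mask, dummy_denoise_step)
```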
❚