Why glTF is the JPEG for the metaverse and digital twins



The JPEG file format has played a vital role in the web’s transition from a text-based world to a visual experience through an open and efficient image sharing container. Now the Graphics Language Transmission Format (glTF) promises to do the same for 3D objects in the metaverse and digital twins.

JPEG used several compression tricks to make image files significantly smaller than other formats such as GIF. The latest version of glTF likewise uses compression techniques for both the geometry of 3D objects and their textures. glTF already plays a pivotal role in e-commerce, as evidenced by Adobe’s push into the metaverse.

VentureBeat spoke with Neil Trevett, president of the Khronos Group, which manages the glTF standard, to learn more about what glTF means for businesses. He is also vice president of developer ecosystems at Nvidia, where his job is to make it easier for developers to use GPUs. He explains how glTF complements other digital twin and metaverse formats like USD, how to use it and where it is headed.

VentureBeat: What is glTF and how does it fit into the ecosystem of file formats associated with the metaverse and digital twins?

Neil Trevett: At Khronos, we have put a lot of effort into 3D APIs such as OpenGL, WebGL and Vulkan. We’ve found that every app that uses 3D has to import assets at some point. The glTF file format is widely used and very complementary to USD, which is becoming the standard for creation and authoring on platforms such as Omniverse. USD is the place to be if you want to bundle multiple tools into advanced pipelines and create very high-quality content, including movies. This is why Nvidia is investing heavily in USD for the Omniverse ecosystem.

On the other hand, glTF focuses on efficiency and ease of use as a streaming format. It’s a lightweight, streamlined and easy-to-handle format that can run on any platform or device, right down to web browsers on mobile phones. The tagline we use as an analogy is that “glTF is the JPEG of 3D”.

It also complements the file formats used in authoring tools. For example, Adobe Photoshop uses PSD files to edit images. No professional photographer would edit JPEG files because so much information has been lost. PSD files are more advanced than JPEGs and support multiple layers. You wouldn’t send a PSD file to my mom’s cell phone though. You need JPEG to stream it to a billion devices as efficiently and quickly as possible. USD and glTF thus complement each other in the same way.

VentureBeat: How do you transition from one to the other?

Trevett: It is essential to have a seamless distillation process from USD assets to glTF assets. Nvidia is investing in a glTF connector for Omniverse so that we can seamlessly import and export glTF assets to and from Omniverse. At the Khronos glTF Working Group, we are pleased that USD is meeting the industry’s needs for an authoring format, because that is a huge amount of work. The aim is for glTF to become the ideal distillation target for USD to support widespread deployment.

An authoring format and a delivery format have very different design requirements. The design of USD is all about flexibility. It helps you compose things together to create a movie or VR environment. If you want to import another element and merge it with the existing scene, you must keep all the design information. And you want everything at full resolution and quality.

The design of a transmission format is different. For example, with glTF the vertex information is not very flexible to re-author. But it is sent in exactly the form the GPU needs to run that geometry most efficiently through a 3D API like WebGL or Vulkan. glTF also puts a lot of design effort into compression to reduce download times. For example, Google contributed its Draco 3D mesh compression technology and Binomial contributed its Basis Universal texture compression technology. We are also starting to pay a lot of attention to level of detail (LOD) management so that you can download models very efficiently.
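To make the "GPU-ready form" point concrete, here is a minimal sketch of a glTF 2.0 asset containing a single triangle, built with the Python standard library only. The triangle coordinates are illustrative assumptions, not from the interview; the numeric constants (5126 for 32-bit float components, 34962 for a vertex-buffer target) come from the glTF 2.0 specification.

```python
import base64
import json
import struct

# Three vertex positions (x, y, z) for one triangle; the values are illustrative.
positions = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Pack as tightly laid out little-endian 32-bit floats, the same byte layout
# a 3D API can upload to a GPU vertex buffer without further conversion.
vertex_bytes = b"".join(struct.pack("<3f", *p) for p in positions)

gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [{"mesh": 0}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "buffers": [{
        "byteLength": len(vertex_bytes),
        # Embed the binary payload as a data URI so the file is self-contained.
        "uri": "data:application/octet-stream;base64,"
               + base64.b64encode(vertex_bytes).decode("ascii"),
    }],
    "bufferViews": [{
        "buffer": 0,
        "byteOffset": 0,
        "byteLength": len(vertex_bytes),
        "target": 34962,  # ARRAY_BUFFER: vertex attribute data
    }],
    "accessors": [{
        "bufferView": 0,
        "componentType": 5126,  # FLOAT
        "count": len(positions),
        "type": "VEC3",
        # POSITION accessors must declare their bounds.
        "min": [min(p[i] for p in positions) for i in range(3)],
        "max": [max(p[i] for p in positions) for i in range(3)],
    }],
}

gltf_json = json.dumps(gltf, indent=2)
print(gltf_json)
```

The JSON part only describes how to interpret the bytes; the bytes themselves are already in vertex-buffer layout, which is why a loader has very little work to do before rendering.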

Distillation allows you to go from one file format to the other. A big part of that is stripping out design and authoring information you no longer need. But you don’t want to diminish the visual quality unless you really have to. glTF allows you to maintain visual fidelity, but also gives you the option of compressing things when you’re aiming for a low-bandwidth deployment.

VentureBeat: How much smaller can you make it without losing too much fidelity?

Trevett: It’s like JPEG, where you have a dial to increase the compression with an acceptable loss of image quality, except that glTF has the same thing for both geometry and texture compression. If it’s a geometry-intensive CAD model, the geometry will be the bulk of the data. But if it’s more of a consumer-oriented model, the texture data can be much larger than the geometry.

With Draco, shrinking the data by 5 to 10 times is reasonable without significant loss of quality. There is something similar for textures.
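The "dial" idea can be illustrated with plain uniform quantization: storing each coordinate in fewer bits makes the data smaller but increases the worst-case error. The sketch below is a simplified illustration under stated assumptions, not Draco's actual algorithm; the function names and sample coordinates are hypothetical.

```python
def quantize(values, bits):
    """Map floats onto integer codes in [0, 2**bits - 1] across their own range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


coords = [0.0, 0.1234, 0.5, 0.98765, 1.0]  # illustrative vertex coordinates

for bits in (8, 11, 14):
    codes, lo, scale = quantize(coords, bits)
    restored = dequantize(codes, lo, scale)
    max_error = max(abs(a - b) for a, b in zip(coords, restored))
    # The worst-case error is bounded by half of one quantization step.
    print(f"{bits} bits per coordinate: max error {max_error:.6f}")
```

Fewer bits per coordinate means a coarser grid, so the error bound grows as the storage cost shrinks. Choosing the bit depth is the dial.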

Another factor is the amount of memory required, which is a precious resource on mobile phones. Before Binomial’s compression was implemented in glTF, people sent JPEGs, which was great because they are relatively small. But decompressing them into full-size textures can take up hundreds of megabytes even for a simple model, which can hurt a phone’s power consumption and performance. With glTF textures, you can take a super-compressed texture that is about the size of a JPEG and immediately unpack it into a GPU-native texture, so that it never expands to its full size. As a result, you reduce both data transfer and memory requirements by 5 to 10 times. This can be useful when downloading assets in a browser on a mobile phone.
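A rough back-of-the-envelope calculation shows where the memory savings come from. The sketch below assumes a hypothetical 2048 x 2048 texture and compares uncompressed RGBA (32 bits per pixel) with GPU block-compressed formats at 8 and 4 bits per pixel, the rates of formats such as BC7 or ASTC 4x4 and ETC1 or BC1 respectively. The helper name and the texture size are illustrative assumptions.

```python
MIB = 1024 * 1024


def texture_bytes(width, height, bits_per_pixel, mipmaps=True):
    """Approximate GPU memory for one texture; a full mip chain adds about one third."""
    base = width * height * bits_per_pixel // 8
    return base * 4 // 3 if mipmaps else base


width = height = 2048  # hypothetical texture size

formats = {
    "uncompressed RGBA8 (32 bpp)": 32,
    "block-compressed, 8 bpp": 8,
    "block-compressed, 4 bpp": 4,
}

for name, bpp in formats.items():
    size = texture_bytes(width, height, bpp)
    print(f"{name}: {size / MIB:.1f} MiB")
```

A material with several such maps multiplies these figures, which is how a modest model can reach hundreds of megabytes when every texture is expanded to uncompressed form.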

VentureBeat: How do people effectively represent the textures of 3D objects?

Trevett: Well, there are two basic classes of texture. The most common is a simple image-based texture, such as mapping a logo image onto a t-shirt. The other is a procedural texture, where you generate a pattern, such as marble, wood or stone, just by running an algorithm.

There are several algorithms you can use. For example, Allegorithmic, recently acquired by Adobe, pioneered an interesting texture generation technique now used in Adobe Substance Designer. You often bake this texture into an image because it is easier to process on client devices.

Once you have a texture, you can do more than stick it onto the model like a piece of wrapping paper. You can use these texture images to drive a more sophisticated material appearance. For example, with physically based rendering (PBR) materials, you try to go as far as you can in mimicking the characteristics of real-world materials. Is it metallic, giving it a shiny look? Is it translucent? Does it refract light? Some of the most advanced PBR algorithms can use up to five or six different texture maps that feed parameters characterizing the degree of gloss or translucency.
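As an illustration of how several texture maps feed one material, here is a minimal sketch of a glTF 2.0 material entry written as a Python dictionary. The property names follow the core glTF 2.0 material schema; the material name, the texture indices and the helper function are hypothetical.

```python
import json

# One material that references five texture maps by index into the
# asset's "textures" array (the indices here are placeholders).
material = {
    "name": "example_material",  # hypothetical name
    "pbrMetallicRoughness": {
        "baseColorTexture": {"index": 0},          # surface color
        "metallicRoughnessTexture": {"index": 1},  # metalness and roughness
        "metallicFactor": 1.0,
        "roughnessFactor": 1.0,
    },
    "normalTexture": {"index": 2},     # fine surface detail for lighting
    "occlusionTexture": {"index": 3},  # ambient occlusion
    "emissiveTexture": {"index": 4},   # self-illumination
    "emissiveFactor": [1.0, 1.0, 1.0],
}


def texture_slots(mat):
    """List the texture map names used by a material (illustrative helper)."""
    slots = [k for k in mat if k.endswith("Texture")]
    pbr = mat.get("pbrMetallicRoughness", {})
    slots += [k for k in pbr if k.endswith("Texture")]
    return sorted(slots)


print(json.dumps(material, indent=2))
print(texture_slots(material))
```

Each map supplies per-pixel values for one aspect of the surface, so the renderer can vary metalness, roughness or emission across the object rather than using a single constant.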

VentureBeat: How far has glTF progressed on the scene graph side, representing relationships within objects, such as how a car’s wheels turn or how multiple things connect?

Trevett: This is one area where USD is ahead of glTF. So far, most glTF use cases have been met by a single asset in a single asset file. 3D commerce is an important use case, where you want to bring up a chair and drop it into your living room, as Ikea does. That is a single glTF asset, and many use cases are satisfied with that. As we move into the metaverse and VR and AR, people want to create scenes with multiple assets for deployment. An active area of discussion in the working group is how best to implement multi-glTF scenes and assets and how to link them together. It won’t be as sophisticated as USD, because the emphasis is on transmission and delivery rather than authoring. But glTF will have something in the next 12 to 18 months to allow for the composition and linking of multiple assets.

VentureBeat: How will glTF evolve to support more metaverse and digital twin use cases?

Trevett: We need to start bringing in things beyond physical appearance. Today we have geometry, textures and animations in glTF 2.0. The current glTF says nothing about physical properties, sounds or interactions. I think many of the next generation of glTF extensions will add these kinds of behaviors and properties.

The industry is now deciding that it will be USD and glTF going forward. While there are older formats like OBJ, they are starting to show their age. There are popular formats like FBX that are proprietary. USD is an open source project and glTF is an open standard. People can participate in both ecosystems and help them evolve to meet the needs of their customers and the market. I think the two formats will evolve side by side. The goal now is to keep them aligned and to keep that efficient distillation process between them.
