Matterport: Pioneers in 3D Digital Twin Technology

Matterport, a leader in 3D digital twins, has digitized more than 35 billion square feet, making it one of the largest players in the domain. The company’s Vision & Learning team drives its AI/ML capabilities. Alan Dolhasz manages the team’s research activities, with a focus on computer vision and machine learning problems.

The team is also responsible for rapidly assessing new research and converting promising results into fully fledged products that answer vital questions about the scanned spaces. It sits at the core of Matterport's innovation, developing machine learning models on opt-in data that draw on extensive datasets [1] to predict useful information about the spaces. Importantly, Matterport's highly selective approach to data utilization, aligned with customer privacy settings, ensures model accuracy and compliance across diverse data usage preferences.

The Challenges

Before adopting Activeloop, Matterport faced challenges in managing its colossal datasets [2]. With over 7 million scanned spaces, the sheer size of the data posed significant logistical issues.

"Imagine you take a million Matterport spaces, each one might have a hundred photographs taken inside of it. You've got effectively two images that you need to store and maintain for every one of the 10 million items in the dataset. Very quickly, this becomes impossible to carry around"

Alan Dolhasz

Manager, Machine Learning Development at Matterport
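
To make the scale in the quote above concrete, here is a rough back-of-envelope sketch in Python. The space and photograph counts come from the quote; the average image size of 2 MB and the function name `estimate_dataset_size` are assumptions made purely for illustration, not Matterport figures.

```python
def estimate_dataset_size(num_spaces: int, photos_per_space: int,
                          avg_image_bytes: int) -> tuple[int, float]:
    """Return (total image count, total size in decimal terabytes)."""
    total_images = num_spaces * photos_per_space
    total_terabytes = total_images * avg_image_bytes / 1e12
    return total_images, total_terabytes


# Counts taken from the quote: one million spaces, about 100 photographs each.
NUM_SPACES = 1_000_000
PHOTOS_PER_SPACE = 100
# ASSUMPTION: 2 MB per image is a placeholder, not a Matterport figure.
AVG_IMAGE_BYTES = 2_000_000

images, terabytes = estimate_dataset_size(NUM_SPACES, PHOTOS_PER_SPACE, AVG_IMAGE_BYTES)
print(f"{images:,} images, roughly {terabytes:,.0f} TB")
```

Under these assumptions a single modality already reaches hundreds of terabytes, which is why copying the data to each engineer's machine does not scale.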
  • 1

    Rapidly evolving vast data

    The dynamic nature of Matterport's datasets [3] introduced certain challenges. As Alan observed, "With every new engineer undertaking a project, there was an initial phase dedicated to transferring data, which, while necessary, involved a considerable amount of foundational work." As a result, a significant portion of time went into preliminary tasks such as data preparation and basic coding routines.

  • 2

    Lack of standardization

    The absence of a unified standard in data management occasionally led to variations in how ML datasets [4] were created, and so to a less streamlined approach across projects. While this diversity in methods offered flexibility, it also highlighted room to improve organizational coherence.

  • 3

    Experimentation and training models in the cloud

    Setting up a new machine learning project was time-consuming because it involved downloading a large dataset [5] from a cloud storage service such as S3 and moving it back and forth. Transferring and storing these datasets, and tracking changes to them, was slow and complex. As Alan highlighted, "Very quickly, as you scale up, this becomes super hard." This presented a clear opportunity to streamline processes within Matterport's dynamic environment.

The Solution

Multimodal support

With its capacity to handle multimodal data, Deeplake significantly streamlined the data handling process for Matterport's machine learning projects.

"Deeplake just made it super easy for us to scale horizontally the different data modalities that we use."

Alan Dolhasz

Manager, Machine Learning Development at Matterport

Data Standardization

Deeplake provided a uniform, efficient storage format for Matterport's datasets. Stakeholders across teams could store data in an ML-native format, which abstracted away much of the boilerplate code otherwise required to set up a training pipeline for each project.

"Deeplake knocked out like 80 percent of the data random work associated... because once you've done it, that's it. Nobody else has to repeat that process unless you change the dataset."

Alan Dolhasz

Manager, Machine Learning Development at Matterport
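
One way to picture the uniform, ML-native format described above is a single schema that every dataset must satisfy, so that loading code is written once and reused. The sketch below is plain Python and does not use the Deeplake API; the modality names (`rgb`, `depth`, `label`), the `DatasetSchema` and `StandardDataset` classes, and the validation rules are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSchema:
    """Hypothetical schema: modality name -> expected Python type of each sample."""
    modalities: dict = field(default_factory=dict)

    def validate(self, sample: dict) -> None:
        """Raise ValueError if a sample lacks a modality or has a wrong type."""
        missing = set(self.modalities) - set(sample)
        if missing:
            raise ValueError(f"missing modalities: {sorted(missing)}")
        for name, expected_type in self.modalities.items():
            if not isinstance(sample[name], expected_type):
                raise ValueError(f"{name!r} should be {expected_type.__name__}")


@dataclass
class StandardDataset:
    """Append-only collection whose samples all follow one schema."""
    schema: DatasetSchema
    samples: list = field(default_factory=list)

    def append(self, sample: dict) -> None:
        self.schema.validate(sample)  # reject malformed samples at write time
        self.samples.append(sample)

    def __len__(self) -> int:
        return len(self.samples)


# ASSUMPTION: illustrative modalities; real image data would be arrays, not bytes.
schema = DatasetSchema({"rgb": bytes, "depth": bytes, "label": str})
dataset = StandardDataset(schema)
dataset.append({"rgb": b"\x00\x01", "depth": b"\x02\x03", "label": "kitchen"})

try:
    dataset.append({"rgb": b"\x00\x01", "label": "bedroom"})  # no depth map
except ValueError as error:
    print(f"rejected: {error}")

print(len(dataset))
```

Because malformed samples are rejected when they are written, every consumer of the dataset can rely on the same set of modalities without project-specific checks.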

Streaming

With Deeplake's streaming dataloader, Matterport was able to stream data in real time to training frameworks and use compute resources efficiently. Because Deeplake datasets act as 'magic links' within the code, the Matterport team could plug in whichever dataset they wanted and iterate rapidly toward the best model architecture for the problem at hand.

"With Deeplake, it's literally changing one line and we can train on a completely different dataset. This is something that would take at least a day before."

Alan Dolhasz

Manager, Machine Learning Development at Matterport
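
The streaming idea and the "one line" dataset swap described above can be illustrated with a small generator-based loader. This is a conceptual sketch in plain Python, not Deeplake's dataloader: the `FAKE_STORAGE` dictionary stands in for cloud object storage, and the `demo://` URIs, `fetch_chunk`, and `stream_batches` are hypothetical names.

```python
from typing import Iterator, List

# ASSUMPTION: an in-memory stand-in for remote storage, keyed by a made-up URI.
# Each dataset is stored as a list of chunks; each chunk is a list of samples.
FAKE_STORAGE = {
    "demo://floorplans": [[1, 2, 3], [4, 5, 6], [7]],
    "demo://panoramas": [[10, 20], [30, 40]],
}


def fetch_chunk(uri: str, index: int) -> List[int]:
    """Fetch a single chunk on demand (a network read in a real system)."""
    return FAKE_STORAGE[uri][index]


def stream_batches(uri: str, batch_size: int) -> Iterator[List[int]]:
    """Yield fixed-size batches, reading one chunk at a time instead of the whole dataset."""
    batch: List[int] = []
    for index in range(len(FAKE_STORAGE[uri])):
        for sample in fetch_chunk(uri, index):
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # emit the final, possibly smaller batch
        yield batch


# Switching datasets is a one-line change to this string.
DATASET_URI = "demo://floorplans"

for batch in stream_batches(DATASET_URI, batch_size=2):
    print(batch)  # a training step would consume the batch here
```

Training can start as soon as the first chunk arrives, and pointing `DATASET_URI` at another entry changes the training data without touching the loop.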

Data Visualization

Deeplake's powerful UI for complex data visualization allowed the team to share datasets [6] easily for QA, both within the team and with other teams who may be less familiar with their work.

Deeplake allowed Matterport to store and visualize multimodal datasets in one place, setting the team up for fast ML cycles.

Results

Deeplake significantly reduced the time and effort required to get from raw data [7] to training. Implementing Deeplake also led to substantial improvements in Matterport's operations, enabling the team to focus more on core tasks such as iterating on model architecture and less on time-consuming data wrangling. It has freed up resources and made managing complex, multimodal data easier.

"It just abstracted so much of this work away so we could actually focus on the hard problems."

Alan Dolhasz

Manager, Machine Learning Development at Matterport

Increased Productivity

By standardizing the data handling process, Deeplake allowed Matterport to allocate more of their time to business logic rather than infrastructure.

  • 80% less time spent on training data preparation
  • From hours to seconds: time to train on a new dataset

"Deeplake made working on more complex data no more complicated from a data management point of view. Whether I'm working on 10 million images with 10 different modalities or a thousand images with just one modality, it's all the same from the perspective of the user of the system."

Alan Dolhasz

Manager, Machine Learning Development at Matterport

Future Plans

Combining generative AI and property insights, Matterport’s digital twin platform aims to reshape the real estate landscape, optimizing interior design, space utilization, energy efficiency, safety, and accessibility while transforming property marketing strategies.

The company is particularly focused on leveraging multimodal data to modify spaces based on user requests. As they dive deeper into this complex data, Deep Lake's ability to efficiently manage multimodal data will be instrumental in helping Matterport achieve its future objectives.

Disclaimer

[1]-[7] Matterport is dedicated to using only authorized data to enhance and refine their services, with a strong commitment to respecting the privacy preferences of their diverse customer base. For further details, see Matterport's Terms of Use: https://matterport.com/terms-of-use