Data is the fuel that drives the modern economy. Businesses across sectors demand ever more of it to train the neural networks that power innovation and give them a competitive edge. But some industries and organizations lack the volume of data that deep neural networks require. Commercial real estate investors are one such group, and they are eager to answer a burning question: how do we bridge the gap between small datasets and big data insights?
Luckily, the answer lies in existing AI-based solutions and something called “synthetic data.” Synthetic data is computer-generated data that, while wholly unique, replicates key features of the original data that analysts hope to model. Using sophisticated algorithms, synthesized data can be tailored to the specific needs of the industry, project, or application. Data synthesis also adds accurate labels automatically, helping circumvent the often costly and lengthy process of labeling data by hand. Most importantly, as its name suggests, synthesized data can be generated with far lower risk, in much larger quantities, and essentially from scratch.
How Synthetic Data Unlocks AI for CRE
Despite the immense potential of AI and neural networks, these technologies face practical limitations. The most common hurdle deep learning algorithms must clear is obtaining sufficiently large datasets. Though small datasets would normally rule out deep learning, synthesized data holds exciting promise as a workaround.
The current commercial real estate market simply lacks the volume of data needed to properly fuel deep neural networks. This becomes apparent when comparing the data available for single-family homes with that for multifamily homes. According to the National Association of Realtors, 5.34 million single-family homes were sold in 2018, and the US Census Bureau reports an additional 667,000 new homes were sold that same year.
Now compare these numbers to multifamily assets, where the total number of transactions over the past twenty years amounts to fewer than 200,000. That averages out to fewer than 10,000 deals a year, against roughly 6 million single-family sales in 2018 alone. Single-family home sales generate enough data for neural networks to produce insights, but in the case of multifamily homes (and commercial real estate more broadly), synthetic data could provide the volume needed to bridge the gap.
Synthetic Cities
Commercial real estate decision makers can no longer depend solely on historical, pre-existing datasets when buying or selling properties in this hypercompetitive sector. Bigger and newer datasets, such as those compiled from the web, topological and satellite images, and AI-generated data, are needed for enhanced accuracy, quality, and reliability.
Enter synthetic cities.

A synthetic city simulation would operate as an artificial city model to amplify existing data sources. At Skyline AI, we have begun the planning stages of our own synthetic simulation. To build it, data will be produced through the interplay of two competing AI models. One, the Generator, will attempt to produce forged entries realistic enough to pass as genuine records in a dataset. The other, the Detective, is a data classifier tasked with telling these forged samples apart from real ones. As each model improves against the other, the forgeries become harder to distinguish from real data, enabling the emergence of a synthetic city featuring a large simulated dataset for better insights on rent prices, home-value trends, market anomalies, and risk-reward ratios.
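The Generator and Detective arrangement resembles what the machine learning literature calls a generative adversarial network (GAN). The following is only a toy sketch of that back-and-forth, not Skyline AI's actual system: it assumes a single made-up numeric feature (a standardized rent value), a linear Generator, and a logistic Detective, and every name in it (train_toy_gan, sample_synthetic, real_rents) is hypothetical.

```python
import math
import random


def sigmoid(t):
    # Logistic function, written to avoid overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_rents, steps=2000, batch=32, lr=0.01, seed=0):
    """Alternate Detective and Generator updates on one numeric feature.

    Generator: fake = a * noise + b
    Detective: P(real) = sigmoid(w * x + c)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # Generator parameters
    w, c = 0.0, 0.0  # Detective parameters
    for _ in range(steps):
        real = [rng.choice(real_rents) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Detective step: gradient ascent on
        # log P(real | real sample) + log(1 - P(real | fake sample)).
        gw = gc = 0.0
        for x in real:
            p = sigmoid(w * x + c)
            gw += (1.0 - p) * x
            gc += 1.0 - p
        for x in fake:
            p = sigmoid(w * x + c)
            gw -= p * x
            gc -= p
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator step: gradient ascent on log P(real | fake sample),
        # i.e. nudge the forgeries toward what the Detective accepts.
        ga = gb = 0.0
        for z in noise:
            p = sigmoid(w * (a * z + b) + c)
            ga += (1.0 - p) * w * z
            gb += (1.0 - p) * w
        a += lr * ga / batch
        b += lr * gb / batch
    return (a, b), (w, c)


def sample_synthetic(generator, n, seed=0):
    # Draw n synthetic values from the trained Generator.
    a, b = generator
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]


if __name__ == "__main__":
    # Made-up standardized rents, purely for illustration.
    toy_rents = [1.8, 1.9, 2.0, 2.1, 2.2]
    generator, detective = train_toy_gan(toy_rents)
    print(sample_synthetic(generator, 5))
```

A linear Generator and a logistic Detective are far too simple for real market data; a practical system would presumably use neural networks for both roles and many features per property. The sketch is meant only to show the alternating updates, in which each model's improvement becomes the other's training signal.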
Using our synthetic city, we will run what-if scenarios on the entire market, experiment with various outcomes, supplement missing information in existing datasets, and provide a richer environment for training new models, among many other applications. Overall, we expect the resulting data to sharpen our understanding of our market strategies and value predictions.
Looking Forward
With rapid advances in AI/ML algorithms and techniques, artificially synthesized data can fuel better testing of current models and provide additional validation when responding to fluctuating market dynamics.
For real estate investors, urban planners, and the enterprises of tomorrow in general, the potential applications of synthetic data are worth watching. Simulations can support deal sourcing, underwriting, research, and more. Through this new access to diversified data on urban environments and asset behavior, pioneering companies can enhance their real-world market understanding and their critical decision making.