Sometimes big ideas that have been circling in the atmosphere for a long time finally get codified and set in stone. These events often wind up memorialized because, even though they are not always the origin of the idea, they make an acceptable stand-in for it. The official creation of agreements, treaties, laws, declarations, commandments and commitments is usually done with pomp and circumstance, in dramatic locations fit for the occasion’s grandeur.
Sometimes, though, these events happen under the veil of the mundane. They happen at small, invite-only meetings in roadside conference centers in forgettable towns like Sebastopol, California (before you think I am being too harsh on the Northern California recovering-hippy enclave, I should say that I grew up just minutes from there and know well that the Sebastopicians want nothing more than to be left alone, to live their tranquil, quirky and eco-friendly lives undisturbed under the shade of the redwoods). This is the case for the event credited with solidifying the tech world’s thinking on open data. On December 7th, 2007, a group of 30 tech entrepreneurs, academic researchers and policy wonks met in Sebastopol to write a formal definition for the open data movement.
The principles they decided on weren’t all that groundbreaking:
Open Government Data Principles
Government data shall be considered open if it is made public in a way that complies with the principles below:
1. Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
2. Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3. Timely: Data is made available as quickly as necessary to preserve the value of the data.
4. Accessible: Data is available to the widest range of users for the widest range of purposes.
5. Machine processable: Data is reasonably structured to allow automated processing.
6. Non-discriminatory: Data is available to anyone, with no requirement of registration.
7. Non-proprietary: Data is available in a format over which no entity has exclusive control.
8. License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
But the fact that they created the dialogue around what would soon become one of the main talking points for proponents of internet freedom was significant in its own right. As more and more of the general population gets used to the benefits that open access to data provides, and comes to understand the dangers of having publicly useful data hoarded by powerful internet giants, the push for open data has become a popular topic of conversation.
The property industry has had its own push towards open data. Redfin and Zillow are able to syndicate what were previously private residential listings after years of effort and an eventual DOJ ruling against local Realtor organizations and their Multiple Listing Services. Spencer Rascoff, Zillow’s co-founder and CEO, went on the record to say that “Anyone whose business model is predicated on the assumption that their secret data will remain secret and proprietary, that’s not a sustainable business model. This data will inevitably be free.”
But commercial real estate is not residential. Even though commercial sales records are public, just like their residential counterparts, commercial properties don’t change hands often, so lease revenue is the main way to estimate their market value. These lease terms have always been private, a confidential agreement between the landlord and the tenant. That confidentiality was an asset for large real estate firms, or those with a heavy geographic or sector-specific focus, giving them the ability to understand the market better than their competitors and conduct the information arbitrage that the property industry has always relied on.
Now we see a number of tech companies able to vacuum up data at an industrial scale. These companies can then sell that data to the varied interests, inside and outside of the commercial property space, that use it to make important and costly decisions. They can also use the data to build useful tools that help the real estate industry with the daily work of managing, investing in or marketing properties. The wider tech industry, led by Google, has taken the approach of giving the data away and monetizing the tools. Residential PropTech companies like Zillow and Redfin are doing the same. But in commercial it is a bit different.
There are some companies pursuing an open data model for commercial real estate. One example is the growing number of online marketplaces displaying listings for free. But this data is created by the industry itself, so these companies, along with others like Yardi, MRI, Argus, VTS and Compstak, are beholden to the users that produce it. These companies, with the exception of Compstak, can only publish their data in an anonymized form, if at all. At the last RealComm event there was a panel on the topic where Richard Sarkis, CEO and co-founder of Reonomy, said, “There is still some scar tissue in the industry over whether or not companies are losing their data. We have to convince our clients that they will keep custody and ownership of their data. What does it mean to own the data?” On the same panel Michael Mandel, founder of Compstak, explained, “We have to help the industry understand how we are using data so they can believe us.”
But the open data principles apply to government data, not proprietary data. It is irrational to think that companies whose entire business models revolve around collecting and monetizing unique data sets would give away their digital assets for free. So will we see open, complete, primary, timely, accessible, machine-processable, non-discriminatory, non-proprietary, license-free data in commercial real estate any time soon? Many people hope so, including Candyce Edelen of PropelGrowth, who wrote what is still one of our most popular essays to date, Transparency is Essential to Efficient Markets. While open, transparent data would bring a host of benefits, much like those the financial services industry enjoys with its FIX trading protocol, the reality is that we might never see this level of data accessibility in commercial real estate.
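To give a sense of what that kind of standardization looks like in finance, here is a minimal sketch of reading a FIX-style message in Python. The sample message, the field subset and the pipe delimiter are simplified for illustration; real FIX messages carry many more tags and use the SOH character as a separator.

```python
# A minimal sketch of parsing a FIX-style "tag=value" message.
# Illustrative only: real FIX messages use the SOH (\x01) delimiter
# and include many more fields than the handful named here.

# Human-readable names for a few common FIX tags.
TAG_NAMES = {
    "8": "BeginString",
    "35": "MsgType",
    "49": "SenderCompID",
    "56": "TargetCompID",
    "55": "Symbol",
    "54": "Side",
    "38": "OrderQty",
    "44": "Price",
}

def parse_fix(message: str, delimiter: str = "|") -> dict:
    """Split a delimited tag=value string into a {field_name: value} dict."""
    fields = {}
    for pair in message.strip(delimiter).split(delimiter):
        tag, _, value = pair.partition("=")
        fields[TAG_NAMES.get(tag, tag)] = value
    return fields

# Example: a simplified new-order message for 100 shares of a hypothetical ticker.
sample = "8=FIX.4.2|35=D|49=BUYSIDE|56=BROKER|55=ABC|54=1|38=100|44=25.50|"
print(parse_fix(sample))
```

The point of the sketch is the contrast: in FIX, every field has one unambiguous, industry-wide meaning, which is exactly what commercial real estate data lacks today.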
Anyone who works in technology will tell you that computer code is a liability as much as it is an asset. Code fails, gets corrupted and/or doesn’t get updated. It requires developers to review and fix it from time to time, if not constantly. Data, especially data coming from a lot of different government sources with no incentive to standardize their outputs, is the same way. I remember a conversation I had on one of our podcasts with Richard Sarkis of Reonomy and Josh Fraser, CEO and co-founder of Estated, two companies aggregating public real estate data, about the sorry condition of the data they collect and how much it costs them to get it into a usable form.
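As a rough illustration of what that cleanup work involves, here is a minimal sketch, with made-up field names and records, of normalizing property data pulled from two hypothetical county sources that format the same information differently.

```python
# A minimal sketch of normalizing messy public property records.
# The source records, field names and parsing rules are hypothetical;
# real county data varies far more and usually needs per-source handling.
import re

def normalize_record(raw: dict) -> dict:
    """Map one raw county record onto a common schema."""
    # Different counties label the same field differently.
    address = raw.get("situs_addr") or raw.get("property_address") or ""
    sqft_raw = raw.get("bldg_sqft") or raw.get("gross_area") or "0"

    # Strip commas, "SF" suffixes and other noise before converting.
    sqft_digits = re.sub(r"[^\d]", "", str(sqft_raw))
    return {
        "address": " ".join(address.upper().split()),
        "building_sqft": int(sqft_digits) if sqft_digits else None,
    }

# Two hypothetical records holding the same kind of data, formatted differently.
county_a = {"situs_addr": "100 Main St ", "bldg_sqft": "12,500 SF"}
county_b = {"property_address": "200 ELM AVE", "gross_area": 8750}

print([normalize_record(r) for r in (county_a, county_b)])
```

Multiply that by hundreds of counties, each with its own quirks, and it becomes clear why aggregators charge for the finished product rather than give it away.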
Expecting companies to process this data for free is naive. But what if the data could become standardized? There are organizations working on this, RESO and OSCRE most notably, but many have predicted that even these efforts might fall short. Bob Courteau of Altus Group sparked a conversation on Twitter with his doubts about the complete standardization of the industry. Interestingly, those who criticized him often brought up the need for standardized data as a pathway to open data. In my opinion, Bob’s response reflects the reality of the industry landscape, where even a term as basic and fundamental to the industry as “gross lease” has a different definition depending on what part of the country you ask.
I emailed Jeff Adler, vice president of Yardi’s Matrix data platform, about his company’s path forward when it comes to data. He thinks that real estate data will always be sold with a subscription or transaction fee and told me, “I think any grand data strategy that tries to encompass the entire industry is doomed for failure. Some data sets will become ‘mobile’ but there remain extremely difficult issues in data governance in how terms are defined across sectors and across regions (i.e., square footage, NOI, etc.).”
So as much as the idea of open data appeals to our democratic sensibilities, and as much as it could help unlock value and make the industry more efficient as a whole, we might never see a truly open data ecosystem in commercial real estate. As technology gets better, we might not have to. The data coming in from government sources might always be disorganized and inconsistent, but the work of processing it will only get easier as more money gets thrown at it. There will always be a need for companies to do the hard work of parsing property records, and there will always be companies with no incentive to share their valuable, proprietary data, but those companies will eventually have to compete with others doing the same thing. Market forces will likely push them to compete on what they build with the data, not the data itself, even if we never see the dream that was solidified on that fateful day in Sebastopol come true in our industry.