As digital technology seeps further into our workdays, the average person's understanding of computer science will have to deepen. Just look at how words like "operating system," "bandwidth," and "algorithm" have become ingrained in our daily vernacular in just a few short decades. But there is a difference between understanding a term and being able to apply the technology effectively.
One of those terms that I have heard thrown around a lot recently is API. The word is an acronym for application programming interface: basically, a set of rules that lets two different computer systems interact. An API can act as a translator, if you will, between two systems that don't otherwise know how to communicate. These interfaces can push and pull massive amounts of data and have enabled the modern tech stack.
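To make that a little more concrete, here is a minimal sketch of what talking to a property data API can look like in Python. The endpoint URL, API key, and parameter names are entirely hypothetical, invented for illustration; the point is simply that the API defines the "rules" of the exchange.

```python
import requests  # widely used third-party HTTP library

# Hypothetical endpoint and key, for illustration only
API_URL = "https://api.example-property-data.com/v1/properties"
API_KEY = "your-api-key-here"

# Ask the API for data about one address; the API's rules define
# which parameters and headers it accepts
response = requests.get(
    API_URL,
    params={"address": "123 Main St, Salisbury, CT"},
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()  # fail loudly on an error status code
property_record = response.json()  # structured data both sides understand
print(property_record)
```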
For as much as I hear the term "API," I hear very few people talking about the details. If there is one thing I know about computers (maybe one of the only things I know), it's that no detail is minor. One incorrect instruction can send a program into a tailspin. One errant keystroke can render a database useless.
The modern real estate industry runs on data APIs, and it takes more than a passing knowledge of the subject to make use of them. Understanding how they work is critical to judging their usefulness and business implications. I admittedly knew very little about the nuts and bolts of how APIs make the digital connections that they do, so when I came across The Essential Guide to a Property Data API, I was excited to learn more.
The guide is great because you don't need coding experience to understand it, but it still has detailed information that even CIOs and IT departments can use. Even the best explanation is often not enough for a slow learner like me, so I reached out to the company that put the guide together, Estated. The company's CEO, Josh Frasier, was kind enough to answer some questions about all of the things I was still confused about (the API-related ones, at least).
One of the first things I learned from the guide was the difference between synchronous and asynchronous data. Synchronous data is a straight feed: a request goes out and a response comes back immediately, creating a pipeline of information that can be called at any time. I asked Josh what types of companies would want synchronous feeds. He told me, "we've seen companies use synchronous data to build quote generators in insurance and clients in real estate populating their online analytics platforms."
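As a rough illustration, the insurance quote generator Josh mentions could be built on a single synchronous call: request property details, wait for the response, price the quote. The endpoint, field names, and pricing logic below are all hypothetical.

```python
import requests

API_URL = "https://api.example-property-data.com/v1/properties"  # hypothetical

def quote_premium(address: str, api_key: str) -> float:
    """Fetch property details synchronously, then price a toy insurance quote."""
    response = requests.get(
        API_URL,
        params={"address": address},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,  # a synchronous caller waits for the answer, so bound the wait
    )
    response.raise_for_status()
    record = response.json()

    # Toy pricing logic: a base rate plus a charge per square foot
    base_rate = 500.0
    per_sqft = 0.25
    return base_rate + per_sqft * record.get("square_feet", 0)
```

The defining feature is that the caller blocks until the answer arrives, which is why this pattern suits on-screen experiences like quote forms and analytics dashboards.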
If you already have a big data set and need to add to it, you need asynchronous data, particularly a technique called Match and Append. As Josh puts it, "Match and Append would be an option to add data fields, like ownership or tax information, to a subset of addresses that is already known. A real estate investor, for example, might request ownership information for all residential structures within a specific zip code, receive a file that is standardized, then reach out to the owners of target high investment potential properties."
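Asynchronous Match and Append typically works as a batch job: you submit your known addresses along with the fields you want appended, the provider matches them against its records on its own schedule, and you collect the results later. A minimal sketch of that flow, with entirely hypothetical endpoints and response fields:

```python
import time
import requests

BATCH_URL = "https://api.example-property-data.com/v1/batch"  # hypothetical

def match_and_append(addresses: list[str], fields: list[str], api_key: str) -> list[dict]:
    """Submit known addresses, then poll until the provider appends the fields."""
    headers = {"Authorization": f"Bearer {api_key}"}

    # 1. Submit the batch: our addresses plus the fields we want appended
    job = requests.post(
        BATCH_URL,
        json={"addresses": addresses, "append_fields": fields},
        headers=headers,
    ).json()

    # 2. Poll until the job finishes; the provider works on its own schedule
    while True:
        status = requests.get(f"{BATCH_URL}/{job['job_id']}", headers=headers).json()
        if status["state"] == "complete":
            return status["results"]  # standardized records, one per matched address
        if status["state"] == "failed":
            raise RuntimeError("batch job failed")
        time.sleep(30)
```

An investor could then filter the returned records for high-potential properties and pull the appended ownership fields, exactly as in Josh's example.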
Another important lesson I got from the guide was the reminder that data may have licensing limitations. Companies make a lot of money selling the data, so they want to make sure that someone can't just plug into their API and resell it without paying royalties. To prevent this, many data sets have restrictions written into their licenses. This can be a problem for companies that try to sell an analytics tool built on that data as part of their offering. "A real estate investor analyzing data to generate her own market insights wouldn't be a problem," Josh said. "But if that investor owned a business and sold access to her online investment analytics website, this could require paying a royalty fee to the data provider."
Understanding APIs goes hand in hand with understanding data science. Since APIs can connect you with a constantly changing database of information, it is important to be able to define what state the database is in. The terms depth, breadth, and freshness are used to describe the state of the data. Depth has to do with how many fields are collected about each "participant," for our purposes, properties. Breadth has to do with how many participants are in the set, or what percentage of the total population is provided. Freshness, as you probably guessed, has nothing to do with odor but rather with the age of the data.
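All three measures are easy to compute once you have a sample of records in hand. Here is a rough sketch, assuming each record is a dictionary of fields with a hypothetical `updated_at` date; the exact field names will vary by vendor.

```python
from datetime import date

def depth(records: list[dict]) -> float:
    """Average number of populated fields per property record."""
    return sum(
        sum(1 for value in record.values() if value is not None)
        for record in records
    ) / len(records)

def breadth(records: list[dict], total_population: int) -> float:
    """Share of all properties in the area that appear in the data set."""
    return len(records) / total_population

def freshness(records: list[dict]) -> float:
    """Average age of the records, in days (lower is fresher)."""
    today = date.today()
    return sum((today - record["updated_at"]).days for record in records) / len(records)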
Each of these criteria varies depending on who generates the data for each query. For property data, one of those sources is the local county tax assessor. Josh explained: "Access to data is influenced by how the public records are stored and maintained within the geography that the dataset is being pulled from. If the tax assessor's office in Salisbury, Connecticut lacks the funding to provide a strong infrastructure for collecting and sharing the data, then the breadth and depth of its dataset will be less complete for vendors across the board, who will search for other sources, where they exist, to fill in the holes."
To determine the breadth and depth of a data set, Josh suggests testing it. He thinks you can usually assume that most property data sets have near-full coverage, but "don't have unrealistic expectations for success; 100% is not going to be attainable no matter who you are testing."
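Testing is straightforward in principle: take addresses you already know, run them through the vendor's API, and measure the match rate. A sketch of that idea, reusing the hypothetical endpoint from earlier:

```python
import requests

API_URL = "https://api.example-property-data.com/v1/properties"  # hypothetical

def coverage_test(known_addresses: list[str], api_key: str) -> float:
    """Return the fraction of our known addresses the vendor can match."""
    matched = 0
    for address in known_addresses:
        response = requests.get(
            API_URL,
            params={"address": address},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        if response.status_code == 200 and response.json():
            matched += 1
    return matched / len(known_addresses)
```

What counts as a good match rate depends on your market and use case; per Josh's warning, just don't hold out for 100%.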
Freshness, meanwhile, can vary depending on the vendor. "Some vendors publish the dates the data was last updated at the source as well as when it was last updated in their system," Josh said. "Not all vendors have this level of transparency, and it is really the only way to know the quality of the data you are getting."
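When a vendor does expose both timestamps, checking them programmatically is trivial. A sketch, with hypothetical field names standing in for whatever the vendor actually calls them:

```python
from datetime import date

def staleness_report(record: dict) -> str:
    """Compare when the data changed at the source vs. in the vendor's system.

    Assumes hypothetical 'source_updated_at' and 'vendor_updated_at'
    date fields on each record.
    """
    lag = (record["vendor_updated_at"] - record["source_updated_at"]).days
    age = (date.today() - record["vendor_updated_at"]).days
    return f"Vendor lagged the source by {lag} days; record is {age} days old."
```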
The real estate firms that will succeed are the ones that do the best job of staying ahead of the pack. Currently, most of us in the property industry don't have an advanced understanding of the intricacies of computer science, but this is going to change. Knowing how to evaluate and hook up an API separates the sophisticated offices from the rest. Eventually, though, the term API will become just as commonplace as "the cloud" in real estate conversations, and we will have to go searching for another computing idea that isn't yet widely understood.