How to Build a ChatGPT Real Estate Streamlit App in Python

In this video, I'm going to show you how OpenAI can answer questions about Redfin's housing market data set. We will walk through an example of integrating the data with ChatGPT in Python.

Ariel Herrera 0:00

ChatGPT has been a pivotal moment in how humans use technology. I've used ChatGPT's interface and its API to help with everyday activities like writing Python code, asking about real estate trends, and even answering questions about my own data. This has saved me lots of time and made me more efficient, as it can for you too. In this video, I'm going to show you how OpenAI can answer questions on Redfin's housing market data set; we will walk through an example of integrating the data with ChatGPT in Python. My name is Ariel Herrera, data scientist and real estate investor with the Analytics Ariel channel, where we bridge the gap between real estate and technology. Please like this video and subscribe to help us reach a wider audience. All right, let's get started.

In this video, we're going to walk through how to use ChatGPT for our own data set. Right now I'm on ChatGPT's UI, which you can use without any code. Let's say we had a question about days on market. Perhaps we could ask ChatGPT, "What is days on market?", a real estate metric that is tracked by both agents and investors. We can see days on market refers to the number of days a property, such as a house or condo, has been listed on the market before it's sold or taken off. We may want to know what is a good days on market and what is a bad days on market. We could ask, "How do we know if days on market is good or bad?" ChatGPT is trained on a massive amount of data, and it can quickly interpret our question. It says that in a seller's market, when there are more buyers than available properties, a shorter days on market is generally considered good, whereas the opposite holds in a buyer's market. But what if we wanted to ask more detailed questions, and we say, "Hey ChatGPT, how has days on market changed for Florida from May 2022 to, let's say, December 2022?" We want to get more detailed information on real-time data. However, ChatGPT is not able to do so here.
It just gives us general information about days on market, as we already know. But what if we want to actually answer this question with data that we know we have? We can do so by training ChatGPT (OpenAI) on our particular data set. The data set we're going to use is from Redfin. The Redfin Data Center publishes free, downloadable housing market data. This includes data at the national, metro, state, city, zip code, and neighborhood level. I have a whole tutorial on how to pull this information from Redfin by zip code as well. Within my course, Real Estate Data Analytics, we go over how to do this not just for Redfin, but also for Zillow, Apartment List, and more; we also go into creating dashboards. So with this tutorial, I expect that you already know Python, or you've at least taken the free Introduction to Python course that I have on YouTube, as we're going to move through this tutorial as if you're already intermediate in Python. So what does our data set look like? Well, if we were to download this data from Redfin, and I've already done so with Python in tutorial videos that I've already put out, we have a data set that has information on each month for particular regions. In this case, I chose states, so I have a select number of states. We have information on property type, median sale price, and median list price; these are also year over year and month over month, so we can see trends. There's price per square foot, homes sold, inventory information, and we actually have data like average sale-to-list price as well, to see if it's more of a buyer's or seller's market. This is the information that's going to allow us to answer questions like median days on market for a state, which ChatGPT was not able to do originally with its already-trained data. We want to train ChatGPT with this data that we have. I'm going to first walk you through how this app works.
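As a rough sketch, reading a Redfin-style export into pandas looks like the following; the column names and rows below are illustrative stand-ins, not the exact Redfin schema:

```python
import io

import pandas as pd

# Illustrative Redfin-style rows; real exports carry many more columns
# (median list price, price per square foot, homes sold, inventory, etc.).
csv_data = io.StringIO(
    "period_begin,state,property_type,median_sale_price,median_days_on_market\n"
    "2023-03-01,Florida,All Residential,400000,28\n"
    "2023-03-01,Mississippi,All Residential,250000,14\n"
)
df = pd.read_csv(csv_data, parse_dates=["period_begin"])
print(df[["state", "median_days_on_market"]])
```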
And then we're going to work our way backwards to how we actually implement it with Python. So first, I have our data set loaded within Streamlit. It's a super basic Streamlit app, as you can see, as the main functionality I want to get across is using OpenAI, over this being visually appealing. So we see that same exact data frame that I previously had now loaded. I have this data frame within public data on GitHub if you'd like to use the same resource as well. In addition, for the data set, I'm filtering specifically on smile states. Smile states are also known as Sun Belt states; there are 15 states stretching from Florida to California. They got the name because of the warm climates each state has near the southern coast. They've also had some of the largest migration trends in the history of the United States, with people coming from northern states to the southern states for a variety of reasons, including lower costs (like lower or no income taxes), political reasons, and generally just great weather. I'm one of those; I moved from New Jersey to Florida several years ago, and I do not regret it. So for this app, I filtered on those smile states. And now I want to train ChatGPT by first reading my CSV file and then being able to answer questions on it. So I'm going to start with just how the app works, and then we'll work our way backwards to how to implement it with Python. So I've already loaded in my OpenAI API key. And here, I can ask a question. Let's ask ChatGPT a question about our data set. We could ask something simple like: which state has the lowest median days on market? Knowing the lowest median days on market could possibly tell us which state is the hottest, meaning that it's mostly a seller's market. Here we can see Mississippi has the lowest median days on market. If we look at our terminal, we can see that once we ask the question, a SQL query was created.
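To give a feel for the kind of SQL that gets generated for that first question, here is a minimal SQLite sketch; the `states` table and its columns are my stand-ins, not the app's exact schema:

```python
import sqlite3

# Build a tiny in-memory table resembling the app's state-level data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (state TEXT, median_days_on_market INTEGER)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?)",
    [("Florida", 28), ("Mississippi", 14), ("Texas", 21)],
)

# Roughly the query shape for "which state has the lowest median days on market?".
# SQLite returns the bare column from the row where the MIN occurs.
row = conn.execute(
    "SELECT state, MIN(median_days_on_market) FROM states"
).fetchone()
print(row)  # ('Mississippi', 14)
```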
The SQL query got data from the states database, which is the name of our database table, and it looked for the state with the minimum median days on market, that same column we see listed to the right. If we sort this from within our Streamlit app, we can see that the state with the lowest days on market is Mississippi, which is correct. Here, the SQL result got Mississippi and then provided us an answer that was easy to interpret as a sentence. Let's try something a little more complicated. Let's ask which state has had the most growth in median sale price year over year in March of 2023. So now we're asking a more targeted question about a particular column that we have, and we want to see if OpenAI can interpret that we're looking for a trend of growth. It says New Mexico has had the most growth in median sale price year over year. And if we look at the SQL query, we can see a longer query was written. OpenAI gets data from the states database where the period begin is March 2023 and the period end is the same; the columns gathered are state and median sale price year over year. It's ordered by median sale price year over year descending, so that the highest value comes first. We can see New Mexico had a year-over-year (March 2022 to March 2023) increase of about 11% in median sale price, followed by Arkansas, Oklahoma, South Carolina, and North Carolina. What I love about this is that when we receive our results, it again is easy to interpret as a single sentence. And we get the right answer, which is New Mexico.

Let's ask one more question. Let's see if OpenAI can interpret a question that takes in multiple rows of data. So let's say: which state has seen the most price drops over the last six months? Maybe this can indicate to us that this particular state is cooling off because prices are coming down, or maybe there was a lot of wishful pricing that wasn't realistic. This is currently running. And now we see Arizona has had the most price drops over the last six months. Going back to our terminal as we read our SQL query: we selected state, summed the price drops column, and labeled this as total price drops from our database, where the period is over the last six months (so greater than 2022-09-01), ordered by total price drops. And we can see the summary of this data, where Arizona is in fact the state that had the largest price drops when summing them up, followed by Texas, Nevada, Florida, and Georgia. Of course, if we were using this to really analyze our market, we would want to go even deeper, possibly to a zip code or neighborhood granularity, and try to understand: is it really a cooling market where there have been a lot of price drops, or could it perhaps be a lot of wishful listing?

Now let's shift into a walkthrough of how this was developed in Python. So I have here on the left-hand side VS Code, which is my IDE, and I will walk you through the Google Colab version as well, so that you can run this locally. We have imported SQLite for our database, SQLAlchemy to create the engine, and LangChain as well to use OpenAI. We have functions to read in our data set, as well as to prepare it before storing it into the database. We then have locals and constants, which include naming our table (you can name your table anything; I call it the states database), the URI to store this data, a cache so it's only read as we're using it with the Streamlit app, and our OpenAI API key (use the link below to get one). And then here, I want to truncate my data a bit, so I only look at the smile states, and I have them listed here.
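A minimal sketch of the prepare-and-store step described above, assuming a table named `states_db` and a simple column cleanup; the video's exact helper names and cleanup rules may differ:

```python
import sqlite3

import pandas as pd

TABLE_NAME = "states_db"  # assumed name; the video calls it the states database

def prepare_df(df: pd.DataFrame) -> pd.DataFrame:
    """Tidy column names before committing the frame to SQL (illustrative)."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out

sample = pd.DataFrame({"State": ["Florida"], "Median Days On Market": [28]})
conn = sqlite3.connect(":memory:")  # in-memory stand-in for the app's .db file
# Replace the table if it already exists, and do not write the index.
prepare_df(sample).to_sql(TABLE_NAME, conn, if_exists="replace", index=False)
rows = conn.execute(f"SELECT * FROM {TABLE_NAME}").fetchall()
print(rows)
```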
The next step is getting into actually building the Streamlit app, which is our web application. If you haven't already watched, I have a whole series on creating Streamlit apps from ground zero to hero. We have here simply our title and our subheader. Next, once that's set, we want to read in our data. So I have this data stored on GitHub that you can use as well. The data is loaded in as a pandas data frame. We then have an expander that shows us all of our states, so we can collapse it and we can maximize it. The purpose of the multi-select filter we have is so that we can select different states and auto-filter our data frame. So I have it set so that you can select any state within our region, and the ones I want to auto-select are those in the default region, which is my smile states. Then, based on what the user selects, our data frame is filtered down to just those states. So instead of all 50 states, right now it's just the smile states. I have it sorted by period begin, which is why you see ascending is false, meaning that the most recent period comes first, and you see that here. Next, I display the first five rows of the data. So as you see here, you can actually scroll up and down because only the first five rows are present. Then we have a text input. This is where the user can ask their particular question to ChatGPT. Now, of course, the question needs to be asked against data that's in some sort of database, so our next step is to commit this data to SQL. We prepare our data frame by removing underscores in our columns. We then connect to a SQLite database, and then we push this data to our table, the table we call states db. If it's already there, we replace the table, and we do not add the index. Next, we create the database engine.
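The multiselect-and-sort flow described above can be sketched as plain pandas logic; the Streamlit widget calls are shown as comments, and the state list and column names here are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Florida", "Texas", "New York"],
    "period_begin": pd.to_datetime(["2023-03-01", "2023-02-01", "2023-03-01"]),
})

# In the app, the selection comes from a Streamlit widget, e.g.:
# selected = st.multiselect("States", options=df["state"].unique(), default=SMILE_STATES)
selected = ["Florida", "Texas"]  # stand-in for the user's selection

filtered = (
    df[df["state"].isin(selected)]
    .sort_values("period_begin", ascending=False)  # most recent period first
)
print(filtered.head())  # the app shows this with st.dataframe(...)
```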
And as you can see here, we're starting to use LangChain to create a SQL database. We then create an OpenAI connection with a database chain (SQLDatabaseChain), and we have the temperature set to zero, so the creativity scale is pretty much nonexistent; this is straightforward data and we want straightforward answers. We also pass in our OpenAI API key. And lastly, this is where we actually handle the user's question. So if a user asked a question, this would be set to true; then we want to run the database chain on that user's question, and then we want to write the result back to that user, which is where you saw the sentence populate below the text input. The amazing part here is that we created this simple web app all within 100 lines of code, so it doesn't take too much to put this together. If this is a little bit complicated or too deep, no worries: check out my free Introduction to Python set of videos on YouTube, as well as my Real Estate Data Analytics course, where we go into a lot more detail on Python, web scraping, Tableau dashboards, and leveraging different data sources. If you'd like to see a course particularly about this topic of using ChatGPT with real estate data, please let me know in the comments below; I will consider it as a course that could be released this summer. Great. So we just did the walkthrough of how to use the app and how it was created with Python. And now, if you want to run this on your own, you can, without having Python installed at all. Right now I'm on Google Colab, which is free, provided by Google, and allows you to code in a programming language, in this particular case Python. If you use the link below, you can open this and go to File > Save a copy in Drive. This will give you your own unique notebook that you can work on; it's private, just to you. The next step would be Runtime > Run all, which will run all of the cells. It first installs the packages that we need. And then I've taken what's in VS Code.
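The LangChain wiring described above might look roughly like this. This is a sketch only: it needs a live OpenAI API key to run, LangChain's import paths have changed across versions (this is the early-2023 style), and `openai_api_key` and `user_question` are assumed variable names for the values coming from the Streamlit inputs:

```python
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain  # early-2023 import style

db = SQLDatabase.from_uri("sqlite:///states.db")            # assumed database URI
llm = OpenAI(temperature=0, openai_api_key=openai_api_key)  # temperature 0: no creativity
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

if user_question:                         # truthy once the st.text_input has a value
    answer = db_chain.run(user_question)  # LLM writes SQL, runs it, summarizes the result
    st.write(answer)                      # echo a plain-sentence answer back to the user
```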
So that previous code that you saw has been placed in here, where we write it to a file called app.py. This app.py file then gets run, and you have your own unique local port to be able to view your app. So if I select this here, I'm going to be met with this prompt; click Continue, and now we see the same exact app. We have the same filters, and it's reading the data in from GitHub. But we have a new field, which is API key. This is where you will enter your own OpenAI API key. You can then ask questions based on the data set. I highly encourage you to not only work with the data set that I provided for free, but also check out some other data sources that I have relating to my US housing market data API as well. Aside from macroeconomics, you can go deeper into actual properties. So I have a whole series on extracting data from the MLS, particularly with a Zillow scraper. So you can ask questions about which property has the best cash flow in the area, which property is best priced, which property has potential growth; there are so many ways you can take this, which is why I'm so, so excited to publish this and allow you to build upon it. I would love to hear in the comments below about some of the projects you're working on related to ChatGPT. In addition, if you'd like to share on the Analytics Ariel channel, please message me directly at arielherrera@analyticsariel.com. Thanks so much.
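For reference, exposing a Streamlit app from a Colab notebook usually involves writing the script to a file and then tunneling a local port to a public URL. The commands below are a generic sketch of that pattern; the tunneling tool (localtunnel) and port are my assumptions, not necessarily what the notebook linked in the video uses:

```shell
pip install -q streamlit langchain openai   # install the app's dependencies
streamlit run app.py --server.port 8501 &   # run the saved script in the background
npx localtunnel --port 8501                 # expose the local port via a public URL
```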
