How to Build a ChatGPT Pandas CSV Streamlit App in Python

In this video, I'm going to show you how to upload any CSV file for OpenAI to answer questions on. We will walk through an example of uploading US housing market data with ChatGPT in Python.

Ariel Herrera 0:00

ChatGPT has transformed the way we interact with technology. I've used ChatGPT's interface and its API to help with everyday activities like writing code for real estate metrics, creating blog content, and even answering questions about my own data. This has saved me lots of time and made me more efficient, as it can for you, too. In this video, I'm going to show you how to upload any CSV file for OpenAI to answer questions on. We will walk through an example of uploading US housing market data with ChatGPT in Python. My name is Ariel Herrera, your fellow data scientist with the Analytics Ariel channel, where we bridge the gap between real estate and technology. Please like this video and subscribe to help us reach a wider audience. All right, let's get started.

ChatGPT has been transformational in the way we speak to technology to ask questions. For example, I'm on ChatGPT's home site right now, where we can ask it anything. If I were thinking of creating a real estate blog about cash flow, I could ask ChatGPT how to calculate cash flow for a real estate deal. ChatGPT is trained on a massive amount of data, and right away we are met with an answer that is easy to digest. We get an answer on how to calculate cash flow and the attributes that are important, like gross rental income, vacancy, other income, expenses, etc. From there, I could take it a step further, perhaps for a blog, and ask more pointed questions related to code. I can ask ChatGPT how to calculate cash flow for a property using Python. ChatGPT is smart enough to recognize that I'm looking for code in this instance. It writes code that allows us to calculate cash flow, which we could then merge with other data, and it provides a real example of how to utilize the code. We can copy it and bring it over to a notebook such as Google Colab. Now, this may be nothing new to you if you've already been exploring ChatGPT.
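To make that concrete, here is a minimal sketch of the kind of cash-flow function ChatGPT might produce for that prompt. The function name, parameters, and example numbers are my own illustration, not ChatGPT's actual output:

```python
def monthly_cash_flow(gross_rent, vacancy_rate, other_income,
                      operating_expenses, debt_service):
    """Estimate monthly cash flow for a rental property.

    gross_rent: scheduled rent per month
    vacancy_rate: expected vacancy as a fraction (e.g. 0.05 for 5%)
    other_income: parking, laundry, application fees, etc.
    operating_expenses: taxes, insurance, maintenance, management
    debt_service: monthly mortgage payment
    """
    effective_income = gross_rent * (1 - vacancy_rate) + other_income
    return effective_income - operating_expenses - debt_service

# Example: $2,000 rent, 5% vacancy, $100 other income,
# $600 expenses, $900 mortgage payment
print(monthly_cash_flow(2000, 0.05, 100, 600, 900))  # 500.0 per month
```

The attributes ChatGPT listed (gross rental income, vacancy, other income, expenses) map directly onto the parameters above.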
If it is, I highly encourage you to ask more questions, as it's free right now to use. But for this instance, let's imagine that we have our own data set. We have information on economic data for real estate and housing, and we want to understand more about our data without having to create charts or dive deep into Excel. How can we get ChatGPT to understand our own individual data source? I'm going to show you how I was able to do this with Python. And stick around to the end, because I will show you how to run the same exact code that I did in Google Colab, without having Python locally installed at all.

Right now we have two windows open. On the left-hand side is VS Code, the IDE that I'm using to write Python code. On the right-hand side I have a Streamlit app, running locally on my machine. I have a full 45-minute tutorial video on how to get started in Streamlit, from ground zero all the way to hero, so I highly suggest checking that out. In this use case we're going to work with a dataset that's a CSV file. This is a data set that I've created and made publicly available, called US housing market data. It takes information from the US Census, FRED, as well as Redfin, to consolidate macroeconomic stats for US housing data. Here I have a sample of the data at the state level. We have information coming from FRED, such as total population, median household income, days on market, ownership rate, and vacancy rate, and we also have information from Redfin, like median sale price, price per square foot, etc. I have selected several states here, most of them smile states. My pain point here is that I have all this data, which is amazing, but I need to be able to ask questions about the data without it taking so long to find the answers. I personally use pandas in order to answer questions about my data. But what if I didn't even really know code?
This is where combining Streamlit and ChatGPT is an amazing resource: you can get ChatGPT to train on your dataset. First, I'm going to show you how the app works, and then we'll work backwards into the code behind it. So in this case, I'm going to browse for files in my Streamlit app, which is being hosted locally. Here I have my file that I just showed you, which has macroeconomic stats at the state level from 2018 to the end of 2022. I'm going to open this file, and I am met with the first few rows of my file. It mirrors exactly what we saw when I opened it up in Excel. Let's start by asking ChatGPT a fairly simple question. We have information by state; we can see here the first row being Alabama, and since I have it sorted by date, we have the first date, which is 2018. Let's say we want to ask ChatGPT: what is the total population for Alabama in January 2018? We type the question here and click enter. Now ChatGPT reads in our database and writes a query to answer our question. We can see here the answer is correct: 4.89 million, which we see in this first row. If we look at our terminal, we can see that when we ask the question, we get "entering new SQL database chain," which I'll go into. We ask the question, "what is the total population in Alabama in January 2018?" After interpreting our data, and using the LangChain package, ChatGPT is able to write up the SQL query on its own. It selects that exact FRED total population column from our states database (which is what we named our database), where the Redfin state equals Alabama and the date equals 2018. It then got the result and answered back to us in a nice sentence. Awesome: we were able to answer a question based on a dataset that we just uploaded, in seconds. Let's now ask a question that's a bit more challenging. We also have unemployment rate in the data.
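To show what that generated query looks like in isolation, here is a small sqlite3 sketch. The table and column names (states_db, redfin_state, fred_date, fred_total_population) follow the transcript's description; the real dataset has many more columns:

```python
import sqlite3

# Tiny in-memory stand-in for the uploaded housing dataset
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE states_db (
    redfin_state TEXT, fred_date TEXT, fred_total_population REAL)""")
conn.execute(
    "INSERT INTO states_db VALUES ('Alabama', '2018-01-01', 4890000)")

# The kind of SQL the chain writes for
# "What is the total population in Alabama in January 2018?"
row = conn.execute(
    """SELECT fred_total_population FROM states_db
       WHERE redfin_state = 'Alabama' AND fred_date = '2018-01-01'"""
).fetchone()
print(row[0])  # 4890000.0, i.e. about 4.89 million
```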
So I'm going to ask a question about the unemployment rate for my home state, which is Florida. I'm going to ask: what is the trend for the unemployment rate for Florida in 2022? I'm asking ChatGPT to look at a segment within my data set, which is just Florida in 2022, and see if the unemployment rate is going up or down. In this case, it says it is trending downwards.

Let's look at the terminal to review the SQL query. Here we can see the SQL query is again getting data from the database, into which we just read our CSV file. It's filtering on redfin.state equals Florida, which is correct, and fred.date LIKE 2022, which is the year we asked for. It then grabs fred.date and fred.unemployment_rate, and it gets the SQL results for the first five rows. I have it set this way, with a limit of five, but this could be expanded, say to a limit of 12, if I want to look at the full year. Here we see that in January 2022 the unemployment rate was 3.5. Then if we look down at May of 2022, it's 2.9. So yes, it has trended downwards, and ChatGPT was correct in its statement.

Now let's dive into the Python code behind how we were able to create this small, lightweight, but powerful app. On the left-hand side, I'm in VS Code, which I use as an IDE. You can also utilize Google Colab. If you are brand new to Python, I highly suggest watching my series of introduction to Python videos, free on YouTube. I also have an entire course on how to review market data, web scraping, Tableau, and more within real estate using Python. So I'm assuming here that you either already have Python skills or have gone through the course. Going back to the code on the left-hand side: in the first part, I'm importing the libraries that I'm going to be using, and this includes LangChain and OpenAI. OpenAI is the creator of ChatGPT. In order for you to use ChatGPT's API, you have to create an account and set up billing with them. If you follow the link below, it takes about two minutes, and it walks you through how to sign up for free for the OpenAI platform and then how to create a new secret key. Once you have that key, you can bring it into your Python code. In the next step of my code I have two functions, which I'll detail in a moment, but first I'm setting up my locals and constants.
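The trend query described above can be sketched the same way with sqlite3. Again, the table and column names are my reconstruction from the transcript, and the unemployment figures beyond January's 3.5 and May's 2.9 are illustrative placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE states_db (
    redfin_state TEXT, fred_date TEXT, fred_unemployment_rate REAL)""")
conn.executemany(
    "INSERT INTO states_db VALUES ('Florida', ?, ?)",
    [("2022-01-01", 3.5), ("2022-02-01", 3.3), ("2022-03-01", 3.2),
     ("2022-04-01", 3.0), ("2022-05-01", 2.9), ("2022-06-01", 2.8)],
)

# The chain filters the year with LIKE and caps how many rows it inspects.
# Raising LIMIT from 5 to 12 would cover a full year of monthly data.
rows = conn.execute(
    """SELECT fred_date, fred_unemployment_rate FROM states_db
       WHERE redfin_state = 'Florida' AND fred_date LIKE '2022%'
       ORDER BY fred_date LIMIT 5"""
).fetchall()
print(rows)  # 3.5 in January falling to 2.9 in May: trending downwards
```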
I have a table called states_db. This table could be named anything; I just have it as a variable right now. Then a URI: for the CSV that I'm uploading, I just want to store it temporarily so that it can be accessed within the app. Next, I'm reading in my OpenAI API key. When you use the Google Colab version, you'll be able to just pop in this key as a text input. Next, I'm creating my homepage, which is the title you see here on the right: ChatGPT Pandas CSV Streamlit. I then have a quick subtitle, or sub header: upload a file to query. Then I use Streamlit's file uploader to allow you to choose a file, which should be a CSV file. It then gets read in by pandas, and I write back the file, which you see here. Next is a text input, also from Streamlit, to ask the user for a question. So here a user could enter their question, which gets stored as a variable called user_q. Afterwards, I commit the data to SQL. I first prepare the file, which includes cleaning the columns to make sure that they're, one, lowercase, and two, that the spaces we see are replaced with underscores. Then we connect this to a SQLite database. We push the data to SQL by stating our table name, which was states_db, and our connection, and if the data is already there, we replace it. We also don't want to save the index, which is this column here, so we set that to False. Once the data is in our SQLite database, we create a database engine. Here we have our URL, a static pool to make a single connection for requests, and then we pass in our connection. We then set our database to SQLDatabase and input the engine. This is required for LangChain. Afterwards, we then create the OpenAI connection and database chain.
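The data-prep steps described above (lowercase the columns, replace spaces with underscores, then push to SQLite with replace and no index) can be sketched like this. It assumes pandas is installed; the sample column names are placeholders standing in for the real dataset's columns:

```python
import sqlite3
import pandas as pd

TABLE_NAME = "states_db"  # table name from the video; any name works

def prepare_columns(df):
    """Lowercase column names and swap spaces for underscores,
    so generated SQL can reference them without quoting."""
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

# Stand-in for pd.read_csv(uploaded_file)
df = pd.DataFrame({"Redfin State": ["Alabama"],
                   "Fred Total Population": [4890000]})
df = prepare_columns(df)

conn = sqlite3.connect(":memory:")
# Mirror the video: replace the table if it exists, drop the index column
df.to_sql(TABLE_NAME, conn, if_exists="replace", index=False)
print(conn.execute("SELECT * FROM states_db").fetchall())
```

In the app, a SQLAlchemy engine created with a static pool then wraps this same connection so LangChain's SQLDatabase can query it.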
So this is where we're actually utilizing ChatGPT's API. We pass in the API key here. Then I set the temperature to zero. You can set it, I believe, from zero to two, the highest being more creative in how OpenAI will answer your question. I just want facts about my data, so I'm setting this to zero. I also have max tokens set as well. I then create the SQL database chain and set this to db_chain. Now the last step: if a user asks a question, I want to run their question through the database chain, which has already been trained on my particular dataset, and then I write the result. Within the terminal, I can actually track what the SQL query was, as well as the result. If this walkthrough was too quick, then again, I highly suggest watching the Python introduction videos that I have for free as well. Let me know below if you'd like to have a course built out around this topic.

What if you do not have Python installed locally, or maybe you've never created a Streamlit app before? That is fine. I have migrated the Streamlit app to Google Colab, which is a free notebook environment. You can make a direct copy of the notebook that I have and save it to your drive. Once you do, you can run all the cells. When you run all the cells, it's going to install the packages required to run this app, which are Streamlit, LangChain, and OpenAI. I've then taken the file that I had in VS Code and pushed it into one single cell, where I label it as app.py. Then below, we're running the Streamlit app in the background, and we're exposing it on port 8501, which is local, so only you will have access to this particular app. Once it runs, you'll see the URL pop up at the bottom; this is going to be unique. Click it and click Continue.
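Putting the pieces together, the wiring described above looks roughly like the pseudocode sketch below. Treat every import path and class name as an assumption to verify against the current docs, since LangChain's module layout has shifted across versions; only the overall shape (uploader, SQLite, SQLDatabase, temperature-zero LLM, SQL database chain) comes from the video:

```
# app.py (pseudocode sketch, not a drop-in file)
import streamlit as st
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain  # version-dependent

st.title("ChatGPT Pandas CSV Streamlit")
st.subheader("Upload a file to query")

uploaded = st.file_uploader("Choose a CSV file", type="csv")
user_q = st.text_input("Ask a question about your data")

if uploaded:
    df = pd.read_csv(uploaded)
    st.write(df.head())
    # ... clean columns and write df to SQLite as shown earlier,
    #     keeping a single connection `conn` ...
    engine = create_engine("sqlite://", poolclass=StaticPool,
                           creator=lambda: conn)  # one shared connection
    db = SQLDatabase(engine)                      # required by LangChain
    llm = OpenAI(openai_api_key=OPENAI_API_KEY,
                 temperature=0)                   # facts only, no creativity
    db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
    if user_q:
        st.write(db_chain.run(user_q))            # SQL shows in the terminal
```

In the Colab version, this whole file goes into one cell written out as app.py, and a follow-up cell installs streamlit, langchain, and openai, then launches the app in the background on port 8501.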
You will now see the Streamlit app. If you'd like to use the same file that I used, you can either sign up for my API, or you can go directly to my GitHub, where I have an example of that same file. Then you'll come back to the app, browse files, and select your file. There's going to be a slight difference from the local Streamlit app that I was running: if we look below, there's now a new text input called API key. This allows you to enter the API key that you've signed up for. Once you enter your API key, click enter, and then you can ask your question. So in this case, I'm going to ask something fairly simple: what is the median household income for Florida in June of 2022? Just like in the previous app, ChatGPT has now answered our question, and we can see that the median household income for Florida in June 2022 was almost $60,000. Great. So what we've been able to do is utilize Python, ChatGPT, and OpenAI to not only answer questions off the web, but questions about our own unique dataset. We've also walked through how to create the same exact app in Google Colab, so that you can run it immediately with your own API key. If you'd like to see more of this type of content, integrating real estate and ChatGPT, please let me know in the comments below, and if you want a deep dive, I'm open to possibly creating a course this summer on it. If you haven't already, please subscribe and like this video. Thanks.

Transcribed by https://otter.ai
