Houser
January 2024
Motivation
It was the winter of 2024 when I received my offer from Robinhood and connected with some of the other interns in my cohort. With a housing budget of around $2,000 per month, as outlined in my offer, I began the all-too-familiar search for a place to stay. Trust me, it's a tedious and time-consuming task. Between scrolling through AirBNB, Zillow, Apartments.com, and Furnished Finder, trying to find something within budget, with individual bedrooms, and close enough to work or public transit felt like navigating a maze.
There's nothing worse than finding what seems like the perfect place, where everything checks out, only to realize the commute is nearly two hours. Or, even worse, finding a spot that meets all your criteria (within budget, enough bedrooms, close to work) only to discover you're a few days late and the landlord is already renting it out to someone else. The housing market in California is a mess, and this was almost a daily occurrence. After getting fed up, I decided I'd use computers to optimize the process.
Overview
In order to solve this problem, I broke down the issue into the following components:
- Gather Housing Data
  - Rent Price (Amortized Monthly Granularity)
  - Number of Bedrooms
  - Location
- Store These Options
  - Prevent Duplicates/Banned Data
- Calculate Commute
- Identify Most Optimal Solution
Gathering Housing Data
I began by trying to use the requests package in Python to parse the websites, but this quickly ran into a large issue. Most websites sit behind tools like Cloudflare that block this sort of traffic by subjecting requests to JavaScript challenges that can only be solved by a browser. This meant I would get blocked from parsing fairly quickly. To get around it, I used selenium to drive a headless browser, which was capable of passing these challenges.
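As a rough sketch, the headless setup looked something like this (the URL and CSS selector here are placeholders rather than the real per-site logic):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Run Chrome without a visible window so the scraper can live on a server
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

# Placeholder URL and selector; each real scraper targets that site's own markup
driver.get("https://www.example.com/listings/some-city")
for card in driver.find_elements(By.CSS_SELECTOR, "div.listing-card"):
    print(card.text)

driver.quit()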
One of the challenges I faced was that sites like AirBnB and Furnished Finder didn't list the exact location, only a radius of where the place sits. Luckily, both of them embed a Google map with markers, from which I was able to parse out the longitude and latitude. Other sites simply listed the address, which I converted into longitude and latitude through the tomtom API.
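The geocoding step was a small helper along these lines (the endpoint path, response fields, and sample address are based on my reading of the TomTom geocoding docs, so treat them as approximate):

import requests
from urllib.parse import quote

TOMTOM_KEY = "..."  # personal API key

def geocode(address: str) -> tuple[float, float]:
    # TomTom's geocoding endpoint returns candidate positions for a free-form address
    url = f"https://api.tomtom.com/search/2/geocode/{quote(address)}.json"
    resp = requests.get(url, params={"key": TOMTOM_KEY}, timeout=10)
    resp.raise_for_status()
    position = resp.json()["results"][0]["position"]
    return position["lat"], position["lon"]

# Example (made-up address)
lat, lon = geocode("123 Main St, Menlo Park, CA")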
Scraping rent and bedroom counts was easier, since both are typically listed explicitly on the page. Using systemctl, I scheduled these scrapers to run at an hourly granularity, which meant I was continuously gathering data.
Storing the Data
I kept this data in a simple SQLite
database as it was only being used locally. The schema could be boiled down to the following:
- Banned (explained later)
- Link
- Longitude
- Latitude
- Time to Office
- Time from Office
- Mode for Transport To
- Mode for Transport From
- Bedrooms
- Monthly Rent
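In sqlite3 terms, that boils down to something like the following (the table and column names here are illustrative; the real schema differed slightly in naming):

import sqlite3

conn = sqlite3.connect("housing.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS listings (
        Link      TEXT UNIQUE,    -- posting URL, also used to de-duplicate
        Banned    INTEGER DEFAULT 0,
        Longitude REAL,
        Latitude  REAL,
        ToTime    INTEGER,        -- seconds to the office
        FromTime  INTEGER,        -- seconds from the office back home
        ToMode    TEXT,           -- transit / walking
        FromMode  TEXT,
        Beds      INTEGER,
        Rent      REAL            -- amortized monthly rent
    )
""")
conn.commit()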
Preventing Duplicates
Occasionally not many new postings would go up within the hour, so the parsers would sometimes revisit old entries. Enforcing the UNIQUE keyword and having the parsers check the database before inserting prevented duplicates.
Additionally, some entries were misrepresented in their listings and would get picked as the most optimal choice even when they weren't. Adding the Banned flag meant I could make sure these were ignored when running the optimization model.
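Concretely, the insert path looked roughly like this, reusing the illustrative listings table from above:

import sqlite3

conn = sqlite3.connect("housing.db")

def insert_listing(conn, row):
    # INSERT OR IGNORE respects the UNIQUE constraint on Link, so revisiting
    # an old posting becomes a no-op instead of a duplicate row
    conn.execute(
        "INSERT OR IGNORE INTO listings (Link, Longitude, Latitude, Beds, Rent) "
        "VALUES (?, ?, ?, ?, ?)",
        (row["link"], row["lon"], row["lat"], row["beds"], row["rent"]),
    )
    conn.commit()

# Banned or misrepresented listings simply never reach the optimizer
candidates = conn.execute("SELECT * FROM listings WHERE Banned = 0").fetchall()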
Calculating Commute
While I hinted at this in the Gathering Housing Data section, the application used the tomtom API. For each house, the API would estimate the commute time using public transit or walking, and we stored whichever strategy took the least time, along with that time in seconds.
The same step also calculated the opposite direction, the time needed to go from the office back home, and recorded that as well.
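A sketch of that lookup, assuming TomTom's calculateRoute endpoint (the endpoint shape, travel modes, and office coordinates below are illustrative and from memory of the docs, not a verified copy of what I ran):

import requests

TOMTOM_KEY = "..."
OFFICE = (37.4848, -122.1484)   # illustrative office coordinates

def commute_seconds(origin, dest, mode):
    # calculateRoute takes "lat,lon:lat,lon" pairs and a travelMode
    locs = f"{origin[0]},{origin[1]}:{dest[0]},{dest[1]}"
    url = f"https://api.tomtom.com/routing/1/calculateRoute/{locs}/json"
    resp = requests.get(url, params={"key": TOMTOM_KEY, "travelMode": mode}, timeout=10)
    resp.raise_for_status()
    return resp.json()["routes"][0]["summary"]["travelTimeInSeconds"]

def best_commute(house):
    # Try each mode in both directions and keep the fastest option
    to_office = {m: commute_seconds(house, OFFICE, m) for m in ("pedestrian", "bus")}
    from_office = {m: commute_seconds(OFFICE, house, m) for m in ("pedestrian", "bus")}
    to_mode = min(to_office, key=to_office.get)
    from_mode = min(from_office, key=from_office.get)
    return to_mode, to_office[to_mode], from_mode, from_office[from_mode]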
Identifying the Optimal Solution
To approach this problem I used constrained optimization over all of the gathered data to find the combination of houses that gave every intern their own bedroom, minimized cost, and kept the commute from being too difficult.
Objective Function
The model minimizes total rent plus a weighted penalty on round-trip commute time, subject to the selected houses covering every bedroom we need:

$$\min \sum_{i=1}^{N} \left( r_i + w \left( t_i^{\text{to}} + t_i^{\text{from}} \right) \right) x_i \quad \text{subject to} \quad \sum_{i=1}^{N} b_i x_i = B, \quad x_i \in \{0, 1\}$$

where:

- $N$ is the number of non-banned entries
- $r_i$ is the rent of the $i$-th house
- $w$ is the cost for every second of transit (constant)
- $t_i^{\text{to}}$ is the time to the office for the $i$-th house
- $t_i^{\text{from}}$ is the time from the office back home for the $i$-th house
- $b_i$ is the number of bedrooms in the $i$-th house
- $B$ is the total number of bedrooms needed
- $x_i$ is the binary decision variable: 1 if we take the house, 0 if we don't
These objectives and constraints were modeled in Pyomo (see the snippet below) and deployed on the NEOS Server, which solved them with CPLEX.
This job ran on an hourly basis, which meant that for any candidate solution we could typically contact the property owners within the day, if not within a few hours.
The answers were published to a Discord webhook, so even when I was away from my device I still knew the current best housing choice.
from pyomo.environ import Constraint, Objective, minimize

# model.x is a binary Var indexed by model.N: 1 if we take house i, 0 otherwise.
# Minimize total rent plus a weighted penalty on round-trip commute time.
def obj_rule(model):
    return sum(
        (df.iloc[i]["Rent (Total)"] +
         TransitWeight * (df.iloc[i]["ToTime"] + df.iloc[i]["FromTime"])) * model.x[i]
        for i in model.N
    )
model.obj = Objective(rule=obj_rule, sense=minimize)

# The selected houses must together provide exactly the bedrooms we need
model.BedConstraint = Constraint(
    expr=sum(df.iloc[i]["Beds"] * model.x[i] for i in model.N) == Num_Beds
)
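Publishing the result to Discord is then a single POST to the webhook URL (the URL and message format below are placeholders):

import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/..."  # placeholder

def notify(chosen):
    # chosen: list of (link, rent, commute_seconds) tuples for the selected houses
    lines = [f"{link} | ${rent:,.0f}/mo | {secs // 60} min commute" for link, rent, secs in chosen]
    requests.post(WEBHOOK_URL, json={"content": "Best housing combo:\n" + "\n".join(lines)}, timeout=10)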
Outcome
This project turned out to be an INCREDIBLY successful one. We managed to find an eight-bedroom, seven-bathroom house within an hour of its posting, just a 20-minute commute from the office. Even better, the bus stop was conveniently located right in front of the house and dropped off right at the office, making the commute a breeze.
- House Cost $1,672 per month (including utilities)
- Transit Time < 30 minutes
- Everyone had their own bedroom (also sometimes their own bathroom too!)
On top of that, the house came with some awesome perks: a fantastic rooftop perfect for coding sessions, an excellent kitchen, and I even scored a massive bedroom WITH a balcony!
Future Work
Overall, while I was incredibly happy with how the project turned out and the results it generated, I know there are definitely improvements that could be made to the system.
Removing Sample Space
Sometimes listings get taken off the internet or will simply always be suboptimal. Removing or banning these would give the algorithm fewer variables to consider and let it scale better.
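A simple version of that cleanup could be a periodic pass that bans anything the scrapers haven't re-confirmed recently; the LastSeen column below is hypothetical and not part of the schema described earlier:

import sqlite3, time

conn = sqlite3.connect("housing.db")
week_ago = time.time() - 7 * 24 * 3600
# Ban anything the scrapers haven't re-confirmed in the last week
conn.execute("UPDATE listings SET Banned = 1 WHERE LastSeen < ?", (week_ago,))
conn.commit()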
Optimizing Distance Between Houses
We weren't sure we would get all eight people into one house. Truth be told, I anticipated we'd need at least two. This resulted in a separate optimization function that also minimizes the distance between the two houses, and it generalizes to more.
The objective gains a pairwise-distance penalty on top of rent and commute, so the chosen houses also end up close to each other:

$$\min \sum_{i=1}^{N} \left( r_i + w \left( t_i^{\text{to}} + t_i^{\text{from}} \right) \right) x_i + c \sum_{i=1}^{N} \sum_{j>i} d_{ij} \, x_i x_j \quad \text{subject to} \quad \sum_{i=1}^{N} b_i x_i = B, \quad x_i \in \{0, 1\}$$

where:

- $N$ is the number of non-banned entries
- $r_i$ is the rent of the $i$-th house
- $w$ is the cost for every second of transit (constant)
- $c$ is the cost for distance between houses (constant)
- $t_i^{\text{to}}$ is the time to the office for the $i$-th house
- $t_i^{\text{from}}$ is the time from the office back home for the $i$-th house
- $b_i$ is the number of bedrooms in the $i$-th house
- $B$ is the total number of bedrooms needed
- $d_{ij}$ is the distance between house $i$ and house $j$
- $x_i$ is the binary decision variable: 1 if we take the house, 0 if we don't
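In the same style as the earlier snippet, the change is an extra quadratic term in the objective. DistanceWeight and dist are hypothetical names for the constant $c$ and the precomputed pairwise distances, and a quadratic objective needs either a solver that accepts it or a standard linearization:

# Extends the earlier model: dist[i][j] is the precomputed distance between
# houses i and j, DistanceWeight is the constant c from the objective above
def obj_rule_spread(model):
    rent_and_transit = sum(
        (df.iloc[i]["Rent (Total)"] +
         TransitWeight * (df.iloc[i]["ToTime"] + df.iloc[i]["FromTime"])) * model.x[i]
        for i in model.N
    )
    spread = sum(
        DistanceWeight * dist[i][j] * model.x[i] * model.x[j]
        for i in model.N for j in model.N if i < j
    )
    return rent_and_transit + spread

model.obj = Objective(rule=obj_rule_spread, sense=minimize)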
Improving Scalability
Currently, this process runs fine on a Raspberry Pi or other single-node device, but it would be interesting to see how it grows in scale, or whether it could be marketable. I think something like this has the potential to be used more generally, but figuring out that market and business plan is something I'll leave as an exercise for the reader.