ALY
February 2023
ALY is a tool designed to connect farm producers with consumers, aiming to reduce food waste and improve the efficiency of the food supply chain. The project was developed during the Carlson Analytics for Good Hackathon, where our team was honored with the Most Engaged and Inquisitive Award. My primary responsibilities included data disambiguation, feature engineering of two datasets, database schema design, developing a pipeline for training a machine learning model, and integrating the backend with the frontend.
Background
Our team was provided with operational data from The Good Acre, consisting of orders and contracts between local farmers and consumers, along with information on fulfillment rates. The objective was to use this data to better predict the completion rate of orders and contracts, thereby improving producer-consumer matching and reducing food waste. The datasets provided were unstructured and dispersed across multiple sources. My task, using pandas, was to standardize, clean, and merge the data into a unified dataset that could be stored in a database accessible to the frontend.
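The standardize, clean, and merge step might look roughly like the sketch below. The column names and sample rows are hypothetical stand-ins, since the actual exports from The Good Acre are not shown here; the point is only the general shape of normalizing keys before joining.

```python
import pandas as pd

# Hypothetical column names and values; the real exports are not shown in this write-up.
orders = pd.DataFrame({
    "Farmer Name": [" Green Farm", "green farm ", "Blue Acres"],
    "Product": ["Carrots", "carrots", "Kale"],
    "Qty Ordered": ["100", "50", "30"],
})
deliveries = pd.DataFrame({
    "farmer": ["GREEN FARM", "Blue Acres"],
    "product": ["Carrots", "Kale"],
    "qty_delivered": [120, 30],
})


def standardize(df, renames):
    """Rename columns to a shared schema and normalize the text keys."""
    out = df.rename(columns=renames)
    for col in ("farmer", "product"):
        # Trim whitespace and lowercase so the same farmer/product always matches.
        out[col] = out[col].str.strip().str.lower()
    return out


orders_clean = standardize(
    orders,
    {"Farmer Name": "farmer", "Product": "product", "Qty Ordered": "qty_ordered"},
)
# Quantities arrived as strings in this made-up export; coerce them to numbers.
orders_clean["qty_ordered"] = pd.to_numeric(orders_clean["qty_ordered"], errors="coerce")
deliveries_clean = standardize(deliveries, {})

# Total the orders per farmer/product, then left-join so unfulfilled orders are kept.
ordered = orders_clean.groupby(["farmer", "product"], as_index=False)["qty_ordered"].sum()
unified = ordered.merge(deliveries_clean, on=["farmer", "product"], how="left")
```

A left join is used here so that an order with no recorded delivery still appears in the unified table with a missing delivered quantity, rather than silently disappearing.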
Development
Our team defined three key metrics to evaluate the performance of farmer and contract relationships: On Time & In Full, On Time, and Fulfillment Average.
- On Time & In Full refers to the farmer delivering the exact quantity of produce on the specified date.
- On Time indicates timely delivery, though the quantity might be insufficient.
- Fulfillment Average measures the percentage of requested produce delivered, even if it was not on the requested date.
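As a rough illustration, the three metrics could be computed as below. This is a sketch under assumptions, not our exact implementation: the `Delivery` record and its field names are hypothetical, "on time" is taken to mean delivery on the requested date, "in full" means the delivered quantity equals the requested quantity, and Fulfillment Average is taken as the mean of each delivery's delivered-to-requested ratio, capped at 100%.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Delivery:
    # Hypothetical record layout used only for this illustration.
    requested_qty: float
    delivered_qty: float
    requested_date: date
    delivered_date: date


def is_on_time(d):
    # Assumption: "on time" means delivered on the requested date.
    return d.delivered_date == d.requested_date


def is_in_full(d):
    # Assumption: "in full" means the exact requested quantity was delivered.
    return d.delivered_qty == d.requested_qty


def compute_metrics(deliveries):
    n = len(deliveries)
    otif = sum(is_on_time(d) and is_in_full(d) for d in deliveries) / n
    on_time = sum(is_on_time(d) for d in deliveries) / n
    # Assumption: average the per-delivery ratio, capped at 1.0, regardless of date.
    fulfillment = sum(min(d.delivered_qty / d.requested_qty, 1.0) for d in deliveries) / n
    return {"otif": otif, "on_time": on_time, "fulfillment_avg": fulfillment}


sample = [
    Delivery(100, 100, date(2022, 6, 1), date(2022, 6, 1)),  # on time and in full
    Delivery(100, 50, date(2022, 6, 8), date(2022, 6, 8)),   # on time, short
    Delivery(40, 40, date(2022, 6, 15), date(2022, 6, 18)),  # full, but late
    Delivery(80, 20, date(2022, 6, 22), date(2022, 6, 25)),  # short and late
]
metrics = compute_metrics(sample)
```

With the four sample deliveries above, one of four is on time and in full, two of four are on time, and the capped ratios average to 0.6875.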
While calculating these metrics was relatively straightforward, pairing contracts with the corresponding produce deliveries was harder, because delivery dates were not guaranteed to line up with contract dates. We implemented a greedy algorithm that matched each farmer's produce to the nearest contract, which, although not ideal, was the most feasible solution given the available data.
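One way to sketch a greedy nearest-date matching is shown below. The details are assumptions for illustration: the dictionary keys are hypothetical, candidates are limited to the same farmer and product, and each contract and each delivery is used at most once. The closest pairs are claimed first, which is simple but can leave later pairs with worse matches than a globally optimal assignment would.

```python
from datetime import date


def greedy_match(contracts, deliveries):
    """Return {delivery_index: contract_index}, pairing closest dates first."""
    candidates = []
    for ci, c in enumerate(contracts):
        for di, d in enumerate(deliveries):
            # Only pair records for the same farmer and product (assumed keys).
            if (c["farmer"], c["product"]) == (d["farmer"], d["product"]):
                gap_days = abs((d["date"] - c["date"]).days)
                candidates.append((gap_days, ci, di))

    # Greedy step: walk pairs from smallest date gap to largest.
    candidates.sort()
    used_contracts, used_deliveries, matches = set(), set(), {}
    for gap_days, ci, di in candidates:
        if ci not in used_contracts and di not in used_deliveries:
            matches[di] = ci
            used_contracts.add(ci)
            used_deliveries.add(di)
    return matches


contracts = [
    {"farmer": "A", "product": "carrots", "date": date(2022, 6, 1)},
    {"farmer": "A", "product": "carrots", "date": date(2022, 6, 15)},
    {"farmer": "B", "product": "kale", "date": date(2022, 6, 10)},
]
deliveries = [
    {"farmer": "A", "product": "carrots", "date": date(2022, 6, 14)},
    {"farmer": "A", "product": "carrots", "date": date(2022, 6, 3)},
    {"farmer": "B", "product": "kale", "date": date(2022, 7, 1)},
    {"farmer": "A", "product": "kale", "date": date(2022, 6, 1)},  # no contract exists
]
matches = greedy_match(contracts, deliveries)
```

In this example the June 14 delivery is paired with the June 15 contract, the June 3 delivery with the June 1 contract, and the last delivery stays unmatched because no contract shares its farmer and product.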
Machine Learning Model
Once the data was cleaned and stored in our database, I developed three machine learning models to predict the success of farmer-contract relationships:
- Stochastic Gradient Descent Classifier
- Multivariate Regression Model
- Random Forest Classifier
Because the data spanned only two years, the Stochastic Gradient Descent Classifier and the Multivariate Regression Model performed poorly. The Random Forest Classifier, however, achieved over 80% accuracy. This model was serialized using pickle, deployed on a FastAPI server, and integrated into our frontend.
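The train-then-serialize step can be sketched as follows using scikit-learn and pickle. The features and labels here are synthetic placeholders generated for illustration, not the project's data, and the hyperparameters are assumptions; the sketch only shows that a fitted random forest survives a pickle round trip, which is what allows a server process to load it and serve predictions.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three made-up features and a binary "success" label.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.75).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are illustrative defaults, not the values used in the project.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Serialize the fitted model to bytes and load it back, as a server would at startup.
payload = pickle.dumps(model)
restored = pickle.loads(payload)
restored_predictions = restored.predict(X_test)
```

In a deployed setup the bytes would typically be written to a file and loaded once when the API process starts. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.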
Outcome
This project offered valuable insights into how analytics can address real-world challenges. I gained significant experience working with my team, and I am proud that our efforts were recognized. Below is the final product we presented at the hackathon.