Week 6: Design Phase (Constructing and Preparing the Dataset)
- Feb 27, 2025
This week focused on building and preparing the dataset for my ball detection model, marking a shift from planning to actual implementation. I started by gathering data from universe.roboflow.com, a platform that hosts public datasets across many domains, including sports. I was able to find datasets there specifically suited to ball tracking and detection.
Once I had selected the data I needed, I used Roboflow’s web-based tool to manage the annotation process. It let me upload, annotate, and organize the dataset in a single workspace, and I could label images directly on the site without any additional software. After annotating, I created a version of the dataset through Roboflow’s built-in versioning system, which lets me keep track of updates and export the dataset in formats compatible with different model training frameworks.
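As an illustration of what one of those export formats contains, the YOLO text format stores one object per line: a class ID followed by the box center, width, and height, each normalized by the image size. The sketch below shows that conversion from a pixel-space box. The function name and the example numbers are my own for illustration, not part of Roboflow's tooling, which handles this conversion automatically on export.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to one line of a YOLO-format label file."""
    # YOLO labels use the box center and size, normalized to the 0..1 range.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a 40x40 px ball centered in a 640x480 frame, class 0.
print(to_yolo_line(0, 300, 220, 340, 260, 640, 480))
# → 0 0.500000 0.500000 0.062500 0.083333
```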
For my training workflow, I chose YOLOv8 for its accuracy, speed, and strong community support. Roboflow’s integration with Google Colab let me export the annotated dataset and begin training the model in the cloud. Roboflow even generated training scripts tailored to Colab, which made setup far smoother than I expected: I could load my dataset, configure the training parameters, and get the model running with minimal hassle.
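For readers who want a rough picture of that workflow, here is a minimal sketch of downloading a Roboflow dataset version and fine-tuning YOLOv8 on it with the `roboflow` and `ultralytics` Python packages. It is not the script Roboflow generated for me. The checkpoint, epoch count, and image size are assumed placeholder values, and the API key, workspace, and project names are things you would supply yourself.

```python
# Assumed settings for illustration; the values used in the actual run may differ.
TRAIN_CONFIG = {
    "weights": "yolov8n.pt",  # smallest pretrained YOLOv8 checkpoint
    "epochs": 50,
    "imgsz": 640,
}


def download_dataset(api_key, workspace, project, version):
    """Download one Roboflow dataset version in YOLOv8 format; return its local folder."""
    from roboflow import Roboflow  # pip install roboflow

    rf = Roboflow(api_key=api_key)
    dataset = rf.workspace(workspace).project(project).version(version).download("yolov8")
    return dataset.location


def train_ball_detector(dataset_dir, config=TRAIN_CONFIG):
    """Fine-tune a pretrained YOLOv8 model on the exported dataset."""
    from ultralytics import YOLO  # pip install ultralytics

    model = YOLO(config["weights"])
    # The YOLOv8 export includes a data.yaml describing splits and class names.
    return model.train(
        data=f"{dataset_dir}/data.yaml",
        epochs=config["epochs"],
        imgsz=config["imgsz"],
    )


# Usage in a Colab cell (placeholders, not real identifiers):
#   location = download_dataset("YOUR_API_KEY", "your-workspace", "your-project", 1)
#   train_ball_detector(location)
```

The library imports sit inside the functions so the settings can be inspected or edited without either package installed.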

To complement the dataset I sourced online, I also started recording some of my own video clips to better reflect real-world scenarios for ball tracking. These clips will help evaluate how well the model performs outside of curated datasets and give me the flexibility to add more personalized data as needed.
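Before clips like these can be annotated or used for evaluation, they usually need to be split into still frames. The sketch below shows one way to do that with OpenCV, saving a frame roughly every half second. The sampling interval, the file naming, and the 30 fps fallback are assumptions for illustration, not settings from my project.

```python
import os


def frame_step(fps, every_seconds=0.5):
    """Number of frames to skip between saved frames (at least 1)."""
    return max(1, round(fps * every_seconds))


def extract_frames(video_path, out_dir, every_seconds=0.5):
    """Save sampled frames from a clip as JPEG files; return how many were written."""
    import cv2  # pip install opencv-python

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if FPS metadata is missing
    step = frame_step(fps, every_seconds)
    saved = 0
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of clip or unreadable file
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved


print(frame_step(30, 0.5))
# → 15
```

Sampling at an interval rather than saving every frame keeps near-duplicate images out of the dataset, which matters when consecutive frames of a clip look almost identical.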
Overall, this week’s progress has been a turning point in the project. Leveraging Roboflow and YOLOv8 from the beginning has provided a clear and efficient path from dataset creation to model training. With my initial version of the model now in training, I’m looking forward to reviewing its early performance and improving it over the coming weeks.

