Lessons learned from the last DIYRobocar race.
Overall, today was a success given that two Donkey-controlled vehicles did at least one completely autonomous lap prior to race time. This is a big improvement from the same event 3 months ago, when we worked on Adam’s car all day to get it to lurch forward and turn right. This is a short debrief of what I learned this weekend to help guide our next efforts to make Donkey a better library for training autopilots for these races.
Of course, after the race the driving problems became obvious.
- The vehicle drive loop was only updating every 1/5th of a second but should have been updating every 1/15th of a second or more frequently. This meant we didn’t collect very much training data and the autopilot didn’t update often.
- Training data was not cleaned. We didn’t have a good way to view the training data on the EC2 instance, so we could not see that we were using training data that included bad steering, and even images from when the vehicle had been picked up.
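A rate check would have caught the slow drive loop before race day. Here is a minimal Python sketch, not Donkey's actual drive loop, of a fixed-rate loop that reports the rate it really achieved. The names `run_drive_loop`, `step`, and `rate_hz` are hypothetical; `step` stands in for whatever reads the camera, runs the autopilot, and records data.

```python
import time


def run_drive_loop(step, rate_hz=15.0, max_steps=30,
                   clock=time.monotonic, sleep=time.sleep):
    """Call step() at roughly rate_hz and return the achieved rate in Hz.

    clock and sleep are injectable so the loop can be tested without waiting.
    """
    period = 1.0 / rate_hz
    start = clock()
    next_tick = start
    for _ in range(max_steps):
        step()  # hypothetical: read camera, run autopilot, record data
        next_tick += period
        delay = next_tick - clock()
        if delay > 0:
            # The step finished early: wait out the rest of the period.
            sleep(delay)
        else:
            # The step overran its period: restart the schedule from now
            # instead of bursting to catch up.
            next_tick = clock()
    elapsed = clock() - start
    return max_steps / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    achieved = run_drive_loop(lambda: time.sleep(0.01), rate_hz=15.0)
    print(f"target 15.0 Hz, achieved {achieved:.1f} Hz")
```

If the achieved rate is well below the target, the step itself is too slow, and no amount of sleeping less will fix it.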
Beyond bugs, how can we do better at these races?
On a race day, we have 4-6 hours to take a proven car and autopilot and train it to perform on a new track. The more efficiently we can improve the autopilot performance, the better we’ll do. Here’s an overview of the problems I saw today and proposals for how to fix them. Specific issues and solutions live in GitHub issues.
Many steps are required to update a driving model.
At one point today I had 20+ terminal windows open because changing the autopilot requires many useless “plumbing” steps. These steps can be automated or made easier with command line arguments or a web interface.
- Switching models requires restarting the server.
- It's difficult to remember which session was the good session.
- Combining sessions is a separate script.
Driving is required to test models.
Since updating and driving an autopilot takes time, we need to make sure that our changes actually improve the autopilot before we test it on the track. A trusted performance measurement is needed at the time of training. This could be a combination of the error on a validation dataset and a visual showing how close the predicted values were to the actual ones.
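As a rough illustration of such a measurement, the sketch below compares predicted steering values against the recorded ones on a held-out set. It is an assumption about what a useful report might contain, not an existing Donkey feature: `steering_report`, the `tolerance` parameter, and the idea that steering values are plain floats are all hypothetical.

```python
def steering_report(actual, predicted, tolerance=0.1):
    """Summarize how closely predicted steering matches recorded steering.

    actual and predicted are equal-length sequences of floats taken from a
    validation set that the model did not train on.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")

    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mean_abs_error = sum(errors) / len(errors)
    # Share of frames where the prediction was "close enough" to the driver.
    within = sum(1 for e in errors if e <= tolerance) / len(errors)
    # Index of the worst frame, a good first image to look at when debugging.
    worst_index = max(range(len(errors)), key=errors.__getitem__)

    return {
        "mean_abs_error": mean_abs_error,
        "within_tolerance": within,
        "worst_index": worst_index,
    }


if __name__ == "__main__":
    report = steering_report([0.0, 0.5, -0.5, 1.0], [0.0, 0.45, -0.1, 1.0])
    print(report)
```

A single number like the mean absolute error makes two models comparable before either touches the track, while the worst-frame index points at where to start looking when the number is bad.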
There is no way to debug an autopilot.
Currently an autopilot either works or it doesn’t. Driving performance is the only clue to help us understand what’s going wrong. Helpful clues would include:
- A visual showing what the network is cueing off of.
- Lag times.
- Predicted vs. actual values overlaid on the image.
Common problems that don’t have obvious solutions.
Even after a common problem has been identified, there’s no standard solution to fix the problem. “Agile training” could be used to correct the autopilot by creating more training data where the autopilot fails.
- Vehicle doesn’t turn sharp enough.
- Vehicle doesn’t turn at all on a corner.
- Vehicle goes too slow.
There is no easy way to clean training data on a remote instance.
Training on bad data makes bad autopilots. To learn where bad training data exists, you need to see each image alongside its recorded steering value. This is impossible on a CLI but would be possible through a web interface.
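One low-effort version of that web interface is a static HTML page that shows every image next to its steering value. The sketch below is only an illustration under stated assumptions: `build_review_page` is a hypothetical helper, and it assumes the training records are available as `(image_path, steering)` pairs, which may not match how sessions are actually stored.

```python
import html


def build_review_page(records, title="Training data review"):
    """Return an HTML page showing each image with its recorded steering.

    records is an iterable of (image_path, steering) pairs; this layout is
    an assumption made for the sketch.
    """
    figures = []
    for index, (image_path, steering) in enumerate(records):
        figures.append(
            "<figure>"
            f'<img src="{html.escape(image_path, quote=True)}" width="160">'
            f"<figcaption>#{index} steering={steering:+.2f}</figcaption>"
            "</figure>"
        )
    return (
        "<!doctype html><html><head>"
        f"<title>{html.escape(title)}</title></head><body>"
        + "\n".join(figures)
        + "</body></html>"
    )


if __name__ == "__main__":
    page = build_review_page([("img_0.jpg", 0.25), ("img_1.jpg", -1.0)])
    with open("review.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

Writing the page into the image folder and running Python's built-in `python -m http.server` there would let you scroll through the session from a browser, which is enough to spot frames from when the car was picked up.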