Yeah, this is the approach more people are trying to take now. The problem is generally the amount of data needed and verifying that it's high quality in the first place. But these systems are positive feedback loops, both in training and in use. If you train on higher-quality code, the model will write higher-quality code, but it will be less able to handle edge cases, or to sensibly complete code that isn't at the same quality bar or in the same style as the training code.
On the use side, if you provide higher-quality code as input when prompting, the model is more likely to predict higher-quality code, because it's continuing what was written. Using standard approaches, documenting, and generally following good practice with your code before sending it to the LLM will significantly improve results.
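To make the "continuing what was written" point concrete, here's a rough sketch of the same half-written function as two different prompt prefixes. Both prefixes and their names (`MESSY_PREFIX`, `CLEAN_PREFIX`, `filter_above_threshold`) are made up for illustration, and nothing here calls a real model; it only shows what the model would be asked to continue.

```python
# Hypothetical prompt prefixes: the same unfinished function, written two ways.
# A completion model continues whichever text it is handed.

# Terse names, no types, no docs: the intent has to be guessed.
MESSY_PREFIX = '''def f(a,b):
    r=[]
    for x in a:
'''

# Descriptive name, type hints, and a docstring: the intent is stated outright.
CLEAN_PREFIX = '''from typing import Iterable

def filter_above_threshold(values: Iterable[float], threshold: float) -> list[float]:
    """Return the values strictly greater than `threshold`, preserving order."""
    result: list[float] = []
    for value in values:
'''
```

The second prefix gives the model a name, types, and a stated contract to stay consistent with, which is the sort of context the advice above is about. How much it helps in practice will depend on the model.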
I almost exclusively use self-checkout for groceries, and it has drastically sped up my checkout time, since most people in my area opt for traditional checkout and the stores are still keeping lots of lanes open (just closing the express lanes). The last 3 times I used a non-self checkout, I was either double charged for items or didn't have reduced prices applied, and I didn't notice because I was bagging.