I did the Anthropic Original Performance Take Home last Thursday and got the runtime down to 1157 cycles. I focused first on optimizing and pruning the actual instructions, and then passed a large set of instructions to an LLM to pack and bit-twiddle the final program. Claude Code was the only LLM I used.

For reference, the best solution that an LLM alone was able to produce was 1363 cycles, so this is a significant improvement. The GitHub repo does not really mention any etiquette on when people should disclose solutions, so I will wait a while before revealing any details.
I mostly program in Clojure, a language that largely eschews any real concerns about performance in favor of making code simple. Clojurians are lax and hand-wave log_32(N) data structures as being "constant", and mostly don't care about memory alignment or things of that nature. I think this is entirely the right set of tradeoffs for more than 99.9% of use cases.
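To put a rough number on the log_32(N) hand-wave, here is a minimal sketch that counts how many levels a 32-way tree needs to hold N elements. It is only an illustration, not how Clojure actually implements anything: `trie_depth` is a name I made up, and it ignores real-world details like tail buffers.

```python
def trie_depth(n: int, branching: int = 32) -> int:
    """Smallest number of levels d such that branching**d >= n.

    Hypothetical helper for illustration only; it models an idealized
    32-way tree, not any specific library's implementation.
    """
    depth = 1
    capacity = branching
    while capacity < n:
        capacity *= branching
        depth += 1
    return depth


# Depth grows very slowly as the collection gets larger.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, trie_depth(n))
```

Even at a billion elements the depth stays in the single digits, which is why treating lookups as "constant" is a forgivable simplification in practice.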
That is a digressive way of saying that most of this optimization stuff was new to me. I knew about it because I know CS fundamentals, data structures, and algorithms, but I have never had to concern myself with it in practice. Probably the closest I ever come is optimizing view rendering to make UIs feel responsive and snappy to the user. But this sort of work feels different. It is such a well-defined problem space (make the number go down), and you are free to apply any logically equivalent transformation to the program in pursuit of that goal. It is really easy to get "in the flow" while doing it, and I found myself in that state at many points while working on it. It was fun, and I enjoyed the opportunity.