Relatively SoTA LLM Agents from Scratch?

Question

As we know, OpenAI is not so open. In 2023 I was playing with transformers and RNNs, and I understood how they worked from top to bottom (e.g. I made my own Keras and could whiteboard small nets). I can throw things together in Keras or TF pretty quickly. Then I got a job and never touched any of it again. Data and compute notwithstanding, how hard would it be to make a pet-project foundation model using the latest techniques? I've heard about MoE and things like that, and I figure we're not just throwing a bunch of layers and dropout together in Keras anymore.

huevosabio · Accepted Answer

Olmo is, AFAIK, the only SOTA-ish model with fully open source code and data. The team's report is fantastic: https://www.datocms-assets.com/64837/1763662397-1763646865-o...
It should give you an idea of how hard it is to do a SOTA model from scratch!
If you relax the SOTA aspect, Karpathy's nanochat has you covered: https://github.com/karpathy/nanochat