• 3 Posts
  • 142 Comments
Joined 3 years ago
Cake day: June 30th, 2023






  • The embedding layer, post-tokenization, is not just a probability machine the way you’re suggesting. You can argue that it is probabilistic with inferred sentiment, but too many people think it works the way text prediction on your phone does, and that is just factually inaccurate.

    Verify the output, of course, but saying “it doesn’t understand anything” and calling it a “probability machine” is a borderline erroneous short sell. At the level of tokens it “understands” relationships, and those relationships are not probabilistic, though they are fundamentally approximations learned from a training corpus.
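
    To make the point about token relationships concrete, here is a minimal sketch. The three-dimensional vectors are made up for illustration (real models learn vectors with far more dimensions from a training corpus), and the names `EMBEDDINGS` and `similarity` are hypothetical. What it shows is that once the vectors exist, the relationship between two tokens is a fixed geometric quantity, not something sampled.

```python
import math

# Hypothetical 3-d embeddings, invented for this example only.
# A real model learns these values during training.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def similarity(word_a, word_b):
    """Look up both words and compare their embedding vectors."""
    return cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_b])


if __name__ == "__main__":
    # No randomness anywhere: the same pair always yields the same score.
    print(similarity("king", "queen"))
    print(similarity("king", "apple"))
```

    With these toy values, “king” sits much closer to “queen” than to “apple”, and asking twice gives the same answer both times. The approximation lives in how the vectors were learned, not in how they are compared.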



  • More than you would think. In my opinion, most people never show up and ask to do research. I got my first position by literally walking up to a prof after class and saying “I like this. Do you know anyone who does research like this?” He said “me”, gave me a test project, and I worked for him for almost 20 years.

    I should say I also did this my freshman year (the year before the story above) and got referred to a different guy in the department. I worked for him for a summer, but it didn’t work out for either of us. Turns out I’m more engineer than physicist, but it was good knowledge.

    Edit: To be more direct about numbers, you’re right that there’s not a spot for everybody, or even half. But when I was in undergrad, far fewer than that even went looking for research. Mostly they just did the job fair every year and were students the rest of the time, which was… not as reliable a strategy, we will say.


  • This advice won’t be helpful to you, unfortunately, but if there are students reading: the answer, as others have said, is already being connected. A key way to make this happen is through research with faculty, followed by (potentially) internships and then full-time offers. If you just show up and kick ass every day for 3 to 5 years, and even get a 4.0 GPA, the market will still be very tough.

    Bonus points, though, which might help OP: anything you can do that narrows the pool helps you. For example, if you’re a white-bread American dude, maybe look for a job that requires getting a clearance; if you speak Mandarin Chinese, maybe look for something that requires some translation or speaking; etc. You may not be the best programmer or the best salesman, but you might be a top-tier salesman for programming tools.



  • There is a data point missing here.

    Do the same study and give some participants an LLM, some no LLM, and some a type A subject matter expert for reference. It may also matter whether this person is a friend, a coworker, or a random passerby, but I would be willing to bet money that the same effect is present to a lesser (but still statistically significant) degree.

    Maybe a future study can be further refined to build some scaffolding for more effective teaching/learning “on the job” or in general.







  • Unless that’s how people are designing front ends for models, it literally DOESN’T work like that. It works like that until you finish training an embedding model with masking-related tasks, but that’s the tip of the iceberg. The input vector, after being tokenized, is ingested wholesale. Now, there’s sometimes funny business to manage the size of a context window effectively, but this isn’t that unless you’re home-rolling and caching your own inputs or something before you give them to the model.
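
    A rough sketch of what “ingested wholesale” means in practice. Everything here is a toy assumption: the vocabulary, the 2-d embeddings, and the names `tokenize` and `attend_last` are invented, and there are no learned query/key/value projections or positional encodings as a real transformer would have. The only point is that the output at the last position is computed from every token in the input, not just the word before it.

```python
import math

# Hypothetical toy vocabulary and 2-d embeddings, made up for illustration.
VOCAB = {"the": 0, "cat": 1, "dog": 2, "sat": 3}
EMBED = [
    [1.0, 0.0],   # the
    [0.0, 1.0],   # cat
    [1.0, 1.0],   # dog
    [0.5, -0.5],  # sat
]


def tokenize(text):
    """Map whitespace-separated words to integer token ids."""
    return [VOCAB[word] for word in text.split()]


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_last(token_ids):
    """Context vector for the last token, mixing in ALL tokens at once."""
    vectors = [EMBED[t] for t in token_ids]
    query = vectors[-1]
    # Score the last token against every token in the sequence.
    scores = [sum(q * k for q, k in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    # Weighted average over the whole sequence.
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors))
        for i in range(len(query))
    ]


if __name__ == "__main__":
    print(attend_last(tokenize("the cat sat")))
    print(attend_last(tokenize("the dog sat")))
```

    Swapping “cat” for “dog” changes the result at “sat” even though “sat” itself is unchanged, which is exactly what a last-word-only phone predictor would not do.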