• 1 Post
  • 7 Comments
Joined 3 months ago
Cake day: December 9th, 2024


  • Yeah, even if you compile everything yourself you run into the Trusting Trust problem, and that’s only gotten way worse.

    I love Rust, but I was installing fd the other day as an alternative to find. find, written in C I’m guessing and nearly as old as the silicon running it, is 200KB in size, while fd is 4MB. Is it 20 times better for being 20 times bigger? I’m not worried about the space, but obviously 3.8MB of statically linked runtime and dependencies in every executable (see the sketch at the end of this comment) is both a lot of overhead and a lot of places to hide surveillance. Should I be worried that every Rust program, compiled through LLVM, a toolchain Apple heavily sponsors and helps maintain, has the potential to be backdoored?

    Well, probably not, since all the chips are already backdoored, but who’s to say Apple wouldn’t double down. How far do you trust the .NET or Java runtimes? It’s tough out here for the paranoids!
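
    To put a number on the size gap: most of fd’s extra megabytes are Rust’s statically linked std plus its dependencies (regex engine, parallel directory walker), not an opaque runtime blob. A minimal sketch, assuming a stock Linux toolchain; the Cargo options in the comments are real release-profile settings, shown as comments only because this is a single file:

    ```rust
    // Even a do-nothing Rust program weighs anywhere from a few hundred KB
    // to a few MB depending on toolchain defaults, because std (allocator,
    // panic/unwind machinery, backtrace support) is statically linked into
    // every executable. A C "hello world" gets the equivalent from the
    // shared libc instead, so it looks tiny on disk.
    fn main() {
        println!("hello");
    }

    // Shrinking is opt-in, via Cargo.toml:
    //
    //   [profile.release]
    //   opt-level = "z"   # optimize for size rather than speed
    //   lto = true        # link-time optimization across crates
    //   strip = true      # drop debug symbols from the binary
    //   panic = "abort"   # skip unwinding tables
    ```

    None of which makes the supply-chain worry go away; it just means the 4MB isn’t inherently suspicious.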




  • There are aspects that could be better, sure. I think communities should be like sets of posts, subject to unions, intersections, and other set operations (roughly the idea sketched at the end of this comment). Then you wouldn’t have the problem of 5 versions of c/memes: they could be virtually joined into one memes community at the user level (and the user could filter out instances, communities, and users they don’t like, of course). Moderation could be decoupled from communities and made a broader service that users choose to interact with, agreeing to a level of moderation comfortable for their experience.

    But also, put me in the group that thinks Lemmy should stay small. Corpo social has convinced us that a single big room with every idiot and literally their mother screaming into it is how the internet should be, and it isn’t. We can go back to smaller, focused online communities that don’t openly invite everyone to come in and fight.

    Centralization tendencies are all rooted in power and control. We need to fragment more.
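
    A minimal sketch of the set idea, with made-up types; nothing like this exists in Lemmy today. A community is just a set of post IDs, so several instances’ versions of c/memes union into one virtual feed, filtered by a per-user block set:

    ```rust
    use std::collections::HashSet;

    /// Union several communities' posts, minus anything the user blocked.
    fn virtual_community(
        communities: &[HashSet<u64>],
        blocked: &HashSet<u64>,
    ) -> HashSet<u64> {
        communities
            .iter()
            .flat_map(|c| c.iter().copied())    // set union across instances
            .filter(|id| !blocked.contains(id)) // user-level filtering
            .collect()
    }

    fn main() {
        let memes_a: HashSet<u64> = HashSet::from([1, 2, 3]);
        let memes_b: HashSet<u64> = HashSet::from([3, 4, 5]);
        let blocked: HashSet<u64> = HashSet::from([2]);
        let feed = virtual_community(&[memes_a, memes_b], &blocked);
        assert_eq!(feed, HashSet::from([1, 3, 4, 5]));
    }
    ```

    Intersections would work the same way, e.g. “posts that appear in both versions of c/memes” as a dedup heuristic.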


  • Agreed with @[email protected]’s comment: not a good article.

    First off, this article is just a link and a short summary of a much longer blog post from Qualys, a security company selling AI-security services. So this isn’t a standard industry or academic benchmark; it’s something Qualys is using as part of the sales strategy for their AI consulting business.

    Which, frankly, I would avoid if this article is indicative of how they work.

    They tested 1 model. One. Insane. DeepSeek released about a half-dozen distillations of Llama and Qwen2 (1.5B, 7B, 8B, 14B, 32B, 70B) plus the full 671B model. As a security company selling AI expertise, why would you test only one of the weakest distillations? Was it really so expensive to rent some cloud time and test the other sizes? Not even the 32B, which should be well within most hobbyists’ ability and budget to run in the cloud (a sketch of what that sweep could look like is at the end of this comment).

    The full article is such obvious SEO bait, I’d be surprised if AI didn’t help write it. It just goes on and on about why AI security is important, why these tests are important, why each individual test it failed matters. But ultimately, no matter how many words they write, the fact that they only tested one very weak version of the model makes the whole thing pointless, except as a way to tie the name Qualys to DeepSeek and lure in more gullible rubes. Did they only test the 8B because that’s cheap enough for them to run as a service for their business?

    Anyway, I’ve previously written here on Lemmy about how the DeepSeek distillations are so easy to break that I almost think they’re buggy, not just insufficient in this regard. I’d really want to see this kind of analysis done on the full model, and obviously also on the other big LLMs as a point of comparison. I wouldn’t even trust the 8B models for unsupervised work. They’re certainly not reliable and hallucination-free enough for things like content filtering or expert-system usage; no 8B-level model I’ve tested is. So it’s very concerning that this seems to be the level of service they’re pushing.
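
    For what it’s worth, sweeping the same probes across the whole distillation ladder is a weekend script, not a research program. A hedged sketch, assuming an Ollama-style local endpoint; the URL, model tags, and placeholder probes are my assumptions, not Qualys’s actual harness:

    ```rust
    // Assumed deps: reqwest = { version = "0.12", features = ["blocking", "json"] }
    //               serde_json = "1"
    use serde_json::json;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let client = reqwest::blocking::Client::new();
        // Every distillation size, not just the weakest one.
        let models = ["deepseek-r1:1.5b", "deepseek-r1:7b", "deepseek-r1:8b",
                      "deepseek-r1:14b", "deepseek-r1:32b"];
        let probes = ["<red-team prompt 1>", "<red-team prompt 2>"]; // placeholders

        for model in models {
            for probe in probes {
                let resp: serde_json::Value = client
                    .post("http://localhost:11434/api/generate") // assumed endpoint
                    .json(&json!({ "model": model, "prompt": probe, "stream": false }))
                    .send()?
                    .json()?;
                // Score the output however the benchmark defines "failed";
                // printed here for manual review.
                println!("{model}: {}", resp["response"]);
            }
        }
        Ok(())
    }
    ```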