From e8c273a5ecf710eff33afd050f463a56054d7075 Mon Sep 17 00:00:00 2001 From: Adam Roussel Date: Mon, 17 Feb 2025 23:05:37 +0000 Subject: [PATCH] Add DeepSeek: the Chinese aI Model That's a Tech Breakthrough and A Security Risk --- ...a-Tech-Breakthrough-and-A-Security-Risk.md | 45 +++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md diff --git a/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md b/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md new file mode 100644 index 0000000..3eb5059 --- /dev/null +++ b/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md @@ -0,0 +1,45 @@ +
DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is questionable and I don't buy the public numbers.
+
DeepSink was built on top of open source Meta models (PyTorch, Llama) and ClosedAI is now in danger because its valuation is outrageous.
+
To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but that's highly likely, so allow me to simplify.
+
Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.
+
That means fewer GPU hours and less powerful chips.
+
In other words, lower computational requirements and lower hardware costs.
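To make that concrete, here is a minimal sketch of one common flavor of test-time scaling, self-consistency (majority voting): instead of training a bigger model, you sample several candidate answers at inference time and keep the most frequent one. The `generate_answer` function below is a stand-in I made up so the snippet runs on its own; it is not DeepSeek's actual method.

```python
import random
from collections import Counter

def generate_answer(question: str) -> str:
    """Stand-in for a sampling call to a smaller model.
    Here it just returns a noisy answer so the script is self-contained."""
    return random.choice(["42", "42", "42", "41", "43"])

def answer_with_test_time_scaling(question: str, n_samples: int = 16) -> str:
    """Sample many candidate answers at inference time and keep the most
    frequent one (self-consistency / majority voting). Compute is spent
    at test time instead of on a larger, more expensive model."""
    candidates = [generate_answer(question) for _ in range(n_samples)]
    most_common, _ = Counter(candidates).most_common(1)[0]
    return most_common

print(answer_with_test_time_scaling("What is 6 * 7?"))
```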
+
That's why Nvidia lost almost $600 billion in market cap, the biggest one-day loss in U.S. history!
+
Many people and institutions who shorted American AI stocks became extremely rich in a few hours because investors now project we will need less powerful AI chips ...
+
Nvidia short sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. Nothing compared to the market cap; I'm looking at the single-day amount. More than $6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom earned more than $2 billion in profits in a few hours (the US stock market runs from 9:30 AM to 4:00 PM EST).
+
The Nvidia Short Interest Over Time data shows we had the second highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025 - we have to wait for the latest data!
+
A tweet I saw 13 hours after publishing my post! Perfect summary.

Distilled language models
+
Small language models are trained at a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a bigger, more complex model like the future ChatGPT 5.
+
Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive when there is limited computational power or when you need speed.
+
The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational requirements.
+
During distillation, the student model is trained not only on the raw data but also on the outputs, the "soft targets" (probabilities for each class rather than hard labels), produced by the teacher model.
+
With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.
+
In other words, the student model doesn't just learn from the "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from the data and from the teacher's predictions!
+
Ultimately, the student mimics the teacher's decision-making process ... all while using much less computational power!
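Here is a minimal sketch of the distillation objective described above, written in PyTorch: the student is trained on a blend of regular cross-entropy against the hard labels and a KL-divergence term against the teacher's temperature-softened soft targets. The names and hyperparameters (temperature, alpha) are illustrative, not DeepSeek's actual recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend of hard-label loss and soft-target loss.
    - hard loss: regular cross-entropy against the ground-truth labels
    - soft loss: KL divergence between the student's and the teacher's
      temperature-softened distributions (the "soft targets")"""
    hard_loss = F.cross_entropy(student_logits, labels)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * hard_loss + (1 - alpha) * soft_loss * temperature ** 2

# Toy example: a batch of 4 samples over 10 classes (think vocabulary entries).
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```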
+
But here's the twist as I understand it: DeepSeek didn't simply extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.
+
So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: blending different architectures and datasets to create a seriously adaptable and robust small language model!
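Nothing public confirms how (or whether) DeepSeek combined teachers, so this is purely speculative, but one plausible way to blend several teachers is to average their temperature-softened outputs into a single set of soft targets and reuse the same distillation loss as above. A sketch under that assumption:

```python
import torch
import torch.nn.functional as F

def combined_soft_targets(teacher_logits_list, weights=None, temperature: float = 2.0):
    """Average the temperature-softened distributions of several teachers
    into a single set of soft targets for the student."""
    n = len(teacher_logits_list)
    weights = weights or [1.0 / n] * n
    probs = [w * F.softmax(logits / temperature, dim=-1)
             for w, logits in zip(weights, teacher_logits_list)]
    return torch.stack(probs).sum(dim=0)

# Toy example: two teachers with different architectures but a shared label space.
teacher_a = torch.randn(4, 10)   # e.g. a Llama-style teacher
teacher_b = torch.randn(4, 10)   # e.g. a GPT-style teacher
soft_targets = combined_soft_targets([teacher_a, teacher_b])
print(soft_targets.shape)  # torch.Size([4, 10]): one distribution per sample
```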
+
DeepSeek: Less supervision
+
Another essential innovation: less human supervision/guidance.
+
The question is: how far can models go with less human-labeled data?
+
R1-Zero learned "reasoning" capabilities through trial and error; it evolves, and it has distinct "thinking behaviors" which can lead to noise, endless repetition, and language mixing.
+
R1-Zero was experimental: there was no initial guidance from labeled data.
+
DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.
+
The end result? Less noise and no language mixing, unlike R1-Zero.
+
R1 uses human-like reasoning patterns first and then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.
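The R1 paper describes rule-based rewards driving that RL stage: a format reward for wrapping the reasoning in thinking tags and an accuracy reward for the final answer. Here is a toy sketch of what such a reward function could look like; the tags, weights and parsing are simplified assumptions on my part, not the actual implementation.

```python
import re

def reasoning_reward(response: str, reference_answer: str) -> float:
    """Toy rule-based reward in the spirit of R1's RL stage:
    - format reward: the response wraps its reasoning in <think>...</think>
      and gives a final answer in <answer>...</answer>
    - accuracy reward: the final answer matches the reference answer."""
    reward = 0.0
    if re.search(r"<think>.*?</think>", response, re.DOTALL):
        reward += 0.5  # format bonus (weight is made up for illustration)
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0  # accuracy bonus
    return reward

sample = "<think>6 * 7 is 42.</think><answer>42</answer>"
print(reasoning_reward(sample, "42"))  # 1.5
```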
+
My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they relied on previously trained models?
+
Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision ... I am not convinced yet that the traditional dependency is broken. It is "easy" to not require enormous amounts of high-quality reasoning data for training when taking shortcuts ...
+
To be balanced and to show the research, I have uploaded the DeepSeek R1 Paper (downloadable PDF, 22 pages).
+
My concerns about DeepSink?
+
Both the web and mobile apps collect your IP, keystroke patterns, and device details, and everything is stored on servers in China.
+
Keystroke pattern analysis is a behavioral biometric method used to identify and verify people based on their unique typing patterns.
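To make that concrete, here is a tiny illustrative sketch of the kind of signal keystroke dynamics relies on: the timing between key presses forms a profile that can be compared against a stored one. The feature, threshold and numbers are all made up for illustration.

```python
from statistics import mean

def flight_times(key_press_timestamps):
    """Milliseconds between consecutive key presses - one simple
    behavioral-biometric feature among several (dwell time, error rate, ...)."""
    return [t2 - t1 for t1, t2 in zip(key_press_timestamps, key_press_timestamps[1:])]

def matches_profile(session_timestamps, stored_profile_ms, tolerance_ms: float = 30.0) -> bool:
    """Very naive comparison: does the user's average inter-key interval
    fall within a tolerance of the stored profile?"""
    session_avg = mean(flight_times(session_timestamps))
    return abs(session_avg - stored_profile_ms) <= tolerance_ms

# Toy example: timestamps (ms) of key presses while typing a password.
session = [0, 110, 240, 330, 470, 560]
print(matches_profile(session, stored_profile_ms=105.0))  # True
```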
+
I can hear the "But 0p3n s0urc3 ...!" remarks.
+
Yes, open source is great, but this reasoning is limited because it does not take human psychology into account.
+
Regular users will never run models locally.
+
Most will simply want quick answers.
+
Technically unsophisticated users will use the web and mobile versions.
+
Millions have already downloaded the mobile app on their phone.
+
DeepSeek's models have a real edge and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.
+
I suggest searching for anything sensitive that does not align with the Party's propaganda, on the web or the mobile app, and the output will speak for itself ...
+
China vs America
+
Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.
+
Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We only know that the $5.6M figure the media has been pushing left and right is misinformation!
\ No newline at end of file