From fda18b2723c350ef8bcec7df1989e7f6555e9add Mon Sep 17 00:00:00 2001 From: wilburnperson Date: Fri, 14 Feb 2025 00:09:04 +0000 Subject: [PATCH] Add DeepSeek: the Chinese AI Model That's a Tech Breakthrough and a Security Risk --- ...a-Tech-Breakthrough-and-A-Security-Risk.md | 45 +++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md diff --git a/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md b/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md new file mode 100644 index 0000000..71d16c9 --- /dev/null +++ b/DeepSeek%3A-the-Chinese-aI-Model-That%27s-a-Tech-Breakthrough-and-A-Security-Risk.md @@ -0,0 +1,45 @@ +
DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is problematic, and I don't buy the public numbers.
+
DeepSeek was built on top of open-source Meta technologies (PyTorch, Llama), and ClosedAI is now in danger because its valuation is outrageous.
+
To my knowledge, no public documentation directly links DeepSeek to a specific "Test Time Scaling" technique, but that's highly probable, so allow me to simplify.
+
Test Time Scaling is used in machine learning to improve a model's performance at test time rather than during training.
+
That means fewer GPU hours and less powerful chips.
+
To put it simply, lower computational requirements and lower hardware costs.
+
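How would that look in code? Here is a minimal sketch of one well-known test-time scaling technique, self-consistency (sample N answers, keep the majority vote). The function and `fake_model` names are my own illustrations, not anything DeepSeek has published:
+
```python
from collections import Counter
from typing import Callable

def self_consistency(generate: Callable[[str], str],
                     prompt: str, n_samples: int = 16) -> str:
    """Best-of-N / majority voting: spend extra compute at *inference*
    instead of training a bigger model on bigger hardware.

    `generate` is any function that samples one answer from a model
    (a hypothetical stand-in; plug in your own LLM call).
    """
    answers = [generate(prompt) for _ in range(n_samples)]
    # The most frequently sampled answer wins the vote.
    return Counter(answers).most_common(1)[0][0]

# Toy usage: a fake "model" that answers correctly most of the time.
import random
fake_model = lambda prompt: random.choice(["12", "12", "12", "7"])
print(self_consistency(fake_model, "What is 5 + 7?"))
```
+
The model's weights never change; accuracy improves simply because we pay for N forward passes per query instead of one.
+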
That's why Nvidia lost almost $600 billion in market cap, the biggest one-day loss in U.S. history!
+
Many people and organizations who shorted American AI stocks became incredibly rich in a few hours because investors now project we will need less powerful AI chips...
+
Nvidia short sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. Nothing compared to the market cap; I'm looking at the single-day amount. More than $6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom made more than $2 billion in profits in a few hours (the US stock market operates from 9:30 AM to 4:00 PM EST).
+
The Nvidia Short Interest Over Time data shows we had the second-highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025. We need to wait for the most recent data!
+
A tweet I saw 13 hours after publishing my post! Perfect summary.
+
Distilled language models
+
Small language models are trained at a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a larger, more complex model like the future ChatGPT 5.
+
Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive when there's limited computational power or when you need speed.
+
The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational demands.
+
During distillation, the student model is trained not only on the raw data but also on the outputs, or "soft targets" (probabilities for each class rather than hard labels), produced by the teacher model.
+
With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.
+
In other words, the student model doesn't just learn from the "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from the data and from the teacher's predictions!
+
Ultimately, the student mimics the teacher's decision-making process... all while using much less computational power!
+
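To make the "soft targets" idea concrete, here is a minimal sketch of the classic distillation loss from Hinton et al.: a weighted blend of cross-entropy on the hard labels and KL divergence against the teacher's temperature-softened probabilities. This illustrates the general technique only; it is not DeepSeek's training code, and the temperature/alpha values are arbitrary:
+
```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of (1) hard-label cross-entropy and (2) KL divergence
    between the student's and teacher's softened distributions."""
    # (1) Dual learning, part one: the same raw data the teacher saw.
    hard_loss = F.cross_entropy(student_logits, labels)

    # (2) Dual learning, part two: the teacher's "soft targets",
    # i.e. full probability distributions, not just the winning class.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * (temperature ** 2)

    return alpha * hard_loss + (1 - alpha) * soft_loss
```
+
The temperature flattens both distributions, so the student also learns how much probability the teacher assigns to the *wrong* classes, which carries a lot of the transferred knowledge.
+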
But here's the twist as I understand it: DeepSeek didn't just extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.
+
So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: blending different architectures and datasets to produce a seriously adaptable and robust small language model!
+
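How the "several LLMs" might be combined is not documented anywhere I know of; one plausible, purely illustrative approach is to average the teachers' softened distributions into a single ensemble target and reuse the loss from the previous sketch:
+
```python
import torch
import torch.nn.functional as F

def multi_teacher_soft_targets(teacher_logits_list, temperature=2.0):
    """Purely illustrative: average the temperature-softened distributions
    of several teachers (e.g. different architectures) into one ensemble
    "soft target" that the student is then distilled against."""
    probs = [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    return torch.stack(probs).mean(dim=0)
```
+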
DeepSeek: Less supervision
+
Another important innovation: less human supervision/guidance.
+
The question is: how far can models go with less human-labeled data?
+
R1-Zero learned "reasoning" abilities through trial and error; it evolves; it has unique "reasoning behaviors" which can lead to noise, endless repetition, and language mixing.
+
R1-Zero was experimental: there was no initial guidance from labeled data.
+
DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.
+
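The R1 paper describes largely rule-based rewards in the RL stage (roughly: is the final answer correct, and is the reasoning wrapped in the expected format). Here is a minimal sketch of what such a reward function could look like; the tag names and weights are my assumptions, not DeepSeek's published code:
+
```python
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward sketch: format reward + accuracy reward.

    Assumed format: reasoning inside <think>...</think>, final answer
    inside <answer>...</answer>. Tag names and weights are illustrative.
    """
    reward = 0.0

    # Format reward: did the model use the expected reasoning structure?
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        reward += 0.1

    # Accuracy reward: does the extracted final answer match the reference?
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0

    return reward
```
+
Because such rewards can be checked automatically, the RL loop needs no human labeler in the inner loop, which is exactly the "less supervision" point.
+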
The end result? Less noise and no language mixing, unlike R1-Zero.
+
R1 uses human-like reasoning patterns first, and it then evolves through RL. The innovation here is less human-labeled data + RL to both guide and improve the model's performance.
+
My concern is: did DeepSeek really solve the problem if they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they rely on previously trained models?
+
Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision... I am not convinced yet that the traditional dependency is broken. It is "easy" to not require massive amounts of high-quality reasoning data for training when taking shortcuts...
+
To be balanced and to show the research, I have published the DeepSeek R1 Paper (downloadable PDF, 22 pages).
+
My concerns regarding DeepSeek?
+
Both the web and mobile apps collect your IP, keystroke patterns, and device details, and everything is stored on servers in China.
+
Keystroke pattern analysis is a behavioral biometric technique used to identify and authenticate people based on their unique typing patterns.
+
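For context, here is a minimal sketch of the kind of timing features a keystroke-dynamics system could extract (dwell and flight times); this is a generic illustration of the technique, not a claim about what DeepSeek's apps actually collect:
+
```python
def keystroke_features(events):
    """events: list of (key, press_time, release_time) tuples, in seconds.

    Dwell time  = how long each key is held down.
    Flight time = gap between releasing one key and pressing the next.
    Together, these timings form a behavioral "signature" of a typist.
    """
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {"dwell": dwell, "flight": flight}

# Example: timing profile for typing "hi!"
sample = [("h", 0.00, 0.09), ("i", 0.18, 0.26), ("!", 0.41, 0.50)]
print(keystroke_features(sample))
```
+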
I can hear the "But 0p3n s0urc3...!" comments.
+
Yes, open source is great, but this reasoning is limited because it does not take human psychology into account.
+
Regular users will never run models locally.
+
Most will simply want quick responses.
+
Technically unsophisticated users will use the web and mobile versions.
+
Millions have already downloaded the mobile app on their phone.
+
DeepSeek's models have a real edge, and that's why we see ultra-fast user adoption. For the time being, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.
+
I suggest searching for anything sensitive that does not align with the Party's propaganda, on the web or the mobile app, and the output will speak for itself...
+
China vs America
+
Screenshots by T. Cassel. Freedom of speech is beautiful. I could share horrible examples of propaganda and censorship, but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.
+
Rest assured, your code, ideas, and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We only know that the $5.6M figure the media has been pushing left and right is misinformation!
\ No newline at end of file