That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to appear. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
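A command in Ollama's hf.co/... pattern for pulling GGUF models straight from Hugging Face does the trick; the exact quantization tag here is an assumption, pick whichever fits your RAM:

```
# Fetch the Unsloth GGUF build from Hugging Face and start an interactive chat
# (the Q8_0 tag is an assumption, not confirmed above)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```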
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
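That log is easy to dig through afterwards; for example (the flags here are just one way to do it):

```
# Show the most recent logged prompt and response
llm logs -n 1

# Print the path of the SQLite database LLM writes to
llm logs path
```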
I installed the plugin and ran it like this:
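Presumably something along these lines, reusing the same model reference Ollama resolved above (the exact model ID is an assumption):

```
# Install the plugin that exposes Ollama models to LLM
llm install llm-ollama

# Run a prompt against the fetched model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```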
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it amusing while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
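A command in the same shape as the 8B one fits that description; the quantization tag here is a guess inferred from the 34GB figure:

```
# 70B distill at a ~34GB quantization
# (the Q3_K_M tag is an assumption inferred from the download size)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```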
Can it draw a pelican?
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well.
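Run through LLM, that prompt looks something like this (the 70B model ID is the same assumption as above):

```
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```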
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
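An entry in the shape that file expects would look something like this; the model_id and key name follow from the commands in the next paragraph, while the api_base is an assumption based on DeepSeek's published endpoint:

```yaml
# Register DeepSeek's OpenAI-compatible endpoint with LLM
# (api_base is an assumption; check DeepSeek's API docs)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```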
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
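Put together, with an illustrative prompt:

```
# Store the API key once, then run prompts against the hosted model
llm keys set deepseek
llm -m deepseek-reasoner 'a joke about a pelican and a walrus'
```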
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.