☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 12 days ago · DeepSeek open source DeepEP – library for MoE training and Inference (github.com)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 18 days ago · Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (transformer-circuits.pub)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 18 days ago · Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago · DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arxiv.org)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago · Neurosymbolic AI -- Why, What, and How (arxiv.org)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago · Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence (arxiv.org)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago · Genie 2: A large-scale foundation world model (deepmind.google)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago · A good primer on what to expect running local LLMs (nullprogram.com)
Shamar@feddit.it · English · 4 months ago (edited) · A community statement supporting the Open Source Definition (OSD) (osd.fyi)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago · How ‘Embeddings’ Encode What Words Mean (www.quantamagazine.org)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago · New AI model “learns” how to simulate Super Mario Bros. from video footage (arstechnica.com)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago · Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o) (huggingface.co)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago · It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI (www.lifeiscomputation.com)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago · The Difference Between Speaking and Thinking (www.theatlantic.com)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago · Diffusion Models Are Real-Time Game Engines (gamengen.github.io)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago · Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduces memory usage by 60%. (github.com)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago · Transformer Explainer (poloclub.github.io)
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago · Alibaba claims no. 1 spot in AI math models with Qwen2-Math (venturebeat.com)
yboutros@infosec.pub · English · 7 months ago · How to convert a positionally encoded predicted embedding from a decoder to its matching token?
☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago · New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow (decrypt.co)