Shared conversation
Claude-3-Opus
EvoLLM-JP Sakana AI https://huggingface.co/SakanaAI/EvoLLM-JP-v1-10B 10 800 80:1 0.3 Mar/2024 🟢 https://arxiv.org/abs/2403.13187
Parakeet Independent https://colab.research.google.com/drive/1gI8CM9Bz9ov0-E6aL2jF808rE56UtZyF?usp=sharing 0.378 3 8:1 0.0 Mar/2024 🟢 https://news.ycombinator.com/item?id=39745700#39745702
RWKV-v5 EagleX RWKV https://huggingface.co/recursal/EagleX_1-7T 7.52 1700 227:1 0.4 Mar/2024 🟢 https://substack.recursal.ai/p/eaglex-17t-soaring-past-llama-7b
MM1 Apple 30 2010 67:1 0.8 Mar/2024 🔴 https://arxiv.org/abs/2403.09611
RFM-1 Covariant https://vimeo.com/921866765 8 160 20:1 0.1 Mar/2024 🟡 https://covariant.ai/insights/introducing-rfm-1-giving-robots-human-like-reasoning-capabilities/
Command-R Cohere Cohere 35 700 20:1 0.5 Mar/2024 🟢 https://txt.cohere.com/command-r/
DeepSeek-VL DeepSeek-AI https://github.com/deepseek-ai/DeepSeek-VL?tab=readme-ov-file 7 2000 286:1 0.4 Mar/2024 🟢 https://arxiv.org/abs/2403.05525
AnyGPT Fudan University https://junzhan2000.github.io/AnyGPT.github.io/ 7 2000 286:1 0.4 Mar/2024 🟢 https://arxiv.org/abs/2402.12226
Stable Beluga 2.5 Stability AI 70 2000 29:1 1.2 Mar/2024 🟢 https://stability.ai/news/putting-the-ai-supercomputer-to-work
Inflection-2.5 Inflection AI https://inflection.ai/inflection-2 1200 20000 17:1 16.3 Mar/2024 🟢 https://inflection.ai/inflection-2-5
Apollo SRIBD/CUHK https://apollo.llmzoo.com/ 7 2500 358:1 0.4 Mar/2024 🟢 https://arxiv.org/abs/2403.03640
Claude 3 Opus Anthropic https://claude.ai/ 2000 40000 20:1 29.8 Mar/2024 🟢 https://www.anthropic.com/claude-3-model-card
Samba-1 SambaNova https://trysambanova.ai/ 1400 20000 15:1 17.6 Feb/2024 🟡 https://sambanova.ai/press/secure-one-trillion-parameter-generative-ai-model-for-the-enterprise
StarCoder 2 HF/ServiceNow 15 4300 287:1 0.8 Feb/2024 🟢 https://arxiv.org/abs/2402.19173
530B ByteDance 530 300 1:1 1.3 Feb/2024 🔴 https://arxiv.org/abs/2402.15627
175B ByteDance 175 300 2:1 0.8 Feb/2024 🔴 https://arxiv.org/abs/2402.15627
Mistral Small Mistral AI https://chat.mistral.ai/chat 7 3000 429:1 0.5 Feb/2024 🟢 https://mistral.ai/news/mistral-large/
Mistral Large Mistral AI https://poe.com/Mistral-Large 540 11000 21:1 8.1 Feb/2024 🟢 https://mistral.ai/news/mistral-large/
Hanooman Reliance 40 Feb/2024 🟢 https://www.hanooman.ai/
Ask Apple 20 Feb/2024 🔴 https://www.macrumors.com/2024/02/22/applecare-advisors-testing-new-ask-tool/
Reka Edge Reka AI https://chat.reka.ai/ 7 Feb/2024 🟢 https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/
Reka Flash Reka AI https://poe.com/RekaFlash 21 Feb/2024 🟢 https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/
Gemma Google DeepMind https://labs.pplx.ai/ 7 6000 858:1 0.7 Feb/2024 🟢 https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
Gemini 1.5 Pro Google DeepMind https://aistudio.google.com/app/waitlist/97445851 1500 30000 20:1 22.4 Feb/2024 🟡 https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf
Qwen-1.5 Alibaba https://huggingface.co/spaces/Qwen/Qwen1.5-72B-Chat 72 3000 42:1 1.5 Feb/2024 🟢 https://qwenlm.github.io/blog/qwen1.5/
GOODY-2 BRAIN https://www.goody2.ai/chat Feb/2024 🟢 https://www.goody2.ai/goody2-modelcard.pdf
Natural-SQL-7B ChatDB 7 2000 286:1 0.4 Feb/2024 🟢 https://huggingface.co/chatdb/natural-sql-7b
Sea-Lion AI Singapore https://aisingapore.org/aiproducts/sea-lion/ 7.5 980 131:1 0.3 Feb/2024 🟢 https://huggingface.co/aisingapore/sealion7b
TimesFM Google 0.2 100 500:1 0.0 Feb/2024 🔴 https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html
OLMo Allen AI https://huggingface.co/allenai/OLMo-7B 7 2500 358:1 0.4 Feb/2024 🟢 https://allenai.org/olmo/olmo-paper.pdf
FLOR-6.3B Cerebras https://huggingface.co/projecte-aina/FLOR-6.3B 6.3 481 77:1 0.2 Jan/2024 🟢 https://www.cerebras.net/press-release/cerebras-systems-and-barcelona-supercomputing-center-train-industry-leading-multilingual-spanish-catalan-english-llm
Weaver AIWaves.cn https://www.wawawriter.com/ 34 2018 60:1 0.9 Jan/2024 🟢 https://arxiv.org/abs/2401.17268
miqu 70b Mistral AI https://huggingface.co/miqudev/miqu-1-70b 70 3000 43:1 1.5 Jan/2024 🟢 https://huggingface.co/miqudev/miqu-1-70b
iFlytekSpark-13B iFlyTek https://gitee.com/iflytekopensource/iFlytekSpark-13B 13 3000 231:1 0.7 Jan/2024 🟢 https://www.ithome.com/0/748/030.htm
Xinghuo 3.5 (Spark) iFlyTek 200 4000 20:1 3.0 Jan/2024 🟢 https://www.laitimes.com/en/article/6f50u_6vhbm.html
MGIE Apple https://github.com/tsujuifu/pytorch_mgie 7 2000 286:1 0.4 Jan/2024 🟢 https://openreview.net/forum?id=S1RKWSyZ2Y
CodeLlama-70B Meta AI https://huggingface.co/codellama/CodeLlama-70b-hf 70 2000 29:1 1.2 Jan/2024 🟢 https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/
RWKV-v5 Eagle 7B RWKV https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-2 7.52 1100 147:1 0.3 Jan/2024 🟢 https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers
MaLA-500 LMU https://huggingface.co/MaLA-LM/mala-500 10 2000 200:1 0.5 Jan/2024 🟢 https://arxiv.org/abs/2401.13303
MambaByte Cornell https://github.com/kyegomez/MambaByte 0.972 37.5 39:1 0.0 Jan/2024 🔴 https://arxiv.org/abs/2401.13660
DeepSeek-Coder DeepSeek-AI https://coder.deepseek.com/ 33 2000 61:1 0.9 Jan/2024 🟢 https://arxiv.org/abs/2401.14196
FuseLLM Tencent https://github.com/fanqiwan/FuseLLM 7 2000 286:1 0.4 Jan/2024 🟢 https://arxiv.org/abs/2401.10491
Fuyu-Heavy Adept 120 5000 42:1 2.6 Jan/2024 🟡 https://www.adept.ai/blog/adept-fuyu-heavy
GLM-4 Zhipu AI (Tsinghua) https://open.bigmodel.cn/ 200 4000 20:1 3.0 Jan/2024 🟢 https://pandaily.com/zhipu-ai-unveils-glm-4-model-with-advanced-performance-paralleling-gpt-4/
DeepSeekMoE DeepSeek-AI 16 2000 125:1 0.6 Jan/2024 🔴 https://arxiv.org/abs/2401.06066
DeepSeek DeepSeek-AI https://chat.deepseek.com/ 67 2000 30:1 1.2 Jan/2024 🟢 https://arxiv.org/abs/2401.02954
LLaMA Pro Tencent https://huggingface.co/TencentARC/LLaMA-Pro-8B 8.3 2080 251:1 0.4 Jan/2024 🟢 https://arxiv.org/abs/2401.02415
TinyLlama SUTD/Independent https://github.com/jzhang38/TinyLlama 1.1 3000 2,728:1 0.2 Jan/2024 🟢 https://arxiv.org/abs/2401.02385
DocLLM JPMorgan 7 2000 286:1 0.4 Jan/2024 🔴 https://arxiv.org/abs/2401.00908
====
this is a dump of llms released in q1 2024. i need this as a comma item list with each model name linked to the paper. for example, the output should be (with markdown linking the model name to the paper):
JPMorgan DocLLM 7B, SUTD TinyLlama 1.1B,
Here is the comma-separated list with each model name linked to the respective paper or source using Markdown:
EvoLLM-JP 10B, Parakeet 378M, RWKV-v5 EagleX 1.7T, Apple MM1, Covariant RFM-1 8B, Cohere Command-R 35B, DeepSeek-VL 7B, Fudan University AnyGPT 7B, Stability AI Stable Beluga 2.5 70B, Inflection AI Inflection-2.5 1.2T, SRIBD/CUHK Apollo 7B, Anthropic Claude 3 Opus, SambaNova Samba-1 1.4T, HF/ServiceNow StarCoder 2 15B, ByteDance 530B, ByteDance 175B, Mistral AI Mistral Small 7B, Mistral AI Mistral Large 540B, Reliance Hanooman 40B, Apple Ask 20B, Reka AI Reka Edge 7B, Reka AI Reka Flash 21B, Google DeepMind Gemma 7B, Google DeepMind Gemini 1.5 Pro 1.5T, Alibaba Qwen-1.5 72B, BRAIN GOODY-2, ChatDB Natural-SQL-7B, AI Singapore Sea-Lion 7.5B, Google TimesFM 200M, Allen AI OLMo 7B, Cerebras FLOR-6.3B, AIWaves.cn Weaver 34B, Mistral AI miqu 70b, iFlyTek iFlytekSpark-13B, iFlyTek Xinghuo 3.5 (Spark) 200B, Apple MGIE 7B, Meta AI CodeLlama-70B, RWKV RWKV-v5 Eagle 7B, LMU MaLA-500 10B, Cornell MambaByte 972M, DeepSeek-AI DeepSeek-Coder 33B, Tencent FuseLLM 7B, Adept Fuyu-Heavy 120B, Zhipu AI (Tsinghua) GLM-4 200B, DeepSeek-AI DeepSeekMoE 16B, DeepSeek-AI DeepSeek 67B, Tencent LLaMA Pro 8.3B, SUTD/Independent TinyLlama 1.1B, JPMorgan DocLLM 7B
thanks. can you add month headings in bold for jan/feb/mar, you'll have to reverse the order. can you also only apply the link to the official model name, not the lab and not the param count
March 2024
EvoLLM-JP Sakana AI 10B, Parakeet Independent 378M, RWKV-v5 EagleX RWKV 1.7T, MM1 Apple, RFM-1 Covariant 8B, Command-R Cohere 35B, DeepSeek-VL DeepSeek-AI 7B, AnyGPT Fudan University 7B, Stable Beluga 2.5 Stability AI 70B, Inflection-2.5 Inflection AI 1.2T, Apollo SRIBD/CUHK 7B, Claude 3 Opus Anthropic 2T
February 2024
Samba-1 SambaNova 1.4T, StarCoder 2 HF/ServiceNow 15B, 530B ByteDance, 175B ByteDance, Mistral Small Mistral AI 7B, Mistral Large Mistral AI 540B, Hanooman Reliance 40B, Ask Apple 20B, Reka Edge Reka AI 7B, Reka Flash Reka AI 21B, Gemma Google DeepMind 7B, Gemini 1.5 Pro Google DeepMind 1.5T, Qwen-1.5 Alibaba 72B, GOODY-2 BRAIN, Natural-SQL-7B ChatDB 7B, Sea-Lion AI Singapore 7.5B, TimesFM Google 200M, OLMo Allen AI 7B, FLOR-6.3B Cerebras 6.3B
January 2024
Weaver AIWaves.cn 34B, miqu 70b Mistral AI 70B, iFlytekSpark-13B iFlyTek 13B, Xinghuo 3.5 (Spark) iFlyTek 200B, MGIE Apple 7B, CodeLlama-70B Meta AI 70B, RWKV-v5 Eagle 7B RWKV 7.52B, MaLA-500 LMU 10B, MambaByte Cornell 972M, DeepSeek-Coder DeepSeek-AI 33B, FuseLLM Tencent 7B, Fuyu-Heavy Adept 120B, GLM-4 Zhipu AI (Tsinghua) 200B, DeepSeekMoE DeepSeek-AI 16B, DeepSeek DeepSeek-AI 67B, LLaMA Pro Tencent 8.3B, TinyLlama SUTD/Independent 1.1B, DocLLM JPMorgan 7B
that doesn't look right. the link should be on the model name only. and the months should be ascending, as well as the model releases
January 2024
DocLLM JPMorgan 7B, TinyLlama SUTD/Independent 1.1B, LLaMA Pro Tencent 8.3B, DeepSeek DeepSeek-AI 67B, DeepSeekMoE DeepSeek-AI 16B, GLM-4 Zhipu AI (Tsinghua) 200B, Fuyu-Heavy Adept 120B, FuseLLM Tencent 7B, DeepSeek-Coder DeepSeek-AI 33B, MambaByte Cornell 972M, MaLA-500 LMU 10B, RWKV-v5 Eagle 7B RWKV 7.52B, CodeLlama-70B Meta AI 70B, MGIE Apple 7B, Xinghuo 3.5 (Spark) iFlyTek 200B, iFlytekSpark-13B iFlyTek 13B, miqu 70b Mistral AI 70B, Weaver AIWaves.cn 34B
February 2024
FLOR-6.3B Cerebras 6.3B, OLMo Allen AI 7B, TimesFM Google 200M, Sea-Lion AI Singapore 7.5B, Natural-SQL-7B ChatDB 7B, GOODY-2 BRAIN, Qwen-1.5 Alibaba 72B, Gemini 1.5 Pro Google DeepMind 1.5T, Gemma Google DeepMind 7B, Reka Flash Reka AI 21B, Reka Edge Reka AI 7B, Ask Apple 20B, Hanooman Reliance 40B, Mistral Large Mistral AI 540B, Mistral Small Mistral AI 7B, 175B ByteDance, 530B ByteDance, StarCoder 2 HF/ServiceNow 15B, Samba-1 SambaNova 1.4T
March 2024
Claude 3 Opus Anthropic 2T, Apollo SRIBD/CUHK 7B, Inflection-2.5 Inflection AI 1.2T, Stable Beluga 2.5 Stability AI 70B, AnyGPT Fudan University 7B, DeepSeek-VL DeepSeek-AI 7B, Command-R Cohere 35B, RFM-1 Covariant 8B, MM1 Apple, RWKV-v5 EagleX RWKV 1.7T, Parakeet Independent 378M, EvoLLM-JP Sakana AI 10B
that doesn't look right. in the first example, JPMorgan is the lab, so that name comes first before the model name. don't repeat parameter count if it's already in the name
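**January 2024**
JPMorgan [DocLLM](https://arxiv.org/abs/2401.00908) 7B, SUTD/Independent [TinyLlama](https://arxiv.org/abs/2401.02385) 1.1B, Tencent [LLaMA Pro](https://arxiv.org/abs/2401.02415) 8.3B, DeepSeek-AI [DeepSeek](https://arxiv.org/abs/2401.02954) 67B, DeepSeek-AI [DeepSeekMoE](https://arxiv.org/abs/2401.06066) 16B, Zhipu AI (Tsinghua) [GLM-4](https://pandaily.com/zhipu-ai-unveils-glm-4-model-with-advanced-performance-paralleling-gpt-4/) 200B, Adept [Fuyu-Heavy](https://www.adept.ai/blog/adept-fuyu-heavy) 120B, Tencent [FuseLLM](https://arxiv.org/abs/2401.10491) 7B, DeepSeek-AI [DeepSeek-Coder](https://arxiv.org/abs/2401.14196) 33B, Cornell [MambaByte](https://arxiv.org/abs/2401.13660) 972M, LMU [MaLA-500](https://arxiv.org/abs/2401.13303) 10B, RWKV [RWKV-v5 Eagle 7B](https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers), Meta AI [CodeLlama-70B](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/), Apple [MGIE](https://openreview.net/forum?id=S1RKWSyZ2Y) 7B, iFlyTek [Xinghuo 3.5 (Spark)](https://www.laitimes.com/en/article/6f50u_6vhbm.html) 200B, iFlyTek [iFlytekSpark-13B](https://www.ithome.com/0/748/030.htm), Mistral AI [miqu 70b](https://huggingface.co/miqudev/miqu-1-70b), AIWaves.cn [Weaver](https://arxiv.org/abs/2401.17268) 34B, Cerebras [FLOR-6.3B](https://www.cerebras.net/press-release/cerebras-systems-and-barcelona-supercomputing-center-train-industry-leading-multilingual-spanish-catalan-english-llm)
**February 2024**
Allen AI [OLMo](https://allenai.org/olmo/olmo-paper.pdf) 7B, Google [TimesFM](https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html) 200M, AI Singapore [Sea-Lion](https://huggingface.co/aisingapore/sealion7b) 7.5B, ChatDB [Natural-SQL-7B](https://huggingface.co/chatdb/natural-sql-7b), BRAIN [GOODY-2](https://www.goody2.ai/goody2-modelcard.pdf), Alibaba [Qwen-1.5](https://qwenlm.github.io/blog/qwen1.5/) 72B, Google DeepMind [Gemini 1.5 Pro](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf) 1.5T, Google DeepMind [Gemma](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) 7B, Reka AI [Reka Flash](https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/) 21B, Reka AI [Reka Edge](https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/) 7B, Apple [Ask](https://www.macrumors.com/2024/02/22/applecare-advisors-testing-new-ask-tool/) 20B, Reliance [Hanooman](https://www.hanooman.ai/) 40B, Mistral AI [Mistral Large](https://mistral.ai/news/mistral-large/) 540B, Mistral AI [Mistral Small](https://mistral.ai/news/mistral-large/) 7B, ByteDance [175B](https://arxiv.org/abs/2402.15627), ByteDance [530B](https://arxiv.org/abs/2402.15627), HF/ServiceNow [StarCoder 2](https://arxiv.org/abs/2402.19173) 15B, SambaNova [Samba-1](https://sambanova.ai/press/secure-one-trillion-parameter-generative-ai-model-for-the-enterprise) 1.4T
**March 2024**
Anthropic [Claude 3 Opus](https://www.anthropic.com/claude-3-model-card) 2T, SRIBD/CUHK [Apollo](https://arxiv.org/abs/2403.03640) 7B, Inflection AI [Inflection-2.5](https://inflection.ai/inflection-2-5) 1.2T, Stability AI [Stable Beluga 2.5](https://stability.ai/news/putting-the-ai-supercomputer-to-work) 70B, Fudan University [AnyGPT](https://arxiv.org/abs/2402.12226) 7B, DeepSeek-AI [DeepSeek-VL](https://arxiv.org/abs/2403.05525) 7B, Cohere [Command-R](https://txt.cohere.com/command-r/) 35B, Covariant [RFM-1](https://covariant.ai/insights/introducing-rfm-1-giving-robots-human-like-reasoning-capabilities/) 8B, Apple [MM1](https://arxiv.org/abs/2403.09611) 30B, RWKV [RWKV-v5 EagleX](https://substack.recursal.ai/p/eaglex-17t-soaring-past-llama-7b) 7.52B, Independent [Parakeet](https://news.ycombinator.com/item?id=39745700#39745702) 378M, Sakana AI [EvoLLM-JP](https://arxiv.org/abs/2403.13187) 10B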
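The formatting rules worked out over this exchange (lab first, link on the model name only, size left off when the name already carries it, months and releases in ascending order) can be sketched in Python. This is a minimal illustration rather than anything from the conversation itself: the row tuple shape and the helper names `size_label`, `format_entry`, and `format_by_month` are assumptions, and the four sample rows are copied from the dump at the top.

```python
import re
from itertools import groupby

# Assumed row shape (hypothetical): (lab, model, params_in_billions_or_None, "Mon/YYYY", paper_url)
MONTHS = {"Jan": (1, "January"), "Feb": (2, "February"), "Mar": (3, "March")}

# Matches a size such as "70B", "6.3B" or "70b" already present in a model name.
SIZE_IN_NAME = re.compile(r"\d+(?:\.\d+)?[BMT]\b", re.IGNORECASE)


def size_label(params_b):
    """Render a parameter count given in billions, e.g. 0.972 -> 972M, 1200 -> 1.2T."""
    if params_b < 1:
        value, unit = params_b * 1000, "M"
    elif params_b >= 1000:
        value, unit = params_b / 1000, "T"
    else:
        value, unit = params_b, "B"
    return f"{value:g}{unit}"


def format_entry(lab, model, params_b, url):
    """Lab first, markdown link on the model name only, size unless the name has one."""
    parts = [lab, f"[{model}]({url})"]
    if params_b is not None and not SIZE_IN_NAME.search(model):
        parts.append(size_label(params_b))
    return " ".join(parts)


def month_key(row):
    mon, year = row[3].split("/")
    return int(year), MONTHS[mon][0]


def format_by_month(rows_newest_first):
    """Group rows under bold month headings, oldest month and oldest release first."""
    # The dump lists newest rows first; reversing and then using a stable sort
    # keeps releases in ascending order inside each month.
    ordered = sorted(reversed(rows_newest_first), key=month_key)
    lines = []
    for (year, month_no), group in groupby(ordered, key=month_key):
        month_name = next(full for no, full in MONTHS.values() if no == month_no)
        lines.append(f"**{month_name} {year}**")
        lines.append(", ".join(format_entry(r[0], r[1], r[2], r[4]) for r in group))
    return "\n".join(lines)


rows = [
    ("Anthropic", "Claude 3 Opus", 2000, "Mar/2024", "https://www.anthropic.com/claude-3-model-card"),
    ("Meta AI", "CodeLlama-70B", 70, "Jan/2024", "https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/"),
    ("Cornell", "MambaByte", 0.972, "Jan/2024", "https://arxiv.org/abs/2401.13660"),
    ("JPMorgan", "DocLLM", 7, "Jan/2024", "https://arxiv.org/abs/2401.00908"),
]
print(format_by_month(rows))
```

Parsing the pasted dump into such tuples is left out on purpose, since the rows have optional columns (playground URL, token counts) that make a naive whitespace split unreliable.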
This conversation may have been edited by the sharer.