Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens
By A Mystery Man Writer
Description
Together, the dataset's developer, claims it is the largest public dataset built specifically for language model pre-training.
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://eu-images.contentstack.com/v3/assets/blt6b0f74e5591baa03/blt27a3c281737dc8db/65786d651f8f75040a0dbed5/YouTube_Thumbnail_Template_(8).jpg)
Data science recent news
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://eu-images.contentstack.com/v3/assets/blt6b0f74e5591baa03/bltd22cf8cd3226e8e2/65e8c020033ab0040af9ee27/News_Image_-_2024-03-06T131222.371.png)
Leaderboard: OpenAI's GPT-4 Has Lowest Hallucination Rate
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://s10251.pcdn.co/wp-content/uploads/2023/04/2023-Alan-D-Thompson-Datasets-Simple-Rev-1b.png)
GPT-4 – Dr Alan D. Thompson – Life Architect
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://heise.cloudimg.io/v7/_www-heise-de_/imgs/18/4/1/4/5/8/3/2/Screenshot_2023-04-19_132035-b1ba8ff7081a688c.jpg?force_format=avif%2Cwebp%2Cjpeg&org_if_sml=1&q=70&width=1019)
LLaMA clone: RedPajama – first open-source decentralized AI with open dataset
Ahead of AI #8: The Latest Open Source LLMs and Datasets
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://ar5iv.labs.arxiv.org/html/2311.17035/assets/x2.png)
[2311.17035] Scalable Extraction of Training Data from (Production) Language Models
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://www.kdnuggets.com/wp-content/uploads/wijaya_redpajama_project_opensource_initiative_democratizing_llms_1.png)
RedPajama Project: An Open-Source Initiative to Democratizing LLMs - KDnuggets
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://eu-images.contentstack.com/v3/assets/blt6b0f74e5591baa03/blt140c831d213cdaf2/658b0a6570feb2040a5e1e46/News_Image_-_2023-12-26T111603.036.png?width=700&auto=webp&quality=10&disable=upscale&blur=40)
ChatGPT / Generative AI recent news, page 3 of 19
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](https://miro.medium.com/v2/resize:fit:1358/0*djaXj7-t11YhnEha.png)
RedPajama Reproducing LLaMA🦙 Dataset on 1.2 Trillion Tokens, by Angelina Yang
![Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens](http://www.marktechpost.com/wp-content/uploads/2024/03/Screenshot-2024-03-03-at-8.28.39-AM.png)
Together AI Releases RedPajama v2: An Open Dataset with 30 Trillion Tokens for Training Large Language Models - MarkTechPost