
Memberships

Data Alchemy

38k members • Free

4 contributions to Data Alchemy
Reinforcement Learning: An Overview
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based methods, policy-based methods, model-based methods, multi-agent RL, LLMs and RL, and various other topics (e.g., offline RL, hierarchical RL, intrinsic reward). https://arxiv.org/abs/2412.05265
2 likes • Jun 21
Nice. I am learning about DeepSeek's reinforcement learning.
2× RTX 4090 (24GB each) OR 1× RTX 6000 Ada (48GB)
Hi all, which GPU would you recommend? I am building a local multi-agent system that uses multiple LLMs. I can start with small open-source models, but I want a machine capable of running a 70B GGUF model or a 33B model at full precision in the future. Has anyone considered this type of hardware, or do you run everything on cloud services? Any suggestions appreciated.
1 like • Jun 18
Thanks. I have decided to use a cloud service with an A100 GPU option, just to run my reinforcement learning code faster.
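As a rough sanity check on the hardware question above, the memory needed for model weights alone can be estimated from the parameter count and the bits stored per weight. The sketch below is only back-of-the-envelope arithmetic: the function names are hypothetical, the figure of about 4.5 bits per weight for a 4-bit GGUF quantization is an assumption (it varies by quantization type), "full precision" is read as 16-bit weights, and KV cache, activations, and runtime overhead are ignored.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def weights_fit(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM; ignores KV cache and overhead."""
    return weight_memory_gb(params_billion, bits_per_weight) <= vram_gb


TOTAL_VRAM_GB = 48  # 2x 24 GB or 1x 48 GB, as in the post

# Assumption: ~4.5 bits per weight for a 4-bit GGUF quantization.
print(weight_memory_gb(70, 4.5), weights_fit(70, 4.5, TOTAL_VRAM_GB))
# Assumption: "full precision" read as 16-bit weights.
print(weight_memory_gb(33, 16), weights_fit(33, 16, TOTAL_VRAM_GB))
```

Under these assumptions, a 4-bit 70B model leaves only a few GB of the 48 GB for everything else, and a 33B model with 16-bit weights does not fit in either configuration.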
DeepSeek R1 Released
An open source model that compares to OpenAI's o1 reasoning model on several benchmarks. The research paper: DeepSeek_R1.pdf. https://github.com/deepseek-ai/DeepSeek-R1 """ We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. """
1 like • Jan 27
This model is from China. Hackers are checking how much it is censored by the Chinese government on certain topics.
What should I choose?
I am on a journey in cybersecurity and I also want to learn AI. Right now I don't know which one I should choose (AI or cybersecurity). Can anyone offer a suggestion, please?
2 likes • Feb '24
I am on a similar journey, and I try to combine them. Honestly, my interest started in security, and then I came to realize that ML would help my job (better queries for threat hunting with LLMs and RAG). But there are many data engineers involved in the security product engineering world who have no cybersecurity background. Believe in your instinct and passion.
2 likes • Mar '24
@Melvin Jones Sorry for being late. Good question. Basically it was trial and error with a self-taught approach, following big trends like big data, autonomous driving, neural networks, etc. Using the available tools, I have been trying to create some tools for security. With generative AI, I am looking for an effective way to master the subject. I am really looking forward to this community and program providing good insight into how to get there.
Kiyoshi Watanabe
@kiyoshi-watanabe-4533
Hi, I am Kiyoshi. Looking forward to connecting with many people here. I am interested in LLM + knowledge graph implementation.

Active 162d ago
Joined Feb 10, 2024