Pinned
n8n training files
n8n training files for Claude Project. Please join this Skool prior to downloading...
Pinned
welcome to the new Burstiness and Perplexity community
Our mission is to create a true learning community where an exploration of AI, tools, agents, and use cases can merge with thoughtful conversations about implications and fundamental ideas. If you are joining, please consider engaging, not just lurking. Tell us about yourself, where you are in your life journey, and how tech and AI intersect with it. For updates on research, models, and use cases, click on the Classrooms tab and then find the Bleeding Edge Classroom.
Evaluating the DeepSeek tech stack with a critical eye
We've obtained and evaluated a pre-print DeepSeek Technical Report....

DeepSeek-V3: Core Contributions and Characteristics

This report details DeepSeek-V3, a Mixture-of-Experts (MoE) language model with a total of 671 billion parameters, of which 37 billion are activated for each token. The model's design prioritizes efficient inference and cost-effective training, incorporating specific architectural components and training strategies.

Architectural Innovations:
- Multi-head Latent Attention (MLA): DeepSeek-V3 utilizes MLA, which reduces the Key-Value (KV) cache during inference through a low-rank compression of attention keys and values. The technique compresses queries, keys, and values into small latent vectors, and it is these latents that are cached during inference. This significantly reduces the memory footprint while maintaining performance (a toy sketch follows this post).
- DeepSeekMoE with Auxiliary-Loss-Free Load Balancing: The model employs the DeepSeekMoE architecture, using finer-grained experts and isolating some as shared. It introduces an auxiliary-loss-free load balancing strategy to minimize the performance degradation caused by imbalanced expert load, a common failure mode in MoE training. Rather than conventional auxiliary losses, it adds a dynamic bias term to the affinity scores to distribute the load (also sketched below). A sequence-wise auxiliary loss is kept to prevent imbalance within a single sequence.
- Multi-Token Prediction (MTP): The model incorporates a multi-token prediction objective, extending the prediction scope to multiple future tokens at each position. The implementation uses sequential modules to predict the additional tokens while keeping the causal chain at each prediction depth. During inference, the MTP modules can be discarded so the model functions normally, or kept to improve latency via speculative decoding.

Infrastructure and Training Framework:
- Compute Infrastructure: DeepSeek-V3 was trained on a cluster equipped with 2,048 NVIDIA H800 GPUs. GPUs are connected by NVLink within nodes and by InfiniBand (IB) across nodes.
- DualPipe Algorithm: A pipeline parallelism method named DualPipe overlaps computation and communication across the forward and backward passes. It divides the computation into components and rearranges them, with manual adjustment, so that communication stays hidden during execution.
- Cross-Node All-to-All Communication: The authors implement custom kernels for cross-node all-to-all communication, leveraging IB and NVLink. A node-limited routing mechanism caps the number of receiving nodes for each token, and only 20 SMs are needed to implement the all-to-all communication.
- Memory Saving Techniques: Several methods reduce memory usage, including recomputing RMSNorm and MLA up-projections during back-propagation, storing the exponential moving average (EMA) of model parameters in CPU memory, and sharing the embedding and output head between modules.
- FP8 Training: The model leverages a fine-grained mixed-precision framework using the FP8 data format to accelerate training and reduce GPU memory usage. Techniques are introduced to preserve precision, including a tile-wise or block-wise quantization strategy to handle feature outliers (sketched below) and promotion of intermediate GEMM results to CUDA cores for higher-precision accumulation. Key components of the architecture are retained in FP32 and BF16.
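To make the MLA bullet concrete, here is a minimal sketch of the low-rank KV compression idea: only a small latent vector per token is cached, and full per-head keys/values are re-expanded from it at attention time. All dimensions and layer names are illustrative, not the report's exact configuration, and the sketch omits the decoupled RoPE handling and query compression.

```python
# Toy sketch of low-rank KV compression (the core idea behind MLA).
# Illustrative only: real MLA also compresses queries and treats rotary
# position embeddings separately.
import torch
import torch.nn as nn

class LowRankKV(nn.Module):
    def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand to K
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand to V
        self.n_heads, self.d_head = n_heads, d_head

    def forward(self, h):                 # h: [batch, seq, d_model]
        latent = self.down(h)             # only this [batch, seq, d_latent] is cached
        b, s, _ = h.shape
        k = self.up_k(latent).view(b, s, self.n_heads, self.d_head)
        v = self.up_v(latent).view(b, s, self.n_heads, self.d_head)
        return latent, k, v
```

With these illustrative numbers, the cached state per token drops from 2 × 32 × 128 = 8,192 values (full K and V) to 512 (the latent), which is the memory saving the bullet refers to.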
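The auxiliary-loss-free load balancing can be sketched the same way. On my reading of the report, a per-expert bias is added to the affinity scores only for top-k expert selection and is nudged up or down after each step depending on whether the expert was under- or over-loaded; the gating weights themselves still come from the raw affinities, so no auxiliary loss gradient is needed. Function names and the exact update rule below are my illustration, not the paper's code.

```python
# Toy sketch of auxiliary-loss-free load balancing: the bias steers *which*
# experts get selected, while the gate weights still come from raw affinities.
import torch

def route_tokens(affinity, bias, k=8):
    # affinity: [n_tokens, n_experts] raw router scores; bias: [n_experts]
    topk = torch.topk(affinity + bias, k, dim=-1).indices      # biased selection
    gates = torch.gather(affinity, -1, topk).softmax(dim=-1)   # unbiased weights
    return topk, gates

def update_bias(bias, topk, n_experts, gamma=1e-3):
    # After each step, push the bias down for overloaded experts and up for
    # underloaded ones, so load evens out without an auxiliary loss term.
    load = torch.bincount(topk.flatten(), minlength=n_experts).float()
    target = topk.numel() / n_experts
    return bias - gamma * torch.sign(load - target)
```

The fixed-step sign update mirrors the spirit of the report's bias-update speed; the softmax over selected affinities is just one simple gating choice for the sketch.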
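Finally, the tile-/block-wise quantization mentioned under FP8 training can be illustrated in a few lines: each block of a tensor gets its own scale, so a single outlier only distorts its own block rather than the whole tensor's quantization range. The E4M3 maximum is standard; the tile size and everything else is an assumption for illustration, and the cast to an actual FP8 dtype is left out.

```python
# Toy sketch of block-wise scaling for FP8-style quantization
# (simulated in floating point; no actual fp8 dtype cast here).
import torch

FP8_E4M3_MAX = 448.0   # largest representable magnitude in E4M3

def blockwise_quantize(x, tile=128):
    # x: [rows, cols], both assumed divisible by `tile` for brevity.
    rows, cols = x.shape
    xt = x.view(rows // tile, tile, cols // tile, tile)              # carve into tiles
    scale = (xt.abs().amax(dim=(1, 3), keepdim=True)                 # one scale per tile
             / FP8_E4M3_MAX).clamp_min(1e-12)
    q = (xt / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)              # fits the FP8 range
    return q, scale        # dequantize with (q * scale).reshape(rows, cols)
```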
It’s linkbait, but purdy cool nonetheless
A technically rich guide covering core LLM concepts, architectures, training paradigms, fine-tuning strategies, and deployment challenges, pitched as a comprehensive preparation tool for AI/ML interviews and LLM-based technical assessments, suitable for both professionals and advanced learners. But really, it's just a good primer-level descriptive doc.
The BS is starting to pile up
Beginning to see increasingly granular nonsense about AI Overviews and ChatGPT responses. I've analysed over 10,000 ChatGPT results using QA (quantitative analysis) tools, and read countless papers, particularly in the AML space and the adversarial information retrieval space. That's given me a (perhaps slightly warped) intuitive sense of response dynamics. But let's allow ChatGPT to weigh in:

I do not directly "consider" Bing reviews in the way a human would; that is, I don't passively browse them or have awareness of specific reviews unless you provide them or explicitly ask for them to be included in a response. However, I can use Bing reviews indirectly when necessary through a specific mechanism:

✅ Mechanism: Access via Web Tool (Search Plugin)
When enabled, I can perform real-time web searches using a tool that retrieves data from Bing and other indexed sources. If you ask me something like:
- "What are people saying about [business/product] on Bing?"
- "Compare Bing reviews with Google reviews for [location]."
then I can query Bing's indexed content using that tool. When I do this:
1. A search is initiated with a query tailored to your request (e.g., "Bing reviews for La Mar SF").
2. The search result summaries are returned to me.
3. I extract, summarize, or quote what's relevant to your prompt, including review scores, quoted user feedback, and sentiment trends.
The data I pull in this way is not live-streamed or permanently cached; it's used in the moment for that response only.

❌ What I Don't Do:
- I don't crawl Bing.com or any review site in the background autonomously.
- I don't weight Bing reviews differently from others unless you instruct me to.
- I don't store or favor any one platform's data in my model's memory or reasoning, unless it's trained broadly from public internet data (which includes a general awareness of how reviews work, but not current or specific ones).

Example Use Cases Where I Would Access Bing Reviews:
- "What do Bing users think of the new MacBook Pro?"
Burstiness and Perplexity
skool.com/burstiness-and-perplexity
Master AI use cases from legal & the supply chain to digital marketing & SEO. Agents, analysis, content creation--Burstiness & Perplexity from NovCog