2025-11-18

In today’s issue:

  • AIGC Papers – π*0.6 brings real-world experience into home robots, “Back to Basics” rethinks diffusion as true denoising, and Depth Anything 3 offers a cleaner geometric backbone for future world-models.

  • AIGC Projects – Depth Anything 3 lands in ComfyUI, AgentEvolver lets LLM agents train themselves, and a new Qwen upscaling LoRA targets real-world photography.

  • AIGC News – Gemini 3 Pro’s powerful model card leaks, xAI rolls out Grok 4.1, and Replicate joins Cloudflare right as a major outage hits the platform.

📄 AIGC Papers

Today’s AIGC Papers

  • π*0.6 shows how a VLA can truly learn from experience, turning home robots into durable, all-day performers through demonstrations, corrections, and self-practice.

  • Back to Basics pulls diffusion models back to real denoising, using a pure image-space Transformer to achieve high-fidelity generation with a simpler formulation.

  • Depth Anything 3 reconstructs consistent 3D geometry from any view using a single Transformer, giving future world-models a more stable and cleaner geometric backbone.

  1. π*0.6: A VLA that Learns from Experience ( blog | paper )

    Physical Intelligence presents π*0.6, a VLA trained with advantage-conditioned RL on demonstrations, corrections, and the robot’s own experience. The system doubles success rates and throughput on real household tasks (making coffee, folding clothes, packing objects) and sustains long, continuous real-world operation. The paper also provides a model card for π0.6.
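The core recipe in the summary above can be caricatured in a few lines: score each episode against a value baseline, then label it with its advantage so the policy can be conditioned on (and steered toward) above-baseline behavior at deployment. This is a toy numpy sketch, not Physical Intelligence’s implementation; the function name, the mean baseline, and the binary labels are all illustrative stand-ins.

```python
import numpy as np

def advantage_labels(returns, baseline=None):
    """Label each episode by whether it beat the value baseline.

    Advantage-conditioned training feeds this label to the policy as an
    extra input; at deployment the policy is conditioned on the 'good'
    label so it reproduces only its better-than-baseline experience.
    """
    returns = np.asarray(returns, dtype=float)
    if baseline is None:
        baseline = returns.mean()      # stand-in for a learned value function
    advantages = returns - baseline
    return (advantages > 0).astype(int), advantages

# Episodes from demos, corrections, and autonomous practice (toy returns).
episode_returns = [0.2, 0.9, 0.5, 1.0, 0.1]
labels, adv = advantage_labels(episode_returns)
print(labels)   # 1 marks the above-baseline episodes the policy should emulate
```

The point of conditioning (rather than simply filtering out bad episodes) is that the policy still learns from all data, while the advantage input lets you select the high-quality mode at inference time.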

  2. Kaiming He Just Introduced JiT! Back to Basics: Let Denoising Generative Models Denoise ( paper | code )

    MIT’s Tianhong Li and Kaiming He propose predicting the clean image instead of noise, using a large patch-based ViT (“Just-image Transformers”) in pixel space. The method needs no tokenizer or pretraining yet achieves competitive high-resolution ImageNet generation, offering a simpler and clearer theoretical view of diffusion.
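The change of target is easy to see on a toy example. Under the usual forward process x_t = α·x0 + σ·ε, a noise predictor implies a clean-image estimate via x0 = (x_t − σ·ε)/α, while x-prediction regresses x0 directly in pixel space. The sketch below is a minimal illustration of that relationship on a 1-D “image”, not the paper’s training code; the (α, σ) values and the stand-in network output are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward diffusion on a toy 1-D "image": x_t = alpha * x0 + sigma * eps.
x0 = rng.standard_normal(16)
eps = rng.standard_normal(16)
alpha, sigma = 0.8, 0.6              # one arbitrary point on a noise schedule

x_t = alpha * x0 + sigma * eps

def x0_from_eps(x_t, eps_hat, alpha, sigma):
    """Clean-image estimate implied by a noise (epsilon) predictor."""
    return (x_t - sigma * eps_hat) / alpha

# With the true noise, the epsilon parameterization recovers the clean image,
# so the two targets carry the same information, just rescaled.
assert np.allclose(x0_from_eps(x_t, eps, alpha, sigma), x0)

# x-prediction training is then plain pixel-space regression to x0:
x0_hat = x0 + 0.01 * rng.standard_normal(16)   # stand-in network output
loss = np.mean((x0_hat - x0) ** 2)
```

The parameterizations are algebraically interchangeable, but which one the network outputs changes the loss weighting across noise levels, which is where the paper’s “let denoisers denoise” argument lives.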

  3. Depth Anything 3: Recovering the Visual Space from Any Views ( webpage | paper | code )

    ByteDance introduces Depth Anything 3, using a single Transformer and depth-ray representation to recover consistent geometry from single images, multi-view inputs, or videos. It surpasses VGGT in camera-pose and geometry accuracy, and outperforms DA2 in monocular depth, enabling high-fidelity 3DGS reconstruction.
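To see why a depth-plus-rays representation yields geometry directly, consider the familiar pinhole special case: each pixel defines a camera ray, and scaling that ray by the predicted depth gives a 3D point. This sketch assumes simple pinhole intrinsics and is not DA3’s exact depth-ray parameterization; the function name and toy intrinsics are illustrative.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map to camera-frame 3D points via pinhole rays.

    Pixel (u, v) defines the ray ((u - cx)/fx, (v - cy)/fy, 1); scaling it
    by the predicted depth gives a 3D point, so per-view depth that is
    consistent across views directly yields consistent geometry.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)    # (h, w, 3) point map

# Toy 4x4 constant-depth map with intrinsics centered on the image.
depth = np.full((4, 4), 2.0)
points = unproject_depth(depth, fx=4.0, fy=4.0, cx=1.5, cy=1.5)
```

Point maps like this are what downstream consumers such as 3DGS reconstruction ingest, which is why pose and depth accuracy translate directly into reconstruction fidelity.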

🛠️ AIGC Projects

Today’s AIGC Projects

  • ComfyUI-DepthAnythingV3 turns Depth Anything 3 into drop-in ComfyUI nodes, making DA3 usable in everyday image and video workflows.

  • AgentEvolver provides a self-evolving training loop where LLM agents generate their own tasks, rewards, and improvements.

  • Qwen-Edit-2509-Upscale-LoRA gives Qwen-Image-Edit a practical, detail-preserving photography upscaler for real-world enhancement.

  1. ComfyUI-DepthAnythingV3 ( link )

    ComfyUI-DepthAnythingV3 wraps Depth Anything 3 into ready-made ComfyUI nodes, letting users run DA3 on images and, with custom graphs, on multi-view / video inputs. It exposes DA3’s spatially consistent depth prediction in a visual workflow, so you can feed depth maps directly into ControlNet, 3DGS, or other geometry-aware pipelines.

  2. AgentEvolver: Towards Efficient Self-Evolving Agent System ( link )

    AgentEvolver is an end-to-end framework where LLM-based agents teach themselves via three loops: self-questioning (generating new tasks), self-navigating (reusing past trajectories with hybrid policies), and self-attributing (fine-grained credit assignment over states and actions). It cuts dataset engineering cost, improves exploration efficiency, and yields faster capability gains than traditional RL-style agent training.
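The three loops can be sketched as one toy self-training cycle: generate tasks, attempt them, assign credit, and bank successful trajectories for reuse. Everything below is a deliberately simplified stand-in (an addition "agent" with a scalar skill, a replay list, a seeded RNG), not the AgentEvolver framework itself.

```python
import random

random.seed(0)

def self_questioning(n):
    """Generate new training tasks (here: toy addition problems)."""
    return [(random.randint(0, 9), random.randint(0, 9)) for _ in range(n)]

def agent_attempt(task, skill):
    """Toy agent: answers correctly with probability `skill`."""
    a, b = task
    return a + b if random.random() < skill else a

def self_attributing(task, answer):
    """Credit assignment reduced to a scalar: was the outcome correct?"""
    return 1.0 if answer == sum(task) else 0.0

replay = []      # self-navigating: reuse past successful trajectories
skill = 0.3
for round_ in range(5):
    for task in self_questioning(20):
        reward = self_attributing(task, agent_attempt(task, skill))
        if reward > 0:
            replay.append(task)
    # "Training": successes stored in replay nudge the toy skill upward.
    skill = min(0.95, 0.3 + 0.02 * len(replay))
```

The real system replaces each stand-in with an LLM component (task synthesis, hybrid trajectory reuse, step-level attribution), but the closed loop, where the agent supplies its own curriculum and its own reward signal, has this shape.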

  3. vafipas663/Qwen-Edit-2509-Upscale-LoRA ( link )

    Qwen-Edit-2509-Upscale-LoRA is a LoRA adapter for Qwen-Image-Edit-2509 focused on realistic photography. Trained on UltraHR-100K and Unsplash-lite, it repairs extreme low resolution, oversharpening, strong JPEG artifacts, motion blur, pixelation, and heavy noise—often up to 16×—while preserving composition and structure, making it a practical replacement for many “magic” commercial upscalers.

🗞️ AIGC News

Today’s AIGC News

  • Gemini 3 Pro model card leak – A leaked model card outlines Google’s next-gen sparse MoE multimodal model with 1M context, 64K outputs, RL training, detailed agent benchmarks, safety evals, and a Jan 2025 knowledge cutoff.

  • Grok 4.1 announced – xAI unveils Grok 4.1, a new iteration of the Grok family with upgraded capabilities and overall performance.

  • Replicate × Cloudflare, amid outage – Replicate is joining Cloudflare just as Cloudflare suffers a major service outage impacting large parts of its platform.

  1. The model card for Gemini 3 Pro has reportedly leaked, revealing a model with extremely strong capabilities

    The leaked model card describes Google’s next-generation sparse MoE multimodal model with a 1M-token context window and 64K-token outputs, trained on large-scale web, code, and media data with RL. It details agentic performance, deployment channels, benchmarks, safety evaluations, frontier-safety status, remaining risks, and a January 2025 knowledge cutoff.

  2. xAI has announced Grok 4.1, a new iteration of its Grok model family with upgraded capabilities and performance. ( link )

  3. Replicate is joining Cloudflare, at the same time that Cloudflare has been experiencing a significant service outage affecting its platform. ( link )

Always fresh, always live

New models, papers, and projects as they drop — stay ahead of the AI curve.


For deeper insights and long-form analysis, subscribe to our weekly briefings at newsletter.aigc.news.

That’s it for today.


Keep building, keep thinking for yourself — we’ll be here tracking the next wave.


The aigc.news Team
