2025-11-18

In today’s issue:

  • AIGC Papers – π*0.6 brings real-world experience into home robots, “Back to Basics” rethinks diffusion as true denoising, and Depth Anything 3 offers a cleaner geometric backbone for future world-models.

  • AIGC Projects – Depth Anything 3 lands in ComfyUI, AgentEvolver lets LLM agents train themselves, and a new Qwen upscaling LoRA targets real-world photography.

  • AIGC News – Gemini 3 Pro’s powerful model card leaks, xAI rolls out Grok 4.1, and Replicate joins Cloudflare right as a major outage hits the platform.

📄 AIGC Papers

Today’s AIGC Papers

  • π*0.6 shows how a VLA can truly learn from experience, turning home robots into durable, all-day performers through demonstrations, corrections, and self-practice.

  • Back to Basics pulls diffusion models back to real denoising, using a pure image-space Transformer to achieve high-fidelity generation with a simpler formulation.

  • Depth Anything 3 reconstructs consistent 3D geometry from any view using a single Transformer, giving future world-models a more stable and cleaner geometric backbone.

  1. π*0.6: A VLA that Learns from Experience ( blog | paper )

    Physical Intelligence presents π*0.6, a VLA trained with advantage-conditioned RL on demonstrations, corrections, and the robot’s own experience. The system doubles success rates and throughput on real household tasks (making coffee, folding clothes, packing objects) and sustains long, continuous real-world operation. The paper also provides a model card for π0.6.
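The core recipe in the summary above can be caricatured in a few lines: score each episode against a value baseline, then label it with its advantage so the policy can be conditioned on (and steered toward) above-baseline behavior at deployment. This is a toy numpy sketch, not Physical Intelligence’s implementation; the function name, the mean baseline, and the binary labels are all illustrative stand-ins.

```python
import numpy as np

def advantage_labels(returns, baseline=None):
    """Label each episode by whether it beat the value baseline.

    Advantage-conditioned training feeds this label to the policy as an
    extra input; at deployment the policy is conditioned on the 'good'
    label so it reproduces only its better-than-baseline experience.
    """
    returns = np.asarray(returns, dtype=float)
    if baseline is None:
        baseline = returns.mean()      # stand-in for a learned value function
    advantages = returns - baseline
    return (advantages > 0).astype(int), advantages

# Episodes from demos, corrections, and autonomous practice (toy returns).
episode_returns = [0.2, 0.9, 0.5, 1.0, 0.1]
labels, adv = advantage_labels(episode_returns)
print(labels)   # 1 marks the above-baseline episodes the policy should emulate
```

The point of conditioning (rather than simply filtering out bad episodes) is that the policy still learns from all data, while the advantage input lets you select the high-quality mode at inference time.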

  2. Kaiming He Just Introduced JiT! Back to Basics: Let Denoising Generative Models Denoise ( paper | code )

    MIT’s Tianhong Li and Kaiming He propose predicting the clean image instead of noise, using a large patch-based ViT (“Just-image Transformers”) in pixel space. The method needs no tokenizer or pretraining yet achieves competitive high-resolution ImageNet generation, offering a simpler and clearer theoretical view of diffusion.
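The change of target is easy to see on a toy example. Under the usual forward process x_t = α·x0 + σ·ε, a noise predictor implies a clean-image estimate via x0 = (x_t − σ·ε)/α, while x-prediction regresses x0 directly in pixel space. The sketch below is a minimal illustration of that relationship on a 1-D “image”, not the paper’s training code; the (α, σ) values and the stand-in network output are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward diffusion on a toy 1-D "image": x_t = alpha * x0 + sigma * eps.
x0 = rng.standard_normal(16)
eps = rng.standard_normal(16)
alpha, sigma = 0.8, 0.6              # one arbitrary point on a noise schedule

x_t = alpha * x0 + sigma * eps

def x0_from_eps(x_t, eps_hat, alpha, sigma):
    """Clean-image estimate implied by a noise (epsilon) predictor."""
    return (x_t - sigma * eps_hat) / alpha

# With the true noise, the epsilon parameterization recovers the clean image,
# so the two targets carry the same information, just rescaled.
assert np.allclose(x0_from_eps(x_t, eps, alpha, sigma), x0)

# x-prediction training is then plain pixel-space regression to x0:
x0_hat = x0 + 0.01 * rng.standard_normal(16)   # stand-in network output
loss = np.mean((x0_hat - x0) ** 2)
```

The parameterizations are algebraically interchangeable, but which one the network outputs changes the loss weighting across noise levels, which is where the paper’s “let denoisers denoise” argument lives.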

  3. Depth Anything 3: Recovering the Visual Space from Any Views ( webpage | paper | code )

    ByteDance introduces Depth Anything 3, using a single Transformer and depth-ray representation to recover consistent geometry from single images, multi-view inputs, or videos. It surpasses VGGT in camera-pose and geometry accuracy, and outperforms DA2 in monocular depth, enabling high-fidelity 3DGS reconstruction.
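To see why a depth-plus-rays representation yields geometry directly, consider the familiar pinhole special case: each pixel defines a camera ray, and scaling that ray by the predicted depth gives a 3D point. This sketch assumes simple pinhole intrinsics and is not DA3’s exact depth-ray parameterization; the function name and toy intrinsics are illustrative.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map to camera-frame 3D points via pinhole rays.

    Pixel (u, v) defines the ray ((u - cx)/fx, (v - cy)/fy, 1); scaling it
    by the predicted depth gives a 3D point, so per-view depth that is
    consistent across views directly yields consistent geometry.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)    # (h, w, 3) point map

# Toy 4x4 constant-depth map with intrinsics centered on the image.
depth = np.full((4, 4), 2.0)
points = unproject_depth(depth, fx=4.0, fy=4.0, cx=1.5, cy=1.5)
```

Point maps like this are what downstream consumers such as 3DGS reconstruction ingest, which is why pose and depth accuracy translate directly into reconstruction fidelity.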

🛠️ AIGC Projects

Today’s AIGC Projects

  • ComfyUI-DepthAnythingV3 turns Depth Anything 3 into drop-in ComfyUI nodes, making DA3 usable in everyday image and video workflows.

  • AgentEvolver provides a self-evolving training loop where LLM agents generate their own tasks, rewards, and improvements.

  • Qwen-Edit-2509-Upscale-LoRA gives Qwen-Image-Edit a practical, detail-preserving photography upscaler for real-world enhancement.

  1. ComfyUI-DepthAnythingV3 ( link )

    ComfyUI-DepthAnythingV3 wraps Depth Anything 3 into ready-made ComfyUI nodes, letting users run DA3 on images and, with custom graphs, on multi-view / video inputs. It exposes DA3’s spatially consistent depth prediction in a visual workflow, so you can feed depth maps directly into ControlNet, 3DGS, or other geometry-aware pipelines.

  2. AgentEvolver: Towards Efficient Self-Evolving Agent System ( link )

    AgentEvolver is an end-to-end framework where LLM-based agents teach themselves via three loops: self-questioning (generating new tasks), self-navigating (reusing past trajectories with hybrid policies), and self-attributing (fine-grained credit assignment over states and actions). It cuts dataset engineering cost, improves exploration efficiency, and yields faster capability gains than traditional RL-style agent training.
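The three loops can be sketched as one toy self-training cycle: generate tasks, attempt them, assign credit, and bank successful trajectories for reuse. Everything below is a deliberately simplified stand-in (an addition "agent" with a scalar skill, a replay list, a seeded RNG), not the AgentEvolver framework itself.

```python
import random

random.seed(0)

def self_questioning(n):
    """Generate new training tasks (here: toy addition problems)."""
    return [(random.randint(0, 9), random.randint(0, 9)) for _ in range(n)]

def agent_attempt(task, skill):
    """Toy agent: answers correctly with probability `skill`."""
    a, b = task
    return a + b if random.random() < skill else a

def self_attributing(task, answer):
    """Credit assignment reduced to a scalar: was the outcome correct?"""
    return 1.0 if answer == sum(task) else 0.0

replay = []      # self-navigating: reuse past successful trajectories
skill = 0.3
for round_ in range(5):
    for task in self_questioning(20):
        reward = self_attributing(task, agent_attempt(task, skill))
        if reward > 0:
            replay.append(task)
    # "Training": successes stored in replay nudge the toy skill upward.
    skill = min(0.95, 0.3 + 0.02 * len(replay))
```

The real system replaces each stand-in with an LLM component (task synthesis, hybrid trajectory reuse, step-level attribution), but the closed loop, where the agent supplies its own curriculum and its own reward signal, has this shape.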

  3. vafipas663/Qwen-Edit-2509-Upscale-LoRA ( link )

    Qwen-Edit-2509-Upscale-LoRA is a LoRA adapter for Qwen-Image-Edit-2509 focused on realistic photography. Trained on UltraHR-100K and Unsplash-lite, it repairs extreme low resolution, oversharpening, strong JPEG artifacts, motion blur, pixelation, and heavy noise—often up to 16×—while preserving composition and structure, making it a practical replacement for many “magic” commercial upscalers.

🗞️ AIGC News

Today’s AIGC News

  • Gemini 3 Pro model card leak – A leaked model card outlines Google’s next-gen sparse MoE multimodal model with 1M context, 64K outputs, RL training, detailed agent benchmarks, safety evals, and a Jan 2025 knowledge cutoff.

  • Grok 4.1 announced – xAI unveils Grok 4.1, a new iteration of the Grok family with upgraded capabilities and overall performance.

  • Replicate × Cloudflare, amid outage – Replicate is joining Cloudflare just as Cloudflare suffers a major service outage impacting large parts of its platform.

  1. The model card for Gemini 3 Pro has reportedly leaked, revealing a model with extremely strong capabilities

    The leaked model card describes Google’s next-generation sparse MoE multimodal model with a 1M-token context window and 64K-token outputs, trained on large-scale web, code, and media data with RL. It details agentic performance, deployment channels, benchmarks, safety evaluations, frontier-safety status, remaining risks, and a January 2025 knowledge cutoff.

  2. xAI has announced Grok 4.1, a new iteration of its Grok model family with upgraded capabilities and performance. ( link )

  3. Replicate is joining Cloudflare, at the same time that Cloudflare has been experiencing a significant service outage affecting its platform. ( link )

Always fresh, always live

New models, papers, and projects as they drop — stay ahead of the AI curve.


For deeper insights and long-form analysis, subscribe to our weekly briefings at newsletter.aigc.news.

That’s it for today.


Keep building, keep thinking for yourself — we’ll be here tracking the next wave.


The aigc.news Team
