Python Hub Weekly Digest for 2025-06-01

This week in Python, the popular topics included a guide on building a modern data lakehouse architecture with open-source tools, a tutorial on using Ruff, a Python linter and formatter, and an introduction to NLWeb, a tool for building conversational interfaces for websites. There was also an exploration of a highly efficient leap year check algorithm and a discussion on AlphaEvolve, an autonomous coding agent that uses evolutionary strategies to improve algorithms. In addition, there were insights into the challenges of managing database connections in high-traffic environments and the introduction of template strings in Python 3.14. Wishing you a good week and happy coding!

💖 Most Popular

Turning Data into Insight
The article demonstrates how to build a flexible, modern data lakehouse architecture using open-source tools like MinIO, Apache Iceberg, Airflow, dbt, Spark, Pandera, and Superset. By integrating these technologies with Docker for easy deployment, it shows how to orchestrate robust data pipelines, ensure data quality, and enable scalable analytics from raw ingestion to interactive dashboards.

Ruff - A Fast Linter & Formatter to Replace Multiple Tools and Improve Code Quality
This video is a hands-on tutorial showing how to use Ruff, a super-fast Python linter and formatter written in Rust that consolidates tools like Flake8, Black, and isort into a single, efficient solution. The guide covers installing Ruff, running it from the command line, configuring it for projects, and integrating it with VS Code to improve code quality and developer workflow.

nlweb
Building conversational interfaces for websites is hard. NLWeb seeks to make it easy for websites to do this. And since NLWeb natively speaks MCP, the same natural language APIs can be used both by humans and agents.

A leap year check in three instructions
The article explores how to check if a year is a leap year using just three CPU instructions, leveraging clever bit manipulation and "magic numbers" to optimize the standard algorithm. By reverse-engineering and brute-forcing constants, the author demonstrates a branchless, highly efficient leap year check for years up to 102,499, illustrating both the mathematical tricks and practical l...
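
The multiply, mask, and compare shape described above can be sketched in Python. This is a minimal illustration, not the article's own code: the three constants are the ones commonly quoted for the 32-bit version of the trick and are an assumption here, since this digest does not state them, and the function names are hypothetical.

```python
# Constants commonly quoted for the 32-bit version of this trick.
# Assumption: they do not appear in this digest; verify against the article.
MULTIPLIER = 1073750999  # 0x400023D7
MASK = 3221352463        # 0xC001F00F
THRESHOLD = 126976       # 0x0001F000


def is_leap_year_fast(year: int) -> bool:
    """Branchless leap year check: one multiply, one AND, one compare."""
    # MASK fits in 32 bits, so the AND also discards the high bits that a
    # 32-bit unsigned multiply would drop on real hardware.
    return ((year * MULTIPLIER) & MASK) <= THRESHOLD


def is_leap_year_reference(year: int) -> bool:
    """Textbook Gregorian rule, used only to cross-check the fast version."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    for y in (1900, 2000, 2023, 2024):
        print(y, is_leap_year_fast(y), is_leap_year_reference(y))
```

In Python the arbitrary-precision integers hide the 32-bit overflow the trick relies on, so the mask does double duty; in C the same expression would use an unsigned 32-bit multiply.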

Juvio
UV kernel for Jupyter.


📖 Articles

Python in LibreOffice (LibrePythonista Extension)

AlphaEvolve: A coding agent for scientific and algorithmic discovery
AlphaEvolve is an autonomous coding agent that uses evolutionary strategies to improve algorithms by iteratively modifying code and learning from evaluator feedback. It has achieved breakthroughs in data center scheduling, hardware design, and mathematical discovery—including surpassing Strassen’s 4×4 matrix multiplication algorithm for the first time in 56 years.

Unravelling t-strings
PEP 750 introduced t-strings for Python 3.14. In fact, they are so new that as of Python 3.14.0b1 there isn't any documentation for t-strings yet. As such, this blog post will hopefully help explain what exactly t-strings are and what you might use them for by unravelling the syntax and briefly talking about potential uses for t-strings.

I don't like NumPy
The author, once a fan of NumPy, now criticizes its complexity and opacity when working with high-dimensional arrays, arguing that common operations often become unreadable and error-prone due to confusing broadcasting, indexing, and function conventions. While NumPy excels at simple cases, the post contends that its design choices—especially around implicit broadcasting and lack of expl...

Beyond Query Optimization
Lyft engineers detail how they improved the scalability and reliability of their Aurora Postgres databases by implementing connection pooling with SQLAlchemy and Amazon RDS Proxy. The article explains the challenges of managing database connections in high-traffic environments and describes how these solutions reduced connection limits, improved application stability, and optimized resou...

Machine Learning Prototyping with DuckDB and scikit-learn
In this post, we prototype a machine learning workflow using DuckDB for data handling and scikit-learn for modeling.

Web Apps for Python Devs with Auto-Generated UI

Python Tooling at Scale: LlamaIndex’s Monorepo Overhaul

Template Strings in Python 3.14: Structured Interpolation
Python 3.14 introduces template strings (t""), which return structured Template objects instead of plain strings, enabling full inspection and control of interpolated expressions. This allows safer, customizable rendering for use cases like shell commands, HTML output, logging, and config generation—offering a powerful alternative to f-strings when you need pre-render control.

Dagster - Data Orchestration and Pipelines with Python & DAGs
This video is a practical introduction to using Dagster for Python-based data orchestration, covering core concepts like assets, definitions, scheduling, and the Dagster UI. Through hands-on examples—including building a pipeline with Polars and DuckDB—the tutorial demonstrates how to define, manage, and automate complex data workflows in modern data engineering.

LangGraph Complete Course for Beginners – Complex AI Agents with Python
This video course introduces LangGraph, a Python library for building advanced conversational AI workflows using a graph-based approach. It guides viewers through designing, implementing, and managing scalable dialogue systems, covering both theoretical concepts and hands-on coding exercises.

Free-Threaded Python Library Compatibility Checker

Python Hub Weekly Digest for 2025-05-25

Mutmut – Python Mutation Tester


⚙️ Projects

ii-agent
A new open-source framework to build and deploy intelligent agents.

Datatune
Perform transformations on your data with natural language using LLMs

Flowfile
Flowfile is a visual ETL tool combining drag-and-drop workflows with the speed of Polars dataframes. Build and analyze data pipelines without code. Perfect for analysts and engineers needing fast, intuitive data processing. Designed to run locally or deploy to production environments.

muscle-mem
A cache for AI agents to learn and replay complex behaviors.

sre-bot
A Google Agent Development Kit (ADK) powered assistant designed to help Site Reliability Engineers (SREs) with operational tasks and monitoring, particularly focused on Kubernetes interactions.

workflow-use
Create and run workflows (RPA 2.0).

OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

LiveSplat
Live Gaussian Splatting for RGBD Camera Streams.

Voice_Extractor
Extract voice segments of a target speaker from podcasts - Useful for creating speech datasets.

pyfuze
pyfuze makes your Python project run anywhere.


👾 Reddits

Ruff users, what rules are using and what are you ignoring?

Modern Python Boilerplate - good package basic structure

How to become a data scientist in 2025 ?

Do you really use redis-py seriously?



Project by Ruslan Keba. Since 2012. Powered by Python. Made in 🇺🇦 Ukraine.