Vadim Orlov

Refresh Strategies in DataForge

Discover the power of DataForge Cloud's refresh patterns to streamline your data pipelines. In this video, you'll learn about six key refresh methods: full refresh for initial dataset ingestion, append-only for incremental data updates, and advanced options like timestamp, sequence, and custom patterns for handling time-series data or unique scenarios. Watch as we demonstrate configurations, simulate dataset changes, and explore features like watermarks for tracking updates, historical data preservation, and atomic processing. Whether managing small datasets or complex time-series data, DataForge Cloud empowers you to optimize data transformations with precision and flexibility.
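To make the watermark idea concrete, here is a minimal sketch of a timestamp-style refresh in plain Python. It is not DataForge Cloud's implementation or API: the function name `timestamp_refresh`, the `id` and `updated_at` fields, and the in-memory `target` dictionary are all hypothetical, chosen only to illustrate how a watermark might filter out rows that were already processed.

```python
from datetime import datetime

def timestamp_refresh(target, batch, watermark, key="id", ts="updated_at"):
    """Merge rows newer than the watermark into target; return the new watermark.

    Hypothetical illustration only, not DataForge's actual logic.
    """
    # Keep only rows whose timestamp is strictly newer than the last watermark.
    new_rows = [r for r in batch if watermark is None or r[ts] > watermark]
    # Upsert each new row by key, so the latest version of a record wins.
    for row in new_rows:
        target[row[key]] = row
    # Advance the watermark to the newest timestamp seen in this batch.
    if new_rows:
        watermark = max(r[ts] for r in new_rows)
    return watermark

target = {}
first = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "value": "a"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "value": "b"},
]
watermark = timestamp_refresh(target, first, None)

second = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "value": "stale"},  # skipped
    {"id": 2, "updated_at": datetime(2024, 1, 3), "value": "c"},      # applied
]
watermark = timestamp_refresh(target, second, watermark)
```

In this sketch the first call behaves like a full load because no watermark exists yet, while the second call only applies the row that changed after the stored watermark.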

Vadim Orlov

Data Transformation at Scale: Rule Templates & Cloning

Vadim Orlov, CTO of DataForge, tackles common data transformation challenges like repetitive coding and platform complexity in this video. He introduces DataForge Cloud’s rule templates and cloning features to streamline data management through a DRY (Don’t Repeat Yourself) approach.

Vadim walks through setting up data connections, creating reusable rule templates across datasets, and calculating metrics like sale prices and totals. He then demonstrates configuring an output table for reporting and, when the company adds a subsidiary, shows how the cloning feature replicates configurations for new platforms effortlessly.

This demonstration reveals how DataForge Cloud’s tools save time and centralize code management, enabling efficient, scalable, and reusable data engineering without constant rewrites.

Vadim Orlov

Mastering Schema Evolution & Type Safety with DataForge

Schema changes are a common cause of pipeline failures. DataForge addresses this by focusing on type safety and schema evolution.

Type safety ensures reliable transformations through compile-time validation, preventing unexpected errors. Schema evolution automates handling of changes like new columns, data type updates, and nested structures.

With DataForge’s configurable strategies, such as upcasting and cloning, pipelines adapt smoothly to schema changes, reducing manual effort and improving reliability.
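As a rough illustration of what an upcasting strategy could look like, the sketch below merges an incoming schema into a current one and widens a column's type when the two disagree. The type names, the widening order, and the functions `upcast` and `evolve_schema` are assumptions made for this example; they do not describe DataForge's actual rules or configuration options.

```python
# Assumed widening order, narrowest to widest (illustrative only).
WIDENING_ORDER = ["int", "long", "double", "string"]

def upcast(existing, incoming):
    """Return the wider of two types according to WIDENING_ORDER."""
    return max(existing, incoming, key=WIDENING_ORDER.index)

def evolve_schema(current, incoming):
    """Merge an incoming schema into the current one.

    New columns are added; conflicting types are widened, never narrowed.
    """
    evolved = dict(current)
    for column, dtype in incoming.items():
        if column in evolved:
            evolved[column] = upcast(evolved[column], dtype)
        else:
            evolved[column] = dtype
    return evolved

current = {"id": "int", "price": "int"}
incoming = {"id": "long", "price": "double", "note": "string"}
evolved = evolve_schema(current, incoming)
```

Here `evolved` keeps every existing column, picks up the new `note` column, and widens `id` and `price` rather than failing on the type mismatch.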

Vadim Orlov

Sub-Sources: Simplifying Complex Data Structures with DataForge

In DataForge Cloud 8.1, we introduced Sub-Sources, simplifying the handling of nested complex arrays (NCAs) like ARRAY<STRUCT<..>>. This feature allows you to use standard SQL syntax on NCAs without needing to normalize or modify the underlying data. Sub-Sources act as "virtual" tables, enabling easy transformations while preserving the original structure. This innovation saves time and effort for data engineers working with complex, semi-structured data.
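The "virtual table" idea can be pictured with a small Python sketch that exposes each element of an array-of-structs column as its own row, without changing the parent records. This is only a conceptual analogy: the function `sub_source_rows` and the `order_id`, `items`, `sku`, and `qty` names are hypothetical and say nothing about how Sub-Sources are implemented in DataForge Cloud.

```python
def sub_source_rows(rows, array_column, parent_key):
    """Yield one flat row per struct in array_column, tagged with the parent key.

    The parent rows are only read, never modified.
    """
    for row in rows:
        # Treat a missing or null array the same as an empty one.
        for element in row.get(array_column) or []:
            yield {parent_key: row[parent_key], **element}

orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": []},
]
line_items = list(sub_source_rows(orders, "items", "order_id"))
```

The resulting `line_items` list can then be filtered or aggregated like an ordinary table, while `orders` retains its original nested structure.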

Vadim Orlov

DataForge vs. Databricks Delta Live Tables for Change Data Capture

Check out our latest video where Vadim Orlov, CTO of DataForge, compares automating Change Data Capture (CDC) in DataForge Cloud versus Databricks Delta Live Tables. Discover how DataForge simplifies CDC processes, saving time and effort with automation, and watch a live demo showcasing its efficiency in real-world use cases.
