Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
About me
This is a page not in the main menu
Published:
Building on the groundwork laid in the previous blog post, which introduced the core concepts of Parquet metadata, this entry delves deeper into metadata information and explores the nuances of logical types. Join me as we unravel the inner workings of Parquet metadata and uncover its impact on data management and analytics.
Published:
Working with Parquet daily, I find it fascinating to trace the path from data residing in memory to its storage as byte arrays in HDFS block storage or in object stores like ADLS or S3, accessed through a block-storage-style wrapper interface such as ABFS or S3A. To grasp Parquet’s serialization process, it’s important to explore how data is stored on disk and the resulting impact on performance and cost. Let’s embark on a detailed exploration of Parquet’s serialization, tracing its journey from the rudimentary realms of JSON and XML to its definition with Thrift.
Published:
Parquet is one of the most important and impactful formats in recent data engineering history, so this blog tries to understand how Parquet works at a very basic level. The Parquet format was largely influenced by the Dremel paper, as mentioned in its motivation statement. This blog post is designed to walk you through the key points of the paper using language that’s more approachable. It can be particularly useful if you’ve already read the paper and are looking for clarification on certain parts, or if you simply prefer the conversational tone of a blog over the formal language of an academic paper. However, I want to emphasize that the original paper is quite straightforward, and I recommend reviewing it either before or after reading this post.
Published in International Conference on Computer Science, Industrial Electronics (ICCSIE), 2018
This paper is about developing a method for autonomous tagging of Stack Overflow questions.
Download here