Vipul Pathak

September 14, 2025

Why GPU Powers the Future of Artificial Intelligence

As artificial intelligence reshapes how we interact with data, one architectural shift stands out: the move from CPU-centric computing to GPU-dominant infrastructure. But why did this happen? And what makes GPUs so uniquely suited for AI workloads? Let’s unpack the evolution, the math, and the mechanics behind this transformation.
Read more →
April 25, 2021

Google Certified Professional Cloud Architect

Verify
Read more →
February 2, 2020

VPC Peering ….. Quick Facts

How do you share resources that live in two separate VPCs? With VPC peering. Connecting the two VPCs makes it possible for each to use the other’s resources.
Read more →
January 18, 2020

Global v/s Region Level Resources

Resources in a cloud environment exist at different levels of scope. Cloud providers operate from multiple geographic regions; many resources are accessible globally, but some have a much smaller scope, such as an Availability Zone (AZ).
Read more →
January 13, 2020

Glacier ….. 101

Glacier is a low-cost cold-data storage service used for data archival. It exposes a REST interface. “Archive” and “Vault” are the two main components of the Glacier service.
Read more →
August 20, 2019

AWS Certified Solutions Architect

Glad to add another credential: the AWS Certified Solutions Architect badge!
Read more →
May 24, 2019

Network Address Translation in AWS Environment

In AWS, Network Address Translation can be done either with a dedicated EC2 instance (a NAT instance) or with an Amazon-provisioned managed component (a NAT gateway). So which one should you use?
Read more →
January 25, 2019

S3 Cross Region Replication ….. 101

Cross Region Replication (CRR) is a feature of S3 that can be activated at the bucket level by adding a replication configuration to the source bucket. By default, a bucket’s automatic replication is confined to its own Region; it is not global. With CRR, automatic data replication can be set up across Regions.
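As a rough illustration, a replication configuration of the kind mentioned above might look like the Python dictionary below, shaped like the `ReplicationConfiguration` argument that boto3's `put_bucket_replication` accepts. The bucket names, account ID, and role ARN are placeholders and do not come from the original post. Both buckets are assumed to have versioning enabled, which S3 requires for replication.

```python
import json

# Hypothetical placeholders: substitute real values before use.
SOURCE_BUCKET = "example-source-bucket"
DEST_BUCKET_ARN = "arn:aws:s3:::example-destination-bucket"
REPLICATION_ROLE_ARN = "arn:aws:iam::123456789012:role/example-replication-role"


def build_replication_config(role_arn: str, dest_bucket_arn: str) -> dict:
    """Return a minimal one-rule replication configuration."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: the rule applies to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": dest_bucket_arn},
            }
        ],
    }


config = build_replication_config(REPLICATION_ROLE_ARN, DEST_BUCKET_ARN)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this dictionary could be
# applied to the source bucket roughly as follows (not executed here):
#   boto3.client("s3").put_bucket_replication(
#       Bucket=SOURCE_BUCKET, ReplicationConfiguration=config)
```

The configuration is attached to the source bucket only; the destination bucket, which lives in another Region, is simply named in the rule.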
Read more →
December 30, 2018

AWS Simple Storage Service From 10,000 Feet

The Simple Storage Service (S3) is a secure, durable, highly scalable, and highly available object storage service from AWS. Object storage means we can store files but cannot install software on it. S3 provides an easy-to-use web service interface to store and retrieve any amount of data from anywhere.
Read more →
December 16, 2018

Amazon S3 Security: In a Nutshell

S3 security is divided into four categories: Privacy, Log Trail, Access Permissions, and Encryption.
Read more →
October 9, 2018

Amazon VPC: Deep Dive into Network ACLs

Access Control Lists play an instrumental role in network security. In the AWS world, Network ACLs (NACLs) are referred to as “Security at the Gate”, since their rules are applied at the subnet level.
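To make the subnet-level rule idea concrete, here is a minimal Python sketch of how a NACL decides on a packet: rules are checked in ascending rule-number order, the first match wins, and anything unmatched is denied. The `NaclRule` class, the `evaluate` function, and the sample rules are hypothetical and simplified (single ports, inbound only); this is not AWS code.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str     # source CIDR block the rule applies to
    port: int     # single destination port, for simplicity
    allow: bool   # True = ALLOW, False = DENY


def evaluate(rules, source_ip: str, port: int) -> bool:
    """Return True if the packet is allowed, False if it is denied."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if addr in ipaddress.ip_network(rule.cidr) and port == rule.port:
            return rule.allow  # first matching rule wins
    return False  # the implicit final rule denies anything unmatched


# Hypothetical inbound rules for one subnet.
inbound = [
    NaclRule(100, "203.0.113.0/24", 22, allow=False),  # block SSH from one range
    NaclRule(200, "0.0.0.0/0", 22, allow=True),        # allow SSH from elsewhere
    NaclRule(300, "0.0.0.0/0", 443, allow=True),       # allow HTTPS
]

print(evaluate(inbound, "203.0.113.10", 22))   # False: denied by rule 100
print(evaluate(inbound, "198.51.100.7", 22))   # True: allowed by rule 200
print(evaluate(inbound, "198.51.100.7", 80))   # False: no match, implicit deny
```

Because NACLs are stateless, return traffic would need its own matching outbound rule; the sketch models only the inbound direction.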
Read more →
September 24, 2018

AWS Virtual Private Cloud: Security Groups ….. Quick Facts

A Security Group (SG) is a firewall that controls traffic at the NIC level of a virtual server (an EC2 instance running virtually on physical hardware).
Read more →
September 16, 2018

Check List for Troubleshooting SSH connectivity to an EC2 Instance when Default Configuration is not in place

Trying to log in to an EC2 instance over the Internet but unable to connect? Use this checklist to verify that each piece of configuration is in place.
Read more →
September 8, 2018

AWS Virtual Private Cloud: Components, Routing and IPs ….. Quick Facts

When an organization starts working with AWS, it creates a private cloud: a set of IT resources for the organization’s own use, over which it has full control. This post outlines some basic facts about the AWS private cloud.
Read more →
August 14, 2018

AWS Identity and Access Management: In a Nutshell

IAM allows admins to manage users and their level of access to AWS resources. It gives centralized control over permissions. IAM can act on Users and Groups, can create or use Policy documents, and can define Roles.
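For a sense of what a policy document looks like, the sketch below builds a small read-only S3 policy as a Python dictionary and prints it as JSON. The bucket name and the `Sid` labels are made up for illustration; the `Version`, `Statement`, `Effect`, `Action`, and `Resource` keys follow the standard IAM policy grammar.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-bucket"

read_only_policy = {
    "Version": "2012-10-17",  # current IAM policy language version
    "Statement": [
        {
            "Sid": "AllowListingTheBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",  # the bucket itself
        },
        {
            "Sid": "AllowReadingObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",  # objects inside the bucket
        },
    ],
}

policy_json = json.dumps(read_only_policy, indent=2)
print(policy_json)
```

A document like this can be attached to a User, a Group, or a Role; anything not explicitly allowed remains denied by default.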
Read more →
April 16, 2018

What are these- Shards and Replicas in ElasticSearch?

Many entities make up a functioning Elasticsearch cluster. Nodes, primary shards, and replicas are the ones most frequently referred to in documentation and articles.
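The arithmetic relating primary shards and replicas can be sketched in a few lines of Python. The `shard_layout` helper and the sample settings are hypothetical; the setting names `number_of_shards` and `number_of_replicas` are the real Elasticsearch index settings, where the replica count is per primary shard.

```python
def shard_layout(primaries: int, replicas: int) -> dict:
    """Summarise shard counts for one index.

    `replicas` is the number of replica copies per primary shard, matching
    the meaning of the `number_of_replicas` index setting.
    """
    if primaries < 1 or replicas < 0:
        raise ValueError("need at least one primary and no negative replicas")
    replica_shards = primaries * replicas
    return {
        "primary_shards": primaries,
        "replica_shards": replica_shards,
        "total_shards": primaries + replica_shards,
        # Copies of the same shard are never placed on the same node,
        # so each copy of a shard needs a node of its own.
        "min_nodes_for_all_copies": replicas + 1,
    }


# Hypothetical index settings.
settings = {"number_of_shards": 5, "number_of_replicas": 1}
layout = shard_layout(settings["number_of_shards"], settings["number_of_replicas"])
print(layout)
```

With five primaries and one replica each, the index holds ten shards in total and needs at least two nodes before every replica can be assigned.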
Read more →
April 14, 2018

MapR Certified HBase Developer

Another certification bagged … 🙂 MapR’s certification exams are very conceptual and test knowledge deeply. This exam covers a balance of conceptual and API-level questions, and it is by no means low-hanging fruit. I used the book HBase: The Definitive Guide extensively to study the architecture and internals of HBase. Certain YouTube video…
Read more →
March 30, 2018

MapR Certified Spark Developer

Happy to share that I have earned the MCSD certification 🙂 Verification URL: https://verify.skilljar.com/c/5if4ohmy68dr Certifications solidify conceptual knowledge, and Spark carries a lot of concepts. The Learning Spark book and certain YouTube videos are a must for building that knowledge. YouTube videos by Sameer Farooque and Brian Clapper are sources. These Certification exams…
Read more →
September 28, 2015

Ingesting Log4J Application Logs Into HDFS Using Flume In Realtime

Log file analysis is a popular use case in the Big Data world. Log files contain evidence of the historical events that an application witnessed in its execution environment. Monitoring applications try to find traces of the actual events that happened during program execution. Several analysis use cases are possible, from simply counting occurrences of some event to more specific processing.
Read more →
August 16, 2015

Creating An ElasticSearch Index With Schema and Analyzers

Analysis and Analyzers: Elasticsearch breaks (tokenizes) the data in a document and builds an index of words (tokens). For each token, the index points to the documents that match the token. The words (tokens) are transformed in a particular manner before being stored in the index. This process of breaking the document in a set of words…
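The tokenize-then-index idea in this excerpt can be sketched in plain Python. This is a toy model, not Elasticsearch's implementation: the `analyze` function (split on non-alphanumerics, then lowercase) and the sample documents are assumptions chosen only to show how tokens map back to documents.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list:
    """Toy analyzer: split on non-alphanumerics, then lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs: dict) -> dict:
    """Map each token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical documents keyed by id.
docs = {
    1: "The quick brown fox",
    2: "The lazy dog",
    3: "Quick thinking, lazy Sunday",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 3]
print(index["lazy"])   # [2, 3]
print(index["the"])    # [1, 2]
```

Because the tokens are lowercased before they are stored, a search for "quick" finds both "quick" and "Quick"; a query has to pass through the same analysis for the lookup to match.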
Read more →
August 2, 2015

EMP and DEPT Tables For Use With Hive and Pig

EMP and DEPT tables are pretty popular among Oracle users. These tables are very handy for quickly trying new queries. Also, there exists a DUAL table in Oracle that is pretty useful for evaluating expressions, like “Select (SYSDATE + 1/24) as OneHourFromNow FROM DUAL”. These tables don’t exist in Hive, but we can create…
Read more →
July 24, 2015

Cloudera Certified Developer for Apache Hadoop (CCDH)

Happy to share the earning of CCDH certification 🙂 Verification URL: http://certification.cloudera.com/verify (with License # 100-013-285). Loads of conceptual as well as programming questions, including multiple-choice questions. Reading Hadoop: The Definitive Guide and Programming Hive a couple of times, and rigorously practicing the Map-Reduce programming model, was instrumental in clearing the exams. Setting up…
Read more →
July 18, 2015

Writing a Generic UDF in Hive

There are a few types of UDFs that we can write in Hive: functions that act on each column value passed to them, e.g. Select Length(name) From Customer; specific functions written for a specific data type (simple UDFs); generic functions written to work with more than one data type; functions that act on a group…
Read more →
June 25, 2015

Loading Data in Pig Using HCatalog

HCatalog is an extension of Hive and, in a nutshell, it exposes the schema information in the Hive Metastore so that applications outside of Hive can use it. The objective of HCatalog is to hold the following types of information about the data in HDFS: location of the data; metadata about the data (e.g.…
Read more →
June 13, 2015

Writing Simple UDF in Hive

There are a few types of UDFs that we can write in Hive: functions that act on each column value passed to them, e.g. Select Length(name) From Customer; specific functions written for a specific data type; generic functions written to work with more than one data type (GenericUDF); functions that act on a group…
Read more →
June 8, 2015

Loading an HDFS Data File in Pig

emp = LOAD '/path/to/data/file/on/hdfc/Employees.txt' [ USING PigStorage(' ') ] AS ( emp_id: INT, name: CHARARRAY, joining_date: DATETIME, department: INT, salary: FLOAT, mgr_id: INT, residence: BAG { b:(addr1: CHARARRAY, addr2: CHARARRAY, city: CHARARRAY) }) ; The Alias for data in file “Employees.txt” is emp and using emp,…
Read more →
June 6, 2015

Writing EvalFunc UDF in Pig

UDFs (User Defined Functions) are a way to extend Pig’s functionality. There are two types of UDFs that we can write in Pig: Evaluate functions (which extend the EvalFunc base class) and Load/Store functions (which extend the LoadFunc and StoreFunc base classes). Here we will develop an Evaluate UDF step by step. Let’s start by conceptualizing a UDF (named VowelCount)…
Read more →
May 31, 2015

Filtering and Limiting Data in Pig

-- emp = LOAD 'Employees.txt' … Data in text file resembles the “EMP” table in Oracle -- dept = LOAD 'Dept.txt' …….. Data in text file resembles the “DEPT” table in Oracle -- Filter data in emp to only those whose job is Clerk. Filtered_Emp = FILTER emp BY (job == 'CLERK'); -- Supports…
Read more →
May 30, 2015

Pig Data Types

Simple: INT and FLOAT are 32-bit signed numeric datatypes backed by java.lang.Integer and java.lang.Float
Simple: LONG and DOUBLE are 64-bit signed numeric Java datatypes
Simple: CHARARRAY (Unicode, backed by java.lang.String)
Simple: BYTEARRAY (bytes / blob, backed by Pig’s DataByteArray class that wraps byte[])
Simple: BOOLEAN (“true” or “false”, case insensitive)
Simple: DATETIME…
Read more →
May 28, 2015

Sample Pig Statements Demonstrating Many Functions

Pig Statements -- Load command loads the data -- Placeholders like “A_Rel” and “Filter_A” are called aliases, and they are useful -- in holding the relation returned by pig statements. Aliases are relations (not variables). A_Rel = LOAD '/hdfs/path/to/file' [AS (col_1[: type], col_2[: type], col_3[: type], …)] ; -- Record set returned by…
Read more →
May 27, 2015

Apache Pig at a High Level

Pig is a high-level data flow language developed at Yahoo. Pig programs are translated into lower-level instructions supported by the underlying execution engine. Pig is designed for performing complex operations quickly.
Read more →
May 20, 2015

Default Values Set in Job class in Map/Reduce v2

MapReduce default settings
Read more →
May 17, 2015

Apache Sqoop in a Nutshell

Sqoop is a utility that can be used to transfer data between SQL based relational data stores tO/from hadoOP. The main operations this utility carries out are importing data into Hadoop from supported relational data sources and exporting data back to them. Sqoop uses connectors as extensions to connect to data stores.…
Read more →
May 14, 2015

Apache Hive From 5000 Feet

Apache Hive is an abstraction on top of HDFS data that allows querying the data using a familiar SQL-like language called HiveQL (Hive Query Language). Hive was developed at Facebook to allow data analysts to query data using an SQL-like language. Hive has limited commands and is similar to basic SQL (advance SQL options…
Read more →
July 26, 2013

Difference Between Software Architecture And Software Design

Architecture is a branch of design. So every architectural work is a kind of design, but the reverse is not true.
Read more →
July 20, 2013

What is Software Architecture – Some Thoughts

Casually speaking, software architecture is the careful partitioning or subdivision of a system as a whole into sub-parts, with specific relationships between these sub-parts. This partitioning is what allows different people (or group(s) of people) to work together cooperatively to solve a much bigger problem than any of them could solve by individually dealing with…
Read more →
May 30, 2010

Making It appear on BlackBerry Handheld that a Self-Created Message is Received from a Different Sender (Hacking the FROM Field)

This article explains how to create an Email and make it appear to the user that a Self-Created Message is Received from a Different Sender.
Read more →
May 14, 2010

Status Returned By BlackBerry API During "Data Connection Refused" Conditions

Read more →
May 8, 2010

Setting READ/WRITE Permissions for the ISAPI Extension Hosted on IIS

This article explains how to set READ/WRITE permissions for an ISAPI Extension or a script which is hosted on an IIS server.
Read more →
May 8, 2010

Installing an ISAPI Extension as a Wildcard Handler

Read more →
August 20, 2009

Hello World!

Greetings!! Wikipedia is a great source of free knowledge for nearly all of us. Let’s start by supporting Wikipedia: There is also a great source of software engineering audio podcasts, which you can find at http://www.se-radio.net. If you like the podcast, you may want to support SE-Radio too: Will meet you soon here,…
Read more →
