Technical – Vipul Pathak

May 24, 2019

Network Address Translation in AWS Environment

In the AWS world, Network Address Translation can be done either with a dedicated EC2 instance (a NAT instance) or with an Amazon-provisioned managed component (a NAT gateway). So which one should users choose?
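
Whichever option is chosen, instances in a private subnet reach the Internet through their subnet's route table: traffic for the VPC's own CIDR stays on the `local` route, and the default route 0.0.0.0/0 points at the NAT device. AWS route tables pick the most specific (longest-prefix) matching route. The Python sketch below models that lookup; the VPC CIDR 10.0.0.0/16 and the target ID are hypothetical placeholders, not values from the post.

```python
import ipaddress

# Hypothetical route table for a private subnet in a VPC using 10.0.0.0/16.
# The target ID is a made-up placeholder, not a real resource.
PRIVATE_ROUTES = [
    ("10.0.0.0/16", "local"),                # traffic that stays inside the VPC
    ("0.0.0.0/0", "nat-0123456789abcdef0"),  # everything else goes to the NAT device
]


def lookup_target(routes, destination):
    """Return the target of the most specific route containing destination, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix (largest prefixlen) wins.
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


print(lookup_target(PRIVATE_ROUTES, "10.0.3.25"))      # local
print(lookup_target(PRIVATE_ROUTES, "93.184.216.34"))  # nat-0123456789abcdef0
```

The NAT device itself sits in a public subnet whose route table sends 0.0.0.0/0 to an Internet gateway; the sketch only models the private side.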

Read more →

January 25, 2019

S3 Cross Region Replication ….. 101

Cross Region Replication (CRR) is an S3 feature that is enabled at the bucket level by adding a replication configuration to the source bucket. By default, a bucket's automatic replication stays within its Region; it is not global. With CRR, data can be replicated automatically across Regions.
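
As a rough illustration of what "adding a replication configuration" involves, here is a Python sketch that builds a minimal configuration in the older, Prefix-based rule form accepted by S3's PutBucketReplication API. The account ID, role name and bucket names are hypothetical. CRR also requires versioning to be enabled on both the source and destination buckets.

```python
import json


def build_replication_config(role_arn, destination_bucket, prefix=""):
    """Build a minimal S3 replication configuration (older Prefix-based rule form).

    role_arn: IAM role that S3 assumes to replicate objects on your behalf.
    destination_bucket: name of the bucket in the other Region.
    prefix: only keys starting with this prefix are replicated ("" means all keys).
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-to-" + destination_bucket,
                "Prefix": prefix,
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::" + destination_bucket},
            }
        ],
    }


config = build_replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",  # hypothetical role
    "example-destination-bucket",                                # hypothetical bucket
)
print(json.dumps(config, indent=2))
# With boto3, this dict would be passed as:
# s3.put_bucket_replication(Bucket="example-source-bucket", ReplicationConfiguration=config)
```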

Read more →

October 9, 2018

Amazon VPC: Deep Dive into Network ACLs

Access Control Lists play an instrumental role in network security. In the AWS world, Network ACLs (NACLs) are referred to as "Security at the Gate", since their rules are applied at the subnet level.
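
NACL rules are evaluated in ascending order of rule number, the first matching rule decides whether traffic is allowed or denied, and traffic that matches no numbered rule hits the catch-all `*` rule, which denies it. NACLs are also stateless, so return traffic needs its own rule. A simplified Python sketch of inbound evaluation follows; the rules and addresses are hypothetical, and the `NaclRule` type is an illustration, not an AWS API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int     # rule number; lower numbers are evaluated first
    protocol: str   # "tcp", "udp", or "-1" for all protocols (simplified)
    port_from: int
    port_to: int
    cidr: str
    action: str     # "allow" or "deny"


def evaluate(rules, protocol, port, source_ip):
    """Return the action of the first matching rule; deny if nothing matches."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol != "-1":
            if rule.protocol != protocol or not (rule.port_from <= port <= rule.port_to):
                continue
        if src in ipaddress.ip_network(rule.cidr):
            return rule.action  # first match wins; later rules are never consulted
    return "deny"  # the implicit catch-all "*" rule


inbound = [
    NaclRule(200, "tcp", 22, 22, "0.0.0.0/0", "allow"),
    NaclRule(100, "tcp", 22, 22, "203.0.113.0/24", "deny"),
]
print(evaluate(inbound, "tcp", 22, "203.0.113.7"))   # deny (rule 100 matches first)
print(evaluate(inbound, "tcp", 22, "198.51.100.9"))  # allow (rule 200)
print(evaluate(inbound, "tcp", 80, "198.51.100.9"))  # deny (falls through to "*")
```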

Read more →

September 24, 2018

AWS Virtual Private Cloud: Security Groups ….. Quick Facts

A Security Group (SG) is a firewall that controls traffic at the network interface (NIC) level of a virtual server (an EC2 instance running virtually on physical hardware).
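
Unlike a NACL, a security group supports only allow rules, evaluates all of its rules before deciding, and is stateful: if inbound traffic is allowed, the response is allowed out automatically. The class below is a simplified Python model of that behavior (inbound rules and connection tracking only); the class, method names and addresses are illustrative assumptions, not an AWS API.

```python
import ipaddress


class SecurityGroup:
    """Simplified stateful firewall: allow rules only, inbound direction."""

    def __init__(self, inbound_rules):
        # Each rule is (protocol, port, source_cidr). There are no deny rules.
        self.inbound_rules = inbound_rules
        self.tracked = set()  # inbound flows whose replies may leave automatically

    def allow_inbound(self, protocol, port, source_ip, source_port):
        src = ipaddress.ip_address(source_ip)
        # Every rule is considered; any single match allows the traffic.
        allowed = any(
            proto == protocol and rule_port == port and src in ipaddress.ip_network(cidr)
            for proto, rule_port, cidr in self.inbound_rules
        )
        if allowed:
            self.tracked.add((protocol, source_ip, source_port, port))
        return allowed

    def allow_response(self, protocol, dest_ip, dest_port, local_port):
        # Stateful: replies to tracked inbound flows are allowed without an outbound rule.
        return (protocol, dest_ip, dest_port, local_port) in self.tracked


sg = SecurityGroup([("tcp", 22, "198.51.100.0/24")])
print(sg.allow_inbound("tcp", 22, "198.51.100.10", 50000))   # True
print(sg.allow_response("tcp", "198.51.100.10", 50000, 22))  # True
print(sg.allow_inbound("tcp", 22, "203.0.113.5", 50001))     # False
```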

Read more →

September 16, 2018

Checklist for Troubleshooting SSH Connectivity to an EC2 Instance When the Default Configuration Is Not in Place

Trying to log in to an EC2 instance over the Internet and unable to connect? Use this checklist to verify that each piece of configuration is in place.
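
One quick first check is whether TCP port 22 on the instance's public IP or DNS name is reachable at all. The Python helper below is a hypothetical sketch for that single check, not part of the original checklist. A timeout usually points to something in the network path (security group, network ACL, route table, Internet gateway, or a missing public IP), while a successful connection suggests the problem lies with the key pair or the login user name.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures;
        # socket.timeout and socket.gaierror are both OSError subclasses.
        return False
```

Usage: `ssh_port_reachable("198.51.100.20")` with the instance's public address in place of this placeholder.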

Read more →

September 8, 2018

AWS Virtual Private Cloud: Components, Routing and IPs ….. Quick Facts

When an organization starts working with AWS, it creates a private cloud: one that contains IT resources for the organization's use and over which it has full control. This post outlines some basic facts about the AWS private cloud.
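
One IP fact that often surprises newcomers: AWS reserves five addresses in every VPC subnet (the network address, the next three addresses for the VPC router, DNS and future use, and the last address). A short Python sketch using the standard `ipaddress` module to compute what remains; the CIDR and the helper name are hypothetical examples.

```python
import ipaddress


def usable_subnet_ips(cidr):
    """Return (usable address count, reserved addresses) for a VPC subnet.

    AWS reserves the network address, the next three addresses
    (VPC router, DNS, future use) and the last address of each subnet.
    """
    net = ipaddress.ip_network(cidr)
    reserved = [net[0], net[1], net[2], net[3], net[-1]]
    return net.num_addresses - len(reserved), reserved


count, reserved = usable_subnet_ips("10.0.1.0/24")
print(count)                         # 251
print([str(ip) for ip in reserved])  # ['10.0.1.0', '10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.255']
```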

Read more →

August 14, 2018

AWS Identity and Access Management: In a Nutshell

IAM allows admins to manage users and their level of access to AWS resources, and it provides centralized management of permissions. IAM acts on users and groups, can create or use policy documents, and defines roles.
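
Policy documents are JSON. The sketch below shows one as a Python dict, plus a deliberately simplified evaluator that follows IAM's core rule: an explicit Deny overrides any Allow, and anything not explicitly allowed is implicitly denied. The bucket name is hypothetical, and the evaluator ignores Conditions, principals, NotAction/NotResource and IAM's case-insensitive matching of action names.

```python
import fnmatch

# A policy document in IAM's JSON format, written as a Python dict.
# "example-bucket" is a hypothetical placeholder.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"],
        },
        {
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example-bucket/secret/*",
        },
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policies, action, resource):
    """Simplified evaluation: explicit Deny wins, else any Allow, else implicit deny."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            action_match = any(fnmatch.fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
            resource_match = any(fnmatch.fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
            if action_match and resource_match:
                if stmt["Effect"] == "Deny":
                    return False  # an explicit Deny cannot be overridden
                allowed = True
    return allowed


print(is_allowed([POLICY], "s3:GetObject", "arn:aws:s3:::example-bucket/report.csv"))      # True
print(is_allowed([POLICY], "s3:GetObject", "arn:aws:s3:::example-bucket/secret/key.txt"))  # False
print(is_allowed([POLICY], "s3:PutObject", "arn:aws:s3:::example-bucket/report.csv"))      # False
```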

Read more →

March 30, 2018

MapR Certified Spark Developer

Happy to share that I earned the MCSD certification 🙂 Verification URL: https://verify.skilljar.com/c/5if4ohmy68dr
Certifications solidify conceptual knowledge, and Spark carries a lot of concepts. The book Learning Spark, as well as certain YouTube videos, is a must for building that knowledge; the videos by Sameer Farooqui and Brian Clapper are good sources. These certification exams…

Read more →

August 16, 2015

Creating An ElasticSearch Index With Schema and Analyzers

Analysis and Analyzers: Elasticsearch breaks (tokenizes) the data in a document and builds an index of words (tokens). Each token points to the documents that match it. Tokens are transformed in a particular manner before being stored in the index. This process of breaking the document into a set of words…
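
The tokenize-then-index idea can be sketched in a few lines of Python. The `analyze` function here is a toy stand-in (split on non-word characters, then lowercase); it is an assumption for illustration and not how Elasticsearch's built-in analyzers are implemented.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on non-word characters and lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(documents):
    """Map every token to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


docs = {1: "The quick brown Fox", 2: "A lazy fox sleeps", 3: "Quick thinking"}
index = build_inverted_index(docs)
print(index["fox"])    # [1, 2]
print(index["quick"])  # [1, 3]
```

Because the same transformation (here, lowercasing) is applied when indexing, "Fox" and "fox" end up under a single token.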

Read more →

August 2, 2015

EMP and DEPT Tables For Use With Hive and Pig

EMP and DEPT tables are pretty popular among Oracle users; they are very handy for quickly trying out new queries. Oracle also has a DUAL table that is useful for evaluating expressions, like “Select (SYSDATE + 1/24) as OneHourFromNow FROM DUAL“. These tables don't exist in Hive, but we can create…
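
Oracle date arithmetic counts in days, so `SYSDATE + 1/24` adds one twenty-fourth of a day, i.e. one hour. The same arithmetic in Python, with a hypothetical helper `add_days` and a fixed timestamp standing in for SYSDATE:

```python
from datetime import datetime, timedelta


def add_days(base, days):
    """Add a (possibly fractional) number of days, as Oracle DATE arithmetic does."""
    return base + timedelta(days=days)


now = datetime(2015, 8, 2, 10, 30)                    # stands in for SYSDATE
print(add_days(now, 1 / 24))                          # 2015-08-02 11:30:00
print(timedelta(days=1 / 24) == timedelta(hours=1))  # True
```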

Read more →

July 24, 2015

Cloudera Certified Developer for Apache Hadoop (CCDH)

Happy to share that I earned the CCDH certification 🙂 Verification URL: http://certification.cloudera.com/verify (with License # 100-013-285). The exam has loads of conceptual as well as programming questions, including multiple-choice questions. Reading Hadoop: The Definitive Guide and Programming Hive a couple of times, and rigorously practicing the MapReduce programming model, were instrumental in clearing the exams. Setting up…

Read more →

July 18, 2015

Writing a Generic UDF in Hive

There are a few types of UDFs that we can write in Hive:
- Functions that act on each column value passed to them, e.g. Select Length(name) From Customer
- Specific functions written for a specific data type (simple UDFs)
- Generic functions written to work with more than one data type
- Functions that act on a group…

Read more →

June 25, 2015

Loading Data in Pig Using HCatalog

HCatalog is an extension of Hive. In a nutshell, it exposes the schema information in the Hive Metastore so that applications outside of Hive can use it. The objective of HCatalog is to hold the following types of information about data in HDFS:
- Location of the data
- Metadata about the data (e.g. …

Read more →

June 13, 2015

Writing Simple UDF in Hive

There are a few types of UDFs that we can write in Hive:
- Functions that act on each column value passed to them, e.g. Select Length(name) From Customer
- Specific functions written for a specific data type
- Generic functions written to work with more than one data type (GenericUDF)
- Functions that act on a group…

Read more →

May 30, 2015

Pig Data Types

- Simple: INT and FLOAT are 32-bit signed numeric data types backed by java.lang.Integer and java.lang.Float
- Simple: LONG and DOUBLE are 64-bit signed numeric Java data types
- Simple: CHARARRAY (Unicode, backed by java.lang.String)
- Simple: BYTEARRAY (bytes / blob, backed by Pig’s DataByteArray class that wraps byte[])
- Simple: BOOLEAN (“true” or “false”, case sensitive)
- Simple: DATETIME…
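
Because INT and LONG are 32-bit and 64-bit signed integers (two's complement, like the Java classes backing them), their ranges follow directly from the bit width. A small Python sketch; the helper `smallest_pig_integral_type` is hypothetical and not part of Pig.

```python
# Bit widths of Pig's signed integral types, as listed above.
PIG_INTEGRAL_BITS = {"int": 32, "long": 64}


def signed_range(bits):
    """Smallest and largest value of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_pig_integral_type(value):
    """Pick INT if the value fits in 32 bits, else LONG (hypothetical helper)."""
    for pig_type in ("int", "long"):
        low, high = signed_range(PIG_INTEGRAL_BITS[pig_type])
        if low <= value <= high:
            return pig_type
    raise OverflowError(f"{value} does not fit in a Pig LONG")


print(signed_range(32))                           # (-2147483648, 2147483647)
print(smallest_pig_integral_type(3_000_000_000))  # long
```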

Read more →