How to make your current job more fulfilling

Are you losing interest in your current job? Do you feel that changing it might bring more happiness? Before you take stronger steps, what if you re-evaluate your situation? Is it really the job you dislike, or is it something within the job that needs adjustment? Let’s review some steps to make our current job more meaningful:

  • Know the why behind the job you’re in: Understand why you’re in this job. Is it your financial responsibility that brings you to work every day? Why did you choose this line of work? Does it relate to your interests?
  • Is there anything lacking in the current job? Evaluate what could increase your engagement at work. Which growth perspective would you like to focus on? Is it a higher salary? Is it a greater impact on the people or business you support? Is it the technology/skills you use at work? Here’s a quick trick to evaluate it. Would you prefer to continue doing the work you do, if:
    • You’re offered 2 times the current salary or
    • You’re offered a promotion or
    • You’re offered to increase your business impact or
    • You’re offered to work with more people or
    • You’re offered to switch to use different technology/skills
  • Evaluating these options helps to increase engagement. If we find that we should move on to a better option, we should plan for it. Sometimes, we find that we’re already in a great environment. For example, a friend of mine was not engaged at work. After this analysis, he realized that, given two times the salary, he’d continue working at the same place with the same set of people, supporting the same business. So, there was nothing wrong with his current job except what he felt was low pay. He decided to explore whether he was really underpaid. The reality was that he was reasonably paid, and his comparison was somewhat unreasonable. Sometimes, it’s our comparisons, judgements, and other such inner enemies that cause us distress. Sometimes we simply feel that we’re not doing enough or that we’re not happy where we are. After this realization, he’s now focusing on his current job with a desire to increase his impact. Another person in a similar situation found out that he could get a better salary outside and planned to move to a better-paying job.
  • What’s the minimum required from you at work: Make a list of the minimum expectations from you at work. How does knowing the minimum expectations help? Many times, we’re overwhelmed and lost with the many demands at work. Sometimes, it becomes confusing to navigate the day successfully. Making a list of minimum required tasks gives us clarity of expectations. For example, a software developer could write their minimum required tasks as:
    • Prime tasks, like driving projects A, B, and C. Write down the minimum required expectation for each project.
    • Supporting tasks: updates on progress every week, month, and quarter. What does each update involve? How much time does it require every week?
    • Training: are there any mandatory trainings that you have to attend?
  • After making a list of minimum required tasks, plan your time at work to address these tasks first. In my experience, ambiguous situations and confusion take up a lot of time. Without clarity on the minimum required tasks, we may be wasting time, and after a while we start feeling overwhelmed by the many things left undone.
  • Look at the bigger picture of your current job. Let’s look at three areas to evaluate it:
    • People: how are your relationships with people at work? Do you feel connected with them? Can you make friends at work to share your life and listen to their life situations? Connecting with people at work can make the work more interesting. Also, knowing that you’re not alone in life’s common challenges can help you find mentor support. For example, a friend found that his colleague also has an interest in writing. What if he and his colleague could find moments in the day to share their writing?
    • Business: what business does your job support? What’s the bigger picture of it? For example, are you in an IT development job? What social cause does your company support? If it’s a public company, what sector does its stock belong to? How does your work impact society? When we relate our work to a larger cause, it can help bring engagement.
    • Technology/Skills: What skills do you need to perform your current job? What are your areas of expertise? How valuable are those skills in the outside market? Is there any skill others would like to learn from you? If you’re interested in mentoring, would you like to mentor someone at work? Is there any particular skill you’d like to learn more of at work?
  • Plan for periodic leave. Sometimes, we feel that we have no choice other than going to the same work every week. Taking leave helps us reflect upon life. If it’s feasible, plan for a longer break, like a week. If you’re free without any work tasks, what do you feel like doing? For example, one of my friends likes learning about technologies even outside of work; he’s naturally inclined toward them. By taking leave, he realized that even if he didn’t need a job for his financial needs, he’d still like to write programs in his free time. He realized that it’s not the type of work he wants to change. Sometimes, he’s overwhelmed at work, so the solution is to figure out how he can prioritize his work and set the right expectations.
  • Plan for continuous evaluation: Plan a periodic evaluation to assess where you are. If you’re interested in something else, keep pivoting from the current situation. For example, after a decade of experience in one technical skill set, if you’re interested in moving to a new skill set, start planning for a steady move.

I’d love to hear your feedback on it. Thank you.

Common Design Patterns Overview

In this article, we’ll go through an overview of basic design patterns.

Circuit Breaker: This pattern helps to manage calls from one service to another. It has three states:

  • Open: Calls from one service to another service are not allowed.
  • Closed: Calls from one service to another service are allowed.
  • Half-Open: A few calls from one service to another service are allowed but not all calls are allowed.

Two well-known implementations of circuit breakers are Hystrix and Resilience4j.
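
For illustration, here is a minimal sketch of wrapping a remote call with Resilience4j’s CircuitBreaker. The thresholds, the "backendService" name, and the remoteCall() method are assumptions chosen for the example, not part of the original text.

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
import java.util.function.Supplier;

public class CircuitBreakerExample {

    public static void main(String[] args) {
        // Open the circuit when 50% of recent calls fail; stay open for 10 seconds.
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(10))
                .build();
        CircuitBreaker circuitBreaker = CircuitBreaker.of("backendService", config);

        // Decorate the remote call; while the circuit is open, calls fail fast
        // instead of hitting the remote service.
        Supplier<String> decorated =
                CircuitBreaker.decorateSupplier(circuitBreaker, CircuitBreakerExample::remoteCall);
        System.out.println(decorated.get());
    }

    // Hypothetical remote call to another service.
    private static String remoteCall() {
        return "response from the other service";
    }
}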

Bulkhead: It allows setting the maximum number of concurrent calls that a service will accept, so one overloaded dependency does not exhaust resources for others.

Backpressure: We will add details of it later.

Bloom filters: a Bloom filter is a probabilistic data structure for quickly checking whether an element is in a data set; it can tell us that an element is definitely not in the set or that it may be in the set. Guava provides a well-known Java implementation of Bloom filters.
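
A minimal sketch using Guava’s BloomFilter follows; the expected insertion count and false-positive probability are illustrative values.

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomFilterExample {
    public static void main(String[] args) {
        // Expect about 1,000 entries with a ~1% false-positive probability.
        BloomFilter<String> emails = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000, 0.01);

        emails.put("alice@example.com");

        // true means "maybe present"; false always means "definitely not present".
        System.out.println(emails.mightContain("alice@example.com")); // true
        System.out.println(emails.mightContain("bob@example.com"));   // very likely false
    }
}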

HyperLogLog: it is a data structure that provides a probabilistic estimate of the cardinality (number of distinct elements) of a data set. Let’s say we want to understand how many unique visitors visited a mall; we can use a HyperLogLog data structure to estimate it efficiently.
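
As a rough sketch, the net.agkn hll library (used in the referenced Baeldung article) can estimate distinct counts. The constructor parameters, the Guava hashing step, and the visitor IDs below are assumptions for illustration.

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import net.agkn.hll.HLL;

public class HyperLogLogExample {
    public static void main(String[] args) {
        HashFunction hashFunction = Hashing.murmur3_128();
        HLL hll = new HLL(14, 5); // log2m = 14, register width = 5 (typical values)

        // Add each visitor ID as a 64-bit hash; duplicates do not increase the estimate.
        for (String visitorId : new String[] {"visitor-1", "visitor-2", "visitor-1"}) {
            hll.addRaw(hashFunction.hashString(visitorId, StandardCharsets.UTF_8).asLong());
        }

        System.out.println("Estimated unique visitors: " + hll.cardinality()); // ~2
    }
}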

Gang of Four Design Patterns: These consist of three types of patterns: creational, structural, and behavioral. In total, there are 23 patterns.

References:

Bloom filters: https://richardstartin.github.io/posts/building-a-bloom-filter-from-scratch

HyperLogLog: https://www.baeldung.com/java-hyperloglog

Common Java libraries

Lombok API: This is a very helpful library that reduces boilerplate code. If we use Lombok, we don’t have to write getters, setters, constructors, equals, hashCode methods, and more.
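
For example, here is a sketch of a class annotated with Lombok; the Customer class and the chosen annotations are illustrative.

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// @Data generates getters, setters, equals, hashCode, and toString at compile time.
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Customer {
    private Long id;
    private String name;
    private String email;
}

Calling code can then use customer.getName(), customer.setEmail(...), and so on without those methods appearing in the source.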

Resilience4j API: It is a fault-tolerance library designed for functional programming. One example use case is the rate limiter functionality, which limits the maximum number of requests served by an API in a defined time period (a brief sketch follows the list below). Other examples are:

  • Concurrency control using the bulkhead module
  • Fault tolerance using the retry module
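
A minimal rate-limiter sketch with Resilience4j; the limits and the "myApi" name are illustrative assumptions.

import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import java.time.Duration;
import java.util.function.Supplier;

public class RateLimiterExample {
    public static void main(String[] args) {
        // Allow at most 10 calls per second; callers wait up to 500 ms for a permit.
        RateLimiterConfig config = RateLimiterConfig.custom()
                .limitForPeriod(10)
                .limitRefreshPeriod(Duration.ofSeconds(1))
                .timeoutDuration(Duration.ofMillis(500))
                .build();
        RateLimiter rateLimiter = RateLimiter.of("myApi", config);

        // Calls beyond the limit are rejected instead of overloading the API.
        Supplier<String> limited = RateLimiter.decorateSupplier(rateLimiter, () -> "handled request");
        System.out.println(limited.get());
    }
}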

Hystrix API: Hystrix can help make a service fault tolerant and resilient.

Javatuples: It’s an API that allows us to work with tuples. A tuple is a sequence of objects that may be of different, unrelated types. For example, a tuple may contain an integer, a string, and an object.
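
A quick sketch with javatuples; the values are chosen only for illustration.

import org.javatuples.Pair;
import org.javatuples.Triplet;

public class TupleExample {
    public static void main(String[] args) {
        // A pair of two values of different types.
        Pair<String, Integer> score = Pair.with("alice", 42);
        System.out.println(score.getValue0() + " scored " + score.getValue1());

        // A triplet adds a third value; tuples are immutable.
        Triplet<String, Integer, Boolean> record = Triplet.with("bob", 7, true);
        System.out.println(record);
    }
}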

Javassist, CGLib, and ASM: These are APIs to manipulate Java bytecode.

P6Spy: It is a library that allows logging of database operations in real time.
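
A rough sketch of routing JDBC traffic through P6Spy so that SQL statements are logged. The in-memory H2 URL is a placeholder, and P6Spy also expects its jar (and optionally a spy.properties file) on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class P6SpyExample {
    public static void main(String[] args) throws Exception {
        // Wrap the real JDBC URL with the p6spy prefix so P6Spy can intercept and log each statement.
        String url = "jdbc:p6spy:h2:mem:testdb";
        try (Connection connection = DriverManager.getConnection(url, "sa", "");
             Statement statement = connection.createStatement()) {
            statement.execute("CREATE TABLE person (id INT, name VARCHAR(50))");
            statement.execute("INSERT INTO person VALUES (1, 'Alice')"); // logged by P6Spy
        }
    }
}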

Java Transaction Management: A transaction is a series of actions that must either all be completed or all be rolled back. Java provides multiple ways to control transactions, including JDBC transactions, JPA, JMS, global transactions, the Java Transaction API (JTA), the Java Transaction Service (JTS), and other related mechanisms.
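
As a simple illustration of transaction control with plain JDBC, here is a sketch; the in-memory H2 URL and the account table are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcTransactionExample {
    public static void main(String[] args) throws SQLException {
        try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:bank", "sa", "")) {
            connection.setAutoCommit(false); // start a transaction
            try (PreparedStatement debit = connection.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = connection.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, 100); debit.setInt(2, 1); debit.executeUpdate();
                credit.setInt(1, 100); credit.setInt(2, 2); credit.executeUpdate();
                connection.commit();   // both updates succeed together
            } catch (SQLException e) {
                connection.rollback(); // or neither is applied
                throw e;
            }
        }
    }
}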

Strength app part 5: Enable https on AWS

This is part 5 of the application development series. Refer to part 4 for the previous information. For our strength application, we wanted to enable an https certificate. As this is for learning purposes, we wanted to keep the cost low.

Here were our options:

  • Enable the AWS-provided https option.
  • Get a free https certificate via the letsencrypt website and enable it on AWS. For our Spring Boot application, we needed to generate a keystore.p12 file. We decided to opt for option 2: get a free https certificate via the letsencrypt website.

Our next challenge was to make the generated certificate accessible to the Spring Boot application in a way that is scalable and does not go away if we terminate an EC2 instance in the ECS cluster. Here were our options:

  • Manually copy the https certificate to the EC2 instance. We did not opt for this option because, if we terminate the EC2 instance (attached to the ECS cluster), the https certificate is deleted along with it.
  • Keep the certificate in Amazon S3 and copy it to the EC2 instance manually. We did not opt for this option because every time we need to recreate an EC2 instance, we would have to copy the certificate manually again.
  • When creating an EC2 instance within the ECS cluster, add commands in the user data option to copy the certificate from AWS S3. We think this is the optimal option, but we couldn’t enable it: the free-tier, ECS-enabled EC2 instance did not let us add user data properly. To get user data running on the EC2 instance, we had to adjust the EC2/ECS agent configuration, and those configurations were either not easily available or too complicated within the free tier. So, we did not opt for this option.
  • Add the https certificate within the Spring Boot application via an S3 copy using an SSL configuration. This could have been a reasonable option. Within the Spring Boot code, we can add an SSL configuration bean that copies the certificate from AWS S3 and recreates the certificate file within the Spring Boot application. Below is sample code to do it:

import java.io.File;

import org.apache.catalina.Context;
import org.apache.catalina.connector.Connector;
import org.apache.tomcat.util.descriptor.web.SecurityCollection;
import org.apache.tomcat.util.descriptor.web.SecurityConstraint;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.servlet.server.ServletWebServerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

//@Configuration
public class SslConfiguration {

    @Bean
    public ServletWebServerFactory servletContainer() {
        // Force all requests to use a confidential (HTTPS) transport.
        TomcatServletWebServerFactory tomcat = new TomcatServletWebServerFactory() {
            @Override
            protected void postProcessContext(Context context) {
                SecurityConstraint securityConstraint = new SecurityConstraint();
                securityConstraint.setUserConstraint("CONFIDENTIAL");
                SecurityCollection collection = new SecurityCollection();
                collection.addPattern("/*");
                securityConstraint.addCollection(collection);
                context.addConstraint(securityConstraint);
            }
        };
        tomcat.addAdditionalTomcatConnectors(redirectConnector());
        return tomcat;
    }

    private Connector redirectConnector() {
        // Additional HTTPS connector that reads the PKCS12 keystore generated from the certificate.
        Connector connector = new Connector("org.apache.coyote.http11.Http11NioProtocol");
        connector.setPort(8443);
        connector.setSecure(true);
        connector.setScheme("https");
        connector.setAttribute("keyAlias", "tomcat");
        connector.setAttribute("keystorePass", "<hidden>");
        connector.setAttribute("keyStoreType", "PKCS12");

        File file = new File("");// ADD PATH to the keystore.p12 file
        String absoluteKeystoreFile = file.getAbsolutePath();

        connector.setAttribute("keystoreFile", absoluteKeystoreFile);
        connector.setAttribute("clientAuth", "false");
        connector.setAttribute("sslProtocol", "TLS");
        connector.setAttribute("SSLEnabled", true);
        return connector;
    }
}


  • Add the https certificate within the Spring Boot application via S3 using a properties file. For this option, we read the https certificate location from application.properties, copy the file from S3, and write it to the path that the server.ssl.key-store property points to. Below is sample code to do it:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.util.IOUtils;

private static void copySSLCertificateFromS3() {
    try {
        // Read the S3 and keystore locations from application.properties.
        Properties props = readPropertiesFile("src/main/resources/application.properties");
        String clientRegion = props.getProperty("clientRegion");
        String bucketName = props.getProperty("bucketName");
        String sslFileNameWithPath = props.getProperty("sslFileNameWithPath");
        String keyStoreFileName = props.getProperty("server.ssl.key-store");

        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withRegion(clientRegion)
                .withCredentials(new ProfileCredentialsProvider())
                .build();

        // Download the certificate object and write it to the local keystore path.
        S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, sslFileNameWithPath));
        File file = new File(keyStoreFileName);
        try (InputStream objectData = object.getObjectContent();
             OutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(objectData, outputStream);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            // handle exception here
        } catch (IOException e) {
            e.printStackTrace();
            // handle exception here
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static Properties readPropertiesFile(String fileName) throws IOException {
    Properties prop = new Properties();
    // try-with-resources closes the stream even if load() fails.
    try (FileInputStream fis = new FileInputStream(fileName)) {
        prop.load(fis);
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    return prop;
}

  • Use a Docker container to copy the https certificate from S3 to the EC2 instance. Every time we have a new EC2 instance, we can copy the https certificate file from S3 to the EC2 instance using a very lightweight Docker container task. So far, this seems to be the best possible approach within the free-tier EC2 instance of the ECS cluster. We’re exploring this option.

If anyone has a suggestion for a better approach, feel free to share it in the comments.

Strengths app part 4

This article is part of the application development series, in which we provide details of creating the strength application. In part 3, we discussed REST and other backend APIs for the application. In this part, we will discuss the user interface details of the web application.

  • User login functionality: We have finalized a basic login page to authenticate a user. If a user is not authenticated and attempts to view the home page of the application, we redirect the user back to the user login page.
  • Home page: Home page provides these details:
    • User information: Name of the logged in user.
    • Number of total votes: Total votes for the user.
    • Strengths details: We show strengths of the user in a tabular form. Each row of the table has these details:
      • Strength title
      • Total votes on the strength
      • Created by
      • Buttons to view strength details, update a strength, and delete a strength

As we add more features to the application, we will update this page. Stay tuned for the updates and new articles on it.

Interview questions on leadership skills

Audience: anyone conducting or preparing for a technical leadership interview.

What’s the best way to work with executives?

To work with executives, I prefer to follow the formal process per the company’s standards. Executives’ time is generally limited, and they prefer brief, to-the-point communication. Before reaching out to executives, it is important to understand whether the topic is worth their time. If it is, setting expectations is important. I determine whether the topic is meant to provide information, ask for feedback, or ask for a decision. If it is about asking for a decision, it’s helpful to prepare options. If help is needed from an executive, it should be clear what kind of help is needed. Executives prefer clear communication and expectations, and I plan for it accordingly.

Is consensus always a good thing?

Consensus is not always a good thing; whether to seek it depends on the situation. In my experience, there was a product that was working in a stable way, but the company needed something completely innovative. To create such an environment, a leader challenged the existing setup. He did not have consensus to proceed, but he did the right thing. In another situation, consensus helped when I had to make a go/no-go decision on the success or failure of a User Acceptance Testing cycle. Consensus worked because only a few people were opposed, and all of their open issues could either be deferred or worked around.

What is the best way to work with customers and users?

The best way to work with customers or users is to understand their perspective and requirements. I believe the main goal of a product is to meet a customer’s or a user’s needs.

What kinds of people do you like to work with?

I like working with people who can collaborate well toward a common goal. This requires keeping one’s individual approach secondary and thinking and planning for a common objective. It requires individuals to look beyond their individual achievements and focus on teamwork. It is important to understand everyone’s perspective, and the final decision must be the one needed for the success of the common goal.

What kind of people do you have a hard time working with?

I have a hard time working with people who do not think about the success of the common goal and are instead driven by other individual or team goals. To deal with such situations, I prefer to remind everyone about the common goal.

What would you do to get a team to stick to a schedule?

This depends on the type of task. I first provide high-level context on why we have a schedule and what we want to achieve as a team within it. Then, I prepare a plan with everyone’s collaboration. Once everyone agrees with the plan and understands why we want to stick to the schedule, I suggest someone schedule recurring meetings to check progress, blockers, and next steps. In the middle of the schedule, I prefer to remind everyone why the schedule is important, why we all agreed to it, and what the positive consequences of sticking to it are.

What’s the difference between leadership and management?

Leadership is about influencing, inspiring, and enabling others to make a positive impact. Management is about controlling groups or entities to complete a goal on time.

As I learn more, I will update this page. Thank you !

Behavioral interviews questions

Who should read this article: Anyone appearing in or conducting an interview for a developer, engineering manager, project/program manager, product manager, or similar role.

Disclaimer: The answers below are based on my experience; I may be wrong about some of them, or you may have a different opinion or answer. It’s advisable to come up with your own answers.

Tell me about a time when you faced a challenge and overcame it.

Nugget: Sure. I will describe a challenge of integrating with a system that provides a flag for who is eligible to be contacted and who is not.

Situation: We had a migration project for marketing campaigns, to reach out to customers and get their feedback on products. We were using a central user identification system to know who was eligible to be contacted, but I was not getting the answer I needed from the group that owned it.

Result: After more than five attempts to convince this group, repeatedly asking for the same information in different ways, they were finally convinced, and we set up the integration in a way that provided us the information.

Tell me about how you interact with customers or clients?

In my various roles, this has been done in different ways. In my current role, I interact with my internal customers as my business partners. We discuss customer experiences, new changes to the business, and how they can impact the infrastructure or business area I own.

Talk about how you overcame product failures/challenges or poor feedback.

There was a time when our product was not meeting the customers’ needs. We met with the group of customers periodically and explained our limitations. Unfortunately, these failures increased day by day to the point that they were not manageable. Then, collectively, we took an innovative approach and came up with the next suite of products, which overcame the failures for a longer time.

Tell me about a time when you had to influence a team.

Nugget: Sure, let me tell you about a time when I had to convince a team to increase the scope of testing in a project.

Situation: For the go-live of customer-facing campaign emails, the engineering team was in a hurry to launch the campaign. They were under pressure because an application was reaching its end of life. I still had to convince them to increase testing coverage, as it was a customer-facing application.

Action: I listened to their situation and proposal. In this case, I had to disagree with the proposed testing coverage. I convinced them with past data from when a problem occurred due to a lack of testing. I shared a past mistake of mine when customers experienced an issue. I also told them that I would have agreed to their proposal if it were an internal release, BUT we represent the company and can’t put customers at risk.

Result: Unfortunately, there were tough deadlines due to the legacy end of life. The team agreed to increase testing coverage up to 80% of what I suggested, and I compromised by reducing my ask a little. But it was a good enough testing plan that gave both teams a win/win feeling.

Tell me about a time when you have made a mistake.

I will describe a situation in a project where I underestimated the level of effort for a development task. This added a burden on me to finish the development within the given timeline. When I came up with the estimate, I didn’t know it was a large-effort task; I estimated it as a medium one. To mitigate the delay to the go-live, I had to work extra hours to develop the component within the expected timeline. Ultimately, after working long hours, I managed to get through it and complete the task on time.

How would you handle it if two executives are asking you to prioritize two different features and you can plan for only one?

I will describe a prioritization exercise for this situation. I will go through the features requested by both executives and understand the level of effort to implement each and its business priority. After that, I will explain to both executives together how much we can accomplish and deliver with the current bandwidth. To do that, I will use the appropriate forum for such a prioritization. Then, after the discussions, I will decide on the outcome of selecting one feature and deliver it on time with the expected quality.

Tell me about a time you used data to make a decision.

I will describe a situation when we determined the priority of a fix based on the volume of issues. In a customer campaign, we wanted to fix an issue, but we had limited bandwidth. We analyzed the data to understand the criticality of the issue, and it turned out that the chance of the issue occurring was less than 0.5%. We therefore deprioritized this fix in favor of other higher-priority issues. So, this is how data helped us make the decision.

As I learn more, I will update this page. Thank you !

System Design Basics

Why and who should read this article: This article is for readers who are looking for a brief summary of system design concepts, as a reference for interviews, project needs, or general curiosity.

What are key features of distributed systems:

  • Scalability: Scalability helps to meet the increased demand.
    • Horizontal scaling: it is achieved by adding more servers to the existing infrastructure. Examples are MongoDB and Cassandra, which scale horizontally by adding more servers.
    • Vertical scaling: it is achieved by adding more resources (like CPU, RAM, etc.) to the same servers. An example is MySQL, which can be scaled by moving it from a smaller to a larger-capacity machine.
  • Reliability: If I can be confident that the system will always succeed in serving requests, I will call it a reliable system. For example, I rely on my email system to always be available; even if some of its services fail, it is still reliable.
  • Availability: Availability means a system is operational over a given time period. If a system has no downtime, it is 100% available. Note: A system may be available but not reliable. For example, my email is available to use, but what if it has a security issue? If it does, I will not rely on it to keep my emails secure.
  • Efficiency: Being efficient means the system does its work in the right way.
  • Effectiveness: Being effective means the system provides output within the accepted quality and quantity.
  • Manageability: A system is manageable or serviceable if it can be repaired within the defined criteria.

What is a Load Balancer and how it distributes the traffic:

  • A Load Balancer helps in managing the traffic to servers. If a server is busy or not responding, the load balancer can route the traffic to another server. There are many ways to place load balancers in an architecture: for example, in front of the web servers, between the web layer and the application servers, or in front of the databases.
  • Load Balancing algorithms: Load balancers have algorithm options to decide which server should serve a request (a minimal round-robin sketch follows this list). Below are some of the options:
    • Least connection method
    • Least response time
    • Least bandwidth method
    • Round robin
    • Weighted round robin
    • IP hash
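
As an illustration of the round-robin strategy, here is a minimal sketch; the server names are placeholders.

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinLoadBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinLoadBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Each call returns the next server in circular order.
    public String nextServer() {
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinLoadBalancer lb = new RoundRobinLoadBalancer(
                List.of("app-server-1", "app-server-2", "app-server-3"));
        for (int i = 0; i < 5; i++) {
            System.out.println("Routing request " + i + " to " + lb.nextServer());
        }
    }
}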

What are Redundancy and Replication concepts:

  • Redundancy is a way to achieve fault tolerance and avoid a single point of failure. In simpler terms, let’s say we have two web servers serving the traffic. What if we keep a third web server on standby? If one of the two web servers fails to serve the traffic, we can swap in the third web server for the failed one. This third web server is a redundant server. I recall a personal experience when one of two web servers failed; it was easy to bring in the standby/redundant server within an hour, and with planning the switch does not even have to take that long.
  • Replication is a way to ensure the redundant servers stay in sync. In the above example, what if a server fails but the redundant server is not up to date with the data/files on the main traffic-serving web servers? To avoid such a situation, it’s advisable to ensure the redundant server has the same information as the other servers. Another common example is database replication from a primary to a secondary database. In my experience, a server once failed but the redundant server was not ready, because the script responsible for the replication had failed. To prevent such replication failures, an automated periodic health and data check of replicated servers is important, and periodic dry runs of the failover process are recommended. As Benjamin Franklin said, “Failing to prepare is preparing to fail.”

What is Caching and how it works:

A cache is a way to store data in memory for a defined period, which helps access the data faster. One simple example is an HTTP session. In a web application architecture, we can keep commonly requested data in a cache. Other related information:

  • Cache helps in reducing latency and increasing throughput.
  • Content Delivery Network (CDN) systems serve static media files; generally, web applications use CDNs to store them. A CDN could use a lightweight HTTP server such as NGINX. A CDN is a network of servers that distributes content from the origin server to multiple locations by caching the content closest to users’ locations.
  • Cache invalidation: It’s important to plan for cache invalidation when the data is changed at the source (like a database). There are different ways to do it (a minimal write-through sketch follows this list):
    • Write-through cache: this technique suggests writing the data to the cache and to the database at the same time. This prevents data sync issues, but it increases the latency of write operations because each write happens twice.
    • Write-around cache: this technique suggests writing the data only to the storage (like a database) and not to the cache. The first read of that data will be slower because it has to be fetched from storage.
    • Write-back cache: this technique suggests writing the data only to the cache, not to the storage. After a time period or when certain conditions are met, the data is saved to the storage. The side effect is that data can be lost if the servers storing the cache crash before the write-back happens.
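
A minimal write-through cache sketch; the in-memory map standing in for the database is an illustrative stub.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>(); // stand-in for a real database

    // Write-through: every write goes to the database and the cache in the same operation.
    public void put(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }

    // Reads are served from the cache first, falling back to the database on a miss.
    public String get(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    public static void main(String[] args) {
        WriteThroughCache store = new WriteThroughCache();
        store.put("user:1", "Alice");
        System.out.println(store.get("user:1")); // served from the cache
    }
}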

What is data partitioning:

Definition: It is the process of splitting data into multiple smaller parts. After a certain point, it is better to scale horizontally by adding more machines than to keep scaling a single server vertically. Below are some ways to partition the data:

  • Horizontal partitioning: It is also known as data sharding. We put different rows into different tables or servers. It could be done based on a range, for example.
  • Vertical partitioning: In this type of partitioning, a data table is divided vertically by columns or features. For example, user data can be stored on one DB server and images on another DB server.
  • Directory-based partitioning: In this approach, the partitioning scheme is stored in a lookup service.

What are Partitioning criteria:

For more about partitioning, refer here. Below are brief notes:

  • Range-based partitioning: This approach assigns rows to partitions per a range. For example, I once partitioned a MySQL table by month so that database performance was optimized.
  • Key- or hash-based partitioning: this approach partitions the data using a hash code of a key field (a small sketch follows this list). Plain hash-based partitioning can be problematic to expand further in the future, so using consistent hashing is recommended.
  • List partitioning: This is similar to range. In this approach, we partition the data based on a list. For example, we can partition based on region or based on a language preference.
  • Round-robin partitioning: In this approach, new rows are assigned to a partition on a round robin basis.
  • Composite partitioning: It combines more than one partitioning approach. For example, we can first apply list partitioning and then hash partitioning.
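
To make the key/hash idea concrete, here is a tiny sketch of assigning keys to partitions. The partition count is illustrative, and a real system would prefer consistent hashing so that partitions can be added without remapping most keys.

public class HashPartitioner {
    private final int numPartitions;

    public HashPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Map a key to a partition by hashing it; floorMod keeps the result non-negative.
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        HashPartitioner partitioner = new HashPartitioner(4);
        for (String customerId : new String[] {"cust-1", "cust-2", "cust-3"}) {
            System.out.println(customerId + " -> partition " + partitioner.partitionFor(customerId));
        }
    }
}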

What are the problems with data partitioning:

  • Joining tables across partitions can be inefficient. To avoid it, try to denormalize the data in a way that avoids cross-partition joins.
  • Data partitioning can make it hard to enforce referential integrity. To work around it, enforce the referential integrity logic in application code.
  • Schema changes are difficult with data partitioning. To mitigate this, use directory-based partitioning or consistent hashing.

When to use data partitioning:

We should use it when it is not possible to manage the data within a single node or when a performance improvement is necessary.

What is a proxy server:

When a client sends a request, the first server that receives it could be a limited, lightweight server. This lightweight server can then pass the request on to the actual backend server. Such a first server is called a proxy server. It can be hardware or software, and it can also act as a firewall.

Advantages/usages of proxy servers:

  • Logging requests.
  • Caching responses.
  • Serving a downtime message when required.
  • Adding a layer of security in front of the backend servers.
  • Blocking some websites for users within a company. A proxy server can also be used to bypass a website’s restrictions for a company’s users.

Here are some types of proxy servers:

  • Open proxy: An open proxy is accessible by any user on the internet. It can be anonymous (hiding the identity of the originating machine) or transparent (showing the identity of the originating machine).
  • Reverse proxy: A reverse proxy retrieves responses from one or more backend servers on behalf of the client and returns them as if they came from the proxy itself. A load balancer is one use case of a reverse proxy.

What is a Heartbeat for systems:

In a distributed systems architecture, we need to know whether other servers are working. To achieve this, we can have a centralized monitoring system that receives periodic heartbeats and tracks the uptime status of each server. We can then decide on the next steps if a server is not working as expected.
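
A minimal heartbeat-tracking sketch; the server IDs and the 5-second timeout are illustrative.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMonitor {
    private static final long TIMEOUT_MILLIS = 5_000; // consider a server dead after 5s of silence
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    // Servers call this periodically (e.g., every second) to report that they are alive.
    public void recordHeartbeat(String serverId) {
        lastSeen.put(serverId, System.currentTimeMillis());
    }

    // The monitor checks whether a server has reported recently enough.
    public boolean isAlive(String serverId) {
        Long last = lastSeen.get(serverId);
        return last != null && System.currentTimeMillis() - last < TIMEOUT_MILLIS;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor();
        monitor.recordHeartbeat("web-server-1");
        System.out.println("web-server-1 alive? " + monitor.isAlive("web-server-1")); // true
        System.out.println("web-server-2 alive? " + monitor.isAlive("web-server-2")); // false, never reported
    }
}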

What is a checksum:

A checksum is a way to ensure that data transferred from one system to another arrives as expected. The checksum is calculated and stored or sent along with the data. To calculate the checksum, a hash function like MD5 can be used. The source and destination servers can compare checksums to verify that the data was transferred intact.
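
A small sketch of computing an MD5 checksum in Java; the payload is a placeholder, and both sides would run the same calculation and compare the results.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumExample {
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] hash = digest.digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b)); // two lowercase hex characters per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] payload = "file contents to transfer".getBytes(StandardCharsets.UTF_8);
        // Sender and receiver compute this independently and compare the values.
        System.out.println("MD5 checksum: " + md5Hex(payload));
    }
}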

What is quorum:

In a distributed system, a quorum is the minimum number of nodes that must have the same information and agree to an operation before it is considered complete. For example, if we have three database servers, we may require that a transaction is only considered complete when at least two of the three instances (a majority) receive the same information and agree to it. A quorum helps ensure such an operation.
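
A tiny sketch of a majority-quorum check; the replica count and acknowledgement count are illustrative.

public class QuorumCheck {
    public static void main(String[] args) {
        int totalReplicas = 3;
        int quorum = totalReplicas / 2 + 1; // majority: 2 of 3
        int acknowledgements = 2;           // replies received for a write

        // The write is committed only once a majority has acknowledged it.
        boolean committed = acknowledgements >= quorum;
        System.out.println("Write committed? " + committed);
    }
}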

What is the Bloom filter:

A Bloom filter is a data structure used to quickly check whether an element is in a set. It tells us that an element MAY BE in the set or is DEFINITELY NOT in it.

How HTTP works: A user enters a URL in the browser. The URL uses either the http or https protocol, followed by a domain (like http://www.abc.com). A DNS (Domain Name System) lookup finds the IP address for the domain; DNS information is generally cached, and DNS servers handle the lookups. Once the browser has the IP address of the server, it establishes a TCP connection with the server and sends an HTTP request. The server sends an HTTP response back to the browser, which parses the response and shows it to the user.

Bare metal infrastructure: this term is used for dedicated physical server infrastructure. When an application needs the highest level of isolation and security, bare metal could be the most appropriate solution.

Virtual machines: These share hardware among multiple virtual servers, with a hypervisor running underneath the guest OSs. The downside is that they can suffer from noisy-neighbor problems.

Containers: a container is a lightweight, standalone package. We use hardware and a host OS, with a container engine installed on top, and multiple containers deployed on the container engine. Containers are scalable and portable, but they are less secure: they are vulnerable to security issues at the OS level. To reduce security risks, we can run containers inside virtual machines.

Thank you for reading it. As I learn more, I will revise it.

Databases basics

Who should read it: It is for you if you are looking for an overview of this topic for a project, to conduct/appear in an interview, or in general. As we learn more, we will update this article.

Why we need distributed databases: 

  • It is difficult to store the entire data set in a single database
  • A single database is a single point of failure
  • Performance becomes slow at scale
  • Making one bigger computer is increasingly expensive

For distributed database architectures, we have a primary database and can have multiple secondary databases. Single-node databases are classic databases like PostgreSQL, MySQL, etc. Distributed databases are made of multiple nodes and are fault tolerant. A database cluster means multiple database instances. In general, we have leader nodes and follower nodes. The leader node is in charge of returning the final data results; the followers receive the data. If the leader node fails, a follower node can become the leader.

Types of distributed databases:

  • Big compute databases: Split data across multiple nodes. These are suitable for analytical workloads.
  • High availability databases: are extremely fault tolerant. Each node has a full copy of the data.

Some key points about distributed databases:

  • Imbalanced node: the problem when one node carries more data load than the others. Moving data between nodes is slow.
  • Reading data from hard disk is slow; reading data from RAM is fast.
  • The leader node examines the query and distributes the jobs to the different nodes.
  • Sharding: It is a model in which all database instances act as primary databases. We segment the data across multiple instances. The problem is that if one segmented database gets more load, it becomes a hot spot. Also, if we have to join data across two databases, we incur network calls.

CAP theorem:

  • C stands for Consistency. If we write information, we want every subsequent read to return the same data. That is consistency. To maintain consistency, data in the primary and secondary databases should replicate as soon as possible.
  • A stands for Availability. If we have two databases and one machine goes down, users should still be able to read or write to the system as a whole.
  • P stands for Partition tolerance. Partition tolerance means that even if the network connection between two machines is not working, we should still be able to read/write the data.
  • RDBMS databases provide strong consistency. NoSQL databases generally prioritize availability and partition tolerance, and provide eventual consistency.

As per the CAP theorem, a database can generally achieve at most two of the three properties. For distributed databases, we must assume network failures (partitions) are inevitable, so we need to choose between C and A.

PACELC theorem:

As per the PACELC theorem, if a partition happens, choose between Availability and Consistency; else, choose between Latency and Consistency.

Other theorems:

BASE: BASE stands for Basically Available, Soft state, Eventually consistent. NoSQL databases are an example of BASE systems.

  • Basically Available: The system is guaranteed to be available in the event of a failure.
  • Soft State: The state of the data could change without application interactions due to eventual consistency.
  • Eventual Consistency: The system will be eventually consistent after the application input. The data will be replicated to different nodes and will eventually reach a consistent state. But the consistency is not guaranteed at a transaction level.

Indexes:

A database index is a data structure that helps retrieve data faster from a table. Indexes are like a library catalog that helps you find the location of a book. For more about indexes, refer here.

Relational databases:

Relational databases store data in rows and columns. Some famous relational databases are MySQL, Oracle, and Postgres.

Advantages of relational databases:

  1. Well-defined relationships and structure: data in relational databases is structured, with primary and foreign key constraints. This helps in organizing the data, and the defined relationships and structure also help in retrieving the data effectively.
  2. ACID (Atomicity, Consistency, Isolation, Durability): As relational databases support ACID properties, they are helpful in ensuring that the data changes of a transaction are applied reliably.

Disadvantages of relational databases:

  1. Rigidity due to structured data: As the data is well defined and structured, it’s not easy to store a new data set whose structure is unknown. For example, to add a new column to a table, the table has to be altered to support it.
  2. Difficult to scale: scaling means supporting a larger volume of data. For relational databases, scaling is difficult. For read-only operations, it’s easier to replicate the data; for write operations, the general approach is to add more capacity (vertical scaling) to the primary database server, which is costlier than replicating read-only databases.

NoSQL Databases:

There are multiple types of NoSQL databases like:

  • Key-Value storage type: Data is stored as key-value pairs. Some examples of such databases are Redis and DynamoDB.
  • Document databases: In these databases, data is stored in the documents. A collection is a group of documents. Each document can have a different structure. An example of such a database is MongoDB.
  • Wide-column databases: In these databases, the number of columns can vary per row in the same table. We can consider it as a two dimensional key-value storage. Some examples of wide-column databases are Cassandra and HBase. For more, refer here.
  • Graph databases: These represent data in a form of a graph. Examples of such databases are Neo4J and Infinite graph.
  • Some NoSQL databases:
    • Couchbase: It is a NoSQL database that stores data either in key/value pairs or in JSON document format. (In a traditional database model, by contrast, we begin with a schema and then add tables and columns to it.)
    • MongoDB: MongoDB is an open-source document database. It works on the concepts of collections and documents. A collection is a group of documents and is roughly equivalent to an RDBMS table; a document is a set of key-value pairs. It is a schemaless database, it is easy to scale, and it is a good choice for Big Data needs.

Advantages of NoSQL databases:

  • Flexibility with unstructured data: As the data in NoSQL databases is unstructured, these databases provide more flexibility to store the data.
  • Horizontal scaling: Horizontal scaling means distributing data across multiple server instances. Data in NoSQL databases is distributed by using sharding. These databases support horizontal scaling for both read and write operations.

SQL versus NoSQL databases:

  • Storage: Data in SQL databases is stored in rows and columns. NoSQL databases have different data storage models, like key-value or graph.
  • Schema: SQL databases have a fixed schema. NoSQL databases can have dynamic schemas.
  • Querying: SQL databases use Structured Query Language (SQL) to retrieve data. NoSQL databases use various query mechanisms, sometimes collectively called UnQL (Unstructured Query Language), and are often focused on collections of documents.
  • Scalability: Horizontal scaling in SQL databases is difficult as compared to NoSQL databases.
  • Reliability: Most SQL databases are reliable and ACID compliant, whereas NoSQL databases may trade off ACID compliance.
  • Language: SQL databases use transactional SQL and support core ANSI/ISO language elements, whereas NoSQL databases are not limited to one particular language. For example, MongoDB uses a JavaScript-based query language.

How to choose between SQL and NoSQL databases:

  • Consider SQL databases when:
    • Data is structured and structure is not changed frequently.
    • Supporting transaction-oriented use cases.
    • No need to scale the database.
  • Consider NoSQL databases when:
    • Data is not structured and the structure can change frequently.
    • A flexibility of dynamic schema is needed.
    • We anticipate the scaling of the database in the future.
    • A high level of data integrity is not required.

Other database terms:

  • Purpose-built databases: There are many options, like relational, key-value, document, graph, in-memory, time series, and ledger databases. Depending on the situation, today’s application developers need to pick the right database for the use case by analyzing the pros and cons.

Partitioning and Sharding: partitioning is the process of splitting data into smaller pieces. Vertical partitioning splits data by columns or features within the same database and the same tables. Horizontal partitioning splits table rows across multiple shards (e.g., multiple database locations). In the case of sharding, a table may have customer ID 1 on one server and customer ID 2 on another server. In the case of (vertical) partitioning, customer IDs 1 and 2 are on the same database server and in the same table.

Distributed transaction types:

  • Two-phase commit (2PC): In the case of 2PC, there is a coordinator that prepares a transaction across multiple databases. Then, the coordinator either commits or rolls back all of them together. While preparing each transaction, the database rows to be updated are locked using local transactions, which prevents other updates during the two-phase transaction. We also need to plan a time limit for each transaction so that the coordinator only waits for the commit or rollback within a defined time period (a minimal coordinator sketch follows this list).
  • Three-phase commit (3PC): In the first phase, the coordinator asks whether it’s OK to commit. The second phase is to pre-commit the transaction. The third phase is to commit the transaction. I’m yet to learn more and add notes about it.
  • Try-Confirm/Cancel (TC/C): In the first phase, the coordinator asks all databases to reserve resources. In the second phase, the coordinator collects the replies from all databases. If all responses are yes, the coordinator asks all databases to confirm the transaction. If any database responds no, the coordinator asks all databases to cancel the transaction.
  • Saga: It’s an asynchronous, event-driven way to achieve a distributed transaction. Microservices generally use Saga as their default choice. In a Saga, all operations are executed in a sequence; when one operation finishes, the next one is executed. For rollback purposes, we need to prepare compensating operations: one for the execution and another for the rollback. To coordinate operations in a Saga, there are two ways:
    • Choreography: All services do their jobs by subscribing to other services’ events. This is a decentralized coordination.
    • Orchestration: In this way, there is a single coordinator to instruct all services in a defined order.
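
To make the two-phase commit flow concrete, here is a minimal coordinator sketch. The Participant interface and its method names are hypothetical, not from any specific library.

import java.util.List;

// Hypothetical participant in a distributed transaction.
interface Participant {
    boolean prepare();  // phase 1: lock resources and vote yes/no
    void commit();      // phase 2a: make the change permanent
    void rollback();    // phase 2b: undo the prepared change
}

class TwoPhaseCommitCoordinator {
    public boolean execute(List<Participant> participants) {
        // Phase 1: ask every participant to prepare; any "no" vote aborts the transaction.
        for (Participant participant : participants) {
            if (!participant.prepare()) {
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        // Phase 2: all voted yes, so commit everywhere.
        participants.forEach(Participant::commit);
        return true;
    }
}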
