Common Java libraries

Lombok API: This is a very helpful library that allows reducing infrastructural code. If we use Lombok API, we don’t have to write the code for getters, setters, constructor, equals, hash code methods, and even more.

Resilience4j API: It is a library designed for functional programming. One example use case is the rate limiter functionality, to limit number of maximum requests served by an API in a defined time period. Other examples are:

  • Concurrency control using bulkhead module
  • Fault Tolerant using retry

Hystrix API: Hystrix API can help make a service fault tolerant and resilient.

Javatuples: It’s an API that allows us to work with tuples. A tuple is a sequence of unrelated objects of different types. For example, a tuple may contain an integer, a string, and an object.

Javasist, CGLib, and ASM: These are APIs to manipulate Java byte codes.

P6Spy: It is a library that allows logging of database operations in the realtime.

Java Transaction Management: A transaction is a series of actions that must be completed. Java provides multiple ways to control transactions. Java provides transactions that are based on JDBC, JPA, JMS, Global Transactions, Java Transaction API (JTA), Java Transaction Service (JTS), and other related ways.

References:

Behavioral interviews questions

Who should read this article: Anyone appearing in or conducting in an interview for a developer, engineering manager, project/program, or a product manager or above role.

Disclaimer: Answers to these questions below are based on my experience and I may be wrong for some answers or you may have another opinion or an answer to it. It’s advisable to come up with your own answers.

Tell me about a time when you faced a challenge and overcame it.

Nugget: Sure. I will describe a challenge of integrating with a system that provides flag of who’s eligible to contact and who’s not.

Situation: We had a migration project on marketing campaigns, to reach out to customers, to get their feedback on products. We were using a central user identification system, to know who is eligible to be contacted. I was not getting the answer to it.

Result: After more than 5 attempts to convince this group with repeatedly asking for same information in different ways, finally they were convinced. We did it in a way that provided us information.

Tell me about how you interact with customers or clients?

In my various roles, this has been done in a. different way. In my current role, I interact with my internal customers as my business partners. We discuss about customer experiences, new changes to the business, and how it can impact my owned infrastructure or business area.

Talk about how you overcame product failures/challenges or poor feedback.

There was a time when our product was not meeting the customer’s needs. We met with the group of customers periodically and explained our limitations. Unfortunately these failures increased day by day to a point that these were not manageable. Then, collectively, we took an innovative approach to come up with a next suite of products that overcame the failures for a longer time.

Tell me about a time when you had to influence a team.

Nugget: Sure, let me tell you a time when I had to convince a team to increase the scope of testing in a project.

Situation: For a customer facing campaign emails go-live, engineering team was in a hurry, to go-live with the campaign. They had a pressure of reaching an end of life of an application. I still had to convince them to increase testing coverage as it was a customer facing application.

Action: I listened to their situation/proposal. In this situation, I had to disagree with proposed testing coverage. I convinced them with past data when a problem occured, due to lack of testing. I shared my past mistake when customers experienced an issue. I also told them that I would have agreed to their proposal if it was an internal release BUT we represent the company and can’t risk the customers.

Result: Unfortunately, there were tough deadlines, due to a legacy end of life. Team agreed to increase testing coverage upto 80% of what I sugegsted. I compromised to reduce my ask a little bit. But it was a good enough testing plan that gave both the teams a win/win feeling.

Tell me about a time when you have made a mistake.

I will describe a situation of a project in that I provided I underestimated the level of efforts for the development. This situation added burden on me to finish the development task within the given timeline. When I came up with it, I didn’t know it’s a large effort task. I estimated it as a medium task. To mitigate the delay of the go-live, I had to work extra hours, to develop the component within the expected timeline. Ultimately, after working long hours, I managed to get through it and complete the task in the given timeline.

How would you handle if two executives are asking to prioritize two different features and you can plan only for

one?

I will describe a prioritization exercise for the situation. I will go through the features requested by both the executives, understand level of efforts in implementing it, and business priority of it. After going through it, to both executives together, I will explain the current bandwidth of how much we can accomplish/deliver. To do that, I will utilize the appropriate forum that is suitable for such a prioritization. Then, after the discussions, I will decide the outcome of selecting one feature to deliver it on time with the expected quality.

Tell me about a time you used data to make a decision.

I will describe a situation when we determined the priority of a fix depending on the volume of the issues. In a customer campaign, we wanted to fix an issue. But we had limited bandwidth to fix the issue. We analyzed the data to understand the criticality of the issue. It turned out that the chances of issues were less than 0.5%. We had to delay this issue fix over other higher priority issues. So, this is how data helped us making the decision.

As I learn more, I will update this page. Thank you !

Databases basics

Who should read it: It is for you if you are looking for an overview of this topic for a project, to conduct/appear in an interview, or in general. As we learn more, we will update this article.

Why we need distributed databases: 

  • Difficult to store entire data set into a single database
  • Single point of failure
  • Slow in performance
  • If we make a big computer, it will be more expensive

For the distributed database architectures, we have a master database and can have multiple secondary databases. Single node databases are classic databases like PostgreSQL, MySQL, etc. Distributed databases are made of multiple nodes. These are fault tolerant. Database clusters means multiple database instances. In general, we have leader nodes and follower nodes. Leader node is in charge of returning the final data results. Followers receive the data. If the leader node fails, a follower node can become the leader node.

Types of distributed databases:

  • Big compute databases: Split data across multiple nodes. These are suitable for analytical workloads.
  • High availability databases: are extremely fault tolerant. Each node has a full copy of the data.

Some key points about distributed databases:

  • Imbalance node: the problem when a node has more data load. Moving the data between nodes is slow.
  • Asking data from hard disk is slow. Asking data from RAM is fast.
  • Leader node examines the query. Leader node distributes the jobs to different nodes.
  • Sharding: It is a model in that all database instances acts as the primary databases. We segment data into multiple instances. Problem is that if we have more load on one segmented database, it will cause problems. Also, if we have to join data with two databases, we will have network connections.

CAP theorem:

  • C stands for Consistency. If we write information, we want to get same data. That is the consistency. To maintain consistency, data in the primary and secondary databases should replicate asap.
  • A stands for Availability. If we have two databases, if one machine goes down, as a whole system, users should be able to read or write.
  • P stands for Partitions. Partition tells us that even if connections to two machines not working, we should still be able to read/write the data.
  • RBDMS databases provide strong consistency. NoSQL databases generally prioritize availability, partition tolerance, and provide eventual consistency.

As per CAP theorem, generally, databases can achieve up to two features out of three. For distributed databases, assume network failures will be inevitable. So, for distributed databases, we need to choose between C and A.

PACELC theorem:

As per CAPLEC theorem, if partition happens, choose Availability and Consistency. Else, choose Latency and Consistency.

Other theorems:

BASE: BASE stands for Basically Available Soft state Eventually Consistent. NoSQL is an example of a BASE.

  • Basically Available: The system is guaranteed to be available in event of failure.
  • Soft State: The state of the data could change without application interactions due to eventual consistency.
  • Eventual Consistency: The system will be eventually consistent after the application input. The data will be replicated to different nodes and will eventually reach a consistent state. But the consistency is not guaranteed at a transaction level.

Indexes:

Database index is a data structure that helps to retrieve data faster from a table. Indexes are like library catalog that helps to know the location of a book. For more about index, refer here.

Relational databases:

Relational databases store data in rows and columns. Some famous relational databases are MySQL, Oracle, and Postgres.

Advantages of relational databases:

  1. Well defined relationships and structured: data in relational databases is structured, with foreign and primary key constraints. It helps in organizing the data. Defined relationships and structure also helps in retrieving the data effectively.
  2. ACID (Atomicity, Consistency, Isolation, Durability): As relational databases support ACID properties, it’s helpful in ensuring the data changes for a transaction.

Disadvantages of relational databases:

  1. Rigidity due to structured data: As the data is well defined and structured, it’s not easy to store a new data set for that a structure is unknown. For example, to add a new column into a table, the table has to be changed, to support it.
  2. Difficult to scale: scaling means supporting more volume of data. For relational databases, scaling is difficult. For read-only operations, it’s easier to replicate the data. For write operations, a general approach is to add more capacity (vertical scaling) to the primary database server, which is costlier than replacing read-only databases.

NoSQL Databases:

There are multiple types of NoSQL databases like:

  • Key-Value storage type: Data is stored in key-value pairs in arrays. Some examples of such databases are Redis and Dynamo databases.
  • Document databases: In these databases, data is stored in the documents. A collection is a group of documents. Each document can have a different structure. An example of such a database is MongoDB.
  • Wide-column databases: In these databases, the number of columns can vary per row in the same table. We can consider it as a two dimensional key-value storage. Some examples of wide-column databases are Cassandra and HBase. For more, refer here.
  • Graph databases: These represent data in a form of a graph. Examples of such databases are Neo4J and Infinite graph.
  • Some NoSQL databases:
    • Couchbase: It is a NoSQL database that stores the data either in key/value pair or in JSON document format. In a traditional database model, we begin with a schema. We add tables and the columns in the tables.
    • MongoDB: MongoDB is an open source document database. It works on concept of collections and documents. A collection is a document which is equivalent to an RDBMS table. A document is a set of key-value pairs. It is a schema-less database. It is easy to scale. It is a good choice for a Big Data need.

Advantages of NoSQL databases:

  • Flexibility with unstructured data: As the data in NoSQL databases is unstructured, these databases provide more flexibility to store the data.
  • Horizontal scaling: Horizontal scaling means distributing data into multiple server instances. Data in NoSQL databases are distributed, by using sharding. These databases support horizontal scaling for both, read and write operations.

SQL versus NoSQL databases:

  • Storage: Data in SQL databases is stored in rows. NoSQL databases have different data storage models like key-value or graph
  • Schema: SQL databases have a fixed schema. noSQL databases can have different schemas.
  • Querying: SQL databases use Structural Query Language (SQL), to retrieve the data. NoSQL databases uses UnQL (Unstructured Query Language). NoSQL are focused as a collection of documents.
  • Scalability: Horizontal scaling in SQL databases is difficult as compared to NoSQL databases.
  • Reliability: Most SQL databases are reliable and ACID compliant. Whereas, NoSQL databases may compromise reliability and ACID compliance.
  • Language: SQL databases use transactional SQL. They support core ANSI/ISO language elements. Whereas, NoSQL databases are not limited to one particular language. For example, MongoDb uses Javascript based query language.

 How to choose between SQL and NoSQL databases:

  • Consider SQL databases when:
    • Data is structured and structure is not changed frequently.
    • Supporting transaction-oriented use cases.
    • No need to scale the database.
  • Consider NoSQL databases when:
    • Data is not structured and the structure can change frequently.
    • A flexibility of dynamic schema is needed.
    • We anticipate the scaling of the database in the future.
    • Level of data integrity is not needed.

Other databases terms:

  • Purpose built databases: There are many options like relational database, key-value database, document database, graph database, in-memory database, time series database, and ledger database. Depending on the situation, today’s application developers need to pick a right database for the use case, by analyzing pros and cons of the situation.

Partitioning and Sharding: it is the process of splitting the data into columns or features. Vertical partitioning splits the data into the same database by columns or features within the same tables. Horizontal partitioning splits the table data to multiple shards ( e.g. multiple database locations). In case of sharding, a table may have a customer ID 1 on one server and customer ID 2 on another server. In case of partitioning, other customer Ids 1 and 2 are on the same database servers and in the same table.

Distributed transaction types:

  • Two-phase commit (2PC): In case of 2 Phase commit, there is a coordinator that prepares multiple transactions. Then, the coordinator either commits or rolls back all transactions together. While preparing each transactions, the database table rows (that to be updated in DB) are locked using local transactions. This prevents any updates during the 2-phase transaction. We also need to plan for a time limit for each transaction so that the coordinator is waiting to commit or rollback within a defined time period.
  • 3 phase commit: First phase is that the coordinator asks if it’s ok to commit. Second phase is to pre-commit the transaction. Third step is to commit a transaction. I’m yet to learn more and add more notes about it.
  • Try-Confirm/Cancel (TC/C): In the first phase, the coordinator asks all databases to reserve resources. In the second phase, the coordinator captures the replies from all databases. If the response is yes, the coordinator asks all databases to confirm the transaction. If any of the databases respond as no, the coordinator asks all databases to cancel the transaction.
  • Saga: It’s an asynchronous way to achieve the transaction. It’s an event driven process. Micro-services generally use Saga as their default choice. In Saga, all operations are executed in a sequence. When one operation finishes, then the next operation is executed. For the rollback purpose, we need to prepare double operations: one for execution and the another one for the rollback. To coordinate operations in Saga, there are two ways:
    • Choreography: All services do their jobs by subscribing to other services’ events. This is a decentralized coordination.
    • Orchestration: In this way, there is a single coordinator to instruct all services in a defined order.

References: