A data lake is a storage concept that is gaining ground in the cloud. The term is often used in the context of Big Data solutions. In theory, a data lake is where you store raw data in its native format, usually in Hadoop on the Hadoop Distributed File System (HDFS). That data can then be processed to create data sets for other applications and users as and when they are needed. Because nothing has to be transformed before it is stored, you don’t need to build a complex (and often expensive) data pipeline just to collect and store diverse data.
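The idea of storing data untouched and shaping it only when it is needed can be illustrated with a minimal sketch. It assumes a local temporary directory standing in for HDFS, and the file names, field names, and helper functions (`land_raw`, `read_orders`, `demo`) are all hypothetical, chosen only for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, name: str, payload: str) -> Path:
    """Store a payload untouched, in its native format (no upfront schema)."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    target = lake_dir / name
    target.write_text(payload, encoding="utf-8")
    return target


def read_orders(lake_dir: Path) -> list[dict]:
    """Build a uniform data set at read time from mixed raw files."""
    rows = []
    for path in sorted(lake_dir.iterdir()):
        if path.suffix == ".jsonl":
            # Hypothetical JSON-lines source with fields "id" and "total".
            for line in path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    rec = json.loads(line)
                    rows.append({"order_id": str(rec["id"]),
                                 "amount": float(rec["total"])})
        elif path.suffix == ".csv":
            # Hypothetical CSV source with columns "order_id" and "amount".
            with path.open(newline="", encoding="utf-8") as fh:
                for rec in csv.DictReader(fh):
                    rows.append({"order_id": rec["order_id"],
                                 "amount": float(rec["amount"])})
    return rows


def demo() -> list[dict]:
    # A temporary directory stands in for the lake's storage layer.
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp) / "raw" / "orders"
        land_raw(lake, "web.jsonl",
                 '{"id": 1, "total": "9.50"}\n{"id": 2, "total": 20}\n')
        land_raw(lake, "store.csv", "order_id,amount\nA7,5.25\n")
        return read_orders(lake)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch the two sources are written exactly as they arrive, and the common shape (`order_id`, `amount`) is imposed only by the reader. A different consumer could read the same raw files with a different shape without anything being stored again.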
The credit for coining the term “data lake” goes to James Dixon, Chief Technology Officer of Pentaho. Dixon initially used the term in contrast to a “data mart”, which is a smaller repository of interesting attributes extracted from the raw data.