MongoDB vs. MySQL: How to choose

MongoDB and MySQL are the leading open source NoSQL and relational databases, respectively. Which is best for your application?

MongoDB vs. MySQL: How to choose
Thinkstock

During the dot-com bubble in the 1990s, one common software stack for web applications was LAMP, which originally stood for Linux (OS), Apache (web server), MySQL (relational database), and PHP (server programming language). MySQL was the preferred database mostly because it was free open source and had good read performance, which fit well with “Web 2.0” apps that dynamically generated sites from the database.

Later the MEAN stack, which stood for MongoDB (document database), Express (web server), AngularJS (front-end framework), and Node.js (back-end JavaScript runtime), came to prominence. The MEAN stack was attractive, among other reasons, because the only language you needed to know was JavaScript. It also needed less RAM than an equivalent LAMP stack.

What is MySQL/MariaDB?

Monty Widenius and David Axmark of MySQL AB originally developed MySQL starting in 1994. The “My” in the product name refers to Widenius’ daughter, not the English word “my.” MySQL was designed to be API-compatible with mSQL (a.k.a. Mini SQL), with the addition of a SQL query layer and an open source license (actually a dual license, both proprietary and GPL). Public MySQL releases started at the end of 1996, and continued every year or two. MySQL is currently the most popular relational database.

Sun Microsystems acquired MySQL AB in 2008 (for $1 billion), and Oracle acquired Sun in 2010. Widenius forked MySQL 5.5 into MariaDB just prior to the Oracle acquisition, amid widespread concern about Oracle’s intentions for MySQL. MariaDB has tried hard to maintain compatibility with Oracle MySQL versions.

MySQL started off as a fairly low-end relational database compared to more capable commercial relational databases such as Oracle Database, IBM DB/2, and Microsoft SQL Server, although it was good enough to be the backing store for dynamic websites. Over the years it has added most of the features you expect from a relational database, including transactions, referential integrity constraints, stored procedures, cursors, full-text indexing and searching, geographic indexing and searching, and clustering.

MySQL is still usually used in small to medium-sized deployments, although it now supports “big database” features such as master-slave deployments, use with Memcached, and horizontal sharding. Scaling MySQL out to multiple slaves improves read performance, but only the master accepts write requests.

AWS offers MySQL as a service in two flavors, Amazon RDS and Amazon Aurora. The latter has much higher performance, can handle terabytes of data, has lower lag time for updating replicas, and directly competes with Oracle Database and SQL Server.

What is MongoDB?

MongoDB is highly scalable, operational document database available in both open source and commercial enterprise versions, and it can be run on premises or as a managed cloud service. The managed cloud service is called MongoDB Atlas.

MongoDB is far and away the most popular of the NoSQL databases. Its document data model gives developers great flexibility, while its distributed architecture allows for great scalability. As a result, MongoDB is often chosen for applications that must manage large volumes of data, that benefit from horizontal scalability, and that handle data structures that don’t fit the relational model.

MongoDB is a document-based store that also has a graph-based store implemented on top of it. MongoDB doesn’t actually store JSON: it stores BSON (Binary JSON), which extends the JSON representation (strings) to include additional types such as int, long, date, floating point, decimal128, and geospatial coordinates.

MongoDB can generate multi-modal graph, geospatial, B-tree, and full text indexes on a single copy of the data, using the type of the data to generate the correct type of index. MongoDB lets you create indexes on any document field. MongoDB 4 has multi-document transactions, which means that you can still get ACID properties even if you have to normalize your data design.

By default, MongoDB uses dynamic schemas, sometimes called schema-less. The documents in a single collection do not need to have the same set of fields, and the data type for a field can differ across documents within a collection. You can change document structures with dynamic schemas at any time.

Schema governance is available, however. Starting in MongoDB 3.6, MongoDB supports JSON schema validation, which you can turn on in your validator expression.

The LAMP and MEAN Stacks

Lots of variations on the LAMP and MEAN stacks exist. Instead of the Linux OS, for example, you can run on Windows (WAMP) or MacOS (MAMP). Instead of the Apache web server on Windows, you can run IIS (WIMP).

Instead of the MySQL relational database in the LAMP stack, you could run PostgreSQL or SQL Server. If you need global distribution, you could run CockroachDB or Google Cloud Spanner. Instead of the PHP language, you could code in Perl or Python. If you want to code in Java or C#, there are separate families of stacks to consider.

Instead of the MongoDB document database in the MEAN stack, you could run Couchbase or Azure Cosmos DB for better global distribution. Instead of Express, you could use any of a dozen Node.js web server frameworks. Instead of the AngularJS front-end framework, you can run Angular 2 or React.

How to choose a database for your application

The most important questions to ask when you are picking a database are:

  • How much data do you expect to store when the application is mature?
  • How many users do you expect to handle simultaneously at peak load?
  • What availability, scalability, latency, throughput, and data consistency does your application need?
  • How often will your database schemas change?
  • What is the geographic distribution of your user population?
  • What is the natural “shape” of your data?
  • Does your application need online transaction processing (OLTP), analytic queries (OLAP), or both?
  • What ratio of reads to writes do you expect in production?
  • Do you need geographic queries and/or full-text queries?
  • What are your preferred programming languages?
  • Do you have a budget? If so, will it cover licenses and support contracts?

Several of these questions will tend to narrow the selection of a database, but we have many more choices available than when the LAMP stack was formulated. If you’re building an application that needs to be available 99.999 percent of the time to users all over the world with strong consistency, only a few databases will fit the bill. If your application will be used in one country from 9 a.m. to 6 p.m. on weekdays and can tolerate eventual consistency, almost any database will work, although some will be easier for developers and operators and some will give you better performance for your primary usage scenarios.

While the LAMP and MEAN stacks were good solutions for web applications at one time, neither one is optimum now. Rather than blindly adopting one or the other, you should think through your use cases and find an architecture that will serve your application for the foreseeable future.

SQL or NoSQL?

When would you want a relational database such as MySQL for a new application? Aside from the obvious support for standard SQL, relational databases per se force the data into a tabular schema with consistent strong typing of fields, and help you to avoid data duplication as long as you take advantage of normalization.

If you need to avoid missing data, you can declare fields NOT NULL when you create or modify tables. If you need geographic queries as defined by the Open Geospatial Consortium, most relational databases supply a robust implementation. And if you need full-text search, most relational databases allow you to define inverted list indexes on text fields, called FULLTEXT indexes in MySQL.

On the other hand, if you also need an occasional free-form document, MySQL and many other relational databases also support JSON data as defined by RFC 7159. And if you also want to use XML documents and XPath or XSLT, most relational databases provide that capability.

When would you want a document database such as MongoDB? If your primary use case needs to allow free-form data, fields that change types from document to document, a schema that changes over time, or nested documents, then a NoSQL database will fulfill the requirements. In addition, if your application is written in JavaScript, then the JSON format of document databases will be a natural fit.

Copyright © 2019 IDG Communications, Inc.