How to Validate a URL Using Regular Expressions

Programming makes it easier to work with structured and unstructured text data, especially with tools like regular expressions and external libraries.


You can use most languages, including Python and JavaScript, to validate URLs using a regular expression. This example regex isn’t perfect, but you can use it to validate URLs for simple use cases.


A regular expression to validate a URL

The regex presented in this article for validating a URL isn't perfect. Some valid URLs will fail this validation, including URLs with IP addresses, non-ASCII characters, or protocols like FTP. The following regex only validates the most common URLs.
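The snippet below is a small illustration of those limitations. It uses the Python version of the pattern given later in this article; the sample URLs are hypothetical examples chosen to hit each of the cases just mentioned.

```python
import re

# Python version of the pattern from this article, written as a raw string
URL_PATTERN = re.compile(
    r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$"
)

# Hypothetical URLs of the kinds the pattern does not handle
rejected_examples = [
    "http://192.168.0.1/",  # IP address instead of a domain name
    "https://münchen.de",   # non-ASCII characters in the domain
    "ftp://example.com",    # protocol other than http or https
]

for url in rejected_examples:
    # search() returns None when the pattern does not match
    result = "matches" if URL_PATTERN.search(url) else "rejected"
    print(f"{url}: {result}")
```

All three URLs are reported as rejected, even though each is a legitimate address.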

The regex considers a URL valid if it meets the following conditions:

  1. The string should start with either http or https followed by ://.
  2. The combined length of the subdomain and domain must be between 2 and 256 characters. It may only contain alphanumeric characters and a limited set of special characters, such as -, ., _, and %.
  3. The TLD (top-level domain) should contain only alphabetic characters and be between two and six characters long.
  4. The rest of the URL (the path, query string, or parameters) is optional: it can contain zero or more alphanumeric and special characters.
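One way to see how these four conditions map onto the regex is to assemble it from named pieces. This is a sketch for illustration: the names SCHEME, HOST, TLD, and REST are hypothetical, and the combined string is the same Python pattern used later in this article.

```python
import re

# Hypothetical names, one per condition in the list above
SCHEME = r"((http|https)://)"                # 1. http:// or https://
HOST = r"[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}"  # 2. subdomain + domain, 2 to 256 characters
TLD = r"\.[a-z]{2,6}\b"                      # 3. a dot, then a 2 to 6 letter TLD
REST = r"([-a-zA-Z0-9@:%._\+~#?&//=]*)"      # 4. optional path or query, zero or more characters

# ^ and $ anchor the pieces so the whole string must match, not just part of it
URL_REGEX = "^" + SCHEME + HOST + TLD + REST + "$"

print(bool(re.search(URL_REGEX, "https://mail.google.com")))  # True
print(bool(re.search(URL_REGEX, "http://apple")))             # False
```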

You can validate a URL in JavaScript using the following regular expression:

^((http|https):\/\/)[-a-zA-Z0-9@:%._\+~#?&\/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&\/=]*)$

Similarly, you can use the following regex to validate a URL in Python:

^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$

Where:

  • ((http|https)://) makes sure the string starts with either http or https followed by ://.
  • [-a-zA-Z0-9@:%._\+~#?&//=] matches alphanumeric characters and the special characters - @ : % . _ + ~ # ? & / =. The first instance of this set defines the characters allowed in the subdomain and domain part, while the second instance defines the characters allowed in the path or query string.
  • {2,256} is a quantifier: the preceding set must occur 2 to 256 times (both inclusive). This enforces a combined subdomain and domain length between 2 and 256 characters.
  • \. represents the dot character.
  • [a-z]{2,6} matches between two and six lowercase letters from a to z. This is the set of characters allowed in the top-level domain part.
  • \b represents a word boundary, i.e. the beginning or the end of a word.
  • * is a repetition operator that matches zero or more characters from the preceding set, covering the path, query string, or parameters.
  • ^ and $ indicate the beginning and end of the string.
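To see how the [a-z]{2,6} part behaves in practice, here is a small Python check. The test URLs are made up for illustration. Because [a-z] covers only lowercase letters, a TLD written in capitals is rejected, which you may not want since domain names are case-insensitive; passing re.IGNORECASE is one way to relax that.

```python
import re

PATTERN = r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$"

# {2,6} bounds the TLD length: one letter is too short, six letters is the maximum
print(bool(re.search(PATTERN, "https://example.c")))       # False
print(bool(re.search(PATTERN, "https://example.museum")))  # True

# [a-z] only covers lowercase letters, so an uppercase TLD is rejected...
print(bool(re.search(PATTERN, "https://www.Example.COM")))  # False
# ...unless the match is made case-insensitive
print(bool(re.search(PATTERN, "https://www.Example.COM", re.IGNORECASE)))  # True
```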

If the above expression makes you uncomfortable, check out a beginner's guide to regular expressions first. Regular expressions take some time to get used to, and working through examples such as validating user account details with regular expressions should help.

The above regex matches the following URLs:

  • https://www.something.com/
  • http://www.something.com/
  • https://www.something.edu.co.in
  • http://www.url-mit-pfad.com/pfad
  • https://www.url-mit-querystring.com/?url=has-querystring
  • http://url-without-www-subdomain.com/
  • https://mail.google.com
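A quick way to confirm these examples is to loop over them with Python's re module. This sketch uses the same pattern as the Python code later in this article, compiled once and reused for every URL.

```python
import re

# Python version of the pattern, compiled once and reused
PATTERN = re.compile(
    r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$"
)

# The example URLs listed above
examples = [
    "https://www.something.com/",
    "http://www.something.com/",
    "https://www.something.edu.co.in",
    "http://www.url-mit-pfad.com/pfad",
    "https://www.url-mit-querystring.com/?url=has-querystring",
    "http://url-without-www-subdomain.com/",
    "https://mail.google.com",
]

for url in examples:
    # search() returns None when the URL does not match
    status = "Valid" if PATTERN.search(url) else "Not Valid"
    print(f"{status}: {url}")
```

Each URL in the list should be reported as Valid.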

Using the regular expression in a program

The code used in this project is available in a GitHub repository and is free for you to use under the MIT license.

This is a Python approach to validating a URL:

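import re

def validateURL(url):
    # Raw string, so sequences such as \. and \b reach the regex engine unchanged
    regex = r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$"
    r = re.compile(regex)

    # re.search() returns a match object if the pattern matches, otherwise None
    if re.search(r, url):
        print("Valid")
    else:
        print("Not Valid")

url1 = "https://www.linkedin.com/"
validateURL(url1)
url2 = "http://apple"
validateURL(url2)
url3 = "iywegfuykegf"
validateURL(url3)
url4 = "https://w"
validateURL(url4)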

This code uses Python's re.compile() method to compile the regular expression pattern. This method accepts the regex pattern as a string parameter and returns a regex pattern object. That pattern object is then passed to the re.search() method, which looks for an occurrence of the pattern within the target string.

If it finds a match, re.search() returns a match object for the first occurrence; otherwise, it returns None. Note that if you want to find all matches of a pattern in the target string, you can use the re.findall() method instead.

Running the above code will confirm that the first URL is valid but the rest are not.
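Printing is fine for a demo, but in a real program you will usually want the result as a value. The sketch below does that with a hypothetical is_valid_url() helper. It also shows re.findall() on a hypothetical unanchored variant of the pattern: the ^ and $ are removed so URLs can be found inside longer text, and the groups are made non-capturing so findall() returns whole matches rather than tuples of groups.

```python
import re

# Python version of the article's pattern, anchored with ^ and $
URL_REGEX = re.compile(
    r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&//=]*)$"
)

def is_valid_url(url: str) -> bool:
    """Hypothetical helper: return a bool instead of printing."""
    # search() gives a match object on success and None otherwise
    return URL_REGEX.search(url) is not None

# Hypothetical unanchored variant with non-capturing groups, for use with findall()
URL_IN_TEXT = re.compile(
    r"(?:http|https)://[-a-zA-Z0-9@:%._\+~#?&//=]{2,256}\.[a-z]{2,6}\b[-a-zA-Z0-9@:%._\+~#?&//=]*"
)

text = "See https://www.linkedin.com/ and https://mail.google.com for details"
found = URL_IN_TEXT.findall(text)

print(is_valid_url("https://www.linkedin.com/"))  # True
print(is_valid_url("https://w"))                  # False
print(found)  # ['https://www.linkedin.com/', 'https://mail.google.com']
```

Because the character sets include the dot, a URL followed directly by a period in running text would have that period included in the match, so you may want to strip trailing punctuation from the results.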

Similarly, you can validate a URL in JavaScript with the following code:

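function validateURL(url) {
  // Same pattern as the Python version; test() returns true on a match and false otherwise
  if (/^((http|https):\/\/)[-a-zA-Z0-9@:%._\+~#?&\/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&\/=]*)$/.test(url)) {
    console.log('Valid');
  } else {
    console.log('Not Valid');
  }
}

validateURL("https://www.linkedin.com/");
validateURL("http://apple");
validateURL("iywegfuykegf");
validateURL("https://w");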

Again, running this code will confirm that the first URL is valid and the rest are invalid. It uses JavaScript's test() method, which returns true if the target string matches the regular expression pattern and false otherwise.

Validate important data with regular expressions

You can use regular expressions to search, match, or parse text. They are also used for natural language processing, pattern matching, and lexical analysis.

You can use this powerful tool to validate important types of data like credit card numbers, user account details, IP addresses, and more.
