GeoIP and rangebans

GeoIP and rangebans

This tutorial will explain how you can set up a GeoIP system without relying on external services that usually involve HTTP(). The only drawback to this method is that you will periodically (every few months or so) need to update the IP data to be, well, up-to-date.

Warning: Advanced. I don't recommend you attempting this if you're just starting with Pawn and/or MySQL. It gets really technical at some points.

This system can also be used, however, to easily instate range bans that use CIDR ranges. Most systems that I have seen so far store the IP address in its human readable string representation (e.g. 127.0.0.1) which kind of limits the ways you can instate bans. You can ban 127.0.0.* which will ban 127.0.0.0 through 127.0.0.255 or you can ban 127.0.*.* which will ban 127.0.0.0 through 127.0.255.255. But what if a particular ISP has been assigned (hypothetically) 127.0.248.0 through 127.0.255.255? Oops. How are we going to bans this range? 127.0.248.* is not enough and 127.0.*.* is way too much. This is even more prevalent with ranges that extend past the a.b.*.* boundary. For example, a range in Australia ranges from 1.40.0.0 through 1.44.255.255. You can't possibly ban 1.*.*.* because that would also include ranges from many different countries in Asia.

Preparing the data

For this system we will use the MaxMind Legacy GeoLite database. Download the CSV/zip for GeoLite country and unpack it somewhere. We are going to prepare this file to insert into MySQL. Use a text editor that has proper support for regular expressions such as Notepad++ or Sublime. You really don't want to prepare this file by hand, trust me. Note for Notepad++ users: close any other open documents and then restart Notepad++ before you start the next part, I have experienced glitchy behavior and even an access violation crash.

Now open the CSV file in the text editor. You will notice that it is more than 106 thousand lines long. This is the reason you don't want to do this by hand. Open the search and replace window, which is usually the most easily accessed by pressing CTRL-H. Make sure that the option to use a regular expression is enabled. In the Find box type (or copy):

"[\.\d]+","[\.\d]+","(\d+)","(\d+)","(.+)",".+"

and in the replace box type (or copy):

\($1, $2, '$3'\),

Once you have done that, click Replace All.

Because of the size of the file this may take several minutes and it may look like the application hangs or crashes. This is normal. Don't do anything in the application while it's busy replacing or you might risk actually crashing the application.

Once the replacing is done you should end up with something like this:

Notepad++ replacing completed

We have now extracted the numerical IP addresses (more on that further down) and the ISO country codes and parsed them into an SQL query (or rather, soon to be query). You can find more information on regular expression on the Internet (I prefer regexr.com). Now the only thing we need to do is complete our query. First replace the trailing comma at the end of the file with a semicolon. Then scroll to the top (or press CTRL-Home) and write at the top of the file:

INSERT INTO list_ipv4 (range_start, range_end, country_code) VALUES 

Save the file with a .sql extension anywhere you can find it. We will use this later.

Preparing the tables

We will first need to prepare two new tables. The first one we will call list_countries and it will evidently store a list of all the ISO two letter country codes and their names. In the interest of saving time, I have already prepared this. The SQL script can be found on Pastebin. Run it against your database.

The second one we will call list_ipv4 and it will of course store the list of IP ranges we prepared. Create the table:

CREATE TABLE list_ipv4 (
  range_start INT UNSIGNED NOT NULL,
  range_end INT UNSIGNED NOT NULL,
  country_code CHAR(2) NOT NULL,
  banned INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (range_start, range_end, country_code),
  CONSTRAINT fk_ipv4_countrycode
    FOREIGN KEY (country_code)
    REFERENCES list_countries (country_code)
    ON DELETE RESTRICT
    ON UPDATE RESTRICT
); 

Importing the data

Now for importing the main bulk of our data, use the file we prepared earlier. Importing it may be tricky, given its size. You may not be able to use phpMyAdmin due to the size restriction on file uploads. Importing with the command line is probably the fastest, but this can be quite daunting and if you don't feel comfortable doing that then you may use another client such as MySQL Workbench (and the latter one may be your only choice if you're on shared host, because you don't have CLI access). For importing with the command line you can follow this video tutorial of mine (with my not-so-awesome voice).

Usage

Once you have done all this, we can finally start actually using it. One thing I have mentioned, and which is also visible in the screenshot, is numerical IP addresses. Why numerical? First: because it is faster. Any RDBMS is much faster with numbers than it is with strings. Secondly, it allows allows you to use the BETWEEN keyword (example below). You can convert from string representation (as returned by GetPlayerIp) to numerical representation by using the inbuilt INET_ATON() function, and you can convert from the numerical representation to the string representation by using the also inbuilt INET_NTOA() function. More information on dev.mysql.com.

This is a simple snippet from my own code that shows a connecting player's IP address and country to all online admins:

public OnPlayerConnect(playerid)
{
    GetPlayerName(playerid, gPlayerInfo[playerid][pName], MAX_PLAYER_NAME);
    GetPlayerIp     (playerid, gPlayerInfo[playerid][pIP]    , 16);

    new mysqlquery[160];
    mysql_format(db, mysqlquery, sizeof(mysqlquery), "SELECT c.country_name FROM list_ipv4 i JOIN list_countries c ON i.country_code = c.country_code WHERE INET_ATON('%e') BETWEEN range_start AND range_end;", gPlayerInfo[playerid][pIP]);
    mysql_tquery(db, mysqlquery, "GetCountryName", "d", playerid);

} 
public GetCountryName(playerid)
{
    new country[44] = "Unknown";

    if(cache_get_row_count(db))
    {
        cache_get_row(0, 0, country, db);
    }

    new msg[144];
    format(msg, sizeof(msg), "Join: %s (%d) [%s / %s] [%d / %d]", gPlayerInfo[playerid][pName], playerid, gPlayerInfo[playerid][pIP], country, Iter_Count(Player), MAX_PLAYERS);
    SendAdminMessage(1, COLOR_GRAY, msg);
} 

That will show something similar to: Join: Vince0789 (3) [127.0.0.1 / Belgium] [5 / 100]

Banning

You may have noticed that I have purposefully added a "banned" column to the list_ipv4 table. Initially all values in that column are 0, which means no range is banned. If you set that value to 1 (or a Unix timestamp if you're interested in the time the ban was set) you can mark that range as being banned. You can then fetch the banned value when a player connects (using a similar system as outlined above), and boot him out if the value is anything other than 0. It may be a good idea to mark all A1 (anonymous proxy) ranges as banned to begin with.