Posts Tagged php

How to unchunk data received through PHP sockets

When you use PHP's socket functions to pull data from another website, the response body will often arrive chunked (HTTP chunked transfer encoding). Chunked data looks like this:

1e
this line is 30 characters lon
2c
g, while this line is 44 characters long.
an
21
d this line is 33 characters long

As you can see, each chunk is preceded by a hex number on its own line, which gives the length in bytes of the chunk that follows (0x1e = 30, 0x2c = 44, 0x21 = 33). Line breaks inside a chunk count toward that length, as you can see in the second chunk. So to unchunk the string, you have to find every hex string that has a line break (or the start or end of the string) before and after it, and if the block of text that follows is as long as the hex value says, strip out the hex value.

I accomplished this with a regular expression and preg_replace_callback. In essence, it finds each block of text that follows this pattern:

  1. start of string, or a hex string on its own line
  2. anything
  3. a hex string on its own line, or end of string

Once it has found all of those patterns, it replaces only the ones where the hex value of #1 is equal to the string length of #2.

My unchunk function:

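function unchunk($result) {
    return preg_replace_callback(
        // group 1: hex size at the start of the string or on its own line
        // group 2: the chunk body (lazy, so it stops at the next size line)
        // group 3: the next size line, or the end of the string; when the
        //          length matches, it is dropped along with the size line
        '/(?:(?:\r\n|\n)|^)([0-9A-F]+)(?:\r\n|\n){1,2}(.*?)'
        .'((?:\r\n|\n)(?:[0-9A-F]+(?:\r\n|\n))|$)/si',
        function ($matches) {
            // keep only the body when its length matches the hex size,
            // otherwise leave the whole match untouched
            return hexdec($matches[1]) == strlen($matches[2])
                ? $matches[2]
                : $matches[0];
        },
        $result
    );
}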
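The regular expression works on the sample above, but it has to guess where chunks begin and end. A different approach, and the one the HTTP/1.1 chunked format is designed for, is to read each size line, take exactly that many bytes, skip the line break that follows, and stop at the zero-size chunk. The sketch below does that; dechunk_body() is a hypothetical name, it accepts bare \n line endings as well as \r\n, and it discards chunk extensions and trailers rather than parsing them.

```php
<?php
// Hypothetical helper: decode a chunked HTTP body by reading each size line
// and then exactly that many bytes. $body is the raw body after the headers.
function dechunk_body(string $body): string
{
    $out = '';
    $pos = 0;
    $len = strlen($body);

    while ($pos < $len) {
        // the size line runs up to the next "\n"
        $eol = strpos($body, "\n", $pos);
        if ($eol === false) {
            break; // truncated input: no complete size line left
        }

        // drop any ";extension" part and the trailing "\r", then check the hex size
        $sizeLine = explode(';', substr($body, $pos, $eol - $pos), 2)[0];
        $sizeLine = trim($sizeLine);
        if (!preg_match('/^[0-9a-fA-F]+$/', $sizeLine)) {
            throw new UnexpectedValueException("bad chunk size at offset $pos");
        }
        $size = (int) hexdec($sizeLine);
        $pos = $eol + 1;

        if ($size === 0) {
            break; // last chunk; any trailers after it are ignored
        }

        // copy exactly $size bytes, line breaks inside the chunk included
        $out .= substr($body, $pos, $size);
        $pos += $size;

        // skip the line break that ends the chunk data ("\r\n" or bare "\n")
        if (substr($body, $pos, 2) === "\r\n") {
            $pos += 2;
        } elseif (substr($body, $pos, 1) === "\n") {
            $pos += 1;
        }
    }

    return $out;
}
```

Because it never looks for hex-like text inside a chunk, a body line that happens to read "an" or "21" cannot be mistaken for a size line.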


Greased PMA – Stripping the suck out of phpmyadmin’s SQL query window

Anyone who has used phpMyAdmin’s query window knows how frustratingly small the textarea is, and how difficult it is to edit blocks of SQL in it. I wrote a Greasemonkey script to de-suck this query window. As of now, Greased PMA is at version 0.1 and has the following features:

  • Automatic resizing of the textarea to make best use of the window size
  • Ctrl+Enter submits the query
  • Tab inserts 4 spaces into the query

Click here to download Greased PMA version 0.1.

Update: While messing around at home on Vista / Chrome, I noticed that Greased PMA partially works in Chrome. I wasn’t expecting it to work outside of Firefox / Greasemonkey, but Chrome seems to have a similar extension feature. The textarea resizing works, but Ctrl+Enter submitting the query and the Tab handling do not. Perhaps this weekend I will debug those two features on Chrome.


Storing on/off switches through binary representation

As part of the Nightlife Project, you will have to store some data in the payment_rates table as a binary representation. While doing the writeup on the MySQL schema, I decided that this section was too large to fit easily inside that article and should be its own article. The goal of this article is to explain how a database schema can use binary notation to store a sequence of on/off flags in a single field.

Imagine this scenario: You are the head of maintenance at an office and are trying to determine which lightbulbs get the most usage and should be replaced with high efficiency bulbs. In order to determine which gets the most usage, you walk through the building in the morning, at lunch and again at night and make a note of which bulbs are on and which are off. Once you’ve gathered your data, you store the information in a database and after 2 months, you will view the results and determine the top used bulbs and replace them with higher efficiency bulbs to save electricity.

After each walk through the building, you have a list of lights and a corresponding on/off value for each one. Your list may look something like this:

Reception | Break Room | Bathroom | Office 1 | Office 2 | Office 3 | Office 4 | Hallway | Copy Room
On        | Off        | Off      | Off      | On       | On       | Off      | On      | Off

The obvious way to store this data is to create a table called ‘lights’ and give it 10 fields: a timestamp (we need to know when this walk-through occurred, after all) and a field for each of the 9 rooms. However, this obvious way has a few nasty shortcomings. What if you decide to add more lights to be checked, such as desk lamps or the front walkway? Adding fields to a table rarely ends well. It is also wasteful, as you now have a 10-field table when a 2-field table would suffice.

Instead of this obvious but inefficient method, you can convert that list of on/off switches into a binary string. Using the same data above, the binary string would look like this: 100011010. Each light that is on is a 1 and each light that is off is a 0. Each character represents a room: the first position is reception, the second is the break room, and so on. Converting this binary string to decimal, 100011010 becomes 282, because its 1s sit in the 256, 16, 8 and 2 places, and 256 + 16 + 8 + 2 = 282.

Reception | Break Room | Bathroom | Office 1 | Office 2 | Office 3 | Office 4 | Hallway | Copy Room
1         | 0          | 0        | 0        | 1        | 1        | 0        | 1       | 0
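Here is a small PHP sketch of that conversion, assuming each walk-through is held as an array of booleans in the table's left-to-right order. encode_switches() and decode_switches() are hypothetical helper names; bindec(), decbin() and str_pad() are standard PHP functions.

```php
<?php
// Encode one walk-through as an integer. The first entry (Reception) becomes
// the leftmost, i.e. highest, bit, matching the table above.
function encode_switches(array $states): int
{
    $bits = '';
    foreach ($states as $isOn) {
        $bits .= $isOn ? '1' : '0';
    }
    return (int) bindec($bits);
}

// Decode an integer back into $count on/off values, left to right.
function decode_switches(int $value, int $count): array
{
    // decbin() drops leading zeros, so pad back out to $count digits
    $bits = str_pad(decbin($value), $count, '0', STR_PAD_LEFT);
    return array_map(function ($bit) {
        return $bit === '1';
    }, str_split($bits));
}

$walk = [true, false, false, false, true, true, false, true, false];
$stored = encode_switches($walk);          // 282
$restored = decode_switches($stored, 9);   // the same nine values as $walk
```

The padding step matters: a walk-through where reception and the next few rooms are off would otherwise decode to fewer than nine values.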

The key thing behind storing binary information in a database is this: every integer has exactly one binary representation, so no other set of on/off values will add up to 282. This lets us store as many on/off values as the integer column has bits (32 for MySQL's INT UNSIGNED, 64 for BIGINT UNSIGNED) in a single field, without ambiguity and always available for reading. To read the values back, look into your language’s “bitwise and” and “bitwise or” operators. An example in PHP would be:

$user_permission = 6; // binary 0110

$view            = 1; // binary 0001
$edit            = 2; // binary 0010
$create          = 4; // binary 0100
$delete          = 8; // binary 1000

if($user_permission & $view)
    echo 'user can view';

if($user_permission & $edit)
    echo 'user can edit';

if($user_permission & $create)
    echo 'user can create';

if($user_permission & $delete)
    echo 'user can delete';

You can use a system like this for storing user permissions (which is how *nix file permissions work), which days of the week an event occurs (the Nightlife Project will do this for storing payment rates), or any other piece of information that can be summed up as a series of on/off switches. Binary storage is compact and flexible. Remember, however, that any new flag should be added to the left side of the string (the next higher bit), so that the values already stored keep their meaning.
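Setting and clearing individual switches uses the same operators as the permission check: | turns a flag on, & ~ turns it off, and & tests it. The sketch below applies that to days of the week. The constant and function names, and the choice of Sunday as the lowest bit, are assumptions for illustration, not the Nightlife schema. The HOLIDAY flag at the end shows the point about adding new flags on the left.

```php
<?php
// Hypothetical day-of-week flags, one bit per day (Sunday is the lowest bit).
const SUNDAY    = 1;   // 0000001
const MONDAY    = 2;   // 0000010
const TUESDAY   = 4;   // 0000100
const WEDNESDAY = 8;   // 0001000
const THURSDAY  = 16;  // 0010000
const FRIDAY    = 32;  // 0100000
const SATURDAY  = 64;  // 1000000

// "bitwise or" switches a flag on
function flag_on(int $flags, int $flag): int
{
    return $flags | $flag;
}

// "bitwise and" with the complement (~) switches a flag off
function flag_off(int $flags, int $flag): int
{
    return $flags & ~$flag;
}

// a flag is on when "bitwise and" leaves it intact
function flag_is_on(int $flags, int $flag): bool
{
    return ($flags & $flag) === $flag;
}

$days = flag_on(FRIDAY, SATURDAY);   // 96
$days = flag_on($days, WEDNESDAY);   // 104
$days = flag_off($days, SATURDAY);   // 40: Wednesday and Friday

// A flag added later takes the next higher bit (the left side of the
// string), so the 40 already stored still means Wednesday and Friday.
const HOLIDAY = 128; // 10000000
```

Note the parentheses in flag_is_on(): in PHP, === binds more tightly than &, so $flags & $flag === $flag would not do what it looks like.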


The Nightlife Project – Part 1 – Introduction to the Problem

“A carelessly planned project takes three times longer to complete than expected; a carefully planned project takes only twice as long.”

The Nightlife Project is a PHP/MySQL tutorial for novices. While you don’t have to know a whole lot to get started, you should understand the basics of PHP and MySQL: variables, functions and flow control. If the following block of code makes sense to you, you are ready to start the project:

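<?php
// $link is assumed to be an open connection from mysqli_connect()
repeat($link, 'foo', 15);
function repeat($link, $string, $count) {
     $string = mysqli_real_escape_string($link, $string);
     for($i = 0; $i < $count; ++$i) {
          mysqli_query($link, "INSERT INTO `data` VALUES ('$string')");
     }
}
?>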

Now, an explanation is due.

You work for Vision Nightlife, a nightlife promotion company working out of Las Vegas. Your company hires promoters to hand out flyers for the various nightclubs that have hired your company. These promoters give tourists flyers for things like free entry to Domi Lounge, a free drink at Club Septuro, etc. Each flyer is stamped with a unique code identifying the promoter who sent the tourists to the club. The club then pays your company based on how many people your promoters sent to it, at a rate that depends on the number of people and the day of the week. For example, Hoodoo Lounge pays $1 per person on Friday or Saturday nights if you bring between 1 and 10 people. If you bring between 11 and 20 people, it is $1.50 per person, and 21 or more people is $2.25 per person. On a Wednesday night, those amounts are cut in half. Your company wants you to write software to keep track of all of this information.
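To make the rate rules concrete, here is a hypothetical function that prices a single night at Hoodoo Lounge. It works in cents to avoid floating-point money, assumes the bracket rate applies to everyone brought that night (rather than only to the people above each threshold), and returns null for nights the example gives no rate for.

```php
<?php
// Hypothetical pricing of the Hoodoo Lounge example, in cents.
function hoodoo_payout_cents(int $people, string $day): ?int
{
    if ($people < 1) {
        return 0;
    }

    // per-person rate for Friday/Saturday, by head-count bracket
    if ($people <= 10) {
        $rate = 100;        // $1.00 per person
    } elseif ($people <= 20) {
        $rate = 150;        // $1.50 per person
    } else {
        $rate = 225;        // $2.25 per person
    }

    $total = $people * $rate;

    switch (strtolower($day)) {
        case 'friday':
        case 'saturday':
            return $total;
        case 'wednesday':
            // half rate; round() settles the odd half cent at 21+ people
            return (int) round($total / 2);
        default:
            return null;    // no rate given for other nights
    }
}
```

For example, 15 people on a Friday earn 15 × $1.50 = $22.50 (2250 cents), and the same 15 people on a Wednesday earn $11.25.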

Your software needs to track clubs, promoters, payment rates, referral amounts and payout amounts.

This is just an introduction to the nightlife project. Part 2 will begin looking at the database structure and writing the ideal schema for each table.


Filtering an array with a string mask in PHP

I’ve been working on a solution to the Kohana problem of the Validation library being unable to validate cleanly inside multidimensional arrays. The root of the problem is passing a mask to the Validation::add_rules() method to identify deeply nested elements. While working on it, I wrote a function that accepts a mask and an array and returns the filtered array. It doesn’t get me any closer to my actual goal, but I think it is an interesting function nonetheless and could be quite useful in the right circumstances. In the future, I might tweak it so you can invert the results, deleting anything that matches the mask rather than preserving it.

For example, the mask “messages/*/timestamp” keeps only the elements of the array that match $array['messages'][*]['timestamp'], where * is a wildcard that matches any key at that level.

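function array_mask_filter($mask, $array, $ci = false) {
    if(!is_string($mask)) {
        throw new InvalidArgumentException('filter mask must be a string');
    }

    if(!is_array($array)) {
        throw new InvalidArgumentException('variable to be filtered must be an array');
    }

    $mask_chunks = explode('/', $mask);
    $this_mask = array_shift($mask_chunks);
    if($ci) {
        $this_mask = strtolower($this_mask);
    }

    // drop every key at this level that does not match the current chunk
    if($this_mask !== '*') {
        foreach(array_keys($array) as $key) {
            // compare as strings: numeric keys such as "0" become integers
            $compare = $ci ? strtolower((string)$key) : (string)$key;
            if($compare !== $this_mask) {
                unset($array[$key]);
            }
        }
    }

    // last chunk of the mask: keep whatever matched, sub-arrays included
    if(empty($mask_chunks)) {
        return $array;
    }

    // otherwise descend; scalars cannot match the deeper chunks, so drop them
    $rest = implode('/', $mask_chunks);
    foreach($array as $key => $element) {
        if(is_array($element)) {
            $element = array_mask_filter($rest, $element, $ci);
        }
        if(!is_array($element) || empty($element)) {
            unset($array[$key]);
        } else {
            $array[$key] = $element;
        }
    }

    return $array;
}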
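To make the mask semantics concrete, here is a small standalone check of whether one key path satisfies a mask. mask_matches_path() is a hypothetical helper, not part of array_mask_filter() or of Kohana. It treats * as matching exactly one key at its level and, like the filter, lets anything nested below the last mask chunk through.

```php
<?php
// Hypothetical helper: does a key path such as ['messages', 0, 'timestamp']
// satisfy a mask such as 'messages/*/timestamp'? '*' matches any one key.
function mask_matches_path(string $mask, array $path, bool $ci = false): bool
{
    $chunks = explode('/', $mask);

    // the path must reach at least as deep as the mask
    if (count($path) < count($chunks)) {
        return false;
    }

    foreach ($chunks as $depth => $chunk) {
        // keys are compared as strings, since PHP turns "0" into 0
        $key = (string) $path[$depth];
        if ($ci) {
            $chunk = strtolower($chunk);
            $key = strtolower($key);
        }
        if ($chunk !== '*' && $chunk !== $key) {
            return false;
        }
    }

    // keys nested below the last chunk are kept, as in the filter
    return true;
}
```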


Revised convertObjectToArray function for PHP

Back in July, I wrote an article about converting an object to an array in PHP and mentioned the difficulties I was having with cleaning up the keys of private and protected properties, which contain unprintable characters. I found a solution not long after, but never got around to writing a follow-up. Here is my current function library to convert an object to an array, recursively, while keeping clean keys.

<?php

function convertObjectToArray(&$element) {
    // the recursive call can't operate through objects, so they
    // must be handled specially
    if(is_object($element)) {
        // typecast the object to an array, and clean up private and
        // protected keys
        $element = convertObjectToArray_keyCleanup((array)$element);
        // begin the recursion again to go through this object-turned-array
        // this is not strictly necessary, as removing it will cause the
        // recursion to happen in convertToXml, but putting it here makes it
        // more readable and ever so slightly faster.
        array_walk_recursive(
            $element,
            'convertObjectToArray'
        );
    } elseif(is_array($element)) {
        // an array passed in directly may still hold objects further down
        array_walk_recursive($element, 'convertObjectToArray');
    }
    return $element;
}

function convertObjectToArray_keyCleanup($array) {
    // find every invalid key (private and protected member properties)
    foreach(array_filter(array_keys($array), 'convertObjectToArray_invalidKey')
        as $invalidKey) {
        // change the key name by copy / delete / create
        $data = $array[$invalidKey];
        unset($array[$invalidKey]);
        // find out the correct key name by getting the last chunk that
        // is only ascii 32 - 126, the standard set of printable characters
        // "\0User\0Types" => "Types"
        $key = preg_replace(
            '/^.*[^\x20-\x7E]([\x20-\x7E]*)$/',
            '\\1',
            $invalidKey
        );
        $array[$key] = $data;
    }

    return $array;
}

function convertObjectToArray_invalidKey($key) {
    // a key is invalid if it has any characters that are outside
    // of the ascii range 32 - 126, which is the standard set of printable
    // characters
    return preg_match('/[^\x20-\x7E]/', $key);
}

?>
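For reference, here is what the (array) cast actually produces, which is what convertObjectToArray_keyCleanup() has to undo. The key formats are PHP's documented behaviour: protected properties get a \0*\0 prefix and private ones get \0ClassName\0. The User class and unmangle_key() are hypothetical; unmangle_key() swaps the regular expression for a simpler rule (keep whatever follows the last NUL byte) and is shown only as an alternative sketch.

```php
<?php
// Hypothetical class, used only to show how (array) rewrites property names.
class User
{
    public $name = 'alice';
    protected $role = 'admin';
    private $types = ['a', 'b'];
}

// Keep only what follows the last NUL byte, which strips the "\0*\0"
// (protected) and "\0User\0" (private) prefixes; public keys pass through.
function unmangle_key(string $key): string
{
    $pos = strrpos($key, "\0");
    return $pos === false ? $key : substr($key, $pos + 1);
}

$raw = (array) new User();
// array_keys($raw) is ['name', "\0*\0role", "\0User\0types"]

$clean = [];
foreach ($raw as $key => $value) {
    $clean[unmangle_key($key)] = $value;
}
// array_keys($clean) is ['name', 'role', 'types']
```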


mysql_insert_id and insert ignore

While working on some code recently, I realized that mysql_insert_id is no help when using INSERT IGNORE. With a plain INSERT, mysql_insert_id returns the primary key of the newly inserted row. With INSERT IGNORE, however, it returns 0 if a key conflict prevents the record from being inserted. If you want the key of the row that the conflict was hit against, found as generically as possible, you can use this script. This is useful if you are setting up a many-to-many pivot table and don’t want duplicate data on either side: when you attempt to insert a new record, you get either the new key or the key of the one that already existed.

// $db is a mysqli connection and $sql holds the INSERT IGNORE statement.
// This reads $db->error, so mysqli must not be set to throw exceptions
// (PHP 8.1+ throws by default; see mysqli_report()).
$db->query($sql);
$id = 0;

// if an insert id exists, use it
if ($db->insert_id != 0) {
    $id = $db->insert_id;
// if there is no insert id and there was no error and insert ignore was used
} elseif(empty($db->error) &&
    preg_match('/^\s*insert\s+ignore/si', $sql)) {

    // find the table that was queried
    preg_match('/^\s*insert\s+ignore\s+into\s+([-`a-zA-Z0-9_]+)/si', $sql, $extract);
    $table = trim($extract[1], '`');

    // change insert ignore to a plain insert
    $insertSql = preg_replace('/^\s*insert\s+ignore/si', 'insert', $sql);

    // run the plain insert and scan the error for the key conflict; this
    // expects the error format that reports the key by number
    $db->query($insertSql);
    preg_match('/^Duplicate entry \'(.*)\' for key (\d+)$/', $db->error, $extract);

    $value = $extract[1];
    $key = (int)$extract[2];

    // for multi column keys MySQL joins the values with '-'; rebuild
    // values that themselves contain '-' by checking the query text
    if(strpos($value, '-') !== false && strpos($insertSql, $value) === false) {
        $values = explode('-', $value);
        do {
            $merged = false;
            for($k = 0; $k < count($values) - 1; $k++) {
                $joined = $values[$k].'-'.$values[$k+1];
                if(strpos($insertSql, $joined) !== false) {
                    // replace the two pieces with the joined value
                    array_splice($values, $k, 2, array($joined));
                    $merged = true;
                    break;
                }
            }
        } while($merged);
        $value = $values;
    }

    // look up all keys on the table, isolating the primary key
    $keyResult = $db->query("show keys from `$table`");
    $keys = array();
    while($row = $keyResult->fetch_assoc()) {
        if(strtolower($row['Key_name']) == 'primary') {
            $primary = $row['Column_name'];
        }
        $keys[$row['Key_name']][] = $row;
    }

    // build a where clause from the columns of the conflicting key
    // (the key number in the error counts from 1)
    $keys = array_values($keys);
    $conflict = $keys[$key-1];
    $values = count($conflict) == 1
        ? array(implode('-', (array)$value))
        : (array)$value;
    $whereParts = array();
    foreach($conflict as $column) {
        $whereParts[] = "`{$column['Column_name']}` = '"
            .$db->real_escape_string($values[$column['Seq_in_index']-1])."'";
    }
    $where = implode(' and ', $whereParts);

    // get the primary key that conflicted with the insert
    $result = $db->query("select `$primary` from `$table` where $where");
    if(is_object($result)) {
        $row = $result->fetch_assoc();
        $id = $row[$primary];
    }
}

echo $id;
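The script above only understands error text in which the conflicting key is reported by its position ("for key 2"), as older MySQL servers do; newer servers give the key's name instead ("for key 'PRIMARY'"). A hypothetical helper that accepts either form could look like this. When it returns a name, the caller would match it against the Key_name column from SHOW KEYS rather than counting positions.

```php
<?php
// Hypothetical helper: pull the conflicting value and key out of a MySQL
// "Duplicate entry" error, whether the key is given as a number
// (for key 2) or as a quoted name (for key 'PRIMARY').
function parse_duplicate_entry(string $error): ?array
{
    $pattern = '/^Duplicate entry \'(.*)\' for key \'?([^\']+)\'?$/s';
    if (!preg_match($pattern, $error, $m)) {
        return null; // some other error, or no error at all
    }

    return [
        'value' => $m[1],
        // digits mean a key position counted from 1; anything else is a name
        'key'   => preg_match('/^\d+$/', $m[2]) ? (int) $m[2] : $m[2],
    ];
}
```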


Debugging Sociable

Not too long ago, I came across a nasty Sociable bug that completely prevented it from being used with certain server configurations. I located the source of the bug and hacked together a quick workaround that made the plugin usable, at the cost of losing the ability to change the site order. Tonight, I created a diff patch that is the first part of fixing that issue.

This diff has some lingering issues – namely dealing with legacy support and continued support. It works fine as is, but any previous installations will need to redo their social media choices, and future social media choices will need to be added in at the end of the data, rather than alphabetically. The first issue can be solved with some key scanning, the second issue isn’t a huge concern to me, since this array is only visible to developers, anyway.

When I got down to working on this plugin, I really had no idea what I was working with. I’d never really looked into the Sociable code, and had spent even less time in the WordPress code. However, I was going to get this thing fixed come hell or high water, and I think I accomplished my goal.

The first thing I did was write a regex find / replace to adjust the array. This array had nearly 100 elements in it, so a manual adjustment was out of the question. After I had adjusted that array, I went through the sociable.php file and put a bookmark everywhere that this array was possibly being used. WordPress is heavy on procedural code, so functions have to pull in global variables such as this one with the global keyword. I just sought out every function that pulled this variable in globally. I also had to bookmark the end of each of those functions, since the whitespace and indentation were often haphazard and non-obvious.

I narrowed down my search to just three functions – sociable_html, sociable_restore_config and sociable_submenu. These three functions called this variable from the global scope and thus could be affected by my changes.

I first delved into sociable_restore_config(), which didn’t appear to actually use the array I modified. It really should not be calling it globally (which is horrible, anyway) and it might be some fragment of old code that never got properly cleaned up. The only place that things would be affected was the function to restore default settings. It was storing the keys based on the name of the engine, which won’t work under my patch. I modified this to save the IDs, but it is a bit hard to read. I’m not fully satisfied with this little trick, but it will do for now.

I next got into sociable_submenu() and noticed that most of the function works with this new system. The main concern is determining which sites are and are not active. Currently, it uses the string keys, but my patch changed those keys to auto-incremented IDs in an array. This means that if you have Digg chosen and it is #18, but a new engine is added before Digg, Digg becomes #19, and the site that used to be #17 becomes #18 and shows as chosen instead. This could be a problem if the developer wants to keep the array alphabetised, even though this data really belongs in the database.

This will also cause some legacy support issues in this area. Existing sites store the value in the database as ‘Digg|Sphinn|Facebook’, but new sites will use ‘18|47|22’ instead. This means that when it removes active sites, it won’t recognize ‘Digg’ as meaning ‘18’ and will remove nothing. I can fix this with some variable scanning: if a value is not an integer, do a foreach across $sociable_known_sites and find which site it actually means. A single correction cycle will fix everything; it won’t need to run that scan more than once per social media site.
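The legacy scan described above could be sketched like this. The shape of the patched $sociable_known_sites array is an assumption here: numeric IDs as keys, with each entry holding the site's name under a 'name' element. normalize_active_sites() is a hypothetical name, and the IDs are just the example values from the text.

```php
<?php
// Hypothetical shape of the patched array: numeric IDs as keys, each entry
// holding the site's display name under 'name'.
function normalize_active_sites(string $stored, array $knownSites): array
{
    // build a name => ID lookup once instead of scanning for every entry
    $idsByName = [];
    foreach ($knownSites as $id => $site) {
        $idsByName[strtolower($site['name'])] = $id;
    }

    $ids = [];
    foreach (explode('|', $stored) as $entry) {
        if (preg_match('/^\d+$/', $entry)) {
            $ids[] = (int) $entry;                    // already a new-style ID
        } elseif (isset($idsByName[strtolower($entry)])) {
            $ids[] = $idsByName[strtolower($entry)];  // legacy name -> ID
        }
        // empty or unknown entries are dropped
    }

    return $ids;
}

$known = [
    18 => ['name' => 'Digg'],
    22 => ['name' => 'Facebook'],
    47 => ['name' => 'Sphinn'],
];

$ids = normalize_active_sites('Digg|Sphinn|Facebook', $known);  // [18, 47, 22]
$stored = implode('|', $ids);                                     // '18|47|22'
```

Saving $stored back once is the single correction cycle: after that, every stored entry is already numeric and the name lookup never matches again.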

The next thing I did was change the form to display the correct information. Since the array has been adjusted and the keys no longer mean what they used to, I had to change it to use the right value and display that on screen.

The only part I had difficulty with was the ternary statements that show a social media site as selected. Before my patch, it did an array_merge() of the active and inactive sites, with the array keys being the names of the sites. array_merge() preserves string keys but renumbers numeric keys from 0, so the keys used to stay intact but no longer would. This meant that if Facebook was ID #11 and it was chosen, it wouldn’t necessarily be displayed as active. Instead, the merge would renumber everything and mark whichever site landed at index 11 as chosen.

In order to counter this issue, I did a little toying with array_combine(), array_keys() and array_merge() to merge the two arrays together and leave the array keys intact, even though they are numeric.

After this fix, all seemed to be well on the site. Legacy support for the old key names, and protection against breaking stored data when the array changes, are still needed, but this patch changes the way the POST data is sent so that it [hopefully] no longer causes an issue with certain Apache servers. Unfortunately, I can’t get my virtual machine (Ubuntu 9.04) to reproduce the issue. I have a Fedora 10 VM that I may try to reproduce it on, but if that doesn’t work, I can test on Monday when I have access to the server with the known issue.

Here is the diff patch, hopefully it will be implemented soon enough, as I think this does some good in correcting an issue that is out there. The traffic to my site searching for that issue is pretty substantial, and this patch covers the problem pretty well.

Update: I did some further thinking on the array_combine() trick and ended up removing it. It seemed like there should be an easier way, and it turns out you can just use + to union arrays by key, which is exactly what I was doing. So, I adjusted the diff patch to use that instead, which is far cleaner and easier to read.
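The difference between array_merge() and + on numeric keys is easy to see with a couple of small arrays. The site IDs here are made up, and the array_combine() lines are only one plausible reading of the earlier trick, not the code from the patch.

```php
<?php
// Hypothetical site lists keyed by numeric ID.
$active   = [11 => 'Facebook', 18 => 'Digg'];
$inactive = [3 => 'Delicious', 47 => 'Sphinn'];

// array_merge() renumbers numeric keys from 0, so the IDs are lost:
// [0 => 'Facebook', 1 => 'Digg', 2 => 'Delicious', 3 => 'Sphinn']
$merged = array_merge($active, $inactive);

// One way to rebuild the IDs: merge the keys and the values separately,
// then pair them back up with array_combine().
$combined = array_combine(
    array_merge(array_keys($active), array_keys($inactive)),
    array_merge($active, $inactive)
);

// The + operator unions by key and keeps the IDs; if both arrays had the
// same key, the left-hand (active) entry would win.
// [11 => 'Facebook', 18 => 'Digg', 3 => 'Delicious', 47 => 'Sphinn']
$union = $active + $inactive;
```

With non-overlapping IDs, $combined and $union come out identical, which is why the one-operator version can stand in for the three-function one.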


Debugging is a hell of a drug

Lately, I’ve been spending some time on the WordPress support forums just helping debug issues. I don’t really know much about WordPress, but nothing makes you learn the ins and outs of a system like isolating and fixing bugs. I set up a new WordPress site on one of my virtual machines just to do these tests. If I trash it too much while debugging, I can always just scrap and start over and get myself cleaned up again.

While setting up this new WordPress site, I somehow managed to trigger the credential-free updating that visual77.com and septuro.com use, but I’m not sure how I did it. Whenever you update or install a plugin, WordPress often asks for FTP / SSH credentials to transfer the files, but neither visual77.com nor septuro.com requires them. Every other WordPress site I have set up does require credentials, but this test bed does not. It may be a permissions issue: this test site is 0777 for everything, so it has sufficient permissions. I’d never set a live site to 0777 for everything, but since this one is on a virtual machine that is inaccessible from outside my network, it’s safe to do that.

I’m having a good time on the WordPress support forums with these bugs: anything I can replicate, I can fix. My early PHP days were mostly spent making small tweaks to PHPNuke, and that helped me learn much more rapidly than boring tutorials or bullshit code exercises. I learn by doing, and doing stuff on fully built systems is my favorite way to understand the system. At this rate, I’ll know WordPress as well as the creators within a month and I can start debugging WordPress core bugs.
