Explicit content microformat

Since I develop my own blog, I naturally became interested in how to format its contents.

The problem with being explicit

Explicit content differs from ordinary content in that it affects the reader's psyche: if it does not change their world view, it at least changes their subjective opinion. Not every reader is ready for this, and not everyone wants to see it. That's why warnings, discretion notices and content notices are needed.

Another problem is that the rating should not depend on the viewer's age. Filtering, if it exists, should be contextual, based on each country's legislation, such as COPPA. Of course, if warnings are present, automatic censorship by browsers or search engines becomes possible. Editors should be ready for this, since this is entirely voluntary.

It's clear that content censorship is aimed at a young audience, but the definitions of "children" and "rate of violence" vary greatly. You wouldn't expect Mario games to be forbidden to everyone under 21 just because the hero jumps on turtles and crushes them, yet PETA may think you should.

Existing standards

1. SafeSurf - dead

2. RTA - probably the most widely used standard on modern adult websites. It uses either a header:

header("Rating: RTA-5042-1996-1400-1577-RTA");

Or a meta tag on the page:

<meta name="RATING" content="RTA-5042-1996-1400-1577-RTA" />

Or a header forced for all resources through Apache's .htaccess:

<IfModule mod_headers.c>
    Header set Rating "RTA-5042-1996-1400-1577-RTA"
</IfModule>

3. ICRA - outdated; it used page context and relied on a rather large table of ICRA and RSAC codes.

<meta http-equiv="pics-label" content='(PICS-1.1 entries)' />

4. PICS - migrated into W3C POWDER. The latter is based on RDF, which means it's hard to manage and understand.

That's it! Nothing real, no microformat for the average webmaster. What we need is a level-based rating for any content on the web, like the MPAA provides for movies or ESRB and PEGI provide for games.

New microformat / xrate 1.0

I see the proper way to format this data very clearly: the author just has to mark any HTML element with an attribute based on its rating. HTML5 makes data-* attributes free to use, so I suggest the xrate namespace with integer values (0-100) that represent the danger to the viewer (the higher the value, the greater the danger), which also makes filtering easier. Note that this is NOT THE VIEWER'S AGE.

For simplicity, you can use a general parameter, say data-xrate="20", but it is more semantically correct to pick a specific area:

data-xrate-lang      Obscene language
data-xrate-sex       Romantic, erotic, pornographic
data-xrate-nude      Level of nudity
data-xrate-disgust   Might cause disgust (shit, larvae, decomposition)
data-xrate-violence  Violence and its results - weapons, wounds, dead bodies, blood
data-xrate-asocial   Smoking, alcohol, drugs, gambling, prostitution
data-xrate-blink     Blinking animation that might trigger epileptic seizures
data-xrate-spoiler   Story is retold
data-xrate-camera    If an application (Flash/applet?) gains access to the video camera
data-xrate-malware   If the resource can cause infection (viruses, trojans etc.) on the viewer's machine
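
To make the attribute list above a bit more concrete, here is a minimal sketch of how a script might collect the ratings of one element. The function name parseXrate, the plain-object input and the "general" key for a bare data-xrate attribute are my assumptions for illustration, not part of any existing library; clamping to 0-100 simply follows the range proposed above.

```javascript
// Hypothetical helper: collect xrate ratings from a map of attribute names to values.
// Category names come from the data-xrate-* list; "general" stands for plain data-xrate.
function parseXrate(attributes) {
  const ratings = {};
  for (const [name, value] of Object.entries(attributes)) {
    const match = /^data-xrate(?:-([a-z]+))?$/.exec(name);
    if (!match) continue; // not an xrate attribute
    const level = Number.parseInt(value, 10);
    if (Number.isNaN(level)) continue; // ignore malformed values
    // Clamp to the 0-100 range proposed by the format.
    ratings[match[1] || 'general'] = Math.min(100, Math.max(0, level));
  }
  return ratings;
}
```

For an image marked with data-xrate-nude="60" and data-xrate-sex="0", this would yield an object with nude set to 60 and sex set to 0, ready to be compared against the viewer's preferences.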

Sex (xrate-sex)

Obviously, sex and nudity are separate things. Naturally they mostly go together, but paintings or nude old dudes chilling out mostly don't carry as much erotic energy.

Range    Level     Description
0-30     Romance   Kissing, wish
30-70    Erotica   Breasts, will
70-100   Porn      Genitals, action

Example images: romance-creative-family-beach-vacation.jpeg; captions "Nudity 30, sex 30", "Nudity 50"

Nudity (xrate-nude)

Range    Level
0-30     Dressed
30-70    Partially nude
70-100   Fully nude

Example images: forest_mystery_by_vvola-d4adnj9.jpeg; captions "Nudity 50", "Nudity 50", "Nudity 80"
<img src="http://www.tema.ru/jjj/tits/renuar.jpg" data-xrate-nude="60" data-xrate-sex="0" />

Violence (xrate-violence)

Violence also comes in different degrees.

<a href="http://meatvideo.com/" data-xrate-violence="100">omg</a>
Levels, from mild to severe: fantasy heroes fighting, wounds, blood, dead bodies, grievous wounds
Ranges: 0-30, 30-70, 70-100
Example images: Jerryscousin1.jpeg; boxing; longfin pilot whales, Faroe Islands / GEORGIA MANNION; The Passion of the Christ / Violence = 70

Antisocial behaviour (xrate-asocial)

What scares most parents is not someone swearing, watching erotic movies or playing violent games. Parents are afraid of the spiritual fall of their kids, which begins with misunderstanding and indifference. This attribute should fight against smoking, drinking, drugs and prostitution, so that they are not considered OK.

Example images: kids smoking (Keystone/EPA, Yuri Kadobnov); Thailand / fresher.ru; ATM break-in; alcoholics

Obscene language (xrate-lang)

Language directly influences thinking. Obscene language cannot be caught by the usual word filters, because it is an emotion that can also take the form of drawings or hand gestures.

A dick on the Liteyny Bridge, an action by the art group Voina.

Analysis

Filtering HTML once these attributes have been added is easy. For example, if I have a paragraph with data-xrate-asocial="30", I can easily find it with jQuery and, based on the user's preferences, either hide it, replace it, make it more transparent or add a warning to it.

// When the user opts in, reveal paragraphs rated above 10 for antisocial behaviour
$('#show_sensitive').click(function () {
  $('p[data-xrate-asocial]').each(function () {
    if ($(this).data('xrate-asocial') > 10) {
      $(this).show();
    }
  });
});
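
The decision "based on the user's preferences" can also be kept separate from the DOM code. Below is a small sketch of that idea; the function name isAllowed, the shape of the limits object (maximum tolerated level per category) and the default limit of 100 are assumptions I made for the example.

```javascript
// Hypothetical check: does content with these ratings fit the viewer's limits?
// ratings: e.g. { asocial: 30 }; limits: maximum tolerated level per category.
function isAllowed(ratings, limits, defaultLimit = 100) {
  return Object.entries(ratings).every(([category, level]) => {
    // Categories the viewer did not configure fall back to the default limit.
    const limit = category in limits ? limits[category] : defaultLimit;
    return level <= limit;
  });
}
```

A paragraph rated asocial 30 would be rejected for a viewer whose asocial limit is 10 and accepted for a viewer whose limit is 50; what the page then does with rejected content (hide, blur, warn) stays up to the script that calls it.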

Styling is a bit more complex, since CSS attribute selectors only match strings and cannot compare numbers. But you can use fixed values, like 30 and 70 - this way you get fixed ranges:

p[data-xrate-asocial='30']:before{ content: "Warning - stupid behaviour detected"; }
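
If authors still want to use arbitrary values, a script could snap them to the fixed values before the stylesheet is applied. This is only a sketch: the function name snapToBand and the choice to round down to the lower bound of each range (0, 30 or 70) are my assumptions, since the ranges above share their boundary values.

```javascript
// Hypothetical helper: map any 0-100 rating to the lower bound of its range,
// so that exact-match CSS selectors like [data-xrate-asocial='30'] apply.
function snapToBand(level) {
  if (level >= 70) return 70; // 70-100
  if (level >= 30) return 30; // 30-70
  return 0;                   // 0-30
}
```

A paragraph rated 45 would then be rewritten to 30, for example with element.setAttribute('data-xrate-asocial', String(snapToBand(45))), and would pick up the warning style shown above.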

Social rating

Obviously, xrate- is a subjective rating that the author has decided upon. But if you have complex content and lots of viewers, then members can rate it themselves. Voting works like the usual karma, except that you don't vote on whether you agree or disagree with the author, but on how dangerous the content is.
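
How the votes are combined is not defined by the format. One possible approach, sketched here purely as an assumption, is to take the median of all valid votes, so that a few extreme opinions do not drag the result too far. The function name communityRating is made up for this example.

```javascript
// Hypothetical aggregation: median of member votes, each an integer from 0 to 100.
function communityRating(votes) {
  const valid = votes
    .filter((v) => Number.isInteger(v) && v >= 0 && v <= 100)
    .sort((a, b) => a - b);
  if (valid.length === 0) return null; // nobody has voted yet
  const mid = Math.floor(valid.length / 2);
  // Odd count: the middle vote; even count: rounded mean of the two middle votes.
  return valid.length % 2 === 1
    ? valid[mid]
    : Math.round((valid[mid - 1] + valid[mid]) / 2);
}
```

With votes of 10, 20 and 100 the result is 20, so a single vote of 100 does not turn the content into a maximum-danger item.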

It's a microformat. It's not complex enough to warn about a scary part of a movie at some defined point, it doesn't have any auditor or certificate of proof, and it's not centralized: one user might think that the Bible has a danger rating of 10, another that it is 100. I hope it's enough.

Authentication using Google and OAuth

Google, just like Twitter, allows developers to use OAuth 1.0 to authenticate users and authorize our application's use of their data, which is then provided through an API. Since Google is not as centralized as Facebook, it has lots of independent services like YouTube and Picasa, each with its own data structure. In general, this authorization (keywords here: OpenID, AuthSub, Federated Login) and data access (JSON, XML, REST, Atom) are implemented as the Google Data Protocol.


Let's use Zend Framework

There are lots of OAuth libraries: there is one for Twitter, there is one provided by Google, but I'll use Zend Framework.

1. First, we register our domain (our web application) and write down the Consumer Key and Secret. We can't really test this on localhost, only on an externally accessible domain.

2. Set up two modules from Zend Framework - Crypt and Oauth.

3. Create a settings array where we set the URLs of the auth services:

$aGoogleConfig = array(
    'callbackUrl'     => 'http://kurapov.name',
    'siteUrl'         => 'https://www.google.com/accounts/',
    'authorizeUrl'    => 'https://www.google.com/accounts/OAuthAuthorizeToken',
    'requestTokenUrl' => 'https://www.google.com/accounts/OAuthGetRequestToken',
    'accessTokenUrl'  => 'https://www.google.com/accounts/OAuthGetAccessToken',
    'consumerKey'     => 'kurapov.name',
    'consumerSecret'  => 'netetonenastojashijsekreteokfpwoekrf'
);
$consumer = new Zend_Oauth_Consumer($aGoogleConfig);
$token = null;

Installing Virtual Mac OS X with VMware

step_1_vmware.png

As I wrote before, web applications should be tested properly, across all kinds of platforms. Selenium Grid works on different operating systems, but you don't always have a separate machine for each of them. That's where virtualization software comes in: Virtual PC, Parallels and, in particular, VMware Workstation 7 (by the way, some company got a bonus of 9 billion when it acquired VMware).

The tenth version of Mac OS has been released since 2000 and, just like Android, uses code names for its minor versions, except that instead of sweets it uses the cat family: Cheetah, Puma, Jaguar, Panther, Tiger, Leopard, Snow Leopard, and now Lion is expected by the summer. Installing it on an Intel CPU is done with a modified boot loader (because it uses some sort of digital verification). You will need:

  • OS distribution disc (as an ISO image; you can convert a DMG to ISO using PowerISO)
  • 17 MB darwin_snow.iso loader
  • VMware Workstation 7

So, you need to create a virtual machine:

step_2_vmware.png step_3_vmware.png step_4_vmware.png

In the settings, enable USB support and increase the memory from 256 MB. Now edit the generated .vmx file to get past the "Invalid front-side bus frequency 6600000 Hz" error during boot: change the value of guestOS to "darwin10" (or darwin, darwin-64). Insert the loader into the virtual CD drive and run the machine.

Leopard-2011-04-05-23-12-25.png Leopard-2011-04-05-23-18-08.png Leopard-2011-04-05-23-26-25.png

The loader asks us to insert a DVD because the HDD is empty, so everything is fine. We click the CD icon at the bottom, switch the disc to the installation DVD, confirm with C and immediately press F8. Now we can boot in verbose mode by entering "-v" to watch the boot progress.

After the GUI has loaded, we need to format the virtual HDD; the OS will take up to 10 GB of space.

Leopard-2011-04-05-22-38-06.png Leopard-2011-04-05-23-34-24.png Leopard-2011-04-05-23-35-00.png

Now, if you need to increase the screen size from 1024x768 to something bigger, open the file /Library/Preferences/SystemConfiguration/com.apple.Boot.plist in the terminal or in TextEdit and, as admin, add:

<key>Graphics Mode</key>
<string>1280x1024x32</string>

If the bridged internet connection is missing, edit your .vmx file and add:

ethernet0.virtualDev = "e1000"

To enable sound drivers, use file below

System testing with Selenium and PHPStorm

selenium-rc.png

Everyone keeps talking about unit testing and TDD, but what if you don't have enough will, money and time to thoroughly change your way of thinking along with megabytes of code? System black-box testing with Selenium Server (RC) could help you. It means that we do not test every single class from within, as unit tests and white-box techniques do, but instead test only the UI that the end user sees.

In my opinion, it is better to start introducing automated QA in a company with this kind of testing, simply because it can give the first signal that something went wrong before deployment, with less money and effort spent on writing all of the tests (compare this with unit tests, which tell you whether everything is done correctly).

If you have already seen the Mozilla Firefox plugin that records all of your actions and builds a list of macro commands in the process, then you are in the right place, because that is Selenium IDE, which by now can transform commands into Ruby, Python, Java, Perl, C# and PHP code. And here is why..

After recording a macro, copying the PHP code (see Options → Clipboard format) and saving it to a PHP file, you can see that for the test to run, it needs:

  1. PHPUnit and a local PHP installation
  2. Selenium Server - a proxy written in Java that runs the browser and executes your code
  3. PHPStorm or a console that runs phpunit with the relevant parameters

The second point is covered by installing Selenium Server and running it from the console:
java -jar selenium-server.jar -interactive

Making cases

By extending PHPUnit_Extensions_SeleniumTestCase, we inherit some parent methods that let us customize behaviour in setUp(), in particular:

  • Browser - firefox, googlechrome, iexplore, safari, konqueror
    $this->setBrowser("*chrome");

  • Pause between commands
    $this->setSleep(1);
Selenium_IDE_suite.png

But that is not enough, since tests can be committed to SVN for other developers, who may have different URLs, logins and passwords for testing, or even DB access for gray-box testing. That's why you can include bootstrap files before running tests. The XML file for PHPUnit is almost empty and simply links to a PHP file for each developer:

<phpunit bootstrap="bootstrap.php"></phpunit>

You can see the settings for PHPUnit tests on the same image. Now it's enough to click Run and enjoy the show in the browser.

Now let's talk about test grouping. If a test case is an elementary set of instructions for a single page (or at least I tend to think about it that way), then a test suite is a more global grouping, intended to test some process. For example..

PHPUnit_Extensions_SeleniumTestSuite.png

Let's say we have an order made by a client, which is processed by a privileged user. To test the entire flow with its status changes, we combine single test methods into groups.

Unfortunately, Selenium does not support «Export Test Suite As → PHP» yet, which is why we have to create inner methods to be used in one test "case" (which is actually more like a test suite). The obvious disadvantage is that such a grouped test usually runs rather long (several minutes), and when it crashes it does not really tell you where it failed.

Another problem with test sets is the data they generate. If you are running a user registration test, you probably have validation somewhere in the middle that checks for an existing email or login. Naturally, after one successful registration the second one will fail. Adding user deletion at the end of the test is a bad idea. So as an exit strategy you need either web access to manual user deletion in the admin panel, or direct access to the DB (hello, gray-box testing with a dependency on the DB structure).

If you go deeper into the documentation, you may come across PHPUnit_Framework_TestSuite, which, as I understand it, only groups tests by topic, like «all tests with orders» or «all tests with payment» are successful, without taking the process into consideration.

More to come

Not everything is as rosy as I described, though: writing tests for Selenium is not as easy as you may think, because you have to cope with a changing DOM, popup windows and periodic AJAX requests. There are tons of obstacles, like CAPTCHAs and input checks, social network authorization, plugins like Flash and Java, and the scroll wheel.

Some of these can be worked around with waitForElementPresent.

phpstorm_selenium.png

Another, more company-related problem is that writing tests should be obligatory, because if you change the DOM in templates or the business logic and forget to run and update the tests, they will become obsolete. And since running all tests with Selenium Server may take minutes, developers may postpone it and commit changes before the results are in. That's why you may need Selenium Grid, which runs tests in parallel on different browsers and platforms, and you may need Hudson to run the tests all the time and monitor project health as part of continuous integration.

To conclude, I'd like to remind you to stay reasonable. You can't test everything, and even if you do, maintaining all of the tests will cost much more. That's why you need to start with the most important and complex functionality, especially where money and payments are involved.

Error analysis using Xdebug in PHPStorm

Xdebug is a great PHP module for debugging applications «the right way», something that «older» languages (read: not interpreted) have had for decades thanks to compiler validation. The need for a full debugger is obvious in complex applications, where reproducing an error takes far too much time and the size of the data makes print_r() pretty much unusable, though this module can do that too.

Normal stack trace in browser

Because Xdebug is a module, not every shared hosting provider has it installed, so the developer is expected to install PHP and Xdebug himself. After that, you need to edit php.ini. Note that the remote_host property is the address Xdebug connects back to, so it limits by IP which developers can open a debugging session.

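; Xdebug has to be loaded as a Zend extension, not with extension=
zend_extension="C:\Program Files\php\ext\php_xdebug-2.1.0-5.3-vc9.dll"
xdebug.profiler_enable = 1
; remote debugging is off by default in Xdebug 2 and has to be enabled
xdebug.remote_enable=1
xdebug.remote_host=127.0.0.1
xdebug.remote_port=9000
xdebug.remote_handler=dbgp
xdebug.idekey=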

To start debugging, you need to connect the IDE and the Xdebug module, so that Xdebug stops the PHP process and sends a dump of all variables to PHPStorm:

  1. Turn on listening on port 9000 in PHPStorm and then start debug mode using Shift+F9 or the Run → Debug menu.
    listen_off.png

  2. Set a debug cookie so that Xdebug on the remote host knows when it should start working. JetBrains have created a bookmarklet generator to make this as easy as possible. Browser plugins are another solution. The IDE key must have the same value in the cookie, in the IDE and in php.ini.

Now we can set a breakpoint on any line. The problems start when data actually starts flowing: if the remote server runs Linux and you are on Windows, the IDE will naturally not be able to understand the Linux paths in the stack trace. Fortunately, this is easily solved by path mapping. Another problem is encoded files: Xdebug will keep reporting paths, but the IDE will not be able to inspect any of them.

Debug example without break point

And the biggest problem is showing a stack trace when an error or exception occurs without a breakpoint being set. I've solved this by combining error handlers with xdebug_break(), which triggers the IDE panel; the only remaining problem is that fatal errors do not show the last method in the stack trace.

// Break into the IDE on any error except notices and strict-standards messages.
function ErrorHandler($errno, $errstr, $errfile, $errline) {
    if (!in_array($errno, array(E_NOTICE, 2048))) { // 2048 is E_STRICT
        xdebug_break();
        restore_error_handler();
        trigger_error($errno.$errstr." in ".$errfile." on line ".$errline."; showed by error handler ");
    }
}

// Fatal errors skip the error handler, so check for them at shutdown.
function shutDownFunction() {
    if (!is_null($e = error_get_last())) {
        xdebug_break();
    }
}

function exceptionHandler($exception) {
    xdebug_break();
    restore_exception_handler();
}

set_error_handler('ErrorHandler');
register_shutdown_function('shutDownFunction');
set_exception_handler('exceptionHandler');