Announcing D'ni Sphider

Postby Mac_Fife » Fri Aug 09, 2013 11:45 am

Over the past couple of weeks I've been building up a search engine that's focussed solely on Myst, Uru, and things directly related to Cyan Worlds activities. I was frustrated that the regular "big brand" search engines often threw up a lot of irrelevant returns ("Uru" in particular has a couple of other commercial uses) and that you sometimes needed to dig through a couple of pages of "hits" to find what you'd been looking for.

So, I set about creating a customized search engine that omitted all that off-topic cruft, and I now bring you D'ni Sphider: http://dnisphider.mac-fife.me.uk/

Keep in mind that this is an experimental project and is still very much a "work in progress", so some things might not work entirely as expected. I may also take it down if things start to go horribly wrong :roll:

What you can do:
Well, firstly, it works much like the regular search engines in that you can search based on a single word or group of words. If you enter a group of words for your search, you can choose to search for any of the words anywhere in the document (an OR search), all of the words anywhere in the document (an AND search) or all the words together in the document (a PHRASE search).
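
To make the three search modes concrete, here is a minimal Python sketch. It is only an illustration of the OR / AND / PHRASE distinction and not D'ni Sphider's actual code; the function names and the simple word-splitting rule are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words (apostrophes kept, so "D'ni" stays whole)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches(document, query, mode):
    """Return True if the document satisfies the query under the given mode."""
    doc_words = tokenize(document)
    query_words = tokenize(query)
    if not query_words:
        return False
    if mode == "OR":
        # Any one of the words, anywhere in the document.
        return any(word in doc_words for word in query_words)
    if mode == "AND":
        # All of the words, anywhere in the document, in any order.
        return all(word in doc_words for word in query_words)
    if mode == "PHRASE":
        # All of the words together: adjacent and in the same order.
        n = len(query_words)
        return any(
            doc_words[i:i + n] == query_words
            for i in range(len(doc_words) - n + 1)
        )
    raise ValueError("mode must be 'OR', 'AND' or 'PHRASE'")
```

With a document such as "The Great Zero is found deep in the cavern of D'ni", the query "great cavern" would match as an OR or AND search but not as a PHRASE search, because the two words are not next to each other.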

You can also search within a "Category": If you're looking for a particular type of document, say a wiki article or perhaps something specifically related to "language" then you can select a category first (you'll immediately get a listing of related websites) and can then choose to search either within that category only or across the whole scope.

Limitations:
Right now, there are lots of limitations -
a) I haven't been able to index certain file types successfully (such as PDFs or MS Word documents) - I'm working on that (but it'll take time).
b) When indexing some websites, the indexing bot has crashed when it hit very large files (mainly multimedia files), so it was unable to completely index those sites. In some cases I was able to adapt the indexing to avoid those filetypes, but in other cases I couldn't determine what was causing the bot to run out of memory.
c) There are a limited number of sites included at this point and for the most part it mainly covers English language sites. It's still an extensive database but there will be a lot of omissions for now.

Indexing Sites:
I have to thank Tai'lahr for doing most of the work identifying websites to index and doing much of the search bot driving for me.
Once the bot has been given a site to index (and in some cases "site" may mean a single page*), it will attempt to identify keywords on the pages it finds and any links included in those pages. It will then try to follow those links only if they are on the same website (that's how we keep the returns as relevant as possible - we don't follow links to other sites that may be unrelated).
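
The "same website only" rule for following links can be sketched in a few lines of Python. This is a simplified illustration under assumptions, not the bot's real code: the names `LinkCollector` and `same_site_links` are made up for the example, and "same website" is taken here to mean "same host name".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute links from the page that stay on the page's own host."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc.lower()
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        # Keep only web links whose host matches the page being indexed.
        if parts.scheme in ("http", "https") and parts.netloc.lower() == site:
            links.append(absolute)
    return links
```

Links to other hosts, and non-web links such as mailto: addresses, are simply dropped, which is what keeps the crawl from wandering off to unrelated sites.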
There is no automated re-indexing of sites (as yet) so once we've indexed a site the bot won't visit again unless we tell it to. The bot follows all recognized "robots.txt" rules if it finds that file on a site. It will also obey meta instructions like "NOFOLLOW" or "NOINDEX", so it tries its best to behave itself while indexing. In other words, being indexed by D'ni Sphider will have no more impact on a website than being indexed by one of the major search engines (and possibly less impact).
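
For anyone curious what "following robots.txt rules" looks like in practice, here is a small Python sketch using the standard library's `urllib.robotparser`. It is an illustration only and not the bot's real code; the helper names and the "ExampleBot" user agent are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def meta_directives(content):
    """Split a robots meta tag value such as 'NOINDEX, NOFOLLOW' into a set."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


# Example rules: keep every bot out of /private/.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    print(allowed_by_robots(ROBOTS_TXT, "ExampleBot", "http://example.org/private/notes.html"))
    print(allowed_by_robots(ROBOTS_TXT, "ExampleBot", "http://example.org/ages/"))
    print(meta_directives("NOINDEX, NOFOLLOW"))
```

A well-behaved indexer would skip any URL for which `allowed_by_robots` returns False, would not store a page whose meta directives include "noindex", and would not follow that page's links if they include "nofollow".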

* In some cases, a single page on an otherwise out-of-scope site has been included, e.g. where a site covering business may have included an interview with a key Cyan executive. In those cases we endeavour to exclude links from that page that lead to the rest of the site and could be out-of-scope.

Some Statistics:
At present, D'ni Sphider has:
  • 153 sites indexed
  • An index of 235923 search keywords
  • A list of 34383 links
  • A total of 8839164 relationships between keywords and links
  • 22 categories and sub-categories
If people want to propose new categories or new sites to index then let me know. I can't promise to implement everything that is requested, but I'll certainly consider it.
Image Mac - MOULagain KI#00004826 00004289
In the interests of the environment, this post has been constructed entirely from recycled electrons.
Mac_Fife
Former MystOnline Moderator
 
Posts: 233
Joined: Fri Nov 10, 2006 7:05 am
Location: Scotland

Re: Announcing D'ni Sphider

Postby Tai'lahr » Fri Aug 09, 2013 12:20 pm

This has been a dream come true for my inner librarian - both as a user and a contributor. I've followed link after link after link to the edges of the Myst/Uru community and discovered a ton of gems in little known and little used sites. But I know there's a whole lot more still out there and would appreciate suggestions for additions. How can you tell if a site has already been included or not? It wouldn't be feasible for Mac to publish the list of sites, but there's a simple way to confirm if a site is included. Find an unusual phrase within the site and use that to test the sphider.

Thank you, Mac, for letting me be a part of this project. 8-)
Tai'lahr
Obduction Backer
 
Posts: 248
Joined: Thu Mar 15, 2007 10:14 am
Location: Digging around in the dusty archives, uncovering Uru history.

Re: Announcing D'ni Sphider

Postby janaba1 » Fri Aug 09, 2013 1:20 pm

I was pondering if I'd answer on the MOUL forums or here, but then I thought I'd take the opportunity to make my first post here on these forums lol ... :mrgreen:

Ok, Mac, what a mighty and ambitious project that is and it works wonderfully as far as I've tested it so far ... I esp. love the 'Phrase Search' which delivers great results ... Thanks for those great and vigorous efforts ... So much change and 'stuff' everywhere, that must be a good sign ... :D
MO:ULa is Image - Minkata Image
janaba1
Obduction Backer
 
Posts: 150
Joined: Thu Jul 27, 2006 12:09 pm
Location: Berlin, Germany

Re: Announcing D'ni Sphider

Postby Mac_Fife » Sat Aug 10, 2013 2:22 am

janaba1 wrote: I was pondering if I'd answer on the MOUL forums or here, but then I thought I'd take the opportunity to make my first post here on these forums lol ... :mrgreen:

Well, correct choice I think, janaba. The reason I made the main announcement here was that this isn't a purely MOUL tool and with the introduction of these forums, I think Cyan wants to have the MOUL forums for MOUL/Uru topics only and anything that's outside that, or is more general, should be here.

I've already had a suggestion or two for additional sites, which have been added. The database is getting close to 9 million keyword/link relations and is slowing down a bit (not that most users would ever notice, but I can see it in the administration controls), so I'll maybe shut it down briefly later today while I optimize the database.

Re: Announcing D'ni Sphider

Postby Mystdee » Sat Aug 10, 2013 5:16 am

Awesome Mac! Thanks! 8-)
Mystdee
Obduction Backer
 
Posts: 472
Joined: Tue May 09, 2006 9:21 am
Location: Garland, TX - New 09/2014 KI # 60406 Moula

Re: Announcing D'ni Sphider

Postby Erik » Sat Aug 10, 2013 10:53 am

Good work! I noticed that my humble D'ni Restoration Archive has been archived as well. :D
Erik
Obduction Backer
 
Posts: 56
Joined: Fri Oct 06, 2006 8:58 am
Location: The Netherlands

Re: Announcing D'ni Sphider

Postby Lyrositor » Sun Aug 11, 2013 2:44 am

Wonderful! I know several people who will be very glad to have this.

Do I have your permission to add it to rel.to?
Lyrositor
Obduction Backer
 
Posts: 6
Joined: Sun Aug 08, 2010 6:14 am

Re: Announcing D'ni Sphider

Postby Mac_Fife » Sun Aug 11, 2013 3:08 am

Lyrositor wrote: Do I have your permission to add it to rel.to?

Sure, go ahead :).
And since you mention rel.to that gives me an opportunity to answer a question that a couple of people have asked: "Why is this different to rel.to?" Both are "indexes", but while rel.to is more a catalogue of "sites and pages", D'ni Sphider is a catalogue of "words and phrases". I don't see D'ni Sphider as an alternative to rel.to but as a complement to it, offering a different way to find information on Myst, Uru, Cyan, etc.

Re: Announcing D'ni Sphider

Postby janaba1 » Sun Aug 11, 2013 4:22 am

Exactly ... very nicely clarified and distinguished, Mac, short and to the point ... :D

Re: Announcing D'ni Sphider

Postby Mac_Fife » Wed Aug 14, 2013 11:11 am

Update: Now with added PDFs (sort of)
D'ni Sphider can now index (some) PDFs: This is an incomplete PDF extraction, but it should deal with most common things. It won't, obviously, extract data from any PDF that has security settings enabled to prevent copying of text; it doesn't seem to handle some older versions of PDF (not sure why yet, though); it can't locate URLs embedded within a PDF; text added as captions to images seems to be getting dropped. But it's better than just letting all those PDFs out there go unindexed.
I'm sure that Tai'lahr (chief indexer and acting unpaid bot driver) has several sites identified as hosting PDFs that will now need to be reindexed ;) .

I cleaned up the database this morning and removed just shy of 10,000 keywords that weren't associated with any valid link in the database. Those will have come from sites that were deleted or edited because the searches were going out of scope and catching things that weren't Uru/Myst/Cyan related. Even with that I've still got almost 230,000 unique keywords in the database which in itself seems quite a remarkable statistic. I also got rid of nearly 1/4 million entries that had got stuck in the temporary tables of the database (probably as a result of timeouts or errors during indexing).
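
As a rough idea of what that kind of clean-up involves, here is a Python sketch using an in-memory SQLite database. The table and column names (`keywords`, `links`, `keyword_links`) are hypothetical and chosen only for illustration; they are not D'ni Sphider's actual schema, and the real database engine may well differ.

```python
import sqlite3


def create_demo_schema(conn):
    """Create a tiny, hypothetical keyword/link schema for the example."""
    conn.executescript(
        """
        CREATE TABLE links (link_id INTEGER PRIMARY KEY, url TEXT NOT NULL);
        CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
        CREATE TABLE keyword_links (keyword_id INTEGER NOT NULL, link_id INTEGER NOT NULL);
        """
    )


def remove_orphan_keywords(conn):
    """Delete keywords that are not tied to any link still present in the links table."""
    cursor = conn.execute(
        """
        DELETE FROM keywords
        WHERE NOT EXISTS (
            SELECT 1
            FROM keyword_links AS kl
            JOIN links AS l ON l.link_id = kl.link_id
            WHERE kl.keyword_id = keywords.keyword_id
        )
        """
    )
    conn.commit()
    return cursor.rowcount  # number of keywords removed
```

The join against `links` matters: a keyword whose only relations point at links that have since been deleted counts as orphaned too, which is the situation left behind when a site is removed from the index.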

The metrics as of this moment for D'ni Sphider are:
  • Sites: 165
  • Keywords: 229503
  • Links: 35047
  • Keyword-link relations: 8961788
  • Categories: 22

Edit, Aug 15:
There's now a link on the search page to a Feedback form: You can use that to report any bad behaviour you see from D'ni Sphider, suggest sites to add or just make a suggestion or comment.

Re: Announcing D'ni Sphider

Postby Mac_Fife » Tue Aug 20, 2013 1:36 pm

D'ni Sphider will be out of action for a little while tomorrow (August 21) while I do some maintenance of the database - it's the kind of thing where I need to be sure nothing's changing while I do the work. Full details are on D'ni Sphider's News page.

Re: Announcing D'ni Sphider

Postby Jamie Marchant » Mon Aug 26, 2013 5:05 pm

Great work Mac and Tai'lahr! Now do you think you could make a Firefox search plug-in?
Jamie Marchant
 
Posts: 278
Joined: Tue Jan 11, 2011 1:26 pm
Location: Ontario, Canada

Re: Announcing D'ni Sphider

Postby dgelessus » Tue Aug 27, 2013 3:03 am

Jamie Marchant wrote: Great work Mac and Tai'lahr! Now do you think you could make a Firefox search plug-in?

You could make a keyword bookmark - as in, if you type "dni teledahn" it searches for "teledahn" with D'ni Sphider. I believe there's a menu entry for that if you right-click on the search field.
At some point (that was years ago) there was also a menu entry to put it directly into the search field as a search engine, but apparently that has been removed for some reason...
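
For anyone wondering how a keyword bookmark works: the browser stores a URL with a "%s" placeholder and substitutes whatever you type after the keyword. Here is a minimal Python sketch of that substitution. The `search.example` address and its `query` parameter are placeholders invented for the example; I haven't checked what D'ni Sphider's real search URL looks like.

```python
from urllib.parse import quote_plus


def expand_keyword_bookmark(typed, bookmarks):
    """Turn 'keyword search terms' into a search URL, or None if nothing matches."""
    keyword, _, terms = typed.partition(" ")
    template = bookmarks.get(keyword)
    terms = terms.strip()
    if template is None or not terms:
        return None
    # Substitute the URL-encoded search terms for the %s placeholder.
    return template.replace("%s", quote_plus(terms))


# Hypothetical bookmark: the keyword "dni" mapped to a made-up search URL.
BOOKMARKS = {"dni": "http://search.example/search?query=%s"}

if __name__ == "__main__":
    print(expand_keyword_bookmark("dni teledahn", BOOKMARKS))
```

Typing "dni teledahn" would then open the bookmarked URL with "teledahn" in place of the "%s".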
MOULa2.dgelessus=32253
dgelessus
 
Posts: 137
Joined: Tue Mar 05, 2013 1:58 pm
Location: GMT +1

Re: Announcing D'ni Sphider

Postby Tai'lahr » Tue Aug 27, 2013 7:35 am

Jamie Marchant wrote: Great work Mac and Tai'lahr! Now do you think you could make a Firefox search plug-in?

Hey Jamie, could you please submit this suggestion via the feedback page? I have no idea what it means or if it can be done or if Mac has time to do it if it can... but, no one has sent us any feedback through the site, yet, so it needs a test drive. Thanks!

Re: Announcing D'ni Sphider

Postby Mac_Fife » Wed Aug 28, 2013 12:30 pm

Jamie Marchant wrote: Great work Mac and Tai'lahr! Now do you think you could make a Firefox search plug-in?

I guess it's possible but not really too high a priority for me - I think I'd like to get more of the functionality of the main engine improved before looking at that. Even then, maybe a more "mobile device friendly" front end would be a more popular option to begin with.
