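Batch Renaming Files on Linux

Posted on 2018-03-27 | Post modified 2019-11-16 | In linux

While working as a TA, I had to grade undergraduates' programming assignments, and the bulk-downloaded files had long, unwieldy names like these: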

-rw-r--r--  1 mac  staff   4.0K Mar 27 13:36 Program 1_and7697_attempt_2018-03-22-16-56-45_and7697_p1.s
-rw-r--r--  1 mac  staff   1.3K Mar 27 13:36 Program 1_axg6672_attempt_2018-03-20-14-13-01_axg6672_p1.s
-rw-r--r--  1 mac  staff   799B Mar 27 13:36 Program 1_axk5863_attempt_2018-03-21-23-31-29_axk5863_p1.s
-rw-r--r--  1 mac  staff   797B Mar 27 13:36 Program 1_axr8361_attempt_2018-03-21-22-27-28_axr8361_p1.s
-rw-r--r--  1 mac  staff   1.1K Mar 27 13:36 Program 1_bxk5485_attempt_2018-03-21-21-19-30_bxk5485_p1.s
-rw-r--r--  1 mac  staff   1.7K Mar 27 13:36 Program 1_cra5824_attempt_2018-03-19-09-43-14_cra5824_p1.s
-rw-r--r--  1 mac  staff   1.7K Mar 27 13:36 Program 1_cxx4741_attempt_2018-03-21-21-13-51_cxx4741_p1.s
-rw-r--r--  1 mac  staff   1.5K Mar 27 13:36 Program 1_daa5782_attempt_2018-03-21-16-23-26_daa5782_p1.s
-rw-r--r--  1 mac  staff   4.8K Mar 27 13:36 Program 1_dbp4110_attempt_2018-03-16-23-09-09_dbp4110_p1.s
-rw-r--r--  1 mac  staff   1.9K Mar 27 13:36 Program 1_dtn5102_attempt_2018-03-20-20-29-06_dtn5102-p1.s
-rw-r--r--  1 mac  staff   2.1K Mar 27 13:36 Program 1_eep5180_attempt_2018-03-21-23-23-13_eep5180_p1.s
-rw-r--r--  1 mac  staff   3.6K Mar 27 13:36 Program 1_egc8644_attempt_2018-03-23-22-02-19_egc8644_p1.s
-rw-r--r--  1 mac  staff   969B Mar 27 13:36 Program 1_exg6686_attempt_2018-03-21-22-30-05_exgg6686_p1.s
...

So I wanted to rename them to the short form xxx1111.s. I looked up several ways to batch-rename files on Linux and summarize them below:
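The methods themselves are not listed in this excerpt. As one possible approach, here is a minimal sketch in Python rather than shell tools. It assumes every file name starts with "Program 1_" followed by a student id of three lowercase letters and four digits, which is inferred from the listing above. The names `short_name` and `rename_all` are hypothetical helpers. The id is taken from the prefix rather than the suffix, because the suffix is not consistent in the listing (for example `dtn5102-p1.s` and `exgg6686_p1.s`).

```python
import re
from pathlib import Path

# Assumed layout: "Program 1_<id>_attempt_<timestamp>_<anything>.s"
PATTERN = re.compile(r"^Program 1_([a-z]{3}\d{4})_attempt_.*\.s$")


def short_name(filename):
    """Return '<id>.s' for a matching file name, or None if it does not match."""
    match = PATTERN.match(filename)
    return f"{match.group(1)}.s" if match else None


def rename_all(directory, dry_run=True):
    """Rename every matching file in `directory`; only print the plan if dry_run."""
    for path in sorted(Path(directory).iterdir()):
        new_name = short_name(path.name)
        if new_name is None:
            continue  # not a submission file, leave it alone
        target = path.with_name(new_name)
        if target.exists():
            # Never overwrite an existing file
            print(f"skip {path.name}: {new_name} already exists")
            continue
        print(f"{path.name} -> {new_name}")
        if not dry_run:
            path.rename(target)
```

Calling `rename_all(".")` first prints the planned renames without touching anything; `rename_all(".", dry_run=False)` then performs them.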

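How to Read a File Quickly

Posted on 2018-03-26 | Post modified 2019-11-16 | In OS

First, use mmap to map the file into memory; this returns the starting address of the file's contents in memory, a char* data.

Then construct an ifstream over that char* buffer (without copying the contents of data into the ifstream).

The code is as follows: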
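The C++ listing is not part of this excerpt. As a rough analogue in Python (an assumption, not the post's code), the standard `mmap` module maps a file into memory and exposes it as a file-like object, so it can be read line by line without first loading the whole file into a separate buffer. The function name `count_lines_mmap` is hypothetical.

```python
import mmap


def count_lines_mmap(path):
    """Count lines by reading from a read-only memory map of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            # mmap objects are file-like: readline() returns b"" at end of file
            for _line in iter(mm.readline, b""):
                count += 1
            return count
```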

Pymongo Experiments

Posted on 2018-03-18 | Post modified 2019-11-16 | In database

First, import the GeoJSON data into MongoDB, as described below.

Two commands are needed:

jq --compact-output ".features" input.geojson > output.geojson
mongoimport --db dbname -c collectionname --file "output.geojson" --jsonArray

Using the commands above, I imported the nyu yellow taxi JSON data. The data had been post-processed and has the following format:

{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "place": "nyu"
            },
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -73.98268127441406,
                    40.731311798095696
                ]
            }
        },
        {
            "type": "Feature",
            "properties": {
                "place": "nyu"
            },
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -73.99224853515625,
                    40.74908065795898
                ]
            }
        },
        ...
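For reference, the jq step above pulls the `features` array out of the FeatureCollection so that mongoimport can load it with --jsonArray. A minimal Python sketch of that same extraction follows; the function names `extract_features` and `to_compact_json` are hypothetical, and the input is assumed to be a FeatureCollection shaped like the sample above.

```python
import json


def extract_features(geojson_text):
    """Return the list under "features" of a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    return collection["features"]


def to_compact_json(features):
    """Serialize the features as a compact JSON array with no extra whitespace."""
    return json.dumps(features, separators=(",", ":"))
```

With pymongo, the list returned by `extract_features` could also be passed directly to `Collection.insert_many` instead of going through an intermediate file.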

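S2: Finding the Edge Closest to a Given Point

Posted on 2018-03-17 | Post modified 2019-11-16 | In database

In a GIS system we may run into the following scenario: given a coordinate point, determine which edge it is closest to. A real-life example: given your GPS position, find the road nearest to you.

Google's S2 library provides the following class for this kind of spatial query: S2ClosestEdgeQuery.

First we store the existing geospatial objects in an index, that is, we build a spatial index over these objects. For now we simply call it index.

Then we specify a query target, Target, and search the index for the spatial objects around Target that meet our requirements. These are the results returned by the query.

Here is a simple piece of query code that, given a Target point, finds the edges closest to it: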
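The S2 listing is not part of this excerpt. To make the question itself concrete, here is a brute-force sketch in Python on a flat plane. It is not the S2 API: S2ClosestEdgeQuery works on the sphere and uses the index to avoid checking every edge, whereas this example simply measures the distance from the point to each segment. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar, not spherical)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)


def closest_edge(p, edges):
    """Return the index of the edge in `edges` nearest to p (linear scan)."""
    return min(range(len(edges)),
               key=lambda i: point_segment_distance(p, edges[i][0], edges[i][1]))
```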

Review: The bright side of sitting in traffic: Crowdsourcing road congestion data

Posted on 2018-03-07 | Post modified 2019-11-16 | In review

This blog post was written on Aug. 25, 2009. It discusses how to use cell phone GPS signals to identify congested roads in the U.S.

If you use Google Maps for mobile with GPS enabled on your phone, that’s exactly what you can do. When you choose to enable Google Maps with My Location, your phone sends anonymous bits of data back to Google describing how fast you’re moving. When we combine your speed with the speed of other phones on the road, across thousands of phones moving around a city at any given time, we can get a pretty good picture of live traffic conditions. We continuously combine this data and send it back to you for free in the Google Maps traffic layers.

The more users participate in this process, the more precise the report becomes. That week, the system went online covering all U.S. highways and arterials.

MongoDB M201 - Query Plan

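Posted on 2018-03-04 | Post modified 2019-11-16 | In database

What Is a Query Plan

When a query is issued with multiple constraints, a query plan is formed. In essence, the plan decides how to organize the stages of the pipeline so that the query runs more efficiently.

For example, consider the following query: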

// Find documents whose zip code is greater than 50000 and whose `cuisine` field contains `Sushi`, sorted by `stars` in descending order
db.restaurants.find({"address.zipcode": {$gt:'50000'}, cuisine: 'Sushi'}).sort({"stars": -1})

So how should we organize the query process? Do we find the documents first and then sort them, or sort first and then find the documents?
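To make the two orderings concrete, here is a small in-memory sketch in Python. It is only an illustration of the idea, not how MongoDB executes a plan: plain lists stand in for the collection, and the function names are hypothetical. Both orderings return the same documents; what differs is how much work each one does, which is why the choice matters.

```python
def matches(doc):
    """Same predicate as the query: zipcode > '50000' and cuisine == 'Sushi'."""
    return doc["address"]["zipcode"] > "50000" and doc["cuisine"] == "Sushi"


def filter_then_sort(docs):
    """Plan A: find the matching documents first, then sort only those."""
    found = [d for d in docs if matches(d)]
    return sorted(found, key=lambda d: d["stars"], reverse=True)


def sort_then_filter(docs):
    """Plan B: walk all documents in stars order, keeping the ones that match."""
    ordered = sorted(docs, key=lambda d: d["stars"], reverse=True)
    return [d for d in ordered if matches(d)]
```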

Which query plan is used depends on the indexes we have built. Different indexes lead to different query plans.

For example, suppose the indexes are the following two:

{"address.zipcode": 1, cuisine: 1}
{cuisine: 1, stars: 1}